Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 8, 2014

Clojure: Programming with Hand Tools

Filed under: Clojure,Programming — Patrick Durusau @ 8:34 pm

Clojure: Programming with Hand Tools by Tim Ewald.

From the description:

For most of human history, furniture was built by hand using a small set of simple tools. This approach connects you in a profoundly direct way to the work, your effort to the result. This changed with the rise of machine tools, which made production more efficient but also altered what’s made and how we think about making it in a profound way. This talk explores the effects of automation on our work, which is as relevant to software as it is to furniture, especially now that once again, with Clojure, we are building things using a small set of simple tools.

Tim Ewald designs and builds software systems. After 20 years using object-oriented languages, he embraces Clojure because it provides the closest connection to the work and most directly expresses his intent. Tim works on the Datomic team at Cognitect, where he is a Vice President.

If Simon St. Laurent didn’t see this presentation live, he will appreciate this video.

I suspect you will come away with a new appreciation for how our tools and expectations shape the solutions we “see.”

But more than that, with time and effort, you may start to notice the edges of our present tools and expectations, so you can roll them back.

What lies beyond remains for you to discover.

GNU Screen

Filed under: Linux OS — Patrick Durusau @ 4:46 pm

GNU Screen by Stephen Turner.

Speaking of useful things like R and Swirl reminded me of this post by Stephen:

This is one of those things I picked up years ago while in graduate school that I just assumed everyone else already knew about. GNU screen is a great utility built-in to most Linux installations for remote session management. Typing ‘screen’ at the command line enters a new screen session. Once launched, you can start processes in the screen session, detach the session with Ctrl-a+d, then reattach at a later point and resume where you left off. See this screencast I made below:

I’m not sure why, but ‘screen’ has never come up here before, at least not that I recall.

Take a look at Stephen’s screencast and/or man screen.
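
If you just want the cheat sheet, the whole workflow is a handful of commands. A minimal sketch; the session name is illustrative:

$ screen -S analysis    # start a new session named "analysis"

(run your long job, then press Ctrl-a followed by d to detach)

$ screen -ls            # list detached sessions

$ screen -r analysis    # reattach and pick up where you left off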

Learn R Interactively with Swirl

Filed under: Programming,R,Statistics — Patrick Durusau @ 4:38 pm

Learn R Interactively with Swirl by Nathan Yau.

I guess R counts as “learning to code.” 😉

If you need more detail than Nathan outlines, consider these from Swirl:

The swirl R package is designed to teach you statistics and R simultaneously and interactively. If you are new to R, have no fear. On this page, we’ll walk you through each of the steps required to begin using swirl today!

Step 1: Get R

In order to run swirl, you must have R installed on your computer.

If you need to install R, you can do so here.

For help installing R, check out one of the following videos (courtesy of Roger Peng at Johns Hopkins Biostatistics):

Step 2 (optional): Get RStudio

In addition to R, it’s highly recommended that you install RStudio, which will make your experience with R much more enjoyable.

If you need to install RStudio, you can do so here. You probably want the download located under the heading Recommended For Your System.

Step 3: Get swirl!

Open RStudio (or just plain R if you don’t have RStudio) and type the following into the console:

> install.packages("swirl")

Note that the > symbol at the beginning of the line is R’s prompt for you to type something into the console. We include it here so you know that this command is to be typed into the console and not elsewhere. The part you type begins after >.

Step 4: Start swirl and let the fun begin!

This is the only step that you will repeat every time you want to run swirl. First, you will load the package using the library() function. Then you will call the function that starts the magic! Type the following, pressing Enter after each line:

> library("swirl")

> swirl()

And you’re off to the races! Please visit our Help page if you have trouble completing any of these steps.

Other R links:

The R Project Resources and Links.

CRAN – Packages

Swirl

Big Mechanism (DARPA)

Filed under: Bioinformatics,Funding — Patrick Durusau @ 4:19 pm

Big Mechanism, Solicitation Number: DARPA-BAA-14-14

Response Date: Mar 18, 2014 12:00 pm Eastern

From the solicitation:

Here is one way to factor the technologies in the Big Mechanism program:

  1. Read abstracts and papers to extract fragments of causal mechanisms;
  2. Assemble fragments into more complete Big Mechanisms;
  3. Explain and reason with Big Mechanisms.

Here is a sample from Reading:

As with all natural language processing, Reading is bedeviled by ambiguity [5]. The mapping of named entities to biological entities is many-to-many. Context matters, but is often missing; for example, the organism in which a pathway is studied might be mentioned once at the beginning of a document and ignored thereafter. Although the target semantics involves processes, these can be described at different levels of detail and precision. For example, “β-catenin is a critical component of Wnt-mediated transcriptional activation” tells us only that β-catenin is involved in a process; whereas, “ARF6 activation promotes the intracellular accumulation of β-catenin” tells us that ARF6 promotes a process; and “L-cells treated with the GSK3 β inhibitor LiCl (50 mM) . . . showed a marked increase in β-catenin fluorescence within 30 – 60 min” describes the kinetics of a process. Processes also can be described as modular abstractions, as in “. . . the endocytosis of growth factor receptors and robust activation of extracellular signal-regulated kinase”. It might be possible to extract causal skeletons of complicated processes (i.e., the entities and how they causally influence each other) by reading abstracts, but it seems likely that extracting the kinetics of processes will require reading full papers. It is unclear whether this program will be able to provide useful explanations of processes if it doesn’t extract the kinetics of these processes.

An interesting solicitation. Yes?

I thought it was odd though that the solicitation starts out with:

DARPA is soliciting innovative research proposals in the area of reading research papers and abstracts to construct and reason over explanatory, causal models of complicated systems. Proposed research should investigate innovative approaches that enable revolutionary advances in science, devices, or systems. Specifically excluded is research that primarily results in evolutionary improvements to the existing state of practice. (emphasis added)

But then gives the goals for 18 months in part as:

  • Development of a formal representation language for biological processes;
  • Extraction of fragments of known signaling networks from a relatively small and carefully selected corpus of texts and encoding of these fragments in a formal language;

Aren’t people developing formal languages to track signaling networks right now? I am puzzled how that squares with the criterion:

Specifically excluded is research that primarily results in evolutionary improvements to the existing state of practice.

It does look like an exciting project, assuming it isn’t limited to current approaches.

News Genius

Filed under: Annotation,Interface Research/Design,News,Social Networks — Patrick Durusau @ 3:52 pm

News Genius (about page)

From the webpage:

What is News Genius?

(Example annotation: General Dwight D. Eisenhower’s D-Day message: http://news.rapgenius.com/General-dwight-d-eisenhower-d-day-message-sent-just-prior-to-invasion-annotated)

News Genius helps you make sense of the news by putting stories in context, breaking down subtext and bias, and crowdsourcing knowledge from around the world!

You can find speeches, interviews, articles, recipes, and even sports news, from yesterday and today, all annotated by the community and verified experts. With everything from Eisenhower speeches to reports on marijuana arrest horrors, you can learn about politics, current events, the world stage, and even meatballs!

Who writes the annotations?

Anyone can! Just create an account and start annotating. You can highlight any line to annotate it yourself, suggest changes to existing annotations, and even put up your favorite texts. Getting started is very easy. If you make good contributions, you’ll earn News IQ™, and if you share true knowledge, eventually you’ll be able to edit and annotate anything on the site.

How do I make verified annotations on my own work?

Verified users are experts in the news community. This includes journalists, like Spencer Ackerman, groups like the ACLU and Smart Chicago Collaborative, and even the U.S. Geological Survey. Interested in getting you or your group verified? Sign up and request your verified account!

Sam Hunting forwarded this to my attention.

Interesting interface.

Assuming that you created associations between the text and annotator without bothering the author, this would work well for some aspects of a topic map interface.

I did run into the problem that whose annotation becomes the annotation depends on who gets there first. If you pick text that has already been annotated, at most you can post a suggestion or vote the existing annotation up or down.

BTW, this started as a music site so when you search for topics, there are a lot of rap, rock and poetry hits. Not so many news “hits.”

You can imagine my experience when I searched for “markup” and “semantics.”

I probably need to use more common words. 😉

I don’t know the history of the site, but aside from the one-annotation-per-passage rule, you can certainly get started quickly creating and annotating content.

That is a real plus over many of the interfaces I have seen.

Comments?

PS: The one-annotation-per-passage rule is all the more annoying when you find that very few Jimi Hendrix songs have any parts left unannotated. 🙁

Visualizing History

Filed under: Charts,Graphics,Visualization — Patrick Durusau @ 3:25 pm

Visualizing History by Ben Jones.

From the post:

When studying history, we ask questions of the past, seeking to understand what happened in the lives of the people who have gone before us, and why. A data visualization of history suggests and answers a thousand questions. Sometimes, the value in a chart or graph of history is that it proposes new questions to ask of the past, questions that we wouldn’t have thought to ask unless the information were presented to us in a visual way.

Ben makes imaginative use of Gantt charts to illustrate:

  • American Presidencies
  • History of Civilizations
  • History of the Patriarchs
  • Political History (scandal)
  • and others.

I have always thought of Gantt charts as tools for project schedules, but they work well in other contexts too.
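
If you want to try this yourself, a bare-bones Gantt-style timeline takes only a few lines of base R. A minimal sketch with illustrative data (the first three presidencies):

terms <- data.frame(president = c("Washington", "Adams", "Jefferson"),
                    start = c(1789, 1797, 1801),
                    end   = c(1797, 1801, 1809),
                    stringsAsFactors = FALSE)
plot(NULL, xlim = range(terms$start, terms$end), ylim = c(0.5, 3.5),
     yaxt = "n", xlab = "Year", ylab = "", main = "Presidencies as a Gantt chart")
segments(terms$start, 3:1, terms$end, 3:1, lwd = 10, lend = "butt")  # one bar per term
axis(2, at = 3:1, labels = terms$president, las = 1)                 # label each bar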

Not as flashy as some charts, but also less difficult to interpret.

Arabic Natural Language Processing

Filed under: Language,Natural Language Processing — Patrick Durusau @ 3:14 pm

Arabic Natural Language Processing

From the webpage:

Arabic is the largest member of the Semitic language family and is spoken by nearly 500 million people worldwide. It is one of the six official UN languages. Despite its cultural, religious, and political significance, Arabic has received comparatively little attention by modern computational linguistics. We are remedying this oversight by developing tools and techniques that deliver state-of-the-art performance in a variety of language processing tasks. Machine translation is our most active area of research, but we have also worked on statistical parsing and part-of-speech tagging. This page provides links to our freely available software along with a list of relevant publications.

Software and papers from the Stanford NLP group.

An important capability to add to your toolkit, especially if you are dealing with the U.S. security complex.

I first saw this at: Stanford NLP Group Tackles Arabic Machine Translation.

Patenting Emotional Analysis?

Filed under: Patents — Patrick Durusau @ 11:53 am

BehaviorMatrix Issues Groundbreaking Foundational Patent for Advertising Platforms

From the post:

Applied behavior analytics company BehaviorMatrix, LLC, announced today that it was granted a groundbreaking foundational patent by the United States Patent and Trademark Office (USPTO) that establishes a system for classifying, measuring and creating models of the elements that make up human emotions, perceptions and actions leveraged from the Internet and social media. The BehaviorMatrix patent, U.S. patent number 8,639,702, redefines the online advertising platform and social CRM industries by ushering in a new era of assessment and measurement of emotion by digital means. It covers a method for detecting and measuring emotional signals within digital content. Emotion is based on perception – and perception is based on exteroceptive stimuli – what is seen, touched, tasted, heard and smelled.

If you look up U.S. patent number 8,639,702, you will find this background of the invention:

A data element forms the premise on which an inference may be drawn and represents the lowest level of abstraction from which information and then knowledge are derived. In humans, the perception of environment or condition is comprised of data gathered by the senses, i.e., the physiological capacity to provide input for perception. These “senses” are formally referred to as the exteroceptive senses and in humans comprise quantifiable or potential sensory data including, sight, smell, hearing, touch, taste, temperature, pressure, pain, and pleasure, the admixture of which determine the spectrum of human emotion states and resultant behaviors.

Potentials in these senses work independently, or in combination, to produce unique perceptions. For instance, the sense of sight is primarily used to identify a food item, but the flavor of the food item incorporates the senses of both taste and smell.

In biological terms, behavior can generally be regarded as any action of an organism that changes its relationship to its environment. Definable and measurable behaviors are predicated on the association of stimuli within the domain of exteroceptive sensation, to perception, and ultimately, a behavioral outcome.

The ability to determine the exteroceptive association and impact on behavior from data that is not physical but exists only in digital form has profound implications for how data is viewed, both intrinsically and associatively.

An advantage exists, therefore, for a system and method for dynamically associating digital data with values that approximate exteroceptive stimuli potentials, and from those values forecasting probabilistically the likely behavioral response to that data, thereby promoting the ability to design systems and models to predict behavioral outcomes that are inherently more accurate in determining behavioral response. In turn, interfaces and computing devices may be developed that would “expect” certain behaviors, or illicit them through the manipulation of data. Additionally, models could be constructed to classify data not only for the intrinsic value of the data but for the potential behavioral influence inherent in the data as well.

Really? People’s emotions influence their “digital data?” You mean like a person’s emotions influence their speech (language, volume), their body language (tense, pacing), their communications (angry tweets, letters, emails), etc.?

Did you ever imagine such a thing? Emotions being found in digital data?

Have I ever mentioned Emotional Content of Suicide Notes, by Jacob Tuckman, Robert J. Kleiner, and Martha Lavell, Am J Psychiatry 1959;116:59-63, to you?

Abstract:

An analysis was made of the emotional content of notes left by 165 suicides in Philadelphia over a 5-year period. Over half the notes showed such positive affect as gratitude, affection, and concern for the welfare of others, while only 24% expressed hostile or negative feelings directed toward themselves or the outside world, and 25% were completely neutral in affect. 2. Persons aged 45 and over showed less affect than those under 45, with a concomitant increase in neutral affect. 3. Persons who were separated or divorced showed more hostility than those single, married, or widowed. 4. It is believed that these findings have certain implications for further understanding of suicide and ultimate steps toward prevention. The recognition that positive or neutral feelings are present in the majority of cases should lead to a more promising outlook in the care and treatment of potential suicides if they can be identified.

That was written in 1959. Do you think it is a non-obvious leap to find emotions in digital data? Just curious.

Of course, that isn’t the only thing claimed for this invention:

capable of detecting one or more data elements including, without limitation, temperature, pressure, light, sound, motion, distance and time.

Wow, it can also act as a thermostat.

Feel free to differ, but I think measuring emotion in all forms of communication, including digital data, has been around for a long time.

The filing-fee-motivated U.S. Patent Office is doing a very poor job of keeping the prior art commons free of patents that encroach on it. That costs everyone, except patent trolls, a lot of time and effort.

Thoughts on a project to defend the prior art commons more generally? How much does the average patent infringement case cost? Win or lose?

Of course, I am thinking about integrating search across the patent database with searches in relevant domains, to load up examiners with detailed reports of prior art.

Stopping a patent in its tracks avoids more expensive methods later on.

Could lead to a new yearly patent count: Patents Denied.

February 7, 2014

Making Music with Clojure – An Introduction to MIDI

Filed under: Clojure,Music — Patrick Durusau @ 7:22 pm

Making Music with Clojure – An Introduction to MIDI by @taylodl.

From the post:

This post takes a break from Functional JavaScript and has a little fun making music. We’re going to be using Clojure, a Lisp language for the JVM, so that we can utilize the JVM’s MIDI implementation. No experience with music or MIDI is required though a familiarity with Clojure or any other Lisp is helpful.

I’m using Clojure for its functional similarities to JavaScript—the syntax of the languages are different but the underlying programming philosophies are similar. For this post I’m assuming you already have Clojure and Leiningen installed on your system. See Clojure Quick Start for everything you need to get Clojure and Leiningen installed and running on your system.

Once you have everything installed you can create a new midi-sequencer project by executing:
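
(The command itself did not survive the quote. With Leiningen it would presumably be:

$ lein new midi-sequencer

which generates a fresh project skeleton named midi-sequencer.)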

Accessibility, that is what I like about this post. Being innocent of any musical playing ability, the history of music remains silent for me unless I can find a recording. Or program a computer to perform it.

MIDI production isn’t the same thing as a live or recorded performance by a real musician, but it is better than a silent page.

Enjoy!

PS: Not all extant music is recorded or performed. Some resources to explore:

Digital Image Archive of Medieval Music

Music Manuscripts (British Library)

Music Manuscripts Online (The Morgan Library & Museum, 42,000 pages)

Wikipedia list of Online Digital Musical Document Libraries

42 Rules to Lead…

Filed under: Programming — Patrick Durusau @ 6:45 pm

42 Rules to Lead by from the Man Who Defined Google’s Product Strategy

From the post:

Almost four years ago, Jonathan Rosenberg sent an email around Google stumping for more open systems, products, and services. As SVP of Products, he argued that more openness would mean a better Google and a better world. But not everyone agreed. In fact, it set off a fiery internal debate that culminated in the Google Blog post: “The Meaning of Open,” authored by Rosenberg. At the time, he was building teams around Chrome and Android. And his prescience paid off. Today, they are two of Google’s most strategically important gains.

It wasn’t all wins, though. Rosenberg will be the first one to tell you that a lot of failures, false starts and tough breaks got him there. He shared these hard-won lessons in a lecture to students graduating from Claremont McKenna College, his alma mater, and we’ve brought ‘Rosenberg’s Rules’ to you.

A great set of rules, although I have never seen management strategies go beyond cant.

If nothing else, a metric for judging a present or future employer.

BTW, I seriously disagree with:

#10 Crowded is creative.

There’s a certain electricity that comes from working in a crowded, bustling space. “Offices should be designed for energy and interactions, not for isolation and status.”

“Working from home is a malignant, metastasizing cancer. Ban it.” (emphasis in original)

You don’t have to take my word for it, take a look at Peopleware by Tom DeMarco and Timothy Lister.

To measure the impact of “crowded” conditions on programming, DeMarco and Lister surveyed 600 programmers from 92 companies who participated in their “Coding War Games.”

The survey covered the environmental conditions under which the programmers worked.

DeMarco and Lister’s Table 8.3, “Best and Worst Performers in the Coding War Games,” reads:

Environmental Factor                                Performed in    Performed in
                                                    1st Quartile    4th Quartile
1. How much dedicated workspace do you have?        78 sq. ft.      48 sq. ft.
2. Is it acceptably quiet?                          57% yes         29% yes
3. Is it acceptably private?                        62% yes         19% yes
4. Can you silence your phone?                      52% yes         10% yes
5. Can you divert your calls?                       76% yes         19% yes
6. Do people often interrupt you needlessly?        38% yes         76% yes

Of course, if programming in the bottom quartile is acceptable in your organization, create a crowded work space.

The electricity you feel is the friction of productivity dollars draining away.

Monuments Men

Filed under: Crowd Sourcing — Patrick Durusau @ 5:21 pm

Monuments Men

From the post:

During World War II, an unlikely team of soldiers was charged with identifying and protecting European cultural sites, monuments, and buildings from Allied bombing. Officially named the Monuments, Fine Arts, and Archives (MFAA) Section, this U.S. Army unit included art curators, scholars, architects, librarians, and archivists from the U.S. and Britain. They quickly became known as The Monuments Men. These documents are drawn from MFAA members’ personal papers held at the Archives of American Art.

Towards the end of the war, their mission changed to one of locating and recovering works of art that had been looted by the Nazis. The Monuments Men uncovered troves of stolen art hidden across Germany and Austria—some in castles, others in salt mines. They rescued some of history’s greatest works of art.

Among the holdings of the Archives of American Art are the papers of Monuments Men George Leslie Stout, James J. Rorimer, Walker Hancock, Thomas Carr Howe, S. Lane Faison, Walter Horn, and Otto Wittman. These personal archives tell a fascinating story.

These documents—and many more, including photographs of the recovery operations—are on display in the Lawrence A. Fleischman Gallery at the Donald W. Reynolds Center for American Art and Portraiture in Washington D.C. from February 7 through April 20, 2014, for anyone who wants to see the original documents in person. The exhibition is also available online at Monuments Men: On the Frontline to Save Europe’s Art, 1942–1946.

As you would expect, there is a movie with the same name, ready to be confused with this project: The Monuments Men. 😉

This is one of many transcription projects at the Smithsonian Transcription Center. Site navigation is problematic, particularly since projects are listed under departments known mostly to insiders.

Crowd sourced transcription helps correct the impression that knowledge starts with digital documents.

Should it happen to spread, someday, to biblical studies, even the average reader would realize the eclectic nature of any modern Bible translation.

Welsh Newspapers Online – 27 new publications

Filed under: Data,History,News — Patrick Durusau @ 4:29 pm

Welsh Newspapers Online – 27 new publications

From the post:

There is great excitement today as we release 27 publications (200,000 pages) from the Library’s rich collection on Welsh Newspapers Online.

Take a trip back in time from the comfort of your home or office and discover millions of freely available articles published before 1919.

The resource now allows you to search and read over 630,000 pages from almost 100 newspaper publications from the National Library’s collection, and this will grow to over 1 million pages as more publications are added during 2014. Among the latest titles are Y Negesydd, Caernarvon and Denbigh Herald, Glamorgan Gazette, Carmarthen Journal, Welshman, and Rhondda Leader, not forgetting Y Drych, the weekly newspaper for the Welsh diaspora in America.

The resource also includes some publications that were digitised for The Welsh Experience of World War One project.

Browse the resource and discover unique information on a variety of subjects, including family history, local history and much more that was once difficult to find unless the researcher was able to browse through years of heavy volumes.

The linguistic diversity of the WWW just took a step in the right direction thanks to the National Library of Wales.

Can a realization that recorded texts are semantically diverse (diachronically and synchronically) be far behind?

I cringe every time the U.S. Supreme Court treats historical language as transparent to a “plain reading.”

Granted, I have an agenda to advance by emphasizing the historical context of the language, just as they do with a facile reading devoid of historical context.

Still, I think my approach requires less suspension of disbelief than theirs.

Stunning Maps of World Topography [In 3 Lines of R]

Filed under: Maps,Topography — Patrick Durusau @ 4:06 pm

Stunning Maps of World Topography by James Cheshire.

From the post:

Robin Edwards, a researcher at UCL CASA, has created these stunning topographic maps using the high resolution elevation data provided by the British Oceanographic Data Centre. The transitions from black (high areas) to blue (low areas) give the maps a slightly ethereal appearance to dramatic effect.

The maps are truly impressive.

BTW, the maps really did require only “3 lines of R.”
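
For the curious, the general recipe really is that short. A minimal sketch in base R, not Edwards’ actual code, assuming you already have a gridded elevation file (the file name here is hypothetical):

elev <- as.matrix(read.table("elevation_grid.txt"))    # hypothetical elevation grid
pal  <- colorRampPalette(c("steelblue", "black"))(100) # blue lows to black highs
image(elev, col = pal, axes = FALSE, asp = 1)          # render the grid as a map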

Self-promotion for game developers

Filed under: Interface Research/Design,Marketing — Patrick Durusau @ 3:38 pm

Self-promotion for game developers by Raph Koster.

From the post:

I’m writing this for Mattie Brice, who was just listed as one of Polygon’s 50 game newsmakers of the year.

We had a brief Twitter exchange after I offered congratulations, in which she mentioned that she didn’t know she could put this on a CV, and that she “know[s] nothing of self-promotion.” I have certainly never been accused of that, so this is a rehash of stuff I have written elsewhere and elsewhen.

To be clear, this post is not about marketing your games. It is about marketing yourself, and not even that, but about finding your professional place within the industry.

Raph’s advice is applicable to any field. Read it, but more than that, take it to heart.

While you are there, take a look at: Theory of Fun for Game Design by Raph Koster.

There are no rules that say topic map applications have to be drudgery.

I first saw this in a tweet by Julia Evans.

What’s behind a #1 ranking?

Filed under: Data Mining,Visualization — Patrick Durusau @ 3:08 pm

What’s behind a #1 ranking? by Manny Morone.

From the post:

Behind every “Top 100” list is a generous sprinkling of personal bias and subjective decisions. Lacking the tools to calculate how factors like median home prices and crime rates actually affect the “best places to live,” the public must take experts’ analysis at face value.

To shed light on the trustworthiness of rankings, Harvard researchers have created LineUp, an open-source application that empowers ordinary citizens to make quick, easy judgments about rankings based on multiple attributes.

“It liberates people,” says Alexander Lex, a postdoctoral researcher at the Harvard School of Engineering and Applied Sciences (SEAS). “Imagine if a magazine published a ranking of ‘best restaurants.’ With this tool, we don’t have to rely on the editors’ skewed or specific perceptions. Everybody on the Internet can go there and see what’s really in the data and what part is personal opinion.”

So intuitive and powerful is LineUp, that its creators—Lex; his adviser Hanspeter Pfister, An Wang Professor of Computer Science at SEAS; Nils Gehlenborg, a research associate at Harvard Medical School; and Marc Streit and Samuel Gratzl at Johannes Kepler University in Linz—earned the best paper award at the IEEE Information Visualization (InfoVis) conference in October 2013.

LineUp is part of a larger software package called Caleydo, an open-source visualization framework developed at Harvard, Johannes Kepler University, and Graz University of Technology. Caleydo visualizes genetic data and biological pathways—for example, to analyze and characterize cancer subtypes.

LineUp software: http://lineup.caleydo.org/

From the LineUp homepage:

While the visualization of a ranking itself is straightforward, its interpretation is not, because the rank of an item represents only a summary of a potentially complicated relationship between its attributes and those of the other items. It is also common that alternative rankings exist which need to be compared and analyzed to gain insight into how multiple heterogeneous attributes affect the rankings. Advanced visual exploration tools are needed to make this process efficient.

An interesting contrast: the blog post says that with LineUp “[we can see] what’s really in the data and what part is personal opinion,” while the website offers only to “gain insight into how multiple heterogeneous attributes affect the rankings.”

I think the website is being more realistic.

Being able to explore how the “multiple heterogeneous attributes affect the rankings” enables you to deliver rankings as close as possible to your boss’ or client’s expectations.
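
A toy example in R makes the point. The items and attribute scores below are made up; watch the ranking flip when only the weights change:

scores <- data.frame(item          = c("A", "B", "C"),
                     affordability = c(0.90, 0.50, 0.20),
                     quality       = c(0.30, 0.60, 0.95),
                     stringsAsFactors = FALSE)
# two "expert" weightings of the same two attributes
price_heavy   <- order(-(0.8 * scores$affordability + 0.2 * scores$quality))
quality_heavy <- order(-(0.2 * scores$affordability + 0.8 * scores$quality))
scores$item[price_heavy]    # "A" "B" "C"
scores$item[quality_heavy]  # "C" "B" "A"

Same data, opposite winners. The only thing that changed was someone’s opinion about the weights.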

You can just imagine what software promoters will be doing with this. Our software is up 500%! (Translation: We had 10 users, now we have 50 users.)

When asked they will truthfully say, it’s the best data we have.

CouchDB: The Definitive Guide [online]

Filed under: CouchDB — Patrick Durusau @ 1:55 pm

CouchDB: The Definitive Guide by J. Chris Anderson, Jan Lehnardt and Noah Slater.

The text for the 2010 version can be viewed online in English, Deutsch, Français, Español.

If you want to brush up on a foreign language, what better way than working through a CS text? 😉

If you would like an easier task, try reading/commenting on the current draft. (English only)

I first saw this in a tweet from Gergő István Nagy.

Lessons From “Behind The Bloodshed”

Filed under: Data,Data Mining,Visualization — Patrick Durusau @ 12:22 pm

Lessons From “Behind The Bloodshed”

From the post:

Source has published a fantastic interview with the makers of Behind The Bloodshed, a visual narrative about mass killings produced by USA Today.

The entire interview with Anthony DeBarros is definitely worth a read but here are some highlights and commentary.

A synopsis of data issues in the production of “Behind The Bloodshed.”

Great visuals, as you would expect from USA Today.

A good illustration of simplifying a series of complex events for persuasive purposes.

That’s not a negative comment.

What other purpose would communication have if not to “persuade” others to act and/or believe as we wish?

I first saw this in a tweet by Bryan Connor.

Twitter Data Grants [Following 0 Followers 524,870 + 1]

Filed under: Data,Tweets — Patrick Durusau @ 9:43 am

Introducing Twitter Data Grants by Raffi Krikorian.

Deadline: March 15, 2014

From the post:

Today we’re introducing a pilot project we’re calling Twitter Data Grants, through which we’ll give a handful of research institutions access to our public and historical data.

With more than 500 million Tweets a day, Twitter has an expansive set of data from which we can glean insights and learn about a variety of topics, from health-related information such as when and where the flu may hit to global events like ringing in the new year. To date, it has been challenging for researchers outside the company who are tackling big questions to collaborate with us to access our public, historical data. Our Data Grants program aims to change that by connecting research institutions and academics with the data they need.

….

If you’d like to participate, submit a proposal here no later than March 15th. For this initial pilot, we’ll select a small number of proposals to receive free datasets. We can do this thanks to Gnip, one of our certified data reseller partners. They are working with us to give selected institutions free and easy access to Twitter datasets. In addition to the data, we will also be offering opportunities for the selected institutions to collaborate with Twitter engineers and researchers.

We encourage those of you at research institutions using Twitter data to send in your best proposals. To get updates and stay in touch with the program: visit research.twitter.com, make sure to follow @TwitterEng, or email data-grants@twitter.com with questions.

You may want to look at Twitter Engineering to see what has been of recent interest.

Tracking social media during the Arab Spring to separate journalists from participants could be interesting.

BTW, a factoid for today: @TwitterEng had 524,870 followers and 0 following when I first saw the grant page. Now they have 524,871 followers and 0 following. 😉

There’s another question: Who has the best following/follower ratio? Any patterns there?

I first saw this in a tweet by Gregory Piatetsky.

February 6, 2014

FoundationDB 2.0 is OUT!

Filed under: Cybersecurity,FoundationDB — Patrick Durusau @ 9:37 pm

Version 2.0 is here: PHP, Golang, Directory layer, TLS Security, and More! by David Rosenthal.

From the post:

We’re very excited to introduce FoundationDB 2.0. FoundationDB combines the power of ACID transactions with the scalability, fault tolerance, and operational elegance of distributed NoSQL databases. This release was driven by specific customer feedback for increased language support, network security, and higher-level tools for managing data within FoundationDB.

FoundationDB 2.0 adds Go and PHP to the list of languages with native FoundationDB support. There also are two new layers available in all languages: The Subspace layer provides an easy way to define and manage subspaces of keys via key prefixes. The Directory layer manages the efficient allocation and management of virtual “directories” of keys and values within a database. They work together as the recommended way to efficiently organize different kinds of data within a single FoundationDB database.

Along with the additional language and layer support, 2.0 also ships with full Transport Layer Security which encrypts all FoundationDB network traffic, enabling security and authentication between both servers and clients via a public/private key infrastructure. This allows FoundationDB to safely run on an untrusted LAN or WAN. (emphasis added)

If you know of a trusted LAN or WAN, please leave a comment below. 😉

After commenting, download a copy of FoundationDB 2.0 and see what you think of the key management features.

Datomic R-trees

Filed under: Clojure,Datomic,R-Trees — Patrick Durusau @ 9:26 pm

Datomic R-trees by James Sofra.

From the description:

Slides for a talk given at Melbourne Functional Users Group on an R-tree based spatial indexer for Datomic.

The slides do a good job explaining the advantages of Datomic for spatial data and using R-trees with it.

References from the slides that you will find helpful:

R-TREES. A Dynamic Index Structure for Spatial Searching. A. Guttman (1984)

Sort-based query-adaptive loading of R-trees, Daniar Achakeev, Bernhard Seeger, Peter Widmayer. (2012)

Sort-based parallel loading of R-trees, Daniar Achakeev, Marc Seidemann, Markus Schmidt, Bernhard Seeger. (2012)

The R*-tree: an efficient and robust access method for points and rectangles (1990), by Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger. (1990)

OMT: Overlap Minimizing Top-down Bulk Loading Algorithm for R-tree, Taewon Lee, Sukho Lee. (2003)

The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree, Lars Arge. (2004)

Compact Hilbert indices, Christopher Hamilton. (2006)

R-Trees: Theory and Applications, Yannis Manolopoulos, Alexandros Nanopoulos, Apostolos N. Papadopoulos and Yannis Theodoridis. (2006)

See also: https://github.com/jsofra/datomic-rtree

Map-D: A GPU Database…

Filed under: GPU,MapD,NVIDIA — Patrick Durusau @ 8:34 pm

Map-D: A GPU Database for Real-time Big Data Analytics and Interactive Visualization by Todd Mostak (map-D) and Tom Graham (map-D). (MP4)

From the description:

map-D makes big data interactive for anyone! map-D is a super-fast GPU database that allows anyone to interact and visualize streaming big data in real time. Its unique architecture runs 70-1,000x faster than other in-memory databases or big data analytics platforms. To boot, it works with any size or kind of dataset; works with data that is streaming live on to the system; uses cheap, off-the-shelf hardware; is easily scalable. map-D is focused on learning from big data. At the moment, the map-D team is working on projects with MIT CSAIL, the Harvard Center for Geographic Analysis and the Harvard-Smithsonian Center for Astrophysics. Join Todd Mostak and Tom Graham, key members of the map-D team, as they demonstrate the speed and agility of map-D and describe the live processing, search and mapping of over 1 billion tweets.

I have been haunting the GTC On-Demand page waiting for this to be posted.

I had to download the MP4 (approximately 124 MB). I suspect the video is generating a lot of traffic at the GTC On-Demand page.

As a bonus, see also:

Map-D: GPU-Powered Databases and Interactive Social Science Research in Real Time by Tom Graham (Map_D) and Todd Mostak (Map_D) (streaming) or PDF.

From the description:

Map-D (Massively Parallel Database) uses multiple NVIDIA GPUs to interactively query and visualize big data in real-time. Map-D is an SQL-enabled column store that generates 70-400X speedups over other in-memory databases. This talk discusses the basic architecture of the system, the advantages and challenges of running queries on the GPU, and the implications of interactive and real-time big data analysis in the social sciences and beyond.

Suggestions of more links/papers on Map-D greatly appreciated!

Enjoy!

PS: Just so you aren’t too shocked, the Twitter demo involves scanning a billion-row database in 5 milliseconds.

Knowledge Base Completion…

Filed under: Knowledge,Knowledge Discovery,Searching — Patrick Durusau @ 8:01 pm

Knowledge Base Completion via Search-Based Question Answering by Robert West, et al.

Abstract:

Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases, such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases are greatly incomplete. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity. In this paper, we propose a way to leverage existing Web-search–based question-answering technology to fill in the gaps in knowledge bases in a targeted way. In particular, for each entity attribute, we learn the best set of queries to ask, such that the answer snippets returned by the search engine are most likely to contain the correct value for that attribute. For example, if we want to find Frank Zappa’s mother, we could ask the query “who is the mother of Frank Zappa”. However, this is likely to return “The Mothers of Invention”, which was the name of his band. Our system learns that it should (in this case) add disambiguating terms, such as Zappa’s place of birth, in order to make it more likely that the search results contain snippets mentioning his mother. Our system also learns how many different queries to ask for each attribute, since in some cases, asking too many can hurt accuracy (by introducing false positives). We discuss how to aggregate candidate answers across multiple queries, ultimately returning probabilistic predictions for possible values for each attribute. Finally, we evaluate our system and show that it is able to extract a large number of facts with high confidence.

I was glad to see this paper was relevant to searching because any paper with Frank Zappa and “The Mothers of Invention” in the abstract deserves to be cited. 😉 I will tell you that story another day.

It’s heavy reading and I have just begun but I wanted to mention something from early in the paper:

We show that it is better to ask multiple queries and aggregate the results, rather than rely on the answers to a single query, since integrating several pieces of evidence allows for more robust estimates of answer correctness.
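
The effect is easy to simulate. A toy sketch in R, with invented candidate answers and confidence scores from three hypothetical queries:

# candidate answers returned by three different queries (invented data)
candidates <- data.frame(answer = c("candidate X", "candidate Y", "candidate X"),
                         score  = c(0.6, 0.8, 0.7),
                         stringsAsFactors = FALSE)
aggregate(score ~ answer, data = candidates, FUN = sum)
#        answer score
# 1 candidate X   1.3   <- wins on aggregated evidence
# 2 candidate Y   0.8   <- the best single-query answer loses

The top answer from any single query (“candidate Y” at 0.8) is not the answer best supported by the evidence as a whole.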

Does the use of multiple queries run counter to the view that querying a knowledge base, be it RDF or topic maps or other, should result in a single answer?

If you were to ask me a non-trivial question five (5) days in a row, you would get at least five different answers, all in response to the same question but each eliciting slightly different information.

Should we take the same approach to knowledge bases? Or do we in fact already take that approach by querying search engines with slightly different queries?

Thoughts?

I first saw this in a tweet by Stefano Bertolo.

Should Everybody Learn to Code?

Filed under: Computer Science,CS Lectures,Programming — Patrick Durusau @ 7:41 pm

Should Everybody Learn to Code? by Esther Shein.

Interesting essay but most of the suggestions read like this one:

Just as students are taught reading, writing, and the fundamentals of math and the sciences, computer science may one day become a standard part of a K–12 school curriculum. If that happens, there will be significant benefits, observers say. As the kinds of problems we will face in the future will continue to increase in complexity, the systems being built to deal with that complexity will require increasingly sophisticated computational thinking skills, such as abstraction, decomposition, and composition, says Wing.

“If I had a magic wand, we would have some programming in every science, mathematics, and arts class, maybe even in English classes, too,” says Guzdial. “I definitely do not want to see computer science on the side … I would have computer science in every high school available to students as one of their required science or mathematics classes.”

But university CS programs for the most part don’t teach people to code. Rather they teach computer science in the abstract.

Moreover, coding practice isn’t necessary to contribute to computer science, as illustrated in “History of GIS and Early Computer Cartography Project,” by John Hessler, Cartographic Specialist, Geography and Map Division, Library of Congress.

As part of a project to collect early GIS materials, the following was discovered in an archive:

One set of papers in particular, which deserves much more attention from today’s mapmakers, historians, and those interested in the foundations of current geographic thought, is the Harvard Papers in Theoretical Geography. These papers, subtitled, “Geography and the properties of surfaces,” detail the lab’s early experiments in the computer analysis of cartographic problems. They also give insight into the theoretical thinking of many early researchers as they experimented with theorems from algebraic topology, complex spatial analysis algorithms, and various forms of abstract algebras to redefine the map as a mathematical tool for geographic analysis. Reading some of the titles in the series today, for example, “Hyper-surfaces and Geodesic Lines in 4-D Euclidean Space and The Sandwich Theorem: A Basic One for Geography,” gives one a sense of the experimentation and imaginative thinking that surrounded the breakthroughs necessary for the development of our modern computer mapping systems.

And the inspiration for this work?

Aside from the technical aspects that archives like this reveal, they also show deeper connections with cultural and intellectual history. They demonstrate how the practitioners and developers of GIS found themselves compelled to draw both distinctions and parallels with ideas that were appearing in the contemporary scholarly literature on spatial and temporal reasoning. Their explorations into this literature was not limited to geographic ideas on lived human space but also drew on philosophy, cognitive science, pure mathematics, and fields like modal logic—all somehow to come to terms with the diverse phenomena that have spatiotemporal extent and that might be mapped and analyzed.

Coding is a measurable activity but being measurable doesn’t mean it is the only way to teach abstract thinking skills.

The early days of computer science, including compiler research, suggest coding isn’t required to learn abstract thinking skills.

Coding is a useful skill but let’s not confuse a skill or even computer science with abstract thinking skills. Abstract thinking is needed in many domains and we will all profit from not defining it too narrowly.

I first saw this in a tweet from Tim O’Reilly, who credits Simon St. Laurent with the discovery.

February 5, 2014

Dead Sea Scrolls Updated!

Filed under: Bible — Patrick Durusau @ 4:26 pm

Well, actually not! 😉 But the Leon Levy Dead Sea Scrolls Digital Library has been upgraded!

From their Facebook page:

A second, upgraded version of the Leon Levy Dead Sea Scrolls Digital Library was launched today. Visitors to the new website (www.deadseascrolls.org.il) will be able to view and explore 10,000 newly uploaded images of unprecedented quality. The website also offers accompanying explanations pertaining to a variety of manuscripts, such as the book of Exodus written in paleo-Hebrew script, the books of Samuel, the Temple Scroll, Songs of Shabbat Sacrifice, and New Jerusalem.

The upgraded website comprises many improvements: 10,000 new multispectral images, improved metadata, additional manuscript descriptions, content pages translated into Russian and German in addition to the current languages, a faster search engine, easy access from the site to the facebook page and to twitter and more.

Imagine that! 10,000 new images.

Pass this on to academic publishers worried about control over works they can’t give away.

The ranks of the “frightened by public access” to non-commercial content are growing thinner.

CIARD RING

Filed under: Agriculture,DCAT — Patrick Durusau @ 4:06 pm

CIARD RING

From the about page:

The CIARD Routemap to Information Nodes and Gateways (RING) is a project implemented within the Coherence in Information for Agricultural Research for Development (CIARD) initiative and is led by the Global Forum on Agricultural Research (GFAR).

The RING is a global directory of web-based information services and datasets for agricultural research for development (ARD). It is the principal tool created through the CIARD initiative to allow information providers to register their services and datasets in various categories and so facilitate the discovery of sources of agriculture-related information across the world.

The RING aims to provide an infrastructure to improve the accessibility of the outputs of agricultural research and of information relevant to ARD management.

The registry of resources is being leveraged to provide more advanced services, based on the Data Catalogue Vocabulary (DCAT).

Agriculture is an ongoing and vital activity. No shortage of data to be collected, reconciled and repackaged as an information product.

I first saw this in a tweet by Stefano Bertolo.

SNB Graph Generator

Filed under: Benchmarks,Linked Data — Patrick Durusau @ 3:47 pm

Social Network Benchmark (SNB) Graph Generator by Peter Boncz.

Slides from FOSDEM2014.

Be forewarned, the slides are difficult to read due to heavy background images.

Slide 17 will be of interest because of computed “…similarity of two nodes based on their (correlated) properties.” (Rhymes with “merging.”) Computationally expensive.

Slide 18, disregard nodes with too large similarity distance.

Slide 41 points to:

github.com/ldbc

ldbc.eu:8090/display/TUC

And a truncated link that I think points to:

LDBC_Status of the Semantic Publishing Benchmark.pdf but it is difficult to say because that link opens a page of fifteen (15) PDF files.

If you select “download all” it will deliver the files to you in one zip file.

Patent Search and Analysis Tools

Filed under: Intellectual Property (IP),Patents,Searching — Patrick Durusau @ 2:54 pm

Free and Low Cost Patent Search and Analysis Tools: Who Needs Expensive Name Brand Products? by Jackie Hutter.

From the post:

In private conversations, some of my corporate peers inform me that they pay $1000’s per year (or even per quarter for larger companies) for access to “name brand” patent search tools that nonetheless do not contain accurate and up to date information. For example, a client tells me that one of these expensive tools fails to update USPTO records on a portfolio her company is monitoring and that the PAIR data is more than 1 year out of date. This limits the effectiveness of the expensive database by requiring her IP support staff to check each individual record on a regular basis to update the data. Of course, this limitation defeats the purpose of spending the big bucks to engage with a “name brand” search tool.

Certainly, one need not have sympathy for corporate IP professionals who manage large department budgets–if they spend needlessly on “name brand” tools and staff to manage the quality of such tools, so be it. But most companies with IP strategy needs do not have money and staff to purchase such tools, let alone to fix the errors in the datasets obtained from them. Others might wish not to waste their department budgets on worthless tools. To this end, over the last 5 years, I have used a number of free and low cost tools in my IP strategy practice. I use all of these tools on a regular basis and have personally validated the quality and validity of each one for my practice.
….

Jackie makes two cases:

First, there are free tools that perform as well or better than commercial patent tools. A link is offered to a list of them.

Second, and more importantly from my perspective, the low cost tools leave a lot to be desired in terms of UI and usability.

Certainly enough room for an “inexpensive” but better than commercial-grade patent search service to establish a market.

Or perhaps a more expensive “challenge” tool that warns subscribers about patents close to theirs.

I first saw this in a tweet by Lutz Maicher.

Hshtags (Search Engine)

Filed under: Hashtags,Search Engines — Patrick Durusau @ 2:34 pm

Hshtags (Search Engine)

At this point you can select to search hashtags on Facebook, Flickr, Instagram, Twitter and Vimeo. Which means, of course, that you have to authorize Hshtags to see your posts, friends, post for you (which I have never understood), etc.

How useful Hshtags will be depends on the subject. I can’t imagine very much content of interest on Facebook, Flickr and Instagram about semantic integration. There could be, through no fault of the medium, but it seems unlikely.

For my purposes, searching across both Twitter and Vimeo for “popular” hashtags will be useful (as popular as semantic integration ever gets).

More useful to me would be a search engine that reported tags used by blogs with links back to the blogs using those tags.

That would be really useful in terms of defining communities and using terminology that is widely accepted. Even if just WordPress, Blogger, and other major blogging platforms.

One very nice aspect of Hshtags: the registration text is in a large enough font to be easily readable!!! It’s a small thing but deeply appreciated nonetheless.

I first saw this in a tweet from Inge Henriksen.

Neo4j 2.0.1 Maintenance Release

Filed under: Graphs,Neo4j — Patrick Durusau @ 1:56 pm

Neo4j 2.0.1 Maintenance Release by Mark Needham.

From the post:

Today we’re releasing the latest version of the 2.0 series of Neo4j, version 2.0.1. For more details on Neo4j 2.0.0 see the December release blog post.

This is a maintenance release and has no new features although it contains significant stability and performance improvements.

Download. Release notes.

Take the time to review the GraphGist Challenge entries for the December contest.

While you do that, look for graph community members to follow on Twitter.
