Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 19, 2011

Overview: Visualization to Connect the Dots

Filed under: Analytics,Java,Scala,Visualization — Patrick Durusau @ 7:54 pm

Overview is Hiring!

I don’t think I have ever re-posted a job ad, but this one merits wide distribution:

We need two Java or Scala ninjas to build the core analytics and visualization components of Overview, and lead the open-source development community. You’ll work in the newsroom at AP’s global headquarters in New York, which will give you plenty of exposure to the very real problems of large document sets.

The exact responsibilities will depend on who we hire, but we imagine that one of these positions will be more focused on user experience and process design, while the other will do the computer science heavy lifting — though both must be strong, productive software engineers. Core algorithms must run on a distributed cluster, and scale to millions of documents. Visualization will be through high-performance OpenGL. And it all has to be simple and obvious for a reporter on deadline who has no time to fight technology. You will be expected to implement complex algorithms from academic references, and expand prototype techniques into a production application.

From the about page:

Overview is an open-source tool to help journalists find stories in large amounts of data, by cleaning, visualizing and interactively exploring large document and data sets. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.

There are good tools for searching within large document sets for names and keywords, but that doesn’t help find stories we’re not looking for. Overview will display relationships among topics, people, places and dates to help journalists to answer the question, “What’s in there?”

We’re building an interactive system where computers do the visualization, while a human guides the exploration. We will also produce documentation and training to help people learn how to use this system. The goal is to make this capability available to anyone who needs it.

Overview is a project of The Associated Press, supported by the John S. and James L. Knight Foundation as part of its Knight News Challenge. The Associated Press invests its resources to advance the news industry, delivering fast, unbiased news from every corner of the world to all media platforms and formats. The Knight News Challenge is an international contest to fund digital news experiments that use technology to inform and engage communities.

Sounds like a project that is worth supporting to me!

Analytics are great, but subject identity would be more useful.

Apply if you have the skill sets, repost the link, and/or volunteer to carry the good news of topic maps to the project.

July 10, 2011

Jark

Filed under: Clojure,Java — Patrick Durusau @ 3:38 pm

Jark

From the webpage:

A tool to manage classpaths and clojure namespaces on a persistent JVM

Why Jark

Startup time of the Java Virtual Machine (JVM) is too slow and thereby command-line applications on the JVM are sluggish and very painful to use. And there is no existing simple way for multiple clients to access the same instance of the JVM. Jark is an attempt to run a persistent JVM daemon and provide a set of utilities to control and operate on it.

Jark is intended to

  • deploy, maintain and debug clojure programs on remote hosts
  • provide an easy interface to run clojure programs on the command-line
  • provide a set of useful namespace and classpath utilities
  • provide a secure and robust implementation of a JVM daemon that multiple clients can connect to, seamlessly
  • provide a thin client that can run on any OS platform and with minimum runtime dependencies.
  • be VM agnostic: support for all VMs that clojure runs on in the future

In case you need a persistent JVM daemon.

July 7, 2011

Boilerpipe

Filed under: Data Mining,Java — Patrick Durusau @ 4:15 pm

Boilerpipe

From the webpage:

The boilerpipe library provides algorithms to detect and remove the surplus “clutter” (boilerplate, templates) around the main textual content of a web page.

The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.

Extracting content is very fast (milliseconds), just needs the input document (no global or site-level information required) and is usually quite accurate.

Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.

Should save you some time when harvesting data from webpages.
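To give a feel for what "detecting clutter" can mean, here is a minimal sketch in plain Java. It is a toy heuristic of my own (keep blocks with enough words and few link words), not boilerpipe's algorithm and not its API. The class name `ToyContentExtractor`, the thresholds and the sample HTML are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy boilerplate filter. NOT boilerpipe's algorithm or API; names are hypothetical.
class ToyContentExtractor {
    // Assumption: simple HTML where these block-level tags separate text blocks.
    private static final Pattern BLOCK_SPLIT =
        Pattern.compile("(?i)</?(?:p|div|li|ul|ol|td|tr|table|h[1-6]|br)\\b[^>]*>");
    private static final Pattern ANCHOR = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");
    private static final Pattern ANY_TAG = Pattern.compile("(?s)<[^>]+>");

    // Count whitespace-separated tokens in already tag-free text.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Keep blocks with at least minWords words and link density <= maxLinkDensity.
    static List<String> extract(String html, int minWords, double maxLinkDensity) {
        List<String> kept = new ArrayList<>();
        for (String block : BLOCK_SPLIT.split(html)) {
            // Words that sit inside <a>...</a> count as link words.
            int linkWords = 0;
            Matcher m = ANCHOR.matcher(block);
            while (m.find()) {
                linkWords += countWords(ANY_TAG.matcher(m.group(1)).replaceAll(" "));
            }
            // Strip remaining tags and normalize whitespace.
            String text = ANY_TAG.matcher(block).replaceAll(" ").trim().replaceAll("\\s+", " ");
            int words = countWords(text);
            if (words == 0) {
                continue; // nothing but markup
            }
            double linkDensity = (double) linkWords / words;
            if (words >= minWords && linkDensity <= maxLinkDensity) {
                kept.add(text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String html = "<div><a href=\"/\">Home</a> <a href=\"/about\">About us</a></div>"
            + "<p>The boilerpipe library detects and removes the clutter around "
            + "the main textual content of a web page.</p>"
            + "<div>Copyright 2011</div>";
        // Navigation and footer blocks are dropped; the paragraph survives.
        for (String block : extract(html, 8, 0.3)) {
            System.out.println(block);
        }
    }
}
```

A regex pass like this breaks on real-world markup, which is a good reason to reach for the library rather than roll your own.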

July 6, 2011

The Neo4j Rest API. My Notebook

Filed under: Bioinformatics,Biomedical,Java,Neo4j — Patrick Durusau @ 2:14 pm

The Neo4j Rest API. My Notebook

From the post:

Neo4j is an open-source graph engine implemented in Java. This post is my notebook for the Neo4J-server, a server combining a REST API and a webadmin application into a single stand-alone server.

Nothing new in this Neo4j summary, but Pierre Lindenbaum describes himself as: “PhD in Virology, bioinformatics, genetics, science, geek, java.”

Someone worth watching in the Neo4j/topic map universe.

July 4, 2011

Spring Data Graph with Neo4j Support

Filed under: Graphs,Java,Neo4j,Spring Data — Patrick Durusau @ 6:05 pm

Spring Data Graph with Neo4j Support

From the homepage:

Spring Data Graph enables POJO based development for Graph Databases like Neo4j. It extends annotated entity classes with transparent mapping functionality. A template programming model equivalent to well known Spring templates is also supported. Spring Data Graph is part of the bigger Spring Data project which aims to provide convenient support for NoSQL databases.


Here is an overview of Spring Data Graph features:

  • Support for property graphs (nodes connected via relationships, each with arbitrary properties)
  • Transparent mapping of annotated POJO entities (via AspectJ)
  • Neo4jTemplate with convenient API, exception translation and optional transaction management
  • Different type representation strategies for keeping type information in the graph
  • Dynamic type projections (duck typing)
  • Spring Data Commons Repositories Support
  • Cross-store support for partial JPA – Graph Entities
  • Neo4j Traversal support on dynamic fields and via repository methods
  • Neo4j Indexing support (including full-text and numeric range queries)
  • Support for JSR-303 (Bean Validation)
  • Support for the Neo4j Server
  • Support for running as extensions in the Neo4j Server
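For readers new to the term, the "property graph" in the first bullet can be sketched in a few lines of plain Java. This is only an in-memory illustration of the data model (nodes connected via relationships, each with arbitrary properties). It does not use Spring Data Graph or Neo4j, and every class and method name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative property graph model. NOT the Spring Data Graph or Neo4j API.
class PropertyGraphSketch {

    // A node carries arbitrary key/value properties and its outgoing relationships.
    static class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        Node with(String key, Object value) {
            properties.put(key, value);
            return this;
        }

        // Create a typed, directed relationship from this node to another.
        Relationship relateTo(Node end, String type) {
            Relationship r = new Relationship(this, end, type);
            outgoing.add(r);
            return r;
        }
    }

    // A relationship has a type, two end points and its own properties.
    static class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        Relationship with(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    public static void main(String[] args) {
        Node blog = new Node().with("name", "Another Word For It");
        Node subject = new Node().with("name", "Neo4j");
        blog.relateTo(subject, "MENTIONS").with("year", 2011);

        for (Relationship r : blog.outgoing) {
            System.out.println(blog.properties.get("name") + " -[" + r.type + "]-> "
                + r.end.properties.get("name") + " " + r.properties);
        }
    }
}
```

As the feature list describes it, the library maps annotated POJOs onto a model of this shape, so you would not hand-write classes like these.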

If Neo4j or another NoSQL database is on your agenda, take a long look.

March 27, 2011

Category Theory for the Java Programmer

Filed under: Category Theory,Java — Patrick Durusau @ 3:15 pm

Category Theory for the Java Programmer

From the post:

There are several good introductions to category theory, each written for a different audience. However, I have never seen one aimed at someone trained as a programmer rather than as a computer scientist or as a mathematician. There are programming languages that have been designed with category theory in mind, such as Haskell, OCaml, and others; however, they are not typically taught in undergraduate programming courses. Java, on the other hand, is often used as an introductory language; while it was not designed with category theory in mind, there is a lot of category theory that passes over directly.

I’ll start with a sentence that says exactly what the relation is of category theory to Java programming; however, it’s loaded with category theory jargon, so I’ll need to explain each part.

A collection of Java interfaces is the free [3] cartesian [4] category [2] with equalizers [5] on the interface [6] objects [1] and the built-in [7] objects.

Interested?

March 19, 2011

Domain-Specific Languages: An Annotated Bibliography

Filed under: Domain-Specific Languages,Java — Patrick Durusau @ 6:04 pm

Domain-Specific Languages: An Annotated Bibliography

Interesting but a decade old.

Anyone have a suggestion for a more recent bibliography of DSLs?

For Java I have seen: DSLs in Java, from Pure Danger Tech, Alex Miller’s technical blog.

DSLs are important for two reasons:

  1. Their use in creating topic map authoring languages for particular domains.
  2. The use of topic maps to map between DSLs, of the topic map kind and otherwise.

From the topic map side of the house, any suggestion for subjects that would merit a DSL for authoring topic maps?

PS: To what extent would you include defaulting subjects for later insertion in the construction of a DSL for topic maps?

For example, should entering a baseball player in a baseball DSL default that player’s association to their last known team unless otherwise specified?
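The defaulting idea in the PS can be made concrete with a small fluent-style sketch in Java. It is one possible reading of the question, not a proposal for an actual topic map DSL. `BaseballDsl`, its methods and the placeholder player and team names are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical authoring DSL: a player entered without a team
// is associated with that player's last known team.
class BaseballDsl {
    // Defaults consulted when the author says nothing about the team.
    private final Map<String, String> lastKnownTeam = new HashMap<>();
    // Player -> team associations produced by the authoring session.
    private final Map<String, String> playsFor = new LinkedHashMap<>();

    // Seed a default, for example from an earlier topic map.
    BaseballDsl knownTeam(String player, String team) {
        lastKnownTeam.put(player, team);
        return this;
    }

    // No team given: fall back to the default, if there is one.
    BaseballDsl player(String name) {
        String team = lastKnownTeam.get(name);
        if (team != null) {
            playsFor.put(name, team);
        }
        return this;
    }

    // Team given explicitly: it overrides the default and becomes the new one.
    BaseballDsl player(String name, String team) {
        playsFor.put(name, team);
        lastKnownTeam.put(name, team);
        return this;
    }

    Map<String, String> associations() {
        return Collections.unmodifiableMap(playsFor);
    }

    public static void main(String[] args) {
        BaseballDsl dsl = new BaseballDsl()
            .knownTeam("Player A", "Team X")
            .player("Player A")             // defaulted to Team X
            .player("Player B", "Team Y");  // explicitly specified
        System.out.println(dsl.associations());
    }
}
```

In this sketch a player with no default and no explicit team simply gets no association. Whether the DSL should instead flag that case for later insertion is exactly the design question.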

March 14, 2011

Groovy

Filed under: Domain-Specific Languages,Groovy,Java — Patrick Durusau @ 7:56 am

Groovy

I am particularly interested in Groovy’s support for Domain-Specific Languages.

It occurs to me that providing users with a domain-specific language is very close to issues that surround the design of interfaces for users.

That is, you don’t write a “domain-specific language” and then expect others to use it. Well, you could, but uptake might be iffy.

Rather, the development of a “domain-specific language” is done with subject matter experts, and their views are incorporated into the language.

Sounds like that might be an interesting approach to authoring topic maps in some contexts.

From the website, Groovy:

  • is an agile and dynamic language for the Java Virtual Machine
  • builds upon the strengths of Java but has additional power features inspired by languages like Python, Ruby and Smalltalk
  • makes modern programming features available to Java developers with almost-zero learning curve
  • supports Domain-Specific Languages and other compact syntax so your code becomes easy to read and maintain
  • makes writing shell and build scripts easy with its powerful processing primitives, OO abilities and an Ant DSL
  • increases developer productivity by reducing scaffolding code when developing web, GUI, database or console applications
  • simplifies testing by supporting unit testing and mocking out-of-the-box
  • seamlessly integrates with all existing Java classes and libraries
  • compiles straight to Java bytecode so you can use it anywhere you can use Java

Questions:

  1. What areas of library activity already have Domain-Specific Languages, albeit not in executable computer syntaxes?
  2. Which ones do you think would benefit from the creation of an executable Domain-Specific Language?
  3. How would you use topic maps to document such a Domain-Specific Language?
  4. How would your topic map record changing interpretations over time for apparently constant terms?

February 17, 2011

Encog Java and DotNet Neural Network Framework

Filed under: .Net,Encog,Java,Machine Learning,Neural Networks,Silverlight — Patrick Durusau @ 6:56 am

Encog Java and DotNet Neural Network Framework

From the website:

Encog is an advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks. Encog has been in active development since 2008.

Encog is available for Java, .Net and Silverlight.

An important project for at least two reasons.

First, the obvious applicability to the creation of topic maps using machine learning techniques.

Second, it demonstrates that supporting Java, .Net and Silverlight isn’t, you know, all that weird.

The world is changing and becoming somewhat more interoperable.

Topic maps have a role to play in that process, both in the semantic interoperability of the infrastructure and in that of the data it contains.

February 15, 2011

Scala: Introduction to Scala for Java Programmers

Filed under: Java,Scala — Patrick Durusau @ 11:27 am

Scala: Introduction to Scala for Java Programmers by Adam Rabung.

Useful for Java programmers looking at Scala for topic map development.
