Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

April 15, 2011

Network Workbench

Filed under: Networks,Visualization — Patrick Durusau @ 6:25 am

Network Workbench: A Workbench for Network Scientists

From the website:

Network Workbench: A Large-Scale Network Analysis, Modeling and Visualization Toolkit for Biomedical, Social Science and Physics Research.

This project will design, evaluate, and operate a unique distributed, shared resources environment for large-scale network analysis, modeling, and visualization, named Network Workbench (NWB). The envisioned data-code-computing resources environment will provide a one-stop online portal for researchers, educators, and practitioners interested in the study of biomedical, social and behavioral science, physics, and other networks.

The NWB will support network science research across scientific boundaries. Users of the NWB will have online access to major network datasets or can upload their own networks. They will be able to perform network analysis with the most effective algorithms available. In addition, they will be able to generate, run, and validate network models to advance their understanding of the structure and dynamics of particular networks. NWB will provide advanced visualization tools to interactively explore and understand specific networks, as well as their interaction with other types of networks.

A major computer science challenge is the development of an algorithm integration framework that supports the easy integration and dissemination of existing and new algorithms and can deal with the multitude of network data formats in existence today. Another challenge is the design and implementation of an easy to use menu-based, online portal interface for interactive algorithm selection, data manipulation, user and session management. The NWB will be evaluated in diverse research projects and educational settings in biology, social and behavioral science, and physics research. It will be well documented and available as open source for easy duplication and usage at other sites. An annual summer school and a series of workshops and tutorials are planned to introduce the tool to diverse research communities.

April 14, 2011

Deduplication

Filed under: Duplicates,Lucene,Record Linkage — Patrick Durusau @ 7:25 am

Deduplication

Lars Marius Garshol slides from an internal Bouvet conference on deduplication of data.

And, DUplicate KillEr, DUKE.

As Lars points out, people have been here before.

I am not sure I share Lars’ assessment of the current state of record linkage software.

Consider for example, FRIL – Fine-Grained Record Integration and Linkage Tool, which is described as:

FRIL is FREE open source tool that enables fast and easy record linkage. The tool extends traditional record linkage tools with a richer set of parameters. Users may systematically and iteratively explore the optimal combination of parameter values to enhance linking performance and accuracy.
Key features of FRIL include:

  • Rich set of user-tunable parameters
  • Advanced features of schema/data reconciliation
  • User-tunable search methods (e.g. sorted neighborhood method, blocking method, nested loop join)
  • Transparent support for multi-core systems
  • Support for parameters configuration
  • Dynamic analysis of parameters
  • And many, many more…

I haven’t used FRIL but do note that it has documentation, videos, etc. for user instruction.

I have reservations about record linkage in general, but those are concerns about re-use of semantic mappings and not record linkage per se.
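The core idea behind record linkage tools like FRIL and Duke is easy to sketch: compare candidate records field by field, combine the similarities, and accept pairs above a threshold. A toy illustration using only the Python standard library (the fields, weights and threshold are invented for the example, not taken from either tool):

```python
from difflib import SequenceMatcher

def field_similarity(a, b):
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def record_similarity(rec1, rec2, weights):
    """Weighted average of per-field similarities."""
    total = sum(weights.values())
    return sum(w * field_similarity(rec1[f], rec2[f])
               for f, w in weights.items()) / total

# Hypothetical records and weights; real tools let you tune these per field.
weights = {"name": 0.6, "city": 0.4}
r1 = {"name": "Lars M. Garshol", "city": "Oslo"}
r2 = {"name": "Lars Marius Garshol", "city": "Oslo"}

if record_similarity(r1, r2, weights) > 0.85:
    print("probable duplicate")
```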

Collaborative Graph Editing with GitGraph

Filed under: Blueprints,Graphs,Gremlin — Patrick Durusau @ 7:25 am

Collaborative Graph Editing with GitGraph

Anyone interested in graph editing really needs to watch this series of posts on the gremlin-users list.

Joshua Shinavier started the thread with a post that read in part:

I would like to draw attention to a new utility for Blueprints which was motivated as follows. Lately, I have been faced with the problem of trying to synchronize graph-y data between a mobile phone and a desktop application. This is hard not only because the data model I had in mind, RDF, is complicated, but also by some basic requirements (on top of just getting the data to look the same on both devices):

1) it should be possible to load only a portion of the data on the phone, and to push and pull changes to that portion without corrupting the overall graph
2) it should be easy to revert changes, and it would be nice to be able to branch
3) collaborators should be able to contribute changes to the graph, as well

This morning, it occurred to me that we could have these features in Blueprints if we just serialize graphs in a way which plays well with Git. I then spent all day coding, and the result is GitGraph, a persistent Graph implementation (currently layered on top of TinkerGraph) which stores its data in a hierarchy of canonically ordered, diff-friendly plain text files. You can check a GitGraph directory into GitHub, fork, edit and merge it just as you would a piece of software. Also cool:

1) you can load subdirectories of a GitGraph as standalone graphs, and edit them independently of the rest of the graph
2) placing two or more GitGraphs in the same directory creates a super-GitGraph which you can load as one graph. You can then create edges which span the two graphs and create new top-level vertices. You can go back to a view of the individual graphs at any time.
3) no additional API, apart from the GitGraph constructor

Not hard to see why that provoked a wave of enthusiastic posts.
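The heart of the idea, canonically ordered, line-based serialization so that Git can diff and merge graph changes, can be sketched without Blueprints at all. This is not the GitGraph file format, just an illustration of why sorted, one-record-per-line files play well with Git:

```python
def serialize_graph(vertices, edges, path):
    """Write a graph as sorted, one-record-per-line text so Git diffs stay small."""
    lines = []
    for v_id, props in vertices.items():
        for key in sorted(props):
            lines.append(f"V\t{v_id}\t{key}\t{props[key]}")
    for (src, label, dst) in edges:
        lines.append(f"E\t{src}\t{label}\t{dst}")
    with open(path, "w") as f:
        f.write("\n".join(sorted(lines)) + "\n")

vertices = {"1": {"name": "marko"}, "2": {"name": "peter"}}
edges = [("1", "knows", "2")]
serialize_graph(vertices, edges, "graph.txt")
```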

Somebody is going to hate me: NoSPARQL – Post

Filed under: Graphs,SPARQL — Patrick Durusau @ 7:24 am

Somebody is going to hate me: NoSPARQL

A refreshing look at database technology by someone with a sense of history, at least of technology.

Our techniques, capabilities and concerns have evolved or at least changed since the last time various database technologies collided.

The outcome is too uncertain to predict but interesting times are ahead!

There’s no schema for Science – CouchDB in Research

Filed under: CouchDB,NoSQL — Patrick Durusau @ 7:23 am

There’s no schema for Science – CouchDB in Research

Erlang Factory 2011 video of presentation by Nitin Borwankar on CouchDB.

From the website:

The cutting edge and constantly evolving nature of scientific research makes it very hard to use relational databases to model scientific data. When a hypothesis changes, the observations change and the schema changes – large volumes of data may have to be migrated. This makes it very hard for researchers and they end up using spreadsheets and flat files since they are more flexible. Enter CouchDB and the schemaless model. The talk will take three real world examples and generalize to extract some principles and help identify where you might apply these.
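The schemaless point is easy to see through CouchDB's HTTP API: observations recorded under different hypotheses can sit side by side with no migration. A minimal sketch using the Python requests library (the local URL, database name and fields are assumptions):

```python
import requests

base = "http://localhost:5984/observations"  # assumed local CouchDB and database name

requests.put(base)  # create the database if it does not already exist

# Two observations under different hypotheses; no shared schema required.
requests.post(base, json={"subject": "s1", "reaction_time_ms": 412})
requests.post(base, json={"subject": "s2", "reaction_time_ms": 398,
                          "eye_tracking": {"fixations": 17}})
```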

TMDM to Redis Schema (paper)

Filed under: Graphs,Redis,TMDM — Patrick Durusau @ 7:23 am

Yet another mapping of the Topic Maps Data Model to Redis schema

By Johannes Schmidt:

In this document another mapping of the Topic Maps Data Model (TMDM) [3] to Redis key-value store [8] schema is drafted. An initial mapping [5] of the TMDM to Redis schema has been provided by the Topic Maps Lab of the University of Leipzig [9]. The main motivation is not to design a “better” schema but to simply do a mapping of the TMDM to a key-value store schema. Some valuable enhancements for the Topic Maps Lab schema are created, though.

Possible guide to mapping the TMDM to key-value store databases.

Something to consider would be mapping the TMDM to a graph database.

Do topic, association, and occurrence become nodes?
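For a feel of what such a mapping looks like, here is a minimal sketch, not the schema from the paper or from the Topic Maps Lab, that stores a topic with one name and one occurrence in Redis via redis-py; the key naming convention is invented for the example:

```python
import redis

r = redis.Redis()  # assumes a local Redis server

topic_id = "topic:1"
r.sadd("topics", topic_id)                                      # registry of all topics
r.sadd(topic_id + ":sids", "http://example.org/people/marko")   # subject identifier
r.hset(topic_id + ":name:1", mapping={"value": "Marko",
                                      "type": "topic:name-type"})
r.hset(topic_id + ":occ:1",  mapping={"value": "http://markorodriguez.com",
                                      "type": "topic:homepage"})
r.sadd(topic_id + ":names", topic_id + ":name:1")
r.sadd(topic_id + ":occs",  topic_id + ":occ:1")
```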

ZGRViewer, a GraphViz/DOT Viewer

Filed under: Graphs,Visualization — Patrick Durusau @ 7:22 am

ZGRViewer, a GraphViz/DOT Viewer

From the website:

ZGRViewer is a graph visualizer implemented in Java and based upon the Zoomable Visual Transformation Machine. It is specifically aimed at displaying graphs expressed using the DOT language from AT&T GraphViz and processed by programs dot, neato or others such as twopi.

ZGRViewer is designed to handle large graphs, and offers a zoomable user interface (ZUI), which enables smooth zooming and easy navigation in the visualized structure.

ZGRViewer should be able to load any file that uses the DOT language to describe the graph.
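If you have never written DOT, it takes only a few lines to produce something dot (and then ZGRViewer) can render. A small sketch that emits a DOT file from Python:

```python
edges = [("topic", "occurrence"), ("topic", "name"), ("association", "topic")]

with open("example.dot", "w") as f:
    f.write("digraph tmdm {\n")
    f.write("  rankdir=LR;\n")
    for src, dst in edges:
        f.write(f'  "{src}" -> "{dst}";\n')
    f.write("}\n")

# Render with: dot -Tsvg example.dot -o example.svg, then open either file in ZGRViewer.
```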

Cytoscape

Filed under: Graphs,Networks,Visualization — Patrick Durusau @ 7:21 am

Cytoscape: An Open Source Platform for Complex-Network Analysis and Visualization

From the website:

Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. A lot of plugins are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web.

Cytoscape was alluded to in AllegroMCOCE: GPU-accelerated Cytoscape Plugin TM Explorer?, but encountering it again, I decided it deserved a separate post.

April 13, 2011

Neo4j 1.3 “Abisko Lampa” Released

Filed under: Graphs,Neo4j — Patrick Durusau @ 1:26 pm

Neo4j 1.3 “Abisko Lampa” Released

From the release post:

Today we’ve released the 1.3 GA (General Availability) version of the popular open source graph database Neo4j. As well as a slew of new features and improvements, we’re thrilled to announce that the community edition is now entirely GPLv3 licensed – permissive and accommodating to your needs!

….

Each database can now contain 32 billion nodes/relationships and up to 64 billion properties. That’s enough to usefully store Earth’s population for a while to come.

….

The database footprint has been reduced, thanks to a new storage strategy for common short strings (e.g. zip codes). For many data sets, this results in dramatically smaller files on disk and a bonus performance bump.

….

The completely revamped Webadmin tool looks stunning and is much more usable, with a new integrated visual data browser that lets you jump into and explore a pictorial representation of the graph from any Webadmin screen in just a click!

What are you waiting for? Go download a copy. 😉

One Mashboard to Rule Them All

Filed under: BI,Data Integration,Mashups — Patrick Durusau @ 1:26 pm

One Mashboard to Rule Them All

From the announcement:

Webinar Overview: We’ll be showcasing real-world examples of Jaspersoft dashboards, adding to your already extensive technical knowledge. Dashboards, with their instant answers for executives and business users, and mashboards, ideal for integrating multiple data sources for improved organizational decision-making, are among the most frequently requested BI deliverables. Join us for everything you wanted to know about Jaspersoft Platforms.

April 20, 2011 1:00 pm, Eastern Daylight Time (New York, GMT-04:00)
April 20, 2011 10:00 am, Pacific Daylight Time (San Francisco, GMT-07:00)
April 20, 2011 6:00 pm, Western European Summer Time (London, GMT+01:00)

There is an open source side to Jaspersoft, Jasperforge.org.

Stats from the JasperForge.org site:

206224 members
163 today
1707 last 7 days
6643 last 30 days
13176296 downloads
255 public projects
182 private projects
85193 forum entries

A community where I would like to pose the question: “How do you re-use a mashup created by someone else?”

And given that it has an open source side, a place to pose topic maps as an answer.

Knowledge Representation and Reasoning with Graph Databases

Filed under: Artificial Intelligence,Graphs,Knowledge Representation — Patrick Durusau @ 1:24 pm

Knowledge Representation and Reasoning with Graph Databases

Just in case you aren’t following Marko A. Rodriguez:

A graph database and its ecosystem of technologies can yield elegant, efficient solutions to problems in knowledge representation and reasoning. To get a taste of this argument, we must first understand what a graph is. A graph is a data structure. There are numerous types of graph data structures, but for the purpose of this post, we will focus on a type that has come to be known as a property graph. A property graph denotes vertices (nodes, dots) and edges (arcs, lines). Edges in a property graph are directed and labeled/typed (e.g. “marko knows peter”). Both vertices and edges (known generally as elements) can have any number of key/value pairs associated with them. These key/value pairs are called properties. From this foundational structure, a suite of questions can be answered and problems solved.

See the post for the details.
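Marko's description maps directly onto a small data structure. A sketch of the "marko knows peter" example as a property graph, with properties on both vertices and edges (the specific keys are illustrative):

```python
class PropertyGraph:
    """Vertices and directed, labeled edges; both carry key/value properties."""

    def __init__(self):
        self.vertices = {}   # id -> properties dict
        self.edges = []      # (out_id, label, in_id, properties)

    def add_vertex(self, v_id, **props):
        self.vertices[v_id] = props

    def add_edge(self, out_id, label, in_id, **props):
        self.edges.append((out_id, label, in_id, props))

    def out_edges(self, v_id, label=None):
        return [e for e in self.edges
                if e[0] == v_id and (label is None or e[1] == label)]

g = PropertyGraph()
g.add_vertex("1", name="marko", type="person")
g.add_vertex("2", name="peter", type="person")
g.add_edge("1", "knows", "2", since=2006)

print(g.out_edges("1", "knows"))  # [('1', 'knows', '2', {'since': 2006})]
```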

7th international Digital Curation Conference

Filed under: Conferences,Curation — Patrick Durusau @ 1:23 pm

7th international Digital Curation Conference

Call for papers where topics include:

  • Lessons learned from the inter-disciplinary use of open data: examples of enablers, barriers and success stories
  • Curation of mixed data collections, with open and sensitive or private content
  • Gathering evidence for benefits of data sharing
  • Building capacity for the effective management, sharing and reuse of open data
  • Scale issues in the management of sensitive data
  • Tensions between maintaining quality and openness
  • Linked data, open data, closed data and provenance
  • Technical and organisational solutions for data security
  • Developing new metrics for open data
  • Ethical issues and personal data
  • Legislation and open data

Submission deadline: 25 July 2011

Conference:

5 – 7 December 2011
Marriott Royal Hotel, Bristol, UK

Topincs 5.4.3

Filed under: Topic Map Software,Topincs — Patrick Durusau @ 1:22 pm

Topincs 5.4.3

Robert Cerny has released Topincs 5.4.3.

From the release:

This version simplifies the Topincs help system. The self-explanatory nature of using Topincs leaves the help usually just as a list of available keyboard shortcuts. The help can be accessed with Alt-Shift-H or Alt-Shift-Control.

In addition some bugs were fixed and the access to resources (like images) for styling a Topincs store was made possible.

April 12, 2011

SemTech 2011 – London/Washington, D.C.

Filed under: Conferences,Semantics — Patrick Durusau @ 12:06 pm

SemTech 2011 – London/Washington, D.C.

London: September 25-27, 2011

Washington, D.C. November 29 – December 1, 2011

Apparently both events share a common proposal deadline: Monday, May 2, 2011.

Interesting range of semantic web, linked data, text analysis, data/content management, etc.

I suppose the real question is balancing the hassle of U.K. airport security against the odds of nasty weather in D.C.

Since both are virtually certain, it may come down to which one is more cost effective and convenient.

Interactive Graph – cube

Filed under: Graphs,Visualization — Patrick Durusau @ 12:05 pm

Interactive Graph – cube

The link I saw took me to the cube.

Quite by accident I happened upon:

http://arborjs.org/halfviz/#/mystery-of-the-secret-room

Now that starts to look really interesting!

Enjoy!

Last Call: RDFa Core 1.1, XHTML+RDFa 1.1

Filed under: RDFa — Patrick Durusau @ 12:04 pm

Last Call: RDFa Core 1.1, XHTML+RDFa 1.1

After posting the link to the slides on RDFa1.1 and R2ML, I went to the W3C website to check on the proposed revision of RDF (more on that later).

Anyway, I ran across the last call on RDFa Core 1.1, which reads in part:

The RDFa Working Group has published Last Call Working Drafts of RDFa Core 1.1 and XHTML+RDFa 1.1. The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience.

But where it says: “These documents contain significant amounts of structured data, which is largely unavailable to tools and applications.”, that’s not really true.

Unless search and rendering engines have been doing a real good imitation of treating that structured data as though it were available.

What I think is meant is that the semantics of the structured data have not been specified, which is an entirely different question than making it available to tools and applications.

It is an important difference because as the experience with Linked Data shows, people have many different semantics that they associate with the same data.

When tools can read, or remain ignorant of, the many different semantics associated with the same data, what is this …new world of user functionality… that becomes available?

That’s the part that I am missing.

RDFa1.1 and R2ML

Filed under: R2ML,RDFa — Patrick Durusau @ 12:03 pm

RDFa1.1 and R2ML

A presentation from November, 2010 at the Benelux Semantic Web Meetup, Amsterdam by Ivan Herman of the W3C.

Nothing startling if you have kept up with RDF/RDFa work at the W3C but a good summary if you have not.

Cloudera Hadoop V3 Features

Filed under: Hadoop — Patrick Durusau @ 12:03 pm

Cloudera Hadoop V3 Features

There isn’t any point in copying the long feature list for V3; go have a look for yourself!

And/or bookmark the www.cloudera.com homepage.

I am particularly interested in this release because it includes support for 64-bit Ubuntu.

Spreadsheet Data Connector Released

Filed under: Data Mining,Software,Topic Map Software — Patrick Durusau @ 12:02 pm

Spreadsheet Data Connector Released

From the website:

This project contains an abstract layer on top of the Apache POI library. This abstraction layer provides the Spreadsheet Query Language – eXql and additional method to access spreadsheets. The current version is designed to support the XLS and XLSX format of Microsoft© Excel® files.

The Spreadsheet Data Connector is well suited for all use cases where you have to access data in Excel sheets and you need a sophisticated language to address and query the data.

Will have to ask when we will see a connector for ODF-based spreadsheets.
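I have not used the connector or eXql, but the tedium it abstracts away is easy to show. A plain-Python sketch reading the same sort of data with openpyxl (file name and sheet layout are invented), which is roughly what a spreadsheet query language saves you from writing by hand:

```python
from openpyxl import load_workbook

wb = load_workbook("budget.xlsx", read_only=True)  # hypothetical XLSX file
ws = wb["Q1"]                                      # hypothetical sheet name

# Pull every (item, amount) pair from columns A and B, skipping the header row.
rows = ws.iter_rows(min_row=2, max_col=2, values_only=True)
totals = {item: amount for item, amount in rows if item is not None}
print(sum(totals.values()))
```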

April 11, 2011

A Data Parallel toolkit for Information Retrieval

Filed under: Data Mining,Information Retrieval,Search Algorithms,Searching — Patrick Durusau @ 5:53 am

A Data Parallel toolkit for Information Retrieval

From the website:

Many modern information retrieval data analyses need to operate on web-scale data collections. These collections are sufficiently large as to make single-computer implementations impractical, apparently necessitating custom distributed implementations.

Instead, we have implemented a collection of Information Retrieval analyses atop DryadLINQ, a research LINQ provider layer over Dryad, a reliable and scalable computational middleware. Our implementations are relatively simple data parallel adaptations of traditional algorithms, and, due entirely to the scalability of Dryad and DryadLINQ, scale up to very large data sets. The current version of the toolkit, available for download below, has been successfully tested against the ClueWeb corpus.

Are you using large data sets in the construction of your topic maps?

Where large is taken to mean data sets in the range of one billion documents. (http://boston.lti.cs.cmu.edu/Data/clueweb09/)

The authors of this work are attempting to extend access to large data sets to a larger audience.

Did they succeed?

Is their work useful for smaller data sets?

What tools would you add to assist more specifically with topic map construction?
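The "relatively simple data parallel adaptations of traditional algorithms" phrase deserves a tiny concrete illustration. A sketch of data-parallel term counting with Python's multiprocessing, a toy stand-in for the DryadLINQ style rather than the toolkit itself:

```python
from collections import Counter
from multiprocessing import Pool

def count_terms(doc):
    """Map step: term frequencies for one document."""
    return Counter(doc.lower().split())

def merge(counters):
    """Reduce step: combine partial counts."""
    total = Counter()
    for c in counters:
        total.update(c)
    return total

if __name__ == "__main__":
    docs = ["Topic maps and semantic diversity",
            "Semantic diversity in large document collections",
            "Large scale information retrieval"]
    with Pool() as pool:
        partials = pool.map(count_terms, docs)   # runs across worker processes
    print(merge(partials).most_common(3))
```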

University of Florida Sparse Matrix Collection

Filed under: Algorithms,Computational Geometry,Graphs,Matrix,Networks — Patrick Durusau @ 5:49 am

University of Florida Sparse Matrix Collection

It’s not what you think. Well, it is, but it is so much more. You really have to see the images at this site.

Abstract (from the paper by the same title):

We describe the University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications. The Collection is widely used by the numerical linear algebra community for the development and performance evaluation of sparse matrix algorithms. It allows for robust and repeatable experiments: robust because performance results with artificially-generated matrices can be misleading, and repeatable because matrices are curated and made publicly available in many formats. Its matrices cover a wide spectrum of domains, include those arising from problems with underlying 2D or 3D geometry (as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, power networks, and other networks and graphs). We provide software for accessing and managing the Collection, from MATLAB, Mathematica, Fortran, and C, as well as an online search capability. Graph visualization of the matrices is provided, and a new multilevel coarsening scheme is proposed to facilitate this task.

A Java viewer for matrices is also found here.
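The Collection distributes each matrix in MATLAB, Matrix Market and Rutherford-Boeing formats, so pulling one into a script is straightforward. A sketch using SciPy's Matrix Market reader (the file name is simply whichever matrix you downloaded):

```python
from scipy.io import mmread
from scipy.sparse import csr_matrix

# Any .mtx file downloaded from the collection works here; the name is an example.
A = csr_matrix(mmread("bcsstk01.mtx"))

print(A.shape, A.nnz)      # dimensions and number of stored nonzeros
print(A.sum(axis=1)[:5])   # first few row sums, just to touch the data
```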

CGAL: Computational Geometry Algorithms Library

Filed under: Computational Geometry — Patrick Durusau @ 5:46 am

CGAL: Computational Geometry Algorithms Library

From the website:

The goal of the CGAL Open Source Project is to provide easy access to efficient and reliable geometric algorithms in the form of a C++ library. CGAL is used in various areas needing geometric computation, such as: computer graphics, scientific visualization, computer aided design and modeling, geographic information systems, molecular biology, medical imaging, robotics and motion planning, mesh generation, numerical methods… More on the projects using CGAL web page.

The Computational Geometry Algorithms Library (CGAL), offers data structures and algorithms like triangulations (2D constrained triangulations and Delaunay triangulations in 2D and 3D, periodic triangulations in 3D), Voronoi diagrams (for 2D and 3D points, 2D additively weighted Voronoi diagrams, and segment Voronoi diagrams), polygons (Boolean operations, offsets, straight skeleton), polyhedra (Boolean operations), arrangements of curves and their applications (2D and 3D envelopes, Minkowski sums), mesh generation (2D Delaunay mesh generation and 3D surface and volume mesh generation, skin surfaces), geometry processing (surface mesh simplification, subdivision and parameterization, as well as estimation of local differential properties, and approximation of ridges and umbilics), alpha shapes, convex hull algorithms (in 2D, 3D and dD), search structures (kd trees for nearest neighbor search, and range and segment trees), interpolation (natural neighbor interpolation and placement of streamlines), shape analysis, fitting, and distances (smallest enclosing sphere of points or spheres, smallest enclosing ellipsoid of points, principal component analysis), and kinetic data structures.

A useful tool for when your topic map includes subjects that don’t come with handy labels but exist in real-time data.
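CGAL itself is C++, but the flavor of finding structure in unlabeled, real-time data is easy to show with an analogous computation in Python: a convex hull and Delaunay triangulation over raw points via scipy.spatial, used here as a stand-in rather than CGAL:

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

points = np.random.rand(30, 2)      # raw 2D observations, no labels attached

hull = ConvexHull(points)           # boundary of the point cloud
tri = Delaunay(points)              # triangulation of the interior

print("hull vertices:", hull.vertices)
print("number of triangles:", len(tri.simplices))
```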

NSF Workshop on Algorithms In The Field: Request Ideas

Filed under: Algorithms — Patrick Durusau @ 5:44 am

NSF Workshop on Algorithms In The Field: Request Ideas

From the post:

NSF Workshop on Algorithms In The Field (W8F) will assemble CSists from networking, databases, social networks, data mining, machine learning, graphics/vision/geometry/robotics and those from algorithms and theoretical computer science. It is a wider collection than you’d find in most meetings yet small enough to discuss or even debate.

Many of you are native experts in 8F. Many of you have first hand experience of being algorithms researchers and working with others on a common challenge or vice versa. Also, many of you have thought about frustrations and triumphs, in collaborations that involve algorithms/theory researchers.

The post goes on to list some ideas and suggests that you can roll your own.

The workshop is in the DIMACS Center, CoRE Building, Rutgers University, Piscataway, NJ, May 16–18, 2011, by invitation only, so get your suggestions/applications in early.

This would be a perfect venue for suggestions/discussions of merging algorithms.

The Wekinator

Filed under: Classifier,Machine Learning,Music Retrieval — Patrick Durusau @ 5:42 am

The Wekinator: Software for using machine learning to build real-time interactive systems

This looks very cool!

I can imagine topic maps of sounds/gestures in a number of contexts that would be very interesting.

From the website:

The Wekinator is a free software package to facilitate rapid development of and experimentation with machine learning in live music performance and other real-time domains. The Wekinator allows users to build interactive systems by demonstrating human actions and computer responses, rather than by programming.

Example applications:

  • Creation of new musical instruments
    • Create mappings between gesture and computer sounds. Control a drum machine using your webcam! Play Ableton using a Kinect!

  • Creation of gesturally-controlled animations and games
    • Control interactive visual environments like Processing or Quartz Composer, or game engines like Unity, using gestures sensed from webcam, Kinect, Arduino, etc.

  • Creation of systems for gesture analysis and feedback
    • Build classifiers to detect which gesture a user is performing. Use the identified gesture to control the computer or to inform the user how he’s doing.

  • Creation of real-time music information retrieval and audio analysis systems
    • Detect instrument, genre, pitch, rhythm, etc. of audio coming into the mic, and use this to control computer audio, visuals, etc.

  • Creation of other interactive systems in which the computer responds in real-time to some action performed by a human user (or users)
    • Anything that can output OSC can be used as a controller
    • Anything that can be controlled by OSC can be controlled by Wekinator
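Since everything flows over OSC, driving Wekinator from your own code is just a matter of sending messages. A sketch using the python-osc library; the port and address shown are assumptions based on Wekinator's documented defaults, so check your configuration:

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 6448)   # assumed Wekinator input port

# Send two input features (e.g. hand x/y from a webcam tracker) each frame.
features = [0.42, 0.87]
client.send_message("/wek/inputs", features)  # assumed default input address
```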

Probabilistic Models in the Study of Language

Filed under: Language,Probalistic Models — Patrick Durusau @ 5:41 am

Probabilistic Models in the Study of Language

From the website:

I’m in the process of writing a textbook on the topic of using probabilistic models in scientific work on language ranging from experimental data analysis to corpus work to cognitive modeling. A current (partial) draft is available here. The intended audience is graduate students in linguistics, psychology, cognitive science, and computer science who are interested in using probabilistic models to study language. Feedback (both comments on existing drafts, and expressed desires for additional material to include!) is more than welcome — send it to rlevy@ucsd.edu.

Just scanning the chapter titles, this looks like a useful work for anyone concerned with language issues.

ElasticSearch.org Website Search: Field Notes

Filed under: Search Engines,Searching — Patrick Durusau @ 5:40 am

ElasticSearch.org Website Search: Field Notes

From the post:

Field notes gathered during installing and configuring ElasticSearch for http://elasticsearch.org

ElasticSearch is something you are going to encounter, and these sysadmin-type notes should get you started.
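For the impatient, a minimal sketch of indexing and searching a document over the HTTP API with Python's requests library (index name, field names and the local URL are made up; adjust to your node):

```python
import requests

base = "http://localhost:9200"  # assumed local ElasticSearch node

# Index a document, then search for it.
requests.put(f"{base}/posts/doc/1",
             json={"title": "ElasticSearch field notes",
                   "tags": ["search", "sysadmin"]})

hits = requests.get(f"{base}/posts/_search", params={"q": "title:notes"}).json()
print(hits["hits"]["total"])  # count of matching documents (shape varies by version)
```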

Movies with multiple Harry Potter wizards

Filed under: Humor,Marketing — Patrick Durusau @ 5:39 am

Movies with multiple Harry Potter wizards

From Flowingdata.com:

I feel like whenever I watch a British film, I see a Harry Potter wizard or witch in it. I guess I’m not imagining things. The Ragbag had a similar curiosity and graphed all the films with four or more wizards in it — all 24 of them.

Something for everyone to consider adding to their Harry Potter topic maps.

Rowling is said to be considering electronic versions of the Potter series, for a reported $100 million. I would take a chance on digital piracy for $100 million. 😉

A topic map to navigate the series, with some of the better fan material merged in, could be quite interesting.

April 10, 2011

TCS: Call for papers on Graph Searching

Filed under: Graphs,Networks,Search Algorithms,Searching — Patrick Durusau @ 2:52 pm

TCS: Call for papers on Graph Searching

From the call:

Manuscripts are solicited for a special issue in the journal “Theoretical Computer Science” (TCS) on “Theory and Applications of Graph Searching Problems”. This special issue will be dedicated to the 60th birthday of Lefteris M. Kirousis.

….

  • Graph Searching and Logic
  • Graph Parameters Related to Graph Searching
  • Graph searching and Robotics
  • Conquest and Expansion Games
  • Database Theory and Robber and Marshals Games
  • Probabilistic Techniques in Graph Searching
  • Monotonicity and Connectivity in Graph Searching
  • New Variants of Graph Searching
  • Graph Searching and Distributed Computing
  • Graph Searching and Network Security

Deadline for submission is: 31 July 2011.

Interesting either as a submission venue or as an issue to watch for when it appears.

When the Data Struts Its Stuff

Filed under: Data Analysis,Visualization — Patrick Durusau @ 2:52 pm

When the Data Struts Its Stuff

A New York Times piece on data visualization.

Probably not anything you don’t already know or at least suspect but it is well written and emphasizes the riches that await discovery.

Think of it as setting the bar for topic map applications that are going to attract a lot of positive press.
