Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 19, 2011

uClassify

Filed under: Classifier — Patrick Durusau @ 4:19 pm

uClassify

Web-based interface that allows users to create their own text classifiers.

For example, the web interface allows up to 400 words to classify a text.

It has possibilities, both as a service and as an idea for a number of similar services.

Open Search With Lucene & Solr

Filed under: Lucene,Solr — Patrick Durusau @ 4:13 pm

Open Search With Lucene & Solr

Nothing new but a nice overview of Lucene and Lucene based search applications.

For authoring and maintaining topic maps, a good knowledge of search engines is essential.
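To make that concrete, here is a minimal index-and-search sketch against Lucene itself (Scala, assuming a recent Lucene release, 8.x/9.x era; the class names differ from the 3.x API current when this was written, and the field names and sample text are mine, purely for illustration):

```scala
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.{Document, Field, TextField}
import org.apache.lucene.index.{DirectoryReader, IndexWriter, IndexWriterConfig}
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.store.ByteBuffersDirectory

object LuceneSketch extends App {
  val analyzer = new StandardAnalyzer()
  val dir = new ByteBuffersDirectory() // in-memory index, good enough for a demo

  // Index one document with a couple of full-text fields.
  val writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))
  val doc = new Document()
  doc.add(new TextField("title", "Open Search With Lucene and Solr", Field.Store.YES))
  doc.add(new TextField("body", "An overview of Lucene and Lucene based search applications.", Field.Store.YES))
  writer.addDocument(doc)
  writer.close()

  // Search it back with the classic query parser.
  val reader = DirectoryReader.open(dir)
  val searcher = new IndexSearcher(reader)
  val hits = searcher.search(new QueryParser("body", analyzer).parse("lucene"), 10)
  hits.scoreDocs.foreach(sd => println(searcher.doc(sd.doc).get("title")))
  reader.close()
}
```

Solr wraps the same machinery behind an HTTP interface, which is where most topic map deployments would actually meet it.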

Intelligent Ruby + Machine Learning

Filed under: Artificial Intelligence,Learning Classifier,Machine Learning — Patrick Durusau @ 4:11 pm

Intelligent Ruby + Machine Learning

An entertaining slide deck that argues more data is better than either less data or more models.

That is its essential point, but it does conclude with useful references.

It also has examples that may increase your interest in “machine learning.”

February 18, 2011

Ehcache Search

Filed under: Ehcache — Patrick Durusau @ 7:00 am

Ehcache Search

From the website:

Ehcache Search is an addition to the core Ehcache API that lets you query, search and analyze billions of cache entries in memory, with results to complex searches returned in less than a second. By querying the cache directly, you can avoid the time-consuming and expensive process of querying the database, then mapping query results to cache lookups.

I ran across this yesterday in a web article with no hyperlinks and pointers only to other articles on the same site. I am honoring that degree of self-absorption by not mentioning it by name. 😉

Another potential way to improve topic map engine performance.

There is a recent survey of Ehcache users that you may want to read.
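From that description, querying the cache directly looks roughly like the sketch below (Scala, against the Ehcache 2.x Search API as I read its documentation). The cache name "people", the Person value class and the "age" search attribute are all hypothetical, and I am assuming ehcache.xml declares the cache searchable with a searchAttribute named "age" extracted from the value bean. Treat it as a sketch of the idea, not a tested example.

```scala
import net.sf.ehcache.{CacheManager, Element}
import scala.beans.BeanProperty
import scala.jdk.CollectionConverters._

// Hypothetical value type; the "age" search attribute is assumed to be
// declared in ehcache.xml and extracted from this bean's getter.
class Person(@BeanProperty val name: String, @BeanProperty val age: Integer) extends Serializable

object EhcacheSearchSketch extends App {
  val manager = CacheManager.getInstance()
  val cache = manager.getCache("people") // assumed to be configured as searchable

  cache.put(new Element("alice", new Person("alice", 34)))
  cache.put(new Element("bob", new Person("bob", 52)))

  // Query the cache directly: no round trip to a database.
  val age = cache.getSearchAttribute[Integer]("age")
  val results = cache.createQuery().includeKeys().addCriteria(age.gt(Integer.valueOf(40))).end().execute()
  results.all().asScala.foreach(r => println(r.getKey))
}
```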

Large Scale Packet Dump Analysis with MongoDB

Filed under: Marketing,MongoDB,Subject Identity — Patrick Durusau @ 6:51 am

Large Scale Packet Dump Analysis with MongoDB

I mention this because it occurs to me that distributed topic maps could be a way to track elusive web traffic that passes through any number of servers from one location to another.

I will have to pull Stevens’ TCP/IP Illustrated off the shelf to look up the details.

Thinking that subject identity in this case would be packet content and not the usual identifiers.

And that with distributed topic maps, no one map would have to process all the load.

Instead, upon request, each map would deliver up proxies to be merged with other proxies, which could then be displayed as partial paths through the networks, with the servers where changes took place being marked.

The upper level topic maps being responsible for processing summaries of summaries of data, but with the ability to drill back down into the actual data.

True, there is a lot of traffic, but simply by dumping all the porn, that reduces the problem by a considerable percentage. I am sure there are other data collection improvements that could be made.
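To make the subject identity idea concrete, here is a minimal sketch (Scala, names entirely mine) of treating a digest of the packet payload as the identity, so that observations of the same content at different servers merge into one proxy:

```scala
import java.security.MessageDigest

// Illustrative only: a packet observation identified by a digest of its payload,
// not by volatile headers, so the same content seen at different servers merges.
final case class PacketObservation(server: String, timestamp: Long, payload: Array[Byte])

object PacketIdentity extends App {
  def contentId(payload: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(payload).map("%02x".format(_)).mkString

  val seenAtA = PacketObservation("server-a", 1L, "GET /index.html".getBytes("UTF-8"))
  val seenAtB = PacketObservation("server-b", 2L, "GET /index.html".getBytes("UTF-8"))

  // A crude "merge": group observations by content identity.
  Seq(seenAtA, seenAtB).groupBy(o => contentId(o.payload)).foreach { case (id, obs) =>
    println(s"${id.take(12)}… seen at ${obs.map(_.server).mkString(", ")}")
  }
}
```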

NECOBELAC

Filed under: Biomedical,Marketing,Medical Informatics — Patrick Durusau @ 5:37 am

NECOBELAC

From the webpage:

NECOBELAC is a Network of Collaboration Between Europe & Latin American-Caribbean countries. The project works in the field of public health. NECOBELAC aims to improve scientific writing, promote open access publication models, and foster technical and scientific cooperation between Europe & Latin American Caribbean (LAC) countries.

NECOBELAC acts through training activities in scientific writing and open access by organizing courses for trainers in European and LAC institutions.

Topic maps get mentioned in the FAQs for the project: NECOBELAC Project FAQs

Is there any material (i.e. introductory manuals) explaining how the topic maps have been generated as knowledge representation and how they can be optimally used?

Yes, a reliable tool introducing the scope and use of the topic maps is represented by the “TAO of topic maps” by Steve Pepper. This document clearly describes the characteristics of this model, and provides useful examples to understand how it actually works.

Well,…, but this is 2011 and NECOBELAC represents a specific project focused on public health.

Perhaps using the “TAO of topic maps” as a touchstone, but surely we can produce more project-specific guidance. Yes?

Please post a link about your efforts or a comment here if you decide to help out.

Linked Data: Evolving the Web into a Global Data Space – Book

Filed under: Linked Data — Patrick Durusau @ 5:19 am

Linked Data: Evolving the Web into a Global Data Space by Tom Heath and Christian Bizer.

Abstract:

The World Wide Web has enabled the creation of a global information space comprising linked documents. As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to raw data not currently available on the Web or bound up in hypertext documents. Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards – the Web of Data. In this Synthesis lecture we provide readers with a detailed technical introduction to Linked Data. We begin by outlining the basic principles of Linked Data, including coverage of relevant aspects of Web architecture. The remainder of the text is based around two main themes – the publication and consumption of Linked Data. Drawing on a practical Linked Data scenario, we provide guidance and best practices on: architectural approaches to publishing Linked Data; choosing URIs and vocabularies to identify and describe resources; deciding what data to return in a description of a resource on the Web; methods and frameworks for automated linking of data sets; and testing and debugging approaches for Linked Data deployments. We give an overview of existing Linked Data applications and then examine the architectures that are used to consume Linked Data from the Web, alongside existing tools and frameworks that enable these. Readers can expect to gain a rich technical understanding of Linked Data fundamentals, as the basis for application development, research or further study.

A free HTML version is reported to be due out 1 March 2011.

Unless I am seriously mistaken (definitely a possibility), all our data, and the structures that hold our data, already have semantics, but thanks for asking.

To enable semantic integration we need to make those semantics explicit but that hardly requires conversion into Linked Data format.

Nor does the Linked Data format enable more linking than it takes away.

As a matter of fact it takes away a lot of linking, at least if you follow its logic, because linked data can only link to other linked data. How unfortunate.

The other question I will have to ask, after a decent period following the appearance of the book, is what about the data structures of Linked Data? Do they also qualify as first class citizens of the Web?

Linked Data-a-thon – ISWC 2011

Filed under: Conferences,Linked Data,Marketing,Semantic Web — Patrick Durusau @ 5:17 am

Linked Data-a-thon http://iswc2011.semanticweb.org/calls/linked-data-a-thon/

I looked at the requirements for the Linked Data-a-thon, which include:

  • make use of Linked Data consumed from multiple data sources
  • be able to make use of additional data from other Linked Data sources
  • be accessible from the Web
  • satisfy the special requirement which will be announced on October 1, 2011.

It would not be hard to fashion a topic map application that consumed Linked Data, made use of additional data from other Linked Data sources and was accessible from the Web.

What would be interesting would be to reliably integrate other information sources, that were not Linked Data with Linked Data sources.

Don’t know about the special requirement.

At least one person on a team would actually have to attend the conference to enter.

Anyone interested in discussing such an entry?

Suggested Team title: Linked Data Cake (1 Tsp Linked Data, 8 Cups Non-Linked Data, TM Oven – Set to Merge)

Kinda long and pushy but why not?

What better marketing pitch for topic maps than to leverage present investments in Linked Data into a meaningful result with non-linked data?

It isn’t like there is a shortage of non-linked data to choose from. 😉

10th International Semantic Web Conference (ISWC 2011) – Call for Papers

Filed under: Conferences,Ontology,Semantic Web — Patrick Durusau @ 5:15 am

10th International Semantic Web Conference (ISWC 2011) – Call for Papers

The 10th International Semantic Web Conference (ISWC 2011) will be in Bonn, Germany, October 23-27, 2011.

From the call:

Key Topics

  • Management of Semantic Web Data
  • Natural Language Processing
  • Ontologies and Semantics
  • Semantic Web Engineering
  • Social Semantic Web
  • User Interfaces to the Semantic Web
  • Applications of the Semantic Web

Tracks and Due Dates:

Research Papers
http://iswc2011.semanticweb.org/calls/research-papers/

Semantic Web In Use
http://iswc2011.semanticweb.org/calls/semantic-web-in-use/

Posters and Demos
http://iswc2011.semanticweb.org/calls/posters-and-demos/

Doctoral Consortium
http://iswc2011.semanticweb.org/calls/doctoral-consortium/

Tutorials http://iswc2011.semanticweb.org/calls/tutorials/

Workshops http://iswc2011.semanticweb.org/calls/workshops/

Semantic Web Challenge http://iswc2011.semanticweb.org/calls/semantic-web-challenge/

Linked Data-a-thon
http://iswc2011.semanticweb.org/calls/linked-data-a-thon/

DataLift

Filed under: Dataset,Linked Data,Ontology,RDF — Patrick Durusau @ 5:12 am

DataLift

The DataLift project will no doubt produce some useful tools and output but reading its self-description:

The project will provide tools allowing to facilitate each step of the publication process:

  1. selecting ontologies for publishing data
  2. converting data to the appropriate format (RDF using the selected ontology)
  3. publishing the linked data
  4. interlinking data with other data sources

I am struck by how futile the effort sounds in the face of petabytes of data flow, changing semantics of that data and changing semantics of other data, with which it might be interlinked.

The nearest imagery I can come up with is trying to direct the flow of a tsunami with a roll of paper towels.

It is certainly brave (I forgo usage of the other term) to try but ultimately isn’t very productive.

First, any scheme that starts with conversion to a particular format is an automatic loser.

The source format is itself composed of subjects that are discarded by the conversion process.

Moreover, what if we disagree about the conversion?

Remember all the semantic diversity that gave rise to this problem? Where did it get off to?

Second, the interlinking step introduces brittleness into the process.

Both in terms of the ontology that any particular data must follow but also in terms of resolution of any linkage.

Other data sources can only be linked in if they use the correct ontology and format. And that assumes they are reachable.

I hope the project does well, but at best it will result in another semantic flavor to be integrated using topic maps.

*****
PS: The use of data heaven betrays the religious nature of the Linked Data movement. I don’t object to Linked Data. What I object to is the missionary conversion aspects of Linked Data.

DSPL: Dataset Publishing Language

DSPL: Dataset Publishing Language

From the website:

DSPL is the Dataset Publishing Language, a representation language for the data and metadata of datasets. Datasets described in this format can be processed by Google and visualized in the Google Public Data Explorer.

Features:

  • Use existing data: Just add an XML metadata file to your existing CSV data files
  • Powerful visualizations: Unleash the full capabilities of the Google Public Data Explorer, including the animated bar chart, motion chart, and map visualization
  • Linkable concepts: Link to concepts in other datasets or create your own that others can use
  • Multi-language: Create datasets with metadata in any combination of languages
  • Geo-enabled: Make your data mappable by adding latitude and longitude data to your concept definitions. For even easier mapping, link to Google’s canonical geographic concepts.
  • Fully open: Freely use the DSPL format in your own applications

For the details, see the DSPL tutorial and developer guide.

A couple quick observations:

Geared towards data that can be captured in CSV files, which are considerable and interesting data sets, but only a slice of all data.

On a quick scan of the tutorial and developer guide, DSPL did not appear to provide a way to specify properties for topics.

Nor did it appear to provide a way to specify when (or why) topics could be merged with one another.

Plus marks for enabling navigation by topics, but that is like complimenting a nautical map for having the compass directions, isn’t it?

I think this could be a very good tool for investigating data, or even for showing clients “what if you had a topic map” sorts of illustrations.

It is moving up in the stack, both virtual and actual, of reading materials on my desk.

Data visualizations for a changing world

Filed under: Data Source,Visualization — Patrick Durusau @ 5:07 am

Data visualizations for a changing world introduced the Google Public Data Explorer.

From the website:

The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don’t have to be a data expert to navigate between different views, make your own comparisons, and share your findings.
Explore the data

Students, journalists, policy makers and everyone else can play with the tool to create visualizations of public data, link to them, or embed them in their own webpages. Embedded charts and links can update automatically so you’re always sharing the latest available data…..

A deeply useful achievement but one that leaves a lot of hard to specify semantics on the cutting room floor.

For example:

  1. What subjects are being visualized? That is, what would I look for additional information about if I wanted to create another visualization?
  2. What relationship between the subjects is being illustrated? That would help in terms of looking for more information relevant to that relationship.
  3. How can I specify either #1 or #2 so that I can pass that along to someone else? (Asynchronous communication or recordation of insights)

*****
PS: Thanks to Benjamin Bock for reminding me to cover this and the data exploration language from Google.

The Next Generation of Apache Hadoop MapReduce

Filed under: Algorithms,Hadoop,MapReduce,NoSQL,Topic Maps — Patrick Durusau @ 5:02 am

The Next Generation of Apache Hadoop MapReduce by Arun C Murthy (@acmurthy)

From the webpage:

In the Big Data business running fewer larger clusters is cheaper than running more small clusters. Larger clusters also process larger data sets and support more jobs and users.

The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application execution. Since downtime is more expensive at scale high-availability is built-in from the beginning; as are security and multi-tenancy to support many users on the larger clusters. The new architecture will also increase innovation, agility and hardware utilization.

Since I posted the note about OpenStack and it is Friday, it seemed like a natural. Something to read over the weekend!

Saw this first at Alex Popescu’s myNoSQL – The Next Generation of Apache Hadoop MapReduce, which is sporting a new look!

Building a free, massively scalable cloud computing platform

Filed under: Algorithms,Cloud Computing,Topic Maps — Patrick Durusau @ 4:59 am

Building a free, massively scalable cloud computing platform

Soren Hansen at FOSDEM 2011:

A developer’s look into Openstack architecture

OpenStack is a very new, very popular cloud computing project, backed by Rackspace and NASA. It’s all free software (Apache Licensed) and is in production use already.

It’s written entirely in Python and uses Twisted, Eventlet, AMQP, SQLAlchemy, WSGI and many other high quality libraries and standards.

We’ll take a detailed look at the architecture and dive into some of the challenges we’ve faced building a platform that is supposed to handle millions of gigabytes of data and millions of virtual machines, and how we’ve dealt with them.

Video of the presentation

This presentation will be of interest to those who think the answer to topic map processing is to turn up the power dial.

That is one answer and it may be the best one under some circumstances.

That said, given the availability of cloud computing resources, the question becomes one of rolling your own or simply buying the necessary cycles.

Unless you are just trying to throw money at your IT department (that happens even in standards organizations), I suspect buying the cycles is the most cost effective option.

“Free” software really isn’t, particularly server class software, unless you have all volunteer administrators, donated hardware, backup facilities, etc.

Note that NASA, one of the sponsors of this project, can whistle up, if not language authors, then major contributors to any software package of interest to it. Can your organization say the same thing?

Just to be fair, my suspicions are in favor of a mix of algorithm development, innovative data structures and high end computing structures. (Yes, I know, I cheated and made three choices. Author’s choice.)

New RDF Charter
(rant on backwards compatibility)

Filed under: RDF — Patrick Durusau @ 4:49 am

RDF Working Group Charter

A new RDF Working Group charter has been announced.

One serious problem:

For all new features, backwards compatibility with the current version of RDF is of great importance. This means that all efforts should be made so that

  • any valid RDF graphs (in terms of the RDF 2004 version) should remain valid in terms of a new version of RDF; and
  • any RDF or RDFS entailment drawn on RDF graphs using the 2004 semantics should be valid entailment in terms of a new version of RDF and RDFS

Care should be taken to not jeopardize existing RDF deployment efforts and adoption. In case of doubt, the guideline should be not to include a feature in the set of additions if doing so might raise backward compatibility issues.

What puzzles me is why this mis-understanding of backwards compatibility continues to exist.

Any valid RDF graph would remain valid under 2004 RDF, and 2004 semantics would remain 2004 semantics. OK, so?

What is the difficulty with labeling the new version of RDF, RDF-NG? With appropriate tokens in any syntax?

True, that might mean that 7 year old software and libraries might not continue to work. How many users do you think are intentionally using 7 year old software?

Ah, you mean you are still writing calls to 7 year old libraries in your software? Whose bad is that?

Same issue is about to come up in other circles, some closer to home than others.

*****

PS: What is particularly annoying is that some vendors insist (falsely) that ISO is too slow for their product development models.

How can ISO be too slow if every error is enshrined forever in the name of backwards compatibility?

If RDF researchers haven’t learned anything they would do differently in RDF in the last seven years, well, that’s just sad.

February 17, 2011

Uncertain Future of Topic Maps?

Filed under: Marketing,Topic Maps — Patrick Durusau @ 7:08 am

I was puzzled by a recent comment concerning the “..uncertain future of topic maps…?”

Some possible sources of an “uncertain future” for topic maps:

  1. Unchanging one world language and semantic adopted universally, plus conversion of all existing data into that language/semantic.
  2. Universal agreement to ignore semantic differences. Increase in death rates stabilizes the world’s population at 17th century levels.*
  3. Extinction of human race.

I hate to be disagreeable, ;-), but I really don’t think any of those three causes are likely to imperil the future of topic maps.

To be honest there could be a fourth cause:

4. As a community we promote topic maps so poorly or create such inane interfaces that users prefer to suffer from semantic impedance rather than use topic maps.

And there is a fifth cause that I don’t recall anyone addressing:

5. There are people who benefit from semantic impedance and non-sharing.

I think we can address #4 and #5 as a matter of marketing efforts.

Start by asking yourself: Who benefits from a reduction (or maintenance) of semantic impedance?

*****

*No need for alarm. That is just a wild guess on my part. Has the same validity as software and music piracy figures. The difference is I am willing to admit that I made it up to sound good.

Clever Algorithms: Nature-Inspired Programming Recipes

Filed under: Algorithms,Artificial Intelligence — Patrick Durusau @ 7:03 am

Clever Algorithms: Nature-Inspired Programming Recipes

From the website:

The book “Clever Algorithms: Nature-Inspired Programming Recipes” by Jason Brownlee PhD describes 45 algorithms from the field of Artificial Intelligence. All algorithm descriptions are complete and consistent to ensure that they are accessible, usable and understandable by a wide audience.

5 Reasons To Read:

  1. 45 algorithms described.
  2. Designed specifically for Programmers, Research Scientists and Interested Amateurs.
  3. Complete code examples in the Ruby programming language.
  4. Standardized algorithm descriptions.
  5. Algorithms drawn from the popular fields of Computational Intelligence, Metaheuristics, and Biologically Inspired Computation.

A recent advance in graph node coloring reduced the problem from millions of years to under 7 seconds on a desktop box.

Question: Are you willing to step beyond the current orthodoxy?
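To give a taste of what one of these recipes looks like in practice, here is a sketch of stochastic hill climbing on the OneMax problem (maximize the number of 1s in a bitstring), written in Scala rather than the book's Ruby. It is my own toy rendition, not code from the book.

```scala
import scala.util.Random

object HillClimb extends App {
  val rng = new Random(42)
  val bits = 32

  def cost(s: Vector[Int]): Int = s.sum // OneMax: count the 1s
  def mutate(s: Vector[Int]): Vector[Int] = {
    val i = rng.nextInt(s.length)
    s.updated(i, 1 - s(i)) // flip one random bit
  }

  var current = Vector.fill(bits)(rng.nextInt(2))
  for (_ <- 1 to 1000 if cost(current) < bits) {
    val candidate = mutate(current)
    if (cost(candidate) >= cost(current)) current = candidate // keep non-worse moves
  }
  println(s"best = ${cost(current)} / $bits")
}
```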

TMQL4J 3.0 release!

Filed under: TMQL,TMQL4J — Patrick Durusau @ 6:59 am

TMQL4J 3.0 release!

From the website:

The new version 3.0.0 of the tmql4j query suite was released at google code. In this version tmql4j is more flexible and powerful to satisfy every business use case.

As a major modification, the engine architecture and processing model was changed. The new suite contains two different TMQL runtimes, one for each TMQL draft. The drafts are split to avoid ambiguity and conflicts during the querying process. The stack-based processing model is replaced by a more flexible one to enable multi-threaded optimizations.

Each style of the 2008 draft and each part of the topic map modification language (TMQL-ML) has been realized in different modules. Because of that, the user can decide which styles and parts of the query language should be supported.

In addition, a new language module was added to enable flexible template definitions, which enables control of the result format of the querying process in the most powerful way. Templates can be used to return results in HTML, XML, JSON or any other format. The results will be embedded automatically by the query processor.

Looking forward to reviewing the documentation. Quite possibly posting some additional exercise material.

Encog Java and DotNet Neural Network Framework

Filed under: .Net,Encog,Java,Machine Learning,Neural Networks,Silverlight — Patrick Durusau @ 6:56 am

Encog Java and DotNet Neural Network Framework

From the website:

Encog is an advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks. Encog has been in active development since 2008.

Encog is available for Java, .Net and Silverlight.

An important project for at least two reasons.

First, the obvious applicability to the creation of topic maps using machine learning techniques.

Second, it demonstrates that supporting Java, .Net and Silverlight isn’t, you know, all that weird.

The world is changing and becoming somewhat more interoperable.

Topic maps have a role to play in that process, both in terms of semantic interoperability of the infrastructure as well as the data it contains.
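For the curious, the canonical Encog “hello world” is training a small feed-forward network on XOR with resilient propagation. A sketch in Scala, assuming Encog 3.x (class names and packages may differ in other versions):

```scala
import org.encog.engine.network.activation.ActivationSigmoid
import org.encog.ml.data.basic.BasicMLDataSet
import org.encog.neural.networks.BasicNetwork
import org.encog.neural.networks.layers.BasicLayer
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation

object EncogXor extends App {
  // The classic XOR training set.
  val input = Array(Array(0.0, 0.0), Array(0.0, 1.0), Array(1.0, 0.0), Array(1.0, 1.0))
  val ideal = Array(Array(0.0), Array(1.0), Array(1.0), Array(0.0))

  val network = new BasicNetwork()
  network.addLayer(new BasicLayer(null, true, 2))                     // input layer
  network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 3))  // hidden layer
  network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1)) // output layer
  network.getStructure.finalizeStructure()
  network.reset()

  // Multithreaded resilient propagation, as mentioned in the description above.
  val train = new ResilientPropagation(network, new BasicMLDataSet(input, ideal))
  var epoch = 1
  train.iteration()
  while (train.getError > 0.01 && epoch < 5000) { train.iteration(); epoch += 1 }
  println(s"trained to error ${train.getError} in $epoch epochs")
}
```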

MongoDB – Usage Posts

Filed under: MongoDB — Patrick Durusau @ 6:52 am

MongoDB

A series of posts and presentations on use of MongoDB by Boxed Ice.

The site provides a post index and presentation slides.

A good resource on MongoDB and an example of the type of posting that would be useful for topic maps.

As much a note to myself as to anyone: a series of posts on a data set of general interest might attract favorable attention.

On Sharding Graph Databases

Filed under: Graphs,Sharding — Patrick Durusau @ 6:50 am

On Sharding Graph Databases

Jim Webber starts a discussion on sharding graph databases.

Interested to learn how lessons here can be applied to sharding topic maps.

HBase and Lucene for realtime search

Filed under: HBase,Lucene — Patrick Durusau @ 6:48 am

HBase and Lucene for realtime search

From the post that starts this exciting thread:

I’m curious as to what a ‘good’ approach would be for implementing search in HBase (using Lucene) with the end goal being the integration of realtime search into HBase. I think the use case makes sense as HBase is realtime and has a write-ahead log, performs automatic partitioning, splitting of data, failover, redundancy, etc. These are all things Lucene does not have out of the box, that we’d essentially get for ‘free’.

For starters: Where would be the right place to store Lucene segments or postings? Eg, we need to be able to efficiently perform a linear iteration of the per-term posting list(s).

Thanks!

Jason Rutherglen

This could definitely have legs for exploring data sets, authoring topic maps or, assuming a dynamic synonyms table composed of conditions for synonymy, even acting as a topic map engine.

Will keep a close eye on this activity.
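One naive answer to “where do the postings go” (not what the thread settles on, just my sketch to make the question concrete): one HBase row per term, with a column per document id. In Scala against a recent HBase client API; the table name "postings" and column family "p" are made up for the example.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put, Scan}
import org.apache.hadoop.hbase.util.Bytes

object PostingsSketch extends App {
  val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val table = conn.getTable(TableName.valueOf("postings")) // hypothetical table, family "p"

  // "Posting list" for the term "lucene": one column per document id, value = term frequency.
  val put = new Put(Bytes.toBytes("lucene"))
  put.addColumn(Bytes.toBytes("p"), Bytes.toBytes("doc-17"), Bytes.toBytes(3L))
  put.addColumn(Bytes.toBytes("p"), Bytes.toBytes("doc-42"), Bytes.toBytes(1L))
  table.put(put)

  // Linear iteration over a per-term posting list is a single row read;
  // iterating over all terms is a scan.
  val scanner = table.getScanner(new Scan())
  scanner.forEach(r => println(Bytes.toString(r.getRow)))

  scanner.close(); table.close(); conn.close()
}
```

Whether that layout survives contact with segment merging and real query loads is exactly the kind of question the thread is asking.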

Solandra

Filed under: Cassandra,Solr — Patrick Durusau @ 6:46 am

Solandra

From the website:

Solandra is a real-time distributed search engine built on Apache Solr and Apache Cassandra.

At its core Solandra is a tight integration of Solr and Cassandra, meaning within a single JVM both Solr and Cassandra are running, and documents are stored and distributed using Cassandra’s data model.

Solandra makes managing and dynamically growing Solr simple(r).

See the Solandra wiki for more details.

The more searching that occurs across diverse data sets, the more evident the use case(s) for topic maps will become.

Will you be there to answer the call?

Cablemap

Filed under: Marketing,Wikileaks — Patrick Durusau @ 6:42 am

Cablemap

Just in case you have been in a coma for the last 6 months or in solitary confinement, Wikileaks is publishing a set of diplomatic cables it describes as follows:

Wikileaks began on Sunday November 28th publishing 251,287 leaked United States embassy cables, the largest set of confidential documents ever to be released into the public domain. The documents will give people around the world an unprecedented insight into US Government foreign activities.

The cables, which date from 1966 up until the end of February this year, contain confidential communications between 274 embassies in countries throughout the world and the State Department in Washington DC. 15,652 of the cables are classified Secret.

….

The cables show the extent of US spying on its allies and the UN; turning a blind eye to corruption and human rights abuse in “client states”; backroom deals with supposedly neutral countries; lobbying for US corporations; and the measures US diplomats take to advance those who have access to them.

This document release reveals the contradictions between the US’s public persona and what it says behind closed doors – and shows that if citizens in a democracy want their governments to reflect their wishes, they should ask to see what’s going on behind the scenes.

The online treatments I have seen by the Guardian and the New York Times are more annoying than the parade of horrors suggested by US government sources.

True, the cables show diplomats to be venal and dishonest creatures in the service of even more venal and dishonest creatures but everyone outside of an asylum and over 12 years of age knew that already.

Just as everyone knew that US foreign policy benefits friends and benefactors of elected US officials, not the general U.S. population.

Here is the test: Look over all the diplomatic cables since 1966 and find one where the result benefited you personally. Now pick one at random and identify the person or group who benefited from the activity or policy discussed in the cable.

A topic map that matched up individuals or groups who benefited from the activities or policies discussed in the cables would be a step towards being more than annoying.

Topic mapping in Google Maps locations for those individuals or representatives of those groups would be more than annoying still.

Add the ability to seamlessly integrate leaked information into another intelligence system, and you are edging towards the potential of topic maps.

Cablemap is a step towards the production of a Cablegate resource that is more than simply annoying.

Visualising Twitter Dynamics in Gephi, Part 1 – Post

Filed under: Gephi,Visualization — Patrick Durusau @ 6:39 am

Visualising Twitter Dynamics in Gephi, Part 1

From the post:

In the following posts I’m finally keeping my promise to explore in earnest the use of Gephi’s dynamic timeline feature for visualising Twitter-based discussions as they unfolded in real time. A few months ago, Jean posted a first glimpse of our then still very experimental data on Twitter dynamics, with a string of caveats attached – and I followed up on this a little while later with some background on the Gawk scripts we’re using to generate timeline data in GEXF format from our trusty Twapperkeeper archives (note that I’ve updated one of the scripts in that post, to make the process case-insensitive). Building on those posts, here I’ll outline the entire process and show some practical results (disclaimer: actual dynamic animations will follow in part two, tomorrow – first we’re focussing on laying the groundwork).

This article was mentioned in Dynamic Twitter graphs with R and Gephi (clip and code) as an interesting example of “aging” edges.

While there is an obvious time component to tweets, is there an implied relevancy based on time for other information as well?

Tactical information should be displayed to ground level commanders and be sans longer term planning data, while for command headquarters, tactical information is just clutter on the display.

Looks like a fruitful area for exploration.

February 16, 2011

Playing with Scala’s pattern matching – Post

Filed under: Pattern Matching,Scala — Patrick Durusau @ 1:28 pm

Playing with Scala’s pattern matching

François Sarradin writes:

How many times have you been stuck in your frustration because you were unable to use strings as entries in switch-case statements. Such an ability would be really useful for example to analyze the arguments of your application or to parse a file, or any content of a string. Meanwhile, you have to write a series of if-else-if statements (and this is annoying). Another solution is to use a hash map, where the keys are those strings and values are the associated reified processes, for example a Runnable or a Callable in Java (but this is not really natural, long to develop, and boring too).

If a switch-case statement that accepts strings as entries would be a revolution for you, the Scala’s pattern matching says that this is not enough! Indeed, there are other cases where a series of if-else-if statements would be generously transformed into a look-alike switch-case statement. For example, it would be really nice to simplify a series of instanceof and cast included in if-else-if to execute the good process according to the type of a parameter.

In this post, we see the power of the Scala’s pattern matching in different use cases.

Which language you choose for topic map development will depend in part on its pattern matching abilities.

Here’s a chance to evaluate Scala in that regard.
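As a tiny taste of what the post covers, here are the two cases mentioned above: strings in a match expression, and type patterns in place of instanceof plus cast chains. The option names and types are just made-up examples.

```scala
object PatternMatchDemo extends App {
  // Strings as "switch" entries, no if-else-if chain.
  def handleArg(arg: String): String = arg match {
    case "--help"    => "print usage"
    case "--version" => "print version"
    case other       => s"unknown option: $other"
  }

  // Type patterns (with guards) instead of instanceof checks and casts.
  def describe(x: Any): String = x match {
    case s: String       => s"a string of length ${s.length}"
    case n: Int if n > 0 => s"a positive int: $n"
    case xs: List[_]     => s"a list with ${xs.size} elements"
    case _               => "something else"
  }

  println(handleArg("--help"))
  println(describe(42))
  println(describe(List(1, 2, 3)))
}
```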

Tsearch2 – full text extension for PostgreSQL

Filed under: PostgreSQL,TSearch — Patrick Durusau @ 1:12 pm

Tsearch2 – full text extension for PostgreSQL

Following up on the TSearch Primer post from yesterday.

This is the current documentation for Tsearch2.

Wordnik – 10 million API Requests a Day on MongoDB and Scala – Post

Filed under: MongoDB,Scala — Patrick Durusau @ 1:08 pm

Wordnik – 10 million API Requests a Day on MongoDB and Scala

From the website:

Wordnik is an online dictionary and language resource that has both a website and an API component. Their goal is to show you as much information as possible, as fast as we can find it, for every word in English, and to give you a place where you can make your own opinions about words known. As cool as that is, what is really cool is the information they share in their blog about their experiences building a web service. They’ve written an excellent series of articles and presentations you may find useful: (see the post)

Of course, what I find fascinating is the “…make your own opinions about words known” aspect of the system.

Even so, from a scaling standpoint, this sounds like an impressive bit of work.

Definitely worth a look.

Programming in Topincs – Post

Filed under: Topic Map Systems,Topincs — Patrick Durusau @ 1:05 pm

Programming in Topincs

The main goal of programming in Topincs is to provide services that aggregate, transform, and manipulate the data in a Topincs store.

Robert Cerny walks through an invoicing system for a software company using the Topincs store.

Very good introduction to Topincs!

Thanks Robert!

Introduction to Categories and Categorical Logic

Filed under: Category Theory — Patrick Durusau @ 1:02 pm

Introduction to Categories and Categorical Logic by Samson Abramsky and Nikos Tzevelekos.

Category theory is important for theoretical CS and, I suspect, should skilled explanations come along, for CS practice as well.

From the preface:

The aim of these notes is to provide a succinct, accessible introduction to some of the basic ideas of category theory and categorical logic. The notes are based on a lecture course given at Oxford over the past few years. They contain numerous exercises, and hopefully will prove useful for self-study by those seeking a first introduction to the subject, with fairly minimal prerequisites. The coverage is by no means comprehensive, but should provide a good basis for further study; a guide to further reading is included.

The main prerequisite is a basic familiarity with the elements of discrete mathematics: sets, relations and functions. An Appendix contains a summary of what we will need, and it may be useful to review this first. In addition, some prior exposure to abstract algebra—vector spaces and linear maps, or groups and group homomorphisms—would be helpful.

Under further reading, F. Lawvere and S. Schanuel, Conceptual Mathematics: A First Introduction to Categories, Cambridge University Press, 1997, is described by the authors as “idiosyncratic.” Perhaps so but I found it to be a useful introduction.
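For readers who want a taste before opening the notes, the basic definition (which the notes develop in detail) runs along these lines:

```latex
% The standard definition of a category.
A category $\mathcal{C}$ consists of
\begin{itemize}
  \item a collection of objects $A, B, C, \dots$;
  \item for each pair of objects $A, B$, a set of morphisms $\mathcal{C}(A,B)$;
  \item for each object $A$, an identity morphism $\mathrm{id}_A \in \mathcal{C}(A,A)$;
  \item a composition map $\circ \colon \mathcal{C}(B,C) \times \mathcal{C}(A,B) \to \mathcal{C}(A,C)$,
\end{itemize}
such that composition is associative, $h \circ (g \circ f) = (h \circ g) \circ f$, and the
identities are units: $f \circ \mathrm{id}_A = f = \mathrm{id}_B \circ f$ for every $f \in \mathcal{C}(A,B)$.
```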
