Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

July 7, 2011

Boilerpipe

Filed under: Data Mining,Java — Patrick Durusau @ 4:15 pm

Boilerpipe

From the webpage:

The boilerpipe library provides algorithms to detect and remove the surplus “clutter” (boilerplate, templates) around the main textual content of a web page.

The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.

Extracting content is very fast (milliseconds), just needs the input document (no global or site-level information required) and is usually quite accurate.

Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.

Should save you some time when harvesting data from webpages.
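
For a sense of how little code is involved, here is a minimal Java sketch using the library's ArticleExtractor, following its published quick-start (the URL is a placeholder):

import de.l3s.boilerpipe.extractors.ArticleExtractor;

import java.net.URL;

public class ExtractMain {
    public static void main(String[] args) throws Exception {
        // ArticleExtractor is tuned for news-article-like pages;
        // the URL below is a placeholder
        URL url = new URL("http://example.com/some-article.html");
        String mainText = ArticleExtractor.INSTANCE.getText(url);
        System.out.println(mainText);
    }
}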

July 6, 2011

Multiple Criteria Decision Aid Bibliography

Filed under: Bibliography,Decision Making — Patrick Durusau @ 2:16 pm

Multiple Criteria Decision Aid Bibliography

I stumbled over this site while looking for a free copy of Amos Tversky’s “Features of Similarity” paper to cite for my readers. (I never was able to find a copy that wasn’t behind a pay-per-view wall. Sorry.)

It is maintained by the LAMSADE laboratory as a collection of materials on decision making, a category into which identification of a subject certainly falls.

The LAMSADE laboratory was established in 1974 as a joint laboratory of the Université Paris-Dauphine and the CNRS. Its central research activity lies at the interface of two fundamental scientific areas: Computer Science and Decision Making (and, more generally, Operations Research).

LAMSADE’s research themes are both theoretical and applied and cover decision making, decision theory, social choice, operations research, combinatorial optimization, computational complexity, mathematical programming, interactions between decision and artificial intelligence, massive data computation, and information systems.

And yes, it is no mistake: the first entry in the bibliography is from 1736.

Enjoy!

Applying Graph Analysis and Manipulation to Data Stores

Filed under: Graphs,Neo4j — Patrick Durusau @ 2:15 pm

Applying Graph Analysis and Manipulation to Data Stores

Roberto V. Zicari interviews Marko Rodriguez and Peter Neubauer.

From the summary:

Marko Rodriguez and Peter Neubauer, the leaders of TinkerPop, the open-source graph software initiative, discuss the project, its members, and its goals.

TinkerPop facilitates the application of graphs to various engineering problems, the leaders say. A graph can “rapidly traverse structures to an arbitrary depth . . . and with an arbitrary path description.” Graph processing software provides a unique way of thinking about data processing that Rodriguez and Neubauer call the graph traversal pattern. “This mind set is much different from the set theoretic notions of the relational database world,” they say. “In the world of graphs, everything is seen as a walk — a traversal.”
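
To make the graph traversal pattern concrete, here is a toy Java sketch (not TinkerPop code) that walks an edge-labeled graph along an arbitrary path description, to arbitrary depth, with none of the join bookkeeping a relational query would need:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TraversalSketch {
    // adjacency structure: vertex -> (edge label -> neighbors)
    static Map<String, Map<String, List<String>>> graph = new HashMap<>();

    static void addEdge(String from, String label, String to) {
        graph.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(label, k -> new ArrayList<>())
             .add(to);
    }

    // follow a path description (a sequence of edge labels) from a start vertex
    static List<String> walk(String start, List<String> path) {
        List<String> frontier = List.of(start);
        for (String label : path) {
            List<String> next = new ArrayList<>();
            for (String v : frontier) {
                next.addAll(graph.getOrDefault(v, Map.of())
                                 .getOrDefault(label, List.of()));
            }
            frontier = next;
        }
        return frontier;
    }

    public static void main(String[] args) {
        addEdge("alice", "knows", "bob");
        addEdge("bob", "created", "tinkergraph");
        // "what did the people alice knows create?" -- a walk, not a join
        System.out.println(walk("alice", List.of("knows", "created")));
    }
}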

Building a Database-Backed Clojure Web Application

Filed under: Clojure,Cloud Computing — Patrick Durusau @ 2:15 pm

Building a Database-Backed Clojure Web Application

From the webpage:

This article will explore creating a database-backed Clojure web application and deploying it to the Heroku Cedar stack.

The app we’ll be building is called Shouter, a small Twitter clone that lets users enter in “shouts” which are stored in a PostgreSQL database and displayed on the front page of the app. You can see an example of the finished Shouter deployed to Heroku or view the finished source.

See Heroku to sign up for its cloud application platform.

I started to tease the DevCenter about the article Building a Facebook Application, since Google is attempting to do the same thing. 😉

Then I found that the article covers, however briefly, the Graph API and Open Graph Protocol, which makes it of more than passing interest for topic map applications.

The Neo4j Rest API. My Notebook

Filed under: Bioinformatics,Biomedical,Java,Neo4j — Patrick Durusau @ 2:14 pm

The Neo4j Rest API. My Notebook

From the post:

Neo4j is an open-source graph engine implemented in Java. This post is my notebook for the Neo4J-server, a server combining a REST API and a webadmin application into a single stand-alone server.

Nothing new in this Neo4j summary but Pierre Lindenbaum profiles himself: “PhD in Virology, bioinformatics, genetics, science, geek, java.”

Someone worth watching in the Neo4j/topic map universe.
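
As a taste of the API the notebook walks through, a minimal Java sketch that creates a node over REST; the /db/data/node URI follows the Neo4j server documentation of this era, and the host/port assume a default local install:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CreateNode {
    public static void main(String[] args) throws Exception {
        // default local install; node-creation URI per the server docs
        URL url = new URL("http://localhost:7474/db/data/node");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);
        try (OutputStream out = con.getOutputStream()) {
            out.write("{\"name\":\"my first node\"}".getBytes("UTF-8"));
        }
        // expect 201 Created; the Location header holds the new node's URI
        System.out.println(con.getResponseCode() + " "
                + con.getHeaderField("Location"));
    }
}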

SERIMI

Filed under: Ontology,RDF — Patrick Durusau @ 2:13 pm

SERIMI (version 0.9), a tool for automatic RDF data interlinking

From the announcement:

SERIMI matches instances between a source and a target dataset, without prior knowledge of the data, domain or schema of these datasets. Experiments conducted with benchmark collections demonstrate that our approach considerably outperforms published state-of-the-art automatic approaches for solving the interlinking problem in the Linked Data Cloud. An updated reference alignment between Dailymed[1] and TCM[2] that can be used as a golden set is also available for download.

[1] http://www4.wiwiss.fu-berlin.de/dailymed/
[2] http://code.google.com/p/junsbriefcase/wiki/TGDdataset

For the details, see: SERIMI-TECH-REPORT-v2.pdf.

Just skimmed the paper before posting. Deeply interesting work based on Tversky’s contrast model. “Tversky, A. (1977). Features of similarity. Psychological Review 84 (4), 327–352.” As of today, Tversky’s work has been cited 1598 times so it will take a while to look through the subsequent work.
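
For readers who can't get at the paper, the contrast model itself is easy to state: similarity is a weighted difference of common and distinctive features, S(A,B) = θ·f(A∩B) − α·f(A−B) − β·f(B−A). A minimal Java sketch, taking f to be set cardinality; the weights and feature sets below are invented for illustration:

import java.util.HashSet;
import java.util.Set;

public class TverskyContrast {
    // S(A,B) = theta*f(A intersect B) - alpha*f(A - B) - beta*f(B - A),
    // with f as simple set cardinality
    static double similarity(Set<String> a, Set<String> b,
                             double theta, double alpha, double beta) {
        Set<String> common = new HashSet<>(a); common.retainAll(b);
        Set<String> aOnly  = new HashSet<>(a); aOnly.removeAll(b);
        Set<String> bOnly  = new HashSet<>(b); bOnly.removeAll(a);
        return theta * common.size() - alpha * aOnly.size() - beta * bOnly.size();
    }

    public static void main(String[] args) {
        Set<String> lamp  = Set.of("has-bulb", "gives-light", "sits-on-desk");
        Set<String> torch = Set.of("has-bulb", "gives-light", "handheld");
        System.out.println(similarity(lamp, torch, 1.0, 0.5, 0.5)); // 1.0
    }
}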

STI Innsbruck

Filed under: OWL,RDF,RDFa,Semantic Web — Patrick Durusau @ 2:12 pm

STI Innsbruck

From the about page:

The Semantic Technology Institute (STI) Innsbruck, formerly known as DERI Innsbruck, was founded by Univ.-Prof. Dr. Dieter Fensel in 2002 and has developed into a challenging and dynamic research institute of approximately 40 people. STI Innsbruck collaborates with an international network of institutes in Asia, Europe and the USA, as well as with a number of global industrial partners.

STI Innsbruck is a founding member of STI International, a collaborative association of leading European and world wide initiatives, ensuring the success and sustainability of semantic technology development. STI Innsbruck utilizes this network, as well as contributing to it, in order to increase the impact of the research conducted within the institute. For more details on Semantics, check this interview with Frank Van Harmelen: “Search and you will find“.

I won’t try to summarize the wealth of resources you will find at STI Innsbruck. From the reading list for the curriculum to the listing of tools and publications, you will certainly find material of interest at this site.

For an optimistic view of Semantic Web activity see the interview with Frank Van Harmelen.

A Survey On Data Interlinking Methods

Filed under: Linked Data,LOD,RDF — Patrick Durusau @ 2:11 pm

A Survey On Data Interlinking Methods by Stephan Wölger, Katharina Siorpaes, Tobias Bürger, Elena Simperl, Stefan Thaler, and Christian Hofer.

From the introduction:

In 2007 the Linking Open Data (LOD) community project started an initiative which aims at increased use of Semantic Web applications. Such applications on the one hand provide new means to enrich a user’s web experience but on the other hand also require certain standards to be adhered to. Two important requirements when it comes to Semantic Web applications are the availability of RDF datasets on the web and having typed links between these datasets in order to be able to browse the data and to jump between them in various directions.

While there exist tools that create RDF output automatically from the application level and tools that create RDF from web sites, interlinking the resulting datasets is still a task that can be cumbersome for humans (either because there is a lack of incentives or due to the non-availability of user-friendly tools) or not doable for machines (due to the manifoldness of domains). Despite the fact that there are more and more interlinking tools available, those either can be applied only for certain domains of the real world (e.g. publications) or they can be used just for interlinking a specific type of data (e.g. multimedia data).

Another interesting survey article from the Semantic Technology Institute (STI) Innsbruck, University of Innsbruck.

I like the phrase “…manifoldness of domains.” RDF output is useful information about data. The problem I foresee is that the semantics it represents are local, hence the manifoldness of domains. Not always: some domains are so close as to be indistinguishable from one another, and there linking RDF will work quite well.

One imagines that RDF-based interlinking of OfficeDepot, Staples and OfficeMax should not be difficult. Tiresome, not terribly interesting, but not difficult. And that could prove to be useful for personal and corporate buyers seeking price breaks or competitors trying to decide on loss leaders. Not a lot of reasoning to be done, except by the buyers and sellers.

I am sure there would still be some domain differences between those vendors but having a common mapping from one vendor number to all three vendor numbers could prove to be very useful for customers and distributors alike.
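
A minimal sketch of what such a cross-vendor mapping could look like; the canonical product id and all three SKUs below are invented for illustration:

import java.util.Map;

public class VendorCrosswalk {
    // one canonical product id mapped to each vendor's own number;
    // the id and the SKUs are invented for illustration
    static final Map<String, Map<String, String>> CROSSWALK = Map.of(
        "copy-paper-8.5x11-case", Map.of(
            "OfficeDepot", "OD-348037",
            "Staples",     "SPL-135848",
            "OfficeMax",   "OM-10020"));

    public static void main(String[] args) {
        Map<String, String> skus = CROSSWALK.get("copy-paper-8.5x11-case");
        for (Map.Entry<String, String> e : skus.entrySet()) {
            System.out.println(e.getKey() + " sells it as " + e.getValue());
        }
    }
}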

For more complex/abstract domains, where the manifoldness of domains is an issue, you can use topic maps.

Joint International Semantic Technology Conference (JIST2011)

Filed under: Conferences,OWL,RDF,Semantic Web — Patrick Durusau @ 2:10 pm

Joint International Semantic Technology Conference (JIST2011) Dec. 4-7, 2011, Hangzhou, China

Important Dates:


– Submissions due: August 15, 2011, 23:59 (11:59pm) Hawaii time

– Notification: September 22, 2011, 23:59 (11:59pm) Hawaii time

– Camera ready: October 3, 2011, 23:59 (11:59pm) Hawaii time

– Conference dates: December 4-7, 2011

From the call:

The Joint International Semantic Technology Conference (JIST) is a regional federation of Semantic Web related conferences. The mission of JIST is to bring together researchers in disciplines related to the semantic technology from across the Asia-Pacific Region. JIST 2011 incorporates the Asian Semantic Web Conference 2011 (ASWC 2011) and Chinese Semantic Web Conference 2011 (CSWC 2011).

Prof. Ian Horrocks (Oxford University) is scheduled to present a keynote address.

July 5, 2011

A Survey On Games For Knowledge Acquisition

Filed under: Authoring Semantics,Games,Semantic Web,Semantics — Patrick Durusau @ 1:41 pm

A Survey On Games For Knowledge Acquisition by Stefan Thaler, Katharina Siorpaes, Elena Simperl, and Christian Hofer.

Abstract:

Many people dedicate their free time to playing games or following game-related activities. The Casual Games Market Report 2007[3] names games with more than 300 million downloads. Moreover, the Casual Games Association reports more than 200 million casual gamers worldwide [4]. People play them for various reasons, such as to relax, to be entertained, for the need of competition and to be thrilled[9]. Additionally they want to be challenged, mentally as well as skill-based. As mentioned earlier, there are tasks that are relatively easy for humans to complete but computationally rather infeasible to solve[27]. The idea of integrating such tasks as the goal of games has been conceived and realized in platforms such as OntoGame[21], GWAP[26] and others. Consequently, they have produced a win-win situation where people have fun playing games while actually doing something useful, namely producing output data which can be used to improve the experience when dealing with data. That is why in this paper we describe state-of-the-art games. Firstly, we briefly introduce games for knowledge acquisition. Then we outline various games for semantic content creation we found, grouped by the task they attempt to fulfill. We then provide an overview of these games based on various criteria in tabular form.

Interesting survey of the field that will hopefully be updated every year or even made into an online resource that can change as new games emerge.

Curious about two possibilities for semantic games:

1) Has anyone made a first-person shooter game based on recognition of facial images of politicians? Thinking that if you were given a set of “bad” guys to recognize for each level, you could shoot those plus the usual combatants. The images in the game would be drawn from news footage, etc. Thinking this might attract political devotees. I even have a good name for it: “Term Limits.”

2) On the theory that there is no one nosier than a neighbor, why not create an email tagging game where anonymous co-workers get to tag your email (both in and out)? That would be one way to add semantic value to corporate email and generate a lot of interest in doing so. Possible name: “Heard at Water Cooler.”

Additive Semantic Apps.

Filed under: Annotation,Authoring Semantics,Conferences,Semantics — Patrick Durusau @ 1:40 pm

10 Ways to make your Semantic App. addictive – Revisited

People after my own heart! Let’s drop all the pretense! We want people to use our apps to the exclusion of other apps. We want people to give up sleep to use our apps! We want people to call in sick, forget to eat, forget to put out the cat…, sorry, got carried away. 😉

Seriously, creating apps that people “buy into” is critical for the success of any app and no less so for semantic apps.

The less colorful summary of the workshop says:

In many application scenarios useful semantic content can hardly be created (fully) automatically, but motivating people to become an active part of this endeavor is still an art more than a science. In this tutorial we will look into fundamental design issues of semantic-content authoring technology – and of the applications deploying such technology – in order to find out which incentives speak to people to become engaged with the Semantic Web, and to determine the ways these incentives can be transferred into technology design. We will present how methods and techniques from areas as diverse as participation management, usability engineering, mechanism design, social computing, and game mechanics can be jointly applied to analyze semantically enabled applications, and subsequently design incentives-compatible variants thereof. The discussion will be framed by three case studies on the topics of enterprise knowledge management, media and entertainment, and IT ecosystems, in which combinations of these methods and techniques have led to increased user participation in creating useful semantic descriptions of various types of digital resources – text documents, images, videos and Web services and APIs. Furthermore, we will revisit the best practices and guidelines that have been at the core of an earlier version of this tutorial at the previous edition of the ISWC in 2010, following the empirical findings and insights gained during the operation of the three case studies just mentioned. These guidelines provide IT developers with a baseline to create technology and end-user applications that are not just functional, but facilitate and encourage user participation that supports the further development of the Semantic Web.

Well, they can say: “…facilitate and encourage user participation…” but I’m in favor of addition. 😉

BTW, notice the Revisited in the title?

You can see the slides from last year, 10 Ways to make your Semantic App. addictive, while you are waiting for this year’s workshop. (I am searching for videos but so far have come up empty. Maybe the organizers can film the presentations this year?)

Date: October 23 or 24, half day
Place: Bonn, Germany, Maritim Bonn

INSEMTIVES

Filed under: Annotation,Authoring Semantics,Games,Semantics — Patrick Durusau @ 1:40 pm

INSEMTIVES: Incentives for Semantics

From the about:

The objective of INSEMTIVES is to bridge the gap between human and computational intelligence in the current semantic content authoring R&D landscape. The project aims at producing methodologies, methods and tools that enable the massive creation and feasible management of semantic content in order to facilitate the world-wide uptake of semantic technologies.

You have to hunt for it (better navigation needed?) but there is a gaming kit for INSEMTIVES at SourceForge.

A mother lode of resources on methods for the creation of semantic content that aren’t boring. 😉

OntoGame

Filed under: Annotation,Games,Semantics — Patrick Durusau @ 1:39 pm

OntoGame: Games for Creation of Semantic Content

From the about page:

OntoGame’s goal is to build games that can be used to create semantic content, a process that often can not be solved automatically but requires the help of humans. Games are a good way to wrap up and hide this complex process of semantic content creation and can attract a great number of people.

If you are tired of over-engineered interfaces for semantic annotation of content, with menus, context-sensitive help that you can’t turn off, and dizzying choices at every step, you will find OntoGame a refreshing step in another direction.

Current games include:

  • Tubelink
  • Seafish
  • SpotTheLink
  • OntoPronto
  • OntoTube

Tubelink and Seafish are single player games so I tried both of those.

Tubelink is an interesting idea, a video plays and you select tags for items that appear in the video and place them in a crystal ball. Placing the tags in the crystal ball at the time the item appears in the video results in more points. Allegedly (I never got this far) if you put enough tags in the crystal ball it bursts and you go to the next level. All I ever got was a video of a car stereo with a very distracting sound track. A button to skip the current video would be a useful addition.

Seafish has floating images, some of which are similar to a photo you are shown on the lower right. You have to separate the ones that are similar from those that are not by “catching” them and placing them in separate baskets. The thumbnail images enlarge when you hover over them.

Neither of them is “Tetris” or “Grand Theft Auto IV“, but gaming for the sake of gaming has had decades to develop. Gaming for a useful purpose should be encouraged. It will catch up soon enough.

Meme Diffusion Through Mass Social Media

Filed under: Meme,Social Networks — Patrick Durusau @ 1:38 pm

Meme Diffusion Through Mass Social Media

Abstract:

The project is aimed at modeling the diffusion of information online and empirically discriminating among models of mechanisms driving the spread of memes. We explore why some ideas cause viral explosions while others are quickly forgotten. Our analysis goes beyond the traditional approach of applied epidemic diffusion processes and focuses on cascade size distributions and popularity time series in order to model the agents and processes driving the online diffusion of information, including: users and their topical interests, competition for user attention, and the chronological age of information. Completion of our project will result in a better understanding of information flow and could assist in elucidating the complex mechanisms that underlie a variety of human dynamics and organizations. The analysis will involve studying meme diffusion in large-scale social media by collecting and analyzing massive streams of public micro-blogging data.

The project stands to benefit both the research community and the public significantly. Our data will be made available via APIs and include information on meme propagation networks, statistical data, and relevant user and content features. The open-source platform we develop will be made publicly available and will be extensible to ever more research areas as a greater preponderance of human activities are replicated online. Additionally, we will create a web service open to the public for monitoring trends, bursts, and suspicious memes. This service could mitigate the diffusion of false and misleading ideas, detect hate speech and subversive propaganda, and assist in the preservation of open debate.

The NSF grant to date is a little over $900K.

I wonder about a web service to: “… mitigate the diffusion of false and misleading ideas, detect hate speech and subversive propaganda, and assist in the preservation of open debate.”

The definitions of “false and misleading ideas,” as well as “hate speech and subversive propaganda,” vary from community to community.

July 4, 2011

Digital Diplomatics 2011

Filed under: Conferences,Semantic Web,Semantics — Patrick Durusau @ 6:06 pm

Digital Diplomatics 2011: Tools for the Digital Diplomatist (program)

From the Call for Papers:

Scholars studying medieval documents never had any fundamental opposition to using modern technology to support their research. Nevertheless, no technology since the introduction of photography has had such an impact on the questions and methods of diplomatics as the computer: digital imaging gives us cheap reproductions at high quality, so nowadays large corpora of documents are to be found online. Digital imaging allows manipulations that make apparently invisible traces visible. Modern information technology gives us access to huge text corpora in which single words and phrases can be found, thus helping to indicate relationships, to retrieve parallel texts for comparison, or to plot geographical and temporal distributions.

The conference aims at presenting projects working to enlarge the digitised charter corpus on the one hand, and on the other will put a particular focus on research applying information technology to medieval and early modern charters, addressing purely diplomatic questions as well as historic or philological research.

The organizers of the conference therefore invite proposals dealing with questions like:

  • How can we improve the access to digital charter corpora?
  • How can the presentation of digital charter corpora help research with them?
  • Are there experiences in the application of complex information technologies (like named entity recognition, ontologies, data-mining, text-mining, automatic authorship identification, pattern analysis, optical character recognition, advanced statistics etc.) for diplomatic research?
  • Have digital charter corpora developed new research interests?
  • Are there old research questions to be tackled by the digital technologies and digital charter corpora?
  • Which well-established methods can’t be accelerated by digital technologies?
  • How far has the internet changed scholarly communication in diplomatics?
  • How do you shape digitization projects of charters to meet research needs?

The papers on this program address some of the areas that made me interested in topic maps.

Commercial semantic issues pale beside those of academic textual analysis and research.

Spring Data Graph with Neo4j Support

Filed under: Graphs,Java,Neo4j,Spring Data — Patrick Durusau @ 6:05 pm

Spring Data Graph with Neo4j Support

From the homepage:

Spring Data Graph enables POJO based development for Graph Databases like Neo4j. It extends annotated entity classes with transparent mapping functionality. A template programming model equivalent to well known Spring templates is also supported. Spring Data Graph is part of the bigger Spring Data project which aims to provide convenient support for NoSQL databases.


Here is an overview of Spring Data Graph features

  • Support for property graphs (nodes connected via relationships, each with arbitrary properties)
  • Transparent mapping of annotated POJO entities (via AspectJ)
  • Neo4jTemplate with convenient API, exception translation and optional transaction management
  • Different type representation strategies for keeping type information in the graph
  • Dynamic type projections (duck typing)
  • Spring Data Commons Repositories Support
  • Cross-store support for partial JPA – Graph Entities
  • Neo4j Traversal support on dynamic fields and via repository methods
  • Neo4j Indexing support (including full-text and numeric range queries)
  • Support for JSR-303 (Bean Validation)
  • Support for the Neo4j Server
  • Support for running as extensions in the Neo4j Server

If Neo4j or another NoSQL database is on your agenda, take a long look.
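
For a taste of the annotated-POJO style described above, a sketch of an entity class. The annotation names follow the Spring Data Graph documentation, but package locations shifted across early releases, so treat the imports as approximate:

// package locations varied across early releases; check these
// imports against your version of Spring Data Graph
import org.springframework.data.graph.annotation.GraphId;
import org.springframework.data.graph.annotation.NodeEntity;
import org.springframework.data.graph.annotation.RelatedTo;

import java.util.Set;

@NodeEntity
public class Person {
    @GraphId
    Long id;

    String name;

    // becomes FRIEND relationships between Person nodes in the graph
    @RelatedTo(type = "FRIEND")
    Set<Person> friends;
}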

Visualizing Mahout’s Output…

Filed under: Clojure,Mahout,Visualization — Patrick Durusau @ 6:05 pm

Visualizing Mahout’s output with Clojure and Incanter

From the post:

Some Clojure code to visualize clusters built using the Apache Mahout implementation of the K-Means clustering algorithm.

The code retrieves the output of the algorithm (clustered-points and centroids) from HDFS, builds a Clojure friendly representation of the output (a map and a couple of lazy-seqs) and finally uses Incanter’s wrapper around JFreeChart to visualize the results.

Another tool for data miners and visualizers.
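
If you want to try the final plotting step without Clojure or Incanter, here is a rough Java equivalent using JFreeChart directly (the same library Incanter wraps), with stand-in points in place of the HDFS output:

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.xy.XYSeries;
import org.jfree.data.xy.XYSeriesCollection;

public class ClusterPlot {
    public static void main(String[] args) throws Exception {
        // stand-in points; the post pulls the real clustered points
        // and centroids from HDFS
        XYSeries cluster0 = new XYSeries("cluster-0");
        cluster0.add(1.0, 1.1);
        cluster0.add(1.2, 0.9);
        XYSeries cluster1 = new XYSeries("cluster-1");
        cluster1.add(5.0, 5.2);
        cluster1.add(4.8, 5.1);

        XYSeriesCollection data = new XYSeriesCollection();
        data.addSeries(cluster0);
        data.addSeries(cluster1);

        JFreeChart chart = ChartFactory.createScatterPlot(
                "K-Means clusters", "x", "y", data,
                PlotOrientation.VERTICAL, true, false, false);
        ChartUtilities.saveChartAsPNG(new File("clusters.png"), chart, 640, 480);
    }
}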

Translating SPARQL queries into SQL using R2RML

Filed under: R2RML,SPARQL,SQL,TMQL — Patrick Durusau @ 6:04 pm

Translating SPARQL queries into SQL using R2RML

From the post:

The efficient translation of SPARQL into SQL is an active field of research in the academy and in the industry. In fact, a number of triple stores are built as a layer on top of a relational solution. Support for SPARQL in these RDF stores supposes the translation of the SPARQL query to a SQL query that can be executed in a certain relational schema.

Some foundational papers in the field include “A Relational Algebra for SPARQL” by Richard Cyganiak that translates the semantics of SPARQL as they were finally defined by the W3C to the Relational Algebra semantics or “Semantics preserving SPARQL-to-SQL translation” by Chebotko, Lu and Fotohui, that introduces an algorithm to translate SPARQL queries to SQL queries.

This latter paper is especially interesting because the translation mechanism is parametric on the underlying relational schema. This makes it possible to adapt their translation mechanism to any relational database using a couple of mapping functions, alpha and beta, that map a triple pattern of the SPARQL query to a table, and a triple pattern plus a position in the triple to a column in the database.

Provided that R2RML offers a generic mechanism for the description of relational databases, in order to support SPARQL queries over any R2RML-mapped RDF graph we just need to find an algorithm that receives the R2RML mapping as input and builds the mapping functions required by the algorithm of Chebotko et al.

The most straightforward way to accomplish that is to use the R2RML mapping to generate a virtual table with a single relation holding only subject, predicate and object. The mapping for this table is trivial. A possible implementation of this algorithm can be found in the following Clojure code. (I added links to the Cyganiak and Chebotko papers.)

I recommend this post, as well as the Cyganiak and Chebotko papers, to anyone interested in TMQL as background reading. Other suggestions?
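
To see why the single-relation virtual table makes the base case trivial, consider a toy Java sketch of translating one triple pattern; the real algorithms (Chebotko et al.) also handle joins, optionals and filters:

public class TriplePatternToSql {
    // over a virtual relation triples(subject, predicate, object), a single
    // triple pattern pins its constants in WHERE and projects the rest;
    // names starting with "?" are treated as variables
    static String translate(String s, String p, String o) {
        StringBuilder where = new StringBuilder("1=1");
        if (!s.startsWith("?")) where.append(" AND subject = '").append(s).append("'");
        if (!p.startsWith("?")) where.append(" AND predicate = '").append(p).append("'");
        if (!o.startsWith("?")) where.append(" AND object = '").append(o).append("'");
        return "SELECT subject, predicate, object FROM triples WHERE " + where;
    }

    public static void main(String[] args) {
        // ?drug <http://example.org/name> ?n
        System.out.println(translate("?drug", "http://example.org/name", "?n"));
        // multiple patterns become self-joins of triples on shared variables
    }
}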

OrganiK Knowledge Management System

Filed under: Filters,Indexing,Knowledge Management,Recommendation,Text Analytics — Patrick Durusau @ 6:03 pm

OrganiK Knowledge Management System (wiki)

OrganiK Knowledge Management System (homepage)

I encountered the OrganiK project while searching for something else (naturally). 😉

From the homepage:

Objectives of the Project

The aim of the OrganiK project is to research and develop an innovative knowledge management system that enables the semantic fusion of enterprise social software applications. The system accumulates information that can be exchanged among one or several collaborating companies. This enables an effective management of organisational knowledge and can be adapted to functional requirements of smaller and knowledge-intensive companies.

More info..

Main distinguishing features

The set of OrganiK KM Client Interfaces comprises a Wiki, a Blog, a Social Bookmarking and a Search Component that together constitute a Collaborative Workspace for SME knowledge workers. Each of the components consists of a Web-based client interface and a corresponding server engine.

The components that comprise the Business Logic Layer of the OrganiK KM Server are:

  • the Recommender System,
  • the Semantic Text Analyser,
  • the Collaborative Filtering Engine
  • the Full-text Indexer

More info…

Interesting project but the latest news item dates from 2008. Not encouraging.

I checked the source code and the most recent update was August, 2010. Much more encouraging.

I have written for more recent news.

RavenDB

Filed under: Database,NoSQL,RavenDB — Patrick Durusau @ 6:03 pm

RavenDB

Raven is an Open Source (with a commercial option) document database for the .NET/Windows platform. Raven offers a flexible data model designed to fit the needs of real-world systems. Raven stores schema-less JSON documents, allows you to define indexes using Linq queries, and focuses on low latency and high performance.

  • Scalable infrastructure: Raven builds on top of existing, proven and scalable infrastructure
  • Simple Windows configuration: Raven is simple to setup and run on windows as either a service or IIS7 website
  • Transactional: Raven supports System.Transactions with ACID transactions. If you put data in it, that data is going to stay there
  • Map/Reduce: Easily define map/reduce indexes with Linq queries
  • .NET Client API: Raven comes with a fully functional .NET client API which implements Unit of Work and much more
  • RESTful: Raven is built around a RESTful API

Haven’t meant to neglect the .Net world, just don’t visit there very often. 😉 Will try to do better in the future.
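
In the meantime, since Raven is built around a RESTful API, you can poke at it without .NET at all. A sketch from the Java side; the /docs/{key} endpoint shape and default port are assumptions drawn from Raven's documentation of the time, so verify against your build:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RavenPut {
    public static void main(String[] args) throws Exception {
        // the /docs/{key} shape and port 8080 are assumptions; check
        // them against your RavenDB build before relying on this
        URL url = new URL("http://localhost:8080/docs/users/1");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("PUT");
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);
        try (OutputStream out = con.getOutputStream()) {
            out.write("{\"Name\":\"example user\"}".getBytes("UTF-8"));
        }
        System.out.println(con.getResponseCode());
    }
}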

July 3, 2011

SwiftRiver/Ushahidi

Filed under: Filters,Linguistics,Natural Language Processing,NoSQL,Python — Patrick Durusau @ 7:34 pm

SwiftRiver

From the Get Started page:

The mission of the SwiftRiver initiative is to democratize access to the tools used to make sense of data.

To achieve this goal we’ve taken two approaches, apps and APIs. Apps are user facing and should be tools that are easy to understand, deploy and use. APIs are machine facing and extract meta-context that other machines (apps) use to convey information to the end user.

SwiftRiver is an open-source platform that aims to allow users to do three things well: 1) structure unstructured data feeds, 2) filter and prioritize information conditionally and 3) add context to content. Doing these things well allows users to pull in real-time content from Twitter, SMS, Email or the Web and to make sense of data on the fly.

The Ushahidi logo at the top will take you to a common wiki for Ushahidi and SwiftRiver.

And the Ushahidi link in the text takes you to Ushahidi:

We are a non-profit tech company that develops free and open source software for information collection, visualization and interactive mapping.

Home of:

  • Ushahidi Platform: We built the Ushahidi platform as a tool to easily crowdsource information using multiple channels, including SMS, email, Twitter and the web.
  • SwiftRiver: SwiftRiver is an open source platform that aims to democratize access to tools for filtering & making sense of real-time information.
  • Crowdmap: When you need to get the Ushahidi platform up in 2 minutes to crowdsource information, Crowdmap will do it for you. It’s our hosted version of the Ushahidi platform.
It occurs to me that mapping email feeds would fit right into my example in Marketing What Users Want…And An Example.

The state of semantic technology today —

Filed under: Semantics — Patrick Durusau @ 7:33 pm

The state of semantic technology today — Overview of the First Seals Evaluations Campaign

When I saw the white paper title, I thought: that's a lot of work, and it would be incredibly useful even if it wasn't truly complete. Technology changes rapidly enough in this area that no fixed report can claim completeness past its actual publication date. Maybe not even then.

But you can imagine my disappointment when I read on page 9 that Data & Metadata Management, Ontology Customization, Ontology Evolution and Ontology Instance Generation were not covered at all, and that only one part (of eight) was covered for Semantic Web Services, two parts of five for Ontology Engineering, and three parts of three for Querying and Reasoning.

Even more so when I read that only three reasoners were evaluated. (You can look at semantic reasoner at Wikipedia to see a list of eighteen semantic reasoners.)

Somehow I expected more from a state of semantic technology today white paper. You?

SEALS – Semantic Evaluation At Large Scale

Filed under: Semantics — Patrick Durusau @ 7:32 pm

SEALS – Semantic Evaluation At Large Scale

From the homepage:

The SEALS Project is developing a reference infrastructure known as the SEALS Platform to facilitate the formal evaluation of semantic technologies. This allows both large-scale evaluation campaigns to be run (such as the International Evaluation Campaigns for Semantic Technologies) as well as ad-hoc evaluations by individuals or organizations.

The SEALS Platform page reports a bit more detail:

At the core of the SEALS project is the development and maintenance of the SEALS Platform. The SEALS Platform will be an independent, open, scalable, extensible and sustainable infrastructure that will allow the remote evaluation of semantic technologies by providing an integrated set of evaluation services and test suites.

Furthermore, the platform will provide easy and free access to evaluation services and to the results of the evaluations performed, allowing researchers and users to effectively compare available technologies, helping them to select appropriate technologies and advancing the state of the art through continuous evaluation. Our long-term goal is that the SEALS Platform be actively used and managed by the semantic community well beyond the lifetime of the SEALS project.

The SEALS consortium is in the final stages of implementing a pre-production version of the SEALS Platform to be ready for August 2011. The pre-production version of the platform will subsequently be improved and extended towards a production environment to be released by the end of 2011 for the execution of the 2nd Evaluation Campaign in winter 2011. Feedback from the first and second evaluation campaigns will be incorporated into the specification and design of the final version of the platform to be released in July 2012.

Not much to say before the software release in August, 2011.

Clojure – Functional Programming for the JVM

Filed under: Clojure,Functional Programming — Patrick Durusau @ 7:31 pm

Clojure – Functional Programming for the JVM by R. Mark Volkmann.

From the introduction:

The goal of this article is to provide a fairly comprehensive introduction to the Clojure programming language. A large number of features are covered, each in a fairly brief manner. Feel free to skip around to the sections of most interest. The section names in the table of contents are hyperlinks to make this easier when reading on-line.

Who’s Your Daddy?

Filed under: Data Source,Dataset,Marketing,Mashups,Social Graphs,Social Networks — Patrick Durusau @ 7:30 pm

Who’s Your Daddy? (Genealogy and Corruption, American Style)

NPR (National Public Radio) News broadcast the opinion this morning that Brits are marginally less corrupt than Americans. Interesting question. Was Bonnie less corrupt than Clyde? Debate at your leisure, but the story did prompt me to think of an excellent resource for tracking both U.S.- and British-style corruption.

It was probably all the talk of lineage in the news lately, but why not use the genealogy records that are gathered so obsessively to track the soft corruption of influence?

Just another data set to overlay on elected, appointed, and hired positions, lobbyists, disclosure statements, contributions, known sightings, congressional legislation and administrative regulations, etc. It could lead to a “Who’s Your Daddy?” segment on NPR where employment or contracts are questioned, naming names. That would be interesting.

It also seems more likely to be effective than the “disclose your corruption” sunlight approach. Corruption is never confessed; it has to be rooted out.

July 2, 2011

NoSQL and the Windows Azure Platform

Filed under: NoSQL — Patrick Durusau @ 3:18 pm

The Windows Club reports on a new MS whitepaper: NoSQL and the Windows Azure Platform.

To give you an idea of the “flavor” of the whitepaper, consider the following paragraph:

But tooling has its value, and that value tends to increase over time, when the imperative of raw implementation has passed and need for smooth maintenance and troubleshooting becomes more pronounced (and economically impactful). The design, diagnostic and operational monitoring capabilities of SQL Server’s tools are significant, and have evolved over the roughly 20-year existence of the product. These tools, including SQL Server Management Studio and its execution plan window, aid greatly in preventing problems, and in solving them quickly when they do arise. NoSQL databases’ more minimalist tooling approach leads to more manual and time-consuming management and troubleshooting than is the case with SQL Azure (which is compatible with SQL Server’s tools), and may also make the process more error prone. The cost impact of this can be significant.

MS should empower customers to choose between NoSQL and MS SQL Server solutions when using Windows Azure. The SQL Server group will continue to flog its products, but it isn’t (or shouldn’t be) seen as synonymous with MS.

Being the road is a much stronger position than being a building alongside the road. Roads get repaired, repaved, widened, extended, while buildings alongside the road…, well, you know that part.

Semantic Oil-Spots

Filed under: Marketing — Patrick Durusau @ 3:16 pm

While reading one of the surveys on Big Data it occurred to me that the W3C was correct about one thing. Data without semantics isn’t going to be very useful.

Attempting to impose semantics world-wide reminds me of an article by Mehar Omar Khan, Don’t Try to Arrest the Sea: An Alternative Approach for Afghanistan. I commend it to you for reading, but in summary it advocates what is known in some circles as the oil-spot strategy.

That is, to create safe havens that offer benefits to the local populace and use those to attract others to the same benefits.

Topic maps, unlike some semantic strategies, have the potential to be semantic oil-spots. Their semantics are driven by the group or department where they are deployed and do not require consent or agreement beyond that range.

Which means that the group or department can begin to derive benefits from their use of topic maps, benefits that are not accruing to others. This allows topic maps and their use to sell themselves, rather than being imposed from the top down. (The FBI Virtual Case File project being a well-known example of top-down IT planning.)

Mehar Omar Khan summarizes his strategy as:

Don’t try to arrest the sea. Create islands. Having gone well past the phase of breaking the back of Al-Qaeda and dispersing the Taliban, concentrate on ‘creating and building’ examples. Set the beacon and you’ll see that all the lost ships and boats will come ashore.

Where are you setting your next topic map beacon?

Big Data: The next frontier…

Filed under: BigData,Marketing — Patrick Durusau @ 3:16 pm

Big Data: The next frontier for innovation, competition, and productivity

A McKinsey Global Institute (MGI) study, which MGI briefly summarizes as:

MGI studied big data in five domains—health care in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal location data globally. Big data can generate value in each. For example, a retailer using big data to the full could increase its operating margin by more than 60 percent. Harnessing big data in the public sector has enormous potential, too. If US health care were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing US health care expenditure by about 8 percent. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. And users of services enabled by personal location data could capture $600 billion in consumer surplus. The research offers seven key insights.

The opportunity to increase an operating margin by 60 percent is likely to get any CEO’s attention.

However, I would advise that you read the full report and pay close attention to the seventh insight, which concludes this summary and the report:

Several issues will have to be addressed to capture the full potential of big data. Policies related to privacy, security, intellectual property, and even liability will need to be addressed in a big data world. Organizations need not only to put the right talent and technology in place but also structure workflows and incentives to optimize the use of big data. Access to data is critical—companies will increasingly need to integrate information from multiple data sources, often from third parties, and the incentives have to be in place to enable this.

Guess what one word is never used in the full report (156 pages)? Starts with an “s.”

Give up? Semantics.

Privacy, IP, security, etc., are more popular topics, but if you were to open up to public access all 6,000-plus HR systems at the Pentagon, evil-doers would have as much trouble as the GAO in auditing it. Why? A lack of documented semantics. Eventually they too would throw up their hands and move on to more useful (from their perspective) activities.

The potential for value and all the popular problems are present in Big Data, but semantics come first. Otherwise it’s just a Big Mess.

uClassify

Filed under: Classifier,Tagging — Patrick Durusau @ 3:15 pm

uClassify

From the webpage:

uClassify is a free web service where you can easily create your own text classifiers. You can also directly use classifiers that have already been shared by the community.

Examples:

  • Language detection
  • Web page categorization
  • Written text gender and age recognition
  • Mood
  • Spam filter
  • Sentiment
  • Automatic e-mail support
  • See below for some examples

So what do you want to classify on? Only your imagination is the limit!

As of 1 July 2011, thirty-seven public classifiers are waiting on you and your imagination.

The emphasis is on tagging documents.

How useful is tagging documents when a search returns > 100 documents? Would your answer be the same or different if the search returned < 20 documents? What if the search returned > 500 documents?

I first saw this at the textifter blog in the post A Classifier for the Masses.
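
A sketch of what calling such a classifier could look like from Java; the URL format, classifier name and read key below are hypothetical stand-ins, so consult the uClassify API documentation for the real request shape:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class ClassifyCall {
    public static void main(String[] args) throws Exception {
        // the URL format, classifier name and read key are hypothetical
        // stand-ins; check the uClassify API docs for the real shape
        String text = URLEncoder.encode("I love this product!", "UTF-8");
        URL url = new URL("http://uclassify.com/browse/uClassify/Sentiment/"
                + "ClassifyText?readkey=YOUR_READ_KEY&text=" + text
                + "&output=json");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}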

10+ Free Resources for Learning Clojure

Filed under: Clojure — Patrick Durusau @ 3:10 pm

10+ Free Resources for Learning Clojure

From the webpage:

Clojure is a dialect of the LISP programming language that runs on the Java Virtual Machine. It’s becoming increasingly popular as a modern functional programming language. This week O’Reilly Radar blogger Stuart Sierra called it “the hot new language of the moment.” He describes Clojure as “Lisp meets Java with a side of Erlang.” Interested? Here are a few free resources to get you started.

Let me add one to that list:

Clojure Libraries: Not a learning resource per se but one that you will find useful soon enough.

