Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

January 31, 2011

Introduction to Riak Video with Rusty Klophaus – Post

Filed under: NoSQL,Riak — Patrick Durusau @ 1:58 pm

Introduction to Riak Video with Rusty Klophaus from MyNoSQL by Alex Popescu. Viewable online or downloadable in a couple of formats.

Starts with the observation that there are 47 different NoSQL projects. Doesn’t list them.

I would watch this at the PivotLabs link because of the related talks.

Oh, and the Riak homepage.

While I like the video, it is also an example of how you don’t need high-end video production or editing to produce useful video of presentations.

I mention this as an answer to conferences that protest they need expensive equipment to record presentations.

That is simply not the case, and anyone who says otherwise is, to be generous, misinformed.

MongoVUE

Filed under: MongoDB,NoSQL — Patrick Durusau @ 1:49 pm

MongoVUE

From the website:

What Is MongoVUE?

MongoVUE is a GUI (graphical user interface) application that helps you administer, develop and learn MongoDB.

MongoVUE is FREE to use.

To run properly, MongoVUE requires Microsoft .NET Framework 2.0 SP1 installed on your computer.

Tools for working with NoSQL databases are starting to appear.

Any thoughts on this one that you would like to share?

Applicatives are generalized functors

Filed under: Category Theory,Scala — Patrick Durusau @ 9:50 am

Applicatives are generalized functors

A continuation of Heiko Seeberger’s coverage of Scala and category theory.

Highly recommended.

Redis Tutorials

Filed under: Redis — Patrick Durusau @ 7:35 am

Redis Tutorials

As of 01-31-2011, you will find the following at DeGizmo:

Getting Started: Redis and Python
Redis: Relations in a NoSQL world
Redis: Relations in a NoSQL world: Using Hashes
(To-Do) – Redis on the web: Redis and Django
(To-Do) – Redis and Express: Ultra-fast REST API

For topic mappers who are interested in Python and Redis, not a bad place to start.

Update: 23 November 2011.

As of today, parts 4 and 5 remain to-do items.

Who Identified Roger Magoulas?

Filed under: Examples,Sets,Topic Maps — Patrick Durusau @ 7:26 am

Did you know that, as of 01-29-2011, Roger Magoulas appears 28 times on the O’Reilly website?

Using the following 5 different hyperlinks:

http://people.oreilly.com/roger for http://answers.oreilly.com/topic/2427-how-to-speed-up-machine-learning-using-a-set-oriented-approach/, one article on set-oriented processing
http://radar.oreilly.com/rogerm/index.html for http://radar.oreilly.com/2011/01/faster-machine-learning.html, the other article on set-oriented processing
http://www.oreillynet.com/pub/au/2717 another entry at O’Reilly
http://www.youtube.com/results?search_query=roger%20magoulas&search=tag, or at least his name was so tagged in http://radar.oreilly.com/2010/01/collecting-aggregating-analyzing-data-exhaust.html
http://en.oreilly.com/gspeast2008/public/schedule/speaker/5107

Can you name the year that Tim O’Reilly used a hyperlink for Roger Magoulas three times but hasn’t since then?

One consistent resolution for Roger Magoulas, reflecting updates and presented without hand-authoring HTML, would be nice.

But, that’s just me.

What do you think?

Pseudo-Code: A New Definition

Filed under: Machine Learning,Sets,Subject Identity,Topic Maps — Patrick Durusau @ 7:24 am

How to Speed up Machine Learning using a Set-Oriented Approach

The detailed article behind Need faster machine learning? Take a set-oriented approach, which I mentioned in a separate post.

Well, somewhat more detail.

Gives new meaning to pseudo-code:

The application side becomes:

Computing the model:

Fetch “compute-model over data items”

Classifying new items:

Fetch “classify over data items”

I am reminded of the cartoon with two people at a blackboard, where step two reads “Then a miracle occurs,” and one of them says: “I think you should be more explicit in step two.”

How about you?

Hurl is now open! – Post

Filed under: Topic Map Software — Patrick Durusau @ 7:19 am

Hurl is now open!

From the website:

Chris Wanstrath and I originally developed Hurl for the 2009 Rails Rumble where it won Most Complete. The idea was to create a simple web version of cURL, a command-line tool often used to test web APIs.

Hurl is super easy – just enter a URL and any extra parameters such as HTTP headers, body parameters, and authentication and then click “Send.” You’ll get the response and can save and share it.

By open sourcing the code behind Hurl we hope that other developers will be able to build on the concept. One very requested feature is to create an embeddable version of Hurl that can be used in developer documentation for easy try-it functionality.

Not topic map specific, but it will certainly be useful for developers working on web APIs for topic map software.

OpenData + R + Google = Easy Maps (& Lessons for Topic Maps)

Filed under: Examples,Mapping,R — Patrick Durusau @ 7:15 am

OpenData + R + Google = Easy Maps, from James Cheshire (via R-Bloggers), is a compelling illustration of the use of R for mapping.

It also illustrates a couple of principles that are important for topic map authors to keep in mind:

1) An incomplete [topic] map is better than no [topic] map at all.

Cheshire could have waited until he had all the data from every agency studying the issue of child labor and had reconciled that data with field surveys, published reports from news organizations, etc., but then we would not have this article, would we?

We also would not have a useful mapping of the data we have on hand.

I mention this one first because it is the one that afflicts me most.

I work on example topic maps, but because they aren’t complete, I am reluctant to consider them publishable.

The principle from software coding, release early and often, should be the operative principle for topic map authoring.

2) There is no true view of the data that should be honored.

Many governments of countries on this map would dispute the accuracy of the data. And your point would be?

Every map tells a story from a point of view.

There isn’t any reason for your topic map to await approval of any particular group or organization included in it.

A world of data awaits us as topic mappers.

The only question is whether we are going to step up and take advantage of it.

*****
PS: My position on incomplete topic maps is not inconsistent with my view of PR-driven SQL data dumps that are topic maps in name only. As they say, you can put lipstick on a pig, ….

Tutorial: Developing in Erlang with Webmachine, ErlyDTL, and Riak

Filed under: Erlang,NoSQL,Riak — Patrick Durusau @ 7:08 am

Tutorial: Developing in Erlang with Webmachine, ErlyDTL, and Riak

From Alex Popescu’s MyNoSQL blog:

Part 1

In Part 1 of the series we covered the basics of getting the development environment up and running. We also looked at how to get a really simple ErlyDTL template rendering

Part 2

There are a few reasons this series is targeting this technology stack. One of them is uptime. We’re aiming to build a site that stays up as much as possible. Given that, one of the things that I missed in the previous post was setting up a load balancer. Hence this post will attempt to fill that gap.

Part 3

In this post we’re going to cover:

A slight refactor of code structure to support the “standard” approach to building applications in Erlang using OTP.

Building a small set of modules to talk to Riak.

Creation of some JSON helper functions for reading and writing data.

Calling all the way from the Webmachine front-end to Riak to extract data and display it in a browser using ErlyDTL templates.

Erlang is important for anyone building high availability (think telecommunications) systems that can be dynamically reconfigured without taking the systems offline.

January 30, 2011

Hubs and Connectors: Understanding Networks Through Data Visualization – Post

Filed under: Interface Research/Design,Topic Map Software — Patrick Durusau @ 8:44 pm

Hubs and Connectors: Understanding Networks Through Data Visualization

I have been shying away from the rash of LinkedIn graph visualizations, but then I ran across this one by Whitney Hess at her Pleasure + Pain: Improving the human experience one day at a time blog.

The title alone made me do a double take.

This post merits reading as an introduction to network analysis, presented in an easy-to-understand way with eye candy along the way.

While you are there, check out her archives and other posts as well.

Such as: 10 Most Common Misconceptions About User Experience Design

If I could get topic map project managers to read one article, it would be that one.

January 29, 2011

Need faster machine learning? Take a set-oriented approach

Filed under: Machine Learning,Sets,Subject Identity — Patrick Durusau @ 5:00 pm

Need faster machine learning? Take a set-oriented approach.

Roger Magoulas, working on decidedly not-small iron, reports:

The result: The training set was processed and the sample data set classified in six seconds. We were able to classify the entire 400,000-record data set in under six minutes — more than a four-orders-of-magnitude records processed per minute (26,000-fold) improvement. A process that would have run for days, in its initial implementation, now ran in minutes! The performance boost let us try out different feature options and thresholds to optimize the classifier. On the latest run, a random sample showed the classifier working with 92% accuracy.

Set-oriented machine learning makes for:

Handling larger and more diverse data sets

Applying machine learning to a larger set of problems

Faster turnarounds

Less risk

Better focus on a problem

Improved accuracy, greater understanding and more usable results
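The quoted results don’t show the classifier itself, so here is a minimal Python sketch of what “set-oriented” can mean, under my own assumptions: a simple feature-vote classifier (not necessarily what Magoulas used) run in an in-memory SQLite database. The model is built with one GROUP BY over the whole training set, and every new item is scored with one join plus aggregate, rather than looping over records in application code. The function name, table names and data layout are hypothetical.

```python
import sqlite3


def classify_set_oriented(training_rows, item_rows):
    """Train and classify with two set-oriented SQL statements.

    training_rows: (doc_id, label, feature) tuples
    item_rows:     (item_id, feature) tuples
    Returns {item_id: best_label}; items with no known feature are omitted.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE train (doc INTEGER, label TEXT, feature TEXT);
        CREATE TABLE items (item INTEGER, feature TEXT);
    """)
    conn.executemany("INSERT INTO train VALUES (?, ?, ?)", training_rows)
    conn.executemany("INSERT INTO items VALUES (?, ?)", item_rows)

    # Training: one GROUP BY counts every (label, feature) pair at once.
    conn.execute("""
        CREATE TABLE model AS
        SELECT label, feature, COUNT(*) AS n
        FROM train
        GROUP BY label, feature
    """)

    # Classification: one join + aggregate scores every item against every
    # label; a label's score is the summed training counts of the item's features.
    rows = conn.execute("""
        SELECT i.item, m.label, SUM(m.n) AS score
        FROM items AS i
        JOIN model AS m ON m.feature = i.feature
        GROUP BY i.item, m.label
        ORDER BY i.item, score DESC, m.label
    """).fetchall()
    conn.close()

    # Rows are sorted best-first within each item, so keep the first label seen.
    best = {}
    for item, label, _score in rows:
        best.setdefault(item, label)
    return best
```

Items whose features never appear in the training data get no label at all; a real classifier would also want smoothing and normalization, which this sketch leaves out to keep the set-oriented shape visible.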

Seems to me sameness of subject representation is a classification task. Yes?

Going from days to minutes sounds attractive to me.

How about you?

R & Subject Identity/Identification

Filed under: R,Subject Identity — Patrick Durusau @ 4:13 pm

While posting R Books for Undergraduates, it occurred to me that having examples of using R for subject identity/identification would be helpful.

I could create examples in the first instance, but that would be a lot of work.

Not to mention that it would limit me to domains in which I have some interest and expertise.

What if I were to re-cast existing R examples as subject identity/identification issues?

That saves me the time of creating new examples.

More importantly, it gives me a ready-made audience to chime in on how I did with subject identity:

correct
close but incorrect
incorrect
incorrect and far away
incoherent
what subject did I think I was talking about?
etc.

More than one answer is possible for any one example.

R Books for Undergraduate Students

Filed under: R — Patrick Durusau @ 3:53 pm

R Books for Undergraduate Students.

Recommended R titles from Colin Gillespie, Statistics Lecturer, Newcastle University, Newcastle upon Tyne, United Kingdom.

R is useful for exploring and analyzing data sets in order to discover, confirm or investigate subjects and their identification.

January 28, 2011

Unified Intelligence: Completing the Mosaic of Analytics

Filed under: Analytics,Data Analysis — Patrick Durusau @ 10:15 am

Unified Intelligence: Completing the Mosaic of Analytics

Tuesday, Feb. 15 @ 4 ET

From the announcement:

Seeing the big picture requires a convergence of both structured and unstructured data. While each side of that puzzle presents challenges, the unstructured world poses a wider range of issues that must be resolved before meaningful analysis can be done. However, many organizations are discovering that new technologies can be employed to process and transform this unwieldy data, such that it can be united with the traditional realm of business intelligence to bring new meaning and context to analytics.

Register for this episode of The Briefing Room to learn from veteran Analyst James Taylor about how companies can incorporate unstructured data into their decision systems and processes. Taylor will be briefed by Sid Probstein of Attivio, who will tout his company’s patented technology, the Active Intelligence Engine, which uses inverted indexing and a mathematical graph engine to extract, process and align unstructured data. A host of Attivio connectors allow integration with most analytical and many operational systems, including the capability for hierarchical XML data.

I am not really sure what a non-mathematical graph engine would look like, but this could be fun.

It is also an opportunity to learn something about how others view the world.

CouchDB 1.0.2: 3rd is Lucky – Post

Filed under: CouchDB,NoSQL — Patrick Durusau @ 9:45 am

CouchDB 1.0.2: 3rd is Lucky

Alex Popescu covers the release of CouchDB 1.0.2.

A point release with new features.

Next Generation Data Integration – Webinar

Filed under: Data Integration,Marketing — Patrick Durusau @ 9:41 am

Next Generation Data Integration

Date: April 12, 2011, Time: 9:00 AM PT

Speaker: Philip Russom

From the website:

Data integration (DI) has undergone an impressive evolution in recent years. Today, DI is a rich set of powerful techniques, including ETL (extract, transform, and load), data federation, replication, synchronization, change data capture, natural language processing, business-to-business data exchange, and more. Furthermore, vendor products for DI have achieved maturity, users have grown their DI teams to epic proportions, competency centers regularly staff DI work, new best practices continue to arise (like collaborative DI and agile DI), and DI as a discipline has earned its autonomy from related practices like data warehousing and database administration.

Given these and the many other generational changes data integration has gone through recently, it’s natural that many people aren’t quite up-to-date with the full potential of modern data integration. Based on a recent TDWI Best Practices report this webinar seeks to cure that malady by redefining data integration in modern terms, plus showing where it’s going with its next generation. This information will help user organizations make more enlightened decisions, as they upgrade, modernize, and expand existing data integration solutions, plus plan infrastructure for next generation data integration.

Every group (tribe, as Jack Park would call it) has its own terminology when it comes to data and managing data.

As you can tell from the description of the webinar, data integration is concerned with many of the same issues as topic maps, albeit under different names.

Regard this as an opportunity to visit another tribe and learn some new terminology.

And some new ideas you can use with topic maps.

Alchemy Database: A Hybrid Relational-Database/NOSQL-Datastore

Filed under: Alchemy Database,NoSQL — Patrick Durusau @ 7:59 am

Alchemy Database: A Hybrid Relational-Database/NOSQL-Datastore

From the website:

Alchemy Database is a lightweight SQL server that is built on top of the NOSQL datastore redis. It supports redis data-structures and redis commands and supports (de)normalisation of these data structures (lists,sets,hash-tables) to/from SQL tables. Lua is deeply embedded and lua scripts can be run internally on Alchemy’s data objects. Alchemy Database is not only a data storage Swiss Army Knife, it is also blazingly fast and extremely memory efficient.

Speed is achieved by being an event driven network server that stores ALL data in RAM and achieves disk persistence by using a spare cpu-core to periodically log data changes (i.e. no threads, no locks, no undo-logs, no disk-seeks, serving data over a network at RAM speed)

Storage data structures w/ very low memory overhead and data compression, via algorithms w/ insignificant performance hits, greatly increase the amount of data you can fit in RAM

Optimising to the SQL statements most commonly used in OLTP workloads yields a lightweight SQL server designed for low latency at high concurrency (i.e. mindblowing speed).

The Philosophy of Alchemy Database is that RAM is now affordable enough to be able to store ENTIRE OLTP Databases in a single machine’s RAM (e.g. Wikipedia’s DB was 50GB in 2009 and a Dell PowerEdge R415 w/ 64GB RAM costs $4000), as long as the data is made persistent to disk. So Alchemy Database provides a non-blocking event-driven network-I/O-based relational-database, with very little memory overhead, that does the most common OLTP SQL statements amazingly fast and then throws in the NOSQL Data-store redis to create fantastic optimisation possibilities.
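To make “(de)normalisation to/from SQL tables” concrete, here is a minimal Python sketch of the idea only, not Alchemy’s own commands: plain dicts stand in for redis hashes stored under keys like user:1, and the standard library’s sqlite3 stands in for the SQL side. The function names, the prefix:id key convention and the sample layout are my assumptions.

```python
import sqlite3


def hashes_to_table(conn, table, hashes, columns):
    """Normalise prefix:id -> hash pairs (e.g. 'user:1' -> {...}) into rows
    of an SQL table whose integer primary key is the id after the colon."""
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, {', '.join(columns)})")
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    for key, fields in hashes.items():
        row_id = int(key.split(":", 1)[1])
        # Missing hash fields become NULLs in the table.
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})",
                     [row_id] + [fields.get(c) for c in columns])


def table_to_hashes(conn, table, prefix):
    """Denormalise the rows back into prefix:id -> hash pairs."""
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    return {f"{prefix}:{row[0]}": dict(zip(names[1:], row[1:])) for row in cur}
```

Table and column names are interpolated directly into the SQL, which is acceptable here only because they come from the program, not from users. Once the hashes are rows, ordinary SQL (WHERE, JOIN, GROUP BY) works on data that was stored as key-value pairs.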

Leaving words and phrases like “blazingly fast,” “amazingly fast,” “fantastic optimisation,” and “mindblowing speed” to one side, one does wonder how it would perform for a topic map.

Reports welcome!

NLP (Natural Language Processing) tools

Filed under: Authoring Topic Maps,Natural Language Processing,Topic Models — Patrick Durusau @ 7:50 am

Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources

From Stanford University.

It may not list every NLP resource, but it is the place to start if you are looking for a new tool.

This should give you an idea of the range of tools that could be applied to the Afghan War Diaries, for example.

Sho: the .NET Playground for Data

Filed under: Data Analysis,Visualization — Patrick Durusau @ 7:37 am

Sho: the .NET Playground for Data

Since we are talking about data analysis and display tools.

From the website:

Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.

Building a Better Word Cloud – Post

Filed under: Clustering — Patrick Durusau @ 7:31 am

Building a Better Word Cloud

Drew Conway talks about why word clouds don’t work (space-based display of non-spatial data is how I would summarize it, but see for yourself).

He then proceeds to create a comparative word cloud of Palin and Obama on the Arizona shootings.

I include this post here as a caution that space-based clustering can be misleading, if not outright deceptive.
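Conway’s version is in R; the following Python sketch (mine, not his code) shows only the computation a comparative cloud rests on: each word’s relative frequency in one text minus its relative frequency in the other. Positive scores lean toward the first text, negative toward the second. Plotting is left out, and the tokenizer is a deliberately naive assumption.

```python
import re
from collections import Counter


def relative_freqs(text):
    """Word -> share of all words in the text (naive lowercase tokenizer)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def compare(text_a, text_b):
    """Return (word, score) pairs, where score = relative frequency in A minus
    relative frequency in B, sorted from most A-leaning to most B-leaning."""
    freq_a, freq_b = relative_freqs(text_a), relative_freqs(text_b)
    scores = {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0)
              for w in set(freq_a) | set(freq_b)}
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
```

Using relative rather than raw counts keeps a longer speech from dominating simply because it has more words.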

Node.js Key-Value Stores – Post

Filed under: Key-Value Stores — Patrick Durusau @ 7:27 am

Node.js Key-Value Stores

A blog post by Alex Popescu on key-value stores written for node.js.

Alfred: in-process key-value store. You can read more about it here

node-dirty: tiny and fast key-value store with append-only disk log

And I’d bet there are more to come. Real question though, is how many will be around in 6-12 months.

I suspect the number of node.js key-value stores will increase and then decrease.

I am not sure I understand the implied problem?

But then I remember when there were more than 300 formats for document conversion.

There are fewer than that now. Well, at least major ones.

Is an upsurge in solutions, experimentation and a winnowing down, until the next explosion of creativity, a bad thing?
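For anyone wondering what an “append-only disk log” buys you, here is a minimal Python sketch of the idea (not node-dirty’s API, which is JavaScript): reads come from an in-memory dict, every write or delete is appended to a log file as one JSON line, and reopening the store replays the log, with the last write winning. The class and method names are hypothetical.

```python
import json
import os


class AppendOnlyStore:
    """In-memory dict backed by an append-only log of JSON lines."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # Replay any existing log in order; later records override earlier ones.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    if record.get("deleted"):
                        self.data.pop(record["key"], None)
                    else:
                        self.data[record["key"]] = record["val"]
        self.log = open(path, "a", encoding="utf-8")

    def _append(self, record):
        self.log.write(json.dumps(record) + "\n")
        self.log.flush()

    def set(self, key, val):
        self.data[key] = val
        self._append({"key": key, "val": val})

    def get(self, key, default=None):
        return self.data.get(key, default)

    def delete(self, key):
        self.data.pop(key, None)
        self._append({"key": key, "deleted": True})  # tombstone record

    def close(self):
        self.log.close()
```

Appending means no seeks and no in-place updates; the cost is a log that grows forever, so a real store would periodically compact it by rewriting only the current values.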

Sofia-ML and Maui: Two Cool Machine Learning and Extraction libraries – Post

Filed under: Extraction,Machine Learning — Patrick Durusau @ 7:21 am

Sofia-ML and Maui: Two Cool Machine Learning and Extraction libraries

Jeff Dalton reports on two software packages for text analysis.

These are examples of just some of the tools that could be run on a corpus like the Afghan War Diaries.

Functional Data Structures – Post

Filed under: Data Structures,Topic Map Software,Topic Maps — Patrick Durusau @ 7:18 am

On the Theoretical Computer Science Q&A site, the following question was asked:

What’s new in purely functional data structures since Okasaki?

Since Chris Okasaki’s 1998 book “Purely functional data structures”, I haven’t seen too many new exciting purely functional data structures appear; I can name just a few:…

The answers that follow are a listing of resources that will be of interest to topic map researchers.
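As a taste of the subject, here is a Python sketch of the two-list (batched) queue presented in Okasaki’s book: new elements go onto a rear list, elements leave from a front list, and the rear is reversed into the front when the front runs out. Nothing is ever mutated, so every earlier version of the queue stays usable. The class and function names are mine.

```python
from typing import Any, Optional, Tuple

# A persistent singly linked list: None is empty, otherwise (head, tail).
Cons = Optional[Tuple[Any, "Cons"]]


def reverse(lst: Cons) -> Cons:
    out = None
    while lst is not None:
        head, lst = lst
        out = (head, out)
    return out


class Queue:
    """Batched queue: front list for removal, rear list (newest first) for
    insertion. Operations return new queues and never modify existing ones."""
    __slots__ = ("front", "rear")

    def __init__(self, front: Cons = None, rear: Cons = None):
        # Invariant: front is empty only if the whole queue is empty.
        if front is None and rear is not None:
            front, rear = reverse(rear), None
        self.front, self.rear = front, rear

    def is_empty(self) -> bool:
        return self.front is None

    def push(self, x: Any) -> "Queue":
        return Queue(self.front, (x, self.rear))

    def peek(self) -> Any:
        if self.front is None:
            raise IndexError("peek from empty queue")
        return self.front[0]

    def pop(self) -> "Queue":
        if self.front is None:
            raise IndexError("pop from empty queue")
        return Queue(self.front[1], self.rear)
```

The reversal gives O(1) amortized push and pop when each version is used once; Okasaki shows that arbitrary persistent use can break that bound and uses lazy evaluation (the banker’s queue) to restore it.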

Why Command Helpers Suck – Post

Filed under: Database,Examples,Mapping — Patrick Durusau @ 6:53 am

Why Command Helpers Suck is an amusing rant by Kristina Chodorow (author of MongoDB: The Definitive Guide) on the many different command helpers for the same underlying database commands.

Shades of XWindows documentation and the origins of topic maps. Same commands, different terminology.

If, as Robert Cerny has suggested, topic maps don’t offer anything new, then I think it is fair to observe that the problems topic maps work to solve aren’t new either.

A bit more seriously, topic maps could offer Kristina a partial solution.

Imagine a utility for command helpers that is actively maintained and that has a mapping between all the known command helpers and a given database command.

Just enter the command you know and the appropriate command is sent to the database.

That is the sort of helper application that could easily find a niche.

The master mapping could be maintained with full identifications, notes, etc., but it would need a compiled version for speed of response.
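A minimal Python sketch of that design, under stated assumptions: the master mapping carries a note alongside each command, and a compile step flattens it into a plain dictionary for fast lookups. The helper-to-command pairs are illustrative entries from the mongo shell and PyMongo and should be checked against the documentation for your versions; the data structure and function names are hypothetical.

```python
# Master mapping: one annotated entry per underlying database command.
MASTER_MAPPING = [
    {"command": "listDatabases",
     "helpers": ["show dbs", "show databases", "client.list_database_names()"],
     "note": "mongo shell helpers and a PyMongo method"},
    {"command": "dropDatabase",
     "helpers": ["db.dropDatabase()"],
     "note": "destructive; confirm before sending"},
    {"command": "serverStatus",
     "helpers": ["db.serverStatus()"],
     "note": "diagnostic statistics"},
]


def normalise(helper):
    """Collapse whitespace and case so near-miss spellings still match."""
    return " ".join(helper.split()).lower()


def compile_mapping(master):
    """Flatten the master mapping into helper -> command for fast lookup,
    refusing any helper that would map to two different commands."""
    compiled = {}
    for entry in master:
        for helper in entry["helpers"]:
            key = normalise(helper)
            existing = compiled.get(key)
            if existing is not None and existing != entry["command"]:
                raise ValueError(
                    f"{helper!r} maps to both {existing} and {entry['command']}")
            compiled[key] = entry["command"]
    return compiled


def to_command(compiled, helper):
    """Return the command document for whatever helper the user knows."""
    return {compiled[normalise(helper)]: 1}
```

A document such as {"serverStatus": 1} is the form drivers send to the server; PyMongo’s Database.command, for example, accepts it. The notes stay in the master copy, where people maintain them, and never slow down the lookup.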

January 27, 2011

Comet – An Example of the New Key-Code Databases – Post

Filed under: NoSQL — Patrick Durusau @ 2:38 pm

Comet – An Example of the New Key-Code Databases

Another NoSQL database.

The post summarizes the goals of Comet, which is described as: … an extensible storage service that allows clients to inject snippets of code that control their data’s behavior inside the storage service.

One thing you will notice fairly quickly when reading Comet: An active distributed key-value store is that the authors were not trying to build a fully generalized solution.

They had specific requirements in mind to be met and if your needs fall outside those requirements, you need to look elsewhere.
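Here is a minimal Python sketch of the “active” key-value idea: a client attaches a handler to a key, and the store runs that handler whenever the value is written or read. As I understand the paper, Comet’s handlers are sandboxed Lua code; the class names, the on_get/on_put hooks and the read-limit policy below are hypothetical stand-ins.

```python
class ActiveStore:
    """Key-value store where a client can attach a handler to a key; the
    handler may transform values on the way in (on_put) and out (on_get)."""

    def __init__(self):
        self._values = {}
        self._handlers = {}

    def attach(self, key, handler):
        self._handlers[key] = handler

    def put(self, key, value):
        handler = self._handlers.get(key)
        if handler is not None and hasattr(handler, "on_put"):
            value = handler.on_put(value)
        self._values[key] = value

    def get(self, key):
        value = self._values.get(key)
        handler = self._handlers.get(key)
        if handler is not None and hasattr(handler, "on_get"):
            value = handler.on_get(value)
        return value


class LimitedReads:
    """Example handler: the value can be read only a fixed number of times."""

    def __init__(self, reads):
        self.remaining = reads

    def on_get(self, value):
        if self.remaining <= 0:
            return None
        self.remaining -= 1
        return value
```

The point of the design is that the policy travels with the data inside the store, so every client reading the key gets the same behavior without having to implement it.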

Rather refreshing to find a project that expressly isn’t trying to replace MS Office or Facebook.

That still leaves a lot of interesting and commercially successful work to be done.

Isidorus

Filed under: Isidorus,Topic Map Software — Patrick Durusau @ 2:02 pm

Isidorus

From the website:

Isidorus is an Open Source Topic Map engine actively developed using sbcl and elephant. Isidorus supports import and export of XTM 1.0 and 2.0, full versioning, merge semantics, an Atom-based RESTful API and Topic Map querying — with more to come.

Current areas of development include:

Enforcements of constraints (TMCL)

Json-import / export and a AJAX front end for data curation

Enhanced querying

Also note:

A Virtual Box image of a pre-installed isidorus-environment on an Ubuntu-Linux system is available at: http://festus.textgrid.it.fh-worms.de/TMRA2009/isidorus-vbox-image.tar.gz.

Flapjax

Filed under: Software,Web Applications — Patrick Durusau @ 2:01 pm

Flapjax

From the website:

Flapjax is a new programming language designed around the demands of modern, client-based Web applications. Its principal features include:

Event-driven, reactive evaluation

An event-stream abstraction for communicating with web services

Interfaces to external web services

Flapjax is easy to learn: it is just a JavaScript framework. Furthermore, because Flapjax is built entirely atop JavaScript, it runs on traditional Web browsers without the need for plug-ins or other downloads. It integrates seamlessly with existing JavaScript code and other frameworks.

I don’t know if anyone will find this useful, but some of the demos looked interesting.

Thought it would be worth mentioning for anyone looking to build client-based topic map applications.

Baltimore – Semi-Transparent or Semi-Opaque?

Filed under: Data Source,Marketing — Patrick Durusau @ 10:09 am

Open Baltimore is leading the way towards semi-transparent or semi-opaque government.

You be the judge.

The City of Baltimore is leading in placing hundreds of data sets online.

But is that being semi-transparent or semi-opaque?

Data sets I would like to see:

City contracts, their amounts and who was successful at bidding on them?
Successful bidders: not just corporate names, but who owns them? Who works there? What lawyers represent them?
What are the relationships, personal, business, etc., between staff, elected officials and anyone who does business with the city?
Same questions for school, fire, police and other departments.
Code violations, what are they, which inspectors write them, for what locations?
Arrests made of who, by which officers, for what crimes, locations and times.
etc. (these are illustrations and not an exhaustive list)

Make no mistake, I am grateful for the information the city has already provided.

What they have provided took a lot of work and will be useful for a number of purposes.

But I don’t want people to think that a large number of data sets means transparency.

Transparency involves questions of relevant data and meaningful ways to evaluate it and to connect it to other data.

Think Outside the (Comment) Box

Filed under: Database,Semantics,Software — Patrick Durusau @ 8:36 am

Think Outside the (Comment) Box: Social Applications for Publishers

From the announcement:

Learn about the next generation of social applications and how publishers are leveraging them for editorial and financial benefit.

I will spare you the rest of the breathless language.

Still, I will be there and suggest you be there as well.

Norm Walsh, who needs no introduction in markup circles, works at MarkLogic.

That gives me confidence this may be worth hearing.

Details:

February 9, 2011 – 8:00 am pacific, 11:00 am eastern – 4:00 pm GMT

*****
PS: For anyone who has been under a rock for the last several years, MarkLogic makes an excellent XML database solution.

See for example, MarkMail, a collection of technical mailing lists from around the web.

Searching it also illustrates how much room there is for semantic improvement in search.

Infochimps

Filed under: Data Source,Marketing,Mashups,Topic Maps — Patrick Durusau @ 8:17 am

Infochimps.com

Another free data source. (Commercial plans also available.)

Large number of data sources and what looks like a friendly number of free API calls while you are building an application.

Observation: Finding one data source or project seems to lead to several others in the same area.

Definitely worth a visit.

*****
PS: The abundance of online data sources opens the door to semantic mappings (can you say topic maps?) that enhance the value of these data sets.

Such as resolving the semantic impedance between the data sets.

Topic map artifacts as commercial products.

The trick is going to be discovering (and resolving) semantic impedances that people are willing to pay to avoid.
