Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

July 11, 2011

Hello Daleks!

Filed under: Neo4j — Patrick Durusau @ 7:21 pm

The Power of the Daleks (and Neo4j)

Ian Robinson writes:

Jim Webber and I have been busy these last few months writing a tutorial for Neo4j. As we did with REST in Practice, we’ve chosen an example domain that falls a little outside the enterprise norm. With REST in Practice we chose the everyday world of the coffee shop; this time round, we’ve gone for the grand old universe of Doctor Who.

As we’ve worked on the tutorial, we’ve built up a Doctor Who data model that shows how Neo4j can be used to address several different data and domain concerns. For example, part of the dataset includes timeline data, comprising seasons, stories and episodes; elsewhere we’ve social network-like data, with characters connected to one another through being companions, allies or enemies of the Doctor. It’s a messy and densely-connected dataset – much like the data you might find in a real-world enterprise. Some of it is of high quality, some of it is lacking in detail. And for every seeming invariant in the domain, there’s an awkward exception – again, much like the real world. For example, each incarnation of the Doctor has been played by one actor, except the first, who was originally played by William Hartnell, and then later by Richard Hurdnall, who reprised the tetchy first incarnation some years after William Hartnell’s death for the twenty-fifth anniversary story, The Five Doctors. Each episode has a title; at least, they did when the series first began, in 1963, and they do today. But towards the end of the third season, the original series stopped assigning individual episode titles (in the original series, stories typically took place over several episodes); a story’s episodes were simply labelled Part 1, Part 2, etc. And so on.

Beats the hell out of “Hello World!”, doesn’t it?
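
To make the shape of that model concrete, here’s a minimal sketch in plain Python (no Neo4j required). The node keys, property names and relationship types are my own guesses at how such a graph might be laid out, not the tutorial’s actual schema.

```python
# A toy property graph: nodes with properties plus typed relationships,
# including the awkward exception where the first Doctor is PLAYED_BY two actors.
nodes = {}
relationships = []   # (start_key, relationship_type, end_key)

def add_node(key, **props):
    nodes[key] = props
    return key

def relate(start, rel_type, end):
    relationships.append((start, rel_type, end))

first = add_node("doctor-1", kind="incarnation", ordinal=1)
relate(first, "PLAYED_BY", add_node("hartnell", kind="actor", name="William Hartnell"))
relate(first, "PLAYED_BY", add_node("hurndall", kind="actor", name="Richard Hurndall"))
relate(add_node("the-five-doctors", kind="story", title="The Five Doctors"),
       "FEATURES", first)

# Every actor who has played the first Doctor -- two of them, not one.
print([nodes[end]["name"]
       for start, rel, end in relationships
       if start == first and rel == "PLAYED_BY"])
```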

CXAIR Dynamic Venn Diagrams

Filed under: CXAIR,Venn Diagrams,Visualization — Patrick Durusau @ 7:18 pm

CXAIR Dynamic Venn Diagrams

From the description:

Brief overview of how Venn diagrams work in CXAIR. The Venn’s are created on the fly and allow you to find relationships in your data in an extremely visual and easy to use way.

Take the five (5) minutes needed to watch this video.

This is an example of a good interface. Good, not great.

Works well with crisp data and sharp boundaries. Fuzzy or uncertain data, perhaps not as good.

Still, there is a lot of data with (allegedly, anyway) sharp boundaries. CXAIR is a good tool for exploring such data in order to create a topic map.

See: http://www.connexica.com/ for more details.


Research question: To what extent is the overlapping of properties the specification of complex identities? It works in the demonstration to identify collective subjects (a subject that is a collection of subjects). It should work for singleton subjects as well (depending on your definition).

Would it be helpful to have associations with other subjects displayed while constructing an identification for a singleton subject? How would you decide which ones to display?
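
One way to make the research question concrete: treat each property value as selecting a set of records, and an identity as the intersection of those sets, Venn-style. Here’s a minimal sketch with invented records; the property names are mine, not CXAIR’s.

```python
records = [
    {"id": 1, "dept": "sales",   "region": "east", "status": "active"},
    {"id": 2, "dept": "sales",   "region": "west", "status": "active"},
    {"id": 3, "dept": "support", "region": "east", "status": "active"},
]

def having(prop, value):
    """The set of record ids selected by a single property value."""
    return {r["id"] for r in records if r[prop] == value}

# A collective subject: "active sales staff" = overlap of two property sets.
collective = having("dept", "sales") & having("status", "active")
print(collective)                                   # {1, 2}

# Stack on enough properties and the intersection narrows to a singleton.
singleton = collective & having("region", "east")
print(singleton)                                    # {1}
```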

Neo4j 1.4 “Kiruna Stol” GA

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:17 pm

Neo4j 1.4 “Kiruna Stol” GA

After six milestones, Neo4j 1.4 has appeared for general availability!

A quick listing of the highlights:

  • Cypher Query Language
  • Automatic Indexing
  • Index Improvements
  • Self Relationships
  • Performance Improvements
  • REST API Improvements
  • Webadmin Improvements
  • New Server Management Scripts

It just keeps getting better!
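
The Cypher Query Language is the headline item for me. Here’s a minimal sketch of what an early-syntax Cypher query (START/MATCH/RETURN) looks like and how it might be posted to the server over HTTP. The node id, relationship type and especially the endpoint path are assumptions on my part; Cypher was exposed through a server plugin in early releases, so check your installation for the exact URL.

```python
import json
import urllib.request

# Early Cypher used an explicit START clause to pick starting nodes by id.
query = {"query": "START doctor=node(1) "
                  "MATCH doctor-[:PLAYED_BY]->actor "
                  "RETURN actor.name"}

# Assumed plugin endpoint -- verify against your server's documentation.
endpoint = "http://localhost:7474/db/data/ext/CypherPlugin/graphdb/execute_query"

request = urllib.request.Request(
    endpoint,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json", "Accept": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.load(response))
```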

July 10, 2011

YouTube on Oracle’s Exadata?

Filed under: BigData,Open Source,SQL — Patrick Durusau @ 3:40 pm

Big data vs. traditional databases: Can you reproduce YouTube on Oracle’s Exadata?

Review of a report by Cowen & Co. analyst Peter Goldmacher on Big Data and traditional relational database vendors. Goldmacher is quoted as saying:

We believe the vast majority of data growth is coming in the form of data sets that are not well suited for traditional relational database vendors like Oracle. Not only is the data too unstructured and/or too voluminous for a traditional RDBMS, the software and hardware costs required to crunch through these new data sets using traditional RDBMS technology are prohibitive. To capitalize on the Big Data trend, a new breed of Big Data companies has emerged, leveraging commodity hardware, open source and proprietary technology to capture and analyze these new data sets. We believe the incumbent vendors are unlikely to be a major force in the Big Data trend primarily due to pricing issues and not a lack of technical know-how.

I doubt traditional relational database vendors like Oracle are going to be sitting targets for “…a new breed of Big Data companies….”

True, the “new breed” companies come without some of the licensing costs of traditional vendors, but licensing costs are only one factor in choosing a vendor.

The administrative and auditing requirements for large government contracts, for example, are likely only to be met by large traditional vendors.

And it is the skill with which Big Data is analyzed that makes it of interest to a customer, skills that traditional vendors have in depth and can bring to commodity hardware and open source technology.

Oracle, for example, could slowly replace its licensing revenue stream with a data analysis revenue stream that “new breed” vendors would find hard to match.

Or to paraphrase Shakespeare:

NewBreed:

“I can analyze Big Data.”

Oracle:

“Why, so can I, or so can any man; But will it be meaningful?”

(Henry IV, Part 1, Act III, Scene 1)


BTW, ZDNet forgot to mention in its coverage of this story that Peter Goldmacher worked for Oracle Corporation early in his career. His research coverage entry reads in part:

He started his career at Oracle, working for six years in variety of departments including sales ops, consulting, marketing, and finance, and he has also worked at BMC Software as Director, Corporate Planning and Strategy. (Accessed 10 June 2011, 11:00 AM, East Coast Time)


In the interest of fairness, I should point out that after Oracle’s acquisition of Sun Microsystems, they have sponsored my work as the OpenDocument Format (ODF) editor. I don’t speak on behalf of Oracle with regard to ODF, much less its other ventures. Their sponsorship simply enables me to devote time to the ODF project.

How To Create a Hello World Page with structr
(Hello World for topic maps?)

Filed under: Authoring Topic Maps,Neo4j,structr — Patrick Durusau @ 3:40 pm

How To Create a Hello World Page with structr

Guide to creating a “Hello World” page with structr, which is a Neo4j-based CMS.

While looking at the guide, it occurred to me that most users are only going to create pages. I can’t imagine most sysadmins giving users the ability to create domains, sites or even templates. Users are going to author pages. And when their application opens up, it is going to be inside a domain/site, with only certain templates they can use, as part of a workflow with others.

Shouldn’t the same be the case for topic maps? That is, the average user does not author a topic map, does not author default subjects or identifiers, probably doesn’t author identifiers at all. And for that matter, whatever they do author is part of a workflow with others. Yes?

Has the problem, at least in part, been that topic map explanations explain too much? What if, for some specific domain, we simply said what to do and it just worked?

Take www.worldcat.org for example. A topic map authoring interface to that resource should allow users to select one or more entries returned from a query, all sharing the same ISBN, and mark them as being the same item.

For example, search for “Real World Haskell.” Six items are returned, with the first two obviously being the same title. The first entry has the following ISBN entry: 9780596514983 0596514980; the second entry has: 0596514980 : 9780596514983. That’s right: a colon separator has been added and the ISBN numbers are reversed. Rather than two entries, a topic map should allow me to mark this as one entry and to process the underlying data to present it as such, including all the libraries with holdings.

That should not require any more effort on my part than choosing these entries as identical items. Ideally that choice on my part should accrue to the benefit of any subsequent users searching for the same entry.

The third and fourth items are the same text but in Japanese. My personal modeling choice would be to merge them and the sixth item (the Safari edition) as language and media variants respectively. Might need language/media variant choices.

No thorny theoretical issues, immediate benefit to current and subsequent users. Is that a way forward?


Deduplication of the WorldCat files by automated means is certainly possible and a value-add. But given the number of users who already consult WorldCat on a daily basis, why not take advantage of their human expertise?
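
Whichever way the merge decision is made, by software or by a human clicking “same item”, the normalization underneath it is mechanical. A minimal sketch, using the two ISBN strings quoted above (the entry records themselves are simplified):

```python
import re
from collections import defaultdict

entries = [
    {"title": "Real World Haskell", "isbn_field": "9780596514983 0596514980"},
    {"title": "Real World Haskell", "isbn_field": "0596514980 : 9780596514983"},
]

def isbn_key(raw):
    """Order-independent key: the set of ISBNs, punctuation stripped."""
    return frozenset(re.findall(r"[\dX]{10,13}", raw.upper()))

merged = defaultdict(list)
for entry in entries:
    merged[isbn_key(entry["isbn_field"])].append(entry)

for key, group in merged.items():
    print(sorted(key), "->", len(group), "entries merged")
```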

2011 Digital Universe Study

Filed under: Marketing — Patrick Durusau @ 3:39 pm

2011 Digital Universe Study by IDC, sponsored by EMC

Textual excerpts: Extracting Value from Chaos

Multimedia: The 2011 Digital Universe Study: Extracting Value from Chaos

Lots of impressive (scary?) numbers about the expansion of the digital universe.

I rather liked the first technical call to action:

Investigate the new tools for creating metadata — the information you will need to understand which data is needed when and for what. Big data will be a fountain of big value only if it can speak to you through metadata.

And I liked that no present metadata methodology was cited as “the” solution.

Sounds like we all have a lot of work and experimenting ahead of us.

I don’t think we are ever going to find the technology solution to any issue in the Digital Universe. Our companions are too busy inventing new issues and reinventing old ones in new guises for that to happen.

Flintstone

Filed under: Key-Value Stores,PHP — Patrick Durusau @ 3:39 pm

Flintstone

From the website:

What is Flintstone?

A key/value database store using flat files for PHP.

Features include:

  • Memory efficient
  • File locking
  • Caching
  • Gzip compression
  • Easy to use

Since I have covered other key/value store databases, I thought I should include this one as well.

Could be useful for quick mockups.

Although I am mindful of a developer who complained to me about a script that was supposedly too limited for long-term production use by the customer. He later found out it met their production requirements quite nicely.

The lesson I drew from that was a need for greater transparency with customers and licensing.

Joy of Clojure – Bibliography

Filed under: Bibliography,Clojure — Patrick Durusau @ 3:38 pm

Joy of Clojure – Bibliography

OK, I’m an academic. I like bibliographies. 😉

I noticed it repeats what you will find in The Joy of Clojure. That’s useful if you are at the library without your copy of Joy and can’t remember a particular citation. But not very useful otherwise.

Suggestion: Make the bibliography a dynamic one that accepts suggested annotated references from readers that point to particular sections or discussions in the text. Posting subject to the judgment of the authors.

Could prove to be useful in the event of a second or following edition.

Jark

Filed under: Clojure,Java — Patrick Durusau @ 3:38 pm

Jark

From the webpage:

A tool to manage classpaths and clojure namespaces on a persistent JVM

Why Jark

Startup time of the Java Virtual Machine(JVM) is too slow and thereby command-line applications on the JVM are sluggish and very painful to use. And there is no existing simple way for multiple clients to access the same instance of the JVM. Jark is an attempt to run a persistent JVM daemon and provide a set of utilities to control and operate on it.

Jark is intended to

  • deploy, maintain and debug clojure programs on remote hosts
  • provide an easy interface to run clojure programs on the command-line
  • provide a set of useful namespace and classpath utilities
  • provide a secure and robust implementation of a JVM daemon that multiple clients can connect to, seamlessly
  • provide a thin client that can run on any OS platform and with minimum runtime dependencies.
  • be VM agnostic: support for all VMs that clojure runs on in the future

In case you need a persistent JVM daemon.

July 9, 2011

Topincs: A software for rapid development of web databases

Filed under: Topic Map Software,Topincs — Patrick Durusau @ 7:03 pm

Topincs: A software for rapid development of web databases, Robert Cerny’s paper for KMIS 2011.

Describes Topincs, a system that conceals many of the complexities of topic maps from users, while delivering on the value proposition of topic maps.

Think of Topincs as avoiding the topic map equivalent of: “How many people would use word processors if they had to learn LaTeX first?”

Data journalism, data tools, and the newsroom stack

Filed under: Data Analysis,Data Mining — Patrick Durusau @ 7:02 pm

Data journalism, data tools, and the newsroom stack by Alex Howard.

From the post:

MIT’s recent Civic Media Conference and the latest batch of Knight News Challenge winners made one reality crystal clear: as a new era of technology-fueled transparency, innovation and open government dawns, it won’t depend on any single CIO or federal program. It will be driven by a distributed community of media, nonprofits, academics and civic advocates focused on better outcomes, more informed communities and the new news, whatever form it is delivered in.

The themes that unite this class of Knight News Challenge winners were data journalism and platforms for civic connections. Each theme draws from central realities of the information ecosystems of today. Newsrooms and citizens are confronted by unprecedented amounts of data and an expanded number of news sources, including a social web populated by our friends, family and colleagues. Newsrooms, the traditional hosts for information gathering and dissemination, are now part of a flattened environment for news, where news breaks first on social networks, is curated by a combination of professionals and amateurs, and then analyzed and synthesized into contextualized journalism.

Pointers to the newest resources and analysis of the issues of “transparency, innovation and open government….”

Until government transparency becomes public and cumulative, it will be personal and transitory.

Topic maps have the capability to make it the former instead of the latter.

Big Data and the Semantic Web

Filed under: BigData,Semantic Web — Patrick Durusau @ 7:01 pm

Big Data and the Semantic Web

Edd Dumbill sees Big Data as answering the semantic questions posed by the Semantic Web:

Conventionally, semantic web systems generate metadata and identified entities explicitly, ie. by hand or as the output of database values. But as anybody who’s tried to get users to do it will tell you, generating metadata is hard. This is part of why the full semantic web dream isn’t yet realized. Analytical approaches take a different approach: surfacing and classifying the metadata from analysis of the actual content and data itself. (Freely exposing metadata is also controversial and risky, as open data advocates will attest.)

Once big data techniques have been successfully applied, you have identified entities and the connections between them. If you want to join that information up to the rest of the web, or to concepts outside of your system, you need a language in which to do that. You need to organize, exchange and reason about those entities. It’s this framework that has been steadily built up over the last 15 years with the semantic web project.

To give an already widespread example: many data scientists use Wikipedia to help with entity resolution and disambiguation, using Wikipedia URLs to identify entities. This is a classic use of the most fundamental of semantic web technologies: the URI.

I am not sure where Edd gets: “Once big data techniques have been successfully applied, you have identified entities and the connections between them.” Really? Or is that something hoped for in the future? A general solution to entity extraction and discovery of relationships remains a research topic.

Big Data will worsen the semantic poverty of the Semantic Web and drive the search for tools and approaches to address that poverty.
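
The Wikipedia-URL-as-identifier pattern Edd describes is easy enough to sketch; what remains hard, and is exactly the research topic I mean, is building and applying the mapping automatically at scale. A minimal, hand-built illustration:

```python
# Hand-built mapping from (mention, context hint) to a Wikipedia URL used
# as the entity's identifier. Building this automatically is the hard part.
MENTION_TO_URI = {
    ("jaguar", "animal"): "https://en.wikipedia.org/wiki/Jaguar",
    ("jaguar", "car"):    "https://en.wikipedia.org/wiki/Jaguar_Cars",
}

def resolve(mention, context_hint):
    return MENTION_TO_URI.get((mention.lower(), context_hint))

print(resolve("Jaguar", "car"))    # https://en.wikipedia.org/wiki/Jaguar_Cars
```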

Neo4j 1.4 M06 “Kiruna Stol”

Filed under: Graphs,Neo4j,NoSQL — Patrick Durusau @ 7:00 pm

Neo4j 1.4 M06 “Kiruna Stol”

From the Neo4j blog:

It’s been just a week since the Neo4j 1.4 M05 release, and though we’re pleased with the way the feature set has evolved, during testing we found a potential corruption bug in that specific milestone.

To address that issue, this week we’re releasing our sixth milestone towards the 1.4 GA release. This milestone is likely to be the last of the series for the 1.4 release, and if the community feedback is positive we will transition into our GA release shortly.

Sounds like Neo4j 1.4 is arriving soon!

Indexing in Cassandra

Filed under: Cassandra,Indexing — Patrick Durusau @ 7:00 pm

Indexing in Cassandra

From the post:

I’m writing this up because there’s always quite a bit of discussion on both the Cassandra and Hector mailing lists about indexes and the best ways to use them. I’d written a previous post about Secondary indexes in Cassandra last July, but there are a few more options and considerations today. I’m going to do a quick run through of the different approaches for doing indexes in Cassandra so that you can more easily navigate these and determine what’s the best approach for your application.

Good article on indexes in Cassandra.

July 8, 2011

Managing Knowledge in Organizational Memory Using Topic Maps

Filed under: Organizational Memory — Patrick Durusau @ 3:56 pm

Managing Knowledge in Organizational Memory Using Topic Maps by Les Miller (Iowa State University, USA); Sree Nilakanta (Iowa State University, USA); Yunan Song (Iowa State University, USA); Lei Zhu (Iowa State University, USA); Ming Hua (Iowa State University, USA).

Abstract:

Organizational memories play a significant role in knowledge management, but several challenges confront their use. Artifacts of OM are many and varied. Access and use of the stored artifact are influenced by the user’s understanding of these information objects as well as their context. Theories of distributed cognition and the notion of community of practice are used to develop a model of the knowledge management system. In the present work we look at a model for managing organizational memory knowledge. Topic maps are used in the model to represent user cognition of contextualized information. A visual approach to topic maps proposed in the model also allows for access and analysis of stored memory artifacts. The design and implementation of a prototype to test the feasibility of the model is briefly examined.

Apologies for not finding a more accessible copy of this paper. Please post if you locate one.

The use of topic maps with organizational memory highlights one of the advantages (and costs) of topic maps.

Test yourself this way:

Take a blank sheet of paper and write down one fact you needed or process that you followed for three work-related activities yesterday.

How many of those facts or processes would be known by someone outside your department?

I would be willing to bet none of them. Why? Even if you are in something as common as fast food, you still have to know which supervisor to call if there is an emergency, the correct process for storing supplies at your site and any quirks in your local machinery. All of which contribute to the smooth running of the operation. All of which would be unknown to someone outside your particular location.

Gathering that level of information about an organization is incredibly useful, the upside being lower impact from supervisor or staff turnover. The downside is that it requires management to create a culture of preserving organizational memory. That in part involves giving staff a stake in that preservation. The “local” knowledge part can be managed by topic maps.

Exposing Databases…

Filed under: Clojure,R2RML — Patrick Durusau @ 3:56 pm

Exposing databases as linked data using clojure, compojure and R2RML

From the post:

Some of these problems could be solved using some of the technologies developed as part of the semantic web initiative in the past decade. People have started referring to this pragmatic approach to the semantic web with a new title: Linked Data. The pragmatic approach here means putting less emphasis in the inference and ontological layers of the semantic web and just focusing in offer a simple way to expose data in the web linking resources across different web application and data sources.

Many interesting technologies are being developed under the linked data monicker or are commonly associated to it, RDFa for instance. Another of these technologies is R2RML: RDB to RDF Mapping Language.

R2RML describes a standard vocabulary to lift relational data to a RDF graph. It also provides a standard mapping for the relational data. This RDF graph can be serialized to some transport format: RDFa, Turtle, XML and then retrieved by the client. The client can store the triples in the graph locally and use the standard query language SPARQL to retrieve the information. Data from different applications using the same vocabulary (FOAF, GoodRelations) can be easily mixed and manipulated by the client in the same triple store. Furthermore, links to other resources can be inserted inside the RDF graph leading to the discovery of additional information.

Exposing data, even as an RDF graph, is a good thing.

Discovering additional information is also a good thing.

But they fall short of specifying the semantics of data, which is necessary to enable reliable identification of data with the same semantics.
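
For a feel of what “lifting” relational data to RDF means, here is a minimal, hand-rolled sketch of the idea, not R2RML itself (R2RML mappings are themselves RDF, typically written in Turtle). The table row and the choice of FOAF properties are illustrative.

```python
rows = [
    {"id": 7, "name": "Alice", "email": "alice@example.org"},
]

BASE = "http://example.org/people/"
FOAF = "http://xmlns.com/foaf/0.1/"

def lift(row):
    # One RDF subject per row, one triple per mapped column.
    subject = f"<{BASE}{row['id']}>"
    yield (subject, f"<{FOAF}name>", f'"{row["name"]}"')
    yield (subject, f"<{FOAF}mbox>", f"<mailto:{row['email']}>")

for s, p, o in lift(rows[0]):
    print(s, p, o, ".")   # crude N-Triples-style output
```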

Beating the Averages

Filed under: Lisp — Patrick Durusau @ 3:55 pm

Beating the Averages

Great summer reading for anyone who wants a successful startup or simply to improve an ongoing software company. (Although the latter is probably the harder of the two tasks.)

A couple of quotes to get you interested in reading more:

But I think I can give a kind of argument that might be convincing. The source code of the Viaweb editor was probably about 20-25% macros. Macros are harder to write than ordinary Lisp functions, and it’s considered to be bad style to use them when they’re not necessary. So every macro in that code is there because it has to be. What that means is that at least 20-25% of the code in this program is doing things that you can’t easily do in any other language. However skeptical the Blub programmer might be about my claims for the mysterious powers of Lisp, this ought to make him curious. We weren’t writing this code for our own amusement. We were a tiny startup, programming as hard as we could in order to put technical barriers between us and our competitors.

A suspicious person might begin to wonder if there was some correlation here. A big chunk of our code was doing things that are very hard to do in other languages. The resulting software did things our competitors’ software couldn’t do. Maybe there was some kind of connection. I encourage you to follow that thread. There may be more to that old man hobbling along on his crutches than meets the eye.

And consider his advice on evaluating competitors:

If you ever do find yourself working for a startup, here’s a handy tip for evaluating competitors. Read their job listings. Everything else on their site may be stock photos or the prose equivalent, but the job listings have to be specific about what they want, or they’ll get the wrong candidates.

During the years we worked on Viaweb I read a lot of job descriptions. A new competitor seemed to emerge out of the woodwork every month or so. The first thing I would do, after checking to see if they had a live online demo, was look at their job listings. After a couple years of this I could tell which companies to worry about and which not to. The more of an IT flavor the job descriptions had, the less dangerous the company was. The safest kind were the ones that wanted Oracle experience. You never had to worry about those. You were also safe if they said they wanted C++ or Java developers. If they wanted Perl or Python programmers, that would be a bit frightening– that’s starting to sound like a company where the technical side, at least, is run by real hackers. If I had ever seen a job posting looking for Lisp hackers, I would have been really worried.

This was written in 2003.

How would you update his advice on evaluating job descriptions at other startups?

Languages of the World (Wide Web)
(Google Research Blog)

Filed under: Language — Patrick Durusau @ 3:54 pm

Languages of the World (Wide Web)

Interesting post about linking between sites in different languages, albeit using somewhat outdated (2008) data. The authors allude to later data but give no specifics.

I mention it here as an example of where different subjects (the websites in particular languages) are treated as collective subjects for the purpose of examining links (associations in topic map speak) between the collective subjects.

Or as described by the authors:

To see the connections between languages, start by taking the several billion most important pages on the web in 2008, including all pages in smaller languages, and look at the off-site links between these pages. The particular choice of pages in our corpus here reflects decisions about what is `important’. For example, in a language with few pages every page is considered important, while for languages with more pages some selection method is required, based on pagerank for example.

We can use our corpus to draw a very simple graph of the web, with a node for each language and an edge between two languages if more than one percent of the offsite links in the first language land on pages in the second. To make things a little clearer, we only show the languages which have at least a hundred thousand pages and have a strong link with another language, meaning at least 1% of off-site links go to that language. We also leave out English, which we’ll discuss more in a moment. (Figure 1)

Being able to decompose the collective subjects to reveal numbers for sites in particular locations or particular sites would have made this study more compelling.
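
The construction described in the quoted passage boils down to a threshold over link counts. A minimal sketch, with invented numbers:

```python
# source language -> {target language: number of off-site links}
offsite_links = {
    "fi": {"fi": 3000, "en": 420, "sv": 60, "et": 8},
    "sv": {"sv": 5000, "en": 900, "fi": 95, "no": 70},
}

def language_graph(links, threshold=0.01):
    """Directed edges where > threshold of a language's off-site links land."""
    edges = []
    for src, targets in links.items():
        total = sum(targets.values())
        for dst, count in targets.items():
            if dst != src and count / total > threshold:
                edges.append((src, dst, count / total))
    return edges

for src, dst, share in language_graph(offsite_links):
    print(f"{src} -> {dst}: {share:.1%} of off-site links")
```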

Integrating Neo4J with SQL Server

Filed under: Neo4j — Patrick Durusau @ 3:54 pm

Integrating Neo4J with SQL Server by Cem Güler.

This article illustrates a path that topic maps can take (and in some cases have taken): curing known weaknesses or issues with a popular software product in enterprise environments.

By focusing on a particular issue and supplementing (rather than replacing) existing software, a topic map-based application may meet less resistance from existing IT infrastructures.

Hadoop and MapReduce

Filed under: Hadoop,MapReduce — Patrick Durusau @ 3:53 pm

Hadoop and MapReduce

Nice slidedeck on Hadoop and MapReduce by Friso van Vollenhoven.

Can’t tell what illustration/explanation is going to “click” for someone, so keep this one in mind when discussing Hadoop/MapReduce.

Clojure – PragPub July 2011

Filed under: Clojure — Patrick Durusau @ 3:52 pm

Clojure – PragPub July 2011

Where you will find:

Clojure Building Blocks
by Jean-François “Jeff” Héon
Jeff introduces Clojure fundamentals and uses them to show why you might want to explore this language further.

Clojure Collections
by Steven Reynolds
Steven explains the benefits of immutability and explores how Clojure’s data collections handle it.

Create Unix Services with Clojure
by Aaron Bedra
Aaron is the coauthor (with Stuart Halloway) of the forthcoming Programming Clojure, Second Edition. Here he gives a practical, hands-on experience with Clojure.

Growing a DSL with Clojure
by Ambrose Bonnaire-Sergeant
From seed to full bloom, Ambrose takes us through the steps to grow a domain-specific language in Clojure.

Graphity Diagram Editor

Filed under: Graphics — Patrick Durusau @ 3:51 pm

Graphity Diagram Editor

From the website:

Graphity Diagram Editor

Graphity is a diagram editor that can be used to quickly and effectively generate drawings and to apply automatic layouts to a range of different diagrams and networks like:

  • Flowcharts
  • Social networks
  • Computer networks
  • UML diagrams
  • Business process modeling diagrams

Graphity is an application for the Adobe® Flash® Player. It can be used with any browser which has a Flash Player plugin. No installation is necessary.

Graphity can be used for free and without registration directly from this web page!

Graphs anyone?

disclojure

Filed under: Clojure — Patrick Durusau @ 7:55 am

disclojure: public disclosure of all things Clojure

Collection of pointers to resources, projects and groups.

July 7, 2011

LAC Releases Government of Canada Core Subject Thesaurus

Filed under: Government Data,RDF,SKOS — Patrick Durusau @ 4:30 pm

LAC Releases Government of Canada Core Subject Thesaurus

From the post:

The government of Canada has released a new downloadable version of its Core Subject Thesaurus in SKOS/RDF format. According to Library and Archives Canada, “The Government of Canada Core Subject Thesaurus is a bilingual thesaurus consisting of terminology that represents all the fields covered in the information resources of the Government of Canada. Library and Archives Canada is exploring the potential for linked data and the semantic web with LAC vocabularies, metadata and open content.”

When you reach the post with links to the vocabulary, you will find it is also available as XML and CSV.

There are changes from the 2009 version.

Here’s an example:

  old form: Adaptive aids (for persons with disabilities)
  new form: Assistive Technologies
  French equivalent: Technologie d’aide

Did you notice that the old form and new form don’t share a single word in common?

Imagine that, an unstable core subject thesaurus.

Over time, more terms will be added, changed and deleted. Is there a topic map in the house?

JSON-LD – Expressing Linked Data in JSON

Filed under: JSON,Linked Data — Patrick Durusau @ 4:29 pm

JSON-LD – Expressing Linked Data in JSON

I recently mentioned a mailing list on Linked Data in JSON.

From the webpage:

JSON-LD (JavaScript Object Notation for Linked Data) is a lightweight Linked Data format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web-scale. If you are already familiar with JSON, writing JSON-LD is very easy. There is a smooth migration path from the JSON you use today, to the JSON-LD you will use in the future. These properties make JSON-LD an ideal Linked Data interchange language for JavaScript environments, Web services, and unstructured databases such as CouchDB and MongoDB.

Short example or two plus links to other resources.
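
For flavor, a minimal JSON-LD document written out as a Python dict. The keywords shown here (“@context”, “@id”, “@type”) follow the spec as it later stabilized; the 2011 drafts differed in places, so treat this as illustrative rather than draft-exact.

```python
import json

person = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"},
    },
    "@id": "http://example.org/people/patrick",   # example identifier
    "name": "Patrick",
    "homepage": "http://example.org/",
}

# Ordinary JSON to any consumer; Linked Data to one that reads the @context.
print(json.dumps(person, indent=2))
```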

_Let_It_Crash_

Filed under: Erlang — Patrick Durusau @ 4:29 pm

_Let_It_Crash_

Pavlo Baron’s highly entertaining slides from SEACOM ’11.

Useful for making the argument against purely defensive programming, but also for planning what to do when crashes occur.

The images add a lot to the presentation. I need to remember that for future presentations.

Use Cases Solved in Redis
(TM Use Cases?)

Filed under: NoSQL,Redis — Patrick Durusau @ 4:17 pm

11 Common Web Use Cases Solved in Redis

From the webpage:

In How to take advantage of Redis just adding it to your stack Salvatore ‘antirez’ Sanfilippo shows how to solve some common problems in Redis by taking advantage of its unique data structure handling capabilities. Common Redis primitives like LPUSH, and LTRIM, and LREM are used to accomplish tasks programmers need to get done, but that can be hard or slow in more traditional stores. A very useful and practical article. How would you accomplish these tasks in your framework?

Good post about Redis and common web use cases.
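
As a taste of the primitives the post covers, here is a minimal sketch of the capped recent-activity list pattern built from LPUSH and LTRIM, assuming a local Redis server and the redis-py client. The key layout is my own.

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def record_event(user_id, event, keep=100):
    key = f"recent:{user_id}"      # hypothetical key layout
    r.lpush(key, event)            # newest item goes to the head of the list
    r.ltrim(key, 0, keep - 1)      # drop everything beyond the newest `keep`

record_event("u42", "viewed:item:17")
print(r.lrange("recent:u42", 0, -1))   # most-recent-first slice of events
```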

Occurs to me that I don’t have a similar list for topic maps (whatever software you use) as a technology.

Sure, topic maps apply when you need to have a common locus for information about a subject or need better modeling of relationships, but that’s all rather vague and hand-wavy.

Here are two examples that are more concrete:

The small office supply store on the town square (this is a true story) had its own internal inventory system with numbers, etc. The small store ordered from several larger suppliers, who all had their own names and internal numbers for the same items. A stable mapping wasn’t an option because the numbers (and descriptions) used by both the large suppliers and the manufacturers were subject to change and reuse.

The small office supply store could see the value in a topic map but the cost in employee time to match up the inventory numbers was less than construction and maintenance of a topic map on top of their internal system. I would say that dynamic inventory control is a topic maps use case.
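
The mechanics of such a mapping are simple enough; here is a minimal sketch with made-up item numbers. As noted above, the sticking point was the cost of building and maintaining the mapping, not the mechanics.

```python
catalog = {}   # internal topic id -> set of (source, identifier) pairs

def add_identifier(topic_id, source, number):
    catalog.setdefault(topic_id, set()).add((source, number))

def find_topic(source, number):
    for topic_id, identifiers in catalog.items():
        if (source, number) in identifiers:
            return topic_id
    return None

add_identifier("item-001", "internal", "SKU-1138")
add_identifier("item-001", "supplier-a", "A-99417")
add_identifier("item-001", "supplier-b", "B-220-X")

# An order form arrives quoting supplier B's number:
print(find_topic("supplier-b", "B-220-X"))   # -> item-001

# When supplier A changes or reuses a number, only the mapping is touched.
add_identifier("item-001", "supplier-a", "A-77102")
```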

The other use case involves medical terminology. A doctor I know covers the hospital for an entire local medical practice. He isn’t a specialist in any of the fields covered by the practice, so he has to look up the latest medical advances in several fields. Like all of us, he has terms that he learned for particular conditions, which aren’t the ones in the medical databases. So he has trouble searching from time to time.

He recognized the value of a topic map being able to create a mapping between his terminology and the terminology used by the medical database. It would enable him to search more quickly and effectively. Unfortunately the problem, in these economic times, isn’t pinching enough to result in a project. Personalized search interfaces are another topic map use case.

What’s yours?

MapR Releases Commercial Distributions based on Hadoop

Filed under: Hadoop — Patrick Durusau @ 4:16 pm

MapR Releases Commercial Distributions based on Hadoop

From the post:

MapR Technologies released a big data toolkit, based on Apache Hadoop with their own distributed storage alternative to HDFS. The software is commercial, with MapR offering both a free version, M3, as well as a paid version, M5. M5 includes snapshots and mirroring for data, Job Tracker recovery, and commercial support. MapR’s M5 edition will form the basis of EMC Greenplum’s upcoming HD Enterprise Edition, whereas EMC Greenplum’s HD Community Edition will be based on Facebook’s Hadoop distribution rather than MapR technology.

At the Hadoop Summit last week, MapR Technologies announced the general availability of their "Next Generation Distribution for Apache Hadoop." InfoQ interviewed CEO John Schroeder and VP Marketing Jack Norris to learn more about their approach. MapR claims to improve MapReduce and HBase performance by a factor of 2-5, and to eliminate single points of failure in Hadoop. Schroeder says that they measure performance against competing distributions by timing benchmarks such as DFSIO, Terasort, YCSB, Gridmix, and Pigmix. He also said that customers testing MapR’s technology are seeing a 3-5 times improvement in performance against previous versions of Hadoop that they use. Schroeder reports that they had 35 beta testers and that they showed linear scalability in clusters of up to 160 nodes. MapR reports that several of the beta test customers now have their technology in production – including one that has a 140 node cluster in production, and another that "is looking at deploying MapR on 2000 nodes." By comparison, Yahoo is believed to run the largest Hadoop clusters, comprised of 4000 nodes running Apache Hadoop and competitor Cloudera claimed to have more than 80 customers running Hadoop in production in March 2011, with 22 clusters running Cloudera’s distribution that are over a petabyte as of July 2011.

Remember, Hadoop is a buzz word in U.S. government circles.

MongoSV

Filed under: MongoDB,NoSQL — Patrick Durusau @ 4:16 pm

MongoSV

From the homepage:

MongoSV was a four-track, one-day conference on December 3, 2010 at Microsoft Research Silicon Valley in Mountain View, CA. The main conference track featured 10gen founders Dwight Merriman and Eliot Horowitz, as well as Roger Bodamer, the head of 10gen’s west coast operations, and several of the key engineers developing the MongoDB project. These sessions were geared towards developers and administrators interested in learning how to use the database, with sessions on schema design, indexing, administration, deployment strategies, scaling, and other features. A second track showcased several high-profile deployments of the database at Shutterfly, Craigslist, IGN, Intuit, Wordnik, and more. For more experienced users of the database, there were several advanced sessions, covering the storage engine, replication, sharding, and consistency models.

Excellent collection of videos and slides on MongoDB and various aspects of its use.

Wordnik

Filed under: Graphs,MongoDB,Subject Identity — Patrick Durusau @ 4:15 pm

Wordnik – Building a Directed Graph with MongoDB

Tony Tam’s slide deck on directed graphs and MongoDB.

Emphasizes that what graph you build depends on your application needs. Much like using your own tests for subject identity: you could always use mine, but never quite as well or as accurately as your own.
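
I have not seen the underlying code, but one common way to store a directed graph in MongoDB is an adjacency list per node document, which suits queries that fan out from a node. A minimal sketch assuming a local MongoDB and the pymongo client; the database, collection and field names are invented.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
words = client.graphdemo.words          # hypothetical database/collection

words.insert_one({"_id": "graph", "out": ["vertex", "edge"]})
words.insert_one({"_id": "vertex", "out": ["edge"]})
words.insert_one({"_id": "edge", "out": []})

# Follow outgoing edges one hop from "graph".
node = words.find_one({"_id": "graph"})
for neighbor in node["out"]:
    print(neighbor, "->", words.find_one({"_id": neighbor})["out"])
```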
