Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

March 10, 2012

When It Comes to Data Quality Delivery, the Soft Stuff is the Hard Stuff (Part 1 of 6)

Filed under: Data Management,Data Quality — Patrick Durusau @ 8:20 pm

When It Comes to Data Quality Delivery, the Soft Stuff is the Hard Stuff (Part 1 of 6) by Richard Trapp.

From the post:

I regularly receive questions regarding the types of skills data quality analysts should have in order to be effective. In my experience, regardless of scope, high performing data quality analysts need to possess a well-rounded, balanced skill set – one that marries technical “know how” and aptitude with a solid business understanding and acumen. But, far too often, it seems that undue importance is placed on what I call the data quality “hard skills”, which include; a firm grasp of database concepts, hands on data analysis experience using standard analytical tool sets, expertise with commercial data quality technologies, knowledge of data management best practices and an understanding of the software development life cycle.

Read Richard’s post to get the listing of “soft skills” and evaluate yourself.

I am going to track this series and will post updates here.

Being successful with “big data,” semantic integration, whatever the next buzz words are, will require a mix of hard and soft skills.

Success has always required both hard and soft skills, but it doesn’t hurt to repeat the lesson.

March 9, 2012

Structure, Semantics and Master Data Models

Filed under: Master Data Management,Semantics — Patrick Durusau @ 8:45 pm

Structure, Semantics and Master Data Models by David Loshin.

From the post:

Looking back at some of my Informatica Perspectives posts over the past year or so, I reflected on some common themes about data management and data governance, especially in the context of master data management and particularly, master data models. As both the tools and the practices around MDM mature, we have seen some disillusionment in attempts to deploy an MDM solution, with our customers noting that they continue to hit bumps in the road in the technical implementation associated with both master data consolidation and then with publication of shared master data.

Almost every issue we see can be characterized into one of three buckets:

What do you think about David’s three buckets? Close? Far away?


David continued this line of postings:

Master Data Model Alternatives – Part 2 March 12, 2012.

Master Data Consolidation Versus Master Data Sharing: Modeling Matters! – Part 3 March 19, 2012.

Considerations for Multi-Domain Master Data Modeling – Part 4 March 26, 2012.

Ask For Forgiveness Programming – Or How We’ll Program 1000 Cores

Filed under: Multi-Core,Parallel Programming — Patrick Durusau @ 8:45 pm

Ask For Forgiveness Programming – Or How We’ll Program 1000 Cores

Another approach to multi-core processing:

The argument for a massively multicore future is now familiar: while clock speeds have leveled off, device density is increasing, so the future is cheap chips with hundreds and thousands of cores. That’s the inexorable logic behind our multicore future.

The unsolved question that lurks deep in the dark part of a programmer’s mind is: how on earth are we to program these things? For problems that aren’t embarrassingly parallel, we really have no idea. IBM Research’s David Ungar has an idea. And it’s radical in the extreme…

After reading this article, ask yourself, how would you apply this approach with topic maps?

Geographic news coverage visualized by Nathan Yau

Filed under: News,Visualization — Patrick Durusau @ 8:45 pm

Geographic news coverage visualized by Nathan Yau.

Nathan reports on work by Kitchen Budapest to visualize news coverage from Hungary from December 1998 to December 2011.

Interesting result. A side-by-side comparison to other news sources, perhaps in different languages, would make an interesting study.

Graphene

Filed under: Dashboard,Graphene,Graphs — Patrick Durusau @ 8:45 pm

Graphene

From the readme:

Graphene is a realtime dashboard & graphing toolkit based on D3 and Backbone.

It was made to offer a very aesthetic realtime dashboard that lives on top of Graphite (but could be tailored to any back end, eventually).

Combining D3’s immense capabilities of managing live data, and Backbone’s ease of development, Graphene provides a solution capable of displaying thousands upon thousands of datapoints in your dashboard, as well as presenting a very hackable project to build on and customize.

Is it chance that interest in graph databases and graph display mechanisms has gone up at the same time?

Functional thinking: Functional design patterns, Part 1

Filed under: Design,Functional Programming,Merging,Topic Map Software — Patrick Durusau @ 8:45 pm

Functional thinking: Functional design patterns, Part 1 – How patterns manifest in the functional world

Summary

Contrary to popular belief, design patterns exist in functional programming — but they sometimes differ from their object-oriented counterparts in appearance and behavior. In this installment of Functional thinking, Neal Ford looks at ways in which patterns manifest in the functional paradigm, illustrating how the solutions differ.

From the article:

Some contingents in the functional world claim that the concept of the design pattern is flawed and isn’t needed in functional programming. A case can be made for that view under a narrow definition of pattern — but that’s an argument more about semantics than use. The concept of a design pattern — a named, cataloged solution to a common problem — is alive and well. However, patterns sometimes take different guises under different paradigms. Because the building blocks and approaches to problems are different in the functional world, some of the traditional Gang of Four patterns (see Resources) disappear, while others preserve the problem but solve it radically differently. This installment and the next investigate some traditional design patterns and rethink them in a functional way.
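As a small illustration of that claim (my example, not Neal’s): in a language with first-class functions, a pattern like Strategy stops being a class hierarchy and becomes just a function you pass in. A minimal Python sketch:

```python
# Hypothetical illustration, not from Neal Ford's article: in an
# object-oriented design, varying a sort order usually means a family of
# comparator classes (the Strategy pattern). With first-class functions,
# the "pattern" is simply passing a function value.

def sort_names(names, key_fn):
    """Sort names using whatever key function the caller supplies."""
    return sorted(names, key=key_fn)

names = ["Durusau", "ford", "Loshin", "de Marzi"]

# Each "strategy" is just a function, no class hierarchy required.
print(sort_names(names, key_fn=str.lower))   # case-insensitive order
print(sort_names(names, key_fn=len))         # order by length
```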

A functional approach to topic maps lends a certain elegance to the management of merging questions. 😉

Comments?

Visualizing Social Network Changes

Filed under: Cypher,Heroku,Neo4j,Visualization — Patrick Durusau @ 8:44 pm

Visualizing Social Network Changes by Max De Marzi.

From the post:

Some relationships change over time. Think about your friends from high school, college, work, the city you used to live in, the ones that liked you ex- better, etc. When exploring a social network it is important that we understand not only the strength of the relationship now, but over time. We can use communication between people as a measure.

I ran into a visualization that explored how multiple parties were connected by communications in multiple projects. We’re going to reuse it to explore how multiple people interact with each other. So let’s make a network of 50 friends and connect them to each other multiple times. Think of it as people writing on your facebook wall.

Excellent example of using a graph database to visualize changes in a social network. Not sure if it would be robust enough to capture the social dynamics of your local high school but it might be worth a shot. 😉
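For anyone who wants to play with the same idea outside Neo4j first, here is a rough sketch of the kind of test data Max describes (50 friends, connected to each other multiple times over a year); the names and counts are placeholders of mine, not Max’s code:

```python
import random
from collections import Counter

# Build a toy social network: 50 friends, each pair possibly connected
# by several "wall post" events spread over 52 weeks.
friends = ["friend_%02d" % i for i in range(50)]

interactions = []  # (from, to, week) triples
for _ in range(2000):
    a, b = random.sample(friends, 2)
    week = random.randint(1, 52)
    interactions.append((a, b, week))

# Relationship strength between two people in a given week =
# number of interactions recorded for that week.
strength = Counter((a, b, week) for a, b, week in interactions)

# Example: how strongly is friend_00 connected to friend_01 over the year?
timeline = [strength[("friend_00", "friend_01", w)] for w in range(1, 53)]
print(timeline)
```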

Modesty as a Technology Elite attribute

Filed under: Design,Marketing,Programming — Patrick Durusau @ 8:44 pm

Modesty as a Technology Elite attribute by Vinnie Mirchandani.

From the post:

The core of my new book is about 12 attributes of what I describe as the industry’s elites. 12 adjectives – 3Es, 3Ms, 3Ps and 3Ss – made the cut: Elegant, Exponential, Efficient, Mobile, Maverick, Malleable, Physical, Paranoid, Pragmatic, Speedy, Social, Sustainable. (The TOC and links to excerpts are here). Each attribute has its own chapter – the first half has 5 to 7 cameo examples of that attribute, the second half is a fuller case study. So the Elegant chapter focuses on Human Centered Design, Google’s Doodles, Jonathan Ive of Apple, John Lasseter of Pixar and others, and the case study is Virgin America and how it has redefined the flying experience with technology.

One attribute I had on my long list was “Modest”. I had a case study identified, but struggled to find 5 to 7 cameos for the first half of the chapter. Let’s face it, it’s an elusive attribute in our industry where vendors send you press releases for every obscure award they qualify for:)

Curious, when was the last time you advised a client your product or approach wasn’t the best solution for their problem?

Isn’t that part of being modest?

If you are a customer, when was the last time you asked a vendor or even consultant to suggest an alternative solution, one that did not involve their product or services? If you have ever asked that question, how would you evaluate the answer you got?

Global Brain Institute

Filed under: Artificial Intelligence,Networks — Patrick Durusau @ 8:44 pm

Global Brain Institute

From the webpage (under development):

The Global Brain can be defined as the distributed intelligence emerging from the planetary network of people and machines—as supported by the Internet. The Global Brain Institute (GBI) was founded in January 2012 at the Vrije Universiteit Brussel to research this revolutionary phenomenon. The GBI grew out of the Global Brain Group, an international community of researchers founded in 1996.

Mission

Tim Berners-Lee’s breakthrough invention of the Web stems from a simple and easy way to link any kind of information, anywhere on Earth. Since then, the development of the web has been largely an erratic proliferation of mutually incompatible Web 2.0 technologies with no clear direction. This demands a new unified paradigm to facilitate their integration.

The Global Brain Institute intends to develop a theory of the global brain that would help us to understand and steer this on-going evolution towards ever-stronger interconnection between humans and machines. If successful, this would help us achieve a much higher level of distributed intelligence that would allow us to efficiently tackle global problems too complex for present approaches.

Objectives

  • Develop a theory of the Global Brain that may offer us a long-term vision of where our information society is heading.
  • Build a mathematical model and computer simulation of the structure and dynamics of the Global Brain.
  • Survey the most important developments in society and ICT that are likely to impact on the evolution of the Global Brain.
  • Compare these observations with the implications of the theory.
  • Investigate how both observed and theorized developments may contribute to the main indicators of globally intelligent organization:
    • education, democracy, freedom, peace, development, sustainability, well-being, etc.
  • Disseminate our understanding of the Global Brain towards a wider public, so as to make people aware of this impending revolution

Our approach

We see people, machines and software systems as agents that communicate via a complex network of communication links. Problems, questions or opportunities define challenges that may incite these agents to act.

Challenges that cannot be fully resolved by a single agent are normally propagated to one or more other agents, along the links in the network. These agents contribute their own expertise to resolving the challenge, and if necessary propagate the challenge further, until it is fully resolved. Thus, the skills and knowledge of the different agents are pooled into a collective intelligence much more powerful than the one of its individual members.

The propagation of challenges across the global network is a complex, self-organizing process, similar to the “spreading activation” that characterizes thinking in the human brain. This process will typically change the network by reinforcing useful links, while weakening the others. Thus, the network learns or adapts to new challenges, becoming more intelligent in the process.

Sounds to me like there are going to be subject identity issues galore in a project such as this one.

BI’s Dirty Secrets – The Unfortunate Domination of Manually-Coded Extracts

Filed under: BI,Documentation — Patrick Durusau @ 8:44 pm

BI’s Dirty Secrets – The Unfortunate Domination of Manually-Coded Extracts by Rick Sherman.

From the post:

Manually-coded extracts are another dirty secret of the BI world. I’ve been seeing them for years, in both large and small companies. They grow haphazardly and are never documented, which practically guarantees that they will become an IT nightmare.

How have manually-coded extracts become so prevalent? It’s not as if there aren’t enough data integration tools around, including ETL tools. Even large enterprises that use the correct tools to load their enterprise data warehouses will often resort to manually-coded extracts to load their downstream BI data sources such as data marts, OLAP cubes, reporting databases and spreadsheets.

I thought the following passage was particularly good:

….Tools are easy; concepts are harder. Anyone can start coding; it’s a lot harder to actually architect and design. Tool vendors don’t help this situation when they promote tools that “solve world hunger” and limit training to the tool, not any concepts.

I don’t see manual coding as a problem, so long as it is documented. There should be one and only one penalty for lack of documentation. Termination.

Lack of documentation can put critical IT systems at risk, and producing documentation doesn’t require complex systems. Even (gasp) MS Word documents that are maintained with a table of contents and indexes can be adequate documentation.

Not the same as a bug database with bug reports, patches, pointers to code, email discussions, meeting minutes, etc., but interactive production of graphs and charts isn’t a requirement for successful documentation.

Undocumented manually-coded extracts are a sign that the requirements of BI users are not being met. Getting them documented and incorporated into BI tools looks like a good first step toward solving this secret.

Visual Studio Toolbox: Dependency Graphs

Filed under: Dependency Graphs,Programming — Patrick Durusau @ 8:43 pm

Visual Studio Toolbox: Dependency Graphs by Robert Green.

From the description:

In this episode, Cameron Skinner joins us to talk about the enhanced dependency graphs in Visual Studio 11. Dependency graphs represent your application structures as nodes and the relationships in your application as links. Cameron shows us how these graphs help you better understand your software so you can most efficiently enhance and maintain it.

I don’t have Visual Studio 11, so I will have to rely on the comments of others about it.

However, it does sound quite useful.

I mention it here because I wonder what it would be like to have a dependency graph across applications. Say to show the libraries or methods that you used across different applications. Could be very useful when a bug is found in a library to quickly isolate all the applications where that library was used. Yes? (Or even methods.)
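As a sketch of what that cross-application lookup might look like, here is a hand-built dependency graph standing in for whatever a tool like Visual Studio (or a build system) would actually export; every application and library name below is hypothetical:

```python
# Hypothetical dependency data: application -> libraries it uses.
# In practice this would be exported from a build system or a
# dependency-graph tool, not typed in by hand.
app_dependencies = {
    "billing-service": {"log4net", "json-parser", "crypto-lib"},
    "customer-portal": {"json-parser", "ui-toolkit"},
    "reporting-batch": {"crypto-lib", "pdf-writer"},
}

def apps_using(library):
    """Invert the graph: which applications depend on a given library?"""
    return sorted(app for app, libs in app_dependencies.items()
                  if library in libs)

# A bug is found in crypto-lib: which applications need patching?
print(apps_using("crypto-lib"))   # ['billing-service', 'reporting-batch']
```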

March 8, 2012

Fast and slow visualization

Filed under: Graphics,Maps,Visualization — Patrick Durusau @ 9:44 pm

Fast and slow visualization by Nathan Yau.

Nathan points out:

James Cheshire ponders the difference between fast and slow thinking maps, and the dying breed of the latter.

I wondered in Is That A Graph In Your Cray? whether “interactivity” with data is a real requirement, in the sense of one that makes sense for every analytical project.

Take the alleged role of “Twitter” in the Arab Spring, a “conclusion” driven by easy and superficial access to data.

Deeper, slower analysis has since established that trade unions had been creating social networks (the traditional kind) for years, so when Twitter went dark, so what? The traditional social networks were in place and far more robust than a technological marvel available mostly to Western reporters.

Of course, the data “analysts” who touted the Twitter conclusion have gone on to other superficial analyses and conclusions, probably in the US election cycle by this point. At least they are not polluting international events for the moment, or perhaps not as much.

Graph Reading Club: 7 March 2012

Filed under: Graph Reading Club,Graphs — Patrick Durusau @ 9:44 pm

Related work of the Reading club on distributed graph data bases (Beehive, Scalable SPARQL Querying of Large RDF Graphs, memcached) by René Pickhardt.

You know, I really need to talk to René about his blog titles. 😉

René gives the reading assignment for next week and summarizes the discussion from this week, along with some housekeeping details.

Thanks René!

Hard science, soft science, hardware, software

Filed under: Science — Patrick Durusau @ 9:43 pm

Hard science, soft science, hardware, software by John D. Cook.

The post starts:

The hard sciences — physics, chemistry, astronomy, etc. — boasted remarkable achievements in the 20th century. The credibility and prestige of all science went up as a result. Academic disciplines outside the sciences rushed to append “science” to their names to share in the glory.

Science has an image of infallibility based on the success of the hard sciences. When someone says “You can’t argue with science,” I’d rather they said “It’s difficult to argue with hard science.”

Read on….

I think…, well, you decide on John’s basic point for yourself.

Personally I think the world is complicated, historically, linguistically, semantically, theologically, etc. I am much happier searching in hopes of answers that seem adequate for the moment, as opposed to seeking certitudes, particularly for others.

Kartograph

Filed under: Graphs,Visualization — Patrick Durusau @ 8:50 pm

Kartograph

From the post:

La Bella Italia

This example map of Italy showcases some ways you can style Kartograph maps. The base colors are defined in CSS, glow filters and image textures are applied in SVG.

Also you can see heavy use of symbols (labels and icons) as well as geo paths. The labels are set in the Aquiline font, created by Manfred Klein.

Notice the toggle layer visibility.

Now imagine that the layers were overlapping graph displays of heterogeneous information resources, where you can increase or decrease the “noise” in the image.

Twitter Current English Lexicon

Filed under: Dataset,Lexicon,Tweets — Patrick Durusau @ 8:50 pm

Twitter Current English Lexicon

From the description:

Twitter Current English Lexicon: Based on the Twitter Stratified Random Sample Corpus, we regularly extract the Twitter Current English Lexicon. Basically, we’re 1) pulling all tweets from the last three months of corpus entries that have been marked as “English” by the collection process (we have to make that call because there is no reliable means provided by Twitter), 2) removing all #hash, @at, and http items, 3) breaking the tweets into tokens, 4) building descriptive and summary statistics for all token-based 1-grams and 2-grams, and 5) pushing the top 10,000 N-grams from each set into a database and text files for review. So, for every top 1-gram and 2-gram, you know how many times it occurred in the corpus, and in how many tweets (plus associated percentages).

This is an interesting set of data, particularly when you compare it with a “regular” English corpus, something traditional like the Brown Corpus. Unlike most corpora, the top token (1-gram) for Twitter is “i” (as in me, myself, and I), there are a lot of intentional misspellings, and you find an undue amount of, shall we say, “callus” language (be forewarned). It’s a brave new world if you’re willing.

To use this data set, we recommend using the database version and KwicData, but you can also use the text version. Download the ZIP file you want, unzip it, then read the README file for more explanation about what’s included.

I grabbed a copy yesterday but haven’t had the time to look at it.

Twitter feed pipeline software you would recommend?
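For anyone wanting to replicate the general idea on their own tweet collection, here is a rough sketch of steps 2–4 of the description above (strip #hash, @at and http items, tokenize, count 1-grams and 2-grams). It is my reconstruction of the process as described, not the collectors’ actual code:

```python
import re
from collections import Counter

def tokens(tweet):
    """Drop #hashtags, @mentions and http links, then split into lowercase tokens."""
    cleaned = re.sub(r"(#\w+|@\w+|https?://\S+)", " ", tweet.lower())
    return re.findall(r"[a-z']+", cleaned)

def ngram_counts(tweets):
    """Build 1-gram and 2-gram frequency counts over a tweet collection."""
    unigrams, bigrams = Counter(), Counter()
    for tweet in tweets:
        toks = tokens(tweet)
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

sample = [
    "i love pizza #foodie http://example.com",
    "@bob i can't even",
]
uni, bi = ngram_counts(sample)
print(uni.most_common(5))
print(bi.most_common(5))
```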

Stash

Filed under: Cache Invalidation,node-js,Redis — Patrick Durusau @ 8:49 pm

Stash by Nate Kohari.

From the post:

Stash is a graph-based cache for Node.js powered by Redis.

Warning! Stash is just a mental exercise at this point. Feedback is very much appreciated, but using it in production may cause you to contract ebola or result in global thermonuclear war.

Overview

“There are only two hard things in computer science: cache invalidation and naming things.”
— Phil Karlton

One of the most difficult parts about caching is managing dependencies between cache entries. In order to reap the benefits of caching, you typically have to denormalize the data that’s stored in the cache. Since data from child items is then stored within parent items, it can be challenging to figure out what entries to invalidate in the cache in response to changes in data.

As Nate says, a thought experiment but an interesting one.

From a topic map perspective, I don’t know that I would consider cache invalidation and naming things as two distinct problems. Or rather, the same problem under different constraints.

If you don’t think “cache invalidation” is related to naming, what sort of problem is it when a person’s name changes upon marriage? Isn’t a stored record “cached”? It may not be a cache in the sense of the cache in an online service or chip, but those are the special cases, aren’t they?
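Here is a minimal sketch of the dependency-tracking idea in Python. Stash itself is Node.js on top of Redis; this only illustrates the concept of invalidating denormalized parents when a child entry changes:

```python
class DependentCache:
    """Toy cache that remembers which entries were built from which others,
    so invalidating a child also invalidates every parent derived from it."""

    def __init__(self):
        self.values = {}    # key -> cached value
        self.parents = {}   # child key -> set of parent keys built from it

    def set(self, key, value, depends_on=()):
        self.values[key] = value
        for child in depends_on:
            self.parents.setdefault(child, set()).add(key)

    def invalidate(self, key):
        self.values.pop(key, None)
        for parent in self.parents.pop(key, set()):
            self.invalidate(parent)   # cascade up the dependency graph

cache = DependentCache()
cache.set("user:1", {"name": "Sally"})
cache.set("page:home", "<html>...</html>", depends_on=["user:1"])

cache.invalidate("user:1")            # Sally's record changes...
print("page:home" in cache.values)    # ...so the denormalized page goes too: False
```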

Induction

Filed under: Database,Query Language,Visualization — Patrick Durusau @ 8:49 pm

Induction: A Polyglot Database Client for Mac OS X

From the post:

Explore, Query, Visualize

Focus on the data, not the database. Induction is a new kind of tool designed for understanding and communicating relationships in data. Explore rows and columns, query to get exactly what you want, and visualize that data in powerful ways.

SQL? NoSQL? It Don’t Matter

Data is just data, after all. Induction supports PostgreSQL, MySQL, SQLite, Redis, and MongoDB out-of-the-box, and has an extensible architecture that makes it easy to write adapters for anything else you can think of. CouchDB? Oracle? Facebook Graph? Excel? Make it so!

Some commercial advice for the Induction crew:

Sounds great!

Be aware that Excel controls 75% of the BI market. I don’t know the numbers for Oracle products generally, but suspect “enterprise” and “Oracle” are most often heard together. I would make those “out of the box” even before 1.0.

If this is visualization, integration can’t be far behind.

On the Power of HBase Filters

Filed under: BigData,Filters,HBase — Patrick Durusau @ 8:49 pm

On the Power of HBase Filters

From the post:

Filters are a powerful feature of HBase to delegate the selection of rows to the servers rather than moving rows to the Client. We present the filtering mechanism as an illustration of the general data locality principle and compare it to the traditional select-and-project data access pattern.

Dealing with massive amounts of data changes the way you think about data processing tasks. In a standard business application context, people use a Relational Database System (RDBMS) and consider this system as a service in charge of providing data to the client application. How this data is processed, manipulated, shown to the user, is considered to be the full responsibility of the application. In other words, the role of the data server is restricted to what it does best: efficient, safe and consistent storage and access.

The post goes on to observe:

When you deal with BigData, the data center is your computer.

True, but that isn’t the lesson I would draw from HBase Filters.

The lesson I would draw is: it is only big data until you can find the relevant data.

I may have to sift several haystacks of data but at the end of the day I want the name, photo, location, target, time frame for any particular evil-doer. That “big data” was part of the process is a fact, not a goal. Yes?
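As a rough illustration of the data locality point, this is what server-side filtering looks like from the happybase Python client, using HBase’s Thrift filter-string syntax; treat the table and column names as hypothetical, and check the filter grammar against your HBase version:

```python
import happybase

# Without a filter, every row travels to the client and is discarded there.
# With a server-side filter, only matching rows cross the wire.
connection = happybase.Connection("localhost")
table = connection.table("events")   # hypothetical table name

# Server-side selection: only rows whose info:type column equals 'error'
# are returned; the scan itself runs on the region servers.
error_filter = "SingleColumnValueFilter('info', 'type', =, 'binary:error')"
for row_key, data in table.scan(filter=error_filter):
    print(row_key, data)
```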

History of Information Organization (Infographic)

Filed under: Information Overload,Information Retrieval,Information Science — Patrick Durusau @ 8:49 pm

From Cartography to Card Catalogs [Infographic]: History of Information Organization

Mindjet has posted an infographic and blog post about the history of information organization. I have embedded the graphic below.

Let me preface my remarks by saying I have known people at Mindjet and it is a fairly remarkable organization. And to be fair, the history of information organization is of interest to me, although I am far from being a specialist in the field.

However, when a graphic jumps from “850 CE The First Byzantine Encyclopedia” to “1276 CE Oldest Continuously Functioning Library” and informs the reader, on the edge in between, that this was “3,000 years ago,” it seems to be lacking in precision or proofing, perhaps both.

Although information has to be summarized for such a presentation, I thought the rise of writing in Egypt/Sumeria would have merited a note, perhaps along with the library of Ashurbanipal (the first library of the ancient Middle East) or the Library of Alexandria, just to name two. Note that you would have to go back before Ashurbanipal to reach 3,000 years ago, and there were written texts and collections of such texts for anywhere from 2,000 to 3,000 years before that.

I do appreciate that Mindjet doesn’t think information issues arose with the digital computer. I am hopeful that they will encourage a re-examination of older methods and solutions in hopes of finding clues to new solutions.

Tuple MapReduce: beyond the classic MapReduce

Filed under: MapReduce,Tuple MapReduce,Tuple-Join MapReduce,Tuples — Patrick Durusau @ 8:49 pm

Tuple MapReduce: beyond the classic MapReduce by Pere Ferrera Bertran.

From the post:

In this post we’ll review the MapReduce model proposed by Google in 2004 and propound another one called Tuple MapReduce. We’ll see that this new model is a generalization of the first and we’ll explain what advantages it has to offer. We’ll provide a practical example and conclude by discussing when the implementation of Tuple MapReduce is advisable.

In the conclusion:

In this post we have presented a new MapReduce model, Tuple MapReduce, and we have shown its benefits and virtues. We have generalized it in order to allow joins between different data sources (Tuple-Join MapReduce). We have noted that it allows the same things to be done as the MapReduce we already know, while making it much simpler to learn and use.

We believe that an implementation of Tuple MapReduce would be advisable and that it could act as a replacement for the original MapReduce. This implementation, instead of being comparable to existing high-level tools that have been created on top of MapReduce, would be comparable in efficiency to current implementations of MapReduce.

The post promises open source code in the near future.

I have to admit to being interested even without working code but that would quickly change to excitement upon successful testing of Tuple-Join MapReduce. Quite definitely the sort of mapping exercise that needs a standardized mapping language. 😉
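While we wait for the code, a back-of-the-envelope sketch in Python helps to see why tuples simplify things: map emits whole tuples, the framework groups on a tuple prefix, and reduce sees complete tuples rather than opaque values. This is my reading of the post, not the authors’ implementation:

```python
from collections import defaultdict

# Input: (store, product, amount) sales records.
records = [
    ("berlin", "pizza", 10),
    ("berlin", "pasta", 5),
    ("madrid", "pizza", 7),
]

def map_fn(record):
    store, product, amount = record
    # Emit a full tuple; the first field acts as the group-by key.
    yield (store, product, amount)

def reduce_fn(store, tuples):
    # Reduce sees complete tuples for a store, not opaque values.
    total = sum(amount for _, _, amount in tuples)
    yield (store, total)

groups = defaultdict(list)
for record in records:
    for t in map_fn(record):
        groups[t[0]].append(t)

for store in sorted(groups):
    print(list(reduce_fn(store, groups[store])))   # [('berlin', 15)] then [('madrid', 7)]
```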

March 7, 2012

NoSQL Matters 2012 – Speakers

Filed under: Conferences,NoSQL — Patrick Durusau @ 5:43 pm

NoSQL Matters 2012 – Speakers

NoSQL Matters – Cologne, Germany – May 29-30, 2012.

Rather than run the risk of playing favorites, I have listed all the speakers for the conference. Even one or two of them would make the conference worth attending. To have all of them together makes this a must-attend conference!

From the webpage:

Key-Note

  • Luca Garulli – From Values to Documents, from Relations to Graphs – A Survey and Guide through the unexhausted areas of NoSQL
  • Doug Judd – Scaling in a Non-Relational World

Overview

  • Dirk Bartels – NoSQL. A Technology for Real Time Enterprise Applications?
  • Pavlo Baron – DistributedDB (Playfully Illustrated)
  • Peter Idestam-Almquist – NewSQL Database for New Real-Time Applications
  • Tim Lossen – From MySQL to NoSQL to „Nothing“
  • Daniel McGrath – Rocket U2 Databases & The MultiValue Model
  • Martin Scholl – NoSQL: Back to the Future or It Is Simply Yet Another Database Feature?

Specific Databases

  • Jonathan Ellis – Apache Cassandra: Real-World Scalability, Today
  • Muharem Hrnjadovic – MongoDB Sharding
  • Doug Judd – Hypertable
  • Jan Lehnardt – The No-Marketing Bullshit Introduction to Couchbase Server 2.0
  • Mathias Meyer – RIAK
  • Salvatore Sanfilippo – Redis
  • Martin Schönert – AvocadoDB

Graph

  • Luca Garulli – Design your Application Using Persistent Graphs and OrientDB
  • Peter Neubauer – Neo4J, Gremlin, Cypher: Graph Processing for All
  • Pere Urbon-Bayes – From Tables to Graph. Recommendation Systems, a Graph Database Use Case Analysis

Application

  • Timo Derstappen – NoSQL: Not Only a Fairy Tale
  • Chris Harris – Building Hybrid Applications with MongoDB, RDBMS & Hadoop
  • Alex Morgner – structr – A CMS Implementation Based On a Graph Database

Other

  • Olaf Bachman – NoNoSQL@Google
  • Matt Casters – Crazy NoSQL Data Integration with Pentaho
  • Vincent Delfosse – UML As a Schema Candidate for NoSql
  • Oliver Gierke – Data Access 2.0? Please Welcome: Spring Data!
  • Alexandre Morgaut – Wakanda: NoSQL for Model-Driven Web Applications
  • Bernd Ocklin – MySQL Cluster: The Realtime Database You Haven’t Heard About

Datomic

Filed under: Data,Database,Datomic — Patrick Durusau @ 5:43 pm

Alex Popescu (myNoSQL) has a couple of posts on resources for Datomic.

Intro Videos to Datomic and Datomic Datalog

and,

Datomic: Distributed Database Designed to Enable Scalable, Flexible and Intelligent Applications, Running on Next-Generation Cloud Architectures

I commend all the materials you will find there, but the white paper in particular, which has the following section:

ATOMIC DATA – THE DATOM

Once you are storing facts, it becomes imperative to choose an appropriate granularity for facts. If you want to record the fact that Sally likes pizza, how best to do so? Most databases require you to update either the Sally record or document, or the set of foods liked by Sally, or the set of likers of pizza. These kind of representational issues complicate and rigidify applications using relational and document models. This can be avoided by recording facts as independent atoms of information. Datomic calls such atomic facts ‘datoms‘. A datom consists of an entity, attribute, value and transaction (time). In this way, any of those sets can be discovered via query, without embedding them into a structural storage model that must be known by applications.

In some views of granularity, the datom “atom” looks like a four-atom molecule to me. 😉 Not to mention that entities/attributes and values can have relationships that don’t involve each other.
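To make the granularity point concrete, here is how the Sally/pizza example might look as entity–attribute–value–transaction tuples, queried from either direction. This is a sketch in plain Python, not Datomic’s actual Datalog:

```python
# Each fact is an independent (entity, attribute, value, transaction) tuple.
datoms = [
    ("sally", "likes", "pizza", 1001),
    ("sally", "likes", "sushi", 1002),
    ("fred",  "likes", "pizza", 1003),
]

def query(entity=None, attribute=None, value=None):
    """Match datoms against any combination of fields."""
    return [d for d in datoms
            if (entity is None or d[0] == entity)
            and (attribute is None or d[1] == attribute)
            and (value is None or d[2] == value)]

# Foods liked by Sally, and likers of pizza, with no structural bias
# toward either the "Sally record" or the "pizza record":
print([v for _, _, v, _ in query(entity="sally", attribute="likes")])
print([e for e, _, _, _ in query(attribute="likes", value="pizza")])
```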

Batch Importer – Neo4j

Filed under: CSV,Neo4j,SQL — Patrick Durusau @ 5:43 pm

By Max De Marzi.

From part 1:

Data is everywhere… all around us, but sometimes the medium it is stored in can be a problem when analyzing it. Chances are you have a ton of data sitting around in a relational database in your current application… or you have begged, borrowed or scraped to get the data from somewhere and now you want to use Neo4j to find how this data is related.

Batch Importer – Part 1: CSV files.

Batch Importer – Part 2: Use of SQL to prepare files for import.

What other importers would you need for Neo4j? Or would you use CSV as a target format for loading into Neo4j?
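If your data is sitting in a relational table, a few lines of Python get you to the tab-separated nodes/relationships files the batch importer expects. The column layout below follows my reading of Max’s posts and should be checked against them before use:

```python
import csv

# Hypothetical source data pulled from a relational database.
people = [(1, "Alice"), (2, "Bob")]
friendships = [(1, 2)]

# nodes.csv: one row per node, tab-separated, header = property names.
with open("nodes.csv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["name"])
    for _, name in people:
        w.writerow([name])

# rels.csv: start node, end node, relationship type. The start/end values
# refer to node positions in nodes.csv; check the importer's docs for
# whether counting begins at 0 or 1.
with open("rels.csv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["start", "end", "type"])
    for a, b in friendships:
        w.writerow([a, b, "FRIEND_OF"])
```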

JavaScript Console and Excel Coming to Hadoop

Filed under: Excel,Hadoop,Javascript — Patrick Durusau @ 5:42 pm

JavaScript Console and Excel Coming to Hadoop

Alex Popescu (myNoSQL) has pointers to news of Hadoop on Windows Azure. This opens Hadoop up to JavaScript developers and Excel/PowerPivot users.

Alex captures the winning strategy for new technologies when he says:

Think of integration with familiar tools and frameworks as a huge adoption accelerator.

What would it look like to add configurable merging on PowerPivot? (I may have to get a copy of MS Office 2010.)

Mathics

Filed under: Mathematics,Mathics — Patrick Durusau @ 5:42 pm

Mathics

From the website:

Mathics is a free, general-purpose online computer algebra system featuring Mathematica-compatible syntax and functions. It is backed by highly extensible Python code, relying on SymPy for most mathematical tasks and, optionally, Sage for more advanced stuff.

A general mathematics package that describes some of its needs as follows:

Apart from performance issues, new features like 3D graphics and more functions in various mathematical fields like calculus, number theory, or graph theory are still to be added. (http://www.mathics.net/doc/manual/introduction/what-is-missing/)

As you explore graphs and other structures, you might want to consider contributing to this project.

JgraphT

Filed under: Graphs,JgraphT — Patrick Durusau @ 5:42 pm

JgraphT

From the webpage:

JGraphT is a free Java graph library that provides mathematical graph-theory objects and algorithms. JGraphT supports various types of graphs including:

  • directed and undirected graphs.
  • graphs with weighted / unweighted / labeled or any
    user-defined edges.
  • various edge multiplicity options, including: simple-graphs,
    multigraphs, pseudographs.
  • unmodifiable graphs – allow modules to provide
    "read-only" access to internal graphs.
  • listenable graphs – allow external listeners to track modification events.
  • subgraphs – graphs that are auto-updating subgraph views on
    other graphs.
  • all compositions of above graphs.

Although powerful, JGraphT is designed to be simple and type-safe (via Java generics). For example, graph vertices can be of any objects. You can create graphs based on: Strings, URLs, XML documents, etc; you can even create graphs of graphs! This code example shows how.

Other features offered by JGraphT:

Suggestions of other graph exploration software?

python-graph

Filed under: Graphs,Python-Graph — Patrick Durusau @ 5:41 pm

python-graph by Pedro Matiello.

From the webpage:

python-graph is a library for working with graphs in Python.

This software provides a suitable data structure for representing graphs and a whole set of important algorithms.

…..

Provided features and algorithms:

  • Support for directed, undirected, weighted and non-weighted graphs
  • Support for hypergraphs
  • Canonical operations
  • XML import and export
  • DOT-Language import and export (for usage with Graphviz)
  • Random graph generation
  • Accessibility (transitive closure)
  • Breadth-first search
  • Critical path algorithm
  • Cut-vertex and cut-edge identification
  • Cycle detection
  • Depth-first search
  • Gomory-Hu cut-tree algorithm
  • Heuristic search (A* algorithm)
  • Identification of connected components
  • Maximum-flow / Minimum-cut (Edmonds-Karp algorithm)
  • Minimum spanning tree (Prim's algorithm)
  • Mutual-accessibility (strongly connected components)
  • Pagerank algorithm
  • Shortest path search (Dijkstra's algorithm)
  • Shortest path search (Bellman-Ford algorithm)
  • Topological sorting
  • Transitive edge identification

Python package to help you explore graphs and algorithms on graphs.
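A small taste of the API, for orientation; this assumes the pygraph package layout current at the time, so check the project docs before relying on it:

```python
from pygraph.classes.graph import graph
from pygraph.algorithms.searching import breadth_first_search

# Build a small undirected graph and walk it breadth-first.
gr = graph()
gr.add_nodes(["a", "b", "c", "d"])
gr.add_edge(("a", "b"))
gr.add_edge(("b", "c"))
gr.add_edge(("c", "d"))

spanning_tree, ordering = breadth_first_search(gr, root="a")
print(ordering)   # nodes in the order they were visited
```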

py2neo 1.1.0

Filed under: Neo4j,py2neo — Patrick Durusau @ 5:41 pm

py2neo 1.1.0 – Python bindings to Neo4j by Nigel Small.

From the webpage:

The py2neo project provides bindings between Python and Neo4j via its RESTful web service interface. It attempts to be both Pythonic and consistent with the core Neo4j API and is compatible with Python 3.
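A minimal sketch of what that looks like from Python, based on my reading of the py2neo 1.x documentation. The server URL and data are placeholders, a Neo4j server must be running, and the exact method names changed between early py2neo releases, so treat this as the flavour rather than version-accurate code:

```python
from py2neo import neo4j

# Connect to a running Neo4j server via its REST interface.
graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# Create two nodes and a relationship between them in one batch call;
# the tuple refers to the nodes by their position in the call.
alice, bob, knows = graph_db.create(
    {"name": "Alice"},     # node 0
    {"name": "Bob"},       # node 1
    (0, "KNOWS", 1),       # relationship between nodes 0 and 1
)
print(alice.get_properties(), bob.get_properties())
```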

elasticsearch: Search made easy for (web) developers

Filed under: ElasticSearch — Patrick Durusau @ 5:41 pm

elasticsearch: Search made easy for (web) developers by Alexander Reelsen.

Nothing particularly new or startling but does cover elasticsearch from a perspective that may be of interest to web developers.

I don’t think search is “easy” for web developers or any other community. Sometimes, however, tools don’t get in the way as much as others.
