Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 22, 2012

Teiid (8.2 Final Released!) [Component for TM System]

Filed under: Data Integration,Federation,Information Integration,JDBC,SQL,Teiid,XQuery — Patrick Durusau @ 11:16 am

Teiid

From the homepage:

Teiid is a data virtualization system that allows applications to use data from multiple, heterogenous data stores.

Teiid is comprised of tools, components and services for creating and executing bi-directional data services. Through abstraction and federation, data is accessed and integrated in real-time across distributed data sources without copying or otherwise moving data from its system of record.

Teiid Parts

  • Query Engine: The heart of Teiid is a high-performance query engine that processes relational, XML, XQuery and procedural queries from federated datasources. Features include support for homogenous schemas, heterogenous schemas, transactions, and user defined functions.
  • Embedded: An easy-to-use JDBC Driver that can embed the Query Engine in any Java application. (as of 7.0 this is not supported, but on the roadmap for future releases)
  • Server: An enterprise ready, scalable, manageable runtime for the Query Engine that runs inside JBoss AS and provides additional security, fault-tolerance, and administrative features.
  • Connectors: Teiid includes a rich set of Translators and Resource Adapters that enable access to a variety of sources, including most relational databases, web services, text files, and LDAP. Need data from a different source? Custom translators and resource adapters can easily be developed.
  • Tools:
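
To make the "federation without copying" idea in the homepage description concrete, here is a minimal sketch in Python. It is not Teiid and does not use any Teiid API: two independent in-memory SQLite databases stand in for separate systems of record, and the table names, sample rows, and the `federated_order_report` function are all hypothetical. The join happens at query time, and neither store receives a copy of the other's data.

```python
import sqlite3


def make_store(ddl, insert_sql, rows):
    """Create an independent in-memory SQLite database (one 'system of record')."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two separate data stores; neither knows about the other. (Hypothetical data.)
customers_db = make_store(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
orders_db = make_store(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.5)],
)


def federated_order_report(customers_conn, orders_conn):
    """Join rows from both stores at query time; nothing is written to either."""
    # Look up customer names in the first store.
    names = dict(customers_conn.execute("SELECT id, name FROM customers"))
    report = []
    # Stream orders from the second store and attach the matching name.
    for order_id, customer_id, total in orders_conn.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id"
    ):
        report.append((order_id, names.get(customer_id), total))
    return report


report = federated_order_report(customers_db, orders_db)
for line in report:
    print(line)
```

A real virtualization layer would also push predicates down to each source and plan the join; this sketch only shows the shape of the idea.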

Teiid 8.2 final was released on November 20, 2012.

Like most integration services, Teiid is not strong on integration between integration services.

Would make one helluva component for a topic map system.

That is, a system with a mapping layer between integration solutions, in addition to the capabilities of Teiid.

June 22, 2012

Wanted: Your Help in Testing Neo4j JDBC Driver

Filed under: JDBC,Neo4j — Patrick Durusau @ 4:00 pm

Wanted: Your Help in Testing Neo4j JDBC Driver

The Neo4j team requests your assistance in testing the Neo4j JDBC driver.

Or at least you will find that request if you jump down to “Next stop: Public Testing” in that post. (Single topic posts would make pointing to such calls easier.)

A good opportunity to do some testing (over the weekend) and contribute to the project.

Instructions are given.

December 15, 2011

Neo4j JDBC driver

Filed under: JDBC,Neo4j — Patrick Durusau @ 7:51 pm

Neo4j JDBC driver

From the webpage:

This is a first attempt at creating a JDBC driver for the graph database Neo4j. While Neo4j is a graph database, and JDBC is based on the relational paradigm, this driver provides a way to bridge this gap.

This is done by introducing type nodes in the graph, which are directly related to the root node by the relationship TYPE. Each type node has a property “type” with its name (i.e. “tablename”), and HAS_PROPERTY relationships to nodes that represent the properties that the node can have (i.e. “columns”). For each instance of this type (i.e. “row”) there is a relationship from the instance to the type node via the IS_A relationship. By using this structure the JDBC driver can mimic a relational database, and provide a means to execute queries against the Neo4j server.
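
The structure described in that excerpt can be sketched with a small in-memory graph in Python. This is only an illustration of the type-node layout, not the driver's code: the `Node` and `Graph` classes, the helper functions, the direction of the TYPE relationship (root to type node), and the `"name"` key on property nodes are all assumptions made for the example. Only the relationship names TYPE, HAS_PROPERTY and IS_A and the `"type"` property come from the quoted description.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    props: dict = field(default_factory=dict)


class Graph:
    """Tiny in-memory graph: a root node plus a list of labeled edges."""

    def __init__(self):
        self.root = Node()
        self.edges = []  # (source node, relationship name, target node)

    def relate(self, source, rel, target):
        self.edges.append((source, rel, target))

    def targets(self, source, rel):
        return [t for s, r, t in self.edges if s is source and r == rel]

    def sources(self, rel, target):
        return [s for s, r, t in self.edges if r == rel and t is target]


def add_type(graph, name, columns):
    """Create a type node ('table') with one property node per 'column'."""
    type_node = Node({"type": name})
    graph.relate(graph.root, "TYPE", type_node)  # direction is an assumption
    for column in columns:
        graph.relate(type_node, "HAS_PROPERTY", Node({"name": column}))
    return type_node


def add_instance(graph, type_node, **values):
    """Create an instance node ('row') pointing at its type via IS_A."""
    instance = Node(dict(values))
    graph.relate(instance, "IS_A", type_node)
    return instance


def select_all(graph, table):
    """Mimic SELECT * FROM table by walking TYPE, HAS_PROPERTY and IS_A."""
    type_node = next(
        t for t in graph.targets(graph.root, "TYPE") if t.props["type"] == table
    )
    columns = [p.props["name"] for p in graph.targets(type_node, "HAS_PROPERTY")]
    rows = [
        tuple(inst.props.get(c) for c in columns)
        for inst in graph.sources("IS_A", type_node)
    ]
    return columns, rows


g = Graph()
person = add_type(g, "Person", ["name", "age"])
add_instance(g, person, name="Ada", age=36)
add_instance(g, person, name="Grace")  # missing 'age' reads back as None
columns, rows = select_all(g, "Person")
print(columns, rows)
```

An instance that lacks one of the declared properties comes back with `None` in that position, which is roughly how a NULL column would look to a relational client.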

Now that isn’t something you see every day! 😉

What if there were a GrJDBC driver, a Graph JDBC driver, one that views tables, rows, columns, column headers, cells, and values as graph nodes with defined properties? It would read from a configuration file that identifies some database:table.
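
As a rough sketch of what such a read-only view might produce, the Python below walks one relational table and emits graph nodes for the table, its columns, its rows, and its cells. No GrJDBC driver exists here; `table_as_graph`, the node id scheme, and the relationship names HAS_COLUMN, HAS_ROW, HAS_CELL and IN_COLUMN are invented for illustration, SQLite stands in for the relational database, and the configuration file is replaced by plain function arguments.

```python
import sqlite3


def table_as_graph(conn, table):
    """Return (nodes, edges) describing one relational table as a graph.

    The table is only read, never modified. The table name is interpolated
    into the SQL text, so it must come from trusted configuration.
    """
    nodes = {}  # node id -> properties
    edges = []  # (source id, relationship, target id)

    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    columns = [d[0] for d in cursor.description]

    table_id = f"table:{table}"
    nodes[table_id] = {"kind": "table", "name": table}

    # One node per column header.
    for col in columns:
        col_id = f"column:{table}.{col}"
        nodes[col_id] = {"kind": "column", "name": col}
        edges.append((table_id, "HAS_COLUMN", col_id))

    # One node per row, and one node per cell holding the value.
    for n, row in enumerate(cursor):
        row_id = f"row:{table}#{n}"
        nodes[row_id] = {"kind": "row"}
        edges.append((table_id, "HAS_ROW", row_id))
        for col, value in zip(columns, row):
            cell_id = f"cell:{table}#{n}.{col}"
            nodes[cell_id] = {"kind": "cell", "value": value}
            edges.append((row_id, "HAS_CELL", cell_id))
            edges.append((cell_id, "IN_COLUMN", f"column:{table}.{col}"))

    return nodes, edges


# Hypothetical sample data standing in for a client's existing table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)", [("Ada", "London"), ("Grace", "New York")]
)

nodes, edges = table_as_graph(conn, "people")
print(len(nodes), "nodes,", len(edges), "edges")
```

Because nothing is written back, a view like this could be pointed at existing data without any conversion step.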

Extending the recovery of investment in large relational clusters by endowing them with graph-like capabilities (dare I say topic map like capabilities?) would be a real plus in favor of adoption. Not to mention that in read-only mode, you could demonstrate it with the client’s data.

Contrast that with all the stammering from your competition about the need to convert, etc.

I will poke around because it seems like something like that has been done but it was a long time ago. I seem to remember it wasn’t a driver but a relational database built as a graph. The same principles should apply. If I find it I will post a link (if online) or a citation to the hard copy.

December 2, 2011

Cassandra Drivers

Filed under: Cassandra,JDBC,Python — Patrick Durusau @ 4:50 pm

Cassandra Drivers

No sooner do I write about the new release of Cassandra than I see a post about Python DB and JDBC drivers for Cassandra!

Enjoy!

November 20, 2011

Jeff Hammerbacher on Experiences Evolving a New Analytical Platform

Filed under: Crunch,Dremel,Dryad,Flume,Giraph,HBase,HDFS,Hive,JDBC,MapReduce,ODBC,Oozie,Pregel — Patrick Durusau @ 4:21 pm

Jeff Hammerbacher on Experiences Evolving a New Analytical Platform

Slides from Jeff’s presentation and numerous references, including to a live blogging summary by Jeff Dalton.

In terms of the new analytical platform, I would strongly suggest that you take Cloudera’s substrate:

Cloudera starts with a substrate architecture of Open Compute commodity Linux servers configured using Puppet and Chef and coordinated using ZooKeeper. Naturally this entire stack is open-source. They use HDFS and Ceph to provide distributed, schema-less storage. They offer append-only table storage and metadata using Avro, RCFile, and HCatalog; and mutable table storage and metadata using HBase. For computation, they offer YARN (inter-job scheduling, like Grid Engine, for data intensive computing) and Mesos for cluster resource management; MapReduce, Hamster (MPI), Spark, Dryad / DryadLINQ, Pregel (Giraph), and Dremel as processing frameworks; and Crunch (like Google’s FlumeJava), PigLatin, HiveQL, and Oozie as high-level interfaces. Finally, Cloudera offers tool access through FUSE, JDBC, and ODBC; and data ingest through Sqoop and Flume.

Rather than asking the usual questions, how to make this faster, more storage, etc., all of which are important, ask the more difficult questions:

  1. In or between which of these elements would human analysis/judgment have the greatest impact?
  2. Would human analysis/judgment be best made by experts or crowds?
  3. What sort of interface would elicit the best human analysis/judgment? (visual/aural; contest/game/virtual)
  4. Performance with feedback or homeostasis mechanisms?

That is a very crude and uninformed starter set of questions.

Putting higher speed access to more data with better tools at our fingertips expands the questions we can ask of interfaces and our interaction with the data. (Before we ever ask questions of the data.)
