Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

January 3, 2012

Ontologies as Semantically Discrete Data

Filed under: Ontology,Rough Sets,Semantics — Patrick Durusau @ 5:09 pm

The contest associated with the JRS 2012 conference, Topical Classification of Biomedical Research Papers, involves the use of the domain ontology MeSH. The contest involves the classification of materials using that ontology and clustering the results. (You should read the contest description for the full details. I am only pulling out the facts needed for this post, which aren’t many.)

It occurred to me that an ontology consists of a set of values that are semantically discrete. That is, any value in an ontology is distinct from all other values in the ontology; there is no “almost X” or “nearly Y” in an ontology.

I mention this because we apply ontologies to semantically continuous domains, such as journal articles that were written without regard to any particular ontology.

Which would also explain why, given a common ontology such as MeSH, we may disagree as to which terms to apply to a particular document. We “see” different aspects in the semantically continuous document that influence our view of which term from the semantically discrete ontology to use. And in many cases we may be in agreement.

But the fact remains that we have applied a semantically discrete instrument to a semantically continuous data set.

I suppose one question is whether rough sets can capture and preserve some semantic continuity for use in information retrieval.
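To make that concrete, here is a minimal Python sketch (my own illustration, not anything from the contest) of the classical rough set lower and upper approximations. Documents are grouped into equivalence classes by the terms assigned to them, and the gap between the two approximations is where the “almost X” lives:

```python
from collections import defaultdict

def rough_approximations(universe, equivalence, target):
    """Compute the lower and upper rough set approximations of `target`.

    universe    -- iterable of items (e.g., documents)
    equivalence -- function mapping an item to its equivalence-class key
                   (e.g., the frozenset of terms assigned to a document)
    target      -- set of items we want to describe (e.g., documents a human
                   indexer placed under one heading)
    """
    classes = defaultdict(set)
    for item in universe:
        classes[equivalence(item)].add(item)

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:     # class falls entirely inside the target
            lower |= members
        if members & target:      # class overlaps the target at all
            upper |= members
    return lower, upper

# Toy data: documents keyed by the (hypothetical) terms assigned to them.
docs = {"d1": {"neoplasms"}, "d2": {"neoplasms"}, "d3": {"neoplasms", "liver"}}
lower, upper = rough_approximations(docs, lambda d: frozenset(docs[d]),
                                    target={"d1", "d3"})
print(lower, upper)   # lower == {'d3'}, upper == {'d1', 'd2', 'd3'}
```

The boundary region (upper minus lower) is exactly the “almost” that a purely discrete term assignment throws away, so it is a natural place to look for preserved semantic continuity.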

What the Sumerians can teach us about data

Filed under: Data,Data Management,Marketing — Patrick Durusau @ 5:08 pm

What the Sumerians can teach us about data

Pete Warden writes:

I spent this afternoon wandering the British Museum’s Mesopotamian collection, and I was struck by what the humanities graduates in charge of the displays missed. The way they told the story, the Sumerian’s biggest contribution to the world was written language, but I think their greatest achievement was the invention of data.

Writing grew out of pictograms that were used to tally up objects or animals. Historians and other people who write for a living treat that as a primitive transitional use, a boring stepping-stone to the final goal of transcribing speech and transmitting stories. As a data guy, I’m fascinated by the power that being able to capture and transfer descriptions of the world must have given the Sumerians. Why did they invent data, and what can we learn from them?

Although Pete uses the term “Sumerians” to cover a very wide span of peoples, languages and history, I think his comment:

Gathering data is not a neutral act, it will alter the power balance, usually in favor of the people collecting the information.

is right on the mark.

There is an aspect of data management that we can learn from the Ancient Near East (not just the Sumerians).

Preservation of access.

It isn’t enough to simply preserve data. You can ask NASA about preservation of data. (Houston, We Erased The Apollo 11 Tapes)

Particularly with this attitude:

“We’re all saddened that they’re not there. We all wish we had 20-20 hindsight,” says Dick Nafzger, a TV specialist at NASA’s Goddard Space Flight Center in Maryland, who helped lead the search team.

“I don’t think anyone in the NASA organization did anything wrong,” Nafzger says. “I think it slipped through the cracks, and nobody’s happy about it.”

Didn’t do anything wrong?

You do know the leading cause for firing sysadmins is failure to maintain proper backups? I would hold everyone standing near a crack responsible. That would not bring the missing tapes back, but it would make future generations more careful.

Considering that was only a few decades ago, how do we read ancient texts for which we have no key in English?

The ancients preserved access to their data by way of trilingual inscriptions: inscriptions in three different languages, all saying the same thing. If you know only one of the languages you can work towards understanding the other two.

A couple of examples:

Van Fortress, with an inscription of Xerxes the Great.

Behistun Inscription, with an inscription in Old Persian, Elamite, and Babylonian.

BTW, the final image in Pete’s post (the Taylor Prism) is much later than the Sumerians and is one of the first cuneiform artifacts to be found. It describes King Sennacherib’s military victories and dates from about 691 B.C. It is written in Neo-Assyrian cuneiform script, the script used in primers and introductions to Akkadian.

Can I guess how many mappings you have of your ontologies or database schemas? I suppose the first question should be whether they are documented at all. Then follow up with the question of mapping to other ontologies or schemas, such as an industry standard schema or set of terms.

If that sounds costly, consider the cost of migration/integration without documentation/mapping. Topic maps can help with the mapping aspects of such a project.

Iraq Body Count report: how many died and who was responsible?

Filed under: Dataset,News — Patrick Durusau @ 5:07 pm

Iraq Body Count report: how many died and who was responsible?

From the Guardian, a very useful data set for a number of purposes, particularly if paired with data on who was in the chain of command for various units.

It isn’t that hard to imagine a war crimes ticker for named individuals, linked to specific reports and acts. As well as more general responsibility for wars of aggression.

We will be waiting a long time for prosecutors who are dependent on particular countries for funding and support to step up and fully populate such a list with all responsible parties.

OpenData

Filed under: Dataset,Government Data — Patrick Durusau @ 5:06 pm

OpenData by Socrata

Another very large public data set collection.

Socrata developed the City of Chicago portal, which I mentioned at: Accessing Chicago, Cook and Illinois Open Data via PHP.

Mining Massive Data Sets – Update

Filed under: BigData,Data Analysis,Data Mining,Dataset — Patrick Durusau @ 5:03 pm

Mining Massive Data Sets by Anand Rajaraman and Jeff Ullman.

Update of Mining of Massive Datasets – eBook.

The hard copy has been published by Cambridge University Press.

The electronic version remains available for download. (Hint: those of us who can should buy a hard copy to encourage this sort of publisher behavior.)

A homework system for both instructors and self-guided study is available at this page.

While I wait for a hard copy to arrive, I have downloaded the PDF version.

What’s New in Apache Sqoop 1.4.0-incubating

Filed under: Hadoop,Sqoop — Patrick Durusau @ 9:21 am

What’s New in Apache Sqoop 1.4.0-incubating

New features and improvements in the first incubating release:

If you are interested in learning more about the changes, a complete list for Sqoop 1.4.0-incubating can be found here.  You are also encouraged to give this new release a try.  Any help and feedback is more than welcome. For more information on how to report problems and to get involved, visit the Sqoop project website at http://incubator.apache.org/sqoop/.

BTW, “Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.” (From Apache Sqoop (incubating))

January 2, 2012

Topical Classification of Biomedical Research Papers

Filed under: Bioinformatics,Biomedical,Contest,Medical Informatics,MeSH,PubMed — Patrick Durusau @ 6:36 pm

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers

From the webpage:

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers, is a special event of Joint Rough Sets Symposium (JRS 2012, http://sist.swjtu.edu.cn/JRS2012/) that will take place in Chengdu, China, August 17-20, 2012. The task is related to the problem of predicting topical classification of scientific publications in a field of biomedicine. Money prizes worth 1,500 USD will be awarded to the most successful teams. The contest is funded by the organizers of the JRS 2012 conference, Southwest Jiaotong University, with support from University of Warsaw, SYNAT project and TunedIT.

Introduction: Development of freely available biomedical databases allows users to search for documents containing highly specialized biomedical knowledge. Rapidly increasing size of scientific article meta-data and text repositories, such as MEDLINE [1] or PubMed Central (PMC) [2], emphasizes the growing need for accurate and scalable methods for automatic tagging and classification of textual data. For example, medical doctors often search through biomedical documents for information regarding diagnostics, drugs dosage and effect or possible complications resulting from specific treatments. In the queries, they use highly sophisticated terminology, that can be properly interpreted only with a use of a domain ontology, such as Medical Subject Headings (MeSH) [3]. In order to facilitate the searching process, documents in a database should be indexed with concepts from the ontology. Additionally, the search results could be grouped into clusters of documents, that correspond to meaningful topics matching different information needs. Such clusters should not necessarily be disjoint since one document may contain information related to several topics.

In this data mining competition, we would like to raise both of the above mentioned problems, i.e. we are interested in identification of efficient algorithms for topical classification of biomedical research papers based on information about concepts from the MeSH ontology, that were automatically assigned by our tagging algorithm. In our opinion, this challenge may be appealing to all members of the Rough Set Community, as well as other data mining practitioners, due to its strong relations to well-founded subjects, such as generalized decision rules induction [4], feature extraction [5], soft and rough computing [6], semantic text mining [7], and scalable classification methods [8].

In order to ensure scientific value of this challenge, each of participating teams will be required to prepare a short report describing their approach. Those reports can be used for further validation of the results. Apart from prizes for top three teams, authors of selected solutions will be invited to prepare a paper for presentation at JRS 2012 special session devoted to the competition. Chosen papers will be published in the conference proceedings.

Data sets became available today.

This is one of those “praxis” opportunities for topic maps.

Teaching is about conveying a way of thinking

Filed under: Marketing,Teaching,Topic Maps — Patrick Durusau @ 6:33 pm

Teaching is about conveying a way of thinking by Jon Udell.

From the post:

As I build out the elmcity network, launching calendar hubs in towns and cities around the country, I’ve been gathering examples of excellent web thinking. In Ann Arbor’s public schools are thinking like the web I noted that the schools in that town — and most particularly the Slauson Middle School — are Doing It Right with respect to online calendars. How, I wondered, does that happen? How does a middle school figure out a solution that eludes most universities, theaters, city governments, nightclubs, museums, and other organizations with calendars of interest to the public?

[The Slauson Middle School principal, Chris Curtis, replied to Udell.]

I agree with the notion that the basic principles of computer science should be generalized more broadly across the curriculum. In many ways, teaching computer and technology skills courses absent a meaningful application of them is ineffective and silly. We wouldn’t teach driver’s education and not let students drive. We don’t teach a “pencil skills class” in which we learn the skills for using this technology tool without an immediate opportunity to apply the skills and then begin to consider and explore the many ways that the pencil and writing change how we organize, perceive, and interact with our world.

I really like the “pencil skills class” example, even though I can think of several readers who may say it applies to some of my writing. 😉 And they are probably right, at least in part. I have a definite preference for the theoretical side of things.

To usefully combine theory with praxis is the act of teaching.

Clickstream Data Yields High-Resolution Maps of Science

Filed under: Citation Indexing,Mapping,Maps,Visualization — Patrick Durusau @ 6:30 pm

Clickstream Data Yields High-Resolution Maps of Science Citation: Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. (2009) Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3): e4803. doi:10.1371/journal.pone.0004803.

A bit dated but interesting none the less:

Abstract

Background

Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science.

Methodology

Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of world-wide use of scholarly web portals in 2006, and provides a balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute’s Architecture and Art Thesaurus. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences.

Conclusions

Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.

An improvement over traditional citation analysis but it seems to be on the coarse side to me.

That is to say, users don’t request, nor do authors cite, papers as undifferentiated wholes. In other words, there are any number of ideas in a particular paper that may merit citation, and a user or author may be interested in only one of them.

Tracing the lineage of an idea should be getting easier, yet I have the uneasy feeling that it is becoming more difficult.

Yes?
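For readers who, like me, paused at “journal clickstream model, i.e. a first-order Markov chain”: the idea is just counting which journal a user requests next, given the journal being viewed now. Here is a rough Python sketch with made-up sessions (the real model was of course built from publisher portal logs, not hand-typed lists):

```python
from collections import defaultdict

def transition_probabilities(sessions):
    """Estimate a first-order Markov chain from clickstream sessions.

    sessions -- list of lists, each an ordered sequence of journals a user
                requested within one session
    Returns {journal: {next_journal: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1

    model = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        model[src] = {dst: n / total for dst, n in dsts.items()}
    return model

# Hypothetical sessions, purely for illustration.
sessions = [
    ["PLoS ONE", "Nature", "Science"],
    ["PLoS ONE", "Nature", "Nature Genetics"],
    ["Cell", "Nature"],
]
print(transition_probabilities(sessions)["Nature"])
# {'Science': 0.5, 'Nature Genetics': 0.5}
```

The strength of a transition between two journals then becomes the basis for the journal network the authors visualize.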

Introduction to Hibernate Framework

Filed under: Hibernate — Patrick Durusau @ 6:26 pm

Introduction to Hibernate Framework

A series of posts on Hibernate, including:

Hibernate Tutorial Series

You may be thinking to yourself that you saw this posting at a blog aggregation site that rewrites URLs to keep you on their site. Including rewriting URLs so you can’t cleanly point to the first article in a series. You would be correct. The links you see above point to the source of the tutorial material.

If you like this sort of material, presented without trying to capture your browser, please continue to visit my site. Consider supporting it or hiring me in a consulting role (which indirectly supports it).

Using Bio4j + Neo4j Graph-algo component…

Filed under: Bio4j,Bioinformatics,Biomedical,Neo4j — Patrick Durusau @ 3:00 pm

Using Bio4j + Neo4j Graph-algo component for finding protein-protein interaction paths

From the post:

Today I managed to find some time to check out the Graph-algo component from Neo4j and after playing with it plus Bio4j a bit, I have to say it seems pretty cool.

For those who don’t know what I’m talking about, here you have the description you can find in Neo4j wiki:

This is a component that offers implementations of common graph algorithms on top of Neo4j. It is mostly focused around finding paths, like finding the shortest path between two nodes, but it also contains a few different centrality measures, like betweenness centrality for nodes.

The algorithm for finding the shortest path between two nodes caught my attention and I started to wonder how could I give it a try applying it to the data included in Bio4j.

Suggestions of other data sets where shortest path would yield interesting results?

BTW, isn’t the shortest path an artifact of whatever basis is chosen for nearness between nodes? A shortest path between gene fragments based on relatedness would be different from one based on physical distance. (see: Nearness key in microbe DNA swaps: Proximity trumps relatedness in influencing how often bacteria pick up each other’s genes.)
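If you want to see what the Graph-algo component is doing in the simplest case, here is a minimal sketch. It is not Bio4j or the Neo4j API, just plain Python breadth-first search over a made-up protein interaction graph, returning the fewest-hops path between two proteins:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: shortest path by hop count in an unweighted graph.

    graph -- dict mapping a node to the set of nodes it interacts with
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path between start and goal

# Hypothetical protein-protein interaction graph.
ppi = {
    "P53":   {"MDM2", "BRCA1"},
    "MDM2":  {"P53", "UBC9"},
    "BRCA1": {"P53", "RAD51"},
    "RAD51": {"BRCA1"},
    "UBC9":  {"MDM2"},
}
print(shortest_path(ppi, "P53", "RAD51"))  # ['P53', 'BRCA1', 'RAD51']
```

Swap hop count for a weight based on relatedness or physical distance and you get a different “shortest” path, which is the point of the question above.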

Research Tip: Conference Proceedings (ACM DL)

Filed under: Research Methods — Patrick Durusau @ 11:29 am

To verify the expansion of the acronyms for Jeff Huang’s Best Paper Awards in Computer Science [2011], I used the ACM Digital Library.

If the conference is listed under conferences in the Digital Library, following the link results in a listing of the top ten (10) paper downloads in the last six (6) weeks and the top ten (10) “most cited article” listings.

Be aware it isn’t always the most recent papers that are the most downloaded.

Another way to keep abreast of what is of interest in a particular area of computing.

Best Paper Awards in Computer Science [2011]

Filed under: Conferences,CS Lectures — Patrick Durusau @ 11:08 am

Best Paper Awards in Computer Science [2011]

Jeff Huang’s list of the best paper awards from 21 CS conferences since 1996 up to and including 2011.

Just in case you are unfamiliar with the conference abbreviations, I have expanded them below and added links to the sponsoring organizations’ websites.

Web Search – It’s Worse Than You Think

Filed under: Search Engines,Searching — Patrick Durusau @ 9:50 am

Web Search – It’s Worse Than You Think

Matthew Hurst writes (in part):

While it seems like everything in the online space is hunky dory and progress is making predictable strides towards our inevitable AI infested future, I often see such utter failures in search engine results that makes me think we haven’t even started to lay the foundations.

Here’s the story: as I’ve become interested in mining the news cycle for various reasons, I’ve started attempting to understand who the editors of major news sources are. The current version of the Hapax Page on d8taplex tracks the attribution of article authors and editors (I conflate the concept of writer, reporter and un-typed contributors under the term ‘author’ while explicit editors are tracked separately). From this analysis, I see that there is someone called Cynthia Johnston who is often associated with articles from Reuters (in fact, she is currently at the top of the list ranked by count of articles).

You need to read his post in full to get the real flavor of his experience with the Cynthia Johnston search request.

Two quick points:

+1 to we have not laid the foundations for adequate searching. Not surprising, since I don’t think we understand what adequate “searching” means in a multi-semantic context such as the WWW. Personally I don’t think we understand searching in a mono-semantic context either, but that is a separate issue.

As to his blog post changing the search experience for anyone seeking information on Cynthia Johnston, do we need to amend:

Observer effect (information technology)

or

Observer effect (physics)

at Wikipedia, or do we need a new subject:

Observer effect (search technology)?

Open Source As Dumping Ground?

Filed under: Open Source — Patrick Durusau @ 9:11 am

Cynthia Murrell, in Big Loss Department: HP Open Sources WebOS, points to Neil McAllister’s story, How HP and Open Source Can Save WebOS, which says in part:

HP’s press release offers few specifics. We don’t know which open source license (or licenses) it plans to use for WebOS or what form the project’s governance will take. To its credit, HP says it is “committed to good, transparent, and inclusive governance to avoid fragmentation of the platform.” What it hasn’t said, however, is how committed it is to ongoing WebOS development.

Unfortunately, the answer might be “not very.” A month ago, HP wasn’t talking about open source; it was trying to sell off its whole Palm division, WebOS and all. Rumored bidders included Intel and Qualcomm. The catch: Any buyer would have had to agree to license WebOS back to HP at a deep discount. It seems HP may only be truly committed to the platform if it can offload the cost of developing and maintaining it.

Yet if that’s what HP hopes to achieve by opening the WebOS source, it’s bound to be disappointed. Most open source projects rely on dedicated developers to set their tone and direction, not casual contributors, and effective management of an active open source community can be difficult, time-consuming, and expensive.

I mention this as a cautionary tale about commercial products whose sponsors suddenly “see the light” about open source software and decide to donate software to open source projects.

As the various NoSQL databases and other semantic technologies shake out over the next several years, we are likely to see more “donations” of software products. Which may be a good thing if the donating companies contribute expertise and resources to help make those projects a benefit to the entire community.

On the other hand, donating software products that fracture and drain the resources of the open source community does the community no favors.

It would be less distracting if they would simply donate the source code and any relevant patents under an Apache license to a public repository. If there is any benefit to the open source community, someone will pick it up and run with it. If not, the open source community is not the loser.

January 1, 2012

Optimizing Findability in Lucene and Solr

Filed under: Findability,Lucene,LucidWorks,Solr — Patrick Durusau @ 6:00 pm

Optimizing Findability in Lucene and Solr

From the post:

To paraphrase an age-old question about trees falling in the woods: “If content lives in your application and you can’t find it, does it still exist?” In this article, we explore how to make your content findable by presenting tips and techniques for discovering what is important in your content and how to leverage it in the Lucene Stack.

Table of Contents

Introduction
Planning for Findability
Knowing your Content
Knowing your Users
Garbage In, Garbage Out
Analyzing your Analysis
Stemming In Greater Detail
Query Techniques for Better Search
Navigation Hints
Final Thoughts
Resources

by Grant Ingersoll

You know when a blog post starts off with a table of contents it is long. Fortunately in this case, it is also very good. By one of the principal architects of Lucene, Grant Ingersoll.

A good start on developing findability skills but as the post points out, a lot of it will depend on your knowledge of what “findability” means to your users. Only you can answer that question.
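As a toy illustration of why the “Analyzing your Analysis” and stemming sections matter (plain Python, not Lucene’s actual analyzers), consider how a crude analysis chain decides whether a query finds a document at all:

```python
STOPWORDS = {"the", "a", "of", "in", "and"}
SUFFIXES = ("ing", "es", "s")   # a deliberately crude stand-in for a real stemmer

def analyze(text):
    """Toy analysis chain: lowercase, drop stop words, strip common suffixes."""
    tokens = []
    for token in text.lower().split():
        if token in STOPWORDS:
            continue
        for suffix in SUFFIXES:
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        tokens.append(token)
    return tokens

doc = "Indexing strategies for findability"
for query in ("indexed strategy", "findability of indexes"):
    overlap = set(analyze(query)) & set(analyze(doc))
    print(query, "->", overlap or "no match")
```

The first query misses entirely because “indexed” and “strategy” don’t reduce to the same tokens as the document; the second one matches. Which case your users land in depends on their vocabulary and your analysis chain, not on the engine.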

Strata: Making Data Work – Update

Filed under: Conferences,Data — Patrick Durusau @ 5:59 pm

Strata: Making Data Work – Update

Data sessions for the Strata Conference, February 28 – March 1, 2012, Santa Clara, California.

Too many to list or effectively summarize.

Conference homepage.

Big Data: It’s Not How Big It Is, It’s How You Use It

Filed under: BigData,Data — Patrick Durusau @ 5:58 pm

Big Data: It’s Not How Big It Is, It’s How You Use It

If you are thinking about how this year will shape up, this is a post to keep in mind.

At least its point that “big data” is not as meaningful as your performance with a particular data set in a given context. The data may be “big” in someone’s view, but the important point is achieving a particular result with it, whether it is “big” or “small.”

I am less concerned with the notion that a transition to a data/information economy can be managed. It makes for interesting water cooler talk but not much more than that.

Remember, history is written by survivors, not pre-revolution visionaries.

Be a survivor, use data, big or small, to your best advantage (or that of your client).

Zorba: The Most Complete XQuery Processor

Filed under: Data Mining,XQuery — Patrick Durusau @ 5:57 pm

Zorba: The Most Complete XQuery Processor

From the homepage:

All Flavors Available

General purpose XQuery processor – written in C++.

Complete family of W3C specifications: XPath, XQuery, Update, Scripting, Full-Text, XSLT, XQueryX, and more.

Pluggable Store

Seamlessly process XML data stored in different places.

Main memory, mobile devices, browsers, disk-based, or cloud-based stores.

Developer Friendly Tools

Benefit from a rich ecosystem of tools.

Eclipse plugins, command-line interface, and debugger.

Rich Module Library

Web mashups, cryptography, image processing, geo projections, emails, data cleaning… there is a module for that.

Runs Everywhere

Available on Windows, Linux, and Mac OS.

Bindings available for 6 Programming Languages: C++, C, PHP, Ruby, Java and Python.

Fun & Productive

XQuery unifies development for all tiers: database, content management, application logic, and presentation.

I started to mention this under the Cutting Edge Data Processing with PHP & XQuery post (which uses Zorba) but XQuery is important enough to list it separately.

In the draft Topic Map Tool Chain, I would put this under mining/analysis, but as was pointed out in comments, the mining/analysis phase can be informed by an ontology.

I would say “explicitly” informed by an ontology, since there is always some ontology in play, whether explicit or not. (Formal ontologists, note the small “o” in ontology. An explicit ontology would have a name and be written <NAME> Ontology.)

Cutting Edge Data Processing with PHP & XQuery

Filed under: PHP,XQuery — Patrick Durusau @ 5:57 pm

Cutting Edge Data Processing with PHP & XQuery

From the webpage:

PHP and XQuery have always been a happy couple and we are looking to build on that momentum. Our goal is to contribute a powerful toolkit to harness unstructured data in PHP developments. In this perspective, the first edition of the PHP Tour was a perfect fit to introduce developers with the possible interactions between PHP and XQuery. The aim of the talk was to explore the gain of functionality and productivity that can be achieved by introducing XQuery into PHP applications.

The slide deck by William Candillon, from PHP Tour Lille 2011, will give you an idea of the capabilities of PHP and XQuery. I mention this because PHP is widely used in the library community and XQuery will make that use more productive and powerful.

Cassandra NYC 2011 Presentation Slides and Videos

Filed under: Cassandra,NoSQL — Patrick Durusau @ 5:55 pm

Cassandra NYC 2011 Presentation Slides and Videos

Roughly the first half of the list:

  • Chris Burroughs (Clearspring) – Apache Cassandra Clearspring (HD Video)
  • David Weinstein (Adobe) – Cassandra at Adobe (HD Video)
  • Drew Robb (SocialFlow) – Cassandra at Social Flow (HD Video)
  • Ed Capriolo (m6d) – Cassandra in Online Advertising (Slides and HD Video)
  • Eric Evans (Acunu) – CQL: SQL for Cassandra (Slides and HD Video)
  • Ilya Maykov (Ooyala) – Scaling Video Analytics with Apache Cassandra (Slides)
  • Joe Stein (Medialets) – Cassandra as the Central Nervous System of Your Distributed Systems (Slides and HD Video)

I count nine (9) more at the Datastax site.

Just in case you want to get started on your New Year’s resolution to learn one (or another?) NoSQL database cold.

I would amend that resolution to learn one of: DB2, Oracle, MySQL, PostgreSQL, SQL Server as well. That will enable you to make an intelligent assessment of the requirements of your projects and the capabilities of a range of storage solutions.

Gora Graduates!

Filed under: Cassandra,Hadoop,HBase,Hive,Lucene,MapReduce,Pig,Solr — Patrick Durusau @ 5:54 pm

Gora Graduates! (Incubator location)

On Twitter I just saw a post announcing that Gora has graduated from the Apache Incubator!

Congratulations to all involved.

Oh, the project:

What is Gora?

Gora is an ORM framework for column stores such as Apache HBase and Apache Cassandra with a specific focus on Hadoop.

Why Gora?

Although there are various excellent ORM frameworks for relational databases, data modeling in NoSQL data stores differ profoundly from their relational cousins. Moreover, data-model agnostic frameworks such as JDO are not sufficient for use cases, where one needs to use the full power of the data models in column stores. Gora fills this gap by giving the user an easy-to-use ORM framework with data store specific mappings and built in Apache Hadoop support.

The overall goal for Gora is to become the standard data representation and persistence framework for big data. The roadmap of Gora can be grouped as follows.

  • Data Persistence : Persisting objects to Column stores such as HBase, Cassandra, Hypertable; key-value stores such as Voldemort, Redis, etc; SQL databases, such as MySQL, HSQLDB, flat files in local file system or Hadoop HDFS.
  • Data Access : An easy to use Java-friendly common API for accessing the data regardless of its location.
  • Indexing : Persisting objects to Lucene and Solr indexes, accessing/querying the data with Gora API.
  • Analysis : Accessing the data and making analysis through adapters for Apache Pig, Apache Hive and Cascading
  • MapReduce support : Out-of-the-box and extensive MapReduce (Apache Hadoop) support for data in the data store.

Spring Data Neo4j 2.0.0 Released

Filed under: Neo4j,Spring Data — Patrick Durusau @ 5:54 pm

Spring Data Neo4j 2.0.0 Released

From the post:

We’re happy to present you with the release of Spring Data Neo4j 2.0 as a small Christmas gift from our side. Spring Data Neo4j is based on Neo4j 1.6.M02.

The major feature of this release is the addition of a simple mapping mode (spring-data-neo4j). Just annotate your POJOs and use a GraphRepository for the usual CRUD and advanced query operations.

For graph-attached POJOs and high performance use-cases, you can employ the advanced mapping mode (spring-data-neo4j-aspects), which leverages AspectJ to enhance your domain class.

Both mapping modes use the same underlying code, which is now based on the Spring Data Commons mapping infrastructure.

We improved the Cypher graph query language support by supporting new Cypher features, adding queries derived from finder-methods to the repositories and extended the result handling conversions to include projections to mapping-interfaces, Pages and more.

See the post for more information.

Good Relationships: The Spring Data Graph Guide Book

Filed under: Neo4j,Spring Data — Patrick Durusau @ 5:53 pm

Good Relationships: The Spring Data Graph Guide Book [updated to point to stable (latest) version guide. PDF file. Other versions http://static.springsource.org/spring-data/data-neo4j/docs/]

This is an update to: Good Relationships: The Spring Data Graph Guide Book (1.0.0-BUILD-SNAPSHOT).

Since there was no date on the updated text I can’t tell you when it occurred.

And since the URL doesn’t hold the promise of uniquely identifying the current version of the document, you won’t know when it changes from the URL.

This is an excellent introduction to Spring Data Neo4j. But it is annoying that so much web-based information lacks dates or other versioning information. It is as though it exists in an eternal “now” with no past to speak of, at least outside of version control systems. That is not likely to change, so enjoy reading this introduction to Spring Data Neo4j.


PS: Apologies for the inline revision but using the stable URL may help some future reader.

60 Months, Minimal Search Progress

Filed under: Search Requirements,Searching — Patrick Durusau @ 5:52 pm

60 Months, Minimal Search Progress

Stephen E Arnold revises his August 2005 observation:

The truth is that nothing associated with locating information is cheap, easy or fast.

to read:

The truth is that nothing associated with locating information is accurate, cheap, easy or fast.

Which reminds me of the project triangle, where the choices are cheap, fast, and good, and you can pick any two.[1]

In fact, I created an Euler diagram of the four choices:

I got stuck when it came to adding “easy.”

In part because I don’t know “easy” for whom. Easy for the search engine user? Easy for the end-user?

If easy for the end-user, is that a continuum? If so, what lies at both ends?

Having a single text box may be “easy” for the end-user but how does that intersect with “accurate?”

Suggestions? The pen is in your hand now.


1. PMI has withdrawn the 50-year-old triangle on the basis that projects have more interacting constraints than just three. On which see: The Death of the Project Management Triangle by Ben Snyder.
