How to get Neo4J Graph database in GWT with Eclipse running
A short screencast of first steps with Neo4J, GWT and Eclipse.
Argumentation mining, by Raquel Mochales and Marie-Francine Moens, Artificial Intelligence and Law, Volume 19, Number 1, 1-22, DOI: 10.1007/s10506-010-9104-x.
Abstract:
Argumentation mining aims to automatically detect, classify and structure argumentation in text. Therefore, argumentation mining is an important part of a complete argumentation analysis, i.e. understanding the content of serial arguments, their linguistic structure, the relationship between the preceding and following arguments, recognizing the underlying conceptual beliefs, and understanding within the comprehensive coherence of the specific topic. We present different methods to aid argumentation mining, starting with plain argumentation detection and moving forward to a more structural analysis of the detected argumentation. Different state-of-the-art techniques on machine learning and context free grammars are applied to solve the challenges of argumentation mining. We also highlight fundamental questions found during our research and analyse different issues for future research on argumentation mining.
I mention this for two reasons.
First, a close friend of mine thinks tracking argumentation is a way to guide diverse audiences into useful discussions about globally important issues. On the other hand, I have watched committees of as few as seven or eight members fail to agree on a common place for a lunch break. Perhaps some decisions are harder than others. 😉
Second, and perhaps more pragmatically, I think the identification of arguments in texts is an important part of textual analysis, from a scholarly perspective. Any tool that can play the role of assistant in that task is of interest to me.
New and Emerging Legal Infrastructures Conference (NELIC)
From the post:
The New and Emerging Legal Infrastructures Conference (NELIC) was held April 15, 2011 at Berkeley Law School in Berkeley, CA. It brought together the lawyers, entrepreneurs, and technologists who are working to build the next biggest disruptive technologies in the legal industry.
The aim of the conference was to provide a meeting point for a deep and substantive discussion about the long-term impact of these technologies, and how they might come to be broadly adopted in the industry as a whole. It tackled the topics of quantitative legal prediction, legal automation, legal finance, the design of user-facing interfaces that make it possible for laypeople to manage the law, and startups in the legal industry.
The entire conference is available on video.
Looks like a good resource for finding places where topic maps would make a substantive contribution.
Building a better legal search engine, part 1: Searching the U.S. Code
From the post:
As I mentioned last week, I’m excited to give a keynote in two weeks on Law and Computation at the University of Houston Law Center alongside Stephen Wolfram, Carl Malamud, Seth Chandler, and my buddy Dan from CLS. The first part in my blog series leading up to this talk will focus on indexing and searching the U.S. Code with structured, public domain data and open source software.
He closes with:
Stay tuned next week for the next part in the series. I’ll be using Apache Mahout to build an intelligent recommender system and cluster the sections of the Code.
It won’t pull the same audience share as the “Who shot J.R.?” episode of Dallas, but I have to admit I’m interested in the next part of this series. 😉
Revealing the true challenges in fighting bank fraud
From the Infoglide blog:
The results of the survey are currently being compiled for general release, but it was extremely interesting to learn that the key challenges of fraud investigations include:
1. the inability to access data due to privacy concerns
2. a lack of a real-time, high-performance data search engine
3. and an inability to cross-reference and discover relationships between suspicious entities in different databases.
For regular readers of this blog, it comes as no surprise that identity resolution and entity analytics technology provides a solution to those challenges. An identity resolution engine glides across the different data within (or perhaps even external to) a bank’s infrastructure, delivering a view of possible identity matches and non-obvious relationships or hidden links between those identities… despite variations in attributes and/or deliberate attempts to deceive. (emphasis added)
It being an Infoglide blog, guess who they think has an identity resolution engine?
I looked at the data sheet on their Identity Resolution Engine.
I have a question:
If two separate banks using the “Identity Resolution Engine” have built up data mappings, on what basis do I merge those mappings, assuming there are name conflicts in the data mappings as well as in the data proper?
In an acquisition, for example, I should be able to leverage existing data mappings.
Marcel Caraciolo says:
In this post I will introduce three metrics widely used for evaluating the utility of recommendations produced by a recommender system: Precision, Recall, and F-1 Score. The F-1 Score is slightly different from the other ones, since it is a measure of a test’s accuracy and considers both the precision and the recall of the test to compute the final score.
Recommender systems are quite common and you are likely to encounter them while deploying topic maps. (Or you may wish to build one as part of a topic map system.)
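For quick reference, here is a minimal sketch (mine, not from Caraciolo’s post) computing all three metrics from raw counts of true positives, false positives, and false negatives:

```java
// Minimal sketch: precision, recall, and F-1 from raw counts.
//   precision = tp / (tp + fp)  -- how many recommendations were relevant
//   recall    = tp / (tp + fn)  -- how many relevant items were recommended
//   F-1 is the harmonic mean of precision and recall.
public class RecommenderMetrics {
    public static void main(String[] args) {
        // Hypothetical counts: of 10 recommended items, 6 were relevant (tp),
        // 4 were not (fp), and 2 relevant items were never recommended (fn).
        double tp = 6, fp = 4, fn = 2;

        double precision = tp / (tp + fp);                         // 0.60
        double recall    = tp / (tp + fn);                         // 0.75
        double f1 = 2 * precision * recall / (precision + recall); // ~0.667

        System.out.printf("precision=%.2f recall=%.2f f1=%.3f%n",
                precision, recall, f1);
    }
}
```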
Where am I? Techniques for wayfinding and navigation in faceted search
Tony Russell-Rose has a really nice review of wayfinding and navigation.
You will still have to test your UI with ordinary users (not fellow developers) but this should give you some good ideas to build upon.
Top 25 Blogs for Editing Geeks
Something for those of us who are concerned with documentation and standards.
There is a lot of documentation, not to mention standards, in the topic maps area that could use attention.
Rather ironic that documentation for topic maps should be sub-par since the topic maps adventure started off as a software documentation project.
Perhaps getting our own house in order would make topic maps more appealing to others, as well as giving all of us better documentation for existing topic map applications.
The 5th International Joint Conference on Natural Language Processing (IJCNLP2011)
May 20, 2011, submission deadline
From the announcement:
The 5th International Joint Conference on Natural Language Processing, organized by the Asian Federation of Natural Language Processing, will be held in Chiang Mai, Thailand, on November 8-13, 2011. The conference will cover a broad spectrum of technical areas related to natural language and computation. IJCNLP 2011 will include full papers, short papers, oral presentations, poster presentations, demonstrations, tutorials, and workshops.
SPARQL by Example: The Cheatsheet
Good introductory materials.
Recall that MaJorToM and Maiana both support SPARQL queries.
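If you want to experiment before reading the cheatsheet, here is a hedged sketch of running a SPARQL SELECT from Java with Apache Jena. The endpoint URL and query are placeholders for illustration, not the actual MaJorToM or Maiana endpoints:

```java
import org.apache.jena.query.*;

// Hedged sketch: a SPARQL SELECT against a remote endpoint using Apache
// Jena (3.x package names). Endpoint and query are placeholders.
public class SparqlExample {
    public static void main(String[] args) {
        String endpoint = "http://example.org/sparql"; // hypothetical endpoint
        String queryString = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";

        Query query = QueryFactory.create(queryString);
        try (QueryExecution qexec =
                     QueryExecutionFactory.sparqlService(endpoint, query)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row); // each row binds ?s, ?p, and ?o
            }
        }
    }
}
```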
Neo4j and real world scenarios
Thu, May 19, 2011 3:00 PM – 4:00 PM GMT
From the Webinar registration form:
Graph databases are designed to deal with big amounts of complex data structures in a transactional and performant manner. This Webinar is going to give an introduction to the data model and the Neo4j Graph Database, and walk you through some of the application domains where graphs are used in real deployments.
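If you would rather poke at the data model before the webinar, a few lines of Java against an embedded Neo4j instance are enough to create nodes, properties, and a relationship. This sketch is written against the Neo4j 3.x embedded API; the factory and transaction calls differ across releases, so adjust for the version you actually run:

```java
import java.io.File;
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Hedged sketch (Neo4j 3.x embedded API): two nodes, one relationship.
public class GraphHello {
    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase(new File("target/graph.db")); // arbitrary path
        try (Transaction tx = db.beginTx()) {
            Node alice = db.createNode(Label.label("Person"));
            alice.setProperty("name", "Alice");
            Node bob = db.createNode(Label.label("Person"));
            bob.setProperty("name", "Bob");
            alice.createRelationshipTo(bob, RelationshipType.withName("KNOWS"));
            tx.success(); // mark the transaction for commit
        }
        db.shutdown();
    }
}
```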
Lily 1.0: Smart Data, at Scale, made Easy
From the blog entry:
We’re really proud to announce the first official major release of Lily – our flagship repository for scalable data and content management – after 18 months of intense engineering work. Alongside this event, we are also launching our commercial Lily services, and announcing some early-stage customers and partners. We’re thrilled to launch the first open source, general-purpose, highly-scalable yet flexible data repository based on NOSQL/BigData technology: read all about it below.
What
Lily is Smart Data, at Scale, made Easy. Lily is a data and content repository made for the Age of Data: it allows you to store and manage vast amounts of data, and in the future will allow you to monetize user interactions by tracking and analyzing audience data.
Lily makes Big Data easy with a high-level, developer-friendly data model with rich types, versioning and schema management. Lily offers simple Java and REST APIs for creating, reading and managing data. Its flexible indexing mechanism supports interactive and batch-oriented index maintenance.
Lily is the foundation for any large-scale data-centric application: social media, e-commerce, large content management applications, product catalogs, archiving, media asset management: any data-centric application with an ambition to scale beyond a single-server setup. Don’t focus on scale and infrastructure: we’ll do that for you while you can focus on real differentiators.
Lily is dead serious about Scale. The Lily repository has been tested to scale beyond any common content repository technology out there, due to its inherently distributed architecture, providing economically affordable, robust, and high-performing data management services for any kind of enterprise application.
NoSQL databases for the .NET developer: What’s the fuss all about?
Date: May 24, 2011, 2:00pm – 3:00pm EST
http://www.regonline.com/970013
From the post:
NOSQL (Not Only SQL) databases are one of the hottest technology trends in the software industry. Ranging from web companies like Facebook, Foursquare, Twitter to IT powerhouses such as the US Federal Government, banks or NASA; the number of companies that invest in the NOSQL paradigm as part of their infrastructure is growing exponentially. What is this NOSQL movement? What are the different types of NOSQL databases? What are the real advantages, challenges and ROIs? Can we leverage NOSQL databases from our .NET applications? This webinar will present an overview of the NOSQL movement from the perspective of a .NET developer. We will explore the different types of NOSQL databases as well as their .NET interfaces. Finally, we will present a series of real world examples that illustrate how other companies have taken advantage of NOSQL databases as part of their infrastructure.
Are you ready to leverage a NoSQL database from inside a topic map .Net application?
Whenever I hear the TMRM referred to or treated like a data model, I feel like saying in a Darth Vader type voice:
If the TMRM is a data model, then where are its data types?
It is my understanding that data models, legends in TMRM-speak, define data types on which they base declarations of equivalence (in terms of the subjects represented).
Being somewhat familiar with the text of the TMRM, or at least the current draft, I don’t see any declaration of data types in the TMRM.
Nor do I see any declarations of where the recursion of keys ends. Another important aspect of legends.
Nor do I see any declarations of equivalence (on the absent data types).
Yes, there is an abstraction of a path language, which would depend upon the data types and recursion through keys and values, but that is only an abstraction of a path language. It awaits declaration of data types, etc., in order to be an implementable path language.
There is a reason for the TMRM being written at that level of abstraction: to support any number of legends, written with any range of data types and choices with regard to the composition of those data types and, subsequently, the paths supported.
Any legend is going to make those choices and they are all equally valid if not all equally useful for some use cases. Every legend closes off some choices and opens up others.
For example, in bioinformatics, why would I want to do the subjectIdentifier/subjectLocator shuffle when I am concerned with standard identifiers for genes, for example?
BTW, before anyone rushes out to write the legend syntax, realize that its writing results in subjects that could also be the targets of topic maps with suitable legends.
It is important that syntaxes be subjects, for a suitable legend, because syntaxes come into and go out of fashion.
The need to merge subjects represented by those syntaxes, however, awaits only the next person with a brilliant insight.
Functional thinking: Thinking functionally, Part 1
Summary:
Functional programming has generated a recent surge of interest with claims of fewer bugs and greater productivity. But many developers have tried and failed to understand what makes functional languages compelling for some types of jobs. Learning the syntax of a new language is easy, but learning to think in a different way is hard. In the first installment of his Functional thinking column series, Neal Ford introduces some functional programming concepts and discusses how to use them in both Java™ and Groovy.
The first in a series of articles on functional programming by Neal Ford.
From the article:
Welcome to Functional thinking. This series explores the subject of functional programming but isn’t solely about functional programming languages. As I’ll illustrate, writing code in a “functional” manner touches on design, trade-offs, different reusable building blocks, and a host of other insights. As much as possible, I’ll try to show functional-programming concepts in Java (or close-to-Java languages) and move to other languages to demonstrate capabilities that don’t yet exist in the Java language. I won’t leap off the deep end and talk about funky things like monads (see Resources) right away (although we’ll get there). Instead, I’ll gradually show you a new way of thinking about problems (which you’re already applying in some places — you just don’t realize it yet).
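As a taste of the shift Ford describes, here is a small sketch of one computation written first imperatively, then as a pipeline of functions. It uses Java 8 streams, which postdate the article, so treat it as an illustration of the style rather than of Ford’s own examples:

```java
import java.util.Arrays;
import java.util.List;

// The same task twice: sum the lengths of the "long" words.
public class FunctionalFlavor {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("topic", "map", "merge", "subject");

        // Imperative style: mutate an accumulator step by step.
        int total = 0;
        for (String w : words) {
            if (w.length() > 3) total += w.length();
        }
        System.out.println(total); // 17

        // Functional style: describe the computation as a pipeline, no mutation.
        int total2 = words.stream()
                .filter(w -> w.length() > 3)
                .mapToInt(String::length)
                .sum();
        System.out.println(total2); // 17
    }
}
```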
A functional approach to topic maps has been around since the first edition of the topic maps standard, which announced:
The two or more topic links may be merged, and/or applications may process and/or render them as if they have been merged. (ISO/IEC 13250:2000, 5.2.1 Topic Link Architectural Form)
I read that to mean merging without side effects.
Where is functional merging, that is, merging without side effects, most useful?
Thoughts/suggestions?
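To make “merging without side effects” concrete, here is a minimal sketch. The Topic class and merge function are hypothetical, not from any topic map engine; the point is only that merge returns a new value and leaves both inputs untouched:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of side-effect-free merging: merge() returns a new
// Topic and modifies neither input.
final class Topic {
    final Set<String> identifiers;
    final Set<String> names;

    Topic(Set<String> identifiers, Set<String> names) {
        this.identifiers = Collections.unmodifiableSet(new HashSet<>(identifiers));
        this.names = Collections.unmodifiableSet(new HashSet<>(names));
    }

    /** Pure merge: the result is the union of both topics' identifiers and names. */
    static Topic merge(Topic a, Topic b) {
        Set<String> ids = new HashSet<>(a.identifiers);
        ids.addAll(b.identifiers);
        Set<String> ns = new HashSet<>(a.names);
        ns.addAll(b.names);
        return new Topic(ids, ns); // a and b are unchanged
    }
}
```

Because neither input changes, merges can be computed speculatively, compared, or thrown away without cleanup, which is one practical payoff of reading 13250’s “as if they have been merged” functionally.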
Clojure Atlas
From the website:
Learn Clojure faster, use Clojure wisely.
Instant access to documentation, source, lovingly-crafted conceptual relationships, and a dynamic visualization of how everything ties together.
Because a great language deserves to be paired with a great way to understand it.
For viewing:
Clojure Atlas requires a “modern” browser. Specifically, Chrome, Safari, Internet Explorer 9, or Firefox 3+ (in that order of preference; Firefox is sadly quite the dog when it comes to SVG).
BTW, years ago I suggested to someone that topic maps would be a great way to visualize Java in general, and its application in programs in particular.
The response:
Why do that? We have Javadocs.
The Clojure Atlas is my somewhat belated response.
Imagine being able to drill down from the Clojure language to examples in running code?
Or drilling up from running code to the Clojure language?
Or stopping along the way to see other running code with the same concepts?
I don’t know if those sorts of features are planned for the Clojure Atlas, but they certainly are possible with topic maps.
I am a bit uncertain about this site.
At present it has mostly materials from Aster Data, the sponsor of the site.
Not that commercial sponsorship bothers me, any more than commercial vendors do. Commercial vendors are, after all, the people who sponsor standards writing and software projects, and who provide employment for those who participate in both.
Perhaps over time the site will grow to offer a wider range of materials on mapreduce.
That would be a real service to the community and would enable this site to function as a common ground.
TMQL4J Documentation and Tutorials
With all the activity on TMQL, consoles and the like, I thought it would be good to draw attention to the TMQL4J documentation.
If anyone is interested in writing additional beginner articles or tutorials, this would be a good place to start.
5 Free E-Books and Tutorials on Scala
From ReadWriteHack, a listing of books and tutorials for Scala.
One correction.
Programming in Scala, first edition, by Martin Odersky, Lex Spoon, and Bill Venners, is freely available.
There is a second edition out, which isn’t free.
Introducing Druid: Real-Time Analytics at a Billion Rows Per Second
A general overview of Druid and the choices that led up to it.
The next post is said to have details about the architecture, etc.
From what I read here, holding all data in memory is one critical part of the solution.
That and having data that can be held in smallish cells.
Tossing blobs, ASCII or binary, into cells might cause a problem.
We won’t know until the software is available for use by a diverse audience.
I mention it here as an example of defining data sets and requirements in such a way that scalable architectures can be developed for that particular set of requirements.
There is nothing wrong with having a solution that works best for a particular application.
Ballpoint pens are wonderful writing devices but fail miserably as hammers.
A software or technology solution that works for your problem is far more valuable than software that solves the general case but not yours.
What are the Differences between Bayesian Classifiers and Mutual-Information Classifiers?
I am sure we have all laid awake at night worrying about this question at some point. 😉
Seriously, the paper shows that Bayesian and mutual information classifiers complement each other in classification roles, and it merits your attention.
Abstract:
In this study, both Bayesian classifiers and mutual information classifiers are examined for binary classifications with or without a reject option. The general decision rules in terms of distinctions on error types and reject types are derived for Bayesian classifiers. A formal analysis is conducted to reveal the parameter redundancy of cost terms when abstaining classifications are enforced. The redundancy implies an intrinsic problem of “non-consistency” for interpreting cost terms. If no data is given to the cost terms, we demonstrate the weakness of Bayesian classifiers in class-imbalanced classifications. On the contrary, mutual-information classifiers are able to provide an objective solution from the given data, which shows a reasonable balance among error types and reject types. Numerical examples of using two types of classifiers are given for confirming the theoretical differences, including the extremely-class-imbalanced cases. Finally, we briefly summarize the Bayesian classifiers and mutual-information classifiers in terms of their application advantages, respectively.
After detailed analysis, which will be helpful in choosing appropriate situations for the use of Bayesian or mutual information classifiers, the paper concludes:
Bayesian and mutual-information classifiers are different essentially from their applied learning targets. From application viewpoints, Bayesian classifiers are more suitable to the cases when cost terms are exactly known for trade-off of error types and reject types. Mutual-information classifiers are capable of objectively balancing error types and reject types automatically without employing cost terms, even in the cases of extremely class-imbalanced datasets, which may describe a theoretical interpretation why humans are more concerned about the accuracy of rare classes in classifications.
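For intuition about the class-imbalance point, here is a small self-contained sketch (mine, not code from the paper). It computes the mutual information between true and predicted labels from a joint confusion distribution: the degenerate classifier that labels everything negative gets 99% accuracy yet carries zero information:

```java
// Self-contained sketch (not from the paper): mutual information between
// true and predicted labels, from a joint confusion distribution.
//   I(T;Y) = sum over t,y of p(t,y) * log2( p(t,y) / (p(t) * p(y)) )
public class MutualInfoDemo {
    static double mutualInformation(double[][] joint) {
        int rows = joint.length, cols = joint[0].length;
        double[] pT = new double[rows]; // marginal over true classes
        double[] pY = new double[cols]; // marginal over predictions
        for (int t = 0; t < rows; t++)
            for (int y = 0; y < cols; y++) {
                pT[t] += joint[t][y];
                pY[y] += joint[t][y];
            }
        double mi = 0.0;
        for (int t = 0; t < rows; t++)
            for (int y = 0; y < cols; y++)
                if (joint[t][y] > 0)
                    mi += joint[t][y]
                            * (Math.log(joint[t][y] / (pT[t] * pY[y])) / Math.log(2));
        return mi;
    }

    public static void main(String[] args) {
        // Hypothetical imbalanced data: 99% negatives, 1% positives.
        // Rows are true classes (neg, pos); columns are predictions.
        double[][] allNegative = { { 0.99, 0.0 }, { 0.01, 0.0 } };     // 99% accurate
        double[][] informative = { { 0.98, 0.01 }, { 0.002, 0.008 } }; // 98.8% accurate
        System.out.println("all-negative MI: " + mutualInformation(allNegative)); // 0.0
        System.out.println("informative  MI: " + mutualInformation(informative)); // ~0.04
    }
}
```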
PoolParty
From the website:
PoolParty is a thesaurus management system and a SKOS editor for the Semantic Web including text mining and linked data capabilities. The system helps to build and maintain multilingual thesauri providing an easy-to-use interface. PoolParty server provides semantic services to integrate semantic search or recommender systems into enterprise systems like CMS, web shops, CRM or Wikis.
I encountered PoolParty in the video Pool Party – Semantic Search.
The video glosses over a lot of difficulties, but what effective advertising doesn’t?
Curious if anyone is familiar with this group/product?
Slides: Pool Party – Semantic Search
Nice slide deck on semantic search issues.
The History of Search [infographic]
James Anderson has produced a history of [Internet] search infographic.
To see the full page view, The History of Search.
Interesting and probably worth having printed as a decorative poster for the office wall.
An infographic that included search techniques, both digital and analog, from before the Internet would be even more interesting.
Algoviz.org: The Algorithm Visualization Portal
From the website:
AlgoViz.org is a gathering place for users and developers of algorithm visualizations and animations (AVs). It is a gateway to AV-related services, collections, and resources.
An amazing resource for algorithm visualization and animations. The “catalog” has over 500 entries, along with an annotated bibliography of papers on algorithm visualization, field reports of the use of visualizations in the classroom, forums, and other resources.
Visualization of merging algorithms is going to take on increasing importance as TMCL and TMQL increase the range of merging opportunities.
Building on prior techniques and experiences with visualization seems like a good idea.
HCatalog, tables and metadata for Hadoop
HCatalog is described at its Apache site as:
Apache HCatalog is a table and storage management service for data created using Apache Hadoop.
This includes:
- Providing a shared schema and data type mechanism.
- Providing a table abstraction so that users need not be concerned with where or how their data is stored.
- Providing interoperability across data processing tools such as Pig, Map Reduce, Streaming, and Hive.
From the post:
Last month the HCatalog project (formerly known as Howl) was accepted into the Apache Incubator. We have already branched for a 0.1 release, which we hope to push in the next few weeks. Given all this activity, I thought it would be a good time to write a post on the motivation behind HCatalog, what features it will provide, and who is working on it.
Putting and Getting Data from a Database
Overview of database structures and data operations on those structures by Marko A. Rodriguez.
Marko covers primitive, key-value, and document stores, plus graph databases.
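As a toy illustration of the first two styles Marko covers, here is a sketch of the same record held key-value style and document style, using plain Java maps as stand-ins for real stores:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration with plain maps standing in for real stores: the same
// "person" record, key-value style versus document style.
public class PutGetSketch {
    public static void main(String[] args) {
        // Key-value style: one opaque value per composite key; the store
        // knows nothing about the structure of what it holds.
        Map<String, String> kv = new HashMap<>();
        kv.put("person:1:name", "Alice");
        kv.put("person:1:city", "Austin");
        System.out.println(kv.get("person:1:name")); // Alice

        // Document style: the record is a structured value whose fields
        // (even nested ones) the store can navigate.
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Alice");
        doc.put("address", Map.of("city", "Austin", "state", "TX"));

        Map<String, Map<String, Object>> docStore = new HashMap<>();
        docStore.put("person:1", doc);

        Map<String, Object> fetched = docStore.get("person:1");
        System.out.println(((Map<?, ?>) fetched.get("address")).get("city")); // Austin
    }
}
```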
There are a number of database structures in addition to those four, although those are certainly popular ones.
I would not put too much stock in claims about one form of technology or another until I saw it in operation with my data.
That software works great with someone else’s data isn’t all that interesting, either for you or your manager.
Prediction API: Every app a smart app, by Travis Green of the Google Prediction API Team.
From the post:
If you’re looking to make your app smarter and you think machine learning is more complicated than making three API calls, then you’re reading the right blog post.
Today, we are releasing v1.2 of the Google Prediction API, which makes it even easier for preview users to build smarter apps by accessing Google’s advanced machine learning algorithms through a RESTful web service.
I haven’t played with this but would be interested in hearing from someone who has.
I ran across Top Three Drivers of Solr Adoption and thought it might offer some lessons for driving topic map adoption.
From a survey of customers, the drivers were: vendor fatigue, flexibility, and stability.
Having a “cute” name didn’t make the list. So much for all the various debates and recriminations over what to call products. Useful products, even with bad names, survive and possibly thrive. Useless products, well-named or not, don’t.
Vendor Fatigue referred to the needlessly complex and sometimes over-reaching vendor agreements that seek to guarantee only particular levels of usage, etc. You really need to see the Dilbert cartoon at the post.
Very large vendors, ahem, I pass over without naming names, can rely on repeat business “just because.” Small vendors, on the other hand, should concentrate on delivering results and not so much on trying to trap customers in agreements. (You will also have lower legal fees.)
Good results = repeat business.
Flexibility referred to the ease with which Solr can be adapted to particular needs both for input and output. Topic maps have that in spades.
Stability: I think what the author meant was complexity. That is, Lucene is far more complex than Solr, which makes it more difficult to maintain. Solr, like any other abstraction (compare editing with ex to vi), makes common tasks easier.
Topic maps can be as complex as need be.
But, in terms of user interfaces, successful topic map applications are going to be domain/task specific.
I say that because our views of editing/reading are so shaped by our communities that departures from those, even if equally capable of some task, feel “unnatural.”
Shaping topic map interfaces in conversation with actual users, a fairly well documented technique, is more likely to produce a successful interface than developers guessing for days what they think is an “intuitive” interface.