Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 13, 2011

RecordBreaker: Automatic structure for your text-formatted data

Filed under: Data Analysis,Data Mining — Patrick Durusau @ 7:30 pm

From the post:

This post was contributed by Michael Cafarella, an assistant professor of computer science at the University of Michigan. Mike’s research interests focus on databases, in particular managing Web data. Before becoming a professor, he was one of the founders of the Nutch and Hadoop projects with Doug Cutting. This first version of RecordBreaker was developed by Mike in conjunction with Cloudera.

RecordBreaker is a project that automatically turns your text-formatted data (logs, sensor readings, etc) into structured data, without any need to write parsers or extractors. In particular, RecordBreaker targets Avro as its output format. The project’s goal is to dramatically reduce the time spent preparing data for analysis, enabling more time for the analysis itself.

Not quite “automatic,” but a step in that direction, and a useful one.

Think of “automatic” identification of subjects and associations in such files, like the files from campaign financing authorities.
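To make the idea of turning text-formatted data into structured data a little more concrete, here is a minimal sketch in Python. It is not RecordBreaker's algorithm or API, which the post does not describe here. It assumes whitespace-delimited lines with a fixed number of fields, and the sample lines, the function names (`guess_type`, `widen`, `infer_schema`, `to_record`) and the `fieldN` column names are all made up for illustration. The output is a dictionary shaped like an Avro record schema, using only the Avro primitive types `long`, `double` and `string`.

```python
import json
import re

# Hypothetical sample lines, invented for this sketch (not real log data).
SAMPLE = [
    "2011-07-13 19:30:01 sensor7 21.5",
    "2011-07-13 19:30:02 sensor9 22",
    "2011-07-13 19:30:03 sensor7 20.75",
]

# Converters for the three Avro primitive types used in this sketch.
CASTS = {"long": int, "double": float, "string": str}


def guess_type(token):
    """Return the narrowest of long/double/string that fits one token."""
    if re.fullmatch(r"-?\d+", token):
        return "long"
    if re.fullmatch(r"-?\d+\.\d+", token):
        return "double"
    return "string"


def widen(a, b):
    """Merge two guessed types into one type that can hold both."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"


def infer_schema(lines, name="Record"):
    """Infer an Avro-style record schema from whitespace-delimited lines.

    Assumption: every line has the same number of fields.
    """
    column_types = None
    for line in lines:
        types = [guess_type(tok) for tok in line.split()]
        if column_types is None:
            column_types = types
        elif len(types) != len(column_types):
            raise ValueError("lines have differing field counts")
        else:
            column_types = [widen(a, b) for a, b in zip(column_types, types)]
    fields = [
        {"name": f"field{i}", "type": t}
        for i, t in enumerate(column_types or [])
    ]
    return {"type": "record", "name": name, "fields": fields}


def to_record(line, schema):
    """Convert one text line into a dict typed according to the schema."""
    tokens = line.split()
    return {
        field["name"]: CASTS[field["type"]](tok)
        for field, tok in zip(schema["fields"], tokens)
    }


if __name__ == "__main__":
    schema = infer_schema(SAMPLE)
    print(json.dumps(schema, indent=2))
    for line in SAMPLE:
        print(to_record(line, schema))
```

Even this toy version shows where the subject identity questions start: the sketch can guess that the fourth column is numeric, but it has no idea that `sensor7` names a subject or that the first two columns together form a timestamp. That gap is why “not quite automatic” seems like the right description.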
