Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 15, 2012

Apache Nutch v2.0 Release

Filed under: Nutch,Search Engines — Patrick Durusau @ 10:18 am

From the post:

The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v2.0. This release offers users an edition focused on large scale crawling which builds on storage abstraction (via Apache Gora™) for big data stores such as Apache Accumulo™, Apache Avro™, Apache Cassandra™, Apache HBase™, HDFS™, an in memory data store and various high profile SQL stores. After some two years of development Nutch v2.0 also offers all of the mainstream Nutch functionality and it builds on Apache Solr™ adding web-specifics, such as a crawler, a link-graph database and parsing support handled by Apache Tika™ for HTML and an array of other document formats. Nutch v2.0 shadows the latest stable mainstream release (v1.5.X) based on Apache Hadoop™ and covers many use cases from small crawls on a single machine to large scale deployments on Hadoop clusters. Please see the list of changes made in this version for a full breakdown:

http://www.apache.org/dist/nutch/2.0/CHANGES.txt

A full PMC release statement can be found below:

http://nutch.apache.org/#07+July+2012+-+Apache+Nutch+v2.0+Released

Nutch v2.0 is available in source (zip and tar.gz) from the following download page: http://www.apache.org/dyn/closer.cgi/nutch/2.0
