From the website:
The LDSpider project aims to build a web crawling framework for the linked data web. Requirements and challenges for crawling the linked data web differ from regular web crawling, thus this project offers a web crawler adapted to traverse and harvest sources and instances from the linked data web. We offer a single jar which can be easily integrated into your own applications.
Features:
- Content Handlers for different formats
- Different crawling strategies
- Crawling scope
- Output formats
Content handlers, crawling strategies, crawling scope, output formats: all standard crawling features, here adapted to linked data formats. But those formats should be accessible to any crawler.
A welcome addition, since we are all going to encounter linked data, but I am missing what is actually different.
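One concrete difference worth noting: a linked-data crawler dereferences URIs asking for RDF serializations via HTTP content negotiation, rather than fetching HTML and scraping links. This is not LDSpider's actual API, just a minimal sketch (using the standard `java.net.http` classes, with a made-up class name and an example DBpedia URI) of the kind of request such a crawler issues:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical illustration, not LDSpider code: a linked-data crawler
// requests RDF serializations first, falling back to HTML only as a
// last resort, by setting quality values in the Accept header.
public class LinkedDataFetch {

    // Build a GET request that prefers RDF/XML and Turtle over HTML.
    static HttpRequest rdfRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept",
                        "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.1")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Example resource URI; a server supporting content negotiation
        // would return RDF triples here instead of a web page.
        HttpRequest req = rdfRequest("http://dbpedia.org/resource/Berlin");
        System.out.println(req.headers().firstValue("Accept").orElse(""));
    }
}
```

From the RDF returned by such a request, the crawler extracts further URIs to dereference from the triples themselves, which is where the "different crawling strategies" and "crawling scope" features come into play.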
If you see it, please post a comment.
Questions:
- What semantic requirements should a web crawler have?
- How does this web crawler compare to your requirements?
- What one capability would you add to this crawler?
- What other web crawlers should be used for comparison?