Aperture: a Java framework for getting data and metadata
From the website:
Aperture is an open source library for crawling and indexing information sources such as file systems, websites and mail boxes. Aperture supports a number of common source types and document formats out-of-the-box and provides easy ways to extend it with custom implementations.
Example applications include:
- bibsonomycrawler.bat – crawls BibSonomy accounts, extracts bookmarks and tags
- deliciouscrawler.bat – crawls Delicious accounts, extracts bookmarks and tags
- filecrawler.bat – crawls filesystems, extracts the folder structure, the file metadata and the file content
- flickrcrawler.bat – crawls Flickr accounts, extracts tags and photo metadata
- icalcrawler.bat – crawls calendars stored in the well-known iCalendar format, extracts events, todos, journal entries, etc.
- imapcrawler.bat – crawls remote mailboxes accessible with IMAP
- mboxcrawler.bat – crawls local mailboxes stored in mbox-format files (e.g., those from Thunderbird)
- outlookcrawler.bat – connects to the Outlook instance and crawls appointments, contacts and emails; this crawler only works on Windows with MS Outlook installed
- thunderbirdcrawler.bat – crawls a Thunderbird address book and extracts contacts; to crawl emails, use the mboxcrawler instead
- webcrawler.bat – crawls websites
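To give a feel for what a file crawler of this kind does, here is a minimal sketch that walks a directory tree and records the folder structure plus basic file metadata. It deliberately uses only the standard JDK (`java.nio.file`) and is not Aperture's API; the class name `FileCrawlSketch`, the `Entry` holder and the `crawl` method are hypothetical names chosen for illustration, and content extraction is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration only: plain JDK code, not part of Aperture.
public class FileCrawlSketch {

    /** One crawled entry: its path, whether it is a folder, and basic metadata. */
    public static final class Entry {
        public final Path path;
        public final boolean directory;
        public final long size;
        public final long lastModifiedMillis;

        Entry(Path path, boolean directory, long size, long lastModifiedMillis) {
            this.path = path;
            this.directory = directory;
            this.size = size;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Walks the tree under root (root itself comes first) and collects metadata. */
    public static List<Entry> crawl(Path root) {
        List<Entry> entries = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.forEach(p -> {
                try {
                    BasicFileAttributes attrs =
                            Files.readAttributes(p, BasicFileAttributes.class);
                    entries.add(new Entry(p, attrs.isDirectory(), attrs.size(),
                            attrs.lastModifiedTime().toMillis()));
                } catch (IOException e) {
                    // Surface I/O problems without forcing checked exceptions on callers.
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Crawl the directory given on the command line, or the current one.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Entry e : crawl(root)) {
            System.out.printf("%s %s (%d bytes)%n",
                    e.directory ? "DIR " : "FILE", e.path, e.size);
        }
    }
}
```

A real crawler in a framework like Aperture would additionally hand each file to a format-specific extractor and emit the results as structured metadata; the sketch stops at the listing step.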
More tools for your topic map toolbox!