Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 3, 2010

Aperture: a Java framework for getting data and metadata

Filed under: Data Mining,Software — Patrick Durusau @ 7:07 pm

From the website:

Aperture is an open source library for crawling and indexing information sources such as file systems, websites and mail boxes. Aperture supports a number of common source types and document formats out-of-the-box and provides easy ways to extend it with custom implementations.

Aperture wiki

Example applications include:

  • bibsonomycrawler.bat – crawls Bibsonomy accounts, extracts bookmarks and tags
  • deliciouscrawler.bat – crawls delicious accounts, extracts bookmarks and tags
  • filecrawler.bat – crawls filesystems, extracts the folder structure, the file metadata and the file content
  • flickrcrawler.bat – crawls Flickr accounts, extracts tags and photo metadata
  • icalcrawler.bat – crawls calendars stored in the well-known iCalendar format, extracts events, todos, journal entries etc.
  • imapcrawler.bat – crawls remote mailboxes accessible with IMAP
  • mboxcrawler.bat – crawls local mailboxes stored in mbox-format files (e.g. those from Thunderbird)
  • outlookcrawler.bat – connects to the Outlook instance and crawls appointments, contacts and emails; note that this crawler only works on Windows with MS Outlook installed
  • thunderbirdcrawler.bat – crawls a Thunderbird address book and extracts contacts; to crawl emails, use the mboxcrawler
  • webcrawler.bat – crawls websites
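
As a rough illustration of what a crawler like filecrawler.bat does, here is a minimal sketch that uses only the JDK. It is not Aperture's API: the class `FileCrawlSketch`, its `Entry` type and the `crawl` method are hypothetical names made up for this example, and it stops at the folder structure and basic file metadata without extracting any file content.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch, NOT part of Aperture: walks a directory tree with the
// JDK and records basic metadata for every file and folder it finds.
class FileCrawlSketch {

    /** One crawled entry: path relative to the crawl root, plus basic metadata. */
    static final class Entry {
        final String relativePath;
        final boolean directory;
        final long sizeBytes;
        final long lastModifiedMillis;

        Entry(String relativePath, boolean directory, long sizeBytes, long lastModifiedMillis) {
            this.relativePath = relativePath;
            this.directory = directory;
            this.sizeBytes = sizeBytes;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Visits the root and everything below it, depth-first, root first. */
    static List<Entry> crawl(Path root) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
                entries.add(new Entry(
                        root.relativize(p).toString(), // empty string for the root itself
                        attrs.isDirectory(),
                        attrs.size(),
                        attrs.lastModifiedTime().toMillis()));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Crawl the directory given on the command line, or the current one.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Entry e : crawl(root)) {
            System.out.println((e.directory ? "dir  " : "file ")
                    + e.relativePath + " (" + e.sizeBytes + " bytes)");
        }
    }
}
```

A real crawler would hand each entry to an extractor for content and richer metadata, which is the part Aperture is meant to supply for the formats it supports.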

More tools for your topic map toolbox!
