If you could make a computer do anything with documents,…

Monday, March 17th, 2014

If you could make a computer do anything with documents, what would you make it do?

The Overview Project has made a number of major improvements in the last year, and now they are asking for your opinion on what to do next.

They have funding and developers, and they are pushing out new features. I take all of those to be positive signs.

There is no guarantee that what you ask for is possible with their resources, or even of any interest to them.

But, you won’t know if you don’t ask.

I will be posting my answer to that question on this blog this coming Friday, 21 March 2014.

Spread the word! Get other people to try Overview and to answer the survey.

Getting Into Overview

Thursday, January 9th, 2014

Getting your documents into Overview — the complete guide by Jonathan Stray.

From the post:

The first and most common question from Overview users is how do I get my documents in? The answer varies depending on the format of your material. There are three basic paths to get documents into Overview: as multiple PDFs, from a single CSV file, and via DocumentCloud. But there are several other tricks you might need, depending on your situation.

Great coverage of the first step towards using Overview.
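To make the single-CSV path concrete, here is a minimal sketch of collecting a few documents into one CSV string. The column names (`id`, `title`, `text`) are my assumption about what an importer would want, not something stated in the quoted passage, so check Jonathan's guide for the columns Overview actually expects. The function name `documents_to_csv` and the sample memos are hypothetical.

```python
import csv
import io


def documents_to_csv(documents):
    """Serialize (title, text) pairs into CSV text, one row per document.

    The column names id/title/text are an assumption for illustration;
    confirm the expected columns in the Overview guide before uploading.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "title", "text"])
    for doc_id, (title, text) in enumerate(documents, start=1):
        # The csv module quotes commas, quotes and newlines inside the text.
        writer.writerow([doc_id, title, text])
    return buffer.getvalue()


# Hypothetical sample documents.
sample = [
    ("Memo A", "Budget review, second quarter."),
    ("Memo B", 'He said "no comment".'),
]
csv_text = documents_to_csv(sample)
print(csv_text)
```

Letting the `csv` module do the quoting matters here, since narrative text is full of commas and quotation marks that would break a hand-rolled file.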

Just in case you are not familiar with Overview (from the about page):

Overview is an open-source tool to help journalists find stories in large numbers of documents, by automatically sorting them according to topic and providing a fast visualization and reading interface. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.

There are good tools for searching within large document sets for names and keywords, but that doesn’t help find the stories you’re not specifically looking for. Overview visualizes the relationships among topics, people, and places to help journalists to answer the question, “What’s in there?”

Overview is designed specifically for text documents where the interesting content is all in narrative form — that is, plain English (or other languages) as opposed to a table of numbers. It also works great for analyzing social media data, to find and understand the conversations around a particular topic.

It’s an interactive system where the computer reads every word of every document to create a visualization of topics and sub-topics, while a human guides the exploration. There is no installation required — just use the free web application. Or you can run this open-source software on your own server for extra security. The goal is to make advanced document mining capability available to anyone who needs it.

Examples of people using Overview? See Completed Stories for a sampling.

Overview is a good response to government “disclosures” that attempt to hide wheat in lots of chaff.

Step-by-step instructions for using Overview

Saturday, December 14th, 2013

Step-by-step instructions for using Overview by Jonathan Stray.

The Overview project posted the first job ad that I ever posted to this blog: Overview: Visualization to Connect the Dots.

A great project that enables ordinary users to manage large numbers of documents, to mine them and then to visualize relationships, all part of the process of news investigations.

Jonathan has written very clear and useful instructions for using Overview.

It is an open source software project, so if you see possible improvements or features to add, sing out! Or even better, contribute those improvements and features to the project.

Document Mining with Overview:…

Friday, March 15th, 2013

Document Mining with Overview:… A Digital Tools Tutorial by Jonathan Stray.

The slides from the Overview presentation I mentioned yesterday.

One of the few webinars I have ever attended where nodding off was not a problem! Interesting stuff.

It is designed for the use case where there “…is too much material to read on deadline.”

A cross between document mining and document management.

A cross that hides a lot of the complexity from the user.

Definitely a project to watch.

Complexity Explorer Project

Saturday, November 3rd, 2012

Complexity Explorer Project

A website development project that reports that when “live” it will serve (among others):

Scientist keeping up to date on papers with Source Materials Search Engine and Paper Summaries

Professor designing new course on complexity

High-school science teacher using virtual laboratory for student science projects

Non-expert learning how complex systems science relates to their own field

Scheduled to go beta in the Fall of 2012.

As always, of interest to see how semantic issues are handled in research/library settings.

Introducing DocDiver

Thursday, November 3rd, 2011

Introducing DocDiver by Al Shaw. The ProPublica Nerd Blog

From the post:

Today [4 Oct. 2011] we’re launching a new feature that lets readers work alongside ProPublica reporters—and each other—to identify key bits of information in documents, and to share what they’ve found. We call it DocDiver [1].

Here’s how it works:

DocDiver is built on top of DocumentViewer [2] from DocumentCloud [3]. It frames the DocumentViewer embed and adds a new right-hand sidebar with options for readers to browse findings and to add their own. The “overview” tab shows, at a glance, who is talking about this document and “key findings”—ones that our editors find especially illuminating or noteworthy. The “findings” tab shows all reader findings to the right of each page near where readers found interesting bits.

Graham Moore (Networkedplanet) mentioned earlier today that the topic map working group should look for technologies and projects where topic maps can make a real difference for a minimal amount of effort. (I’m paraphrasing, so if I got it wrong, blame me, not Graham.)

This looks like a case where an application is very close to having topic map capabilities but not quite. The project already has users and developers, and I suspect it would be interested in anything that would improve its software without starting over. That would be the critical part: to leverage existing software and imbue it with subject identity as we understand the concept, to the benefit of current users of the software.
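A rough sketch of what adding subject identity could mean in practice: if each reader finding carried an identifier for the subject it is about, findings from different pages and readers could be merged on that identifier instead of on matching strings. Everything here is hypothetical, including the `subject`, `page` and `note` fields and the example.org identifiers. It does not describe DocDiver's actual data model.

```python
from collections import defaultdict


def merge_by_subject(findings):
    """Group finding notes under the subject identifier they share."""
    merged = defaultdict(list)
    for finding in findings:
        merged[finding["subject"]].append(finding["note"])
    return dict(merged)


# Hypothetical findings; two use different names for the same subject.
findings = [
    {"subject": "http://example.org/id/acme-corp", "page": 3,
     "note": "Acme named as contractor"},
    {"subject": "http://example.org/id/acme-corp", "page": 12,
     "note": "ACME Corp. payment schedule"},
    {"subject": "http://example.org/id/j-smith", "page": 3,
     "note": "Signed by J. Smith"},
]
merged = merge_by_subject(findings)
for subject, notes in merged.items():
    print(subject, notes)
```

The two Acme findings land together even though the readers spelled the name differently, which is the benefit current users would see.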

Document Management System with CouchDB

Friday, October 21st, 2011

Document Management System with CouchDB

I mention this series of posts as a way to become acquainted with CouchDB, not as a tutorial on writing a document management system. Or at least not one for production use.

For my class:

  1. You don’t have to read the code; skip to the end of part 3 to the “simple” user interface. Make a list (one page or less) of what is missing from this “document management” system.
  2. What other document management systems are you familiar with? (If none, check with me and I will assign you one.) Make a one-page feature list from the “other” document management system and mark which features are present or absent in this system.
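If you would rather keep the feature lists for the second exercise in a form you can check mechanically, a few lines of Python will do the marking. This is only a sketch: the feature names below are invented placeholders, not the real feature set of the CouchDB example or of any other system.

```python
def mark_features(reference_features, candidate_features):
    """Map each reference feature to True if the candidate system has it."""
    candidate = set(candidate_features)
    return {name: name in candidate for name in sorted(reference_features)}


# Placeholder feature lists; substitute the ones from your own comparison.
other_system = ["access control", "full-text search", "versioning", "audit log"]
couchdb_example = ["versioning", "attachments"]

report = mark_features(other_system, couchdb_example)
for name, present in report.items():
    print(f"{name}: {'present' if present else 'absent'}")

absent = [name for name, present in report.items() if not present]
```

The `absent` list is the start of your answer to the first exercise as well.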

Not strictly a topic map issue but you are going to encounter people who say software is sufficient if it does X, particularly when you want Y. This is in part to prepare you to win those conversations.