Archive for the ‘OpenCalais’ Category

Congratulations! You’re Running on OpenCalais 4.7!

Monday, March 25th, 2013

From the post:

This morning we upgraded OpenCalais to release 4.7. Our focus with 4.7 was on a significant improvement in the detection and disambiguation of companies as well as some behind-the-scenes tune-ups and bug fixes.

If your content contains company names you should already be seeing a significant improvement in detection and disambiguation. While company detection has always been very good in OpenCalais, now it’s great.

If you’re one of our high-volume commercial clients (1M+ transactions per day), we’ll be rolling out your upgrade toward the end of the month.

And, remember, you can always drop by the OpenCalais viewer for a quick test or exploration of OpenCalais with zero programming involved.

If you don’t already know OpenCalais:

From a user perspective it’s pretty simple: You hand the Web Service unstructured text (like news articles, blog postings, your term paper, etc.) and it returns semantic metadata in RDF format. What’s happening in the background is a little more complicated.

Using natural language processing and machine learning techniques, the Calais Web Service examines your text and locates the entities (people, places, products, etc.), facts (John Doe works for Acme Corporation) and events (Jane Doe was appointed as a Board member of Acme Corporation). Calais then processes the entities, facts and events extracted from the text and returns them to the caller in RDF format.

Please also check out the Calais blog and forums to see where Calais is headed. Significant development activities include the ability for downstream content consumers to retrieve previously generated metadata using a Calais-provided GUID, additional input languages, and user-defined processing extensions.

Did I mention it is a free service up to 50,000 submissions a day? (see the license terms for details)

OpenCalais won’t capture every entity or relationship known to you but it will do a lot of the rote work for you. You can then fill in the specialized parts.
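
The "hand it text, get RDF back" exchange described in the quote can be sketched as a single HTTP POST. This is only a rough illustration, not the official client. The endpoint URL, the x-calais-licenseID header, and the "text/raw" and "xml/rdf" values are assumptions based on the REST interface as documented around the time of these posts, and build_calais_request is a hypothetical helper name. Check the current OpenCalais documentation before relying on any of them.

```python
import urllib.request

# Assumption: the REST endpoint as documented around 2012-2013; it may have changed.
CALAIS_URL = "http://api.opencalais.com/tag/rs/enrich"


def build_calais_request(text, license_key, url=CALAIS_URL):
    """Build (but do not send) a POST request carrying unstructured text.

    The header names and values below are assumptions, not a verified API.
    """
    headers = {
        "x-calais-licenseID": license_key,          # assumed name of the API key header
        "Content-Type": "text/raw; charset=utf-8",  # assumed input format label
        "Accept": "xml/rdf",                        # assumed label for RDF output
    }
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending it would then be a matter of passing the request to urllib.request.urlopen and reading the response body, which (if the assumptions hold) would be the RDF describing the entities, facts and events found in the text.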

Adobe CQ5 – OpenCalais Integration [Drupal too!]

Tuesday, November 27th, 2012

Adobe CQ5 – OpenCalais Integration by Mateusz Kula.

From the post:

In the massive amount of information available on the Internet it is getting more and more difficult to find relevant and valuable content and categorize it in one way or another. No doubt tagging this overwhelming amount of data is becoming more and more crucial from the SEO and digital marketing point of view as it plays an important role in site positioning and allows end users a keyword search. Problems appear when editors are not scrupulous enough to add tags for new pages, press releases, blogs and tweets and to update them when content significantly changes. The worst case scenario is when there is a CMS filled with a whole bunch of untagged content. Then it may take too much time and resources to catch up with tagging. OpenCalais turns out to be a great solution to such problems and, what is more, it allows for auto-tagging and can be easily integrated with other services.

An interesting take on integrating OpenCalais with Adobe’s enterprise content management system, CQ5.

Suspect there are topic map authoring lessons here as well.

Rather than seeing topic map editing as always a separate activity, integrating it into content management workflow, automated to the degree possible, could be a move in the right direction.

BTW, there is an OpenCalais module for Drupal, in case you are interested.

Calais Release 4.6 is Available for Beta Testing [Through 23rd of August]

Sunday, August 12th, 2012

From the post:

As we mentioned in our prior post, Version 4.6 of OpenCalais is now available for beta testing. While we should have 100% backward compatibility – it’s always a good idea to run a set of transactions through and make sure there are no issues.

You’ll see a number of new things in this release:

  • Under the covers we’ve upgraded our core processing engine. While this won’t directly affect you as an end user – it does set the stage for further improvements in the future.
  • We’ve improved the quality of the Company and Person extraction. Not surprisingly, these are two of our most frequently used concepts and we want them to be insanely great – we’re getting there.
  • We’ve updated and refreshed our Social Tags feature. If you haven’t had a chance to experiment with Social Tags in the past, give it a try. This is a great way to immediately improve the “findability” of your content.
  • We’ve introduced six new concepts that we’ll discuss below.

  • PersonParty extracts information about the affiliation of a person with a political party.
  • CandidatePosition extracts information on past, current and aspirational political positions for a candidate.
  • ArmedAttack extracts information regarding an attack by a person or organization on a country, organization or political figure.
  • MilitaryAction extracts references to non-combative military actions such as troop deployments or movements.
  • ArmsPurchaseSale extracts information on planned, proposed or consummated arms sales.
  • PersonLocation extracts information on where a person lives or is traveling.

So, it’s the Politics and Conflict pack – always popular topics.

More details at the post (including release notes).

Get your comments in early! Planned end of beta test: 23rd of August 2012.


Thursday, July 12th, 2012


From the introduction page:

The free OpenCalais service and open API is the fastest way to tag the people, places, facts and events in your content.  It can help you improve your SEO, increase your reader engagement, create search-engine-friendly ‘topic hubs’ and streamline content operations – saving you time and money.

OpenCalais is free to use in both commercial and non-commercial settings, but can only be used on public content (don’t run your confidential or competitive company information through it!). OpenCalais does not keep a copy of your content, but it does keep a copy of the metadata it extracts therefrom.

To repeat, OpenCalais is not a private service, and there is no secure, enterprise version that you can buy to operate behind a firewall. It is your responsibility to police the content that you submit, so make sure you are comfortable with our Terms of Service (TOS) before you jump in.

You can process up to 50,000 documents per day (blog posts, news stories, Web pages, etc.) free of charge.  If you need to process more than that – say you are an aggregator or a media monitoring service – then see this page to learn about Calais Professional. We offer a very affordable license.

OpenCalais’ early adopters include CBS Interactive / CNET, Huffington Post, Slate, Al Jazeera, The New Republic, The White House and more. Already more than 30,000 developers have signed up, and more than 50 publishers and 75 entrepreneurs are using the free service to help build their businesses.

You can read about the pioneering work of these publishers, entrepreneurs and developers here.

To get started, scroll to the bottom section of this page. To build OpenCalais into an existing site or publishing platform (CMS), you will need to work with your developers. 

I thought I had written about OpenCalais but it turns out it was just in quotes in other posts. Should know better than to rely on my memory. 😉

The limit of 50,000 documents per day sounds reasonable to me and should be enough for some interesting experiments. Perhaps even comparisons of the results from different tagging projects.

Not to say one is better than another but to identify spots on semantic margins where ambiguity may be found.
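
One way to run such a comparison is to treat each tagger's output for a document as a set of tags and look at where the sets disagree. The sketch below assumes the tags have already been pulled out as plain strings. The function name compare_taggers and the sample tags are made up for illustration, and the lowercase-and-trim normalization is a deliberately crude stand-in for real entity matching.

```python
def compare_taggers(tags_a, tags_b):
    """Split two taggers' output for one document into agreed and disputed tags.

    Tags are compared after trimming whitespace and lowercasing, which is
    only a rough notion of "the same tag".
    """
    a = {t.strip().lower() for t in tags_a}
    b = {t.strip().lower() for t in tags_b}
    return {
        "agreed": sorted(a & b),   # tags both taggers produced
        "only_a": sorted(a - b),   # tags only the first tagger produced
        "only_b": sorted(b - a),   # tags only the second tagger produced
    }


# Hypothetical output from two taggers run on the same text.
result = compare_taggers(
    ["Acme Corporation", "Jane Doe", "Paris"],
    ["jane doe", "Paris, Texas"],
)
```

The "only_a" and "only_b" lists are the interesting part: a tag one tagger produces and the other does not (here, "paris" versus "paris, texas") is a candidate spot where ambiguity may be hiding.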

Historical documents should make interesting test subjects.

A caution: the further back in history we reach, the less meaningful it is to say a word has a “correct” meaning. An author used it with a particular meaning, but that meaning passed from our ken with the passing of the author and their linguistic community. We can guess what may have been meant, but nothing more.