May 4, 2010

OpenCalais: Serving up context on the fly

Why sharing your content is a smart business move

JD Lasica

You may have heard about the threat by online publishers to cordon off their sites’ content and put a No Trespassing sign on their lawns to ward off crawling and indexing by search engines.

That chest-thumping strategy is a dead end.*

Consider the exact opposite of that strategy: sharing your content in a broad ecosystem of traditional news sites, blogs and alternative publications. That’s the approach taken by participants in OpenCalais.

I’ve been hearing for some time about OpenCalais (pronounced cal-lay) — have you? — but I didn’t really begin to grasp it until I ran into Krista Thomas of Thomson Reuters at NewComm Forum.

I suspect most of us don’t really want to know the under-the-hood stuff about the Semantic Web. What we want to know is: How will these new capabilities advance my publication, my business, my blog, my cause? (Trust me, they will!) Here’s your five-minute backgrounder on OpenCalais, one important slice of this onrushing shift in Web technologies.

A context engine for blogs and news publications

In a sentence, OpenCalais is the Thomson Reuters initiative that provides a free Web service and open API to connect content on the Web and offer much-needed context. As its website proclaims, “We want to make all the world’s content more accessible, interoperable and valuable.”

Krista, who is also vice president of marketing & communications for OpenCalais, puts it this way: “Some call this trend the semantic Web, some call it Web 3.0, but simply put, OpenCalais is a quick and easy way for publishers to generate metadata for their content, which they can use to streamline content operations, improve SEO, drive increased reader engagement, automate the creation of topic hubs and more.” It’s a valuable toolkit of capabilities that lets you easily incorporate state-of-the-art semantic functionality within your blog, content management system, website or application.

Yes, any website or blog, no matter how small.

“These technologies make it easy to automatically connect the people, companies and concepts in your content to the related content on the rest of the Web.”
— Krista Thomas

OpenCalais is just one piece of this burgeoning movement. “The reason OpenCalais – and so-called Web 3.0 concepts like the Semantic Web, Linked Data, Structured Data, etc. – is important is that these technologies make it easy to automatically connect the people, companies and concepts in your content to the related content on the rest of the Web,” she says. In a somewhat similar fashion, OAuth (Open Authorization) is an open standard that allows users to share their private resources — e.g., photos, videos, contact lists — stored on one site with another site without having to hand out their username and password. OAuth is paving the way for seamless Web experiences where users can flow data across sites and share social experiences across a range of platforms and devices, including mobile handhelds.

OpenCalais in practical terms

OK, that all sounds fine, but what will OpenCalais do for you in real terms?

When you’re writing about Steve Jobs announcing the next big thing at an Apple developers conference, your story or blog post will suck in links to other timely, relevant stories about the announcement — automatically inserted by OpenCalais without you having to embed anything. If you’re looking for the right photo to go with your post, OpenCalais can help. And if you want to save even more time, it can suggest relevant tags (keywords).

In other words, greater context and background — something the new media landscape sorely needs.

You can use OpenCalais as a publisher to help syndicate your content or as a blogger to bring in related content and photos. As my former partner Marc Canter likes to say, it not only sucks in, but also spits out — that’s what real interoperability is about. You do want to make your own content and data available to the world, right?

The service is free to use in both commercial and noncommercial settings — “there is no cost at all for the service and we don’t offer any professional services or other consulting around OpenCalais — it really is free,” Krista says. It should only be used for public content, of course (i.e. don’t run your medical records through it!). OpenCalais does not keep a copy of your content, but it does keep a copy of the metadata it extracts. You can process up to 50,000 documents per day (blog posts, news stories, Web pages, etc.) free of charge. That’s amazing. (If you need to process more than that – say you are an aggregator like Moreover or a media monitoring service like Meltwater – then see the OpenCalais site for details.) In some instances, you may want to pay the costs of your web developer to handle some of this.
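If you batch-process an archive, it can help to track how much of that daily allowance you have used before sending more documents. Here is a minimal sketch of such a counter. The 50,000 figure comes from the paragraph above; the class and method names are made up for illustration, and the reset on a calendar-day boundary is an assumption, since the article does not say how the service actually measures a "day."

```python
from datetime import date

DAILY_LIMIT = 50_000  # free daily allowance quoted in the article


class DailyQuota:
    """Counts documents submitted per calendar day against a fixed limit.

    Hypothetical helper: resetting at the calendar-day boundary is an
    assumption, not documented OpenCalais behavior.
    """

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today=None):
        """Return True and count one document if today's budget allows it."""
        today = today or date.today()
        if today != self.day:  # first call on a new day resets the counter
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

    def remaining(self):
        """Documents still available in the current day's budget."""
        return self.limit - self.used
```

A batch job would call `try_consume()` before each submission and stop, or wait for the next day, once it returns `False`.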

I just started using OpenCalais for both and through the Tagaroo plug-in for WordPress. In many ways, it seems to behave similarly to Zemanta, which I use all the time now to add related articles, images and tags. (I ran into Zemanta CTO Andraz Tori at SXSWi.) I need to find out how it syndicates out my content to other sources. Other cool metadata content services include Apture and Picapp (which I wrote about in April).

More than 30,000 developers have signed on to OpenCalais, as well as more than 50 publishers and 75 entrepreneurs, along with the open source platform Drupal. Early adopters of OpenCalais include the Huffington Post, CBS Interactive/CNET, Slate, Al Jazeera, and The New Republic. And OpenCalais just announced a raft of new content partnerships, including Ushahidi and Three Minute Media.

Krista points out that OpenCalais also makes sense for institutions like libraries, museums and universities looking to forge a path into the future of digital media. Other innovative services using OpenCalais to deliver intelligent content experiences include:

  • Digest – The new ‘Digest’ application for Apple’s iPad and the popular ‘Read it Later’ bookmarking service for Firefox and iPhone make it easy to save news stories, Web pages and more, and then retrieve them to read later on iPad, desktop or mobile. Both Digest and Read it Later use OpenCalais to automatically categorize stories and quickly sort them into helpful folders.
  • Hedgehogs – Hedgehogs is a social application platform for the hedge fund and investment community and those who serve it. Hedgehogs uses OpenCalais to automatically index and cross-reference content from heterogeneous sources as the basis for persistent search and alerts and to power its automated contextual “related content” capability.
  • FeedTrace – FeedTrace enables anyone to mine Twitter content for links, and makes it easy to find the most relevant news, articles and videos being shared on Twitter by link popularity. FeedTrace uses OpenCalais to categorize popular links to help you quickly navigate through real-time content to find the “right time content” that matters most.

How to get started with OpenCalais

There are a number of ways to get started with OpenCalais:

  1. If you are on WordPress, try the Calais Tagaroo plug-in, which is easy to install and which automatically tags your content as you type. It can also fetch images from Flickr and videos from Google Video, which you can select for inclusion in your post or disregard.
  2. If you want to control how your search results appear in Google and Yahoo!, you can try Calais Marmoset, which is simple javacode you embed in your site pages. Marmoset will collect the metadata from your page (in the form of RDFa) and hand it over to Google Rich Snippets and Yahoo! SearchMonkey so that you can customize the way your search results appear.
  3. If you are building a new site from the ground up, consider using OpenPublish, a free content management system based on the popular open source platform Drupal. OpenPublish bakes in OpenCalais from the start to “semantify” your site and automate the creation of ‘related reading’ widgets, ‘topic hubs’ and more. It comes from Phase2 Technology and Thomson Reuters, and now comes with support and hosting from Drupal founder Dries Buytaert’s company, Acquia. (Krista adds: “If you were building a whole new site, I’d recommend OpenPublish, but you’d still end up working with the help of at least one programmer/developer type in most cases, unless you have a background in Drupal.”)
  4. To build OpenCalais into an existing site or publishing platform (CMS), you will need to work with your developers, who can find the resources and information they need at the OpenCalais site and Semantic Universe.
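For developers wondering what that integration work looks like once the metadata is in hand, here is a rough sketch of one common use: linking the first mention of each extracted person or company to a topic hub page. It does not call OpenCalais and does not reflect its real response format. The plain list of entity names, the `/topics/` URL scheme and both function names are assumptions made up for illustration.

```python
import re


def slugify(name):
    """Turn an entity name into a URL-friendly slug, e.g. 'Steve Jobs' -> 'steve-jobs'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def link_first_mentions(text, entities, hub_base="/topics/"):
    """Wrap the first mention of each entity name in a link to its topic hub.

    `entities` is a hypothetical list of names already extracted from the
    text; `hub_base` is an assumed URL prefix for topic hub pages.
    """
    # Handle longer names first so a full name is linked before any shorter
    # name that happens to appear inside it.
    for name in sorted(entities, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        url = hub_base + slugify(name)
        text = pattern.sub(
            lambda m: f'<a href="{url}">{m.group(0)}</a>', text, count=1
        )
    return text


post = "Steve Jobs spoke at an Apple conference. Apple shares rose."
print(link_first_mentions(post, ["Apple", "Steve Jobs"]))
```

Only the first mention of each name is linked, which keeps a post from filling up with repeated links. A production version would also need to skip text that already sits inside a link or other markup.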

Final words

I suspect that as word filters out, we’ll see more and more publishers, entrepreneurs and developers learning how to work with the new tools. And the ultimate winners will be readers, who gain access to more relevant content.

Note: Calais is the core service, and OpenCalais is the free and public version of it. But the makers prefer to always refer to it as OpenCalais. (Here’s Wikipedia’s entry.) Also, OpenCalais is an open API but it’s not open source. “That said,” Krista adds, “we have published our schema for others who want to use it. But once it has been adapted or augmented, we won’t take it back. We’re preserving its integrity so that we can stand by our results.”

* I’m referring to general news sites that block Google and other search engines. There may be a few special cases where pay walls make business sense.


PicApp: Free quality images for your blog
Seedcamp winners meet the Traveling Geeks (including Zemanta)
Krista Thomas interviewed about OpenCalais (YouTube)
Web 3.0 slide show: Marketing Content: Bringing structure to articles, blog posts and more (Slideshare)

JD Lasica, founder of, is now co-founder of the cruise discovery engine Cruiseable. See his About page, contact JD or follow him on Twitter or Google Plus.

3 thoughts on “OpenCalais: Serving up context on the fly”

  1. JD-

    Tom Tague from OpenCalais here.

    Thanks for the thoughtful and comprehensive article – we really appreciate any help we can get in getting the word out about something that we believe is game changing.

    I would like to encourage your readers to grab a news story and visit

    This is just a simple demo application that will show you just how much information OpenCalais can extract from your content – with no programming or installations needed.


  2. Thanks JD, for the inside scoop on OpenCalais. You mentioned you just started using it, and I'd be interested to hear your thoughts on it in the future.