Posted in Publishing, tagging

In her first diary post from the O’Reilly Tools of Change conference in New York City, Sara’s number one ‘take-away’ for the day was: “40% of Internet users are tagging content on a daily basis – how many publishers are ensuring their content is taggable?”

2008 may be all about metadata (and perhaps a few other things along the way) as companies in the information space engage with the increasing usefulness and accessibility of structured information.

Recent news on ReadWriteWeb is that Reuters has opened its Calais API:

The idea behind Calais is simple – identify interesting bits into metadata in documents. In this implementation the focus is on People, Companies, Places, and Events, but surely the technology can be adopted to other entities. The heavy lifting is done by the combination of a natural language processing engine and a massive hard coded, learning database that Clear Forest has built.
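To make the "identify interesting bits" idea concrete, here is a toy sketch in Python. It is not the Calais API and makes no claim about how Clear Forest's engine works: the lookup table `KNOWN_ENTITIES`, the `Tag` record and the `tag_entities` function are all hypothetical names invented for illustration, and a plain dictionary lookup stands in for the natural language processing and learning database described above.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical hand-built lookup table (an assumption for this sketch).
# A real service would rely on language processing and a far larger database.
KNOWN_ENTITIES = {
    "Reuters": "Company",
    "Clear Forest": "Company",
    "New York City": "Place",
    "Paris": "Place",
    "Poirot": "Person",
}


@dataclass(frozen=True)
class Tag:
    text: str         # the matched words as they appear in the document
    entity_type: str  # Person, Company, Place, ...
    start: int        # character offset of the match


def tag_entities(document: str) -> List[Tag]:
    """Return a Tag for every known entity name found in the document."""
    tags = []
    for name, entity_type in KNOWN_ENTITIES.items():
        # Word boundaries stop "Paris" from matching inside "Parisian".
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, document):
            tags.append(Tag(match.group(), entity_type, match.start()))
    # Report tags in the order they occur in the text.
    return sorted(tags, key=lambda tag: tag.start)


sample = "Reuters opened the Calais API, built on technology from Clear Forest."
for tag in tag_entities(sample):
    print(tag.entity_type, tag.text, tag.start)
```

The output of a tagger like this is the structured metadata the post is talking about: a list of typed entities with positions, which another service could index, link or aggregate.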

Why have they done this? ReadWriteWeb reckons:

Semantic technologies result in better, faster, more precise and relevant information, and Reuters, as a big player in the information space, wants to be one of the first companies delivering this kind of experience.

And, of course, they’re opening the tagging doors even wider by allowing users to submit their own content for tagging. That 40% of users tagging content on a daily basis can now tag more each day, and Reuters enriches their information (and trains Calais) along the way.

The roadmap for the Calais Web Service gives the following outline of developments in 2008:

  1. Jan. ‘08 – allow users to submit text and receive back rich semantically tagged content… support English language content and will work best on content such as news, press releases, blog entries and other well-written prose. Future releases will incorporate specialty capabilities for patents, blogs, entertainment and sports news, scientific documents and financial filings.
  2. Apr. ‘08 – provide users with a persistent GUID allowing anyone with the GUID to call the Calais service and access the original metadata… ability to support user-generated metadata.
  3. Jul. ‘08 – incorporate a number of additional languages within Calais.
  4. Sep. ‘08 – providing users with a development environment that will allow them to create new extraction capabilities unique to their needs.

4 Comments

  1. Jack Macdonald
    Posted on 12 February, 2008

    James,

    Do you think user tagging will take off for publishers? Won’t users get tired of having to tag at every site they visit, and just use cross-platform tagging like del.icio.us?

    Jack

  2. Posted on 12 February, 2008

    Jack,

    I think cross-platform tagging will still be the main route in for individual users, although I don’t think del.icio.us will necessarily persist as we know it today – everything evolves, right? I think, though, that publishers will/must begin to tag their own content more richly so that web services can pick it up and play with it – e.g. using Microformats in our online content that will have native support in Firefox 3. I think it’s easy to see how we can tag our science, technical and non-fiction content more, but what do we do with fiction? Tagging names, places, dates is a good start. We (users & publishers alike) could tag all the instances of “Poirot” and see how far the character penetrates fiction culture, for example – talk about intertextuality! Or if you like reading stories about Paris in the 1920s with references to the Left Bank and cabaret, then pull those tags together and you have a reading/recommendation list.

    James

  3. Ian Mulvany
    Posted on 13 February, 2008

    Hmm, I’d like to know where this figure of 40% daily comes from. I just don’t see it, at least not yet.

    @Jack, cross-platform tagging is sort of a way to go. My guess is that tagging will be mature when the stuff you tag on the web can be carried around with you and integrated into your desktop/phone search. In any case there is value for publishers if they can pull back from these tagging pools the metadata about the content they have that has been tagged in these repositories. Connotea, the site I run, allows you to do this through our API calls, and some people are already building on top of this, but mainly in the scientific field, not so much in the mainstream publishing field. It’s small scale now, but it’s there and it works.

    - Ian

  4. Jack Macdonald
    Posted on 14 February, 2008

    James – I agree, publisher tagging is really exciting for both readers and publishers. Especially STM publishers – imagine a tag-based business model, where users could buy access to every article tagged with, say, biochemistry+drugs! Publishers could allow end users to create their business models for them.

    Ian, that sounds very interesting – like a totally new kind of analytics, I suppose.

    Jack
