In her first diary post from the O'Reilly Tools of Change conference in New York City, Sara's number one 'take-away' for the day was: "40% of Internet users are tagging content on a daily basis - how many publishers are ensuring their content is taggable?" 2008 may be all about metadata (and perhaps a few other things along the way) as companies in the information space engage with the increasing usefulness and accessibility of structured information.
Recent news on ReadWriteWeb is that Reuters has opened its Calais API:
The idea behind Calais is simple - identify interesting bits into metadata in documents. In this implementation the focus is on People, Companies, Places, and Events, but surely the technology can be adopted to other entities. The heavy lifting is done by the combination of a natural language processing engine and a massive hard coded, learning database that Clear Forest has built.
Why have they done this? ReadWriteWeb reckons:
Semantic technologies result in better, faster, more precise and relevant information, and Reuters, as a big player in the information space, wants to be one of the first companies delivering this kind of experience.
And, of course, they're opening the tagging doors even wider by allowing users to submit their own content for tagging. That 40% of users tagging content on a daily basis can now tag more each day, and Reuters enriches their information (and trains Calais) along the way.
The roadmap for the Calais Web Service gives the following outline of developments in 2008:
- Jan. '08 - allow users to submit text and receive back rich semantically tagged content... support English language content and will work best on content such as news, press releases, blog entries and other well-written prose. Future releases will incorporate specialty capabilities for patents, blogs, entertainment and sports news, scientific documents and financial filings.
- Apr. '08 - provide users with a persistent GUID allowing anyone with the GUID to call the Calais service and access the original metadata... ability to support user-generated metadata.
- Jul. '08 - incorporate a number of additional languages within Calais.
- Sep. '08 - providing users with a development environment that will allow them to create new extraction capabilities unique to their needs.
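
To make the "submit text, receive tagged content" idea a little more concrete, here is a toy sketch in Python. It is not the Calais API and does not call it: the lookup table, the `Entity` class and the `tag_entities` function are all hypothetical names invented for illustration, and a plain dictionary match stands in for the natural language processing that the real service performs.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical hand-built lookup table; a real service would rely on
# natural language processing and a far larger knowledge base.
GAZETTEER = {
    "Reuters": "Company",
    "New York City": "Place",
    "Sara": "Person",
}


@dataclass(frozen=True)
class Entity:
    text: str   # the matched name
    kind: str   # Person, Company, Place, ...
    start: int  # character offset of the match in the input


def tag_entities(document: str) -> List[Entity]:
    """Return every known entity found in the document, in reading order."""
    found = []
    for name, kind in GAZETTEER.items():
        # Match the name only as a whole word or phrase.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, document):
            found.append(Entity(name, kind, match.start()))
    return sorted(found, key=lambda entity: entity.start)


sample = "Reuters announced the service at a conference in New York City."
for entity in tag_entities(sample):
    print(entity.kind, entity.text, entity.start)
```

The point is only the shape of the exchange: plain prose goes in, and a list of typed entities with their positions comes out, which is the kind of metadata a publisher could then store alongside the content.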