Archive for the ‘Semantic’ Category

SES Chicago Dec 7 - 11th

December 7th, 2009
Chicago, originally uploaded by aserpa.

Well, it’s been a long time, a fantastically busy time and that’s the end of my non-excuse for the lack of posting of late.

I’ve just landed in Chicago to attend SES and discovered some weather for the first time in 15 months (you’d be amazed to hear you can actually miss weather if you move to California!) and a bank of Taxis outside the conference parading around with the new Yahoo! advertising on the top. I have a feeling this is a happy accident, but marketing is rather a ‘dark art’ so I’m not committing to that.

I’m here because I was asked to join a panel (Developments in Information Retrieval on the Web) and talk to the crowd about Semantic Data along with Jamie Taylor (MetaWeb), Martin Hepp (Universität der Bundeswehr München) and Jay Myers (Best Buy).

It’s a great panel - two guys capable of talking about any aspect of RDF and Microformats and two guys who’ve had the pleasure of learning from them and implementing structured data solutions. Jay works over at Best Buy where he’s done a cracking job of integrating structure on a public site with a massive catalog (500k+ pages) - oh and he’s also been instrumental in developing the GoodRelations spec, so all-in-all a Semantic Superstar! That sort of implementation makes my life so much easier and hopefully, in turn, yours as Search Engines and aggregators start to use this structure to help you find what you want.

If you’re in town and would like to meet for a drink please drop me a mail, IM or reach out on Twitter.

UPDATE:

It’s all over, I’m heading back to California where white stuff doesn’t fall out of the sky and coats are for people with holiday cabins in Tahoe. Huge thanks to Sean Golliher (great blog template, Sir!) for organising a great panel, it was a really enjoyable session.

Shame on those of us who felt the delay whilst Martin got his Mac ready would have made a good advert for Microsoft’s ‘I’m a PC’ series - the upshot is this great video of his presentation with slides. http://vimeo.com/8065914 It’s a pity we don’t have the full 126 slide original to compare it against!

In the spirit of sharing, as soon as I get to a stable connection I’ll be adding my slides to SlideShare and asking Jamie and Jay to do likewise. More soon.

UPDATE 2:

I’ve uploaded my slides from the panel to: http://www.slideshare.net/NickCox/ses-chicago-2009-searchmonkey

UPDATE 3:

I just saw a Tweet from Jay Myers, now his slides from SES are up on Slideshare at http://slidesha.re/4UoQbg. 3 down, 1 to go!

Yahoo! Placemaker

June 5th, 2009
Yahoo! Placemaker

Recently Yahoo! launched a new Geo API called Placemaker. I’ve been playing with it all week and am continually delighted with the recall and accuracy it’s able to deliver.

Essentially you can pass in a text string or web document (structured or unstructured) and the service will identify, disambiguate and extract the places contained within. For example, this sentence includes the location Sunnyvale, California, which, whilst seemingly completely out of context, is where I work. I ran this paragraph through the API and here’s an extract of what was returned:

<document>
<administrativeScope>
<woeId>2502265</woeId>
<type>Town</type>
<name><![CDATA[Sunnyvale, CA, US]]></name>
<centroid>
<latitude>37.3716</latitude>
<longitude>-122.038</longitude>
</centroid>
</administrativeScope>

</document>

Along with the location name and the latitude and longitude of both the centroid and each corner of a bounding box, we also have the superb WOEIDs (Where-on-Earth IDs). Armed with all this information there’s almost no location-based application I can’t build. Indeed, sites such as Just Landed, which searches Twitter for the text ‘just landed in’ and geocodes the places in order to provide intriguing visualisations, just became as simple as tying two APIs together!
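
If you want to pull those values out in code, something like the following Python should do it. Treat it as a rough sketch rather than the official client: it parses only the trimmed extract shown above, the function name extract_scope is my own invention, and a full Placemaker response may carry XML namespaces and extra elements that this does not handle.

```python
import xml.etree.ElementTree as ET

# The trimmed extract from the post, used here as sample input.
SAMPLE = """<document>
<administrativeScope>
<woeId>2502265</woeId>
<type>Town</type>
<name><![CDATA[Sunnyvale, CA, US]]></name>
<centroid>
<latitude>37.3716</latitude>
<longitude>-122.038</longitude>
</centroid>
</administrativeScope>
</document>"""


def extract_scope(xml_text):
    """Return the administrative scope as a dict, or None if it is absent.

    Hypothetical helper: assumes the un-namespaced layout shown in SAMPLE.
    """
    root = ET.fromstring(xml_text)
    scope = root.find("administrativeScope")
    if scope is None:
        return None
    return {
        "woeid": int(scope.findtext("woeId")),
        "type": scope.findtext("type"),
        "name": scope.findtext("name"),  # CDATA is returned as plain text
        "lat": float(scope.findtext("centroid/latitude")),
        "lon": float(scope.findtext("centroid/longitude")),
    }


place = extract_scope(SAMPLE)
print(place["name"], place["woeid"], place["lat"], place["lon"])
```

With a WOEID and a centroid in hand, plotting the place on a map or joining it against another geo dataset is mostly a matter of passing those values along.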

As a supporter of all things Semantic, it’s important to highlight that this API goes far beyond some complex string matching. Placemaker recognizes geographic semantic tags, such as the W3C Geo Vocabulary, and microformats such as geo and adr. Pretty neat huh? Drop a note in the comments below and let me know what you think about this and post any links to cool applications it’s allowed you to build.

Google Joins Semantic Web

May 28th, 2009

As I highlighted a few short weeks ago, Google has been dropping hints about the Semantic Web so subtle that even us chaps realised something exciting was going on over at the Googleplex. During the Searchology conference (their annual slap in the face to startups who dared think they were on to something unique and exciting) Big G revealed that the Christmastime rumors of data islands were no more and that RDFa was accepted!

The announcement focuses on hCard and hReview, which if found on your page will be turned into a visual presentation and added to your result on their SRP. Sound familiar? If it does that’s because, as many bloggers pointed out, it’s incredibly similar to Yahoo! SearchMonkey Structured Objects. Competition aside, this is great news for publishers as it is yet another vindication of the benefits of structured data on your pages.

Google Rich Snippets

Yahoo! SearchMonkey

Where SearchMonkey has focused on complete Objects for presentation - e.g. a Video looks like this whilst a News article looks like that - Rich Snippets, as Google is calling this, call out single key/value pairs which can add value to a standard result. So far however their presentation appears to be behind flood controls as you need to add your domain to a waiting list. My hunch is that Google is treading carefully due to concerns as much about spam as the resulting visual impact on their end users.

Now that the top two engines are adopting public, open standards we can expect to increasingly enjoy the benefits of ever richer, more accurate results with highly targeted presentations.

Wolfram goes Alpha

May 17th, 2009

Continuing their ‘WTF?!’ launch policy Wolfram|Alpha chose to open the floodgates to their servers late on Friday afternoon, several days earlier than announced. Perhaps this was a ploy to reduce the likelihood of their hardware stumbling under the load. If it was, it didn’t work - mostly because their target audience doesn’t have much else to do on a Friday night.

As Google proved a few days prior, server problems happen to the best of us and I for one won’t be marking them down for that - it’s an Alpha for a reason. If anything it’s extra marks for thinking ahead and offering an error message aimed squarely at their audience.

Wolfram|Alpha Server Failure Message

Ok, so what are the results like? Overall I’m impressed; the linking of data is frankly excellent even if you get the feeling they’re just showing off at times. For example, knowing the height of the ‘tallest tree’ in the most appropriate unit would be satisfactory. Going on to convert 385ft to miles, yards, meters, km, cm and even fathoms is bordering on the autistic. Another classic example informs me that the speed ‘55mph’ is 0.62 x the speed at which Marty McFly needed to drive the DeLorean DMC-12 in order to time travel (88 mph) - now is that geeky, a fun Easter egg or just data because it was there?

Childlike fact telling aside, Wolfram doesn’t offer the most accurate Query Linguistic Analysis engine and that leads to many failed queries which it would appear Wolfram actually does have the answer to. For example ‘average salary’ fails whereas ‘salary’ returns average salary information for a set of major occupations. This is something that can be improved dramatically with access to a massive volume of real world queries, and this Alpha release and its associated ‘Google Killer’ hype will certainly enable the collection of that.

I’m also not going to knock off marks for the user interface or breadth of their dataset; both of those can be fixed over time if the proof of concept warrants it - and the first look suggests that it really does. Whilst Google wanted to index the world’s data, Freebase, Wikipedia and now Wolfram seem to have most of the world’s ‘factual content’ wrapped up.

VoCamp Sunnyvale CA: June 18-19, 2009

May 12th, 2009

I’ve talked recently of my sadness at the lack of a central repository for ontological knowledge on the Web. Until the major players can sort that out on the Web (I really don’t expect it to be long coming now), there is plenty you can do back in the RealWorld(tm).

VoCamps provide a two-day forum for vocabulary creation and discussions on the management of the Semantic Web. Unlike Semantic Web meetups, which typically take a few hours and focus on a single presentation, the VoCamp format is open and gives members of the community time to talk about current issues with vocabularies and semantic interoperability, and the chance to work in small groups.

If you live in the Bay Area and want to come along to a VoCamp and help shape the future of the Semantic Web please sign up on the VoCampSunnyvale2009 wiki page. Space is limited, but we will try to expand if necessary. The event is right after SemTech San Jose so you won’t have far to travel, and perhaps best of all it’s free!

Bogged down by Semantics

May 9th, 2009

I’m running massively behind on my Podcasts. The backlog has been building up for the past month whilst I’ve been focusing on that ever present joy - quarterly planning. As you might have guessed from my place of work, planning right now has a few more variables than one might hope for. Digressions aside, I grabbed a few hours this weekend to get psyched about Tech again.

Highest on my playlist was The Semantic Web Gang, and not just because my colleague Peter Mika was taking part this time. This is regularly a great show for anyone wanting to learn more. I ended up a little depressed as the conclusions of everyone on the panel sadly matched those I’ve been coming to for a while.

No one likes to ‘reinvent the wheel’ so before delving into code most of us look around to see if we need to. When investigating Semantic Objects today there is no clear source of truth as to prior art for any developer (corporate or personal) wanting to create an Ontology. Whilst this doesn’t surprise me at this stage in the Semantic Web, I am a little shocked that no one has attempted to take ownership of this space.

It’s in the interest of the community to offer a set of complete vocabularies for specific objects and all of us spend a fair amount of time trying to define the next set. With both these thoughts in mind, here’s my elevator pitch for a possible solution:

  • Offer a gallery style view of known and ‘complete’ objects.
  • This gallery would be user contributable.
  • This gallery would allow for comments and feedback to the authors to ensure the needs of the wider world are considered by the authors.
  • This gallery would offer links to ontology creation tools.
  • This gallery would support and allow for group collaboration on the definition of a new object.
  • When an ontology is complete and examples of real world usage are linked to by more than 3 people, Yahoo!, MSN, Ask, Google etc. would extend support for it by adding crawler support (e.g. we would agree to accept this format for our indexes).
  • The entire Ontology set would be made available under CC licenses (or most appropriate alternative) and ‘donated’ to the community to ensure adoption.

Why is ‘something’ like the above useful? It would be a starting point for the confused masses. Does an ontology exist for ‘bicycles’? A simple search could return nothing: You’ll need to go and create something, and here are some tools and access to a community. Or something: Here’s an ontology you can go and use or contribute to in order to extend it as you need.

Well, that’s one possible way to lower the barriers to entry which people are increasingly telling me are too high right now. What do you think, is there a better way?

Wolfram|Alpha

April 28th, 2009

Ignoring the traditional ‘how to launch a new site’ playbook, which states you must whore yourself around expert commentators, provide personal updates on your blog for months in advance, build a following among an ever increasing alpha test group and finally issue an overblown PR announcement on the day of launch which preferably includes some quotes hinting ‘Google killer?’ from your new friendly commentators, Stephen Wolfram has seemingly rubbed much of the industry the wrong way with the mysteriously quiet run-up to the launch of Wolfram|Alpha.

As redundant as this may sound, geniuses aren’t stupid people. For a while there, though, I was starting to question the wisdom of the MacArthur genius grant review committee. Whilst Wolfram’s approach has garnered the biggest swell in anticipation prior to a launch since, well, probably since Teoma and Wisenut back in 2002/3, yesterday’s webcast was a bust for me. It was scheduled at a time I couldn’t attend, so I hoped to catch up later in the evening. No such luck. The broadcast appeared to be replay-free until over 30 hrs had passed, when some neat download options began to appear – download the video, stream it or grab the MP3 – cool! Speaking of which, Cuil followed the playbook, everyone seems to hate them, and even if they did publish an MP3 version of their most recent announcement (a timeline presentation seen a dozen times before elsewhere) nobody would have cared. It’s important to add that you don’t get a single screenshot of this ‘amazing’ new product during the entire 90 minute presentation - stupidity or extreme genius? You decide.

What can it do? It can describe places, like Lexington, Mass., by its vital statistics, like location, population, weather, etc. It can compare Lexington with Moscow. If you type “LDL 180,” it will tell you the percentile of the population with higher or lower cholesterol and show you the answer on a chart. If you tell “LDL 180 male 45,” it will adjust the chart for gender and age group. It can chart the life expectancy of a male age 40 in Italy or tell you who was president of Brazil in 1928

http://bits.blogs.nytimes.com/2009/04/28/wolfram-alpha-veil-lifted

Without visual proof of the thing in action it’s hard to state this with any degree of conviction, but there appears to be nothing in the demo that couldn’t be achieved with a decent query parser and a triple store or perhaps, if we wanted to store the context of the data, a quad store. I have seen a few leaked screenshots from the initial webcast and it would seem that many of the examples can be knocked up with Freebase. So is Wolfram|Alpha one of the next generation of Object Data store powered Search Engines? Hard to say from this small ‘preview’, but the indications do hint at it.
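
To make the ‘query parser plus quad store’ idea a little more concrete, here is a toy Python sketch. It is purely my own illustration and says nothing about how Wolfram|Alpha is actually built: the Quad type, the STORE contents, the predicate names and the one-pattern answer function are all invented for the example, and the fourth field (a year range) stands in for the ‘context’ a quad store would add to a plain triple.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: Optional[Tuple[int, int]]  # (first_year, last_year); None if timeless


# Tiny hand-written store for illustration; a real one would hold millions of facts.
STORE = [
    Quad("Lexington", "state", "Massachusetts", None),
    Quad("Moscow", "country", "Russia", None),
    Quad("Brazil", "president", "Washington Luís", (1926, 1930)),
]

# Naive "query parser": '<predicate> of <subject>' with an optional ' in <year>'.
QUERY = re.compile(r"^(?P<pred>\w+) of (?P<subj>.+?)(?: in (?P<year>\d{4}))?$")


def answer(query: str, store: List[Quad] = STORE) -> List[str]:
    """Return every object whose quad matches the parsed query."""
    m = QUERY.match(query.strip())
    if m is None:
        return []  # the parser did not understand the query
    year = int(m["year"]) if m["year"] else None
    results = []
    for q in store:
        if q.subject != m["subj"] or q.predicate != m["pred"]:
            continue
        # Only apply the year filter to facts that carry a context.
        if year is not None and q.context is not None:
            first, last = q.context
            if not first <= year <= last:
                continue
        results.append(q.obj)
    return results


print(answer("president of Brazil in 1928"))
```

The hard part, of course, is everything this leaves out: a parser that copes with real phrasing, and the curation needed to fill the store with trustworthy facts.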

To cap the growing excitement with the fateful rubber stamp of ‘Google Killer’, Google themselves came out with a Direct Display for the top of their results to show US Census data. A nothing launch on any day of the week – with the exception of the nice graphing animations thanks to Trendalyzer – this timing got the press buzzing. Do Google think this is a threat? Is Google trying to prove that whatever Wolfram can do they can do better? And so on until you lose the will to care. Well, at least until you get the chance to see for yourself in May when the real launch happens.

[UPDATE May 11th] According to their blog, Wolfram|Alpha will open to the full force of the Web’s interest on 18th May 2009. If you’re lucky enough to stumble into a test bucket you may be able to experiment already.

Linked Data

April 3rd, 2009

Tim Berners-Lee used his 15 minutes at TED to state a fairly obvious point. Linked Data, or the Semantic Web, or the Deep Web are all important things we can’t do much with right now because people don’t take the time to semantically tag their data in a useful way. As one would expect from a certified genius, he’s quite correct about the problem. I do however feel that the reason people aren’t doing this at scale is threefold – complexity, tangible benefit, security.

The issue of data security became highly apparent to me at SxSW. Discussions during the metadata panels would often focus on the issue of Intellectual Property (IP) and the fear of leaking proprietary information to competitors. My personal view is that semantically enriching your data is rather like a city investing in transport infrastructure. On one hand, yes people can move quickly to leave your town, but on the other you’ve provided a more satisfying travel experience and made it easier and more attractive for people to visit and return.

As for complexity and benefits: you won’t notice this unless you came to this page from Yahoo! Search, but I’ve taken the mandate from TBL to heart and embedded enough mark-up to identify his TED talk as a video with an associated thumbnail image in nice well-formed RDFa.

Here’s an example we’ve used fairly extensively which shows how to embed a Hulu video and benefit from this new approach. Just the first two lines of code are required to generate an enhanced result. The other four lines are optional and assist with the display.

<link rel="image_src" href="http://thumbnails.hulu.com/9/967/32912_145x80_generated__VfW.jpg" />

<link rel="video_src" href="http://www.hulu.com/embed/GREW9Qw0P7KjIyjJydQYRw" />

<meta name="video_height" content="296"/>

<meta name="video_width" content="512"/>

<meta name="description" content="Video description: Homer gets upset at a vending machine filled with apples."/>

<meta name="video_type" content="application/x-shockwave-flash"/>

Simple, huh?
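
If you’d like to sanity-check your own page before we re-crawl it, a few lines of Python along these lines will pull the same fields back out. This is only a sketch of the idea and not how our crawler works: the class EmbedTagParser and the helper has_required_tags are names I’ve made up, and the ‘required’ check simply mirrors the statement above that only the image_src and video_src lines are needed.

```python
from html.parser import HTMLParser

# The six tags from the example above, as they might appear in a page head.
SNIPPET = """
<link rel="image_src" href="http://thumbnails.hulu.com/9/967/32912_145x80_generated__VfW.jpg" />
<link rel="video_src" href="http://www.hulu.com/embed/GREW9Qw0P7KjIyjJydQYRw" />
<meta name="video_height" content="296"/>
<meta name="video_width" content="512"/>
<meta name="description" content="Video description: Homer gets upset at a vending machine filled with apples."/>
<meta name="video_type" content="application/x-shockwave-flash"/>
"""


class EmbedTagParser(HTMLParser):
    """Collect <link rel=...> and <meta name=...> values into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attrs = dict(attrs)
        if tag == "link" and "rel" in attrs:
            self.fields[attrs["rel"]] = attrs.get("href")
        elif tag == "meta" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("content")


def has_required_tags(fields):
    """Hypothetical check: the two lines described as required are present."""
    return bool(fields.get("image_src")) and bool(fields.get("video_src"))


parser = EmbedTagParser()
parser.feed(SNIPPET)
fields = parser.fields
print(has_required_tags(fields), fields["video_width"], fields["video_height"])
```

Note that the values come back as strings, so convert the height and width to integers if you want to do anything numeric with them.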

The premise of Linked data is simple, but until now the implementation was difficult and the tangible benefits lacking. At Yahoo! we’re trying to clarify the benefits with richer search results powered by SearchMonkey, and I spend large portions of my day evaluating ways to simplify the mark-up process for site owners and publishers.

If you have a flash object such as a video, game, or document embedded on a page, adding a few lines of code will make it appear as an enhanced result after we re-crawl your page. No semantic mark-up knowledge is required as you can simply cut-and-paste our example code, and you don’t have to build your own application to display the result – although you can go crazy with the SearchMonkey Developer tool if you so wish. SearchMonkey does the heavy lifting, taking your mark-up and extracting the necessary structured data to display it as an enhanced result.

Those ‘hardcore’ few who started writing the Web using Vi or Notepad are the same who understand RDFa and Semantic Mark-up technologies. The Linked world probably needs a FrontPage or Dreamweaver solution before we see mass adoption. Until that great day, until there’s a Semantic-FrontPage for the rest of us, why not give our documentation a whirl?

Google sees the light?

March 28th, 2009

It’s been all quiet on the Semantic front over at Google until a flurry of recent press murmurings. Something’s definitely changed over at the Googleplex, but so far it appears to be just their PR department!

A pretty good article over at PCW discusses some of these recent announcements, but in short it appears that Semantic Search holds a different meaning for Google than everyone else… with the possible exception of your favourite dictionary. Semantic simply means ‘the meaning of language’ or ‘the relationship between symbols’. If we assume words and phrases are symbols, then Google is certainly pursuing Semantic Search. Their visible focus of late has been to provide links to related topics and longer summaries, both of which have been available at the competition for a very long time. Nothing new so far.

Rumours leaked in December hinting at some early work to create data-islands within the pages of a number of top publishers - a new form of non-standard markup which could lead to new presentations in the Google SRP. So far, there’s been little sign of this progressing, which is a tremendous relief, and not just because I head up the SearchMonkey program over at Y! where we’ve already launched an open approach to this. I truly believe in the power of utilising metadata for Search. Seeing our competitors follow suit at this stage is more of a vindication than a troubling development, but any attempt to force the market to use non-standard markup is not a good sign for the web at large.

Have you started to see signs of other Semantic Search developments on the web? What do you think of them so far, and do you think open standards are of any importance at this time?

Author: Nick Cox Categories: Google, Search, Semantic

Really Simple SearchMonkey

March 12th, 2009
A SearchMonkey 'document' template in action.

I’m delighted to be able to announce the launch of a new SearchMonkey feature which allows site owners to automatically benefit from the enhanced presentations (SearchMonkey apps) you’ve been seeing over at Y! Search without the need to write a single line of code.  

Until now, if you wanted to offer a visually rich result for your site on Y! Search you had to perform a few tasks: you needed to mark up your site with metadata, create an application to transform that metadata into a result presentation, publish the application and finally wait for our editorial staff to review your work. In the grand scheme of things this isn’t much to ask in return for the awesome user experience people will enjoy when they see your site in their results but it is, nonetheless, a massive hurdle for many site owners.

As groundbreaking as SearchMonkey was, we realized that we could make things much simpler. Specifically, from now on if you annotate your pages with structured data using vocabularies (formats) that we understand, we’ll do the heavy lifting. We’ll create an Enhanced Result for you and publish it for all our Y! Search users without you having to write a single line of PHP or even become the proud owner of a Y! user ID! Obviously, if you want to customize the presentation of your data we’ve left you that option. You can still use the SearchMonkey Developer tool to write an application, but if you are happy with the new templates, you no longer have to.

Our new documentation not only shows you how to mark up your page for certain types of objects, but it also lets you validate immediately if your markup is correct. This is something that many have asked for in the past and I’m immensely proud of the team for delivering a solution. The first objects that we support are Video, Games and Documents, but more are on the way.

I believe this is an exciting step which will help drive greater adoption of RDFa and other forms of semantic markup. What do you think - have we gone far enough, what do you want to see in the future?