“Exploring a ‘Deep Web’ That Google Can’t Grasp” with DeepPeep. Etc….02.25.09

25 02 2009

The Sunday NYT ran a short article by Alex Wright, “Exploring a ‘Deep Web’ That Google Can’t Grasp,” that draws attention to the problem of searching the Deep Web. It is excerpted here:

“…To extract meaningful data from the Deep Web, search engines have to analyze users’ search terms and figure out how to broker those queries to particular databases… 

That approach may sound straightforward in theory, but in practice the vast variety of database structures and possible search terms poses a thorny computational challenge.

‘This is the most interesting data integration problem imaginable,’ says Alon Halevy, a former computer science professor at the University of Washington who is now leading a team at Google that is trying to solve the Deep Web conundrum.

Google’s Deep Web search strategy involves sending out a program to analyze the contents of every database it encounters…

In a similar vein, Prof. Juliana Freire at the University of Utah is working on an ambitious project called DeepPeep (www.deeppeep.org) that eventually aims to crawl and index every database on the public Web. Extracting the contents of so many far-flung data sets requires a sophisticated kind of computational guessing game.

‘The naïve way would be to query all the words in the dictionary,’ Ms. Freire said. Instead, DeepPeep starts by posing a small number of sample queries, ‘so we can then use that to build up our understanding of the databases and choose which words to search.’

Based on that analysis, the program then fires off automated search terms in an effort to dislodge as much data as possible. Ms. Freire claims that her approach retrieves better than 90 percent of the content stored in any given database. Ms. Freire’s work has recently attracted overtures from one of the major search engine companies…”

Copyright 2009 The New York Times Company
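
To make the “guessing game” in the excerpt a bit more concrete, here is a minimal Python sketch of the general idea as the article describes it: start with a few sample queries, look at what comes back, and use those results to choose which words to search next. This is not DeepPeep’s actual code. The names ToyDatabase and probe, the in-memory list standing in for a real search form, and the rule “try the most frequent word not yet queried” are all my own assumptions for illustration.

```python
from collections import Counter


class ToyDatabase:
    """Hypothetical stand-in for a database hidden behind a keyword search form."""

    def __init__(self, records):
        self.records = list(records)

    def search(self, term):
        # Return every record that contains the term as a whole word.
        term = term.lower()
        return [r for r in self.records if term in r.lower().split()]


def probe(db, seed_terms, max_queries=50):
    """Pull records out of db by choosing new query words from earlier results.

    Returns (set of retrieved records, number of queries issued).
    """
    retrieved = set()      # records seen so far
    tried = set()          # words already used as queries
    candidates = Counter() # word frequencies across retrieved records
    queue = [t.lower() for t in seed_terms]
    queries = 0

    while queries < max_queries:
        if queue:
            # Use the small set of sample queries first.
            term = queue.pop(0)
            if term in tried:
                continue
        else:
            # Assumed heuristic: most frequent word we have not queried yet.
            untried = [(w, c) for w, c in candidates.items() if w not in tried]
            if not untried:
                break
            term = max(untried, key=lambda wc: wc[1])[0]

        tried.add(term)
        queries += 1
        for record in db.search(term):
            if record not in retrieved:
                retrieved.add(record)
                candidates.update(record.lower().split())

    return retrieved, queries


if __name__ == "__main__":
    db = ToyDatabase([
        "used honda civic 2004",
        "used toyota corolla 2006",
        "new toyota prius 2009",
        "new ford focus 2008",
        "vintage ford mustang 1967",
    ])
    found, n = probe(db, ["honda"])
    print(f"retrieved {len(found)} of {len(db.records)} records in {n} queries")
```

In this toy case a single seed word is enough to reach every record, because each result shares at least one word with a record not yet seen. A real crawler would face forms with many fields, result limits, and records that share no vocabulary with the seeds, which is where the hard part of the problem lies.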

Here is what the DeepPeep search interface looks like now:

http://www.deeppeep.org/index.jsp 


One response

14 04 2009
brian despain

I think Deep Peep’s approach loses the contextual relevance of the Deep Web. The Deep Web isn’t the same as the surface web. The information is more structured, more useful and in the end you cannot solve the problem of the deep web by using the same techniques used to solve the surface web problem. The problems are fundamentally different.
