Breaking down meaning of words, rather than spelling, may create effective search engine using conversational phrasing instead of keywords.
By Eric Auchard
SAN FRANCISCO (Reuters) - Powerset on Sunday unveiled tools for
searching Wikipedia that use conversational phrasing instead of
keywords, marking the first step of its challenge to established Web
search services such as Google.
Powerset's technology breaks down the meaning of words and sentences
into related concepts, freeing users from always needing to type the
exact words they want to find.
The closely watched Silicon Valley start-up is offering a way of
searching millions of entries in Wikipedia's online encyclopedia,
helping users find detailed answers to questions rather than isolated
links that require further research.
For example, a user who wants to know how many wives King Henry VIII
had (six, or two, depending on your definition of marriage) can find an
answer via Powerset's service at tinyurl.com/5qpcr9/.
San Francisco-based Powerset is looking to leapfrog the current
generation of keyword-based search services such as Google Inc,
Yahoo Inc, Microsoft Corp and IAC InterActiveCorp's Ask.com.
"The Wikipedia is becoming a microcosm of the most useful parts of
the Web," said Greg Sterling, an Internet analyst with Sterling Market
Intelligence. "This offers a powerful way to find what you are looking
for against this subset of the Web."
While still a far cry from letting users search the World Wide Web,
Powerset is using Wikipedia as a trial showcase for how its technology
can be used to search a vast number of other websites using natural
language phrases or questions.
Over time, it aims to partner with other high-quality data sites
where information can be organized in a question and answer form that
lends itself to Powerset search techniques. Examples might include
financial or patent filings, the CIA Factbook or Wikipedia-inspired
clones, company officials said.
Powerset, which can be found at www.powerset.com/,
looks beyond words to try to understand conceptual relationships that
get closer to what a user may be searching for. It analyzes each
sentence and whole documents to do so.
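The difference between matching exact words and matching concepts can be sketched in a few lines of Python. This is a toy illustration only, not Powerset's technology: the hand-written `CONCEPTS` table, the function names and the sample sentence are all assumptions made up for this example, standing in for the far richer linguistic analysis the company describes.

```python
# Toy illustration only: a hand-written concept table, NOT Powerset's method.
# Every name and entry below is hypothetical.
CONCEPTS = {
    "wife": "spouse", "wives": "spouse",
    "married": "spouse", "marry": "spouse",
}

def tokens(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,?!").lower() for w in text.split()]

def keyword_match(query, sentence):
    """True only if every query word appears verbatim in the sentence."""
    words = set(tokens(sentence))
    return all(w in words for w in tokens(query))

def concept_match(query, sentence):
    """True if every query concept appears among the sentence's concepts."""
    def concepts(text):
        # Map each word to its concept, or keep the word if it has none.
        return {CONCEPTS.get(w, w) for w in tokens(text)}
    return concepts(query) <= concepts(sentence)

sentence = "Henry VIII married Catherine of Aragon."
query = "Henry VIII wives"

print(keyword_match(query, sentence))   # "wives" is not in the sentence
print(concept_match(query, sentence))   # "wives" and "married" share a concept
```

In this sketch the keyword test fails because the sentence never contains the word "wives", while the concept test succeeds because "wives" and "married" are both mapped to the same concept. A real system would have to derive such relationships from the text itself rather than from a fixed table.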
Powerset plans eventually to make money selling advertising
alongside its search services. But for now, the 60-employee company
consists almost entirely of computer scientists and linguists. It has
no advertising staff and only a handful of marketing and support staff.
Sterling said it is likely to take years for Powerset to be able to
search the Web on the scale Google now does using statistical ranking
techniques to find relevant Web links.
"What I don't know is how Powerset will perform on the wide open
Web. In a sense, this is a massive prototype using the relatively
structured information of Wikipedia. It is difficult to compare to what
Google has built," Sterling said.
Sterling said a bigger danger to Google would be if rival Microsoft
were to acquire Powerset and combine its technology with Microsoft's
other search technologies. Microsoft recently backed off a $44 billion
bid for Yahoo that was aimed at creating a formidable rival to Google
in Web search and online advertising.
"This could become the basis of a Google-killer," Sterling said. "Someone like Microsoft might want to buy Powerset."
Spokesmen for Microsoft and Powerset declined to comment on rumors of a potential tie-up between the two companies.
Powerset offers richly annotated ways for searching inside Wikipedia
entries to find related concepts. Called "Factz", these related ideas
generate outlines, summaries and automated answers to users' questions.
"Our system is a little more forgiving," Scott Prevost, general
manager of Powerset, said in an interview on Sunday. "It is not looking
for hard-word matches. We are not searching for exact words, but
concepts," he said.
The 2-1/2-year-old start-up licensed natural language processing
technology and related machine processing methods developed over three
decades at the Xerox PARC research centre in Silicon Valley to create
new consumer Web search services.
With tacit approval of the non-profit Wikimedia Foundation, the
organization behind Wikipedia, Powerset officials said the company is
hosting a copy of Wikipedia's 2.5 million English-language entries on
its own computers. This lets Powerset make links across the breadth of
Wikipedia data.
"What Powerset is doing is offering readers a natural-language
search interface, and we think that is an interesting experiment," Mike
Godwin, Wikimedia Foundation's general counsel, said in response to an
emailed question about how the two organizations would work together.
In addition to Wikipedia, Powerset's new service also searches a
related database called Freebase created by MetaWeb, another Web search
start-up.
After decades of research and debate, natural language processing is
finally poised to go mainstream, predicted Barney Pell, Powerset's
co-founder and chief technology officer.
"2008 is the year that semantic and linguistic technologies cross over into widespread consumer use," he said.
(Editing by Louise Ireland)
© Thomson Reuters 2008 All rights reserved