Search - Publish.com
Publish.com Ziff-Davis Enterprise  
Google, Methinks Thou Doth Search Too Much
By Stephen Bryant

Blog Post: Updated: When you accidentally index 5.5 billion pages of spam, something's rotten in the state of Googledom.

Google Search is good. Maybe too good?

Reports from the Monetize blog last weekend demonstrated that a spammer -- since revealed to be a Moldavian malcontent -- managed to get 5.5 billion spurious pages indexed by Google. Google has removed the pages from its index and is reportedly working on a solution, according to several sources.

Google, however, has not responded to requests for comment.

The Moldavian spammer's pages, which were created using a variety of readily available and easily learned techniques, were also partially indexed by Yahoo. MSN's new live search, meanwhile, indexed only 18 pages.

"It is important for the entire Search industry to continue to focus on reducing spam, and this an area our development teams are working on as we build out Windows Live Search," said Justin Omer, senior product manager for Windows Live Search, in an e-mail to Google Watch. "Our goal is to help our customers find the information they are looking for vs. useless information that clog results pages and reduces relevancy."

Yahoo, for its part, said in an email to Google Watch that the number of spammer pages indexed was "several orders of magnitude lower" than the numbers reported on John Battelle's Searchblog. However, that particular entry did not specify how many spurious pages Yahoo indexed. Yahoo has not returned a request to clarify this statement.

Representatives from Yahoo did say that "a very tiny percentage of our index was impacted by these spam sites and most of these spam sites are already eliminated. We believe our user experience was not compromised."

The big problem, however, remains Google, which apparently indexed more spurious domains faster than any other search engine. Some critics see Google's indexing problem as more evidence that its core business is hurting. Others have noticed that Google is having problems with its site: operator, which allows a user to search a single domain or multiple domains for keywords.

"Since they introduced the new infrastructure, it has been a state of constant change," said Merge Database's Alex Firmani in an IM interview. "It is like swapping out the engine of a car while it is still running... in fact, going 80 MPH down the highway. So now Google has swapped the engine and all sorts of flaws are showing up. Flaws that were not evident in their previous infrastructure, but are hurting the new methodology."

Firmani also noted that Google, which recently made major changes to its indexing engine, dropped quite a few non-spam sites from its index at the same time it was indexing the spurious results.

In some cases, Google has also reverted to two-year-old page caches for sites.

In April, Google CEO Eric Schmidt told the New York Times that Google's indexing servers are running near capacity. "Those machines are full. We have a huge machine crisis."

The Times revealed recently that Google is building a large facility in Oregon, presumably to house more computing power.






This article originally appeared on Steve Bryant's blog, GoogleWatch.


 

