Guba.com, a video site that hosts user video, has developed "Johnny," a software tool to filter and block copyrighted content in cooperation with the Motion Picture Association of America.
Guba, which began offering downloadable Warner Bros. films in June, said its new "Johnny" tool will be made available to other hosting sites to block a list of copyrighted movies and TV shows that it received from the MPAA.
With sites like YouTube and Google Video accepting video from potentially thousands of users each day, the odds that some copyrighted content will sneak through are significant, especially with site operators unable to visually check every upload as it comes in.
Guba is an eight-year-old company that originally specialized in searching the Usenet message boards, where pictures, video, and music are encoded in alphanumeric strings that must be decoded and pieced together to form the original file. Guba executives said they began developing "Johnny"'s predecessor years ago.
"We've had to deal with copyrighted content and later DMCA notices since the very beginning," said Bart Myers, the vice president of operations at Guba.
"We've developed this tool -- this is a problem many other companies are facing: the open posting of content," Myers added.
"From an online standpoint, it's the whole online business model that stands in the way, the Napster-ization of content. If you're online and do a search for 'South Park,' 250 videos of South Park pop up. If this technology helps build a premium marketplace of content online, so much the better."
During Guba's early days, users came up with more and more sophisticated ways of manipulating images, such as adding text and cropping. As they did so, the company began looking for ways to automate the detection process. The first image detection program turned images into MD5 hashes; MD5 is a cryptographic hash function that is also used to check file integrity. The hash information treated the image as a graph, evaluating it on a number of attributes, including image size, color variance, and other factors, all mapped together, Myers said.
Copyrighted images with hashes that matched other, unknown images were flagged for examination, and discarded if they violated the site's terms of service.
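The article does not spell out how Guba built its image hashes, so the following is only a rough sketch of the general idea: reduce an image to a few coarse attributes (size, average color, color variance), hash that summary with MD5, and flag uploads whose hash matches a known copyrighted image. The function names, the pixel representation (rows of RGB tuples), and the `bucket` rounding factor are all assumptions made for illustration, not Guba's implementation.

```python
import hashlib
from statistics import mean, pvariance


def attribute_hash(pixels, bucket=16):
    """MD5 over coarse image attributes (hypothetical scheme, not Guba's).

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    Rounding attributes into buckets lets small edits map to the same hash.
    """
    height = len(pixels)
    width = len(pixels[0])
    flat = [px for row in pixels for px in row]
    parts = [str(width), str(height)]
    for channel in range(3):
        values = [px[channel] for px in flat]
        parts.append(str(int(mean(values)) // bucket))
        parts.append(str(int(pvariance(values)) // (bucket * bucket)))
    return hashlib.md5("|".join(parts).encode("ascii")).hexdigest()


def flag_for_review(uploads, watch_list):
    """Return names of uploads whose attribute hash matches a known image."""
    known = {attribute_hash(img) for img in watch_list}
    return [name for name, img in uploads.items() if attribute_hash(img) in known]


# A known image, a lightly altered copy, and an unrelated image.
original = [[(100, 150, 200), (100, 150, 200)],
            [(100, 150, 200), (100, 150, 200)]]
altered = [[(101, 150, 200), (100, 150, 200)],
           [(100, 150, 200), (100, 150, 200)]]
unrelated = [[(10, 20, 30), (10, 20, 30)],
             [(10, 20, 30), (10, 20, 30)]]

flagged = flag_for_review({"copy.jpg": altered, "holiday.jpg": unrelated}, [original])
print(flagged)
```

An MD5 over the raw file bytes would only catch byte-identical copies. Hashing a coarse summary instead is one way a small alteration can still land on the same hash, although heavier edits such as cropping change the image size and would defeat a scheme this simple.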
"Since then, we have taken the core concept and applied it to video," Myers said.
That program, known as "Johnny," works on snippets of video as well as on trailers, half-hour episodes, and full-length movies, Myers said. (The original Guba author named it after the Keanu Reeves sci-fi flick Johnny Mnemonic, rather than Short Circuit's Johnny-Five or the famous line from The Shining, he said.) The site receives a watch list of copyrighted movies and other materials from the MPAA, and cross-checks it against its hosted video, he said.
One key to the process is that Guba accepts video in a variety of formats but transcodes it into its own file format. That automated step, which routes the video through the Guba servers, allows the company to create a hash along the way.
"Johnny" takes a series of snapshots, not of the video itself but of the video hash, recorded every few minutes and combined into a signature file. As new videos are uploaded, they are cross-checked against the signature files.
"Even if it's a clip, we'll catch it," Myers said. A longer clip, which offers more opportunities to sample, is easier to detect. Even if the clip isn't immediately identified as a copyrighted video, however, the file can be flagged for human review.
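The sampling and cross-checking described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the "Johnny" algorithm: a video is modeled as a list of byte-string segments, each segment is hashed with MD5, and the `step` and `match_at` parameters and the three outcome labels are hypothetical.

```python
import hashlib


def segment_hashes(segments):
    """Hash each segment of a video (segments are plain byte strings here)."""
    return [hashlib.md5(segment).hexdigest() for segment in segments]


def build_signature(segments):
    """Signature for a known title: the set of its segment hashes."""
    return frozenset(segment_hashes(segments))


def classify_upload(segments, signatures, step=2, match_at=0.8):
    """Sample every `step`-th segment and compare against known signatures.

    Returns ("match", title), ("review", title) or ("clear", None).
    """
    sampled = segment_hashes(segments[::step])
    if not sampled:
        return ("clear", None)
    best_title, best_score = None, 0.0
    for title, sig in signatures.items():
        score = sum(h in sig for h in sampled) / len(sampled)
        if score > best_score:
            best_title, best_score = title, score
    if best_score >= match_at:
        return ("match", best_title)
    if best_score > 0:
        return ("review", best_title)  # partial overlap: hand to a human
    return ("clear", None)


movie = [b"seg%d" % i for i in range(10)]
signatures = {"Movie": build_signature(movie)}

clip = movie[3:7]                                      # excerpt of the movie
mashup = [b"home0", movie[1], b"home2", b"home3", movie[4]]
home_video = [b"a", b"b"]

print(classify_upload(clip, signatures))
print(classify_upload(mashup, signatures))
print(classify_upload(home_video, signatures))
```

A longer upload yields more sampled segments, which fits the point that longer clips are easier to detect; a partial overlap falls into the "review" bucket rather than being blocked outright.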
The site tries to strike a balance between user-generated content and copyrighted material. In some cases, the content is obvious: an episode of Lost or Battlestar Galactica, for example. The MPAA also provides Guba with a list of keywords to track; tagging a file with the unique word "Galactica" will quickly flag it for review. Myers said, however, that the site will not immediately remove a user-generated parody of Lost.
"Johnny" will be made available to other video hosting sites, although the company hasn't said if it will license the technology.
"Providing consumers legitimate ways to get movie and television programming online is essential to our industry," said Dan Glickman, the chairman and chief executive of the MPAA, in a statement. "Collaborating with Guba has given us an opportunity to test new technology that will help ensure consumers can freely share videos without being exposed to illegal programming, which could lead to copyright infringement. We hope that other such sites will employ similar technology which allows them to conduct legitimate online businesses while protecting the creations of thousands of people who work in the entertainment industry."
"Johnny" is but one of a series of filters the site uses: other tricks include identifying users who post a large percentage of copyrighted content, or looking for specific terms.
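The two simpler filters mentioned here, keyword matching and tracking uploaders with a high share of flagged content, might look something like the sketch below. The function names, the 50 percent threshold, and the minimum-upload cutoff are assumptions chosen for illustration; the article does not say what values Guba uses.

```python
def keyword_hits(tags, watch_terms):
    """Return the watched terms that appear among a file's tags."""
    lowered = {tag.lower() for tag in tags}
    return sorted(lowered & {term.lower() for term in watch_terms})


def risky_uploaders(history, threshold=0.5, min_uploads=4):
    """Users whose share of flagged uploads meets the (hypothetical) threshold.

    `history` maps a user name to a list of booleans, True meaning that
    upload was flagged as copyrighted.
    """
    return sorted(
        user
        for user, flags in history.items()
        if len(flags) >= min_uploads and sum(flags) / len(flags) >= threshold
    )


hits = keyword_hits(["Galactica", "finale"], ["galactica", "lost"])
suspects = risky_uploaders({
    "alice": [True, True, False, True],
    "bob": [False, False, False, False, False],
    "carol": [True],  # too few uploads to judge
})
print(hits, suspects)
```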
But because the technology requires the content to flow through Guba's servers, it is unlikely to be used to defeat peer-to-peer piracy, in which individuals trade files among themselves. If a file were to pass through a server with the "Johnny" software enabled, however, detection might be performed on the completed file.
"It was great in the old days, where you could just cue up The Daily Show," Myers said. "But those days of 'cowboy content' are over."