Wednesday, March 4, 2009

Copyright Holders Challenge Sites That Scrape Content - NYTimes.com

Copyright Challenge for Sites That Excerpt

By BRIAN STELTER

When the popular New York business blog Silicon Alley Insider quoted a quarter of Peggy Noonan’s Wall Street Journal column in mid-February, the editor added a caveat at the end: “We thank Dow Jones in advance for allowing us to bring it to you.”

The editor added “in advance” because Dow Jones, the publisher of The Journal, had not given the blog permission to use the column. The excerpt was published with the assumption that it would be permitted under the “fair use” statute of copyright law.

Generally, the excerpts have been considered legal, and for years they have been welcomed by major media companies, which were happy to receive links and pass-along traffic from the swarm of Web sites that regurgitate their news and information.

But some media executives are growing concerned that the increasingly popular curators of the Web that are taking large pieces of the original work — a practice sometimes called scraping — are shaving away potential readers and profiting from the content.

With the Web’s advertising engine stalling just as newspapers are under pressure, some publishers are second-guessing their liberal attitude toward free content.

“A lot of news organizations are saying, ‘We’re not willing to accept the tiny fraction of a penny that we get from the page views that these links are sending in,’ ” said Joshua Benton, the director of the Nieman Journalism Lab at Harvard. “They think they need to defend their turf more aggressively.”

Copyright infringement lawsuits directed at bloggers and other online publishers seem to be on the rise. David Ardia, the director of the Citizen Media Law Project, said his colleagues kept track of 16 such suits in 2007. In 2004 and 2005, they monitored three such suits each year. And newspapers sometimes send cease-and-desist orders to sites that they believe have crossed the line.

Some publishers complained last week when Google News, a site that aggregates headlines from thousands of news sources, added advertising to its search results.

Last December, GateHouse Media sued The New York Times Company, alleging copyright infringement after local sites associated with The Boston Globe, a Times Company newspaper, copied the headlines and lead sentences of GateHouse’s newspaper articles. The case was settled out of court in January.

In another case, which is pending, The Associated Press sued the online news distributor All Headline News last year, saying that it had improperly copied A.P. articles.

The legal disputes are emblematic of a larger question that has emerged from the Internet’s link economy. The editors of many Web sites, including ones operated by the Times Company, post excerpts from competitors’ content from time to time. At what point does excerpting from an article become illegal copying?

Courts have not provided much of an answer. In the United States, copyright law provides a four-point definition of fair use, which takes into consideration, among other factors, the purpose (commercial vs. educational) and the substantiality of the excerpt.

But editors in search of a legal word limit are sorely disappointed. Even before the Internet, lawyers lamented that the fair use factors “didn’t map well onto real life,” said Mr. Ardia, whose Citizen Media Law Project is part of the Berkman Center at Harvard Law School. “New modes of creation, reuse, mixing and mash-ups made possible by digital technologies and the Internet have made it even more clear that Congress’s attempt to define fair use is woefully inadequate.”

For now, Web sites are defining it themselves. Sites like Alley Insider and The Huffington Post are ad-supported businesses that filter the Web for readers, highlighting what they deem to be the most meaningful parts of newspaper articles and TV segments.

Alley Insider, according to its editor in chief, Henry Blodget, operates under a digital golden rule: “To excerpt others the way we want to be excerpted ourselves.” The post about Ms. Noonan’s column, including five full paragraphs, had explicit credits to the author and the newspaper, three links to the source and a direct encouragement to users to read the original column.

Alley Insider doubtlessly exposed new readers to Ms. Noonan’s column, and an unknown number of users followed the links to The Journal’s Web site. But others probably did not follow the link, meaning that Alley Insider alone — and not The Journal — reaped the advertising pennies from the excerpt.

The Huffington Post, the popular news and opinion forum co-founded by the author and columnist Arianna Huffington, is perhaps the star of the excerpting debate. Ms. Huffington’s editors are especially adept at optimizing the site for search engine results, so that in a Google search, a Huffington Post summary of a Washington Post or a CNN.com report may appear ahead of the original article.

“We want to both drive traffic to ourselves and drive traffic to others,” Ms. Huffington said in a telephone interview. Adding that “we are at the beginning of developing the rules of the road” online, she said the site’s editors were “constantly talking” about appropriate excerpting conduct.

To the extent that the site republishes articles produced by other organizations, “we excerpt to add value,” Ms. Huffington said, sometimes by combining articles, videos and transcripts. Much of the Web works this way, skimming quotes and photos from other sources while trying to remain within the provisions of fair use.

Ms. Huffington said that The Huffington Post, which had more than 20 million unique visitors in January, received more than 100 requests for links each weekday from reporters, editors and public relations representatives. “Everybody wants to be linked to,” she said.

That is true as long as readers follow those links. The prevailing wisdom is that content should roam widely online, but lackluster digital advertising of late has called that into question.

That has fueled a round of recent commentaries about payment models for online news. Cablevision, the owner of the Newsday newspaper, said Thursday that it would “end distribution of free Web content.” Hearst, the owner of 16 newspapers, said Friday that it would charge for some content on its Web sites.

Widespread excerpting would seem to make pay models harder to impose. Even more troubling for news organizations is blatant copying. In December, The Huffington Post’s new Chicago offshoot was accused of copying the full contents of local publications’ concert reviews. Ms. Huffington called it a “mistake made by an intern.”

Other sites copy content from news organizations using automated syndication feeds. The sites typically display text or show ads around the excerpts to make money.

GateHouse’s suit against The New York Times Company contended that the company was “link scraping” by automatically aggregating articles from GateHouse newspapers, to be excerpted on local news sites operated by The Boston Globe.

“They felt that The Globe was benefiting too much from the work of GateHouse journalists,” said Mr. Benton of the Harvard journalism lab. The Times Company denied that it was scraping GateHouse’s site and said that its use of GateHouse content did not violate copyright laws.

It also said that GateHouse’s Web sites copied headlines and other text from Times Company sites. Last month the Times Company agreed to stop copying GateHouse’s headlines and lead paragraphs.

It remains to be seen whether excerpting standards from before the Internet age still apply. Mr. Ardia said that quoting “is often a sign of respect” online.

“The norms are developing outside — or ahead of — the law,” he said.

Alley Insider’s partial republication of Ms. Noonan’s column, for instance, was edited shortly after it was posted online. The reason, Mr. Blodget said, was that the excerpt seemed slightly too long.
