Tuesday, February 10, 2009

Google’s book effort stuffs the copyright act - TECH.BLORGE.com

Google’s book effort stuffs the copyright act
By Gareth Powell

Google will bring its e-books to mobiles. You will pay. The idea is that Google will make its vast library available to cell phones. American newspapers, which have been taught to sit up and beg when Google announces almost anything, plainly have absolutely no understanding of the reality of the case and why, in most cases, it is total nonsense.

The company said, ‘We are excited to announce the launch of a mobile version of Google Book Search, opening up over 1.5 million mobile public domain books in the US (and over half a million outside the US) for you to browse.’ That is PR flackery from Flackery 101.

The reality is very different.

Initially, the books are scanned in at very high speed with no thought as to accuracy or to placement. In many cases they are totally unreadable. Scanning a book goes through two stages.

First the book is scanned.

Then the book is proofread.

In the initial stages Google did not proofread any of its books. Any. Ever. In any way.

One title checked in this office a few months ago had 1,313 errors in it. If a book publishing company put out a title with that many errors it would be laughed out of court.

Compare with Gutenberg, which has an army, a positive army, of proofreaders to see that its books are correct and readable.

Then Google, harassed by American publishers, issued a Google Book Search Settlement Agreement which totally trampled over the Berne and Geneva copyright conventions. It was written in a type of legal flackese which suggested, falsely, the concept of Evergreen Copyright. You scan in an out-of-copyright book and Google cannot display it in its whole form.

Google did this because the publishers went ‘boo’. The publishers had, in the main, done the square root of sod all. But Google is gutless, and someone had worked out the cost of proofreading and wanted to pass it on to the eventual user: you, the mug punter.

Google, in a post on Thursday on the Google Book Search blog, said mobile versions of the books could be read on devices such as the Apple iPhone or T-Mobile G1, which is powered by Google’s Android software. It made no mention of the cost of reading out-of-copyright books.

As far as can be ascertained Google has decided to rewrite the copyright law to get around some of its proof-reading problems.

Take The Yangtze Valley and Beyond by the amazing Victorian adventurer Isabella Bird. Earnshaw Books publishes it but does not claim copyright.  It cannot do that. The book was originally published in 1899.

Google thinks it can. Google, on its own behalf, has in effect rewritten the Copyright Law so that any book republished acquires a new copyright from the new date of publication. Congress knows nothing of this. Nor does the Copyright Office in London.

Google said, ‘These new mobile editions are optimized to be read on a small screen. With this launch, we believe that we’ve taken an important step toward more universal access to books.’

Stuff and nonsense. Rubbish. Knickers and bum and whatever worse swear words I can think of.

Yes, I have books scanned into my mobile phones. Yes, I read them. But FIRST, they are proofread. Properly proofread so that they are intelligible, useful, and add to the sum of knowledge. And I pay for the books if they are in copyright.

The wunderkind at Apple know not of proof-reading. (There is an allegation that none of them can read but this should be ignored.)

To proofread a book to any sort of acceptable level costs, to use a Fermi figure which will do for this calculation, $500.

So, according to the Google flacks, 2 million books are to be hurled at the unsuspecting public. To get them into properly readable shape would cost a billion dollars.
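The sum is simple enough to check. What follows is a rough sketch only: the $500 per book is the Fermi figure above, not a measured cost, and the 2 million total assumes the two figures in Google's announcement (US and outside the US) can simply be added together. The function name is made up for the illustration.

```python
# Back-of-envelope proofreading cost, using the figures quoted in this article.
COST_PER_BOOK_USD = 500  # the Fermi figure above; an assumption, not a measured cost


def proofreading_cost(books: int, cost_per_book: int = COST_PER_BOOK_USD) -> int:
    """Return the total cost in dollars of proofreading `books` titles."""
    return books * cost_per_book


# Google's announcement: over 1.5 million titles in the US, over half a million outside.
# Treating the two figures as additive is an assumption.
total_books = 1_500_000 + 500_000

print(f"{total_books:,} books -> ${proofreading_cost(total_books):,}")
# prints: 2,000,000 books -> $1,000,000,000
```

Change the per-book figure to whatever you think is fair and the total scales in step; even at a tenth of the price it is a hundred million dollars.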

The reason why Microsoft stopped scanning in the vast library of pictures it had acquired, and deposited them in a damp-proof Canadian mine, is that it found it cost, roughly, $50 a picture to scan them properly. The costs are nigh on unbearable.

If it were possible, and it is not, every book in the English language about China would be on the Internet under the colophon of Earnshaw Books. It has not happened simply because to make them readable, let alone researchable, they need to be proofread.

What SHOULD happen now is the smug, self-satisfied sods at Google should look at Gutenberg and see how books should be prepared for portable readers.

Google’s announcement comes just days ahead of the expected unveiling by Amazon of a new generation version of its electronic book reader, the Kindle, at a New York press conference on Monday. (The PR people have put the word ‘popular’ before electronic but refuse to give sales figures. One wonders why?)

Drew Herdener, an Amazon spokesman, told the Times, ‘We are excited to make Kindle books available on a range of mobile phones. We are working on that now.’

What work is required? If they are proofread well enough for Kindle (a name that always brings to mind the Nazi book burning), dropping them over to a mobile phone is a matter of moments’ work.

The Amazon spokesman did not provide any further details. Possibly because there were none. But a fair bet is that we will be charged. And Google and Amazon will say, smugly, ‘Look how wonderful we are bringing out-of-copyright literature to the masses for only quite a small charge.’

Bah! Humbug!

What follows need not be read but is included for historical interest. I wrote to David Pogue of the New York Times on 8 November 2008. What follows is an edited version:

Google Books cannot continue as it is set up. It is not the legal side. It is the cost and the usefulness of the end product.

Take An Historical, Geographical, and Philosophical View of the Chinese Empire by William Winterbotham. I know this is not an easy book to handle, having been first published in 1795. The .pdf scan by Google itself is not that bad, although fingers appear where they have been holding down pages, the map is a bad joke, and in places the text runs diagonally across the page. OK, it is not great scanning but you can work around it.

When you scan at high speed you get these problems.

It is when Google turns the scanned text into ordinary text using OCR that farce becomes tragedy.

There are just over 1,200 mistakes in this book.

I am not talking about the spelling style of the time — vaft for vast — but error on error on error.

I attach a small part of the index to show the style.

To proofread such a book in any country is going to cost $500 and up. But unless some sort of correction is done the OCR version is effectively useless.

Google says it is scanning more than 3,000 books a day and it already has over one million books which are in the public domain. To make those useful as text will cost half a billion dollars.

Yet if this is not done, scanning the books with a high-speed camera means that some pages are not scanned at all, or are scanned so badly that they are unreadable.

Although this is, I think, generally understood, what has not been made clear is that even when Google is dealing with an out-of-copyright book it may still be producing an unreadable mess. Possibly, but not probably, usable in .pdf. A bad joke where it has been OCR’d.
