Thursday, May 28, 2009

Books For Free: The Gutenberg Project

OK: Yet Another Postponement Alert (YAPA). For Folk Music of the Sixties, please stay tuned. But first, this.

Perhaps some of my readers will sympathize with me. I look with great dismay at the disappearance of many of the things that I enjoyed the most from popular culture. I'm not talking about Popular Culture; just popular culture: the collection of ideas that form a potential base for any sort of discussion with, say, someone you meet on a bus: TV shows, movies, books, radio shows, personalities, events. It makes me sigh; nobody knows the delights I've seen ... it's but a step to Swing Low, you know what I mean? Take me, Lord; or perhaps: Beam Me Up, Scotty.

I am definitely not wallowing in nostalgia, at least not in the first place. I think of nostalgia as being the longing for bygone times. What we have here is the loss of a common culture. (Of course, the fact that I don't have TV makes it worse; otherwise I would have at least the 2009 equivalent of "Plop, plop, fizz, fizz!" with which I could make a stranger smile.) Oh sure; you have "knock-knock" jokes now, but how will you feel ten years hence, when nobody knows what they are? Think about it. Sometimes even terrible things need to be preserved.

One major problem is that, of course, people rarely have the time or the energy to actually read paper books, and certainly not the books that the Baby Boomer Generation (the generation born immediately after World War 2, mostly blamed on the fecundity of returning GIs) cherished, and collected in their homes. It seems trite to call these books "classics"; the word has absolutely no meaning any more, simply because it is used for so many different things, from music, to literature, to cars, and to Rock and Roll. The attempt to create a Canon of Western Literature met the need to Publish or Perish in a huge clash called Post-Modernism, which left very little in its wake except the systematic destruction of classifications of everything.
So we're simply talking about books (whichever of them happen to be now in the public domain, I must add) that were held in high regard by SOME sector of the population in the past, and have been selected by SOME individual or committee as being worthy of preservation. [I will try and put the selection process here if I discover it.]

To preserve copies of the actual books themselves would be moderately useful, of course. But one would have to travel to various locations for the only copy of a book that's available. The books are perishing, for the usual reasons that books do perish: the paper deteriorates because of its acid content, the air affects it, the bindings fall apart. Also, more recent publications, considered more useful to younger generations, need the space.

The first idea was to scan the books. After the books have been selected to be scanned, the scanned pages are placed in various archives in image form, e.g. "jpg" or "png".

The second stage is to run the scanned images through OCR software, where OCR stands for "optical character recognition": a kind of program that splits images into lines, and each line into individual characters. These programs have been available for decades, have become very intelligent, and can cope with very poor image-text indeed.

The third, fourth and fifth stages have to do with visually checking the mechanical OCR output (which is a page of text in a file), processing the text to make sure italics, bold and other formatting, images, footnotes, etc. are all handled appropriately, and creating a final PDF file or HTML file. These files are finally indexed and cataloged, and made available to the public absolutely free on the WWW.

Anyone who is interested in this entire process, and in browsing through the available titles, should visit Project Gutenberg, which serves as a home-page for the entire broader books-to-e-text movement.
On a recent visit to this page, I found myself interested in the process of how these books were created. One of the most creative ideas to come along is to DISTRIBUTE the PROOF-READING of the scanned book pages to thousands of internet volunteers around the globe. It seemed reasonable to believe that my labor was worth more than a contribution of a few dollars to the project, and so I followed the links, and soon found myself volunteering to be a proof-reader. I am now in my fourth day, still on trial, and struggling to get qualified as an authorized proof-reader. At present all I'm allowed to do is to check the OCR output, and nothing else. It takes three weeks of experience to move ahead to higher levels of participation.

In these troubled and cynical times, one needs something to counter the despair that occasionally sets in, and every bit of positive information is infinitely precious. I urge all visitors to this page to sign up to proof-read Project Gutenberg books at the Distributed Proofreading page. If you're unsure of whether you have the skills or the energy to volunteer (they only ask for a few pages at a time), you can try it out on this page. (Your first few attempts will be checked by a qualified proof-reader anyway.) At the very least, download a book or two, and enjoy reading them!

P.S. Johannes Gutenberg is, of course, credited with introducing the movable-type printing press to Europe, with which he printed a Bible; see picture at top. He is showing a favorite page from his Bible, which he printed in Latin. Gutenberg's invention is considered a turning point of Western Civilization, and for good reason.

By the way: in the United States, a copy of every published copyrighted work is supposed to be deposited with the Copyright Office, and ultimately in the Library of Congress. This procedure is, I understand, being reviewed.

Archimedes
