Friday, June 25, 2010

Google releases 500 scans of Ancient Greek and Latin texts for research



As an undergraduate I dabbled in Classics, and I remember being surprised by the term hapax legomenon (ἅπαξ λεγόμενον). The phrase literally means "said once" -- a word that occurs in only one place in the written record. It seems impossible, but it happens surprisingly often: over 300 words in the Iliad appear nowhere else in Greek. So much has been lost (all but 7 of Sophocles' 123 plays, for instance) that every text that survives is precious. These texts communicate the self-understanding of their cultures -- which helped shape the modern world -- and have commanded scholarly attention for centuries. For these artifacts of a long-vanished world, passed down by generations of hand copying, merely establishing the text requires careful study of crabbed handwriting and critical comparison of divergent copies.
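For readers who think in code, the idea of a hapax legomenon can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `hapax_legomena` is made up for this example, "word" here means any run of letters or digits, and no attempt is made to group inflected forms of the same word, which real philological work would have to do.

```python
import re
import unicodedata
from collections import Counter


def hapax_legomena(text: str) -> list[str]:
    """Return the words that occur exactly once in `text`, in order of first appearance."""
    # Normalize to NFC so accented Greek letters are single precomposed characters;
    # otherwise combining accents would split words apart in the regex below.
    normalized = unicodedata.normalize("NFC", text).casefold()
    # \w+ matches runs of Unicode letters, digits, and underscores in Python 3.
    words = re.findall(r"\w+", normalized)
    counts = Counter(words)
    # Counter keeps insertion order, so this preserves order of first appearance.
    return [word for word in counts if counts[word] == 1]


print(hapax_legomena("arma virumque cano arma"))  # prints ['virumque', 'cano']
```

Run over an electronic edition of a whole corpus rather than one line, the same counting is what lets one say that a word appears "nowhere else".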

Modern scholars of Ancient Greek and Latin, continuing in this tradition, are working to create comprehensive electronic editions of these texts. For anyone who remembers studying Latin the old way, constantly paging through a dictionary, these electronic texts are a revelation. Now we have Caesar's Gallic Wars (Perseus Digital Library) with every word parsed and translated, along with linguistic commentary and a collection of references to the text from other works. We can read about Sophocles’ 123 plays in the Stoa Consortium's electronic edition of the Suda, a 10th-century Byzantine Greek encyclopedia. And scholars around the world can now consult a high-resolution digital scan of Venetus A, one of the best manuscripts of the Iliad, at the Center for Hellenic Studies.

I'm pleased to announce that Google Books is now assisting this work by sharing high-resolution digital scans of over 500 volumes of Ancient Greek and Latin, dating from the sixteenth through nineteenth centuries. (Of course, downloadable versions of over a million volumes in all fields are available from books.google.com, in a more compressed form.) Jon Orwant and I created this collection using a list of several thousand important Classics volumes identified by our collaborators Professor Gregory Crane and Alison Babeu of Tufts University. We are analyzing additional volumes and expect to be able to release more high-resolution scans in the future.

These scans will aid the development of accurate OCR (Optical Character Recognition) algorithms for Ancient Greek, and provide the basis for electronic versions of important editions of these Classics texts; but perhaps their greatest value will be for the development of new methods in this emerging field. We’re honored that Professor Crane called this donation “a major contribution to what scholars can do.”
