The New Google-Amazon Overload

Is it high time to turn off our computers? Have wired campuses created a critical impasse of "information overload"? In "Knowing When to Log Off," in the April 22, 2005 issue of the Chronicle of Higher Education, Jeffrey R. Young asks whether we now devote so much time and attention to e-mail, online databases, and electronic texts that we are in serious danger of undermining our traditional habits of reading, researching, and thinking. Whatever individual Poe scholars make of these questions, it should be abundantly clear that the rapid and near-universal spread of contemporary (postmodern?) digital literacy in the last decade is altering modern print culture -- just as print culture once altered medieval manuscript culture, and as writing itself once threatened classical speech and memory. Surely the novelty of quick and free scholarly information has not yet worn off, and computer addiction (and its opposite, computer avoidance) is far from unknown. But in this column I would like to reach for a balance between the extremes of logging on too much and never logging on. Is it possible to log on intelligently, to enjoy the benefits and avoid the dangers? The first step is to recognize that Poe in cyberspace has expanded in extraordinary ways in the last few months, thanks to the two big players to watch: Google and Amazon.

The ultimate tribute to Google's success as a search engine is that "google" is now a common verb in conversation. A search yesterday for "Edgar Allan Poe" produced an astonishing report of 1,270,000 hits on Google and a similar 1,250,000 hits on Yahoo. The same search today produced 872,000 hits on Google and 1,210,000 hits on Yahoo. But these numbers have little practical meaning beyond bragging rights. For one thing, at a generous 100 hits a day it would take about 24 years to browse even today's 872,000 Google hits, assuming nothing changed in the meantime. For another, only a tiny fraction of the claimed hits can be displayed in sequence in any one session, since the results list simply stops after 805 hits on Google and 979 on Yahoo (try it yourself). And in practice most searchers confine themselves to the first few screens, generally containing only a few dozen hits. Nevertheless, the million potential hits are out there and can be targeted by narrowing the search to "Edgar Allan Poe criticism" (322,000 hits), "Edgar Allan Poe Eureka criticism" (4,400 hits), or "Edgar Allan Poe Eureka postmodern criticism" (206 hits).

To avoid the many amateur and redundant Poe sites, try the more recent service http://scholar.google.com instead of http://www.google.com; it limits itself to academic and scholarly publications. Here the results would be "Edgar Allan Poe" (2,900 hits), "Edgar Allan Poe criticism" (892 hits), "Edgar Allan Poe Eureka criticism" (34 hits), or "Edgar Allan Poe Eureka postmodern criticism" (7 hits) -- much more manageable results, and evidence that less is more. What you find will vary in nature: it may be a full text, a description, or just a citation. In any event, Google is clearly trying to be more than a better Yahoo, or even simply the most powerful search engine: it is attempting to overcome its initial limitations as a global word index or concordance, one that tells you where the words Edgar, Allan, Poe, Eureka, postmodern, and criticism appear together on one page but does not respond to the implied question, "What is the state of postmodern criticism of Edgar Allan Poe's Eureka?" To overcome these limitations, Google is adding search prefixes, such as book (or books) for publications, define for word definitions, and link: for pages that link to a given address. It also understands common synonyms: type in ~child and you will get results for child, children, kids, childhood, youth, and so on. In addition, Google can save your display preferences: if 10-hit screens are too skimpy for you, raise your personal default to as many as 100 hits. So much for the old Google.

The new Google, perhaps the Google of the future, was announced in the fall of 2004 in the form of the Google Library Project, a collaboration with Michigan, Harvard, Stanford, and Oxford universities and the New York Public Library to scan and then make freely available on the internet the texts of 15 million books, all out of copyright: printed before 1923 in the United States and before the 1900s in Europe. Google will reportedly bear the costs, said to be about $10 a volume, and the project is expected to take about five years to complete. Google will either accept the books themselves, with permission, and scan them, returning both the book and a digital copy to the originating library, or accept the library's own digital scans in the familiar PDF format. Many questions remain to be answered, however. Will entire books be made available, or only portions according to some quota? Since PDF files can be quite large (a sample 11-page magazine article I tested was half a megabyte), what sort of download capability will be needed to access a complete book, and how long would it take? Although converting page images to electronic text by OCR (optical character recognition) for purposes of searching, copying, or printing is a standard feature of Adobe Acrobat, the home software of the PDF format, the accuracy of such texts is always open to question, especially where the printed originals have uneven, dirty, unusual, or complicated typefaces -- as is often the case in older editions, precisely the kind the library project will use because of copyright restrictions on newer ones. Moreover, OCR cannot distinguish an author's hyphen that happens to fall at the end of a line from a typesetter's hyphen dividing a word there, which causes problems for the textual scholar when the syllables created by the latter happen to make sense as words. It is expected that the electronic texts created in this way may carry links to book vendors as well as some sort of advertising.
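Because the hyphen problem is easy to state but hard to picture, the following Python sketch illustrates it under loudly stated assumptions. The function `readings`, its tiny `LEXICON`, and the rule that a hyphenated compound is plausible whenever both halves are words are all hypothetical; this is not how Adobe Acrobat or Google actually process text. A line-end break such as "Ra-" / "ven" has only one sensible reading, but "to-" / "morrow" (nineteenth-century printers commonly set "to-morrow") could be either the typesetter's division of "tomorrow" or the author's own hyphen, and the printed line alone cannot settle which.

```python
from __future__ import annotations


def readings(head: str, tail: str, lexicon: set[str]) -> list[str]:
    """List plausible readings of a word split as 'head-' / 'tail' at a line end.

    Hypothetical heuristic for illustration only: the joined form is plausible
    if the lexicon contains it (a typesetter's hyphen), and the hyphenated form
    is plausible if both halves are words on their own (possibly the author's).
    """
    candidates = []
    joined = head + tail
    if joined.lower() in lexicon:
        # Typesetter's hyphen: drop it and close up the word.
        candidates.append(joined)
    if head.lower() in lexicon and tail.lower() in lexicon:
        # Both halves are words, so the hyphen might be the author's own.
        candidates.append(f"{head}-{tail}")
    return candidates


# A deliberately tiny, hypothetical word list.
LEXICON = {"to", "morrow", "tomorrow", "raven"}

print(readings("Ra", "ven", LEXICON))     # one reading: the hyphen is the typesetter's
print(readings("to", "morrow", LEXICON))  # two readings: OCR alone cannot decide
```

Only the second case needs a human editor, which is exactly the situation the textual scholar worries about.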

It is a compliment to Google that a whole industry has sprung up around it. The site http://www.logoogle.com sells books about Google (there are now fourteen) and Google merchandise. For the technically minded, http://www.kuro5hin.org/story/2005//95844/59875 explains how Google uses "cookies" to identify its users and how they can be faked to extend results, and suggests ways in which Google responds differently to book titles with and without quotation marks.

To stay competitive, Yahoo is exploiting the structural weakness of Google as a word index by adding subtopics as an editorial feature. The Poe subtopics I encountered in one recent search included biography, poems, The Raven, short stories, pictures, quotes, The Tell Tale Heart, criticism, Annabel Lee, works, The Fall of the House of Usher, books, bibliography, Annabelle Lee, death, literary criticism, information, The Pit and the Pendulum, life, timeline, To Helen, history, facts, Eldorado, info, Masque of the Red Death, critics, The Purloined Letter, museum, themes, autobiography, The Cask of Amontillado, The Murders in the Rue Morgue, essays, The Cast of Amontillado, Lenore, The Black Cat, complete works, awards, analysis, The Bells, The Gold Bug, reviews, when born, critiques, A Dream, literary works, when did Poe die, short biography, the sleeper, writing style, images, obituary, lesson plans, newspaper article, interview, Ligeia, literature, and Hop Frog. The collection is still somewhat miscellaneous, mixing titles, topics, and even misspellings. Nevertheless, Yahoo's directory remains an attractive feature: its Poe category lists 78 entries.

It was Amazon, not Google, that took the lead in the major new projects. In October 2003, about a year before Google announced its Library Project, Amazon launched its Search Inside the Book project (SITB), initially covering 190,000 books from 190 publishers. Here sample pages of books still in copyright -- where Google fears to tread -- are offered free on the internet. Although the conventional wisdom is that free electronic copies will cut printed book sales, early reports suggest that SITB actually stimulated rather than hurt sales. Since the Amazon home page is unusually complex, one of the most interactive user interfaces of any web site (it energetically uses "cookies" to identify visitors and customize its offerings according to the pages they opened on previous visits, their past purchases, and the purchases of other visitors), it is easy to overlook the Search Inside the Book feature. Or, if you do explore it, the display may seem similar to the copyright-observant page-tease policies found elsewhere, which are limited to browsing the title page and table of contents and to showing just the vendor's description or some responses from other readers. But Search Inside the Book is genuinely different in how it surmounts the supposedly insurmountable copyright barrier. You may use SITB on a particular book's page or from anywhere else on Amazon.com. Start by searching for a distinctive character, place, word, or phrase. The internal concordance will return a list of the places where the search string appears, in KWIC (key word in context) format: each hit is surrounded by two lines of text and accompanied by its page number, along with the book title when the hits come from more than one book. Such a search for snippets can be useful in research to verify page numbers cited in earlier research, to locate other instances, and to see each in a bit of context.
But here's where SITB gets interesting: each of these concordance items also serves as a link to its own entire page in PDF format. Even if readers are initially limited to 20% of a book's pages within a given month, this is a major innovation in electronic research, despite possible minor differences between pages as images and pages as electronic text. SITB thus becomes a way for researchers to sample pages of books still under copyright. And for those whose budget matters more than author royalties, Amazon also links to sellers of used copies, sometimes offered for little more than the price of postage.
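The keyword-in-context display described for Search Inside the Book can be sketched in a few lines of Python. This is only an illustration under assumptions: Amazon has not published how SITB works, and the function `kwic`, its `width` parameter (characters of context on each side, standing in for SITB's two lines of text), and the sample page texts are all hypothetical. Each hit is reported with its page number, the detail that makes such snippets useful for checking citations.

```python
from __future__ import annotations

import re


def kwic(pages: list[str], term: str, width: int = 40) -> list[tuple[int, str]]:
    """Hypothetical keyword-in-context index: one (page number, snippet) pair per hit."""
    hits = []
    # Case-insensitive search for the literal term (re.escape neutralizes punctuation).
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    for page_no, text in enumerate(pages, start=1):
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Bracket the keyword so it stands out in the middle of the snippet.
            snippet = f"{left}[{m.group(0)}]{right}".replace("\n", " ")
            hits.append((page_no, snippet))
    return hits


# Two made-up "pages" standing in for a scanned book.
pages = [
    "Poe reviewed Longfellow harshly.",
    "Lowell praised Poe; Longfellow stayed silent.",
]
for page_no, snippet in kwic(pages, "longfellow", width=10):
    print(page_no, snippet)
```

Because this sketch matches raw substrings, a search for "Poe" would also match "Poem"; a real concordance would more likely index whole words.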

But as Amazon itself was quick to realize, the bazaar atmosphere of the amazon.com marketplace was not conducive to research or academic book sales. So it launched an informational site, http://www.a9.com, a must-see address that is active now. Through A9, Amazon distributes licensed electronic information from existing channels (there were 161, of varying significance, in April 2005). The user can choose whether to display each channel as a full screen, in shared columns, or not at all. On my first visit I was identified as an Amazon book buyer and found the screen divided among five channels: the web, SITB, images, movies, and reference. The web channel came from Google, Search Inside the Book from Amazon.com, images from Google, movies from the Internet Movie Database, and reference from GuruNet (definitions), the American Heritage Dictionary, the Columbia Electronic Encyclopedia, and a chronology by Daniel S. Burt. On a first visit, do take time to read about A9's features.

I added the optional Books channel and hid some of the others. Searching for "Edgar Allan Poe," I could not determine the order in which the 12,130 claimed Poe titles were displayed, the vast majority of which were of no interest to me at the time. The first title to strike my eye was the 1997 re-issue of A. H. Quinn's biography. Using Search Inside the Book, I found references to Lowell (79), Longfellow (44), and T. D. English (10). It was then that I discovered that the search hits also acted as links, a kind of reverse concordance, to entire pages that could be read as graphical images. At first I assumed that only older material (Quinn was first published in 1941) was available in this way. But then I found two useful Library of America titles from the 1980s, Edgar Allan Poe: Essays and Reviews (1984) and Edgar Allan Poe: Poetry and Tales (1984), as well as Critical Essays on Edgar Allan Poe, edited by Eric W. Carlson (1987). From the 1990s I found The Poe Log: A Documentary Life of Edgar Allan Poe 1809-1849 by Dwight Thomas and David K. Jackson (1995) and Edgar Allan Poe and the Masses by Terence Whalen (1999). The two most recent titles that struck my eye in my cursory first search were A Historical Guide to Edgar Allan Poe, ed. J. Gerald Kennedy (2001), and The Cambridge Companion to Edgar Allan Poe, ed. Kevin J. Hayes (2002).

Evidently A9 and Amazon share Search Inside the Book: I could also use Amazon to focus on a particular title and then access SITB from there. It is also evident from A9's use of Google channels that Amazon and Google have discovered a synergy in dividing the electronic information spectrum largely along the fault lines of copyright protection. In addition, A9 appears to be cooperating with AOL, which owns Netscape, since an A9 toolbar is available for Mozilla. I only had time to glance at the research channels in A9, but it was apparent that they used high-quality sources suitable for university and professional use. Unlike Microsoft's Encarta, originally based on the Funk and Wagnalls encyclopedia, A9 uses the American Heritage Dictionary, the Columbia Electronic Encyclopedia, and Daniel S. Burt's excellent chronology.

In the current atmosphere, other major new projects are announced almost monthly. As of April 2005, the Library of Congress and the National Endowment for the Humanities are sponsoring the National Digital Newspaper Program, initially to cover U.S. newspapers issued in 1900-1910, including 1909, regarded as the high-water mark for U.S. newspapers. The project, which will create PDF files and electronic texts from the holdings of the universities of Virginia, Florida, Utah, California (Riverside), and Kentucky (Research Foundation) and of the New York Public Library, may later be expanded to the period 1836-1922.

Of course, it takes more than Google and Amazon to make a world. Two catalogs of topic-specific search engines and databases are http://www.GeniusFind.com and http://www.Beaucoup.com -- the latter featuring http://www.beaucoup.com/3refeng.html, a source for search engines and databases that specialize in literature. http://AskJeeves.com, which specializes in answering natural-language questions, has improved much in the last year or two. Another natural-language answer engine is http://www.brainboost.com. The site http://www.InfoPlease.com (evidently a shortening of Information, Please!) contains almanac-based information. Professional standards are maintained at http://www.lii.org, the Librarians' Index to the Internet, which has detailed descriptive entries, each signed and dated, and makes the unusual choice of using Library of Congress subject headings, with links to similarly classified material. An open-text encyclopedia generated by its users is http://www.wikipedia.com -- if you don't like an entry, edit it yourself! A very useful compendium of tips on web information services is http://www.Batesinfo.com/tip.html, where I found many of the hints on information services in this column.

This page is available with live links at http://andromeda.rutgers.edu/~ehrlich/poe/.

Heyward Ehrlich, Dept. of English, Rutgers University-Newark