www.common-place.org · vol. 3 · no. 3 · April 2003
American Originals Part I | II
It took thirteen years to create Evans's Early American Imprints, but it took only a decade for Boni's Microprint technology to fall out of favor with librarians and their patrons. Although Microprint remains indispensable to historical research today (despite the fact that machines that can read the cards are now few and far between and nearly impossible to repair), by the 1980s institutions favored microfiche and microfilm, which were read on relatively easy-to-use machines that also offered printing capabilities. In answer to the demands of the market, Readex switched to making archival-quality duplicates of the original film used for producing Microprint cards, and in 1983, when Connecticut-based NewsBank, which specialized in archiving and filming newspapers, purchased Readex, the company simply expanded Boni's old printing shop in Chester.
NewsBank still occupies space in the basement of the AAS, in a dimly lit room crammed with file cabinets, a precariously overstacked desk, and, at the very heart of the operation, a gigantic Recordak (Kodak) microfilm camera. This is where, for the past forty years, Stanley Shapiro, who bears a remarkable resemblance to a trim St. Nick, has run the microform operation, carrying on the work started by Evans, Shipton, and Boni.
Today, Shapiro and his staff still follow the same basic procedure in filming the twelve hundred AAS imprints to be added to Evans Digital. Whatever is to be filmed, from pamphlets to schoolbooks to broadsides, is positioned carefully in a large, jointed cradle under a wide pane of glass. Gently the item is pushed upward with the help of a bicycle chain and a crank until it is pressed flat against the bottom of the glass, at which point the operator photographs it with a camera situated about three feet above. But the film created now is only a master copy to be sent to Chester to be digitized.
Naturally, the architecture of Evans Digital Edition resembles in some respects its predecessors'. To begin with, NewsBank's technicians in Chester have chosen to use the non-archival microfiche "originals" from the 1950s and 1960s as source material, since they yield better quality scans, though at a slower rate, than the archival copies made in the 1980s. To effect the transformation to sleek, modern binary code, staff members feed each carefully cleaned fiche into a high-quality Mekel scanner. The Mekel looks like an unassuming beige plastic rectangle attached to a fairly ordinary Dell desktop. Upon loading, a small green replica of the fiche appears on the monitor, allowing operators to track each page as the computer scans the image at 400 dots per inch (dpi). The resolution of 400 dpi seems modest, below what consumer scanners offer now, until one considers that, on fiche, each page measures about seven-eighths (or less) of an inch wide. To achieve a final, enlarged resolution of 400 dpi, the scanner is acquiring an astonishingly dense 6,400 dpi, all while separating documents page by page, converting them to grayscale, and tagging them with a unique file name that will help trace their every step toward inclusion in the Evans Digital database.
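
The resolution arithmetic in this passage can be checked with a short calculation. The sketch below is only illustrative: it takes the three figures quoted above (a fiche page about seven-eighths of an inch wide, a 6,400 dpi capture, and a 400 dpi final image) at face value, and every function and constant name is invented for the example rather than drawn from NewsBank's or Mekel's software.

```python
from fractions import Fraction

# Figures quoted in the article; the names are hypothetical.
FICHE_PAGE_WIDTH_IN = Fraction(7, 8)  # width of one page image on the fiche
CAPTURE_DPI = 6400                    # density at which the fiche is scanned
OUTPUT_DPI = 400                      # density of the enlarged page image


def enlargement_factor(capture_dpi, output_dpi):
    """How many times larger the output page is than its image on fiche."""
    return Fraction(capture_dpi, output_dpi)


def pixels_across(width_in, dpi):
    """Number of pixels captured across a strip of the given width."""
    return int(width_in * dpi)


factor = enlargement_factor(CAPTURE_DPI, OUTPUT_DPI)
pixels = pixels_across(FICHE_PAGE_WIDTH_IN, CAPTURE_DPI)
# Assumption: the same pixels are simply reinterpreted at the output density.
enlarged_width_in = Fraction(pixels, OUTPUT_DPI)

print(f"enlargement factor: {factor}x")
print(f"pixels across one page: {pixels}")
print(f"implied enlarged width: {float(enlarged_width_in)} in")
```

On these figures the capture is sixteen times denser than the final image, so a page filling the full seven-eighths of an inch would come out about fourteen inches wide if the pixels are reused unchanged; pages that occupy less of the fiche come out proportionally smaller.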
Because each page is independent in the system, in the next phase of production, a computer running optical character recognition (OCR) software can pluck multiple scans from a holding computer's memory to generate ASCII text. To "read" a scanned page, the OCR machine runs a ten-second automated sequence that makes the image high contrast, adjusts the angle of the page, transcribes the text, and reconverts the image to grayscale. If left running day and night, the machine would probably finish perusing the thirty-six thousand books in Evans's Imprints in about nine months. Even at the highly technical OCR stage, though, a tension between present and past technologies exists. Colonial American documents present a particular problem for OCR because of the quality of originals (many have broken type or show the effects of uneven hits on the press), variations in typefaces (such as the old style s that looks like an f to twenty-first-century eyes), abrupt and inconsistent abbreviations (like Quest. for question or play'd and playd for played), and orthography (consider republick or rejoyce or booke). Therefore, before a page travels back to storage, filters "clean" the text with a series of algorithms designed to find and fix errors. While NewsBank takes fierce pride in these filters, real limits to the abilities of OCR to master the impressions made by seventeenth- and eighteenth-century metal type remain, evidence of which lies in the Help Section of the Evans Digital portal. There, patrons are instructed not to format their searches with the character s but to insert a wildcard character such as ? in its place, as in Ma??achu?ett? for Massachusetts.
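
The wildcard advice is easy to see in miniature. The sketch below is an assumption-laden stand-in, not the portal's search engine: it uses Python's standard `fnmatch` module, in which `?` matches exactly one character, and the helper names `long_s_pattern` and `wildcard_match` are hypothetical. The sample word "Maffachufetts" illustrates the kind of misreading described above, where an old style s is transcribed as f.

```python
from fnmatch import fnmatchcase


def long_s_pattern(word):
    """Swap every s for the single-character wildcard '?'."""
    return "".join("?" if ch in "sS" else ch for ch in word)


def wildcard_match(pattern, word):
    """True if the word fits the pattern, with '?' standing for any one character."""
    return fnmatchcase(word, pattern)


pattern = long_s_pattern("Massachusetts")
print(pattern)

# The same pattern finds both the modern spelling and a long-s misreading.
for candidate in ("Massachusetts", "Maffachufetts", "Connecticut"):
    print(candidate, wildcard_match(pattern, candidate))
```

A literal search for "Massachusetts" would miss the misread form, which is why the wildcard version casts the wider net. The cost is precision: the pattern also accepts any other letter in those four positions.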
After being machine read and cleaned, the page image again rests in storage, where it awaits a quality control review and the addition of "metadata," which includes bibliographic data and cross-references. In the final stages of production, staff members call up a fiche's worth of pages looking for Shipton's target cards, which were also filmed and positioned on the fiche at the beginning of each document. Early on, using Evans's American Bibliography as a strict guide, the editorial staff for Evans Digital compiled categories for the information; in addition to creating a structure for the archive's digital portal, these classifications have become part of the metadata that organizes the database. Here, a page from the Bay Psalm Book, for example, would be marked with not only its page number but also the fact that it belongs with the other pages of The Whole Book of Psalmes, a Psalter printed in 1640 by Matthew and Stephen Day of Cambridge, Massachusetts, and electronically filed under cross-references like "Music in Churches" and "Psalmody." (Since every title in Evans Digital Edition is indexed to the AAS's catalog, they are all also keyed to subject headings.) At this point, too, other staff members review pages, comparing them to the microfiche versions and manually adjusting the image for maximum readability. Staffers may reject pages marred by cutoffs, bad skews, or light or broken text. In each case, a rejection means staff will pull the microfiche again and try to improve the scan.
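
One way to picture the metadata attached to each page is as a small record. The sketch below is a guess at the shape of such a record, built only from the Bay Psalm Book details given above; the class, its field names, and the file names are all hypothetical and do not describe NewsBank's actual database.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PageRecord:
    """Hypothetical per-page metadata; field names are illustrative only."""
    file_name: str
    page_number: int
    title: str
    year: int
    printers: Tuple[str, ...]
    place: str
    subjects: Tuple[str, ...] = ()


def pages_of(records, title):
    """All pages belonging to one title, in reading order."""
    return sorted((r for r in records if r.title == title),
                  key=lambda r: r.page_number)


def titles_under(records, subject):
    """Titles filed under a given cross-reference, without duplicates."""
    return sorted({r.title for r in records if subject in r.subjects})


def bay_psalm_page(n):
    """Build a record for page n using the details quoted in the article."""
    return PageRecord(
        file_name=f"hypothetical-{n:04d}.tif",  # invented naming scheme
        page_number=n,
        title="The Whole Book of Psalmes",
        year=1640,
        printers=("Matthew Day", "Stephen Day"),
        place="Cambridge, Massachusetts",
        subjects=("Music in Churches", "Psalmody"),
    )


records = [bay_psalm_page(2), bay_psalm_page(1)]
print([r.page_number for r in pages_of(records, "The Whole Book of Psalmes")])
print(titles_under(records, "Psalmody"))
```

Because each page carries its own copy of the title and subject information, pages scanned out of order can still be regrouped and browsed by cross-reference.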
So far, each month, NewsBank's staff has captured about a terabyte (that's over one trillion bytes) of colonial thought, more information than can be stored on ten ordinary desktops. When the project is finished, seven drawers of fiche will need to be digitally stored on the equivalent of over three hundred desktops. But don't go to Chester hoping to see armies of computers assembled to deploy America's founding documents; NewsBank built from scratch several high-end computers for storing the ever-growing Evans, which is also backed up on at least $16,000 worth of digital tape.
While on the material level, the digitization of Evans means the shifting of information from one type of plastic to other types of plastic (and some metal), on the experiential level, the change effected is nothing short of enchanting. The interface for Evans Digital Edition, developed with the assistance of a panel of librarians, looks like commercial Websites, with stylish lettering, a tasteful palette, and an uncluttered layout. Screens offer browsing and searching capacities coupled handily together on one page, guided by a comfortingly familiar row of browsing tabs. If an institution decides to invest in the expanded cataloging software developed by the AAS (offered, like access to Evans Digital, at different pricing levels to match the sizes and missions of various purchasers), any hit on an Evans reference in a library catalog would produce a direct link to the electronic image and text of the document. Once a patron locates an item, she can look at full citations, move through the book page-by-page, choose from two different formats for printing, use the "Table of Contents" feature to skip around or locate pages that matched her search criterion, and scale the images up to 300 percent or down to 25 percent in Alice-in-Wonderland fashion. Future improvements in searching and printing designed to assist teachers and researchers may also be integrated.
In addition, NewsBank recently announced a text-creation partnership with the University of Michigan whereby the text of six thousand imprints, selected by the AAS, will be hand-keyed into another database (available separately, directly from the University of Michigan)—an initiative that will produce as close to 100 percent accurate searchable text as is humanly possible for some of the period's most widely used or historically significant documents.
As NewsBank staff members readily admit, the construction of Evans Digital Edition means the market for Evans's Early American Imprints on microfiche has been demolished. Already fifty-one institutions, large (Columbia, Ohio State, and the University of California system) and small (Williams, Hanover, and Calvin), are on board, and NewsBank has halted fiche duplication. Libraries that own Evans Imprints and that subsequently opt for Evans Digital must face a difficult decision: what to do with all these little plastic artifacts of twentieth-century Americana and the file cabinets that house them? Inevitably, some libraries will simply discard the fiche and print, as machines available to read them fall into disrepair and users voice their preference for the digital edition's seductive searchability. Still, perhaps some librarians will take a cue from NewsBank's Chester surroundings and let the past accumulate organically in all its forms (books, Microprint, microfiche, and terabytes), cheek by jowl, like so many churches in a small New England town.
Further Reading: Those interested in the history of microreproduction may consult Nicholson Baker's Double Fold: Libraries and the Assault on Paper (New York, 2001), but should keep in mind that Evans's Imprints and similar initiatives had a profoundly different set of objectives than did the projects Baker chronicles. In addition, the Association of Research Libraries has collected many responses to Double Fold's assertions. Although a few links are now broken or expired, this remains an excellent source for a broader picture of the debate surrounding microreproduction, digitization, and the multiple missions of libraries in the twenty-first century.
Edward G. Holley wrote a truly fascinating biography of Charles Evans entitled Charles Evans: American Bibliographer (Champaign, Ill., 1963); despite its age, it is still a terrific read and has a lot of interesting information about Evans's project and the creation of Readex's Imprints, Series I. Those interested in Albert Boni's early career should refer to Jay Satterfield, The World's Best Books: Taste, Culture, and the Modern Library (Amherst, 2002). The Whiting Library, in Chester, Vermont, maintains a Vermont Room, which is also an outstanding source for articles about the history of Chester and the Readex Microprint Corporation (many thanks to the librarians for their assistance). Readers interested in the history of the American Antiquarian Society and its librarians should consult Under Its Generous Dome: The Collections and Programs of the American Antiquarian Society, 2nd edition (Worcester, 1992).
Many thanks to NewsBank staff members Ken Dufort, Georgia Frederick, Kelly Lauren, Steve Osterland, Caroline Reyes, Stanley Shapiro, Debbie Swisher, Cindy Tufts, and Mike Walker for taking time out of their busy schedules to show me Evans Digital Edition; special thanks to Vicky Gardner, Korrie Heiden, and Jim Hornstra for making time on many occasions to answer my repeated queries. Thanks, too, to the librarians who helped give me a sense of how this product will be used, and to Crowley Micrographics for information about the Mekel scanner.
Copyright © 2004 Common-place The Interactive Journal of Early American
Life, Inc., all rights reserved