Personal computers are superb at churning out beautifully printed pages. But getting a printed page back into a computer where its text can be revised, indexed and stored is a more difficult task.
The process is called "optical character recognition." It requires a peripheral device connected to the computer, called a "scanner," which essentially takes a picture of the page. Then special software deciphers the letters and numerals on the page, turning them into the coded symbols that computers can manipulate.
Two California firms, Caere Corp. of Los Gatos and Hewlett-Packard Co. of Cupertino, have teamed up to help the process work a little better.
HP's new color scanner, the ScanJet IIc, comes with a built-in technology called AccuPage. Caere's new OmniPage Professional 2.0 software, available for both IBM-compatible and Macintosh computers, takes advantage of it.
It is not a cheap solution. The ScanJet IIc lists for $2,195 for PCs, including the expansion card needed for the connection, and $1,995 for Macintoshes, which don't need the card. The OmniPage Professional 2.0 software retails for $995 in either the Mac or PC version. The PC version requires Windows 3.0 to operate.
(You can buy less-expensive scanners to use with OmniPage Professional 2.0, including HP's $995 monochrome ScanJet Plus, but you forgo the advantage of AccuPage.)
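Adding up the list prices quoted above shows what a complete ScanJet IIc system costs on each platform. The short Python sketch below does only that arithmetic; the variable and function names are illustrative, and street prices may differ from list.

```python
# List prices as quoted in the article (U.S. dollars).
SCANJET_IIC = {"PC": 2195, "Macintosh": 1995}  # PC price includes the expansion card
OMNIPAGE_PRO = 995                             # same price for either platform


def system_cost(platform):
    """Return the combined list price of scanner plus OCR software."""
    return SCANJET_IIC[platform] + OMNIPAGE_PRO


for platform in SCANJET_IIC:
    print(f"{platform}: ${system_cost(platform):,}")
# PC: $3,190
# Macintosh: $2,990
```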
With AccuPage, the new scanner varies the contrast as a page is scanned so that it stays at the optimum level. The second step of the process, optical character recognition, or OCR, depends on proper contrast between the type and the background: the sharper the contrast, the more accurately the software can match the image it sees with the character patterns it knows.
An impressive example of the new system is seen by scanning a magazine page on which portions of text are printed over colored backgrounds. Without AccuPage, the contrast between the text and the backgrounds is too slight for OCR to work. But with AccuPage turned on, the scanner automatically adjusts the contrast for each colored block of text, as well as for the remainder of the text on the white portion of the page. The result is virtually perfect character recognition.
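The article does not say how AccuPage works internally, so the following is only a toy illustration of the general idea of adjusting contrast block by block. It assumes a page stored as rows of gray values from 0 (black) to 255 (white) and splits it into vertical blocks of a fixed width; the function names and the midpoint rule are assumptions made for this sketch.

```python
def binarize_global(image, threshold=128):
    """Mark a pixel as ink (1) when it is darker than one page-wide threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def binarize_per_block(image, block_width):
    """Threshold each vertical block at the midpoint of its own gray range."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x0 in range(0, width, block_width):
        x1 = min(x0 + block_width, width)
        block = [image[y][x] for y in range(height) for x in range(x0, x1)]
        # Midpoint between the darkest and lightest pixel in this block.
        threshold = (min(block) + max(block)) / 2
        for y in range(height):
            for x in range(x0, x1):
                out[y][x] = 1 if image[y][x] < threshold else 0
    return out


# Left half: dark text (20) on white paper (250).
# Right half: dark text (40) on a colored background (100).
page = [
    [250, 20, 250, 100, 40, 100],
    [250, 250, 20, 100, 100, 40],
]

print(binarize_global(page))
# [[0, 1, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]]  (colored block turns solid black)
print(binarize_per_block(page, block_width=3))
# [[0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 0, 1]]  (text survives in both blocks)
```

With a single page-wide threshold, the colored background is itself dark enough to count as ink, so the text printed on it disappears. Choosing a threshold per block keeps the letters distinct from whatever background they sit on.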
The system works well with printed pages and clean photocopies, whether the text is of typeset quality, typewriter quality or comes from a dot-matrix printer. But it does a poor job with pages that have been received by fax because the characters are generally too irregular. Fax images received directly in the computer can be converted with more success, especially if sent in fine-resolution mode.
The OCR portion of the process is fairly fast, and the OmniPage Professional 2.0 software shows you how it is progressing by displaying the portion of the scanned image being recognized. The system is versatile enough to read common type styles from about 1/12 inch high (6-point type) to an inch high (72-point type). It doesn't work with stylized type such as script or other fancy faces. You can teach the software how to recognize symbols or characters that it consistently misses.
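Type sizes convert to inches at 72 points to the inch, the convention used in desktop publishing, so 6-point type works out to one-twelfth of an inch and 72-point type to one inch. The sketch below does that conversion and checks a size against the range quoted above; the function names are made up for illustration, and the range check says nothing about how the software itself decides what it can read.

```python
POINTS_PER_INCH = 72  # desktop-publishing point


def points_to_inches(points):
    """Convert a type size in points to inches."""
    return points / POINTS_PER_INCH


def in_quoted_range(points, smallest=6, largest=72):
    """True if the size falls inside the 6-to-72-point range quoted in the article."""
    return smallest <= points <= largest


for size in (6, 12, 72):
    print(f"{size}-point type is {points_to_inches(size):.3f} inch high")
# 6-point type is 0.083 inch high
# 12-point type is 0.167 inch high
# 72-point type is 1.000 inch high
```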
Of course you also can scan photographs and other graphic images in color or black and white, but the OmniPage Professional 2.0 software treats them all as black and white, with up to 256 shades of gray. The software includes a variety of image-editing features.
HP's own software included with the ScanJet IIc allows you to retain and control the color in scanned images. That software also allows you to optimize color images for printing on black and white laser printers.
Caere also makes a less-expensive scanning system, Typist Plus Graphics, which is a hand-held unit capable of scanning a swath five inches wide. It is priced at $595 for the PC and $695 for the Macintosh.
You can scan a whole page by making several passes sideways across it and the software will automatically merge the images so that no lines are repeated.
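How the Typist software merges passes is not described, but the idea of joining overlapping swaths without repeating lines can be sketched in a few lines of Python. The example assumes, purely for illustration, that each pass has already been reduced to a list of text lines and that overlapping lines match exactly; real scanned images never line up that neatly, so actual stitching must compare pixels and tolerate small differences.

```python
def merge_two(first, second):
    """Join two passes, dropping lines at the start of `second`
    that repeat the lines at the end of `first`."""
    max_overlap = min(len(first), len(second))
    # Try the largest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if first[-size:] == second[:size]:
            return first + second[size:]
    return first + second  # no overlap found: simply append


def merge_swaths(swaths):
    """Merge any number of passes, in the order they were scanned."""
    merged = []
    for swath in swaths:
        merged = merge_two(merged, swath)
    return merged


passes = [
    ["line 1", "line 2", "line 3"],
    ["line 2", "line 3", "line 4", "line 5"],
    ["line 5", "line 6"],
]
print(merge_swaths(passes))
# ['line 1', 'line 2', 'line 3', 'line 4', 'line 5', 'line 6']
```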