TECHNOLOGY for transforming words on paper into words in computer files keeps getting more ambitious and more accurate without quite achieving perfection.
Omnipage Pro for Windows 95, the latest optical character recognition (OCR) software from the Caere Corp., may be the best example of the genre's capabilities and limitations. With a good scanner and a clear original, the program can turn a page of printed text into an editable computer file with almost uncanny accuracy, right down to boldface, italics and column formatting. Fuzzy faxes and complicated layouts can present challenges, and learning the program's foibles takes more effort than it should. Still, it definitely beats typing.
Once you get it set up to work with your scanner and its software, which is not always an easy job, Omnipage Pro is reasonably simple to use, provided you stick to its most straightforward functions. After you scan a document, the program offers to step you through the recognition process.
There are plenty of issues to consider. Would you prefer the text of a magazine page to be formatted in multiple columns that mimic the original layout, or would you settle for the look of a business letter or manuscript? Do you want the program to try to decide which portions of the page contain text and which are images (at which it does a pretty good job), or would you prefer to mark them yourself? And in what form will you save the results?
Once you have made those decisions, the program churns away and shows you its version of the text you have scanned, with suspect words in green, words it has guessed at in blue and characters it has found totally unrecognizable represented by red tildes. You can scan the screen for errors on your own or have the program present them to you one by one alongside enlargements of the original text images and suggestions for changes.
Unfortunately, the correction feature's somewhat confusing interface increases the likelihood that you will make an error, and because it presents only the words the program has flagged, it guarantees that you will miss mistakes the program thinks it got right. In the full-screen mode, at least, double-clicking on any word brings up a view of the original image.
On my 90-megahertz Pentium machine, Omnipage Pro took about 20 seconds to recognize the front page of a crisply printed newsletter, 779 words with slightly complex formatting, scanned at a resolution of 300 dots per inch (DPI). Only the logo and some initial capitals fooled it, and the display proudly highlighted proper names and typographical errors in the original that it had, in fact, recognized correctly. Scanned at 200 DPI, the resolution of many lower-priced scanners, the same document produced only six more errors.
Scanning a 450-word fax was far less successful, with the program missing many words entirely and guessing wrong about others, as when "Certain loaned items furnished hereunder" became "Cereal lowed items famished hereunder."
And virtually any document has spots where Omnipage reveals that it has the adaptability of a computer program rather than of a human being. It is poor, for example, at deciding what to do with a hyphenated compound word that breaks at the end of a line; a "low-powered computer" may well become "lowpowered" until you fix it by hand.
When you ask Omnipage to retain a page's formatting, it will try hard to comply. But here, too, perfection is elusive, thanks to problems like the unavoidable differences in metrics (character widths and spacing) between the type on the page you scan and the fonts on your computer.
The program can even produce HTML for use in Web pages, but the pages I generated had serious flaws, notably missing spaces wherever a line had broken in the formatted file.
The program sells for about $500, but if you own an earlier version, including the "Limited Editions" bundled with many scanners, an upgrade costs less than $130.
Despite a "Paperport Ready" logo on the box, Omnipage Pro is very poorly integrated with Paperport, the popular software that controls Visioneer scanners and some others, and there is not much a user can do about it.
In combination with Omnipage, the Paperport software adds a needless step to the scanning process and offers less control over the scanned image. Worse, unless you are very careful, the second time you transfer an image from the Visioneer software to Omnipage, you are almost certain to crash your entire system. The problem is noted in one of the manuals; it should have been fixed, not merely documented.
The first 25 times you use Omnipage, it will nag you to register it. The 26th time, it will refuse to work until you call in and get a registration number. Given that many aspects of the user interface can be confusing and that the manuals and online help are even less adequate than most, it is galling to discover that talking to a real person about problems will cost you $25 an hour after the first two calls.
Pub Date: 7/01/96