July 12, 1993 | By Steve Auerweck, Staff Writer

Computers are ready to listen; let's talk

Perhaps poet W. B. Yeats was looking ahead to the '90s when he penned the line, "Speech after long silence; it is right . . .," for we're at the dawn of an era in which computers everywhere will be speaking, and listening, to us.

The past few months have witnessed an explosion of products, ranging from low-end game enhancements that will let you cry "Phasers!" to blast a Romulan, to complex packages that hold the promise of a "Jetsons"-like typewriter that hammers away as you talk to it.

Researchers at IBM Corp.'s Gaithersburg center have created a line of software development tools that shatter some of the most daunting barriers in the science of speech recognition. They're making it possible for programs running on a desktop-level machine to accurately interpret continuous speech -- without artificial pauses -- from a wide variety of speakers.

"It's currently the only product out there that runs on the Intel chip set that allows you to talk quickly and requires no training," says Elton B. Sherwin, manager of speech recognition strategy and market development for IBM's Personal Systems division.

Initial applications will likely include information kiosks, Mr. Sherwin says. "You will be able to talk to the kiosk, and talk quite quickly." For example, you might ask, "Where is the children's shoe department?"

The Gaithersburg lab made two breakthroughs earlier in the year. First, it tailored the software to work with input from inexpensive, popular PC sound cards, such as the Sound Blaster from Creative Labs Inc. Second, it figured out how to accurately recognize spoken numbers, which people tend to slur together.

The IBM Continuous Speech Series (ICSS) version for the OS/2 operating system was demonstrated last fall; IBM announced two weeks ago that a version for Windows-only machines will be available by the end of the year.

Developers will get a package with a library of 22,000 words programmed by IBM, although only 1,000 words can be in the vocabulary at a given moment.

Baltimore's T. Rowe Price Inc. is using ICSS to develop software to monitor securities trading desk activity; other companies are working in such areas as health care and network management.

At the high end, IBM has been selling the Speech Server Series for the RISC System/6000 workstation, which can take dictation at 70 words a minute, with a vocabulary of 32,000 words. Although it still requires pauses and "training," it's been popular in the medical and legal fields, Mr. Sherwin says.

The bottom end of the speech recognition market -- inexpensive products with limited abilities -- is turning into a virtual brawl. IBM has VoiceType2 for DOS, a rough dictation system designed for those who have trouble with keyboards, and VoiceType Control for Windows, a sound-based substitute for the Windows mouse.

Earlier this year, Covox Inc. introduced Voice Blaster, a software/microphone package that lets Sound Blaster owners give voice commands to most programs. And Creative Labs is just about to begin shipping its own entrant in the market, VoiceAssist, a Windows package.

Windows machines will get wordy too

Owners of Windows machines are also likely to find them a lot chattier in the near future. DSP Group Inc. announced last week that it's working with Microsoft Corp. and Compaq Computer Corp. on speech compression for Windows using DSP's TrueSpeech technology.

TrueSpeech reduces the size of a voice data file using mathematical formulas derived from the way sounds are shaped by the throat, mouth and tongue. A one-minute voice data file that consumes 940,000 bytes of data when stored using normal means can be reduced to just over 60,000 bytes with TrueSpeech, DSP says.
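To put DSP's figures in perspective, the arithmetic can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the two byte counts come from the paragraph above, "just over 60,000" is rounded down to 60,000 as an assumption, and the function and constant names are invented here for illustration.

```python
# Figures quoted by DSP for one minute of recorded speech.
RAW_BYTES_PER_MINUTE = 940_000
# "Just over 60,000" in the article; rounded down here (an assumption).
COMPRESSED_BYTES_PER_MINUTE = 60_000


def compression_ratio(raw_bytes, compressed_bytes):
    """Return how many times smaller the compressed file is."""
    return raw_bytes / compressed_bytes


def kilobits_per_second(bytes_per_minute):
    """Convert a bytes-per-minute figure to kilobits per second."""
    return bytes_per_minute * 8 / 60 / 1000


ratio = compression_ratio(RAW_BYTES_PER_MINUTE, COMPRESSED_BYTES_PER_MINUTE)
raw_kbps = kilobits_per_second(RAW_BYTES_PER_MINUTE)
compressed_kbps = kilobits_per_second(COMPRESSED_BYTES_PER_MINUTE)

print(f"Compression ratio: about {ratio:.1f} to 1")
print(f"Uncompressed: about {raw_kbps:.0f} kilobits per second")
print(f"Compressed:   about {compressed_kbps:.0f} kilobits per second")
```

By this rough reckoning the compressed file is a little under one-sixteenth the size of the original, and the data rate falls from roughly 125 kilobits per second to about 8.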

That makes it much more practical to use applications that support voice annotation of documents, for example, particularly when the data is being passed around on networks.

How to escape 'voice-mail jail'

Germantown's Microlog Corp. has good news for callers who land in "voice-mail jail" because they don't have Touch-Tone phones -- a speech recognition module that lets callers speak their choices.

Microlog voice-processing systems include voice-mail and more exotic applications, such as handling database queries by phone. The system that was just announced will recognize numerals and the words "yes" and "no." It handles continuous speech, so a system could prompt, say, "What is your Social Security number?" and not get confused when the caller reels off nine digits.
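Once a recognizer has picked the individual words out of the caller's speech, turning them into a usable number is the easy part. The Python sketch below illustrates that last step only. It is not Microlog's software: it assumes the recognizer hands back a list of plain English words, the function names are hypothetical, and treating "oh" as zero is a further assumption.

```python
# Spoken digit words mapped to digit characters.
# Accepting "oh" for zero is an assumption made for this sketch.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def words_to_digits(words):
    """Convert a list of recognized digit words to a string of digits."""
    digits = []
    for word in words:
        key = word.lower()
        if key not in DIGIT_WORDS:
            raise ValueError(f"not a digit word: {word!r}")
        digits.append(DIGIT_WORDS[key])
    return "".join(digits)


def parse_social_security_number(words):
    """Return the nine digits a caller spoke, or raise ValueError."""
    digits = words_to_digits(words)
    if len(digits) != 9:
        raise ValueError(f"expected 9 digits, got {len(digits)}")
    return digits


# Hypothetical recognizer output for a caller reading off nine digits.
spoken = "one two three four five six seven eight nine".split()
print(parse_social_security_number(spoken))
```

A real system would presumably read the number back and ask the caller to confirm with "yes" or "no," the only other words the module is said to recognize.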

Kathy Wilson of Microlog said the new module for the VCS 3500 system will be used by the Immigration and Naturalization Service to give callers the status of their visa applications.

Mechanical bartender on the golf course

Finally, a comforting thought as the mercury climbs to 100:

Imagine a long, sweltering trek across the golf course. But there, up ahead, is a sign: Cold Drinks. Cold BEER. It's a Golfer's Oasis!

Created by USA Entertainment Center Inc., the SMART computerized vending machines accept credit cards and send the billing information back to a central computer. Beer sales are verified by an audio/video link back to the clubhouse.

Baltimore Sun Articles