Computer Talk

Some programmers say speech-based interaction with computers is essential, not extravagant.

Digital conversations

January 15, 2001|By Curtis Rist | Curtis Rist,NEW YORK TIMES NEWS SERVICE

Just now, Victor Zue's computer sits on his desk at the Massachusetts Institute of Technology Laboratory for Computer Science - but he doesn't expect it to stay there much longer.

Computers are already beginning to shrink drastically. Zue believes that tiny but powerful computers will soon be embedded in the walls of offices and homes, in handheld devices that look like cell phones, and in even the most mundane appliances.

Even an alarm clock, he believes, will soon develop a computer-assisted attitude: Connected to the Internet, it will be able to check your schedule, cross-reference it with traffic reports and decide what time to awaken you.

Zue says that "even more remarkable than the things we'll be doing with all these computers will be the way we interact with them. We won't be typing on keyboards. Instead, we'll be speaking to them."

And they'll be speaking back. A computer that talks has long been an elusive goal, one that has had less to do with science than with Hollywood, where the prototype was HAL in "2001: A Space Odyssey."

But as computers become more commonplace, they remain difficult to communicate with, as those who have struggled with a keyboard or dialed their way into oblivion through a voice-mail tree well know. Those problems would disappear, Zue says, if computers could be programmed to converse with humans.

"Speech is the simplest and fastest form of human communication there is," says Zue, an associate director of the MIT computer lab. "If we could talk to computers, then virtually anyone could use them, without any training at all."

Some people feel there is really no alternative but speech-based interaction. "There's a whole variety of trends that are making it desirable," says David Nahamoo, a manager of voice technology research at IBM. "A talking computer sounds cute, but this is not a novelty or a gimmick. It's essential."

Zue calls Mercury, a travel agent, to check the schedule of flights from Boston to San Francisco. A woman answers.

"What time do planes leave tomorrow?" he asks. "Are there any flights returning to Boston in the afternoon? What are the flight numbers? What time do they arrive?"

To each question, the smooth voice gives a quick, cheerful response. In two minutes, Zue has found out enough to book a flight. Aside from the speediness of the transaction, the surprise is that the Mercury travel agent was not human but a computer Zue himself has programmed to recognize human speech.

"Not a bad conversationalist for a computer, don't you think?" he says, hanging up the telephone.

Such fluency didn't come easily for the computer or for Zue himself, who had to struggle to acquire conversational English skills. Born in China, Zue enrolled as a student at the University of Florida in the late 1960s to be near his older sisters, who had moved there.

"To be accepted, I wanted to learn to speak like an American - but that was very difficult," he says. Words such as "did you," which he could read easily enough in a textbook, suddenly turned into the incomprehensible "didju" when he heard them spoken. Everywhere he turned, he says, he found himself flummoxed by inexplicable rules of pronunciation.

Zue's spark of inspiration came, ironically enough, from Hollywood. In 1968, after making hard-won progress in his English studies, he went to see "2001" and became riveted by HAL, the talking computer.

"I saw it and said, `This is the future,'" he recalls. "If I could learn all the different rules of pronunciation, then a computer could, too."

Determined to find a way to do it, he headed for graduate school at MIT. Somehow, he knew, computers could be taught to "hear" what was being spoken but that it would involve more than a microphone.

"Because of accents and the way words are pronounced, the ear is a very bad decoder of language - both for foreigners and computers," Zue says. "Instead, what I went looking for was a visual representation of speech."

What he ended up with was a spectrogram - an electronic tracing of speech sounds. No one had ever been able to "read" a spectrogram before, but Zue - practicing one hour a day for four years - showed that it could be done.

He then theorized that a computer could be taught to take spectrogram-like frequency readings from a spoken voice - an approach that has turned out to be a reliable way to encode speech.

"It essentially takes human language and translates it into a language that the computer can understand," Zue says.
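The article doesn't describe Zue's actual software, but the idea of turning sound into a spectrogram - frequency readings taken over successive slivers of time - can be sketched with a short-time Fourier transform. Everything below (the function name, frame sizes, and the synthetic 440 Hz "voice") is illustrative, not drawn from the MIT system.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Slice the signal into overlapping frames and measure the
    energy at each frequency in each frame (a short-time FFT)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hanning(frame_len)  # taper frame edges to reduce leakage
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # Magnitude of the FFT gives the energy per frequency bin.
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (num_frames, num_freq_bins)

# A stand-in for speech: a 440 Hz tone sampled at 16 kHz for half a second.
rate = 16000
t = np.arange(rate // 2) / rate
tone = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(tone, rate)
```

Each row of the result is one moment in time and each column one frequency band - the same two axes a printed spectrogram shows, which is what made it possible for Zue to "read" speech visually.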

At the core of speech recognition lies the phoneme, which is the basic phonetic building block. It's short - often barely 100 milliseconds in all - but that's all the time required to change a "b" sound to a "p," and to change the word "bit" into "pit."

To understand speech, a computer translates the spoken word into an electronic representation of these phonemes, then matches them against templates showing real words and clusters of words.
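A toy version of that matching step, assuming the acoustic front end has already produced a phoneme sequence: compare the sequence against stored word templates and pick the closest one. The template dictionary and the use of edit distance here are illustrative choices, not a description of the MIT system.

```python
# Hypothetical word templates: each word stored as its phoneme sequence.
TEMPLATES = {
    "bit": ["b", "ih", "t"],
    "pit": ["p", "ih", "t"],
    "bat": ["b", "ae", "t"],
}

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,        # delete a phoneme
                          d[i][j - 1] + 1,        # insert a phoneme
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitute
    return d[len(a)][len(b)]

def match_word(phonemes):
    """Pick the template whose phoneme sequence is closest to the input."""
    return min(TEMPLATES, key=lambda w: edit_distance(phonemes, TEMPLATES[w]))
```

Note how a single phoneme decides the outcome: `["b", "ih", "t"]` matches "bit" while `["p", "ih", "t"]` matches "pit" - exactly the 100-millisecond "b"-versus-"p" distinction described above.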
