Torturous software gets tongue-tied, twisted

PUBLISHED : Tuesday, 25 February, 2003, 12:00am
UPDATED : Tuesday, 25 February, 2003, 12:00am

Like many a veteran computer user, I have been talking to my PC for years. For once, I would like it to respond.

Due to a repetitive strain injury, I installed Dragon NaturallySpeaking 6 speech-recognition software to avoid the trauma of repeatedly striking the keyboard.

The promise is dramatic - to be able to dictate commands to your computer and have it respond to the sound of your voice.

The reality is somewhat less predictable - more like dictating commands to a toddler.

I expected to put in time training the software to understand my Tony Soprano-vacation-in-the-Catskills accent. I was thinking in terms of hours. I should have been thinking in terms of months.

I dictated a sentence straight from the instruction manual ('I am now able to talk to my computer') and waited 10 minutes for the words to appear on screen. I have had pizzas delivered in less time.

I began a furious pruning process to unburden my computer of extraneous files in hopes of improving transcription speed. I rebooted the system to start with a refreshed system memory. I gave the computer a weekend respite to recover from the ordeal.

I relaunched the application and began dictating this column. The experience was like watching the Super Bowl. I spoke a sentence and was forced to endure yawning, commercial-break-length pauses in the action. It took half an hour to dictate one paragraph.

The user's guide suggests closing your eyes for this part of the process - I guess that is to alleviate the frustration of waiting for the software to labour over the transcription.

Accuracy is another issue. The instruction manual warns: 'The words Dragon NaturallySpeaking types are not always the words you said.' Consider the following example, taken from my voice-to-garbled-text experience: 'If a spaced our ranking dish and Pearl Jam.' I have no idea what I said, either.

Editing proves equally cumbersome. The user's manual encourages using a method of editing text so the software can learn from its own mistakes.

I used a simple, spoken command to correct the text, but the process was something like communicating with my hearing-impaired husband (who, like media mogul Ted Turner, regards hearing aids as an optional fashion accessory).

I repeated myself three times and got a response that left me wondering whether the software understood anything I had said.

Suffice it to say, I met with similarly frustrating results using the software to read my e-mail (it took five minutes to open one message) and browse the Web. The software's arthritic pace is partly a function of hardware limitations.

The software's vocabulary is the equivalent of a Merriam-Webster's dictionary with regional pronunciations added, and it is too voluminous to store in quickly accessed memory (cache). It needs to be kept on the computer's hard drive.

Before transcribing a word to the page, the software must first locate it on the computer's hard drive. That takes time.

The larger problem with desktop speech-recognition software is that it does not understand context. It can learn, in time, that when I say 'caw-fee' I mean 'coffee', but it lacks the artificial intelligence to understand the meaning of my words when used in a sentence.

That is not to say speech-recognition technology is a bust.

When powered by the robust computing resources of a major corporation and supported by professional information technology staff, speech recognition can work elegantly.

California-based Nuance has software allowing customers to use automated call centres to get updated billing information from Sprint, check flight times with American Airlines, execute a stock trade with Charles Schwab discount brokerage or check traffic conditions on Highway 101 using the new 511 system.

But experts think we will have to wait until 2010 before desktop software works reliably. For now, I have decided the pain of my typing injury is more tolerable than the pain of working with Dragon NaturallySpeaking.