Lifestyle
SCIENCE FOCUS

Now everyone can speak the language of their computers

Typing up your work with voice recognition can actually be quicker than using a keyboard, and it avoids the danger of repetitive strain injury

PUBLISHED : Sunday, 08 September, 2013, 12:00am
UPDATED : Sunday, 08 September, 2013, 2:16am

Voice control of computers, a holy grail for nerds ever since Stanley Kubrick's 2001: A Space Odyssey, is finally available to the masses thanks to the move to mobile computing.

Apps like Siri or Android's built-in voice recognition offer free server-side services in multiple languages which can type for us and understand simple commands.

They even learn to understand our accents with neural networks which mimic the brain. Typing by voice is now actually faster and more accurate than the keyboard much of the time.

Voice recognition also helps with repetitive strain injuries, which are Hong Kong's most common occupational health problem, accounting for 64 per cent of occupational injuries; chronic pain persisting longer than three months affects up to 11 per cent of the workforce.

Continuous speech recognition has been available for Windows PCs since 1997 through Nuance's Dragon NaturallySpeaking software, which now has a monopoly on the market. Innovation and compatibility with modern browsers are waning.

Still, for English word processing it is the fastest and most accurate solution.

You will need a Windows PC with an external sound card, fast CPU and plenty of RAM - laptop sound cards are not up to the job. Controlling the PC by voice, however, is quirky: application support is limited and crashes are frequent. Technical support is slow and works Monday to Friday, US time.

For best results with Dragon, you also need to train it and add vocabulary items. It will not recognise your Hong Kong street name otherwise. And you must repeat this training every time you upgrade to a new PC. The basic version is reasonably priced. Premium versions are also available, which allow creation of custom voice macros, add specialised medical and legal vocabularies, or let you store your voice files on a remote server, which saves time on reinstallations.

If you want to minimise your typing, then Dragon is the best option. Recognition errors can be corrected by voice and you can also select, cut and paste, bold or italicise.

Full voice control of the computer is something of a chimera, however. Hours spent training your PC will bring diminishing returns and eat up your exercise time, which is what you really need to avoid RSI. For those on a budget, there is the option of Windows' built-in voice recognition, though the accuracy of its text recognition is inferior to Dragon's. Its voice control for navigating the interface works better. Fortunately, there are other alternatives - iOS provides free built-in voice recognition which is almost as accurate as Dragon and far better than Windows.

Unlike Dragon and Windows, the processing is done on Apple servers so you don't need powerful hardware. A Mac Mini works fine. You can also quickly switch languages, with Cantonese and Putonghua available. You do not need to train the software and it's more stable than Dragon, but you cannot cut and paste text or correct errors by voice, so you will do more typing than you would with Dragon.

Nuance now offers Dragon Dictate for iOS, but this lacks the combined control-and-dictate mode available on the PC and seems to suffer from some of the same bugs as the PC version. Android and Siri both combine artificial intelligence with voice recognition. Some commands are quite useful. You can say things like, "search web for apple pie recipe" and it will open up the browser and display results.

With Siri you can ask it to convert inches to centimetres or show the nearest hospital on Maps. Recognition is acceptable for simple phrases, but corrections and word processing must be done manually.

Nevertheless, mobility unchains us from our workstations, breaks our work into chunks and lets us move around more - all important elements of a healthy life.

So it is surprising how many users ignore the microphone icon on their on-screen keyboard and still struggle with tiny keys.

Some people are put off by the noise pollution we live with in Hong Kong, which degrades recognition performance. Once you start using voice recognition you will really notice the repetitive announcements on the MTR and KCR.

And don't expect perfect results even in a quiet environment. You will have to do a bit of manual work to finish off your writing. Recent versions of Siri have also introduced an annoying bug which capitalises words after almost every pause.

A mobile voice-recognition feature which should attract bilingual and trilingual Hong Kongers is multiple-language support. Both iOS and Android offer Cantonese and Putonghua, with Android boasting the larger range of additional languages. It is surprisingly accurate, especially for simple phrases, and can even understand my British-accented Cantonese and Putonghua well, so long as I speak slowly.

Finally, if you want to dictate a lot of English text with the best results, Dragon is still the way to go. But it is limited to a good desktop in a quiet location, so think carefully before bringing your PC to the office.

It is getting more portable, though. About HK$6,000 buys a top-end, Micro ATX-based PC with a super-fast i7-4770 processor and an external sound card, which can be squeezed into a toaster-sized case and carried in your backpack.

If you just want to record notes, send short e-mails or chat online, then mobile voice recognition is good enough, as long as you can find a quiet place with a decent mobile phone signal. You will then find that you can "type" a lot faster than your friends.

But be warned - voice recognition often confuses short words such as "you" and "she", and makes mistakes with less common phrases. These can be hilarious to friends, but not so funny in the office. Always check before you click "send"!

Stephen Thompson is a Hong Kong-based journalist and IT consultant
