How speech recognition software is changing the way we communicate
We’ve started talking to our phones again but in a very different way, as speech-activated functions and speech-to-text services get more accurate and useful
One big mobile phone trend of the past 15 years has been the slow decline of calls in favour first of SMS, then of messaging. Since then it’s been all about text over talk, but the creeping use of emojis as shorthand is quickly giving way to speech recognition as a way to save time on text input. We may not be chatting to each other as much as we used to, but the new trend is set: we’re talking to our phones again.
From virtual assistants such as Siri, Alexa and OK Google to apps including Dragon Anywhere, Swype, SwiftKey and Baidu’s new TalkType, speech-activated functions and speech-to-text services are growing in number.
Sundar Pichai, chief executive of Google, claims that 20 per cent of Google searches on smartphones are now entered by voice. Sending messages, creating appointments, getting directions and updating social media – all can now be done using the spoken word, and with ever-increasing accuracy.
“The voice has emerged as one of the most natural and intuitive ways to interact with devices, applications and intelligent systems, lessening our reliance on the mouse, keyboard and touch screen,” says John West at Nuance Communications, whose speech recognition engine powers its Dragon suite of apps and software.
In September, SwiftKey Keyboard announced a new version powered by neural networks. That means it can spot similar sentence structures, and capture the relationship and similarity between words. It’s only available for US and UK English language models for now, but more languages are promised.
Another virtual keyboard typical of the trend is Baidu’s TalkType app for Android smartphones, which is designed to be operated primarily by voice and can help users input text three times faster than typing. TalkType does still include an alphanumeric keyboard that supports swipe, emojis, and even GIF sharing.
“TalkType is the first full-function Android keyboard that is voice-first, not voice-also,” says Bijit Halder, head of Baidu’s Silicon Valley AI Lab product team. “Unlike conventional keyboard designs, where voice is targeted for occasional use and delegated to a small icon, TalkType is designed for voice as the primary input mode.”
The accuracy of these voice recognition apps is sometimes stunning, but despite the use of AI there are still some teething problems.
The names of people, places and products are rarely recognised, similar-sounding words are confused, and punctuation has to be spoken. That means forming entire sentences in your head before speaking, which takes some practice.
With Dragon Dictation, it’s not unusual for the cursor to head back into the document that’s being dictated to make changes, or delete entire sections. Was it something I said? Almost certainly; there is a range of spoken commands for completing various editing actions as well as merely having words spelled out onscreen. The learning curve is steep, but there’s no doubt that Dragon Dictation – and all other speech recognition software – is getting close to 95 per cent accuracy. A few years ago it was barely at 70 per cent.
So while BlackBerry may not be doing away with its keyboard just yet, it probably won’t be long before it does. What then for productivity pros who, for now, prefer finger typing to swiping or speaking? If that’s you, there are some choices.
As well as BlackBerry with its Priv and Passport handsets, there’s an optional keyboard cover for Samsung’s Galaxy S7, which is surprisingly easy to use. You can also find a keyboard on LG’s Xpression 2, but don’t kid yourself: phones with keyboards, once common, are becoming increasingly rare.
Though the Microsoft Surface Pro 4 tablet has its own keyboard case and the iPad has several options, there is also a host of highly portable wireless Bluetooth keyboards that can link to a phone or tablet. For example, the excellent Cervantes Mobile Jorno keyboard has a trifold design, as does the Zagg Pocket Keyboard, while the LG Rolly Keyboard KBB-700 – which also acts as a stand for a phone – can be folded into something pocket-sized.
For bigger devices, Windows 10’s personal assistant feature has speech recognition, and Apple Macs now have Siri, but these are mostly for dictating short messages and giving commands.
If you want to fully embrace the speech-to-text age to produce lengthy text documents purely by speaking, it’s all about Dragon Professional Individual software. After setting up a voice profile (effectively teaching the software about your voice simply by reading to it for a few minutes), it’s possible to speak text into any application, from Microsoft Word to Gmail, and even web browsers.
The latest version uses so-called “deep learning”, a powerful pattern recognition technique inspired by the way the human brain learns and interprets sensory input. It also uses “deep neural nets” – another phrase common in artificial intelligence – to continuously learn from the user’s speech patterns.
Speaking to a computer while alone in an office is easy to get used to. So is conversing with Siri or Google on a phone in your own home, or in the car.
However, talking to technology in public and in shared spaces is a much bigger social jump. Who wants to dictate a message to a phone while on a train, walking down the street, or in a busy restaurant, even if it does save a few seconds? Long-time users of hands-free kits who brazenly chat while they walk may disagree, but many of us remain suspicious of such habits.
However, there is by now plenty of mobile technology to suit all kinds of applications and attitudes. According to analysts at Gartner, the average adult in Hong Kong will use three to four personal devices by 2018.
So rather than our devices all becoming similarly sized and otherwise identical touch screen tablets optimised for speech, there’s room for some diversity. Either way, the sight of commuters with their heads buried in a smartphone could soon get a spoken word soundtrack; chat is coming back, but not as we knew it.