Voice synthesising

PUBLISHED : Friday, 18 April, 2008, 12:00am
UPDATED : Friday, 18 April, 2008, 12:00am

Group Sense is determined to set new standards and make its electronic dictionaries sound good

At the push of a button on an electronic dictionary, a native English speaker reads out a word or sentence, and it sounds exactly the same every time.

Thanks to the advanced development and applications of sound synthesising technologies, the pronunciation function in the electronic dictionaries sounds almost like a private English tutor. 'Sound synthesising technologies have matured,' said Samson Tam Wai-ho, chairman of Group Sense (International).

Various voice synthesising technologies are available, each built on a different international codec standard. They include the Voxware, Real Speak, Solo Male and Lernout & Hauspie Female Voice systems, which encode and decode the digital data of recordings of real human voices.

'For pronunciation in electronic dictionaries, MP3 is regarded as the best among users,' Dr Tam said.

Voice synthesising systems are chosen for each product according to users' requirements. The larger the memory, the better the sound quality and the more costly the product, according to Dr Tam. 'We decide on the use of technology based on the price level of the end products.'

Group Sense uses MP3 for its dictionaries targeting the higher-end market segment. For more competitively priced products destined for developing countries, a synthesising codec with a higher compression ratio was preferred because it required less memory, he said.

The trade-off is that, at a higher compression ratio, the sound is less well defined, and high-pitched and bass tones sometimes become inaudible.
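
The link between compression and memory can be made concrete with a little arithmetic. The sketch below is illustrative only: the word count, clip length and the two bitrates are assumptions chosen for the example, not figures from Group Sense, and the function names are hypothetical.

```python
def clip_bytes(bitrate_kbps: float, seconds: float) -> int:
    """Approximate size in bytes of one compressed clip.

    bitrate_kbps is the codec's output rate in kilobits per second.
    """
    return round(bitrate_kbps * 1000 / 8 * seconds)


def library_megabytes(bitrate_kbps: float, seconds_per_word: float, word_count: int) -> float:
    """Approximate memory, in megabytes, for a whole pronunciation library."""
    return clip_bytes(bitrate_kbps, seconds_per_word) * word_count / 1_000_000


# All values below are assumptions for illustration only.
WORD_COUNT = 100_000       # hypothetical number of recorded headwords
SECONDS_PER_WORD = 1.0     # hypothetical average clip length

for label, kbps in [("lower compression", 64), ("higher compression", 8)]:
    size = library_megabytes(kbps, SECONDS_PER_WORD, WORD_COUNT)
    print(f"{label}: {kbps} kbit/s needs about {size:.0f} MB")
```

Under these assumed numbers, cutting the bitrate to an eighth cuts the memory needed to an eighth as well, which is why a cheaper product with less memory would lean towards the more heavily compressed codec and accept the loss in sound definition.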

Apart from clarity and sound quality, specialised sound synthesising engineers at Group Sense focus on perfecting the accuracy of stress placement in English words with multiple syllables.

Set against the phonetic system of the English language, the synthesising system adopted for the company's electronic dictionary is built on American English pronunciation.

'Once we have accomplished the synthesising codec system for each new product, we will have native speakers help verify the accuracy of the pronunciation,' Dr Tam said.

'It sounds good on a computer with speakers but there will be some variations when a word is played in the dictionaries because of some noise and poor resonation. We will work on the tuning to enhance the accuracy of the stresses so that the synthesised sound of words is correct and standard.'

Sound synthesising is handled by specialised engineers and native speakers in the database division, who carry out meticulous fine-tuning and verification. 'The building of the database requires a lot of work,' Dr Tam said.

Group Sense's sound synthesising team employs several linguistics and acoustics specialists. The linguistics specialists are graduates of City University of Hong Kong.

'There are only a few sound synthesising technology developers specialising in Japanese, Chinese and English in the world,' he said. 'We plan to foster collaboration with them for our future projects. So far our linguistics-specialised manpower is sufficient as we have built up and maintained several large databases that can sustain our product development. We have established an efficient workflow already.'

The fast development of e-learning means that demand for sound synthesising specialists is growing significantly. 'E-learning will become more popular with more websites set to apply synthesising technology to their content building,' Dr Tam said.