DNA technique for search tools

PUBLISHED : Tuesday, 22 February, 2005, 12:00am
UPDATED : Tuesday, 22 February, 2005, 12:00am

Harvard researchers working on a 'meaningful' approach to Chinese characters

Two Harvard University genetics researchers hope to challenge mighty Google and the mainland's Baidu for China's internet search market using a software approach originally developed to understand human genes.

The pair - Gary Gao and his professor George Church - have launched Beijing-based start-up YDCTech to develop the software.

The goal is to replace 'keyword' search tools with more sophisticated 'semantic-based' tools, which YDC claims will yield faster and more accurate results.

'We hope to eventually displace Baidu and Google in the mainland market, perhaps in five years,' said Charles Gao, chief executive at YDC (and Gary Gao's elder brother).

YDC believes the problem with keyword searches is that they take words out of context and cannot deal with other semantic issues, such as synonyms. This results in long lists of relevant and irrelevant search results that internet users have to sift through to find what they need.

The YDC approach is different: it attempts to understand the Chinese language in a 'bottom-up' manner. The software treats ideographic characters as the basic units of the Chinese language. Using statistical or combinatorial analysis, such as measuring how frequently certain characters appear in Chinese text, among other patterns, it aims to infer vocabulary and, from there, syntax and semantics.
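To make the 'bottom-up' idea concrete, here is a minimal Python sketch of one common statistical technique: scoring adjacent character pairs by pointwise mutual information, so that pairs which occur together more often than chance stand out as candidate vocabulary. YDC has not published its method, so this is an illustration only; the function name, the toy corpus and the min_count threshold are all assumptions.

```python
from collections import Counter
from math import log2


def bigram_cohesion(corpus, min_count=2):
    """Score adjacent character pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not YDC's actual algorithm.
    Pairs seen fewer than min_count times are dropped, because PMI
    tends to overrate rare pairs.
    """
    chars = Counter()
    pairs = Counter()
    for line in corpus:
        chars.update(line)
        pairs.update(line[i:i + 2] for i in range(len(line) - 1))

    total_chars = sum(chars.values())
    total_pairs = sum(pairs.values())

    scores = {}
    for pair, count in pairs.items():
        if count < min_count:
            continue
        p_pair = count / total_pairs
        p_first = chars[pair[0]] / total_chars
        p_second = chars[pair[1]] / total_chars
        # PMI > 0 means the two characters co-occur more often than chance.
        scores[pair] = log2(p_pair / (p_first * p_second))
    return scores


# Toy corpus (an assumption): five short unsegmented sentences.
corpus = ["我喜欢北京", "北京很大", "我在北京", "他喜欢上海", "上海很大"]
scores = bigram_cohesion(corpus)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

On this toy corpus, the pairs that survive the frequency threshold are recurring units such as 北京 (Beijing) and 喜欢 (to like), while one-off pairs that straddle two words are filtered out. A real system would need a far larger corpus and more than pairwise statistics.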

The same approach was originally used by researchers to understand human genes, where the basic 'characters' of the DNA language are the four nucleotide bases: adenine (A), thymine (T), guanine (G) and cytosine (C).

YDC said its search process was especially powerful for ideographic languages that lacked a 'word boundary', the 'white spaces' that appear in English.
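The 'word boundary' problem can be shown with a short Python sketch. English text can be split on white space, but a Chinese sentence must be segmented some other way. The example below uses forward maximum matching, a classic dictionary-based technique, against a small hypothetical vocabulary. It is not YDC's method; the function name and vocabulary are assumptions made for illustration.

```python
def segment(text, vocabulary, max_len=4):
    """Split unsegmented text by forward maximum matching.

    At each position, take the longest vocabulary entry that matches;
    fall back to a single character when nothing matches.
    Hypothetical helper for illustration only.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# English: white space already marks the word boundaries.
english_words = "I like Beijing".split()

# Chinese: no white space, so boundaries have to be inferred.
vocabulary = {"北京", "喜欢", "上海", "很大"}  # assumed toy vocabulary
chinese_words = segment("我喜欢北京", vocabulary)
print(english_words)
print(chinese_words)
```

The quality of such a segmenter depends entirely on its vocabulary, which is why learning that vocabulary from character statistics, rather than compiling it by hand, would be attractive.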

All this, however, is merely theoretical.

Baidu already has an established business and has received investment from Google. YDC is at least a year away from developing a working product, and is still in the fund-raising stage.

The company has raised just US$70,000 from friends and family.

'A friend who is a Chinese restaurant owner in Tennessee invested US$50,000. That helped us a lot,' Gary Gao said.

Charles Gao said YDC was in final discussions with three venture capital companies and aimed to raise up to US$3 million over the next few months.

Adam Bornstein of Ymer Capital Partners Asia said it was considering backing YDC because its technology was 'much, much better than the search engines we use'.

Nevertheless, he estimated it would take a war chest of as much as US$40 million to compete with Baidu and Google and establish a brand.

Mr Bornstein said YDC should develop a business model that would not pit it directly against Google and Baidu. 'YDCTech will most likely not go head to head ... but rather focus its attention on becoming "best of breed" in specific verticals,' he said.

Instead of competing for advertising dollars, YDC could license its technology to major portals or multinationals. A vertical business model would see it focus on specific segments, such as business and finance-related news, blogs and health-care databases.

Without a working product, YDC does not even register on the radars of rivals. Baidu marketing director Bi Sheng said: 'There are five to six companies on the mainland that have announced plans to develop their own search engine technologies, but so far only Baidu has its own technology.'

He also did not see a threat from semantic-based search technology. 'Our users have never asked for semantic search services before. And it takes several years for a search engine company to grow up,' he said.

YDC nevertheless remained optimistic about semantic-based services, with a beta launch planned within six months. Gary Gao said MIT and Stanford were working on similar projects, but YDCTech had a lead of six to nine months.