Digitalising DNA data may keep you from getting sick in the future

By Professor Wong Ka-chun, Assistant Professor of Computer Science at City University of Hong Kong

Sponsored feature

Right after completing my PhD at the University of Toronto in 2014, I joined City University of Hong Kong as an assistant professor. I set up a research lab for bioinformatics, which is the use of computers to analyse biological data.

The bioinformatics lab uses computers to better understand human DNA, laying a foundation for medical research in the near future. I am currently working on several research projects.

The first project aims to discover common patterns in human DNA sequences. These patterns are important because they could correspond to genes, or to other sites where proteins bind to DNA. Most of them are still unknown, but revealing them would give us a better understanding of human genetics, which in turn can advance medical research.
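To make the idea of a "common pattern" concrete, here is a minimal sketch in Python. It is not the lab's method, only a toy illustration: it assumes a pattern is simply a short fixed-length substring (a "k-mer") that appears in several sequences. The function name `shared_kmers`, its parameters and the three DNA strings are all made up for this example.

```python
from collections import Counter

def shared_kmers(sequences, k, min_support):
    """Return k-mers that occur in at least `min_support` of the sequences."""
    support = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Collect k-mers in a set so each sequence counts a k-mer at most once.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return {kmer: n for kmer, n in support.items() if n >= min_support}

# Made-up toy sequences, for illustration only.
toy_sequences = ["ACGTATAAAGGC", "TTTATAAACCGA", "GGCTATAAATTC"]
common = shared_kmers(toy_sequences, k=6, min_support=3)
print(common)  # {'TATAAA': 3}
```

Real binding sites are rarely exact matches like this, so practical tools have to tolerate variation between sequences. The sketch only shows why counting by hand stops being possible once there are millions of sequences.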

In the future, huge amounts of digital DNA sequence data will be released from people across the world, and there is no effective way for scientists to analyse those sequences manually.

In the second project, I have developed a web server called “SNPdryad” that can predict disease-causing mutations in human protein-coding DNA sequences, such as those that lead to lung, colon and breast cancers.

My team has used SNPdryad on all human protein-coding DNA sequences to predict disease-causing mutations at possible locations on all proteins in the human body. The result is 50GB of data that anyone can download for free. The availability of such data could have a big impact on disease discovery and prevention.
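A rough sketch can show why covering every location on every protein produces so much data. It does not reproduce SNPdryad or its predictions. It only lists the candidate changes a predictor would have to score, assuming each mutation swaps one amino acid for one of the other 19 standard ones. The function name `single_substitutions` and the three-letter protein are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids

def single_substitutions(protein):
    """Yield (position, original, replacement) for every single-residue change."""
    for pos, original in enumerate(protein, start=1):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                yield pos, original, replacement

toy_protein = "MKT"  # hypothetical three-residue sequence, for illustration only
candidates = list(single_substitutions(toy_protein))
print(len(candidates))  # 57, i.e. 3 positions x 19 alternatives
```

Under this assumption, a protein with a few hundred amino acids already has thousands of candidate changes, and the human body has many thousands of proteins.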

The amount of such data is likely to grow. The next step is to improve the analytical methods used for knowledge discovery. My team and I will continue to discover different types of DNA sequence patterns in the human genome. In particular, I will explore how modern artificial intelligence techniques could be applied to the challenge of analysing DNA sequences on a massive scale.

Bioinformatics will continue to evolve, allowing increasingly large amounts of DNA sequence data to be generated and measured. In addition, the development of clinical decision support systems will help in the prevention and diagnosis of many types of human diseases.

Edited by Charlotte Ames-Ettridge