January 17, 2019

Bioinformatics: Finding the needle in the data haystack

When the human genome was first sequenced, it took thirteen years and around US$1 billion to get the first draft ready. Today, sequencing a human genome takes less than a week and costs around US$1000.

With advances in sequencing technology, as well as developments in large-scale, automated ‘high throughput’ molecular biology, an extraordinary amount of data on the minute workings of the genome and cell is now available for scientists to analyse.

The challenge they face is sifting through these vast databases for the proverbial needle: a significant association between the activity of a gene, protein or metabolite, and a disease or health outcome.

This is where the Monash Bioinformatics Platform comes in. Using Massive’s computing infrastructure, staff at the Platform take the terabytes and petabytes of big data generated from biological studies, and work with the researchers to make sense of it.

“You need special data-wrangling skills and software development skills and understanding biology and that’s what our staff is equipped with,” says Dr Sonika Tyagi, bioinformatics manager at the Monash Bioinformatics Platform.

To get the best quality data, the platform’s staff get involved with a study at the earliest stages, when the experiments are being designed. This means they can help researchers set up an experiment that will generate the most useful data.

Once the experiment is complete, the bioinformatics experts take massive files containing the raw data, and process them into tables, spreadsheets, graphs and other forms of data visualisation to help the researchers interpret the data.

The platform has also co-founded a Data Fluency community of practice, in collaboration with the Monash Library at Monash University. Under this initiative, the Bioinformatics Platform is collaborating with Monash e-Research to run digital data upskilling activities on topics such as computer programming, the Unix shell and high-performance computing.

This equips researchers doing data-intensive work with basic coding and data manipulation skills, which not only helps them take charge of their own data but also facilitates communication between bioinformaticians and computer system admins.

The Bioinformatics Platform has been involved in a wide range of life sciences studies, covering cancer, congenital heart conditions, diabetes, infectious diseases, agriculture and the environment. Bioinformatics is an interdisciplinary field with applications across the sciences.

The rise of bioinformatics is also supporting the rapidly growing field of personalised medicine: the design of therapies for a disease measured against a patient’s personal genome. The detailed analysis of genomic, proteomic and metabolomic activity in disease conditions is enabling clinical researchers to home in on new clinical targets that might only be present in a tiny percentage of patients. But for those few patients, bioinformatics could make the difference between life and death.