Our efforts have progressively moved from sequencing individual genes to mapping complete genomes through genome projects, and these new subfields all form part of bioinformatics. The first free-living organism to have its genome fully sequenced was the bacterium Haemophilus influenzae, in 1995, and the first multicellular organism was the nematode Caenorhabditis elegans, in 1998.
What is the definition of bioinformatics?
Genome projects have enabled us to investigate which genes are present and expressed in a wide range of organisms. Since the effort to map the human genome began in the 1990s, billions of DNA base pairs from the genomes of many species have been collected.
Still, this information is difficult to assemble and analyse manually!
The human genome alone contains some 3 billion base pairs and roughly 20 000 protein-coding genes. The Human Genome Project (HGP) led the effort to map the human genome completely and was one of the largest international collaborations ever undertaken in biology. The project began in 1990 and took 13 years: a first draft was published in 2001, and the HGP was declared complete in 2003! [1]
Computer technology made it possible to store and use the enormous amounts of sequencing data being generated, which led to the development of bioinformatics.
Bioinformatics is an emerging area of bioscience that combines computer science, statistics, biology and sequencing data. Applying computing tools and software, such as algorithms and statistical tests, to raw biological data makes that data faster and easier to organise, store, understand and search for patterns.
Importantly, computer software also makes biological data accessible to everyone over the internet, stimulating collaboration and further research.
Bioinformatics is an interdisciplinary field of bioscience that develops methodologies to collect, process, and analyse large amounts of raw biological data using computer science tools.
The importance of bioinformatics
As we collect more and more biodata, bioinformatics will be essential to scientific discovery in biology. Without bioinformatics and the ability to apply computer science tools to big data, it would be very hard to understand biodata and draw conclusions from it.
The goals of bioinformatics
The main goals of bioinformatics are:
Organise biodata so that it becomes easily accessible and searchable
Develop software to help analyse biodata
Analyse and accurately interpret biodata from a biological perspective
The roles of bioinformatics
Databases are among the main tools created by bioinformatics. Several hundred databases hold different types of biological data, such as complete genomes and gene sequences. Databases allow data to be stored and searched systematically, enabling comparisons and links that would otherwise escape the naked eye. The amount of data in these databases is growing exponentially as we sequence more DNA.
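As a toy illustration of what storing and searching sequence data systematically can mean, the Python sketch below parses a few records in FASTA format (a common plain-text format in which a line starting with ">" names a sequence and the following lines hold its bases) into a dictionary keyed by identifier, then searches every stored sequence for a short motif. The identifiers and sequences are invented for illustration, and real databases such as GenBank rely on far more sophisticated indexing.

```python
from typing import Dict, List

# Hypothetical example records in FASTA format (identifiers and sequences are invented).
FASTA_TEXT = """\
>seq_A example gene A
ATGGCCATTGTAATGGGCCGC
TGAAAGGGTGCCCGATAG
>seq_B example gene B
ATGAGTAGCGGCCATTGTTAA
>seq_C example gene C
ATGCCCGGGTTTAAATAG
"""


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {identifier: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]  # identifier = first word after '>'
            chunks = []
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def find_motif(records: Dict[str, str], motif: str) -> Dict[str, List[int]]:
    """Return the 0-based start positions of every match of motif in each sequence."""
    motif = motif.upper()
    hits: Dict[str, List[int]] = {}
    for seq_id, seq in records.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif]
        if positions:
            hits[seq_id] = positions
    return hits


db = parse_fasta(FASTA_TEXT)
print(find_motif(db, "GCCATTGT"))
```

A dictionary keyed by identifier gives fast lookup of a known sequence, while the motif search has to scan every entry; large real-world databases add indexes precisely to avoid that full scan.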
Evolutionary relationships between organisms are examples of links that bioinformatics tools can make.
When genomes in these databases are compared, their sequence similarity can be assessed. Greater DNA sequence similarity generally indicates a more recent common ancestor. These tools therefore allow us to build evolutionary trees and see how organisms are related to one another: if we know the basic mutation rate of DNA and how similar two sequences or genomes are, we can infer when two genetic sequences from different species diverged from a common ancestor.
The mutation rate describes how much change a DNA sequence undergoes in a given period of time.
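The reasoning above is the idea behind a simple molecular clock. The Python sketch below assumes two already-aligned, equal-length sequences and a constant mutation rate in both lineages. It computes the proportion of sites that differ (the p-distance) and converts it into a rough divergence time with t = p / (2r), where r is the rate per site per year and the factor of 2 reflects that both lineages have been changing since they split. The sequences and the rate value are hypothetical, and real analyses also correct for repeated changes at the same site and for rate variation.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1.upper(), seq2.upper()) if a != b)
    return differences / len(seq1)


def divergence_time(seq1: str, seq2: str, rate_per_site_per_year: float) -> float:
    """Rough time in years since two sequences shared a common ancestor.

    Assumes a constant rate in both lineages, so the observed distance
    built up over 2 * t years of evolution: p = 2 * rate * t.
    """
    return p_distance(seq1, seq2) / (2 * rate_per_site_per_year)


# Hypothetical aligned sequences (20 sites each) and an illustrative rate.
species_1 = "ATGCTAGCTAGGCTAACGTA"
species_2 = "ATGCTAGCTTGGCTAACGTC"
rate = 1e-9  # substitutions per site per year (assumed value for illustration)

p = p_distance(species_1, species_2)
print(f"identity: {1 - p:.0%}, p-distance: {p:.2f}")
print(f"estimated divergence: {divergence_time(species_1, species_2, rate):.2e} years")
```

Here the two sequences differ at 2 of 20 sites, so higher similarity (a smaller p) translates directly into a more recent estimated split.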
In 2014, bioinformatics databases held over 6 × 10¹¹ base pairs of sequence data. This is roughly the equivalent of 200 human genomes, and the figure is probably even larger today!
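The comparison with 200 human genomes comes from dividing the total amount of sequence data (6 × 10¹¹ base pairs) by the approximate size of one human genome (about 3 billion base pairs):

```latex
\frac{6 \times 10^{11}\ \text{bp}}{3 \times 10^{9}\ \text{bp per genome}} = 2 \times 10^{2} = 200\ \text{genomes}
```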
Popular bioinformatics databases include Ensembl, which holds the genomes of eukaryotic organisms such as humans. Ensembl also includes the genomes of other important model organisms, such as the zebrafish, the house mouse and the fruit fly. Other popular databases include GenBank and DDBJ.
Model organisms are organisms that are frequently used in biomedical research!
The BLAST (Basic Local Alignment Search Tool) algorithm is one of the most widely used pieces of bioinformatics software today. BLAST allows researchers to compare a sequence against the millions of primary biological sequences held in a database with minimal effort. These comparisons help find similarities between the unknown sequences researchers are studying and those already in the database.
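BLAST itself is a fast heuristic: it looks for short exact word matches ("seeds") between the query and database sequences and then extends them, rather than fully aligning every pair. The Python sketch below instead shows exact local-alignment scoring with the Smith-Waterman algorithm, the kind of alignment BLAST is designed to approximate quickly, applied to a query and a tiny hypothetical database. The scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration, not BLAST's own parameters.

```python
from typing import Dict, List, Tuple

MATCH, MISMATCH, GAP = 2, -1, -2  # illustrative scoring scheme (assumption)


def local_alignment_score(a: str, b: str) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b."""
    # Only two rows of the dynamic-programming matrix are kept in memory.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            up = prev[j] + GAP         # gap in sequence b
            left = curr[j - 1] + GAP   # gap in sequence a
            # Local alignment: a score never drops below zero, so an
            # alignment can start afresh anywhere in either sequence.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank database entries by their best local alignment score to the query."""
    scores = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical query and database sequences.
database = {
    "seq_1": "TTTTGATTACAGGGG",
    "seq_2": "CCCCCCCCCCCC",
    "seq_3": "GATTCCAGG",
}
print(search("GATTACA", database))
```

The exact method compares every position of the query with every position of each database sequence, which becomes slow across millions of sequences; that cost is why BLAST trades a little sensitivity for speed with its seed-and-extend strategy.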
As DNA sequencing expanded our knowledge of the coding sequences in living organisms' genomes, so did our knowledge of what those sequences code for: proteins. Because we know the genetic code, we can decipher what a gene encodes, that is, the protein that its transcription and translation might produce. Databases were also created to hold the resulting amino acid sequences and structures of proteins; one example is UniProt (Universal Protein Resource), which contains amino acid sequence data alongside the function of each protein.
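Working out what a gene encodes relies on the genetic code, which maps each three-base codon to an amino acid or to a stop signal. The Python sketch below builds the standard codon table and translates a coding-strand DNA sequence, read 5' to 3' from its first base, until it reaches a stop codon. The example sequence is made up, and the sketch deliberately ignores introns, reading-frame detection and non-standard genetic codes.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in the conventional
# T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate coding-strand DNA into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon.
    """
    protein = []
    seq = coding_dna.upper()
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence: ATG GCC ATT GTA ATG GGC CGC TGA
print(translate("ATGGCCATTGTAATGGGCCGCTGA"))
```

Generating the 64-entry table from one compact string avoids typing every codon by hand; databases such as UniProt store the amino acid sequences that result from this kind of translation.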
Bioinformatics is closely related to another emerging field in bioscience known as computational biology, which grew out of bioinformatics. Whereas bioinformatics collects and processes vast amounts of biodata, computational biology uses such data to construct theoretical models of biological systems. These models can, for example, predict the 3D structures of proteins or help identify specific genes linked to diseases in populations.
Computational biology is the study of biology through computational modelling software.
The benefits of bioinformatics to society
The ability to analyse large sets of biodata through bioinformatics has made it easier to understand DNA, what it means and how it influences our lives.
For example, sequencing and analysing the human genome revealed 1.4 million single nucleotide polymorphisms (SNPs).
SNPs are the most common type of genetic variation: single-base differences between individuals that arise from point mutations and are inherited. The number of known SNPs has grown greatly since the HGP, and most of them are harmless. However, some SNPs are associated with an increased risk of diseases such as diabetes or heart disease.
Screening for such variations allows early detection and treatment of potential medical problems.
Fig. 2 - Single Nucleotide Polymorphism
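At its simplest, screening for SNPs means lining up an individual's sequence against a reference sequence and reporting each position where a single base differs. The Python sketch below assumes the two sequences are already aligned and of equal length; the sequences and the "variants of interest" table are purely hypothetical. Real variant calling works from many sequencing reads and must also handle sequencing errors, insertions and deletions.

```python
from typing import Dict, List, Tuple

Snp = Tuple[int, str, str]  # (1-based position, reference base, sample base)


def find_snps(reference: str, sample: str) -> List[Snp]:
    """List every single-base difference between two pre-aligned sequences."""
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if ref_base != alt_base
    ]


# Hypothetical reference and individual sequences.
reference = "ATGCGTACGTTAGC"
individual = "ATGCGTACATTAGC"

# Hypothetical lookup table: which variants have been flagged for follow-up.
VARIANTS_OF_INTEREST: Dict[Snp, str] = {(9, "G", "A"): "hypothetical risk-associated variant"}

for snp in find_snps(reference, individual):
    label = VARIANTS_OF_INTEREST.get(snp, "no known association in this toy table")
    print(snp, "->", label)
```

Most differences found this way would be harmless; checking them against a curated list of known risk variants is what turns a raw comparison into a screening step.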
As our knowledge of the genomes and proteomes of other organisms grows, new insights also emerge into how those organisms could be used to improve human life and the environment.
The proteome refers to all the proteins produced by an organism.
Analysing the genomes of parasites, such as the malaria-causing parasite Plasmodium falciparum, is fuelling research into how to fight malaria and control the parasite, for example through the development of vaccines. This parasite's genome has been fully sequenced, and all of its roughly 5300 genes can be found in databases, helping us understand its proteome and metabolism.
Sequencing and analysing the genomes and proteomes of organisms that withstand extreme temperatures, or other environmental conditions that would be lethal to most life, can reveal how they survive. This knowledge has various biotechnological applications, such as producing biofuels or cleaning up pollutants.
Bioinformatics - Key Takeaways
Bioinformatics is an interdisciplinary field of bioscience that develops methodologies to collect, process, and analyse large amounts of raw biological data using computer science tools.
The main goals of bioinformatics are to organise biodata so that it becomes easily accessible and searchable, to develop software to help analyse biodata, and to analyse and accurately interpret biodata from a biological perspective.
Databases are among the main tools created by bioinformatics. They allow data to be stored and searched systematically, enabling comparisons between biodata.
Popular bioinformatics tools include Ensembl, BLAST, UniProt, GenBank and DDBJ.
1. Francis Collins, A vision for the future of genomics research, Nature, 2003