Genome projects are scientific undertakings that attempt to identify an organism’s whole genome sequence and the location and function of the genes present in the genome.
Bioinformatics is the science of gathering and analysing large amounts of complex biological data, such as DNA sequences. It has played a key role in genome projects by allowing the data collected by scientists worldwide to be read, stored and organised far faster than before.
Human Genome Project
A genome project involves collecting and sequencing DNA samples from several donors of the same species. The sequences obtained are combined to create a reference genome. Genome sequencing has greatly helped the scientific community understand the functions of different genes and how they interact in different organisms. Whole-genome projects usually use the whole genome shotgun (WGS) approach. In this approach, the DNA is broken into many small, overlapping fragments that are sequenced separately. Computer algorithms then identify where the fragment sequences overlap and assemble them back into the full sequence of each chromosome.
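To make the overlap idea concrete, here is a toy greedy assembler in Python. It is a minimal sketch, not the method used in any real genome project: it assumes error-free fragments read from a single strand and no repeated sequences, and the function names and the `min_overlap` threshold are hypothetical. At each step it merges the two fragments with the longest overlap between the end of one and the start of the other.

```python
def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Only overlaps of at least `min_overlap` bases count; returns 0 otherwise."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the resulting contigs (more than one if some fragments never overlap)."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments contained inside another fragment: they add no new sequence.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap_length(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical overlapping fragments from the sequence ATGGCGTACGCTTAG
reads = ["ATGGCGTA", "GCGTACGC", "TACGCTTAG"]
print(greedy_assemble(reads))  # ['ATGGCGTACGCTTAG']
```

Greedy merging is easy to follow but can join fragments wrongly when a genome contains repeats, which is one reason real assemblers use more sophisticated graph-based methods.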
The Human Genome Project (HGP) was an international scientific research project that aimed to determine the sequence of the entire human genome and to identify the location and function of all the genes it contains. It remains the largest collaborative biological project in the world. It began on October 1, 1990, and was declared complete on April 14, 2003.
The double-helix structure of DNA was discovered in 1953, the HGP was completed in 2003, and CRISPR/Cas9, an efficient method for editing the DNA in cells, was first demonstrated as a gene-editing tool in 2012. In less than 60 years, we went from first learning the structure of DNA to sequencing and mapping all the genes in the human genome and knowing how to edit genes inside cells! What do you think scientists will be able to do in the future?
Genome sequencing projects
DNA sequencing methods are constantly evolving and becoming more straightforward, but many build on the principles of Sanger sequencing, the chain-termination method developed by Frederick Sanger in 1977 and later automated.
Frederick Sanger received his second Nobel Prize in Chemistry for inventing the Sanger DNA sequencing method!
The Sanger DNA sequencing process can be broken down into three steps:
- Polymerase Chain Reaction (PCR): Automated DNA sequencing methods require large quantities of DNA. This is achieved by first amplifying the DNA samples using the polymerase chain reaction (PCR).
- Fluorescently labelled dideoxyribonucleotide triphosphates (ddNTPs): a primer, normal deoxyribonucleotide triphosphates (dNTPs), fluorescently labelled ddNTPs and DNA polymerase are added to the amplified DNA sample. Starting from the primer, DNA polymerase uses the dNTPs to synthesise new DNA strands complementary to the template strands in the sample. A ddNTP differs from a normal dNTP because it has a hydrogen atom instead of a hydroxyl (-OH) group on carbon number 3 of its sugar. Without this 3′-OH group, no further nucleotide can be attached, so once a ddNTP is incorporated it terminates chain elongation. Each of the four ddNTPs (A, G, T and C) carries a different fluorescent label, giving it a distinct colour. Because ddNTPs are incorporated at random positions, the result is a mixture of new DNA fragments of different lengths that all start at the primer but each end with a fluorescently labelled ddNTP.
- Gel electrophoresis: the fragments from the previous step are moved through a gel with small pores by an electric field, which separates them according to their length (shorter fragments move faster and travel further). Because of the random incorporation in the previous step, there will be fragments 1 nucleotide long, 2 nucleotides long, 3 nucleotides long and so on, each ending in a fluorescently labelled ddNTP. Reading the colours of the fluorescent tags in order of fragment length, from shortest to longest, therefore gives the base sequence of the newly synthesised strand, and hence of its complementary template.
Fig. 1 - The process of automated DNA sequencing using fluorescently labelled dideoxyribonucleotide triphosphate
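The chain-termination and read-out logic above can be sketched as a small Python simulation. It is a simplified model under stated assumptions: strand orientation (5′/3′) is ignored, the `ddntp_chance` value and the colour assigned to each base are hypothetical, and enough copies are simulated that a fragment ending at every position is almost certainly produced. It shows why sorting fragments by length and reading the terminal colours recovers the sequence.

```python
import random

# Base pairing used by DNA polymerase when copying the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical colours for the four fluorescently labelled ddNTPs.
DYE = {"A": "green", "T": "red", "G": "yellow", "C": "blue"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def chain_termination(template: str, copies: int = 2000,
                      ddntp_chance: float = 0.1, seed: int = 0) -> set[str]:
    """Simulate many polymerase reactions copying `template` from the primer onwards.

    At each position the polymerase adds the complementary base. With probability
    `ddntp_chance` that base is a ddNTP (no 3'-OH group), so the strand stops there.
    Strand orientation is ignored for simplicity.
    Returns the set of distinct terminated fragments."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(copies):
        strand = ""
        for base in template:
            strand += COMPLEMENT[base]
            if rng.random() < ddntp_chance:  # a labelled ddNTP was incorporated
                fragments.add(strand)
                break
    return fragments


def read_gel(fragments: set[str]) -> tuple[list[str], str]:
    """Order fragments by length, as electrophoresis does, and read each terminal colour."""
    ordered = sorted(fragments, key=len)     # shortest fragment first
    colours = [DYE[f[-1]] for f in ordered]  # colour of the ddNTP ending each fragment
    new_strand = "".join(COLOUR_TO_BASE[c] for c in colours)
    return colours, new_strand


template = "TACGGATCCAGT"  # hypothetical template sequence downstream of the primer
colours, new_strand = read_gel(chain_termination(template))
print(colours)
print(new_strand)  # complementary to the template
```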
Determining the proteome
Cells use the base sequences of their genes to produce proteins.
A proteome is the full set of proteins expressed by an organism or a cell at a given time and under specific conditions.
The field that studies proteins and the proteomes of different organisms is called proteomics. Proteins can be detected and sequenced using a range of techniques. However, protein composition changes depending on the conditions a cell or organism is in, so the proteome is much more variable than the genome of a particular species.
The genome and proteome of simple organisms
It is relatively straightforward to determine the genome and proteome of simple organisms such as prokaryotes because:
- Prokaryotic DNA is much smaller than eukaryotic DNA.
- Prokaryotic DNA is generally not associated with histone proteins.
- Prokaryotic genomes contain very little non-coding DNA and no introns. Eukaryotic DNA, on the other hand, contains a large number of non-coding sequences, which makes determining the proteome challenging.
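Because prokaryotic genes have no introns, a coding region can, in principle, be translated directly from the DNA sequence using the universal genetic code. The Python sketch below builds the standard codon table and translates from the first ATG start codon to the next in-frame stop codon. The example sequence and the `translate_first_orf` function are hypothetical, and real gene prediction must also search both DNA strands, all reading frames and alternative start codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for codon positions 1, 2 and 3.
# '*' marks the stop codons TAA, TAG and TGA.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG up to (not including) the next in-frame stop codon.

    Returns one-letter amino acid codes, or '' if the sequence has no ATG."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):  # step through whole codons only
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_first_orf("GGATGGCTTGGAAATAAGC"))  # MAWK
```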
Benefits of knowing the proteome of simple organisms
The proteome of prokaryotes has many medical and non-medical applications.
Medical Applications
Identifying antigenic proteins on the surface of pathogenic bacteria can help in developing vaccines against the diseases these microbes cause. Once the sequences of these antigens are known, the antigens can be mass-produced and given to people as a vaccine. The immune system responds to the antigen by producing antibodies and memory cells against it. If the person later encounters a microbe carrying the same antigen, the memory cells can mount a faster, stronger secondary immune response that protects the host against infection.
Non-medical Applications
The proteome of simple organisms provides information about the biochemical processes that take place within them. Some of these microorganisms are used to produce biofuels, and organisms that can survive harsh, toxic environments can be used to remove toxins from the environment.
The genome and proteome of complex organisms
The genomes of complex organisms such as humans and plants are challenging to sequence because eukaryotic DNA is much larger and contains far more genes than prokaryotic DNA. This challenge has been overcome thanks to advances in DNA sequencing technology, which led to the successful completion of the HGP in 2003 and of various plant genome projects.
The major difficulty in studying complex organisms is determining the proteome, because eukaryotic DNA contains large amounts of non-coding DNA. In humans, for example, an estimated 98.5% of the genome is non-coding and does not code for proteins.
Another issue is deciding whose genome to sequence, because every individual, except for identical twins, has a unique genome.
Whole-genome sequencing provides critical information for identifying congenital disorders caused by mutations, finding mutated oncogenes and tumour suppressor genes that lead to cancer, tracking disease outbreaks and much more.
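One way sequencing helps identify disease-causing mutations is by comparing an individual's sequence with a reference genome. The sketch below is a deliberately simplified illustration: it assumes the two sequences are already aligned and equally long and reports only single-base substitutions. The sequences and the `find_substitutions` function are hypothetical; real variant-calling pipelines also handle insertions, deletions, sequencing errors and read coverage.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatched base.

    Positions are 1-based. Assumes both sequences are pre-aligned and equally long."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


reference = "ATGGCTTGGAAATAA"  # hypothetical reference sequence
sample = "ATGGCTTGGAGATAA"     # hypothetical individual's sequence
print(find_substitutions(reference, sample))  # [(11, 'A', 'G')]
```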
Genome Projects - Key takeaways
- Genome projects aim to determine the entire base sequence of an organism’s total DNA content.
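- Automated DNA sequencing can be broken down into three steps:
  - PCR: automated amplification of the DNA sample.
  - Fluorescently labelled dideoxyribonucleotide triphosphates are added to the amplified DNA, along with a primer, normal nucleotides and DNA polymerase; they terminate the new strands at random positions.
  - Gel electrophoresis: the fragments are separated by length, and the order of their fluorescent colours gives the base sequence.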
- The proteome can be determined by decoding the base sequences of the active genes within a cell into amino acid sequences using the universal genetic code.
- Applications of genome and proteome projects:
  - Medical, such as developing vaccines
  - Non-medical:
    - Creating biofuels
    - Removing pollutants from the environment
How do we ensure our content is accurate and trustworthy?
At StudySmarter, we have created a learning platform that serves millions of students. Meet the people who work hard to deliver fact-based content and make sure it is verified.
Content Creation Process:
Lily Hulatt is a Digital Content Specialist with over three years of experience in content strategy and curriculum design. She gained her PhD in English Literature from Durham University in 2022, taught in Durham University’s English Studies Department, and has contributed to a number of publications. Lily specialises in English Literature, English Language, History, and Philosophy.
Content Quality Monitored by:
Gabriel Freitas is an AI Engineer with solid experience in software development, machine learning algorithms and generative AI, including large language model (LLM) applications. A graduate in Electrical Engineering from the University of São Paulo, he is currently pursuing an MSc in Computer Engineering at the University of Campinas, specialising in machine learning topics. Gabriel has a strong background in software engineering and has worked on projects involving computer vision, embedded AI and LLM applications.