Shotgun Sequencing - An Overview


Keywords

Shotgun sequencing, whole genome sequencing, DNA sequence, chain termination, fragmentation, sequencing, Sanger sequencing.

Introduction

Shotgun sequencing was one of the precursor technologies that enabled whole genome sequencing. It is a laboratory technique for determining the DNA sequence of an organism’s genome. The method involves randomly breaking the genome into small DNA fragments, which are then sequenced individually using the chain-termination method.

Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. The DNA is sonicated to obtain fragments of the desired size. A computer program then looks for regions of overlap among the reads and uses them to reassemble the fragments in their correct order, reconstituting the genome.
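To make the idea of random fragmentation concrete, here is a minimal sketch that samples overlapping reads from a short sequence. The function name shotgun_reads, the toy genome, and the fixed read length are all assumptions made for illustration; real fragments vary in length and real reads contain errors.

```python
import random

def shotgun_reads(genome, read_length, num_reads, seed=0):
    """Sample num_reads substrings of read_length at random start positions.

    Simplification: every read has the same length and no sequencing errors.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    last_start = len(genome) - read_length
    reads = []
    for _ in range(num_reads):
        start = rng.randint(0, last_start)  # randint is inclusive on both ends
        reads.append(genome[start:start + read_length])
    return reads

# Toy 40-base "genome" (hypothetical sequence, for illustration only).
genome = "ATGCGTACGTTAGCCGATAGCTAGGCTTAACGGATCCGTA"
reads = shotgun_reads(genome, read_length=10, num_reads=12)
for read in reads:
    print(read)
```

Because the start positions are random, some reads overlap one another, and those overlaps are what the assembly step relies on.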

The chain-termination method of DNA sequencing ("Sanger sequencing") can only be used for short DNA strands of 100 to 1000 base pairs. Due to this size limit, longer sequences are subdivided into smaller fragments that can be sequenced separately, and these sequences are assembled to give the overall sequence.

Assembly of complex genomes is additionally complicated by the great abundance of repetitive sequences, meaning similar short reads could come from completely different parts of the sequence.
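The ambiguity caused by repeats can be shown with a small sketch. The helper find_occurrences and the toy sequence are hypothetical: a read drawn from a repeated region matches more than one position, so its origin cannot be decided from the read alone.

```python
def find_occurrences(sequence, read):
    """Return every start index at which read occurs in sequence."""
    positions = []
    start = sequence.find(read)
    while start != -1:
        positions.append(start)
        start = sequence.find(read, start + 1)  # allow overlapping matches
    return positions

# Toy sequence containing the same 7-base repeat twice (illustrative only).
repeat = "GATTACA"
sequence = "CCGT" + repeat + "TTGCAAGC" + repeat + "AGGT"

repeat_hits = find_occurrences(sequence, repeat)      # read from the repeat
unique_hits = find_occurrences(sequence, "TTGCAAGC")  # read from unique sequence
print(repeat_hits, unique_hits)
```

The read taken from the repeat is found at two positions, while the read from the unique region is found at one.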

Whole Genome Shotgun Sequencing

Whole genome shotgun sequencing for small (4000- to 7000-base-pair) genomes was first suggested in 1979. The first genome sequenced by shotgun sequencing was that of cauliflower mosaic virus, published in 1981.

Paired End Sequencing

Broader application benefited from pairwise end sequencing, known colloquially as double-barrel shotgun sequencing. Although sequencing both ends of the same fragment and keeping track of the paired data was more cumbersome than sequencing a single end of two distinct fragments, the paired data proved valuable. Knowing that the two sequences were oriented in opposite directions and were about one fragment length apart helped in reconstructing the sequence of the original target fragment.

Approach

To apply the strategy, a high-molecular-weight DNA strand is sheared into random fragments, size-selected, and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain-termination method, yielding two short sequences called end-reads, or read 1 and read 2.
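The two end-reads of a fragment can be sketched as follows. This is an illustration under simplifying assumptions: the names paired_end_reads and reverse_complement are hypothetical, the fragment is a made-up sequence, and read 2 is reported as the reverse complement of the far end because it is read from the opposite strand.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

def paired_end_reads(fragment, read_length):
    """Return (read 1, read 2) taken from the two ends of one fragment."""
    read1 = fragment[:read_length]                        # forward strand, left end
    read2 = reverse_complement(fragment[-read_length:])   # opposite strand, right end
    return read1, read2

fragment = "ATGCGTACGTTAGCCGATAGCTAGG"  # toy 25-base fragment
read1, read2 = paired_end_reads(fragment, read_length=6)
print(read1, read2)

# The pair also carries distance information: the reads point toward each
# other and together span approximately the fragment length.
insert_size = len(fragment)
```

Keeping the two reads linked, together with the approximate fragment length, is what allows an assembler to place them relative to each other.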

Assembly

The original sequence is reconstructed from the reads using sequence assembly software. First, overlapping reads are collected into longer composite sequences known as contigs. Gaps remain between contigs wherever no reads overlap, and different techniques are used to find the sequence in a gap depending on its size. If the gap is small (5-20 kb), the region is amplified by polymerase chain reaction (PCR) and then sequenced. If the gap is large (>20 kb), the large fragment is cloned in a vector such as a bacterial artificial chromosome (BAC) and then sequenced.
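The step of collecting overlapping reads into contigs can be sketched with a simple greedy merge. This is a toy illustration rather than the method used by any particular assembler: it assumes error-free reads from one strand, and the names overlap and greedy_assemble and the min_length parameter are hypothetical.

```python
def overlap(a, b, min_length):
    """Length of the longest suffix of a equal to a prefix of b, or 0 if
    that length is below min_length."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard sequences wholly contained in another sequence.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs  # nothing left to merge; remaining items are contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Four toy reads, given out of order, that tile a 20-base sequence.
reads = ["GTTAGCCG", "ATGCGTAC", "GCCGATAG", "GTACGTTA"]
contigs = greedy_assemble(reads)
print(contigs)
```

If some region were not covered by any read, the function would return more than one contig, which corresponds to the gaps described above. A greedy merge like this can also be misled by repeats, which is one reason real assembly software is far more elaborate.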

Coverage

Coverage is the average number of reads that represent a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as N × L / G. High coverage is desirable in shotgun sequencing because it helps overcome errors in base calling and assembly. The subject of DNA sequencing theory addresses the relationships among such quantities.
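As a worked example of the formula N × L / G, the sketch below computes coverage and, by rearranging the same formula, the number of reads needed for a target coverage. The function names and the genome size, read length, and read count are hypothetical values chosen only for the arithmetic.

```python
import math

def coverage(num_reads, read_length, genome_length):
    """Average coverage: N * L / G."""
    return num_reads * read_length / genome_length

def reads_needed(target_coverage, read_length, genome_length):
    """Smallest N with N * L / G >= target_coverage."""
    return math.ceil(target_coverage * genome_length / read_length)

# Hypothetical numbers: a 5,000,000-base genome and 500-base reads.
G = 5_000_000
L = 500
N = 80_000

achieved = coverage(N, L, G)       # 80,000 * 500 / 5,000,000
required = reads_needed(10, L, G)  # reads for 10-fold coverage
print(achieved, required)
```

With these numbers the coverage is 8-fold, and 100,000 reads would be needed to reach 10-fold coverage. This is an average, so individual positions may be covered more or less often.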

Metagenomic Shotgun Sequencing

With millions of reads from next-generation sequencing of an environmental sample, it is possible to get a broad overview of a complex microbiome containing thousands of species, such as the gut flora. The sensitivity of metagenomic sequencing makes it an attractive choice for clinical use. However, that same sensitivity heightens the problem of contamination of the sample or the sequencing pipeline.

Advantages of Shotgun Sequencing

  • By removing the mapping stages, whole genome shotgun sequencing is a much faster process than clone-by-clone sequencing.

  • Whole genome shotgun sequencing uses a fraction of the DNA that clone-by-clone sequencing needs.

  • Whole genome shotgun sequencing is particularly efficient if there is an existing reference sequence. It is much easier to assemble the genome sequence by aligning the reads to an existing reference genome.

  • Shotgun sequencing is much faster and less expensive than methods requiring a genetic map.
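
The reference-based approach mentioned in the list above can be sketched as placing each read on a known sequence. This is a simplified illustration: the name map_reads is hypothetical, matching is exact (real aligners tolerate mismatches and gaps), and reads that match nowhere or in more than one place are left unplaced.

```python
def map_reads(reference, reads):
    """Return sorted (position, read) pairs for reads that match the
    reference exactly once."""
    placements = []
    for read in reads:
        first = reference.find(read)
        unique = first != -1 and reference.find(read, first + 1) == -1
        if unique:
            placements.append((first, read))
    return sorted(placements)

reference = "ATGCGTACGTTAGCCGATAG"                  # toy reference sequence
reads = ["GTTAGC", "ATGCGT", "CGATAG", "TTTTTT"]    # last read matches nowhere
placements = map_reads(reference, reads)
for position, read in placements:
    print(position, read)
```

Each read is positioned independently, so no read-to-read overlap search is needed. That is the main reason a reference simplifies assembly.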

Disadvantages of Shotgun Sequencing

  • Vast amounts of computing power and sophisticated software are required to assemble shotgun sequences together.

  • Errors in assembly are more likely because a genetic map is not used. These errors are generally easier to resolve than in other methods and are minimized if a reference genome can be used.

  • Whole genome shotgun sequencing is most practical when a reference genome is already available. Without an existing genome to match the reads to, assembly is much more difficult.

  • Whole genome shotgun sequencing can also lead to errors which need to be resolved by other, more labor-intensive types of sequencing, such as clone-by-clone sequencing.

  • Repetitive genomes and sequences can be more difficult to assemble.

Conclusion

Shotgun metagenomic sequencing allows researchers to comprehensively sample all genes in all organisms present in a given complex sample. The method enables microbiologists to evaluate bacterial diversity and detect the abundance of microbes in various environments. Shotgun metagenomics also provides a means to study unculturable microorganisms that are otherwise difficult or impossible to analyze.

With the ability to combine many samples in a single sequencing run and obtain high sequence coverage per sample, NGS-based metagenomic sequencing can detect very low-abundance members of the microbial community that may be missed, or be too expensive to identify, using other methods.

Updated on: 18-May-2023
