Why is it useful to compare and align biosequences?

Alignment rests on the fact that all living organisms are related by evolution. It follows that the nucleotide (DNA, RNA) and protein sequences of species that are closer to each other in evolution should exhibit greater similarity.

An alignment is the process of lining up sequences to achieve a maximal level of identity, which also expresses the degree of similarity among the sequences. Two sequences are homologous if they share a common ancestor.
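
As a minimal sketch of what "level of identity" can mean in practice, the function below computes the percentage of columns in which two already-aligned sequences carry the same symbol. The name percent_identity and the use of the ASCII character "-" for gaps are assumptions made for this illustration, and other definitions of identity (for example, ignoring gap columns in the denominator) are also in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns where both sequences show the same symbol.

    Both inputs are assumed to be already aligned, so they must have equal length.
    Columns containing a gap never count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# GATTACA and GACTATA agree in 5 of their 7 columns.
print(percent_identity("GATTACA", "GACTATA"))
```

A high identity value makes homology more plausible, but it does not prove it on its own.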

The degree of similarity obtained by sequence alignment can be useful in determining the possibility of homology between two sequences. Such an alignment also helps determine the relative positions of different species in an evolutionary tree, which is known as a phylogenetic tree.

The problem of aligning biological sequences can be stated as follows: given two or more input biological sequences, identify similar sequences with long conserved subsequences. If the number of sequences to be aligned is exactly two, the problem is known as pairwise sequence alignment; otherwise, it is multiple sequence alignment.

The sequences to be compared and aligned can be nucleotides (DNA/RNA) or amino acids (proteins). For nucleotides, two symbols align if they are identical. For amino acids, however, two symbols align if they are identical, or if one can be derived from the other by substitutions that are likely to occur in nature.
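
The two matching rules can be sketched as a pair of small predicates. The groups of similar amino acids below are a hand-picked, illustrative assumption (rough chemical families), not a real substitution matrix, and the function names are hypothetical.

```python
# Illustrative groups of chemically similar amino acids (one-letter codes).
# This grouping is an assumption for the sketch, not a published matrix.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ"]


def nucleotides_align(x: str, y: str) -> bool:
    """Nucleotides align only when the two symbols are identical."""
    return x == y


def amino_acids_align(x: str, y: str) -> bool:
    """Amino acids align when identical or when both belong to one similar group."""
    if x == y:
        return True
    return any(x in group and y in group for group in SIMILAR_GROUPS)


print(nucleotides_align("A", "G"))   # different nucleotides
print(amino_acids_align("L", "I"))   # leucine and isoleucine share a group
```

In practice the yes/no grouping is usually replaced by graded scores taken from a substitution matrix.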

There are two types of alignment: local and global. Local alignment means that only portions of the sequences are aligned, whereas global alignment requires alignment over the entire length of the sequences.
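
The difference can be made concrete with a small dynamic-programming sketch that returns the best score under either mode. The global branch follows the Needleman-Wunsch formulation and the local branch the Smith-Waterman formulation; the source does not name these methods, and the scoring values (match +1, mismatch -1, gap -1) and the function name best_score are assumptions chosen for illustration.

```python
def best_score(a: str, b: str, local: bool = False,
               match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best alignment score of a and b under a simple linear gap penalty.

    local=False: both sequences must be aligned end to end (global).
    local=True:  only the best-scoring pair of substrings is aligned (local).
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # Global mode: leading gaps are paid for in the first row and column.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, table[i - 1][j] + gap, table[i][j - 1] + gap)
            if local:
                # Local mode: a score never drops below zero, so an alignment
                # may start anywhere, and the best cell anywhere is the answer.
                cell = max(cell, 0)
                best = max(best, cell)
            table[i][j] = cell
    return best if local else table[-1][-1]


short, flanked = "GATTACA", "CCCGATTACACCC"
print(best_score(short, flanked, local=True))   # 7: only the shared region is scored
print(best_score(short, flanked, local=False))  # 1: six flanking symbols cost a gap each
```

With these assumed scores, the local mode rewards the shared region alone, while the global mode is pulled down by the unmatched flanks.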

For nucleotides or amino acids, insertions, deletions, and substitutions occur in nature with different probabilities. Substitution matrices represent the probabilities of substitutions of nucleotides or amino acids, along with the probabilities of insertions and deletions.
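
A toy nucleotide substitution matrix illustrates the idea. It relies on the well-known observation that transitions (purine to purine, or pyrimidine to pyrimidine) occur more often than transversions, so they are penalized less. The numeric scores here are hypothetical values chosen for the sketch, not estimated probabilities.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(x: str, y: str, match: int = 1,
                       transition: int = -1, transversion: int = -2) -> int:
    """Score one pair of DNA symbols; the default values are illustrative only."""
    if x == y:
        return match
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


# Explicit 4 x 4 matrix stored as a dictionary keyed by symbol pairs.
MATRIX = {(x, y): substitution_score(x, y) for x in "ACGT" for y in "ACGT"}

for x in "ACGT":
    print(x, [MATRIX[(x, y)] for y in "ACGT"])
```

Matrices used in practice for proteins, such as the PAM and BLOSUM families, are derived from observed substitution frequencies rather than set by hand.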

The gap character, “−”, is frequently used to indicate positions where it is preferable not to align two symbols. To evaluate the quality of alignments, a scoring scheme is typically defined, which usually counts identical or similar symbols as positive scores and gaps as negative ones.
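
A minimal sketch of such a scoring scheme is shown below. It sums one score per column of an existing alignment; the specific values (match +1, mismatch -1, gap -2), the function name, and the use of the ASCII "-" as the gap character are assumptions for illustration.

```python
def alignment_score(aligned_a: str, aligned_b: str, match: int = 1,
                    mismatch: int = -1, gap_penalty: int = -2,
                    gap: str = "-") -> int:
    """Sum the per-column scores of two already-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            score += gap_penalty      # a gap in either row is penalized
        elif x == y:
            score += match            # identical symbols are rewarded
        else:
            score += mismatch         # differing symbols are penalized
    return score


# Six matching columns and one gap column: 6 * 1 + (-2) = 4
print(alignment_score("GATTACA", "GA-TACA"))
```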

The algebraic sum of the scores is taken as the alignment score. The objective of alignment is to obtain the maximal score among all possible alignments. However, finding an optimal alignment is computationally very expensive, particularly when many sequences are aligned. Hence, several heuristic techniques have been developed to find suboptimal alignments.
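
For just two sequences, a maximal-score global alignment can be found exactly with dynamic programming (the Needleman-Wunsch algorithm), at a cost proportional to the product of the two lengths. The sketch below assumes simple scores (match +1, mismatch -1, gap -1) and ASCII "-" for gaps; it is not a method given in the source. Extending the same table to many sequences grows exponentially with the number of sequences, which is where heuristics become necessary.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell holds the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])   # symbol of a aligned against a gap
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")        # symbol of b aligned against a gap
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(row_a)
print(row_b)
```

When several alignments share the maximal score, this traceback returns only one of them.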

A genome is the complete set of genes of an organism. When proteins are needed, the corresponding genes are transcribed into RNA, which is a chain of nucleotides. DNA directs the synthesis of a variety of RNA molecules, each with a specific role in cellular function.
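
At the level of symbols, transcription can be sketched as rewriting a DNA string into an RNA string. The function below assumes its input is the coding (sense) strand, so the RNA has the same sequence with uracil (U) in place of thymine (T); the function name is hypothetical, and the real cellular process involves far more than this substitution.

```python
def transcribe(dna_coding_strand: str) -> str:
    """Return the RNA sequence matching a DNA coding strand (T becomes U)."""
    dna = dna_coding_strand.upper()
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected DNA symbols: {sorted(invalid)}")
    return dna.replace("T", "U")


print(transcribe("ATGGCT"))  # AUGGCU
```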