Using codes in place of DNA Sample in Databases to reduce Storage

  • Shan e Zahra, Faculty of Computer Science, Lahore Garrison University
  • Sabir Abbas, Faculty of Computer Science, Lahore Garrison University
  • Tayyab Altaf, Faculty of Computer Science, Lahore Garrison University
Keywords: Adenine, Arithmetic Coding, Base pair, Bits per base, Cytosine, Guanine, Run length encoding (RLE), Thymine

Abstract

Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These
biomolecules are present in all human cells. Because DNA is self-replicating, it is a key
constituent of the genetic material found in all living organisms. DNA carries the genetic
information required for the functioning and development of all living things. Storing the DNA
data of a single person requires 10 CD-ROMs. In this paper, a lossless three-phase compression
algorithm is presented for DNA sequences. In the first phase, the dataset is segmented into tetra
groups and the resulting genetic sequences are compressed into unique numbers (e.g., array
indices). In the second phase, binary code is generated on the basis of the array index numbers.
In the last phase, a modified version of Run Length Encoding (RLE) is applied to the dataset.
The proposed technique has been implemented and its performance measured on samples.
It achieved the best average compression ratio after storing different DNA samples.
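
The three phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: that a tetra group is a run of four bases, that the lookup table lists all 256 groups in lexicographic order over A, C, G, T, that each index is written as a fixed 8-bit code, and that leftover bases (when the length is not a multiple of four) are stored uncompressed. The abstract does not say how RLE is modified, so plain run-length encoding is used in its place. The names `TETRAS`, `INDEX`, `encode` and `decode` are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# Hypothetical lookup table: all 256 four-base groups in lexicographic order.
TETRAS = ["".join(p) for p in product(BASES, repeat=4)]
INDEX = {tetra: i for i, tetra in enumerate(TETRAS)}


def encode(seq):
    """Encode an uppercase A/C/G/T string; returns (runs, tail)."""
    cut = len(seq) - len(seq) % 4
    body, tail = seq[:cut], seq[cut:]  # tail: leftover bases, kept as-is (assumption)

    # Phase 1: segment into groups of four bases and map each to its array index.
    indices = [INDEX[body[i:i + 4]] for i in range(0, cut, 4)]

    # Phase 2: turn each index into a fixed-width 8-bit binary code (assumption).
    codes = [format(i, "08b") for i in indices]

    # Phase 3: plain run-length encoding, standing in for the modified RLE.
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [(code, count) for code, count in runs], tail


def decode(runs, tail):
    """Invert encode(): expand each run back into its four-base group."""
    return "".join(TETRAS[int(code, 2)] * count for code, count in runs) + tail


sample = "ACGTACGTACGTTTTTGA"
runs, tail = encode(sample)
restored = decode(runs, tail)
```

Because `decode(*encode(seq))` returns the original string for any uppercase A/C/G/T input, the sketch is lossless in the sense the abstract uses. Sequences containing other symbols (for example `N`) would need an extended table.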

Published
2020-04-21
How to Cite
Shan e Zahra, Sabir Abbas, & Tayyab Altaf. (2020). Using codes in place of DNA Sample in Databases to reduce Storage. Lahore Garrison University Research Journal of Computer Science and Information Technology, 3(3), 10-19. https://doi.org/10.54692/lgurjcsit.2019.030386
Section
Articles