Sequence-Subset Distance and Coding for Error Control in DNA Data Storage

09/16/2018
by Wentu Song, et al.

The process of DNA data storage can be modelled mathematically as a communication channel, termed the DNA storage channel, whose inputs and outputs are unordered sets of sequences. To design error-correcting codes for this channel, we introduce a new metric, termed the sequence-subset distance, which generalizes the Hamming distance to a distance function between any two unordered sets of vectors and provides a uniform framework for designing error-correcting codes for the DNA storage channel. We further introduce a family of error-correcting codes for DNA storage, termed sequence-subset codes, and show that the error-correcting capability of such codes is completely determined by their minimum distance. We derive upper bounds on the size of sequence-subset codes, including a Singleton-like bound and a Plotkin-like bound. We also propose several constructions, which imply lower bounds on the size of such codes.
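The abstract does not spell out the definition of the sequence-subset distance, so the following Python sketch only illustrates one plausible matching-based form, and it should be read as an assumption rather than as the paper's exact definition. It assumes all sequences have a common length L, pairs each sequence of the smaller set with a distinct sequence of the larger set so that the total Hamming distance is minimal, and charges L for every unmatched sequence. The function names `hamming` and `sequence_subset_distance` are hypothetical, and the brute-force search over pairings is only practical for very small sets.

```python
from itertools import permutations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def sequence_subset_distance(set1, set2):
    """Assumed matching-based distance between two unordered sets of sequences.

    Assumption (not stated in the abstract): the distance is the minimum,
    over all one-to-one pairings of the smaller set into the larger set,
    of the summed Hamming distances, plus L for each unmatched sequence.
    """
    small, large = sorted((set(set1), set(set2)), key=len)

    # All sequences are assumed to share one length L.
    lengths = {len(s) for s in small | large}
    if len(lengths) > 1:
        raise ValueError("all sequences must have the same length")
    seq_len = lengths.pop() if lengths else 0

    small_list = sorted(small)
    # Brute force: try every ordered selection of |small| sequences from large.
    best_matching = min(
        sum(hamming(x, y) for x, y in zip(small_list, chosen))
        for chosen in permutations(sorted(large), len(small_list))
    )
    # Each sequence of the larger set left unmatched costs a full length L.
    return best_matching + seq_len * (len(large) - len(small))


if __name__ == "__main__":
    stored = {"ACGT", "TTTT"}
    read_with_substitution = {"ACGA", "TTTT"}   # one symbol substituted
    read_with_loss = {"ACGT"}                   # one sequence lost
    print(sequence_subset_distance(stored, read_with_substitution))  # 1
    print(sequence_subset_distance(stored, read_with_loss))          # 4
```

Under this assumed form, two single-sequence sets are at exactly their Hamming distance, which is consistent with the abstract's statement that the metric generalizes the Hamming distance. For larger sets, the minimum-cost pairing could be found with a polynomial-time assignment algorithm instead of enumerating permutations.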
