Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

01/09/2020
by   Tuan Thanh Nguyen, et al.

We propose coding techniques that limit the length of homopolymer runs, satisfy the GC-content constraint, and can correct a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given ℓ, ϵ > 0, we propose simple and efficient encoders/decoders that transform binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C and G, that satisfy the following properties: (i) Runlength constraint: the longest homopolymer run in each codeword has length at most ℓ; (ii) GC-content constraint: the GC-content of each codeword, that is, the fraction of its symbols that are G or C, lies within [0.5-ϵ, 0.5+ϵ]; (iii) Error correction: a single deletion, single insertion, or single substitution error in any codeword can be corrected. For practical values of ℓ and ϵ, we show that our encoders achieve much higher rates than existing results in the literature and approach the capacity. Our methods have low encoding/decoding complexity and limited error propagation.
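
To make properties (i) and (ii) concrete, the following minimal Python sketch checks whether a given strand satisfies the runlength and GC-content constraints. It is only a constraint checker, not the encoders/decoders of the paper, and it does not address property (iii). The function names and the parameters `max_run` (standing in for ℓ) and `eps` (standing in for ϵ) are hypothetical names chosen for illustration.

```python
from itertools import groupby


def max_homopolymer_run(strand: str) -> int:
    """Length of the longest run of identical consecutive symbols."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def gc_content(strand: str) -> float:
    """Fraction of symbols in the strand that are G or C."""
    if not strand:
        raise ValueError("strand must be non-empty")
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand: str, max_run: int, eps: float) -> bool:
    """Check properties (i) and (ii) for one strand.

    max_run plays the role of ℓ and eps the role of ϵ; both names are
    assumptions made for this illustration.
    """
    run_ok = max_homopolymer_run(strand) <= max_run
    gc_ok = abs(gc_content(strand) - 0.5) <= eps
    return run_ok and gc_ok


if __name__ == "__main__":
    # Balanced GC-content, no run longer than 1.
    print(satisfies_constraints("ACGTACGT", max_run=3, eps=0.1))
    # Balanced GC-content, but contains a run of four A's.
    print(satisfies_constraints("AAAACGCG", max_run=3, eps=0.1))
```

With ℓ = 3 and ϵ = 0.1, the strand ACGTACGT passes both checks, while AAAACGCG fails the runlength check even though its GC-content is exactly 0.5.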
