Small space and streaming pattern matching with k edits

06/10/2021
by   Tomasz Kociumaka, et al.
0

In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer k, a pattern P of length m, and a text T of length n ≥ m, the task is to find substrings of T that are within edit distance k from P. Our main result is a streaming algorithm that solves the problem in Õ(k^5) space and Õ(k^8) amortised time per character of the text, providing answers correct with high probability. (Hereafter, Õ(·) hides a poly(log n) factor.) This answers a decade-old question: since the discovery of a poly(klog n)-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Up to this work, no poly(klog n)-space algorithm was known even in the simpler semi-streaming model, where T comes as a stream but P is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. In order to develop the fully streaming algorithm, we introduce a new edit distance sketch parametrised by integers n≥ k. For any string of length at most n, the sketch is of size Õ(k^2) and it can be computed with an Õ(k^2)-space streaming algorithm. Given the sketches of two strings, in Õ(k^3) time we can compute their edit distance or certify that it is larger than k. This result improves upon Õ(k^8)-size sketches of Belazzougui and Zhu [FOCS 2016] and very recent Õ(k^3)-size sketches of Jin, Nelson, and Wu [STACS 2021].

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/01/2023

Streaming k-edit approximate pattern matching via string decomposition

In this paper we give an algorithm for streaming k-edit approximate patt...
research
07/09/2019

L_p Pattern Matching in a Stream

We consider the problem of computing distance between a pattern of lengt...
research
10/08/2018

Approximate Online Pattern Matching in Sub-linear Time

We consider the approximate pattern matching problem under edit distance...
research
07/13/2023

A Noise Resilient Transformation for Streaming Algorithms

In a streaming algorithm, Bob receives an input x ∈{0,1}^n via a stream ...
research
04/03/2020

Relative Error Streaming Quantiles

Approximating ranks, quantiles, and distributions over streaming data is...
research
03/20/2023

On the Maximal Independent Sets of k-mers with the Edit Distance

In computational biology, k-mers and edit distance are fundamental conce...
research
04/06/2022

Faster Pattern Matching under Edit Distance

We consider the approximate pattern matching problem under the edit dist...

Please sign up or login with your details

Forgot password? Click here to reset