Faster Approximate(d) Text-to-Pattern L1 Distance
The problem of finding distance between pattern of length m and text of length n is a typical way of generalizing pattern matching to incorporate dissimilarity score. For both Hamming and L_1 distances only a super linear upper bound O(n√(m)) are known, which prompts the question of relaxing the problem: either by asking for 1 ±ε approximate distance (every distance is reported up to a multiplicative factor), or k-approximated distance (distances exceeding k are reported as ∞). We focus on L_1 distance, for which we show new algorithms achieving complexities respectively O(ε^-1 n) and O((m+k√(m)) · n/m). This is a significant improvement upon previous algorithms with runtime O(ε^-2 n) of Lipsky and Porat (Algorithmica 2011) and O(n√(k)) of Amir, Lipsky, Porat and Umanski (CPM 2005). We also provide a series of reductions, showing that if our upper bound for approximate L_1 distance is tight, then so is our upper bound for k-approximated L_1 distance, and if the latter is tight then so is k-approximated Hamming distance upper bound due to the result of Gawrychowski and Uznański (arXiv 2017).
READ FULL TEXT