Linear-time computation of generalized minimal absent words for multiple strings
A string w is called a minimal absent word (MAW) for a string S if w does not occur as a substring in S and all proper substrings of w occur in S. MAWs are well-studied combinatorial string objects that have potential applications in areas including bioinformatics, musicology, and data compression. In this paper, we generalize the notion of MAWs to a set 𝒮 = {S_1, …, S_k} of multiple strings. We first describe our solution to the case of k = 2 strings, and show how to compute the set 𝖬 of MAWs in optimal O(n + |𝖬|) time and with O(n) working space, where n denotes the total length of the strings in 𝒮. We then move on to the general case of k > 2 strings, and show how to compute the set 𝖬 of MAWs in O(n ⌈ k / log n ⌉ + |𝖬|) time and with O(n (k + log n)) bits of working space, in the word RAM model with machine word size ω = log n. The latter algorithm runs in optimal O(n + |𝖬|) time for k = O(log n).
READ FULL TEXT