Browse/search for people
# Publication
- Dr Raphael Clifford

Clifford, R, Fontaine, A, Porat, E, Sach, B & Starikovskaia, T, 2016, ‘The *k*-mismatch problem revisited’. in: *Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms*. Society for Industrial and Applied Mathematics, pp. 2039-2052

We revisit the complexity of one of the most basic problems in pattern matching. In the *k*-mismatch problem we must compute the Hamming distance between a pattern of length *m* and every *m*-length substring of a text of length *n*, as long as that Hamming distance is at most *k*. Where the Hamming distance is greater than *k* at some alignment of the pattern and text, we simply output “No”.

We study this problem in both the standard offline setting and also as a streaming problem. In the streaming *k*-mismatch problem

the text arrives one symbol at a time and we must give an output before

processing any future symbols. Our main results are as follows:

Our first result is a deterministic *O*(*nk*^{2} log *k*/*m* + *n* polylog *m*) time *offline* algorithm for *k*-mismatch on a text of length *n*. This is a factor of *k* improvement over the fastest previous result of this form from SODA 2000 [9, 10].

We then give a randomised and online algorithm which runs in the same time complexity but requires only *O*(*k*^{2} polylog *m*) space in total.

Next we give a randomised (1 + ∊)-approximation algorithm for the streaming *k*-mismatch problem which uses *O*(*k*^{2} polylog *m*/∊^{2}) space and runs in *O*(polylog *m*/∊^{2}) worst-case time per arriving symbol.

Finally we combine our new results to derive a randomised *O*(*k*^{2} polylog *m*) space algorithm for the streaming *k*-mismatch problem which runs in *O*(√*k* log *k* + polylog *m*) worst-case time per arriving symbol. This improves the best previous space complexity for streaming *k*-mismatch from FOCS 2009 [26] by a factor of *k*.

We also improve the time complexity of this previous result by an even

greater factor to match the fastest known offline algorithm (up to

logarithmic factors).