Gzip versus bag-of-words for text classification with KNN

07/27/2023
by Juri Opitz, et al.

The effectiveness of compression distance in KNN-based text classification ('gzip') has recently attracted considerable attention. In this note we show that simpler means can also be effective and that compression may not be needed: a 'bag-of-words' matching can achieve similar or better results and is more efficient.
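
To make the comparison concrete, here is a minimal Python sketch of the two kinds of distance plugged into the same KNN classifier. It is an illustration under stated assumptions, not the authors' implementation. The normalized compression distance follows the usual formulation. The bag-of-words distance is taken here to be a Jaccard distance over lowercased whitespace tokens, which may differ from the matching used in the note. The function names `ncd`, `bow_distance` and `knn_predict` are hypothetical.

```python
import gzip
from collections import Counter


def ncd(a: str, b: str) -> float:
    """Normalized compression distance with gzip as the compressor."""
    ca = len(gzip.compress(a.encode("utf-8")))
    cb = len(gzip.compress(b.encode("utf-8")))
    cab = len(gzip.compress((a + " " + b).encode("utf-8")))
    return (cab - min(ca, cb)) / max(ca, cb)


def bow_distance(a: str, b: str) -> float:
    """Jaccard distance between token sets (an assumed bag-of-words
    matching; tokenization is plain lowercased whitespace splitting)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def knn_predict(query, train, distance, k=1):
    """Majority label among the k training texts closest to the query.

    `train` is a list of (text, label) pairs; `distance` is any function
    that maps two strings to a number, such as ncd or bow_distance.
    """
    ranked = sorted(train, key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
train = [
    ("the striker scored a goal", "sports"),
    ("the bank raised interest rates", "finance"),
]
query = "a late goal by the striker"
bow_label = knn_predict(query, train, bow_distance)
gzip_label = knn_predict(query, train, ncd)
```

Because the classifier takes the distance as a parameter, swapping gzip for token matching is a one-argument change. The bag-of-words variant also avoids compressing a concatenated pair of texts for every query-training comparison, which is one plausible source of the efficiency difference the note mentions.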
