On Clustering Incomplete Data

11/04/2019
by   Eduard Eiben, et al.
0

We study fundamental clustering problems for incomplete data. In this setting, we are given a set of incomplete d-dimensional Boolean vectors (representing the rows of a matrix), and the goal is to complete the missing vector entries so that the set of complete vectors admits a partitioning into at most k clusters with radius or diameter at most r. We develop a toolkit and use it to give tight characterizations of the parameterized complexity of these problems with respect to the parameters k, r, and the minimum number of rows and columns needed to cover all the missing entries. We show that the aforementioned problems are fixed-parameter tractable when parameterized by the three parameters combined, and that dropping any of these three parameters results in parameterized intractability. We extend this toolkit to settle the parameterized complexity of other clustering problems, answering an open question along the way. We also show how our results can be extended to data over any constant-size domain. A byproduct of our results is that, for the complete data setting, all problems under consideration are fixed-parameter tractable parameterized by k+r.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset