Probabilistic Databases with an Infinite Open-World Assumption
Probabilistic databases (PDBs) introduce uncertainty into relational databases by specifying probabilities for several possible instances. That is, they are traditionally finite probability spaces over database instances. Such PDBs inherently make a closed-world assumption - non-occurring facts are assumed to be impossible, rather than just unlikely. As convincingly argued by Ceylan et al. (KR 2016), this results in implausibilities and clashes with intuition. An open-world assumption, where facts not explicitly listed may have a small positive probability can yield more reasonable results. The corresponding open-world model of Ceylan et al. however assumes that all entities in the PDB come from a fixed finite universe. In this work, we take one further step and propose a model of "truly" open-world PDBs with an infinite universe. This is natural when we for example consider entities to be integers, real numbers or strings. While the probability space might become infinitely large, all instances of a PDB remain finite. We provide a sound mathematical framework for infinite PDBs in generalization of the existing theory on finite PDBs. Our main results are concerned with tuple-independent PDBs; we present a generic construction showing that such PDBs exist in the infinite and provide a characterization of their existence in general. This model can be used to apply open-world semantics to finite PDBs. The construction can also be extended to so-called block-independent-disjoint probabilistic databases. Algorithmic questions are not the focus of this paper, but we show how query evaluation algorithms can be lifted from finite PDBs to perform approximate evaluation in infinite PDBs.
READ FULL TEXT