How Stable is Knowledge Base Knowledge?
Knowledge Bases (KBs) provide a structured representation of the real world in the form of extensive collections of facts about real-world entities, their properties, and their relationships. They are ubiquitous in large-scale intelligent systems that exploit structured information, for tasks such as structured search, question answering, and reasoning, and hence their data quality is paramount. The inevitability of change in the real world brings us to a central property of KBs: they are highly dynamic, in that the information they contain is constantly subject to change. In other words, KBs are unstable. In this paper, we investigate the notion of KB stability, specifically the problem of KBs changing due to real-world change. Some entity-property pairs no longer undergo change in reality (e.g., Einstein-children or Tesla-founders), while others may well change in the future (e.g., Tesla-board members or Ronaldo-occupation as of 2022). This notion of change grounded in the real world differs from other changes that affect only the data, notably corrections and delayed insertions, which have already received attention in data cleaning, vandalism detection, and completeness estimation. To analyze KB stability, we proceed in three steps. (1) We present heuristics to distinguish changes due to world evolution from delayed completions and corrections, and use these to study the real-world evolution behaviour of diverse Wikidata domains, finding a high skew across properties. (2) We evaluate heuristics to identify entities and properties that are unlikely to change due to real-world change, and filter inherently stable entities and properties. (3) We evaluate the possibility of predicting stability post hoc, specifically predicting change in a property of an entity, finding that this is possible with an F1 score of up to 83 on a balanced binary stability prediction task.