Dynamic Database Embeddings with FoRWaRD
We study the problem of computing an embedding of the tuples of a relational database in a manner that is extensible to dynamic changes of the database. Importantly, the embeddings of existing tuples should not change when newly inserted tuples are embedded (as database applications might rely on existing embeddings), while the embeddings of all tuples, old and new, should retain high quality. This task is challenging because state-of-the-art embedding techniques for structured data, such as (adaptations of) embeddings on graphs, have inherent inter-dependencies among the embeddings of different entities. We present the FoRWaRD algorithm (Foreign Key Random Walk Embeddings for Relational Databases), which draws on embedding techniques for general graphs and knowledge graphs and inherently utilizes the schema and its key and foreign-key constraints. We compare FoRWaRD to an alternative approach that we devise by adapting node embeddings for graphs (Node2Vec) to dynamic databases. We show that FoRWaRD is comparable and sometimes superior to state-of-the-art embeddings in the static (traditional) setting, using a collection of downstream column-prediction tasks over geographical and biological domains. More importantly, in the dynamic setting FoRWaRD outperforms the alternatives consistently and often considerably, and shows only a mild reduction in quality even when the database consists mostly of newly inserted tuples.
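To make the stability requirement concrete, the following is a minimal sketch, not the FoRWaRD algorithm itself. It assumes a hypothetical two-relation toy database (`DB`, `FOREIGN_KEYS`), walks along foreign keys in both directions, and embeds a newly inserted tuple with a simple stand-in rule (averaging the frozen embeddings of the tuples its walks reach). The averaging rule, the data, and all function names are illustrative assumptions; the abstract does not specify the actual training objective.

```python
import random

# Hypothetical toy database: relation name -> {primary key: row}.
DB = {
    "country": {"c1": {"name": "Germany"}, "c2": {"name": "France"}},
    "city": {
        "t1": {"name": "Aachen", "country_id": "c1"},
        "t2": {"name": "Berlin", "country_id": "c1"},
        "t3": {"name": "Paris", "country_id": "c2"},
    },
}
# Foreign keys as (referencing relation, column, referenced relation).
FOREIGN_KEYS = [("city", "country_id", "country")]


def neighbors(db, rel, key):
    """Tuples one foreign-key step away from (rel, key), in either direction."""
    out = []
    for src, col, dst in FOREIGN_KEYS:
        if rel == src:  # follow the foreign key forward
            ref = db[src][key].get(col)
            if ref in db[dst]:
                out.append((dst, ref))
        if rel == dst:  # follow the foreign key backward
            out.extend(
                (src, k) for k, row in sorted(db[src].items()) if row.get(col) == key
            )
    return out


def random_walk(db, start, length, rng):
    """Random walk of at most `length` foreign-key steps starting at `start`."""
    walk = [start]
    for _ in range(length):
        nbrs = neighbors(db, *walk[-1])
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk


def embed_new_tuple(db, embeddings, new, walks, length, rng):
    """Stand-in rule (an assumption): average the frozen embeddings of the
    tuples reached by walks from `new`. Existing embeddings are only read."""
    dim = len(next(iter(embeddings.values())))
    total, count = [0.0] * dim, 0
    for _ in range(walks):
        for t in random_walk(db, new, length, rng)[1:]:
            if t in embeddings:
                total = [a + b for a, b in zip(total, embeddings[t])]
                count += 1
    return [x / count for x in total] if count else total
```

In this sketch, inserting a new `city` row and calling `embed_new_tuple` returns a vector for the new tuple only; the `embeddings` dictionary of existing tuples is never written to, which is the property the dynamic setting asks for.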