The content correlation of multiple streaming edges
We study how to detect clusters in a graph defined by a stream of edges, without storing the entire graph. We extend the approach to dynamic graphs, defined by the most recent edges of the stream, and to several streams. The content correlation ρ(t) of two streams is the Jaccard similarity of their clusters in the windows before time t. We propose a simple and efficient method to approximate this correlation online, and we show that it guarantees a good approximation for dynamic random graphs that follow a power-law degree distribution. As an application, we follow Twitter streams and compute their content correlations online. We then propose a search by correlation, where answers to sets of keywords are based entirely on the small correlations of the streams. Answers are ordered by their correlations, and explanations can be traced through the stored clusters.
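To make the definition of ρ(t) concrete, the following Python sketch computes the Jaccard similarity of the clusters of two streams over their most recent edges. It is an exact baseline under stated assumptions, not the approximation method of the paper: it stores each window in full, it takes a cluster to be a connected component with at least `min_size` nodes, and it compares the sets of nodes covered by the clusters. The names `WindowedStream`, `clusters`, `content_correlation` and the parameters `window` and `min_size` are hypothetical and chosen only for illustration.

```python
from collections import deque


def clusters(edges, min_size=3):
    """Return connected components with at least min_size nodes.

    Assumption: a "cluster" is modelled here as a large connected component.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the endpoints of every edge.
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    # Group nodes by the root of their component.
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return [g for g in groups.values() if len(g) >= min_size]


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class WindowedStream:
    """Keeps only the `window` most recent edges of a stream."""

    def __init__(self, window):
        self.edges = deque(maxlen=window)

    def push(self, u, v):
        self.edges.append((u, v))

    def cluster_nodes(self, min_size=3):
        """Union of the nodes of all clusters in the current window."""
        nodes = set()
        for component in clusters(self.edges, min_size):
            nodes |= component
        return nodes


def content_correlation(s1, s2, min_size=3):
    """Jaccard similarity of the cluster nodes of two windowed streams."""
    return jaccard(s1.cluster_nodes(min_size), s2.cluster_nodes(min_size))


# Usage: two small streams that share the nodes "b" and "c".
s1 = WindowedStream(window=100)
for u, v in [("a", "b"), ("b", "c"), ("x", "y")]:
    s1.push(u, v)

s2 = WindowedStream(window=100)
for u, v in [("b", "c"), ("c", "d"), ("d", "e")]:
    s2.push(u, v)

print(content_correlation(s1, s2))  # 0.4
```

In the usage example, the edge ("x", "y") forms a component of only two nodes and is ignored, so the clusters are {a, b, c} and {b, c, d, e}, which share 2 of 5 nodes. An online method in the spirit of the abstract would replace the stored window with a bounded summary of the stream; which summary to use is left open here.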