An individual-level ground truth dataset for home location detection
Home detection, assigning a phone device to its home antenna, is a ubiquitous part of most studies in the literature on mobile phone data. Despite its widespread use, home detection relies on a few assumptions that are difficult to check without ground truth, i.e., where the individual that owns the device resides. In this paper, we provide an unprecedented evaluation of the accuracy of home detection algorithms on a group of sixty-five participants for whom we know their exact home address and the antennas that might serve them. Besides, we analyze not only Call Detail Records (CDRs) but also two other mobile phone streams: eXtended Detail Records (XDRs, the “data” channel) and Control Plane Records (CPRs, the network stream). These data streams vary not only in their temporal granularity but also they differ in the data generation mechanism', e.g., CDRs are purely human-triggered while CPR is purely machine-triggered events. Finally, we quantify the amount of data that is needed for each stream to carry out successful home detection for each stream. We find that the choice of stream and the algorithm heavily influences home detection, with an hour-of-day algorithm for the XDRs performing the best, and with CPRs performing best for the amount of data needed to perform home detection. Our work is useful for researchers and practitioners in order to minimize data requests and to maximize the accuracy of home antenna location.
READ FULL TEXT