Try Before You Buy: A practical data purchasing algorithm for real-world data marketplaces

Data trading is becoming increasingly popular, as evidenced by the appearance of scores of Data Marketplaces (DMs) in the last few years. Pricing digital assets is particularly complex because, unlike physical assets, digital ones can be replicated at zero cost and stored and transmitted almost for free. In most DMs, data sellers are invited to indicate a price, together with a description of their datasets. Data buyers, however, can decide whether the requested price is worth paying only after they have used the data with their AI/ML algorithms. Theoretical works have analysed the problem of which datasets to buy, and at what price, in the context of full information models, in which the performance of algorithms over any of the O(2^N) possible subsets of N datasets is known a priori, together with the value functions of buyers. Such information is, however, difficult to compute, let alone make public, in the context of real-world DMs. In this paper, we show that if a DM provides potential buyers with a measure of the performance of their AI/ML algorithm on individual datasets, then buyers can select which datasets to buy with an efficacy that approximates that of a complete information model. We call the resulting algorithm Try Before You Buy (TBYB) and demonstrate over synthetic and real-world datasets how TBYB can lead to near-optimal buying performance with only O(N) instead of O(2^N) information released by a marketplace.
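To make the O(N) versus O(2^N) contrast concrete, the following is a minimal illustrative sketch of a buyer who sees only one performance score per dataset. It is not the TBYB algorithm itself, whose steps the abstract does not spell out. The function names, the budget constraint, and the score-per-price ranking rule are all assumptions introduced here for illustration.

```python
from typing import Dict, List


def select_datasets(
    scores: Dict[str, float],
    prices: Dict[str, float],
    budget: float,
) -> List[str]:
    """Hypothetical heuristic: rank datasets by individual score per unit
    price and buy in that order while the budget allows.

    Only one score per dataset is used, i.e. O(N) information.
    Prices are assumed to be strictly positive.
    """
    ranked = sorted(scores, key=lambda name: scores[name] / prices[name], reverse=True)
    chosen: List[str] = []
    spent = 0.0
    for name in ranked:
        if spent + prices[name] <= budget:
            chosen.append(name)
            spent += prices[name]
    return chosen


def information_required(n: int) -> Dict[str, int]:
    """Number of performance values a marketplace would need to release:
    one per dataset versus one per subset of the N datasets."""
    return {"individual_scores": n, "all_subsets": 2 ** n}


if __name__ == "__main__":
    # Toy numbers, invented for this example only.
    scores = {"A": 0.80, "B": 0.60, "C": 0.75}
    prices = {"A": 10.0, "B": 5.0, "C": 15.0}
    print(select_datasets(scores, prices, budget=20.0))
    print(information_required(20))
```

With 20 datasets, the marketplace in this sketch would release 20 scores rather than 1,048,576 subset evaluations. Because the heuristic ignores how datasets interact when combined, it can diverge from the full-information optimum.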
