Bayesian cross-validation of geostatistical models
Validating or criticising models for georeferenced data is challenging, since the conclusions can vary significantly depending on the locations of the validation set. This work proposes the use of cross-validation techniques to assess the goodness of fit of spatial models in different regions of the spatial domain, accounting for uncertainty in the choice of the validation sets. An obvious problem with the basic cross-validation scheme is that it selects only a few out-of-sample locations to validate the model, possibly making the conclusions sensitive to which partition of the data into training and validation cases is used. A possible solution would be to consider all possible configurations of the data divided into training and validation observations. From a Bayesian point of view, this could be computationally demanding, as parameter estimation usually requires Markov chain Monte Carlo methods. To deal with this problem, we propose estimating discrepancy functions over all configurations of the data partition in a computationally efficient manner based on sampling importance resampling. In particular, we account for uncertainty in the validation locations by assigning a prior distribution to them. Furthermore, we propose a stratified cross-validation scheme that takes spatial heterogeneity into account, reducing the total variance of the estimated predictive discrepancy measures used for model assessment. We demonstrate the advantages of our proposal with simulated examples of homogeneous and inhomogeneous spatial processes, and investigate its effects under preferential sampling designs. The methods are also applied to a rainfall dataset.
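The idea of reusing one set of posterior draws across many training/validation partitions through sampling importance resampling can be sketched on a toy problem. The following is a minimal illustration, not the paper's spatial model or its discrepancy functions: it assumes independent Gaussian observations with known standard deviation `sigma` and a flat prior on the mean, and the names `sir_cv_discrepancy`, `theta_draws`, and `val_idx` are hypothetical. Draws from the full-data posterior are reweighted by the inverse likelihood of the held-out points, which approximates the posterior given the training points only, so no refitting is needed for each partition.

```python
import numpy as np

def sir_cv_discrepancy(y, theta_draws, sigma, val_idx, rng):
    """Squared-error discrepancy on a validation set via SIR (toy sketch)."""
    y_val = y[val_idx]
    # Log-likelihood of the held-out points under each posterior draw
    # (additive constants dropped, since the weights are normalised below).
    loglik = -0.5 * np.sum(
        ((y_val[None, :] - theta_draws[:, None]) / sigma) ** 2, axis=1
    )
    # Importance weights proportional to 1 / p(y_val | theta): this turns
    # full-data posterior draws into approximate training-data posterior draws.
    logw = -loglik
    logw -= logw.max()  # stabilise before exponentiating
    w = np.exp(logw)
    w /= w.sum()
    # Resampling step of SIR.
    resampled = rng.choice(theta_draws, size=theta_draws.size, replace=True, p=w)
    pred_mean = resampled.mean()
    return float(np.mean((y_val - pred_mean) ** 2))

rng = np.random.default_rng(0)
n, sigma = 50, 1.0
y = rng.normal(2.0, sigma, size=n)

# Assumption: flat prior on the mean, so the full-data posterior is
# N(mean(y), sigma^2 / n); in practice these draws would come from MCMC.
theta_draws = rng.normal(y.mean(), sigma / np.sqrt(n), size=5000)

# Average the discrepancy over many random validation sets of size 5,
# reusing the same posterior draws each time.
discrepancies = [
    sir_cv_discrepancy(y, theta_draws, sigma,
                       rng.choice(n, size=5, replace=False), rng)
    for _ in range(200)
]
mean_discrepancy = float(np.mean(discrepancies))
```

In a spatial setting the validation indices would be drawn from a distribution over locations (possibly stratified by region) rather than uniformly at random, and the likelihood of the held-out points would need to respect spatial dependence; both are simplified away in this sketch.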