Estimation of daily streamflow from multiple donor catchments with Graphical Lasso
A novel algorithm based on the concept of conditional independence in graphical models is introduced to improve estimates of daily streamflow time series at sites with incomplete records. The goal is to fill gaps in historical data, extend records at streamflow stations that are no longer in operation, or estimate streamflow at ungauged locations. This is achieved by first selecting relevant stations in the hydrometric network as reference (donor) stations and then using them to infer the missing data. The selection process transforms the fully connected network of streamflow stations into a sparsely connected network represented by the precision matrix of a Gaussian graphical model. The underlying graph encodes conditional independence relations, which allow an optimum set of reference stations to be determined from the fully connected hydrometric network for a study area. Sparsity of the precision matrix is imposed by the Graphical Lasso algorithm with an L1-norm regularization parameter and a thresholding parameter. The two parameters are determined by multi-objective optimization. In addition, an algorithm based on the conditional independence concept is presented that allows gauges to be removed with the least loss of information. Our approaches are illustrated with daily streamflow data from a hydrometric network of 34 gauges over the Ohio River basin between 1 January 1950 and 31 December 1980. Our results show that the use of conditional independence conditions can lead to more accurate streamflow estimates than the widely used approaches based on either distance or pairwise correlation.
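The donor-selection idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several things in it are assumptions made purely for illustration: the data are synthetic Gaussian series for five hypothetical stations arranged in a chain (each station depends only on its upstream neighbour), the L1 penalty `alpha` and the cutoff `threshold` are fixed by hand rather than found by multi-objective optimization, the L1-penalized likelihood is solved with a small ADMM loop so that only NumPy is needed, and the final estimate uses ordinary least squares on the selected donors. All function names are hypothetical.

```python
import numpy as np


def graphical_lasso_admm(S, alpha, rho=1.0, n_iter=500):
    """Sparse precision matrix for a covariance/correlation matrix S.

    Minimizes -logdet(T) + trace(S @ T) + alpha * sum_{i != j} |T_ij|
    with ADMM (an assumed solver choice; any graphical lasso solver
    targets the same objective).
    """
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    for _ in range(n_iter):
        # Theta-update: closed form through an eigendecomposition.
        lam, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (lam + np.sqrt(lam ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z-update: soft-threshold off-diagonal entries, keep the diagonal.
        A = Theta + U
        Z = np.sign(A) * np.maximum(np.abs(A) - alpha / rho, 0.0)
        Z[np.diag_indices(p)] = np.diag(A)
        # Dual update.
        U += Theta - Z
    return Z


def select_donors(flows, target, alpha=0.02, threshold=0.2):
    """Stations whose partial correlation with `target` exceeds `threshold`."""
    S = np.corrcoef(flows, rowvar=False)
    prec = graphical_lasso_admm(S, alpha)
    d = np.sqrt(np.diag(prec))
    partial = -prec / np.outer(d, d)  # partial correlations from precision
    np.fill_diagonal(partial, 0.0)
    return [int(j) for j in np.flatnonzero(np.abs(partial[target]) > threshold)]


def simulate_chain(n_days=5000, n_stations=5, rho=0.8, seed=0):
    """Synthetic 'flows': station k depends only on station k - 1."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_days, n_stations))
    x[:, 0] = rng.standard_normal(n_days)
    for k in range(1, n_stations):
        noise = rng.standard_normal(n_days)
        x[:, k] = rho * x[:, k - 1] + np.sqrt(1.0 - rho ** 2) * noise
    return x


flows = simulate_chain()
train, test = flows[:4000], flows[4000:]
target = 2

# Pairwise correlations with the target (high for every station here).
corr_with_target = np.corrcoef(train, rowvar=False)[target]

# Donors chosen from the sparse conditional-dependence graph.
donors = select_donors(train, target)

# Fill the target's record on the held-out period from its donors.
X_train = np.column_stack([np.ones(len(train)), train[:, donors]])
coef, *_ = np.linalg.lstsq(X_train, train[:, target], rcond=None)
X_test = np.column_stack([np.ones(len(test)), test[:, donors]])
rmse = float(np.sqrt(np.mean((X_test @ coef - test[:, target]) ** 2)))

print("correlation with target:", np.round(corr_with_target, 2))
print("selected donors:", donors)
print("held-out RMSE:", round(rmse, 3))
```

In this toy chain every station is strongly correlated with the target, so a correlation cutoff would retain all of them, whereas the thresholded precision matrix should keep only the two neighbours that carry information not already supplied by other stations. Real streamflow is far from Gaussian, so a transformation of the data would likely be needed before applying a sketch like this to observed records.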