Reduction of the Random Access Memory Size in Adjoint Algorithmic Differentiation by Overloading
Adjoint algorithmic differentiation by operator and function overloading is based on the interpretation of directed acyclic graphs resulting from evaluations of numerical simulation programs. The size of the computer system memory required to store the graph grows proportional to the number of floating-point operations executed by the underlying program. It quickly exceeds the available memory resources. Naive adjoint algorithmic differentiation often becomes infeasible except for relatively simple numerical simulations. Access to the data associated with the graph can be classified as sequential and random. The latter refers to memory access patterns defined by the adjacency relationship between vertices within the graph. Sequentially accessed data can be decomposed into blocks. The blocks can be streamed across the system memory hierarchy thus extending the amount of available memory, for example, to hard discs. Asynchronous i/o can help to mitigate the increased cost due to accesses to slower memory. Much larger problem instances can thus be solved without resorting to technically challenging user intervention such as checkpointing. Randomly accessed data should not have to be decomposed. Its block-wise streaming is likely to yield a substantial overhead in computational cost due to data accesses across blocks. Consequently, the size of the randomly accessed memory required by an adjoint should be kept minimal in order to eliminate the need for decomposition. We propose a combination of dedicated memory for adjoint L-values with the exploitation of remainder bandwidth as a possible solution. Test results indicate significant savings in random access memory size while preserving overall computational efficiency.
READ FULL TEXT