Predicting Word Learning in Children from the Performance of Computer Vision Systems
For human children as well as machine learning systems, a key challenge in learning a word is linking the word to the visual phenomena it describes. We explore this aspect of word learning by using the performance of computer vision systems as a proxy for the difficulty of learning a word from visual cues. We show that the age at which children acquire different categories of words is predicted by the performance of visual classification and captioning systems, over and above the expected effects of word frequency. The performance of the computer vision systems is related to human judgments of the concreteness of words, supporting the idea that we are capturing the relationship between words and visual phenomena.
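One way to read "over and above the expected effects of word frequency" is as a partial correlation: remove the linear effect of frequency from both age of acquisition and vision-system performance, then correlate what remains. The following is a minimal sketch under that assumption, not the analysis used in the paper. All data are synthetic, and the variable names (`log_freq`, `vision_score`, `aoa_months`) and the coefficients used to generate them are hypothetical.

```python
import random


def residuals(y, x):
    """Residuals of y after a least-squares line fit on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def partial_correlation(y, x, control):
    """Correlation between y and x after regressing `control` out of both."""
    return pearson(residuals(y, control), residuals(x, control))


# Synthetic illustration only: these numbers are made up, not taken from the paper.
rng = random.Random(0)
n_words = 200
log_freq = [rng.gauss(0, 1) for _ in range(n_words)]      # standardized log word frequency
vision_score = [rng.gauss(0, 1) for _ in range(n_words)]  # standardized vision-system performance
# Assumed generative story: frequent and visually "easy" words are learned earlier.
aoa_months = [
    24 - 2.0 * f - 3.0 * v + rng.gauss(0, 1)
    for f, v in zip(log_freq, vision_score)
]

r_partial = partial_correlation(aoa_months, vision_score, log_freq)
print(f"partial r(AoA, vision | frequency) = {r_partial:.3f}")
```

A negative partial correlation in this toy setup means that words the vision system handles better tend to be acquired earlier even after frequency is accounted for. A multiple regression with both predictors would be an equivalent way to frame the same question.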