Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs
One of the most crucial challenges in question answering (QA) is the scarcity of labeled data, since it is costly to obtain question-answer (QA) pairs for a target text domain with human annotation. An alternative approach to tackling the problem is to use QA pairs automatically generated from either the problem context or from large amounts of unstructured text (e.g., Wikipedia). In this work, we propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts, while maximizing the mutual information between the generated QA pairs to ensure their consistency. We validate our Information Maximizing Hierarchical Conditional Variational AutoEncoder (Info-HCVAE) on several benchmark datasets by evaluating the performance of a QA model (BERT-base) trained using only the generated QA pairs (QA-based evaluation) or using both the generated and human-labeled pairs (semi-supervised learning), against state-of-the-art baseline models. The results show that our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of the data for training.
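To make the consistency objective concrete, the sketch below computes an InfoNCE-style lower bound on the mutual information between paired question and answer embeddings, where matched (question, answer) pairs act as positives and all cross pairs as negatives. This is a generic, pure-Python illustration of MI maximization over QA pairs, not the paper's exact estimator or architecture; the toy 2-D embeddings are invented for demonstration.

```python
import math

def infonce_lower_bound(q_embs, a_embs):
    """InfoNCE-style lower bound on mutual information between paired
    question/answer embeddings (toy sketch, not the paper's estimator).
    Matched pairs (q_i, a_i) are positives; cross pairs are negatives."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    n = len(q_embs)
    total = 0.0
    for i in range(n):
        # Similarity of question i to every candidate answer.
        scores = [dot(q_embs[i], a) for a in a_embs]
        # Log-softmax of the positive pair, computed stably.
        m = max(scores)
        log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[i] - log_denom
    # InfoNCE bound: mean log-softmax of positives + log N.
    return total / n + math.log(n)

# Toy embeddings: each question is aligned with its own answer.
qs = [[1.0, 0.0], [0.0, 1.0]]
ans = [[1.0, 0.0], [0.0, 1.0]]
bound = infonce_lower_bound(qs, ans)
```

Maximizing such a bound during training pushes each generated question to be more predictive of its own answer than of the answers of other pairs, which is the intuition behind enforcing QA-pair consistency.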