Headline Generation: Learning from Decomposed Document Titles

04/17/2019
by Oleg Vasilyev, et al.

We propose a novel method for generating titles for unstructured text documents. We reframe the problem as a sequential question-answering task. A deep neural network is trained on document-title pairs that have the property of decomposability, in which the vocabulary of the document title is a subset of the vocabulary of the document body. To train the model we use a corpus of millions of publicly available document-title pairs: news articles and headlines. We present the results of a randomized double-blind trial in which subjects were unaware of which titles were human-written and which were machine-generated. When trained on approximately 1.5 million news articles, the model generates headlines that humans judge to be as good as or better than the original human-written headlines in the majority of cases.
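
The decomposability property described above can be illustrated with a minimal sketch. It checks whether every word of a title also occurs in the document body and keeps only the pairs that pass. The tokenizer (lowercasing and splitting on non-alphanumeric characters), the function names `tokenize` and `is_decomposable`, and the sample pairs are assumptions made for illustration; the abstract does not specify how the authors tokenize text or filter their corpus.

```python
import re


def tokenize(text):
    """Lowercase the text and extract alphanumeric word tokens.

    This simple tokenizer is an assumption; the source does not
    describe the tokenization actually used.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def is_decomposable(title, body):
    """Return True if the title vocabulary is a subset of the body vocabulary."""
    return set(tokenize(title)) <= set(tokenize(body))


# Hypothetical document-title pairs, invented for this example.
pairs = [
    ("Storm closes coastal highway",
     "A severe storm closes the coastal highway on Monday."),
    ("Mayor resigns amid scandal",
     "The mayor stepped down on Friday."),
]

# Keep only the pairs whose title words all appear in the body.
decomposable = [(title, body) for title, body in pairs
                if is_decomposable(title, body)]
```

In this sketch the first pair is kept because all four title words occur in the body, while the second is dropped because "resigns", "amid", and "scandal" do not. A stricter or looser tokenizer (for example, one that stems words) would change which pairs qualify.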
