The 28th International Conference on Computational Linguistics (COLING 2020) will be held as all virtual event instead of taking place in Barcelona, Spain. So here is my personal watch list for COLING 2020 including abstracts and links to the papers (in chronological order):
LONG1: Language Modelling 1, Xingyao Zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen, Daxin Jiang
The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA). Different from plain text passages in Web documents, Web tables and lists have inherent structures, which carry semantic correlations among various elements in tables and lists. Many existing studies treat tables and lists as flat documents with pieces of text and do not make good use of semantic information hidden in structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial engine verify the effectiveness of our approach. Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents
POSTER3: Applications: Biomedical, health records and medical texts. Posters, Tuan Lai, Trung Bui, Doo Soon Kim, Quan Hung Tran
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most other articles lack such kind of annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task, achieving new state-of-the-art results on two public benchmarks: Inspec and SemEval-2017.
LONG8: Applications1, Thiemo Wambsganss, Christina Niklaus, Matthias Söllner, Siegfried Handschuh, Jan Marco Leimeister
In this paper, we present a novel annotation approach to capture claims and premises of arguments and their relations in student-written persuasive peer reviews on business models in German language. We propose an annotation scheme based on annotation guidelines that allows to model claims and premises as well as support and attack relations for capturing the structure of argumentative discourse in student-written peer reviews. We conduct an annotation study with three annotators on 50 persuasive essays to evaluate our annotation scheme. The obtained inter-rater agreement of α = 0.57 for argument components and α = 0.49 for argumentative relations indicates that the proposed annotation scheme successfully guides annotators to moderate agreement. Finally, we present our freely available corpus of 1,000 persuasive student-written peer reviews on business models and our annotation guidelines to encourage future research on the design and development of argumentative writing support systems for students.
LONG11: Semantics 1, Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš
Unsupervised pretraining models have been shown to facilitate a wide range of downstream NLP applications. These models, however, retain some of the limitations of traditional static word embeddings. In particular, they encode only the distributional knowledge available in raw text corpora, incorporated through language modeling objectives. In this work, we complement such distributional knowledge with external lexical knowledge, that is, we integrate the discrete knowledge on word-level semantic similarity into pretraining. To this end, we generalize the standard BERT model to a multi-task learning setting where we couple BERT’s masked language modeling and next sentence prediction objectives with an auxiliary task of binary word relation classification. Our experiments suggest that our “Lexically Informed” BERT (LIBERT), specialized for the word-level semantic similarity, yields better performance than the lexically blind “vanilla” BERT on several language understanding tasks. Concretely, LIBERT outperforms BERT in 9 out of 10 tasks of the GLUE benchmark and is on a par with BERT in the remaining one. Moreover, we show consistent gains on 3 benchmarks for lexical simplification, a task where knowledge about word-level semantic similarity is paramount, as well as large gains on lexical reasoning probes.
POSTER9: Text Generation. Posters, Eyal Orbach, Yoav Goldberg
Recent advancements in self-attention neural network architectures have raised the bar for open-ended text generation. Yet, while current methods are capable of producing a coherent text which is several hundred words long, attaining control over the content that is being generated—as well as evaluating it—are still open questions. We propose a controlled generation task which is based on expanding a sequence of facts, expressed in natural language, into a longer narrative. We introduce human-based evaluation metrics for this task, as well as a method for deriving a large training dataset. We evaluate three methods on this task, based on fine-tuning pre-trained models. We show that while auto-regressive, unidirectional Language Models such as GPT2 produce better fluency, they struggle to adhere to the requested facts. We propose a plan-and-cloze model (using fine-tuned XLNet) which produces competitive fluency while adhering to the requested content.
LONG22: Machine Learning 2, Yang Zhang, Qiang Ma
Based on an exponentially increasing number of academic articles, discovering and citing comprehensive and appropriate resources has become a non-trivial task. Conventional citation recommender methods suffer from severe information loss. For example, they do not consider the section of the paper that the user is writing and for which they need to find a citation, the relatedness between the words in the local context (the text span that describes a citation), or the importance on each word from the local context. These shortcomings make such methods insufficient for recommending adequate citations to academic manuscripts. In this study, we propose a novel embedding-based neural network called “dual attention model for citation recommendation (DACR)” to recommend citations during manuscript preparation. Our method adapts embedding of three semantic information: words in the local context, structural contexts, and the section on which a user is working. A neural network model is designed to maximize the similarity between the embedding of the three input (local context words, section and structural contexts) and the target citation appearing in the context. The core of the neural network model is composed of self-attention and additive attention, where the former aims to capture the relatedness between the contextual words and structural context, and the latter aims to learn the importance of them. The experiments on real-world datasets demonstrate the effectiveness of the proposed approach.
, Xunjie Zhu, Gerard de Melo
While important properties of word vector representations have been studied extensively, far less is known about the properties of sentence vector representations. Word vectors are often evaluated by assessing to what degree they exhibit regularities with regard to relationships of the sort considered in word analogies. In this paper, we investigate to what extent commonly used sentence vector representation spaces as well reflect certain kinds of regularities. We propose a number of schemes to induce evaluation data, based on lexical analogy data as well as semantic relationships between sentences. Our experiments consider a wide range of sentence embedding methods, including ones based on BERT-style contextual embeddings. We find that different models differ substantially in their ability to reflect such regularities.
POSTER18: Dialogue and understanding. Posters, Matthijs Westera, Jacopo Amidei, Laia Mayol
We take a close look at a recent dataset of TED-talks annotated with the questions they implicitly evoke, TED-Q (Westera et al., 2020). We test to what extent the relation between a discourse and the questions it evokes is merely one of similarity or association, as opposed to deeper semantic/pragmatic interpretation. We do so by turning the TED-Q dataset into a binary classification task, constructing an analogous task from explicit questions we extract from the BookCorpus (Zhu et al., 2015), and fitting a BERT-based classifier alongside models based on different notions of similarity. The BERT-based classifier, achieving close to human performance, outperforms all similarity-based models, suggesting that there is more to identifying true evoked questions than plain similarity.
LONG36: Summarization, Shahbaz Syed, Roxanne El Baff, Johannes Kiesel, Khalid Al Khatib, Benno Stein, Martin Potthast
The automatic summarization of argumentative texts has hardly been explored. This paper takes a further step in this direction, targeting news editorials, i.e., opinionated articles with a well-defined argumentation structure. With Webis-EditorialSum-2020, we present a corpus of 1330 carefully curated summaries for 266 news editorials. We evaluate these summaries based on a tailored annotation scheme, where a high-quality summary is expected to be thesis-indicative, persuasive, reasonable, concise, and self-contained. Our corpus contains at least three high-quality summaries for about 90% of the editorials, rendering it a valuable resource for the development and evaluation of summarization technology for long argumentative texts. We further report details of both, an in-depth corpus analysis, and the evaluation of two extractive summarization models.
Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning
LONG36: Summarization, Seoyeon Park, Cornelia Caragea
Scientific keyphrase identification and classification is the task of detecting and classifying keyphrases from scholarly text with their types from a set of predefined classes. This task has a wide range of benefits, but it is still challenging in performance due to the lack of large amounts of labeled data required for training deep neural models. In order to overcome this challenge, we explore pre-trained language models BERT and SciBERT with intermediate task transfer learning, using 42 data-rich related intermediate-target task combinations. We reveal that intermediate task transfer learning on SciBERT induces a better starting point for target task fine-tuning compared with BERT and achieves competitive performance in scientific keyphrase identification and classification compared to both previous works and strong baselines. Interestingly, we observe that BERT with intermediate task transfer learning fails to improve the performance of scientific keyphrase identification and classification potentially due to significant catastrophic forgetting. This result highlights that scientific knowledge achieved during the pre-training of language models on large scientific collections plays an important role in the target tasks. We also observe that sequence tagging related intermediate tasks, especially syntactic structure learning tasks such as POS Tagging, tend to work best for scientific keyphrase identification and classification.
LONG42: Text Classification, Jinhua Du, Yan Huang, Karo Moilanen
Recurrent neural networks (RNNs) suffer from well-known limitations and complications which include slow inference and vanishing gradients when processing long sequences in text classification. Recent studies have attempted to accelerate RNNs via various ad hoc mechanisms to skip irrelevant words in the input. However, word skipping approaches proposed to date effectively stop at each or a given time step to decide whether or not a given input word should be skipped, breaking the coherence of input processing in RNNs. Furthermore, current methods cannot change skip rates during inference and are consequently unable to support different skip rates in demanding real-world conditions. To overcome these limitations, we propose Pointer- LSTM, a novel LSTM framework which relies on a pointer network to select important words for target prediction. The model maintains a coherent input process for the LSTM modules and makes it possible to change the skip rate during inference. Our evaluation on four public data sets demonstrates that Pointer-LSTM (a) is 1.1x∼3.5x faster than the standard LSTM architecture; (b) is more accurate than Leap-LSTM (the state-of-the-art LSTM skipping model) at high skip rates; and (c) reaches robust accuracy levels even when the skip rate is changed during inference.
LONG45: Inference and Comprehension, Shuai Pang, Jianqiang Ma, Zeyu Yan, Yang Zhang, Jianping Shen
Recently, pre-trained language models such as BERT have shown state-of-the-art accuracies in text matching. When being applied to IR (or QA), the BERT-based matching models need to online calculate the representations and interactions for all query-candidate pairs. The high inference cost has prohibited the deployments of BERT-based matching models in many practical applications. To address this issue, we propose a novel BERT-based text matching model, in which the representations and the interactions are decoupled. Then, the representations of the candidates can be calculated and stored offline, and directly retrieved during the online matching phase. To conduct the interactions and generate final matching scores, a lightweight attention network is designed. Experiments based on several large scale text matching datasets show that the proposed model, called FASTMATCH, can achieve up to 100X speed-up to BERT and RoBERTa at the online matching phase, while keeping more up to 98.7% of the performance.
POSTER25: Machine Learning and Language Modelling. Posters, Branden Chan, Stefan Schweter, Timo Möller
In this work we present the experiments which lead to the creation of our BERT and ELECTRA based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM) we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both models of base and large size. We adopt an evaluation driven approach in training these models and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that these models are the best German models to date. All trained models will be made publicly available to the research community.
For all accepted papers please see the COLING 2020 proceedings. https://www.aclweb.org/anthology/events/coling-2020/
To be updated