Publications

Publications by year

2024

  • Malte Ostendorff, Pedro Ortiz Suarez, Lucas Fonseca Lage, and Georg Rehm. LLM-Datasets: An Open Framework for Pretraining Datasets of Large Language Models. In COLM 2024. PDF URL
  • Manuel Brack, Malte Ostendorff, Pedro Ortiz Suarez, Jose Javier Saiz, Iñaki Lacunza Castilla, Jorge Palomar-Giner, Alexander Shvets, Patrick Schramowski, Georg Rehm, Marta Villegas, Kristian Kersting. Community OSCAR: A Community Effort for Multilingual Web Data. Preprint. URL
  • Orhun Caglidil, Malte Ostendorff, Georg Rehm. Investigating Gender Bias in Turkish Language Models. Preprint. URL
  • Martin Courtois, Malte Ostendorff, Leonhard Hennig, and Georg Rehm. Symmetric Dot-Product Attention for Efficient Training of BERT Language Models. In ACL Findings 2024. URL
  • Jorge Palomar-Giner, Jose Saiz, Ferran Espuña, Mario Mina, Severino Da Dalt, Joan Llop, Malte Ostendorff, Pedro Ortiz Suarez, Georg Rehm, Aitor Gonzalez-Agirre, and Marta Villegas. A CURATEd CATalog: Rethinking the Extraction of Pretraining Corpora for Mid-Resourced Languages. In LREC-COLING 2024.
  • Ben Hauptvogel, Malte Ostendorff, Georg Rehm, Sebastian Möller. Reward Modeling with Weak Supervision for Language Models. Preprint. URL

2023

  • Malte Ostendorff and Georg Rehm. Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning. In PML4DC @ ICLR 2023. PDF URL
  • Malte Ostendorff, Pedro Ortiz Suarez, Julian Moreno-Schneider, and Georg Rehm. Europe’s Technical Debt: Why We Need Web Search in the Age of Generative AI. In OSSYM 2023. PDF URL
  • Tim Schopf, Emanuel Gerber, Malte Ostendorff, and Florian Matthes. AspectCSE: Sentence Embeddings for Aspect-based Semantic Textual Similarity Using Contrastive Learning and Structured Knowledge. In RANLP 2023. URL
  • Maria Gonzalez Garcia, Julian Moreno Schneider, Malte Ostendorff, and Georg Rehm. Integration of a Semantic Storytelling Recommender System in Speech Assistants. In Text2Story @ ECIR 2023. URL
  • Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr. Tokenizer Choice For LLM Training: Negligible or Crucial? In arXiv preprint. URL
  • Georg Rehm, Muhammad Zeshan Afzal, Aliki Anagnostopoulou, Mahta Bakshizadeh, Stephan Baumann, Cristina España-Bonet, Carlos Franzreb, Josef van Genabith, Annika Grützner-Zahn, Stefanie Hegele, Tim Herzig, Jakob Karolus, Siting Liang, Paul Lukowicz, Katrin Marheinecke, Julián Moreno-Schneider, Malte Ostendorff, Simon Ostermann, Alain Pagani, Tim Polzehl, and Sven Schmeier. European streaming platform for national news accessible in all EU languages: Technical feasibility study. Study for the European Parliament, European Parliamentary Research Service, Panel for the Future of Science and Technology, Scientific Foresight Unit (STOA), PE 740.249, June 2023. URL

2022

  • Malte Ostendorff, Till Blume, Terry Ruas, Bela Gipp, Georg Rehm. Specialized Document Embeddings for Aspect-based Similarity of Research Papers. In JCDL 2022. URL
  • Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm. Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings. In EMNLP 2022. URL
  • Qian Ruan, Malte Ostendorff, Georg Rehm. HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information. In ACL Findings 2022. URL
  • Niklas Dehio, Malte Ostendorff, Georg Rehm. Claim Extraction and Law Matching for COVID-19-related Legislation. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022). URL
  • Michael Raring, Malte Ostendorff, Georg Rehm. Semantic Relations between Text Segments for Semantic Storytelling: Annotation Tool - Dataset - Evaluation. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022). URL
  • Rémi Calizzano, Malte Ostendorff, Qian Ruan, Georg Rehm. Generating Extended and Multilingual Summaries with Pre-trained Transformers. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022).

2021

  • Malte Ostendorff, Elliott Ash, Terry Ruas, Bela Gipp, Julian Moreno-Schneider, Georg Rehm. Evaluating Document Representations for Content-based Legal Literature Recommendations. In Proceedings of the 18th International Conference on Artificial Intelligence and Law (ICAIL 2021). PDF URL
  • Malte Ostendorff, Corinna Breitinger, Bela Gipp. A Qualitative Evaluation of User Preference for Link-based vs. Text-based Recommendations of Wikipedia Articles. In Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021). URL
  • Rémi Calizzano, Malte Ostendorff, Georg Rehm. DFKI SLT at GermEval 2021: Multilingual Pre-training and Data Augmentation for the Classification of Toxicity in Social Media Comments. In Proceedings of the GermEval 2021 Workshop on the Identification of Toxic, Engaging, and Fact-Claiming Comments: 17th Conference on Natural Language Processing KONVENS 2021. URL
  • Saskia Ostendorff, Malte Ostendorff. Open Legal Data - Warum der Rechtsstaat mehr offene Daten braucht. In Recht und Zugang 03/2021 (RuZ). (in print)
  • Lydia Pintscher, Peter Bourgonje, Julián Moreno Schneider, Malte Ostendorff, and Georg Rehm. Wissensbasen für die automatische Erschließung und ihre Qualität am Beispiel von Wikidata. In Qualität der Inhaltserschließung, Bibliotheks- und Informationspraxis (BIPRA). De Gruyter Saur. URL
  • Georg Rehm, Karolina Zaczynska, Peter Bourgonje, Malte Ostendorff, Julián Moreno-Schneider, Maria Berger, Jens Rauenbusch, André Schmidt, Mikka Wild, Joachim Böttger, Joachim Quantz, Jan Thomsen, and Rolf Fricke. Semantic Storytelling: From Experiments and Prototypes to a Technical Solution. In Computational Analysis of Storylines: Making Sense of Events. Cambridge University Press. URL
  • Dmitrii Aksenov, Peter Bourgonje, Karolina Zaczynska, Malte Ostendorff, Julián Moreno-Schneider, and Georg Rehm. Fine-grained Classification of Political Bias in German News: A Data Set and Initial Experiments. In Proceedings of the Workshop on Online Abuse and Harms (WOAH 2021). URL
  • Rémi Calizzano, Malte Ostendorff, and Georg Rehm. Ordering Sentences and Paragraphs with Pre-trained Encoder-Decoder Transformers and Pointer Ensembles. In The ACM Symposium on Document Engineering (DocEng 2021). PDF URL

2020

  • Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm. Aspect-based Document Similarity for Research Papers. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020). PDF URL
  • Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp. Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020). PDF URL
  • Malte Ostendorff, Till Blume, Saskia Ostendorff. Towards an Open Platform for Legal Information. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020). PDF URL
  • Malte Ostendorff. Contextual Document Similarity for Content-based Literature Recommender Systems. In Proceedings of the Doctoral Consortium at ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020). PDF
  • Philipp Scharpf, Moritz Schubotz, André Greiner-Petter, Malte Ostendorff, Olaf Teschke, Bela Gipp. ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open? In ARQMath Lab at the Conference and Labs of the Evaluation Forum (CLEF). PDF
  • Sarah Schulz, Jurica Seva, Samuel Rodriguez, Malte Ostendorff, Georg Rehm. Named Entities in Medical Case Reports: Corpus and Experiments. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). PDF
  • Georg Rehm, Karolina Zaczynska, Julian Moreno Schneider, Malte Ostendorff, Peter Bourgonje, Maria Berger, Jens Rauenbusch, Andre Schmidt, Mikka Wild. Towards Discourse Parsing-inspired Semantic Storytelling. In Proceedings of QURATOR 2020. PDF
  • Georg Rehm et al. QURATOR: Innovative Technologies for Content and Data Curation. In Proceedings of QURATOR 2020. PDF
  • Saskia Ostendorff, Malte Ostendorff. Open Data in der Justiz - Open Legal Data: Ein Appell für die vollständige Veröffentlichung aller Urteile. In Betrifft JUSTIZ, Nr. 143 von September 2020. URL

2019

  • Malte Ostendorff, Peter Bourgonje, Maria Berger, Julian Moreno Schneider, Georg Rehm, Bela Gipp. Enriching BERT with Knowledge Graph Embedding for Document Classification. In GermEval Workshop located at KONVENS 2019. PDF URL
  • Saskia Ostendorff, Malte Ostendorff. Open Justice, Open Government und Open Legal Data. In Re:Thinking Law 03/2019. URL

2018

  • Saskia Ostendorff, Malte Ostendorff. Open Legal Data: Make Law Transparent Again. In Chaos Communication Congress (#35C3) - Lightning Talks. PDF URL YouTube

2017

  • Malte Schwarzer, Corinna Breitinger, Moritz Schubotz, Norman Meuschke, Bela Gipp. Citolytics: A Link-based Recommender System for Wikipedia. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys '17). DOI PDF

2016

  • Malte Schwarzer, Moritz Schubotz, Norman Meuschke, Corinna Breitinger, Volker Markl, Bela Gipp. Evaluating Link-based Recommendations for Wikipedia. In Proceedings of the 16th ACM/IEEE Joint Conference on Digital Libraries (JCDL '16). DOI PDF
  • Malte Schwarzer, Jonas Düver, Andreas Lommatzsch. An Interactive e-Government Question Answering System. In Proceedings of Lernen, Wissen, Daten, Analysen (LWDA '16). PDF