Malte Ostendorff, Pedro Ortiz Suarez, Lucas Fonseca Lage, and Georg Rehm.
LLM-Datasets: An Open Framework for Pretraining Datasets of Large Language Models.
In COLM 2024.
PDF, URL
Manuel Brack, Malte Ostendorff, Pedro Ortiz Suarez, Jose Javier Saiz, Iñaki Lacunza Castilla, Jorge Palomar-Giner, Alexander Shvets, Patrick Schramowski, Georg Rehm, Marta Villegas, Kristian Kersting.
Community OSCAR: A Community Effort for Multilingual Web Data.
In Preprint.
URL
Orhun Caglidil, Malte Ostendorff, Georg Rehm.
Investigating Gender Bias in Turkish Language Models.
In Preprint.
URL
Martin Courtois, Malte Ostendorff, Leonhard Hennig, and Georg Rehm.
Symmetric Dot-Product Attention for Efficient Training of BERT Language Models.
In ACL Findings 2024.
URL
Jorge Palomar-Giner, Jose Saiz, Ferran Espuña, Mario Mina, Severino Da Dalt, Joan Llop, Malte Ostendorff, Pedro Ortiz Suarez, Georg Rehm, Aitor Gonzalez-Agirre, and Marta Villegas.
A CURATEd CATalog: Rethinking the Extraction of Pretraining Corpora for Mid-Resourced Languages.
In LREC-COLING 2024.
Ben Hauptvogel, Malte Ostendorff, Georg Rehm, Sebastian Möller.
Reward Modeling with Weak Supervision for Language Models.
In Preprint.
URL
2023
Malte Ostendorff and Georg Rehm.
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning.
In PML4DC @ ICLR 2023.
PDF, URL
Malte Ostendorff, Pedro Ortiz Suarez, Julian Moreno-Schneider, and Georg Rehm.
Europe’s Technical Debt: Why We Need Web Search in the Age of Generative AI.
In OSSYM 2023.
PDF, URL
Tim Schopf, Emanuel Gerber, Malte Ostendorff, and Florian Matthes.
AspectCSE: Sentence Embeddings for Aspect-based Semantic Textual Similarity Using Contrastive Learning and Structured Knowledge.
In RANLP 2023.
URL
Maria Gonzalez Garcia, Julian Moreno Schneider, Malte Ostendorff, and Georg Rehm.
Integration of a Semantic Storytelling Recommender System in Speech Assistants.
In Text2Story @ ECIR 2023.
URL
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr.
Tokenizer Choice For LLM Training: Negligible or Crucial?
In arXiv preprint.
URL
Georg Rehm, Muhammad Zeshan Afzal, Aliki Anagnostopoulou, Mahta Bakshizadeh, Stephan Baumann, Cristina España-Bonet, Carlos Franzreb, Josef van Genabith, Annika Grützner-Zahn, Stefanie Hegele, Tim Herzig, Jakob Karolus, Siting Liang, Paul Lukowicz, Katrin Marheinecke, Julián Moreno-Schneider, Malte Ostendorff, Simon Ostermann, Alain Pagani, Tim Polzehl, and Sven Schmeier.
European streaming platform for national news accessible in all EU languages: Technical feasibility study.
In European Parliament, European Parliamentary Research Service, June 2023. Study, Panel for the Future of Science and Technology, Scientific Foresight Unit (STOA), PE 740.249.
URL
2022
Malte Ostendorff, Till Blume, Terry Ruas, Bela Gipp, Georg Rehm.
Specialized Document Embeddings for Aspect-based Similarity of Research Papers.
In JCDL 2022.
URL
Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm.
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings.
In EMNLP 2022.
URL
Qian Ruan, Malte Ostendorff, Georg Rehm.
HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information.
In ACL Findings 2022.
URL
Niklas Dehio, Malte Ostendorff, Georg Rehm.
Claim Extraction and Law Matching for COVID-19-related Legislation.
In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022).
URL
Michael Raring, Malte Ostendorff, Georg Rehm.
Semantic Relations between Text Segments for Semantic Storytelling: Annotation Tool - Dataset - Evaluation.
In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022).
URL
Rémi Calizzano, Malte Ostendorff, Qian Ruan, Georg Rehm.
Generating Extended and Multilingual Summaries with Pre-trained Transformers.
In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022).
2021
Malte Ostendorff, Elliott Ash, Terry Ruas, Bela Gipp, Julian Moreno-Schneider, Georg Rehm.
Evaluating Document Representations for Content-based Legal Literature Recommendations.
In Proceedings of the 18th International Conference on Artificial Intelligence and Law (ICAIL 2021).
PDF, URL
Malte Ostendorff, Corinna Breitinger, Bela Gipp.
A Qualitative Evaluation of User Preference for Link-based vs. Text-based Recommendations of Wikipedia Articles.
In Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021).
URL
Rémi Calizzano, Malte Ostendorff, Georg Rehm.
DFKI SLT at GermEval 2021: Multilingual Pre-training and Data Augmentation for the Classification of Toxicity in Social Media Comments.
In Proceedings of the GermEval 2021 Workshop on the Identification of Toxic, Engaging, and Fact-Claiming Comments: 17th Conference on Natural Language Processing KONVENS 2021.
URL
Saskia Ostendorff, Malte Ostendorff.
Open Legal Data - Warum der Rechtsstaat mehr offene Daten braucht.
In Recht und Zugang 03/2021 (RuZ).
(in print)
Lydia Pintscher, Peter Bourgonje, Julián Moreno Schneider, Malte Ostendorff, and Georg Rehm.
Wissensbasen für die automatische Erschließung und ihre Qualität am Beispiel von Wikidata.
In Qualität der Inhaltserschließung, Bibliotheks- und Informationspraxis (BIPRA). De Gruyter Saur.
URL
Georg Rehm, Karolina Zaczynska, Peter Bourgonje, Malte Ostendorff, Julián Moreno-Schneider, Maria Berger, Jens Rauenbusch, André Schmidt, Mikka Wild, Joachim Böttger, Joachim Quantz, Jan Thomsen, and Rolf Fricke.
Semantic Storytelling: From Experiments and Prototypes to a Technical Solution.
In Computational Analysis of Storylines: Making Sense of Events. Cambridge University Press.
URL
Dmitrii Aksenov, Peter Bourgonje, Karolina Zaczynska, Malte Ostendorff, Julián Moreno-Schneider, and Georg Rehm.
Fine-grained Classification of Political Bias in German News: A Data Set and Initial Experiments.
In Proceedings of the Workshop on Online Abuse and Harms (WOAH 2021).
URL
Rémi Calizzano, Malte Ostendorff, and Georg Rehm.
Ordering Sentences and Paragraphs with Pre-trained Encoder-Decoder Transformers and Pointer Ensembles.
In The ACM Symposium on Document Engineering (DocEng 2021).
PDF, URL
2020
Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm.
Aspect-based Document Similarity for Research Papers.
In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020).
PDF, URL
Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp.
Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles.
In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020).
PDF, URL
Malte Ostendorff, Till Blume, Saskia Ostendorff.
Towards an Open Platform for Legal Information.
In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020).
PDF, URL
Malte Ostendorff.
Contextual Document Similarity for Content-based Literature Recommender Systems.
In Proceedings of the Doctoral Consortium at ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020).
PDF
Philipp Scharpf, Moritz Schubotz, André Greiner-Petter, Malte Ostendorff, Olaf Teschke, Bela Gipp.
ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?
In ARQMath Lab at the Conference and Labs of the Evaluation Forum (CLEF).
PDF
Sarah Schulz, Jurica Seva, Samuel Rodriguez, Malte Ostendorff, Georg Rehm.
Named Entities in Medical Case Reports: Corpus and Experiments.
In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020).
PDF
Georg Rehm, Karolina Zaczynska, Julian Moreno Schneider, Malte Ostendorff, Peter Bourgonje, Maria Berger, Jens Rauenbusch, Andre Schmidt, Mikka Wild.
Towards Discourse Parsing-inspired Semantic Storytelling.
In Proceedings of QURATOR 2020.
PDF
Georg Rehm et al.
QURATOR: Innovative Technologies for Content and Data Curation.
In Proceedings of QURATOR 2020.
PDF
Saskia Ostendorff, Malte Ostendorff.
Open Data in der Justiz - Open Legal Data: Ein Appell für die vollständige Veröffentlichung aller Urteile.
In Betrifft JUSTIZ, Nr. 143 von September 2020.
URL
2019
Malte Ostendorff, Peter Bourgonje, Maria Berger, Julian Moreno Schneider, Georg Rehm, Bela Gipp.
Enriching BERT with Knowledge Graph Embedding for Document Classification.
In GermEval Workshop located at KONVENS 2019.
PDF, URL
Saskia Ostendorff, Malte Ostendorff.
Open Justice, Open Government und Open Legal Data.
In Re:Thinking Law 03/2019.
URL
2018
Saskia Ostendorff, Malte Ostendorff.
Open Legal Data: Make Law Transparent Again.
In Chaos Communication Congress (#35C3) - Lightning Talks.
PDF, URL, YouTube
2017
Malte Schwarzer, Corinna Breitinger, Moritz Schubotz, Norman Meuschke, Bela Gipp.
Citolytics: A Link-based Recommender System for Wikipedia.
In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys '17).
DOI, PDF
2016
Malte Schwarzer, Moritz Schubotz, Norman Meuschke, Corinna Breitinger, Volker Markl, Bela Gipp.
Evaluating Link-based Recommendations for Wikipedia.
In Proceedings of the 16th ACM/IEEE Joint Conference on Digital Libraries (JCDL '16).
DOI, PDF
Malte Schwarzer, Jonas Düver, Andreas Lommatzsch.
An Interactive e-Government Question Answering System.
In Proceedings of Lernen, Wissen, Daten, Analysen (LWDA '16).
PDF