Some of my projects from the last few years:
- llm-datasets: A collection of datasets for large language model pretraining, including scripts for downloading, preprocessing, and sampling.
- DFKI Chat: A research prototype of a chat-optimized LLM with retrieval augmentation.
- Open Legal Data: Free Access to Legal Information.
- Open Redact: Semi-automatic anonymization of documents.
- Citolytics: Citation Analysis for Wikipedia Articles.
- Arms Trade Visualization: An interactive visualization of EU arms trade.
- Leaflet.Sim: Visualize moving elements on a Leaflet-based map.
Some of the language models that I published: