Evaluating and Explaining Foundation Models

Project Description

Current benchmarks provide limited insight into how Large Language Models (LLMs) represent and reason about the world, motivating evaluation approaches that go beyond surface-level performance. Moving beyond static benchmarking toward procedural evaluations and mechanistic and behavioral explanations is essential for developing robust foundation models. This project explores the evaluation of foundation models by probing their internal representations to study induced world models. Students will work closely with ongoing research, contributing to reproducible evaluations.
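To give a concrete sense of what "probing internal representations" can look like, here is a minimal sketch: hidden states are extracted from a pretrained LM and a linear probe is fit to test whether a simple world property is linearly decodable from them. The model name, layer index, and toy city/continent task are illustrative assumptions, not part of the project itself.

```python
# Minimal probing sketch (illustrative assumptions: model, layer, toy task).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # assumption: any decoder-only LM exposing hidden states works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Hypothetical toy probing task: does the representation of a sentence
# mentioning a city encode whether that city lies in Europe?
prompts = ["Paris is a city.", "Tokyo is a city.", "Berlin is a city.",
           "Lima is a city.", "Madrid is a city.", "Cairo is a city."]
labels = [1, 0, 1, 0, 1, 0]  # 1 = Europe, 0 = elsewhere

layer = 6  # assumption: probe a middle layer

features = []
with torch.no_grad():
    for text in prompts:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs, output_hidden_states=True)
        # Use the hidden state of the final token at the chosen layer.
        features.append(outputs.hidden_states[layer][0, -1].numpy())

# Fit a linear probe; its accuracy indicates how linearly decodable
# the property is from the model's internal representation.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("Probe train accuracy:", probe.score(features, labels))
```

In practice one would use held-out data, multiple layers, and control tasks, but the structure (extract representations, fit a probe, measure decodability) stays the same.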

Goal

  1. Design or extend evaluation datasets that reveal world models.
  2. Analyze the correlation between representations and task performance on selected evaluation frameworks (a minimal sketch follows this list).
  3. Perform statistical analysis to draw inferences about the coherence of the induced world model.
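As a rough sketch of goals 2 and 3, one might correlate representation quality (e.g., probe accuracy per model or checkpoint) with downstream task performance and test the association statistically. The values below are placeholders, and Spearman correlation is only one of several reasonable choices.

```python
# Hypothetical correlation analysis: relate representation quality
# (probe accuracy per model/checkpoint) to downstream task performance.
import numpy as np
from scipy.stats import spearmanr

# Placeholder values; in the project these would come from the actual
# probing and benchmark runs.
probe_accuracy = np.array([0.61, 0.64, 0.70, 0.73, 0.78, 0.81])
task_accuracy  = np.array([0.42, 0.45, 0.44, 0.55, 0.58, 0.63])

rho, p_value = spearmanr(probe_accuracy, task_accuracy)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```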

Requirements

  • Background in Machine Learning and Decision Theory
  • Experience with Python and PyTorch
  • Familiarity with LLMs or benchmark datasets is a plus

What You Get

  • Close supervision in an active research group
  • Potential to co-author a paper
  • Flexible scope (HiWi, guided research project, or Master’s thesis)