Evaluating and Explaining Foundation Models

Project Description

Current benchmarks provide limited insight into how Large Language Models (LLMs) represent and reason about the world, motivating evaluation approaches that go beyond surface-level performance. Moving beyond static benchmarking toward procedural evaluations and mechanistic and behavioral explanations is essential for developing robust foundation models. This project explores the evaluation of foundation models by probing their internal representations to study induced world models. Students will work closely with ongoing research, contributing to reproducible evaluations.
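To give a concrete sense of what "probing internal representations" can look like, here is a minimal sketch: hidden states are extracted from a pretrained LM and a linear probe is fit to test whether a simple world property is linearly decodable from them. The model name, layer index, and toy city/continent task are illustrative assumptions, not part of the project itself.

```python
# Minimal probing sketch (illustrative assumptions: model, layer, toy task).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # assumption: any decoder-only LM exposing hidden states works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Hypothetical toy probing task: does the representation of a sentence
# mentioning a city encode whether that city lies in Europe?
prompts = ["Paris is a city.", "Tokyo is a city.", "Berlin is a city.",
           "Lima is a city.", "Madrid is a city.", "Cairo is a city."]
labels = [1, 0, 1, 0, 1, 0]  # 1 = Europe, 0 = elsewhere

layer = 6  # assumption: probe a middle layer

features = []
with torch.no_grad():
    for text in prompts:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs, output_hidden_states=True)
        # Use the hidden state of the final token at the chosen layer.
        features.append(outputs.hidden_states[layer][0, -1].numpy())

# Fit a linear probe; its accuracy indicates how linearly decodable
# the property is from the model's internal representation.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("Probe train accuracy:", probe.score(features, labels))
```

In practice one would use held-out data, multiple layers, and control tasks, but the structure (extract representations, fit a probe, measure decodability) stays the same.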

Goal

  1. Design or extend evaluation datasets that reveal world models.
  2. Analyze the correlation between representations and task performance on selected evaluation frameworks (a minimal sketch follows this list).
  3. Perform statistical analysis to draw inferences about the coherence of the induced world model.
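As a rough sketch of goals 2 and 3, one might correlate representation quality (e.g., probe accuracy per model or checkpoint) with downstream task performance and test the association statistically. The values below are placeholders, and Spearman correlation is only one of several reasonable choices.

```python
# Hypothetical correlation analysis: relate representation quality
# (probe accuracy per model/checkpoint) to downstream task performance.
import numpy as np
from scipy.stats import spearmanr

# Placeholder values; in the project these would come from the actual
# probing and benchmark runs.
probe_accuracy = np.array([0.61, 0.64, 0.70, 0.73, 0.78, 0.81])
task_accuracy  = np.array([0.42, 0.45, 0.44, 0.55, 0.58, 0.63])

rho, p_value = spearmanr(probe_accuracy, task_accuracy)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```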

Requirements

  • Background in Machine Learning and Decision Theory
  • Experience with Python and PyTorch
  • Familiarity with LLMs or benchmark datasets is a plus

What You Get

  • Close supervision in an active research group
  • Potential to co-author a paper
  • Flexible scope (HiWi, guided research project, or Master’s thesis)