News
Now
Working on this website, filling in the gaps and experimenting with colors and typography. A bunch of client work keeps getting disrupted by blog post writing, alas.
Previously
Creative Tech Demo Night at betaworks
Building with Claude | An evening with Anthropic
I gave a short demo of Metguessr at AI Tinkerers, with a focus on evals. I got to meet some folks from Anthropic after the event, which was great.
Can LLMs Grade Open Response Reading Questions?
A paper I coauthored was published in the International Journal of AI in Education. It evaluates LLMs at the task of grading student responses to reading comprehension questions. With Owen Henkel, Libby Hills, and Joshua McGrane.
LLM Evals and Benchmarking for Teaching Lab Fellows
I led a workshop for fellows and staff of Teaching Lab Studio. Topics included the role of evals in ML research, static benchmark datasets, dynamic chatbot arenas, perspectives of leading practitioners, and building evals for AI product development.
Can Large Language Models Make the Grade?
A paper I coauthored was accepted as an ACM Learning @ Scale short paper. It evaluates LLMs at grading short answer responses across a variety of K-12 settings. With Owen Henkel, Libby Hills, Adam Boxer, and Zach Levonian.