@aarav-mehta
ML eng, ex-Scale AI · building eval tooling for LLMs
Spent 3 years labeling and evaluating LLM outputs at Scale AI before going indie. Now building Evalify, a structured benchmark runner for frontier models. Passionate about making evals reproducible and shareable.
Evalify – open-source LLM benchmark runner with shareable leaderboards
Skills
Open to
Building an AI-powered hackathon submission evaluator for this event. Need someone who can whip up a clean React UI fast. DM me if you work well under pressure.
joined Jun 2026