A research initiative
HEARTHbench
Household, Emotional & Relational Tasks for Human-centered AI
The first benchmark suite for the work of our homes.
The Gap
AI agent benchmarks are built almost entirely on paid-work tasks. Household and family management work represents an estimated 15–50% of true GDP, is performed disproportionately by women, and involves some of the most cognitively and relationally complex coordination humans do. Yet it has no occupational taxonomy and no benchmark suite, and therefore zero representation in how AI systems are evaluated or trained.
Wang et al. (2026) used the O*NET database to map 43 agent benchmarks onto paid-work occupations, revealing systematic gaps in coverage. We're tackling the work O*NET doesn't capture: everything that happens in our homes and our communities.
HEARTHbench builds that infrastructure.
The 12 Domains
Home production work mapped, for the first time, onto a structured evaluation framework.
Child Development & Education
Milestone tracking, play facilitation, tutoring, emotional coaching, school communication
Health & Medical Management
Symptom triage, medication management, appointment coordination, insurance navigation
Food & Nutrition
Meal planning, dietary constraints, food safety, nutritional adequacy, feeding logistics
Household Operations
Cleaning schedules, home maintenance, safety assessment, supply management
Financial Management
Budgeting, bill payment, benefit navigation, savings planning, tax documentation
Logistics & Coordination
Scheduling, transportation, activity management, childcare coordination
Emotional & Relational Work
Conflict mediation, emotional support, co-parenting coordination, family culture
Information & Knowledge Management
School communications, service navigation, benefits research, institutional interface
Elder & Dependency Care
Aging parent coordination, disability accommodation, care handoffs, end-of-life planning
Community & Network Management
Extended family coordination, school involvement, informal support networks
Crisis & Contingency Management
Emergency response, backup planning, problem-solving under constraint, care failure recovery
Identity & Cultural Transmission
Traditions, rituals, values communication, family narrative, cultural practice
Submit a Task
We're building the benchmark from the ground up, starting with real scenarios from real caregivers. If you've faced a moment where you had to think hard, hold competing concerns, or draw on hard-won knowledge to manage your home and family, we want to hear about it.
Collaborate
HEARTHbench is in early development. We're looking for qualitative researchers, feminist economists, family practitioners, and ML evaluation engineers who want to build the infrastructure that makes care work visible in AI.
If that's you, get in touch.