beautiful chaos

A research initiative

HEARTHbench

Household, Emotional & Relational Tasks for Human-centered AI

The first benchmark suite for the work of our homes.

The Gap

AI agent benchmarks are built almost entirely on paid-work tasks. Household and family management work has no occupational taxonomy and no benchmark suite. As a result, it has no representation in how AI systems are evaluated or trained. That gap persists even though this work represents an estimated 15–50% of GDP once unpaid work is counted, is performed disproportionately by women, and involves some of the most cognitively and relationally complex coordination humans do.

Wang et al. (2026) used the O*NET database to map 43 agent benchmarks onto paid-work occupations and revealed systematic gaps. We're tackling the work O*NET doesn't capture: everything that happens in our homes and our communities.

HEARTHbench builds that infrastructure.

The 12 Domains

Home production work mapped, for the first time, onto a structured evaluation framework.

01

Child Development & Education

Milestone tracking, play facilitation, tutoring, emotional coaching, school communication

02

Health & Medical Management

Symptom triage, medication management, appointment coordination, insurance navigation

03

Food & Nutrition

Meal planning, dietary constraints, food safety, nutritional adequacy, feeding logistics

04

Household Operations

Cleaning schedules, home maintenance, safety assessment, supply management

05

Financial Management

Budgeting, bill payment, benefit navigation, savings planning, tax documentation

06

Logistics & Coordination

Scheduling, transportation, activity management, childcare coordination

07

Emotional & Relational Work

Conflict mediation, emotional support, co-parenting coordination, family culture

08

Information & Knowledge Management

School communications, service navigation, benefits research, interfacing with institutions

09

Elder & Dependency Care

Aging parent coordination, disability accommodation, care handoffs, end-of-life planning

10

Community & Network Management

Extended family coordination, school involvement, informal support networks

11

Crisis & Contingency Management

Emergency response, backup planning, problem-solving under constraint, care failure recovery

12

Identity & Cultural Transmission

Traditions, rituals, values communication, family narrative, cultural practice

Submit a Task

We're building the benchmark from the ground up, starting with real scenarios from real caregivers. If you've ever had to think hard, balance competing concerns, or draw on hard-won knowledge to manage your home and family, we want to hear about it.

Describe the situation as specifically as possible — real constraints, real context. Write it as a prompt someone could give an AI assistant.

What's the tacit knowledge at the heart of this task — the thing that takes years to learn?

Collaborate

HEARTHbench is in early development. We're looking for qualitative researchers, feminist economists, family practitioners, and ML evaluation engineers who want to build the infrastructure that makes care work visible in AI.

If that's you, get in touch.