AI Learning Companions

A multi-stage AI pipeline for designing, animating, voicing, and reviewing lifelike learning companions for educational tutoring systems.

Knowledge & Intelligence | Systems & Complexity | Human Flourishing

Active

Figure: Research pipeline for AI Learning Companions. The multi-stage, AI-driven production pipeline for pedagogical agents, with human-in-the-loop verification gates.

Overview

Learning companions — on-screen characters that guide, encourage, and respond to students during tutoring sessions — have a well-established record of improving motivation and learning outcomes in intelligent tutoring systems. What has constrained their development is the cost and time required to produce realistic, culturally diverse, emotionally expressive agents at scale.

This project addresses that constraint directly. Using a multi-stage AI production pipeline, we are developing a new generation of pedagogical agents for MathSpring, an adaptive mathematics tutoring system with a large and diverse user base. The agents are designed to feel like genuine companions: they move, they speak, and their expressions and speech are synchronized.

The Production Pipeline

The pipeline moves through five stages. The first four are each powered by a different AI system, and the fifth is a human review gate before deployment:

Stage 1 — Character design. A text-to-image model generates character concepts from structured prompts specifying age, cultural appearance, expression, and educational context. Promising generations are selected through human review.

Stage 2 — Animation. Selected still images are passed to an image-to-video synthesis model, which generates short motion clips — subtle head movements, blinking, and natural idle animation — from the static character art.

Stage 3 — Speech synthesis. A text-to-speech system generates natural-sounding audio for the agent’s tutoring dialogue, calibrated for clarity, warmth, and appropriate pacing for the target age group.

Stage 4 — Lip synchronization. LatentSync, running on an NVIDIA H100 GPU, aligns the generated animation with the synthesized speech at the frame level, producing coherent lip movements that match the spoken audio without manual animation.

Stage 5 — Human review and deployment. Every agent undergoes review by human annotators before integration into MathSpring. Reviewers assess educational appropriateness, visual quality, lip-sync accuracy, and alignment with representation goals. Approved agents are deployed live into the tutoring system.
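The five stages above can be pictured as a sequence of producers with two human gates: design selection after Stage 1 and final review in Stage 5. The following is a minimal sketch under stated assumptions, not the project's implementation. Every name in it (CharacterSpec, Agent, run_stage, run_pipeline, the reviewer callables) is hypothetical, and the model calls are replaced by string stubs so the control flow can run without any GPU or model weights.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class CharacterSpec:
    """Structured prompt fields named in Stage 1 (field names are hypothetical)."""
    age: str
    cultural_appearance: str
    expression: str
    educational_context: str

    def to_prompt(self) -> str:
        # Flatten the structured fields into a single text-to-image prompt.
        return (
            f"{self.age} tutor character, {self.cultural_appearance}, "
            f"{self.expression} expression, {self.educational_context}"
        )


@dataclass
class Agent:
    """Artifacts accumulated as one agent moves through the pipeline."""
    spec: CharacterSpec
    artifacts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


# A reviewer is any callable returning True (approve) or False (reject).
# In practice this would be a human decision; here it is a plain function.
Reviewer = Callable[[Agent], bool]


def run_stage(agent: Agent, name: str, produce: Callable[[Agent], str]) -> None:
    """Run one generation stage and record its output under the stage name."""
    agent.artifacts[name] = produce(agent)
    agent.log.append(name)


def run_pipeline(
    spec: CharacterSpec, select_design: Reviewer, final_review: Reviewer
) -> Optional[Agent]:
    """Return the approved agent, or None if either human gate rejects it."""
    agent = Agent(spec)

    # Stage 1: character design (stub standing in for a text-to-image model).
    run_stage(agent, "design", lambda a: f"image<{a.spec.to_prompt()}>")
    if not select_design(agent):  # human selection of promising generations
        return None

    # Stage 2: animation (stub for image-to-video synthesis).
    run_stage(agent, "animation", lambda a: f"clip<{a.artifacts['design']}>")
    # Stage 3: speech synthesis (stub for text-to-speech).
    run_stage(agent, "speech", lambda a: "audio<tutoring dialogue>")
    # Stage 4: lip synchronization (stub combining clip and audio).
    run_stage(
        agent,
        "lip_sync",
        lambda a: f"synced<{a.artifacts['animation']} + {a.artifacts['speech']}>",
    )

    # Stage 5: human review gate before deployment.
    if not final_review(agent):
        return None
    agent.log.append("deployed")
    return agent


if __name__ == "__main__":
    spec = CharacterSpec(
        age="young adult",
        cultural_appearance="(appearance description)",
        expression="encouraging",
        educational_context="middle-school mathematics tutoring",
    )
    approved = run_pipeline(spec, lambda a: True, lambda a: True)
    print(approved.log if approved else "rejected")
```

Keeping the reviewers as injected callables is a design choice in this sketch: the generation stages stay fully automated, while approval remains an explicit decision that no stage can bypass.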

Why This Matters

The diversity of pedagogical agents matters for the same reason that representation in media matters: students are more engaged, more trusting, and more persistent with tutors who look and sound like people they recognize. Producing a diverse set of high-quality agents through traditional production methods — illustration, animation, voice casting — is expensive and slow. AI-assisted pipelines make it feasible to offer students agents from a wide range of cultural backgrounds, at scale, without sacrificing quality.

The H100 GPU cluster supporting this work enables inference at a speed and resolution that would not be practical on consumer hardware, making real-time, high-fidelity lip sync a deployable reality rather than a research demonstration.

Current Status

The pipeline is operational and producing agents for integration into MathSpring. Work is ongoing on quality assessment frameworks, alignment with representation guidelines, and evaluation of student engagement and learning outcomes with the new agents relative to earlier-generation characters.
