Evidence

What the research actually says.

Nineteen peer-reviewed papers on handwriting, the brain, AI tutoring, and how students actually learn maths — summarised, linked, and sourced.

19 papers · 5 research threads · Sources linked

Our product makes a claim — that doing maths by hand, showing the working, and getting feedback on the reasoning teaches better than tapping at a screen. Here is the research behind it.

We've summarised nineteen peer-reviewed papers across five threads: what each one actually did, what it found, and why it matters for a pen-and-paper maths app with AI feedback. Every entry links to the source so you can read it yourself.

01

Handwriting, typing & the learning brain

What happens in the brain when you write by hand instead of typing.

01
EEG study

Only three fingers write, but the whole brain works

van der Meer & van der Weel · Frontiers in Psychology · 2017 · n=17 adults

The experiment

High-density EEG (256 channels) recorded 17 young adults as they drew pictures with a stylus, typed words on a keyboard, and described them aloud — comparing the brain activity each task produced.

What they found

Drawing produced theta- and alpha-band desynchronization across parietal and occipital regions — a pattern tied to memory encoding and learning. Typing produced no comparable engagement.

Why it matters — Producing content by hand engaged far more of the brain than pressing keys, in the activation pattern the authors associate with memory encoding and learning.

02
EEG study

Cursive handwriting over typing — in children and adults

Ose Askvik, van der Weel & van der Meer · Frontiers in Psychology · 2020 · 12 adults + 12 children

The experiment

A 256-channel EEG study compared cursive handwriting with a digital pen, typing, and drawing in 12 university students and 12 seventh-graders, analysing the first seconds of each task.

What they found

Handwriting and drawing elicited theta-band synchronization in parietal and central regions — linked to memory and encoding — that typing did not. Children showed the same direction of effect as adults.

Why it matters — The fine-motor control of forming letters by hand drives the brain rhythms tied to learning — in classroom-age children, too.

03
EEG study

Handwriting drives widespread brain connectivity

Van der Weel & Van der Meer · Frontiers in Psychology · 2024 · n=36 adults

The experiment

256-channel EEG measured brain connectivity in 36 university students as they handwrote words with a digital pen versus typed them with one finger.

What they found

Handwriting produced markedly more widespread theta/alpha coherence across parietal and central regions — the kind of connectivity associated with working memory and encoding — than typing.

Why it matters — The coordinated visual-motor act of writing wires together the very networks the brain uses to learn.

04
Classic study

The pen is mightier than the keyboard

Mueller & Oppenheimer · Psychological Science · 2014 · 3 studies, N=327

The experiment

Across three experiments, college students took notes by longhand or laptop during lectures, then were tested on factual recall and conceptual application.

What they found

Laptop note-takers transcribed more words verbatim and did worse on conceptual questions; longhand writers, forced to summarise, understood more. Even telling laptop users not to transcribe didn't help.

Why it matters — Writing by hand forces the selective, generative processing that turns notes into understanding — the opposite of mindless transcription.

02

Letters & sensorimotor learning

Producing letters by hand — not just seeing them — shapes the brain networks we read and recognise with.

05
Behavioural

Writing letters by hand builds recognition

Longcamp, Zerbato-Poudou & Velay · Acta Psychologica · 2005 · n=76, ages 3–5

The experiment

Preschool children were trained over three weeks to copy letters either by hand or on a keyboard, then tested on how well they recognised those letters.

What they found

Handwriting training led to better letter recognition than typing — the advantage clearest in the older preschoolers, who built richer motor-perceptual representations of each form.

Why it matters — The motor act of forming a letter, not merely seeing it, strengthens how the brain comes to recognise it.

06
fMRI study

Handwriting shapes the developing reading circuit

James & Engelhardt · Trends in Neuroscience and Education · 2012 · n=15 children

The experiment

Pre-literate children (~4–5 yrs) practised letters by free-form printing, typing, or tracing, then viewed letters during fMRI.

What they found

The brain's “reading circuit” — left fusiform gyrus, inferior frontal gyrus, posterior parietal cortex — was recruited during letter perception only after free-form printing, not after typing or tracing.

Why it matters — Self-generated handwriting, with all its messy variability, is what switches on the network the brain later reads with.

07
fMRI study

The motor act tunes how we see letters

James & Atwood · Cognitive Neuropsychology · 2009 · adults, novel symbols

The experiment

Adults learned novel letter-like symbols by writing them, typing them, or only viewing them; fMRI then measured the brain's response to those symbols.

What they found

Only after writing practice did activation to the new symbols come to resemble the response for real letters, notably in the left fusiform gyrus. Typing and viewing did not.

Why it matters — Writing a form by hand causally reshapes how the visual brain perceives it — production teaches perception.

03

Showing your work & self-explanation

The most direct evidence: making reasoning explicit, and comparing methods, is what drives maths learning.

08
Foundational

Studying worked examples by explaining them to yourself

Chi, Bassok, Lewis, Reimann & Glaser · Cognitive Science · 1989 · n=8

The experiment

Students studied worked physics examples while thinking aloud; their explanations were analysed in detail and related to later problem-solving success.

What they found

Stronger learners spontaneously generated “self-explanations” that linked each step to underlying principles and monitored their own understanding; weaker learners mostly re-read the examples.

Why it matters — Explaining your own steps — exactly what showing the working forces — is what converts a worked example into transferable understanding.

09
Experiment

Self-explanation plus instruction improves transfer

Rittle-Johnson · Child Development · 2006 · n=85, grades 3–5

The experiment

Children learning mathematical equivalence were crossed on prompted self-explanation (or not) and instruction type, then tested on retention and transfer to novel problems.

What they found

Prompted self-explanation improved transfer across conditions, and worked best combined with direct instruction.

Why it matters — Prompting a student to explain why drives them to generalise, not just repeat a procedure.

10
Meta-analysis

Meta-analysis: prompted self-explanation in mathematics

Rittle-Johnson, Loehr & Durkin · ZDM Mathematics Education · 2017

The experiment

A meta-analysis synthesising experimental studies of prompted self-explanation in mathematics across procedural knowledge, conceptual knowledge, and procedural transfer.

What they found

Prompted self-explanation gave small-to-moderate gains on all three immediate outcomes, strongest when explanations were scaffolded toward quality.

Why it matters — Across many studies, asking students to explain their reasoning reliably helps — especially when the prompts are well designed.

11
Experiment

Comparing solution methods builds flexibility

Rittle-Johnson & Star · Journal of Educational Psychology · 2007 · n=70, grade 7

The experiment

Seventh-graders learned to solve equations either by comparing two solution methods side-by-side or by studying the same methods one at a time.

What they found

Comparing methods produced greater procedural knowledge and flexibility than sequential study, with comparable gains in conceptual knowledge.

Why it matters — Seeing more than one correct path — side by side — teaches the structure, not just the steps.

12
Experiment

Which kind of comparison helps most

Rittle-Johnson & Star · Journal of Educational Psychology · 2009 · n=162, grades 7–8

The experiment

A follow-up varied the type of comparison: different solution methods, different problem types, or surface-equivalent problems.

What they found

Comparing different solution methods best supported conceptual knowledge and procedural flexibility; comparing surface-equivalent problems helped least.

Why it matters — It is specifically contrasting how to solve — not just seeing more problems — that deepens understanding.

04

AI as tutor, done right

A constrained, research-based tutor that checks the working — not a raw chatbot — is what moves learning.

13
RCT

A well-designed AI tutor beat active learning

Kestin et al. · Scientific Reports · 2025 · RCT, N=316, college physics

The experiment

In a real Harvard physics course, 316 students were randomized to either a custom, pedagogically constrained AI tutor or a high-quality in-class active-learning lesson on the same topics.

What they found

The AI-tutored group learned more than twice as much, in less time, and reported higher engagement and motivation (post-test medians 4.5 vs 3.5; p < 10⁻⁸). The tutor was engineered with sequencing, scaffolding, and feedback; it was not raw ChatGPT.

Why it matters — Designed well — with sequencing, scaffolding, and guardrails — an AI tutor can outperform even active learning. The design is the product.

14
NLP · EMNLP

Checking a student's reasoning, step by step

Daheim, Macina, Kapur, Gurevych & Sachan · EMNLP · 2024

The experiment

The team built a dataset of 1,002 teacher-annotated math solutions, each marked at the first wrong step, and a tutoring pipeline that verifies the student's working before responding.

What they found

Grounding the tutor in a step-verifier produced more targeted feedback — more often correct, with fewer hallucinations than strong baselines — by locating the specific step where reasoning broke.

Why it matters — The valuable job isn't solving the problem for the student — it's inspecting their reasoning and finding exactly where it broke.
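To make "locating the first wrong step" concrete, here is a minimal Python sketch. It is a toy, not the method from the paper (which uses language-model verifiers) and not MathXP's implementation. It assumes the working is a chain of plain arithmetic expressions that should all have the same value; the function names `evaluate` and `first_wrong_step` are hypothetical.

```python
import ast
import operator
from typing import List, Optional

# Arithmetic operators this toy checker accepts (an assumption, not a full parser).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Evaluate a plain arithmetic expression: numbers, + - * /, parentheses."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


def first_wrong_step(steps: List[str]) -> Optional[int]:
    """Return the index of the first step whose value differs from the step before it."""
    for i in range(1, len(steps)):
        if abs(evaluate(steps[i]) - evaluate(steps[i - 1])) > 1e-9:
            return i
    return None  # every step preserved the value


# Hypothetical student working: the slip happens in the last line (27 - 2 is 25, not 24).
working = ["3 * (4 + 5) - 2", "3 * 9 - 2", "27 - 2", "24"]
print(first_wrong_step(working))  # 3
```

Feedback can then point at step 3 alone instead of only marking the final answer wrong.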

05

How learning actually sticks

The durable, well-replicated learning science the loop is built on — struggle, retrieval, mixing, and tutoring.

15
Meta-analysis

Productive failure: struggle first, instruct second

Sinha & Kapur · Review of Educational Research · 2021 · meta-analysis, 53 studies

The experiment

A meta-analysis of 53 studies (166 comparisons) testing whether having students attempt a hard problem before instruction beats the usual instruct-first order.

What they found

Problem-solving-before-instruction won overall (g = 0.36), and more strongly when it followed “productive failure” design principles.

Why it matters — Letting students attempt and fail first, then organizing what they discovered, can teach more than explaining up front.

16
Review

Spacing and retrieval practice make learning stick

Carpenter, Pan & Butler · Nature Reviews Psychology · 2022 · review

The experiment

A review synthesizing a century of evidence on two strategies: spacing study out over time, and retrieval practice — actively recalling material rather than re-reading it.

What they found

Both are among the most robust, broadly applicable ways to make learning durable — and both are badly underused, because students' own intuitions mislead them about what actually works.

Why it matters — Revisiting weak concepts over spaced intervals, by recalling them, is one of the surest routes to retention.
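One common way to schedule spaced retrieval is a Leitner-style box system, sketched below in Python. This is an illustration under stated assumptions, not a schedule taken from the review and not MathXP's scheduler: the `INTERVALS` values and the `next_review` helper are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

# Hypothetical review gaps, in days, for boxes 0..3. Real systems tune these.
INTERVALS = [1, 3, 7, 21]


def next_review(box: int, recalled: bool, today: date) -> Tuple[int, date]:
    """Return the concept's new box and the date of its next retrieval attempt."""
    # A successful recall promotes the concept (longer gap); a miss sends it back to box 0.
    new_box = min(box + 1, len(INTERVALS) - 1) if recalled else 0
    return new_box, today + timedelta(days=INTERVALS[new_box])


today = date(2025, 1, 6)
print(next_review(1, True, today))   # promoted to box 2, due 7 days later
print(next_review(2, False, today))  # back to box 0, due the next day
```

Weak concepts therefore come back quickly, while well-recalled ones are revisited at growing intervals.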

17
Classroom RCT

Interleaving beats blocked practice in math

Rohrer, Dedrick & Stershic · Journal of Educational Psychology · 2015 · n=126, grade 7

The experiment

Over three months, 126 seventh-graders did the same problems either blocked (one type at a time) or interleaved (types mixed, so they must choose which strategy applies), then took an unannounced test.

What they found

Interleaving won decisively, and the gap grew over time: 74% versus 42% correct on a test 30 days later (d = 0.79).

Why it matters — Mixing problem types — forcing the student to decide which method applies — is what builds real problem-solving, not just fluency.
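The difference between the two practice orders can be sketched in a few lines of Python. The problem types and labels below are made up for illustration, and the simple round-robin is an assumption; the study's own assignments are not reproduced here.

```python
from itertools import chain, zip_longest
from typing import Dict, List


def blocked(problem_sets: Dict[str, List[str]]) -> List[str]:
    """All problems of one type, then all problems of the next type."""
    return list(chain.from_iterable(problem_sets.values()))


def interleaved(problem_sets: Dict[str, List[str]]) -> List[str]:
    """Round-robin across types so that consecutive problems differ in type."""
    rounds = zip_longest(*problem_sets.values())
    return [p for rnd in rounds for p in rnd if p is not None]


# Hypothetical problem bank with three types of unequal size.
sets = {
    "slope": ["S1", "S2", "S3"],
    "proportion": ["P1", "P2", "P3"],
    "inequality": ["I1", "I2"],
}
print(blocked(sets))      # ['S1', 'S2', 'S3', 'P1', 'P2', 'P3', 'I1', 'I2']
print(interleaved(sets))  # ['S1', 'P1', 'I1', 'S2', 'P2', 'I2', 'S3', 'P3']
```

Both orders contain the same problems; only in the interleaved one does the student have to work out which method each problem calls for. A round-robin is predictable, so a shuffled order may be closer to real mixed practice.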

18
RCT

Intensive online tutoring works at scale

Gortazar, Hupkau & Roldán-Monés · Journal of Public Economics · 2024 · RCT

The experiment

A randomized trial of a fully online, small-group (two students per tutor) math tutoring program for disadvantaged secondary students, run intensively over eight weeks.

What they found

It raised test scores by 0.26 SD and end-of-year grades by 0.49 SD, and cut the odds of repeating the year, all at a fraction of the cost of one-to-one tutoring.

Why it matters — High-dosage tutoring's gains can survive going online and small-group — which is how you reach students at scale.

19
Meta-analysis

Active learning beats lecturing in STEM

Freeman et al. · PNAS · 2014 · meta-analysis, 225 studies

The experiment

A landmark meta-analysis of 225 studies comparing active learning with traditional lecturing across undergraduate science, engineering, and mathematics.

What they found

Active learning raised exam scores by about 0.47 SD, and students in lecture sections were roughly 1.5× more likely to fail the course.

Why it matters — Passive intake loses to active work — the student has to do something, not just watch.
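Several entries above report standardised effect sizes (g = 0.36, 0.47 SD, d = 0.79). One rough way to read them is Cohen's U3: the share of the comparison group that scores below the average student in the treated group. The Python sketch below assumes normally distributed scores with equal spread in both groups, which the papers do not guarantee; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_of_average_treated_student(effect_size: float) -> float:
    """Cohen's U3: fraction of the comparison group below the treated-group mean."""
    return NormalDist().cdf(effect_size)


# Effect sizes quoted in the summaries above.
reported = [
    ("productive failure, g", 0.36),
    ("active learning, SD", 0.47),
    ("interleaving, d", 0.79),
]
for label, size in reported:
    share = percentile_of_average_treated_student(size)
    print(f"{label} = {size}: {share:.0%}")
# productive failure, g = 0.36: 64%
# active learning, SD = 0.47: 68%
# interleaving, d = 0.79: 79%
```

On this reading, an effect of 0.79 puts the average interleaving student ahead of roughly 79% of the blocked-practice group, against 50% if there were no effect.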

The system the evidence points to

Read together, the threads converge on one design — not “AI school,” not more screens. A system that makes students attempt, retrieve, reason, write and draw their working, get step-level feedback, repair their mistakes, and revisit weak concepts over time.

That is the loop MathXP is built around. Each decision traces back to a finding above.

Attempt before you're taught
Problem-solving before instruction — productive failure — can beat instruct-first. [15]
Solve on paper, by hand
Handwriting recruits broad sensorimotor and memory networks, and shapes the reading brain itself. [01–03 · 05–07]
Show the full working
Self-explanation — making each step explicit — turns an example into transferable understanding. [08 · 09 · 10]
Mix the problem types
Interleaving forces the student to choose a strategy, and it sticks far better than blocked practice. [17]
Feedback on the reasoning, step by step
Explaining and comparing methods, and verifying each step, are what improve learning. [09 · 11 · 12 · 14]
Revisit weak concepts over time
Spaced retrieval practice is among the most reliable ways to make learning durable. [16]
AI that tutors, not just answers
A well-designed, constrained tutor that checks the working can outperform active learning. [13 · 14]
Keep it active, at the right level
Intensive tutoring and active work beat passive intake — and reach students at scale. [18 · 19]
All papers are linked to their DOI; open-access full text is linked where available. Summaries were verified against the primary sources (publisher, PubMed/PMC, or ACL Anthology). Questions, or a paper we should read? research@mathxp.app.