Nineteen peer-reviewed papers on handwriting, the brain, AI tutoring, and how students actually learn maths — summarised, linked, and sourced.
19 papers · 5 research threads · Sources linked
Our product makes a claim — that doing maths by hand, showing the working, and getting feedback on the reasoning teaches better than tapping at a screen. Here is the research behind it.
We've summarised nineteen peer-reviewed papers across five threads: what each one actually did, what it found, and why it matters for a pen-and-paper maths app with AI feedback. Every entry links to the source so you can read it yourself.
01
Handwriting, typing & the learning brain
What happens in the brain when you write by hand instead of typing.
01
EEG study
Only three fingers write, but the whole brain works
van der Meer & van der Weel · Frontiers in Psychology · 2017 · n=17 adults
The experiment
High-density EEG (256 channels) recorded 17 young adults as they drew pictures with a stylus, typed words on a keyboard, and described them aloud — comparing the brain activity each task produced.
What they found
Drawing produced theta- and alpha-band desynchronization across parietal and occipital regions — a pattern tied to memory encoding and learning. Typing produced no comparable engagement.
Why it matters —
Producing content by hand recruits far more of the brain than tapping a key, and the extra activity is the kind associated with memory encoding and learning.
Cursive handwriting over typing — in children and adults
Ose Askvik, van der Weel & van der Meer · Frontiers in Psychology · 2020 · 12 adults + 12 children
The experiment
A 256-channel EEG study compared cursive handwriting with a digital pen, typing, and drawing in 12 university students and 12 seventh-graders, analysing the first seconds of each task.
What they found
Handwriting and drawing elicited theta-band synchronization in parietal and central regions — linked to memory and encoding — that typing did not. Children showed the same direction of effect as adults.
Why it matters —
The fine-motor control of forming letters by hand drives the brain rhythms tied to learning — in classroom-age children, too.
van der Weel & van der Meer · Frontiers in Psychology · 2024 · n=36 adults
The experiment
256-channel EEG measured brain connectivity in 36 university students as they handwrote words with a digital pen versus typed them with one finger.
What they found
Handwriting produced markedly more widespread theta/alpha coherence across parietal and central regions — the kind of connectivity associated with working memory and encoding — than typing.
Why it matters —
The coordinated visual-motor act of writing wires together the very networks the brain uses to learn.
The experiment
Across three experiments, college students took notes by longhand or laptop during lectures, then were tested on factual recall and conceptual application.
What they found
Laptop note-takers transcribed more words verbatim and did worse on conceptual questions; longhand writers, forced to summarise, understood more. Even telling laptop users not to transcribe didn't help.
Why it matters —
Writing by hand forces the selective, generative processing that turns notes into understanding — the opposite of mindless transcription.
The experiment
Preschool children were trained over three weeks to copy letters either by hand or on a keyboard, then tested on how well they recognised those letters.
What they found
Handwriting training led to better letter recognition than typing — the advantage clearest in the older preschoolers, who built richer motor-perceptual representations of each form.
Why it matters —
The motor act of forming a letter, not merely seeing it, strengthens how the brain comes to recognise it.
James & Engelhardt · Trends in Neuroscience and Education · 2012 · n=15 children
The experiment
Pre-literate children (~4–5 yrs) practised letters by free-form printing, typing, or tracing, then viewed letters during fMRI.
What they found
The brain's “reading circuit” — left fusiform gyrus, inferior frontal gyrus, posterior parietal cortex — was recruited during letter perception only after free-form printing, not after typing or tracing.
Why it matters —
Self-generated handwriting, with all its messy variability, is what switches on the network the brain later reads with.
The experiment
Adults learned novel letter-like symbols by writing them, typing them, or only viewing them; fMRI then measured the brain's response to those symbols.
What they found
Only after writing practice did the brain's response to the new symbols come to resemble its response to real letters, notably in the left fusiform gyrus. Typing and viewing did not produce this shift.
Why it matters —
Writing a form by hand causally reshapes how the visual brain perceives it — production teaches perception.
The experiment
Students studied worked physics examples while thinking aloud; their explanations were analysed in detail and related to later problem-solving success.
What they found
Stronger learners spontaneously generated “self-explanations” that linked each step to underlying principles and monitored their own understanding; weaker learners mostly re-read the examples.
Why it matters —
Explaining your own steps — exactly what showing the working forces — is what converts a worked example into transferable understanding.
Self-explanation plus instruction improves transfer
Rittle-Johnson · Child Development · 2006 · n=85, grades 3–5
The experiment
Children learning mathematical equivalence were assigned to conditions that crossed prompted self-explanation (present or absent) with instruction type, then tested on retention and on transfer to novel problems.
What they found
Prompted self-explanation improved transfer across conditions, and worked best combined with direct instruction.
Why it matters —
Prompting a student to explain why drives them to generalise, not just repeat a procedure.
The experiment
A meta-analysis synthesising experimental studies of prompted self-explanation in mathematics across procedural knowledge, conceptual knowledge, and procedural transfer.
What they found
Prompted self-explanation gave small-to-moderate gains on all three immediate outcomes, strongest when explanations were scaffolded toward quality.
Why it matters —
Across many studies, asking students to explain their reasoning reliably helps — especially when the prompts are well designed.
A constrained, research-based tutor that checks the working — not a raw chatbot — is what moves learning.
13
RCT
A well-designed AI tutor beat active learning
Kestin et al. · Scientific Reports · 2025 · RCT, N=316, college physics
The experiment
In a real Harvard physics course, 316 students were randomised to a custom, pedagogically constrained AI tutor or a high-quality in-class active-learning lesson on the same topics.
What they found
The AI-tutored group learned more than twice as much, in less time, and reported higher engagement and motivation (post-test medians 4.5 vs 3.5; p < 10⁻⁸). It was engineered with sequencing, scaffolding, and feedback — not raw ChatGPT.
Why it matters —
Designed well (with sequencing, scaffolding, and guardrails), an AI tutor can outperform even active learning. The design is the product.
The experiment
The team built a dataset of 1,002 teacher-annotated math solutions, each marked at the first wrong step, and a tutoring pipeline that verifies the student's working before responding.
What they found
Grounding the tutor in a step-verifier produced more targeted feedback — more often correct, with fewer hallucinations than strong baselines — by locating the specific step where reasoning broke.
Why it matters —
The valuable job isn't solving the problem for the student — it's inspecting their reasoning and finding exactly where it broke.
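To make "find the first wrong step" concrete, here is a deliberately simplified sketch. It is not the study's verifier, which works on real student solutions; it assumes, hypothetically, that each line of working is a plain integer arithmetic expression that should keep the same value as the line before it. The first line whose value changes is where the reasoning broke. The names `evaluate` and `first_wrong_step` are illustrative only.

```python
from __future__ import annotations

import ast
import operator
from fractions import Fraction

# Operators this toy evaluator accepts. Assumption: each step is plain
# arithmetic on integers, with no variables, functions, or decimals.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> Fraction:
    """Evaluate an arithmetic expression exactly, using Fractions."""

    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported syntax in {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def first_wrong_step(steps: list[str]) -> int | None:
    """Return the index of the first step whose value differs from the
    step before it, or None if the whole chain is consistent."""
    for i in range(1, len(steps)):
        if evaluate(steps[i]) != evaluate(steps[i - 1]):
            return i
    return None
```

For example, `first_wrong_step(["3 * (4 + 2) - 5", "3 * 6 - 5", "18 - 5", "12"])` returns `3`: the first three lines all equal 13, and the slip happens in the last one. Exact `Fraction` arithmetic means lines such as `1 / 2 + 1 / 4` and `3 / 4` compare as equal without floating-point error.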
The durable, well-replicated learning science the loop is built on — struggle, retrieval, mixing, and tutoring.
15
Meta-analysis
Productive failure: struggle first, instruct second
Sinha & Kapur · Review of Educational Research · 2021 · meta-analysis, 53 studies
The experiment
A meta-analysis of 53 studies (166 comparisons) testing whether having students attempt a hard problem before instruction beats the usual instruct-first order.
What they found
Problem-solving-before-instruction won overall (g = 0.36), and more strongly when it followed “productive failure” design principles.
Why it matters —
Letting students attempt and fail first, then organising what they discovered, can teach more than explaining up front.
The experiment
A review synthesising a century of evidence on two strategies: spacing study out over time, and retrieval practice (actively recalling material rather than re-reading it).
What they found
Both are among the most robust, broadly applicable ways to make learning durable — and both are badly underused, because students' own intuitions mislead them about what actually works.
Why it matters —
Revisiting weak concepts at spaced intervals, and recalling them rather than re-reading them, is one of the surest routes to retention.
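One common way to turn "revisit weak concepts at spaced intervals" into a schedule is an expanding-interval rule in the style of a Leitner box. The sketch below is an assumption for illustration, not a method from the review: the doubling rule, the one-day reset, and the `Concept` and `due_today` names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Concept:
    """A concept the student is practising, with its review schedule."""

    name: str
    interval_days: int = 1  # current gap between reviews, in days
    due: date = date.min    # date.min means "due immediately"

    def record_attempt(self, recalled: bool, today: date) -> None:
        """Update the schedule after a retrieval attempt on `today`.

        Hypothetical rule: a successful recall doubles the gap; a failed
        recall resets it to one day, so weak concepts come back soon.
        """
        self.interval_days = self.interval_days * 2 if recalled else 1
        self.due = today + timedelta(days=self.interval_days)


def due_today(concepts: list[Concept], today: date) -> list[str]:
    """Names of concepts whose next review falls on or before `today`."""
    return [c.name for c in concepts if c.due <= today]
```

Under this rule a concept recalled correctly comes back in two days, then four, then eight, while a concept the student gets wrong is due again tomorrow, so weak spots are retrieved more often than secure ones.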
The experiment
Over three months, 126 seventh-graders did the same problems either blocked (one type at a time) or interleaved (types mixed, so they must choose which strategy applies), then took an unannounced test.
What they found
Interleaving won decisively, and the gap grew over time: 74% versus 42% correct on a test 30 days later (d = 0.79).
Why it matters —
Mixing problem types, so that the student must decide which method applies, is what builds real problem-solving, not just fluency.
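The difference between the two conditions is easiest to see as a problem ordering. In this minimal sketch (the function names and the round-robin rule are assumptions, not the study's materials), blocked practice lists every problem of one type before moving on, while interleaved practice rotates through the types, so consecutive problems usually call for different methods.

```python
from __future__ import annotations

from itertools import chain, zip_longest


def blocked(problems: dict[str, list[str]]) -> list[str]:
    """Every problem of one type, then every problem of the next type."""
    return list(chain.from_iterable(problems.values()))


def interleaved(problems: dict[str, list[str]]) -> list[str]:
    """Round-robin across types, so the type changes from one problem
    to the next and the student has to pick the strategy each time."""
    rounds = zip_longest(*problems.values())  # pads shorter lists with None
    return [p for rnd in rounds for p in rnd if p is not None]
```

With `{"slope": ["s1", "s2"], "area": ["a1", "a2"]}`, `blocked` gives `["s1", "s2", "a1", "a2"]` and `interleaved` gives `["s1", "a1", "s2", "a2"]`. A real schedule might shuffle rather than rotate; the point is only that the same problems appear in an order that forces a choice of method.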
Gortazar, Hupkau & Roldán-Monés · Journal of Public Economics · 2024 · RCT
The experiment
A randomised trial of a fully online, small-group math tutoring programme (two students per tutor) for disadvantaged secondary students, run intensively over eight weeks.
What they found
It raised test scores by 0.26 SD and end-of-year grades by 0.49 SD, and cut the odds of repeating the year, at a fraction of the cost of one-to-one tutoring.
Why it matters —
High-dosage tutoring's gains can survive going online and small-group — which is how you reach students at scale.
The experiment
A landmark meta-analysis of 225 studies comparing active learning with traditional lecturing across undergraduate science, engineering, and mathematics.
What they found
Active learning raised exam scores by about 0.47 SD, and students in lecture sections were roughly 1.5× more likely to fail the course.
Why it matters —
Passive intake loses to active work: the student has to do something, not just watch.
Read together, the threads converge on one design — not “AI school,” not more screens.
A system that makes students attempt, retrieve, reason, write and draw their working, get step-level feedback, repair their mistakes, and revisit weak concepts over time.
That is the loop MathXP is built around. Each decision traces back to a finding above.
Attempt before you're taught
Problem-solving before instruction — productive failure — can beat instruct-first.
[15]
Solve on paper, by hand
Handwriting recruits broad sensorimotor and memory networks, and shapes the reading brain itself.
[01–03 · 05–07]
Show the full working
Self-explanation — making each step explicit — turns an example into transferable understanding.
[08 · 09 · 10]
Mix the problem types
Interleaving forces the student to choose a strategy, and it sticks far better than blocked practice.
[17]
Feedback on the reasoning, step by step
Explaining and comparing methods — and verifying each step — is what improves learning.
[09 · 11 · 12 · 14]
Revisit weak concepts over time
Spaced retrieval practice is among the most reliable ways to make learning durable.
[16]
AI that tutors, not just answers
A well-designed, constrained tutor that checks the working can outperform active learning.
[13 · 14]
Keep it active, at the right level
Intensive tutoring and active work beat passive intake — and reach students at scale.
[18 · 19]
All papers are linked to their DOI; open-access full text is linked where available. Summaries were verified against the primary sources (publisher, PubMed/PMC, or ACL Anthology). Questions, or a paper we should read? research@mathxp.app.