
The forgetting curve is the math behind your make-or-buy decision for knowledge work

Humans forget 58% of new information in 20 minutes, 75% in a day, 90% in a week. Ebbinghaus measured this in 1885 and Murre replicated it cleanly in 2015. The forgetting curve is the cognitive-science substrate that decides which retention-critical knowledge work AI can structurally replace at a mid-size company.

The short version

Humans forget the bulk of new information inside a week. AI does not. That gap is the structural argument for replacing the retention-critical band of knowledge work, while keeping humans on everything that needs empathy, body language, or judgment they cannot articulate.

  • Hermann Ebbinghaus measured the forgetting curve in 1885. Jaap Murre and Joeri Dros replicated it cleanly in 2015 in PLOS ONE.
  • SwipeGuide's 40/35/25 model splits operational knowledge into documented, informal, and tacit. AI now extends into the 35% informal band that Tallyfy and SOP software cannot reach.
  • Mid-size companies, 50 to 500 employees, feel this curve hardest. Enterprises have redundancy. Sole proprietors have no institution to lose.
  • Three questions before any new hire: how retention-critical, how relational, how tacit. The math tells you which roles to design around AI.

You hire someone. Within twenty minutes they have lost 58% of what you taught them in onboarding. Within a day, they have lost 75%. Within a week, 90%.

Don’t blame the hires. This is the forgetting curve, measured first by Hermann Ebbinghaus in 1885 and replicated cleanly in 2015 by Jaap Murre and Joeri Dros at the University of Amsterdam. The replication ran in PLOS ONE and confirmed what every learning-and-development vendor has known for decades. The vendors just refuse to draw the structural conclusion.

The conclusion: for any role where retention of nuance is the job, biology is the bottleneck. AI is the only substrate without a forgetting curve.

That doesn’t mean fire everyone. Empathy, body language, the lie detection your CRM can’t do, those still need a human. But the layer of knowledge work that depends on remembering edge cases, recent decisions, last quarter’s exceptions, the contractor who burned you in 2022? That layer is fighting biology, and biology hasn’t been upgraded since Ebbinghaus’s nonsense syllables.

What Ebbinghaus measured, and what every replication has confirmed

The original experiment was crude, and the crudeness was the point. Ebbinghaus learned lists of nonsense syllables, three-letter strings like ZOF or REW with no semantic anchor, and then tracked how long it took to relearn each list at intervals of 20 minutes, an hour, a day, a week. He was his own subject for over a year of relearning sessions because any other subject would have introduced personality variables, and any real word would have triggered some learners to retain better through pattern-matching. Nonsense syllables were the only stimulus that isolated raw memory from clever recall. The metric he invented was the savings score: the share of the original learning time saved when you relearn. A savings score of 1.0 means you remembered everything and needed zero relearning time. A score near zero means you had to learn it from scratch, as if you had never seen the material before.
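The savings metric reduces to one ratio. Here is a minimal Python sketch, assuming learning time is measured in minutes. Ebbinghaus's own bookkeeping differed in detail, and the function name and the clamp-at-zero rule are illustrative choices, not his.

```python
def savings_score(original_minutes: float, relearn_minutes: float) -> float:
    """Fraction of the original learning time saved when relearning.

    1.0 means no relearning was needed; 0.0 means it took as long as the first time.
    """
    if original_minutes <= 0:
        raise ValueError("original learning time must be positive")
    # Assumption: clamp at zero so a slower relearn reads as "learned from scratch".
    saved = max(original_minutes - relearn_minutes, 0.0)
    return saved / original_minutes


print(savings_score(20, 0))   # nothing to relearn
print(savings_score(20, 11))  # a bit under half the original effort saved
print(savings_score(20, 20))  # back to square one
```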

Run that 130 years forward and you have Murre and Dros at Amsterdam. Their 2015 replication, published in PLOS ONE, put one subject through 70 hours of learning and relearning sessions across 31 days. The savings scores landed at 0.472 after 20 minutes, 0.293 after a day, 0.078 after a week, and 0.041 after 31 days. Same shape as Ebbinghaus. Same conclusion.

That is the academic version.

The corporate L&D industry quotes it as 58% lost in 20 minutes, 75% lost in a day, 90% lost in a week. Those numbers come from rephrasing the savings score as a forgetting score and rounding generously for the deck: one minus the Murre and Dros scores gives roughly 53%, 71%, and 92%. They’re not wrong, exactly. They’re a popularization. The shape is right. The methodology is what makes it citeable.
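Converting savings to forgetting is a single subtraction. This short Python sketch applies it to the Murre and Dros savings scores quoted above and prints the deck figures alongside, so the size of the rounding is visible. The dictionary and function names are illustrative.

```python
def forgetting_percent(savings: float) -> float:
    """Rephrase a savings score (0.0 to 1.0) as the percentage 'forgotten'."""
    return (1.0 - savings) * 100.0


# Savings scores as reported above for Murre and Dros (2015)
murre_dros_savings = {"20 minutes": 0.472, "1 day": 0.293, "1 week": 0.078, "31 days": 0.041}
# The popularized L&D figures, which stop at one week
deck_figures = {"20 minutes": 58, "1 day": 75, "1 week": 90}

for interval, savings in murre_dros_savings.items():
    deck = deck_figures.get(interval)
    deck_note = f" (deck says {deck}%)" if deck is not None else ""
    print(f"{interval}: {forgetting_percent(savings):.1f}% forgotten{deck_note}")
```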

Turns out the curve is stubborn.

Murre published a follow-up in 2022 defending the savings method as a “pure” measure of memory because it isolates what’s stored from what can be cued and retrieved. The point matters: the curve isn’t an artifact of how you test recall. It’s structural to how the human brain decays a memory trace.

None of this is news in academic psychology. What’s new is that we now have an alternative substrate.

The 40/35/25 stack, and which layer AI now extends into

SwipeGuide, which makes operating-procedure software for manufacturing floors, has a useful model for tribal knowledge: 40% of operational knowledge in a typical company is documented (SOPs, manuals, training decks); 35% is informal (judgment, exceptions, “we don’t ship to that vendor on Fridays”); 25% is irreducibly tacit (intuition, pattern recognition built up over years).

Three-layer tribal-knowledge stack: 40% documented (Tallyfy), 35% informal (AI), 25% tacit (humans)

The 40% layer is what Tallyfy has been doing for the last 11 years. Workflow steps, approval gates, conditional logic, version-controlled procedures, the things you can write down once and reapply with discipline are what make this layer software-eatable. What makes a piece of operational knowledge belong here is the existence of a stable input-output mapping: given this trigger, do that thing, escalate to this person, finish in this state. Anything you can teach a new hire by handing them a binder lives in this band, and software has been eating that work for over a decade now. Tallyfy’s own tribal-knowledge piece cites the cost: 42% of departing veterans’ work cannot be covered by their replacements, with a $31.5B annual hit across the Fortune 500. Documentable knowledge sits in this band because the boundary of the band is exactly the boundary of what software can encode without judgment. Anything fuzzier has historically been left in someone’s head.
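The "given this trigger, do that thing" test can be made concrete as a lookup table. This is a minimal Python sketch, not Tallyfy's data model or API, and the trigger names, owners, and states are hypothetical. What it illustrates is the band boundary: a trigger with a stable mapping resolves to a documented step, and a trigger without one returns nothing and falls back to someone's head.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcedureStep:
    """One documented rule: what to do, who to escalate to, and the state it ends in."""
    action: str
    escalate_to: str
    end_state: str


# Hypothetical playbook: every key is a trigger with a stable input-output mapping.
PLAYBOOK = {
    "invoice_over_limit": ProcedureStep("hold payment", "finance_lead", "awaiting_approval"),
    "new_vendor_request": ProcedureStep("send onboarding form", "procurement", "vendor_pending"),
}


def route(trigger: str) -> Optional[ProcedureStep]:
    """Return the documented step, or None when no written rule covers the trigger."""
    return PLAYBOOK.get(trigger)


print(route("invoice_over_limit"))
# The "we don't ship to that vendor on Fridays" exception was never written down:
print(route("friday_shipment_to_vendor"))
```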

The 35% layer is where AI now extends. Judgment, exceptions, edge cases, the stuff that lives in someone’s head because it was never important enough to write down but is critical for any non-trivial decision. This is the band most affected by the forgetting curve. It is also the band most easily transferred to a Claude project or a custom GPT or whatever your stack is. The popular pitch of “AI replaces everyone” is rubbish; the real argument is narrower and more useful.

The 25% layer is Polanyi’s paradox territory. Michael Polanyi observed in 1958 that “we can know more than we can tell.” Reading body language across a table. Knowing when a vendor is bluffing. The intuitive pattern-match a senior surgeon does in three seconds that takes the resident three minutes. The Brookings Institution has been clear about this layer: generative AI cannot capture knowledge that the holder cannot articulate.

That is the scope cut. This post isn’t claiming total AI replacement of knowledge workers. The scope is the retention-critical middle band specifically. The 25% layer is still yours. The 40% layer is workflow software. The argument is about the middle.

Why mid-size companies feel this curve hardest

Enterprises have L&D budgets and role redundancy. When the lead Salesforce admin leaves, there are two more admins who know roughly the same things. Sole proprietors have no institutional memory to lose, because there was no institution. Mid-size, call it 50 to 500 employees, is the band where the forgetting curve costs the most relative to revenue.

Here is what that looks like. A 200-person ops team has maybe 8 to 10 senior operators who carry the 35% informal layer. They each hold a portfolio of edge cases, vendor quirks, customer history, and “we tried that in 2021 and here is what happened.” None of it is in the wiki. When one of them leaves, and turnover at this band runs at maybe 15% a year, the new hire spends 6 to 12 months relearning what the team already knew.

That’s the math.
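The arithmetic fits in a few lines. This minimal Python sketch uses the ranges above: 8 to 10 senior operators, roughly 15% annual turnover, and a 6 to 12 month relearning ramp. It assumes every ramp month is lost entirely to relearning, which overstates a partial ramp, and the function name is illustrative.

```python
def annual_relearning_months(senior_operators: int, turnover_rate: float, ramp_months: float) -> float:
    """Expected person-months per year spent relearning what the team already knew.

    Assumes each departure is replaced and the replacement spends the whole ramp relearning.
    """
    expected_departures = senior_operators * turnover_rate
    return expected_departures * ramp_months


for operators in (8, 10):
    for ramp in (6, 12):
        months = annual_relearning_months(operators, 0.15, ramp)
        print(f"{operators} operators, {ramp}-month ramp: {months:.1f} person-months/year")
```

Under those assumptions the team loses roughly 1.2 to 1.5 senior operators a year. It then spends about 7 to 18 person-months annually re-acquiring knowledge it already had.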

In advisory work with mid-size companies, this shows up the same way every time. The CEO knows there’s “tribal knowledge” but can’t name it. The HR director knows turnover is expensive but quotes the recruiter fee and the productivity ramp, not the forgetting cost. The forgetting cost is the bigger number. It’s just invisible, because nobody measures relearning. What compounds it is the post-hire onboarding kludge: up to 12 months of relearning, repeated for each of the 1 or 2 departures a year in a 30-person operations function. Mid-size operators don’t have the L&D budget and role redundancy an enterprise carries to absorb this loss, and they don’t have the founder-attention slack a 10-person startup uses to paper over it. They sit in the band where the curve bites and the slack is gone. That is why I keep landing on the same recommendation: design the role around the substrate that doesn’t forget, and put the human on the parts the substrate can’t do.

Maybe I’m wrong here. But every consulting engagement I run lands in the same place, and the pattern is what convinced me to write Tallyfy in the first place.

Daniel Miessler argued recently that AI will replace knowledge work because the “articulation gap” closes: every time a human articulates expertise to an AI, the AI keeps it forever. His argument is spot on, just from a different angle. He skips the empirical curve and lands in the same place. The forgetting curve is the math underneath his intuition.

What this changes about the next hire

Three questions to ask before opening a new requisition. They take ten minutes and they reframe the entire posting.

First: what fraction of the role is retention-critical? Meaning, the value depends on remembering rules, precedents, recent decisions, edge cases. If this number is above 70%, you are hiring against biology. You will spend the onboarding cost, then watch the value decay every quarter, then repeat.

Second: what fraction of the role is relational? Meaning, the value depends on empathy, body language, lie detection, persuasion, negotiation in a physical room. If this number is above 50%, the hire is correct. AI is a productivity layer for this person. It isn’t a replacement.

Third: what fraction of the role is irreducibly tacit? Judgment built up over years that the practitioner cannot articulate. This is the Polanyi band. AI cannot touch it. Neither can a job description. You hire and accept that the training period is long.

Worked example. You’re replacing a junior contract reviewer who handles 60 SaaS renewals and vendor agreements a quarter. Q1: retention-critical, maybe 85%. Q2: relational, maybe 5% (most of the work is reading documents, not negotiating in a room). Q3: tacit, maybe 10%. This is a role to redesign around an AI workflow with a human at the verification step. Not a role to fill at full cost.

Different example. You’re replacing a head of customer success at a 250-person SaaS company. Q1: retention-critical, maybe 25%. Q2: relational, maybe 65% (the job is reading customer emotional signals, retaining executive relationships, knowing when to escalate). Q3: tacit, maybe 10%. This is the wrong role to AI-replace. AI here is a tool the head of CS uses, not the head of CS. The actual work is sitting in a room with a frustrated VP of revenue who has just lost an executive sponsor at his largest customer, reading the body language of his second-most-senior account manager, and deciding whether to escalate to the CEO right now or after dinner. None of that compresses into a Claude project. All of it depends on a human in the room. The pattern that keeps showing up across hiring conversations is teams over-applying the AI substitution to roles where the value is relational, and under-applying it to roles where the value is recall.
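The three questions can be written as a small screen. In this minimal Python sketch, the 70% retention and 50% relational thresholds come from the questions above. The 50% tacit threshold, the order of the checks, and the fallback verdict are assumptions the post does not specify. Shares are fractions of the role, so the two worked examples become (0.85, 0.05, 0.10) and (0.25, 0.65, 0.10).

```python
from dataclasses import dataclass


@dataclass
class RoleProfile:
    retention_critical: float  # Q1: value from remembering rules, precedents, edge cases
    relational: float          # Q2: value from empathy, body language, negotiation in a room
    tacit: float               # Q3: value from judgment the practitioner cannot articulate


def screen(role: RoleProfile) -> str:
    # Relational value is what AI cannot do, so it is checked first (assumed ordering).
    if role.relational > 0.50:
        return "hire a human; AI is a productivity layer"
    # Assumed threshold: the post names the tacit band but gives no cutoff.
    if role.tacit > 0.50:
        return "hire a human and accept a long training period"
    if role.retention_critical > 0.70:
        return "redesign around an AI workflow with a human at verification"
    # The post gives no rule below both thresholds; flag it rather than guess.
    return "no clear signal; decide case by case"


contract_reviewer = RoleProfile(retention_critical=0.85, relational=0.05, tacit=0.10)
head_of_cs = RoleProfile(retention_critical=0.25, relational=0.65, tacit=0.10)
print(screen(contract_reviewer))
print(screen(head_of_cs))
```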

The three-question screen does not tell you to hire fewer humans. It tells you which roles to design around AI and which to design around a person. Same headcount question, more precise answer.

Limits and counter-arguments worth naming

Four objections matter.

The first: AI also forgets. Context windows drift. Models suffer catastrophic forgetting when fine-tuned. Long sessions lose early context. Sean Warman has written a sharp piece on this. The objection is real, but it is engineering, not biology. Anthropic’s Claude went from 100K context in 2023 to 200K in 2024 to 1M today. Human working memory has been seven plus-or-minus two items since George Miller measured it in 1956. The forgetting curve doesn’t move. The context window keeps moving.

The second: spaced repetition. The whole reason the L&D industry quotes Ebbinghaus is to sell adaptive learning. Spaced repetition flattens the curve. It is real. Anki users, medical students, language learners all benefit from it. The catch is that spaced repetition requires the human to keep showing up, to do the reviews, to engage with the prompts, to keep the schedule. The curve is flatter with spaced repetition. It is never gone. And in the band of busy operators at a mid-size company, who is doing daily Anki reviews of their vendor lists? Nobody. Next question.
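The "flatter, never gone" claim can be illustrated with a toy model. This minimal Python sketch assumes exponential forgetting, R = exp(-t / S), where t is days since the last review and S is a memory stability that each review multiplies by a fixed factor. The starting stability, the growth factor, and the review schedule are hypothetical, and this is not how Anki or any real scheduler computes intervals.

```python
import math


def retention_at(day: float, review_days: list[float], stability: float = 1.0, growth: float = 2.0) -> float:
    """Toy retention on `day` for material learned on day 0.

    Each review on or before `day` resets the clock and multiplies stability by `growth`.
    """
    last_review = 0.0
    for review in sorted(review_days):
        if review > day:
            break
        last_review = review
        stability *= growth
    return math.exp(-(day - last_review) / stability)


no_reviews = retention_at(30, [])
with_reviews = retention_at(30, [1, 3, 7, 15])
print(f"day 30, no reviews:   {no_reviews:.6f}")
print(f"day 30, four reviews: {with_reviews:.3f}")
```

In this toy model, four reviews leave about 39% retained at day 30, against effectively nothing without them. Retention still slides again as soon as the reviews stop.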

The third: Polanyi’s paradox is the strongest objection. Tacit knowledge, the surgeon’s intuition, the senior buyer’s gut for which vendor will renege, cannot be articulated and therefore cannot be transferred. Brookings is right about that 25% band. The post is conceding it explicitly. The argument isn’t for total AI replacement of knowledge workers. The argument is for AI replacement of the retention-critical middle band where biology is fighting itself.

The fourth objection worth naming briefly: the “AI is overhyped” reaction. I get it. Cycle after cycle of vendor pitches. Plenty of failed pilots. This isn’t a vendor pitch. It points at a 140-year-old cognitive-science finding and observes that we finally have a substrate where it doesn’t apply. Hype cycles are real, and a lot of the coverage of AI replacing whole job functions is wrong because most jobs are mostly relational, and the relational part is exactly what AI cannot do. The narrower claim, that AI can hold the retention-critical band of work that biology has been provably bad at since 1885, survives the hype-cycle skepticism because it rests on a measurement that has held up across replications for 140 years. The pattern of self-improving processes and Claude project knowledge bases is the practical version of that observation. The companion question of how to prompt Claude to do this kind of cross-domain analysis is covered in the workflow-versus-persona prompt argument: describe the work, not the worker.

None of this means hire fewer humans. It means hire humans for the part of the job biology is good at, the empathy and body language a CRM cannot capture, and stop hiring humans for the part of the job biology has been provably bad at since 1885. The next hire is correct. The next hire’s job description is what needs to change.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience, he is the Co-Founder & CEO of Tallyfy® (raised $3.6m, the Workflow Made Easy® platform) and Partner at Blue Sheen, an AI advisory firm for mid-size companies. He helps companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.
