Key Takeaways
- The five-year forecast: Mosaic Pediatric Therapy’s Dr. David Cox predicts every ABA company will be using bespoke in-house software within five years, driven by agentic coding tools that let non-engineers ship working applications quickly.
- The working demonstration: Mosaic’s experimental at-risk report uses six-month assessment scores and information about service delivery to predict which children are unlikely to progress, routing flagged cases to a training and compliance team that meets with BCBAs weekly.
- The numbers behind the caution: Cox cites published research showing hallucination rates ranging from roughly 1.5% to over 60% and bias rates from under 5% to as high as 80 or 90% in clinical AI applications, noting that no published study has yet found a rate of zero.
- The structural argument: Clinical-grade AI is more likely to come from operators than from technology vendors because operators can take experimental risk that vendor business models, which require broad market appeal, cannot.
- The path forward: Cox and behavior scientist Ryan O’Donnell publish Project Chiron weekly. Last week, Cox also built and stood up a new resource site intended specifically for clinicians and tech builders constructing AI-driven systems in ABA and behavioral health.
Five years from now, Dr. David J. Cox expects the typical applied behavior analysis company to be running on software it built itself. Bespoke tools, custom dashboards, in-house automations: not because providers will have hired engineering teams, but because they will not have had to.
“The accessibility of building this stuff now is easier than ever,” Cox said in an interview from the CASP conference in Las Vegas, where he was attending sessions before flying back to Mosaic Pediatric Therapy, the multi-state ABA provider where he leads research and data science. “Anybody that’s reading this that thinks there’s no way my company or someone, without a technical background, could build some kind of software, I would encourage them strongly to go check out Claude Code or Codex. They could build an app today in three hours, if they really wanted to.”
Cox is not selling anything. He holds a Ph.D. in behavior analysis, an M.S. in bioethics, and post-doctoral training in data science and behavioral pharmacology, and is also Associate Director of Research for the Doctoral Program at Endicott College’s Institute for Applied Behavioral Science. He has spent the last twelve years at the intersection of clinical decision-making and machine learning, including a previous stint leading data science at RethinkFirst, the behavioral health technology company, before moving to Mosaic a little over a year ago. In January, the American Psychological Association’s Division 25 named him a recipient of the B. F. Skinner Foundation New Researcher Award for Basic Research, and the Society for the Advancement of Behavior Analysis named him its 2026 Science Translation Award recipient, citing his work on ethical AI in behavior analysis. His warning is more pointed than his enthusiasm.
“The last six months, I think, changed things,” he said. The change he means is not generic AI hype. It is the arrival of agentic coding tools (Cox cited Claude Code by name) that allow non-engineers to ship working software in an afternoon. “I feel that rapidly changed everything,” he said. “Five years from now, I’d be surprised if every ABA company is not building out kind of their own suite of software platforms or tools or things that they need to automate some of this data movement and analysis.”
What follows that prediction, in his telling, is a field of bespoke software built by people who do not yet know how to evaluate whether the software they have built is wrong.
Mosaic’s “At-Risk Report”: How an In-House AI Turns Six-Month ABA Assessment Data Into Daily Clinical Signal
Cox’s prediction is not theoretical. At Mosaic, he is already building what he is forecasting other operators will build. The clearest example he described in the interview is an internal system the team calls its at-risk report.
The infrastructure underneath it is the kind of work ABA organizations have been edging toward for years. Mosaic collects four standardized assessments at six-month intervals across its caseload, a practice the company started before Cox arrived. The data accumulates. “It’s nice to be able to see how kids are changing every six months,” he said. “But then the question is, what do you do with that information?”
The at-risk report is his answer. The system pulls a child’s six-month assessment scores, layers in how services were actually delivered (hours of contact, types of therapy, the specific clinical decisions that BCBAs made about goals and programs), and uses that combined picture to predict which kids are unlikely to show progress in the next reporting period. Flagged cases route to Mosaic’s training and compliance team, which meets weekly with BCBAs and can bring the list into those conversations. Cox stressed that the system is still experimental. He also stressed what made it possible: a measurement and analytics infrastructure deep enough that there was something to predict from.
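Mosaic has not published the internals of the at-risk report, but the shape of the pipeline Cox describes is straightforward to sketch. The following Python is a minimal, hypothetical illustration: the `ClientSnapshot` fields, the `risk_score` weighting, and the flagging threshold are all assumptions for the sake of example, not Mosaic’s actual model.

```python
from dataclasses import dataclass

# Hypothetical sketch of an at-risk flagging pipeline. Field names, the
# scoring rule, and the threshold are illustrative assumptions, not
# Mosaic's actual system, which has not been published.

@dataclass
class ClientSnapshot:
    client_id: str
    assessment_delta: float  # change in composite assessment score over six months
    authorized_hours: float  # weekly hours authorized in the treatment plan
    delivered_hours: float   # weekly hours actually delivered
    goals_modified: int      # program changes made by the BCBA this period

def risk_score(snap: ClientSnapshot) -> float:
    """Combine assessment trajectory with service-delivery signal.

    Higher scores mean the child looks less likely to progress next period.
    """
    # Flat or declining assessment scores drive the risk up.
    trajectory_risk = max(0.0, -snap.assessment_delta)
    # A large gap between authorized and delivered hours is a dosage signal.
    dosage_gap = max(0.0, snap.authorized_hours - snap.delivered_hours)
    # A program with no recent clinical adjustments may be stagnating.
    stagnation = 1.0 if snap.goals_modified == 0 else 0.0
    return trajectory_risk + 0.2 * dosage_gap + 0.5 * stagnation

def flag_for_review(caseload: list[ClientSnapshot], threshold: float = 1.0) -> list[str]:
    """Return client IDs to route to the weekly training-and-compliance meeting."""
    return [s.client_id for s in caseload if risk_score(s) >= threshold]

caseload = [
    ClientSnapshot("A-101", assessment_delta=-1.5, authorized_hours=25,
                   delivered_hours=12, goals_modified=0),
    ClientSnapshot("A-102", assessment_delta=+2.0, authorized_hours=20,
                   delivered_hours=19, goals_modified=3),
]
print(flag_for_review(caseload))  # ['A-101']
```

The design point the sketch captures is the one Cox emphasizes: the assessment trajectory alone updates only every six months, but the service-delivery features update continuously, which is what lets a six-month measurement cadence produce a flag a clinical team can act on weekly.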
That portfolio is the subject of a paper Cox co-authored with Jenn Godwin, Matthew R. Filer, and Callie Plattner, published in February in *Behavior Analysis in Practice*. The paper offers what they call a first-principles approach to choosing among the many standardized assessments available to ABA programs, and to building a measurement system that is feasible to collect, useful for comparison across BCBAs and clinics, and meaningful to payers who increasingly demand objective measurement of ABA outcomes, all without overburdening clinicians. What Mosaic is trying to do with the at-risk report is take that portfolio and push the signal down from the six-month reporting cadence into something a BCBA can act on weekly.
It is, in effect, the kind of operational AI that Cox said is conspicuously missing from the ABA technology market. “If you look at a lot of the products and stuff being sold in the ABA space that has AI, it’s very operations heavy and focused toward admin,” he said. “Very little is getting into that clinical component.” The bulk of the field’s AI investment, in other words, has gone into billing, scheduling, and revenue cycle workflows, not the harder problem of clinical decision support.
Why ABA Operators, Not Tech Vendors, Are Now Building the Most Cutting-Edge Clinical AI
That gap (admin AI plentiful, clinical AI scarce) is part of why Cox left RethinkFirst for Mosaic. The constraint, he argued, is structural rather than technical. Vendors have to build for broad appeal. “A vendor is a business, so they have to develop tools and products that have broad appeal to everybody,” Cox said. “Because of that, you see less cutting edge products that are pushed forward in terms of features important to individual organizations.” A clinical agency, in his telling, can run experiments a vendor cannot. “It’s one company, and so you can focus in on building products specific to their values,” he said. “Because it’s also internal, you can be a little bit riskier in things that you try to see if and how they fail because you’re not trying to sell a product.”
That logic (operators willing to invest in their own bespoke clinical infrastructure, vendors slower to follow) is the same dynamic Cox sees scaling across the field as agentic coding tools lower the build cost. The five-year prediction, in other words, is not just about technology getting easier. It is about which kind of organization, operator or vendor, has the incentive to use it.
AI Hallucination and Bias Rates in Clinical Settings: What the Published Research Shows
Cox’s case for in-house build is paired with an unusually specific case for caution. Asked about the risks of clinicians without engineering backgrounds shipping their own AI tools, he reached for the published research on large language model performance in healthcare contexts, and the numbers he cited are the kind of figures the industry rarely sees on a vendor slide.
“If you look at the research literature around these AIs’ performances, when they’ve tested it, it varies across the board; looking at just hallucinations, let’s say,” Cox said. “But the lowest rate of hallucinations I’ve seen is one and a half percent. The highest I’ve seen is over fifty percent, sixty percent. Same with bias assessments and things like that. Lowest is like less than five percent. Highest is eighty, ninety percent.”
“There’s yet to be a study published where it was zero,” he added.
The implication, in Cox’s framing, is that the bar for using these tools in clinical contexts is not whether they work. It is whether the operator using them has built the infrastructure to know when they are not. “It’s being pushed everywhere in Excel, Copilot, whatever,” he said. “Helping people set up the infrastructure to be able to start monitoring that stuff in the tools that they build seems critically important. And also just recognition that, you know, one out of 20 at a minimum of the things you get back is going to be inaccurate in some capacity. So how do you start training people to evaluate and find that stuff?” That monitoring discipline is something Acuity has previously identified as the most common gap in behavioral health AI adoption.
The distinction Cox kept returning to is between deterministic software and generative AI. With a deterministic system, a bug, once found, is fixed. With a generative system, every output carries the possibility of error. “Every output has to be monitored,” he said. “There’s no bug fix for those things. Every output carries that possibility.” The clinical stakes are not abstract: any false billing claim has legal ramifications, any incorrect documentation enters a child’s health record. “Everywhere we look, it’s easy enough to implement this stuff,” Cox said. “So how do we just solve for the new challenges that are there?”
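What that monitoring discipline might look like in practice is easier to show than to describe. The sketch below is a hypothetical illustration, not a real API: `draft_note()` stands in for whatever generative model an operator calls, and the audit log, deterministic `rule_flags` checks, `SAMPLE_RATE`, and review queue are assumed components. The pattern is simply that every output is logged, and a human sees any output that trips a rule plus a random sample of the rest.

```python
import datetime
import json
import random

# Minimal sketch of per-output monitoring for a generative tool. The
# generator, rules, and queues are stand-ins; a real deployment would log
# to durable storage and route flagged records to a clinical reviewer.

AUDIT_LOG = []
REVIEW_QUEUE = []
SAMPLE_RATE = 0.10  # review at least this share of outputs at random

def draft_note(prompt: str) -> str:
    # Placeholder for a generative model call; not a real API.
    return f"Draft session note for: {prompt}"

def rule_flags(text: str) -> list[str]:
    """Cheap deterministic checks that catch some errors before a human does."""
    flags = []
    if "denied" in text.lower() and "authorized" in text.lower():
        flags.append("possible billing contradiction")
    if len(text) < 40:
        flags.append("suspiciously short output")
    return flags

def monitored_generate(prompt: str) -> str:
    output = draft_note(prompt)
    record = {
        "ts": datetime.datetime.now().isoformat(),
        "prompt": prompt,
        "output": output,
        "flags": rule_flags(output),
    }
    AUDIT_LOG.append(record)  # every output is logged, no exceptions
    # Route to a human if any rule fired, or by random sample otherwise.
    if record["flags"] or random.random() < SAMPLE_RATE:
        REVIEW_QUEUE.append(record)
    return output

monitored_generate("client A-101, 2-hour session, parent training")
print(json.dumps(AUDIT_LOG[-1], indent=2))
```

The 10% sample rate is an arbitrary starting point; the value of logging everything is that the review queue itself produces an empirical error rate over time, which is the evidence Cox says operators need before claiming their tools work. If his one-in-twenty floor holds, a 10% random sample should start surfacing errors within the first few hundred outputs.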
Project Chiron and a New ABA AI Resource Site: Guardrails for Clinicians Building Their Own Software
Cox’s answer to that question is a resource project he and Ryan O’Donnell, a behavior scientist and filmmaker who co-founded The Behavior Academy and is now at Motivity, have been publishing since launching last summer under the name Project Chiron. The newsletter focuses on best practices at the intersection of behavior analysis and applied data science. Cox also recently submitted a paper to *Behavior Analysis in Practice* on the specific risks of large language models in ABA, and he presented related material at CoFABA 2026, the spring chapters event of the Florida Association for Behavior Analysis, on April 24.
It was conversations at CoFABA and other recent conferences, Cox said, that prompted him to spend last week building and standing up a new website, Responsible Clinical AI, aimed at clinicians and tech builders constructing AI-driven systems in ABA and behavioral health. “Over the last week, in fact, I built and stood up a website that’s supposed to be a resource guide that sits at the intersection of clinicians and tech builders trying to build AI driven systems, primarily for ABA, but behavioral health more broadly,” he said. The site offers reference guides and downloadable structures that builders can import into their own projects, intended to function as a guardrail kit for the kind of bespoke build Cox expects to proliferate.
“It’s certainly not perfect, and we’ll evolve,” he said. “But yeah, that’s kind of what led to it, was hearing people doing this without understanding potential risks on both sides.”
Augment, Don’t Replace: How ABA Operators Should Use AI Productivity Gains
The closing argument Cox made for how organizations should think about adopting these tools is one that has become a kind of consensus among practitioners working seriously on AI-augmented behavioral health care: the value is in augmentation, not replacement, and the productivity gains should be put back into better, higher quality work, not used as an excuse to do less of it.
“I remember it would take me six, eight hours to write a treatment reauthorization report,” Cox said, recalling his time as a clinician. “A lot of work. Now what used to take me eight hours can be done in say fifteen, thirty minutes if you have the right data system. That gets me seven and a half hours where I can go in and analyze the caseload as a whole at greater depth.”
What he was emphatic about is that the time back is not the point. “If it’s pure efficiency gain, do more in the same amount of time, that’s a different way to frame use of the technology than process improvement or quality improvement,” he said. “Don’t just take those eight hours, turn it to thirty minutes, and then go play golf. You want to still do stuff and use that time to do something that requires more human clinical insight.”
The opportunity, in his read, is in front of operators willing to do the harder work: build the measurement portfolio, build the predictive layer, build the monitoring infrastructure, and build it knowing that one in twenty outputs is going to be wrong. “My hunch,” Cox said, “is we’re gonna have all sorts of really interesting, creative solutions to challenges that we didn’t even know existed or have been intractable for so long, because we have more people doing more in this space.”
Frequently Asked Questions
What is Mosaic Pediatric Therapy’s at-risk report?
The at-risk report is an internal, experimental predictive system Mosaic uses to flag clients in its ABA programs who are unlikely to show progress in their next reporting period. It combines a child’s scores across four standardized assessments collected at six-month intervals with operational data about how services were actually delivered (hours of contact, the types of therapy provided, and the clinical decisions BCBAs made about goals and programs), and uses that combined picture to identify cases that warrant additional review. Flagged cases route to a training and compliance team that meets weekly with BCBAs. According to Dr. David Cox, who leads research and data science at Mosaic, the system is still experimental and depends on the organization having an underlying measurement portfolio deep enough to predict from.
Will every ABA company really need to build its own AI software?
Cox’s prediction is that within five years, most ABA organizations will be running on software they built themselves rather than purchasing from external vendors. The driver, in his framing, is the recent arrival of agentic coding tools (he cited Claude Code by name) that allow non-engineers to produce working software in hours rather than months. The implication is not that operators need engineering teams. It is that the build cost has fallen low enough that bespoke clinical software is now a reasonable in-house investment for many providers. The harder question, Cox argued, is whether providers building their own AI tools have the readiness signals and process discipline to evaluate whether what they have built is working.
What are the hallucination and bias rates for AI in clinical behavioral health?
Citing the published research literature on large language model performance in clinical contexts, Cox said hallucination rates range from approximately 1.5% at the low end to more than 60% at the high end, depending on the use case and how the system is built. Bias rates show similar variability, from under 5% to as high as 80 or 90%. He noted that no published study has yet found a rate of zero. The implication, in his framing, is that any operator deploying AI in clinical workflows must build the monitoring infrastructure required to detect those failures, because the failures will occur.
What is Project Chiron, and who runs it?
Project Chiron is a weekly newsletter on AI literacy for behavior analysts, co-authored by Dr. David J. Cox of Mosaic Pediatric Therapy and Ryan O’Donnell, a behavior scientist and filmmaker who co-founded The Behavior Academy and is now at Motivity. It launched in summer 2025 and publishes ongoing analysis of how applied behavior analysis professionals can develop the skills needed to work ethically and effectively with AI tools. Cox separately built and launched Responsible Clinical AI, a new resource site, last week aimed at clinicians and tech builders constructing AI-driven systems in ABA and behavioral health.
What infrastructure do ABA providers need before adopting AI?
Cox’s argument throughout the interview was that the threshold for AI adoption in ABA is not whether the technology works in any given case. It is whether the operator has built the infrastructure to know when it is not working. That includes a measurement portfolio deep enough to support meaningful predictions, monitoring systems to evaluate AI outputs over time, training for staff who will interact with the AI’s recommendations, and feedback loops that route detected failures back into system improvement. He also emphasized that any AI deployment in healthcare contexts has clinical and legal consequences that deterministic software does not, because every generative output carries the possibility of error. The operational readiness questions that determine whether a behavioral health technology investment succeeds are the same questions that determine whether an in-house AI build succeeds.