

## Verification Was Never the Competence Question. Architecture Is.

### THE TECHNOLOGY BLIND SPOT

Every state bar that has issued guidance on artificial intelligence has converged on a single instruction: verify the output. The American Bar Association made it formal in Opinion 512. More than three hundred federal judges have signed standing orders demanding AI disclosure and certification. Two years of continuing legal education have drilled the same rule into the profession. Check every citation the machine hands you.

The instruction has now been delivered in full. The problem it was meant to solve is accelerating.

Damien Charlotin, a research fellow at HEC Paris, maintains the public database of court filings caught containing fabricated AI-generated authorities. In June 2025 it held fewer than 200 cases. This month it holds more than 1,450. Charlotin describes the current pace as ten cases from ten different courts on a single day. On March 31, 2026, one legal commentator counted seventeen separate United States court decisions flagging suspected AI fabrications, all filed within a single twenty-four-hour window.

An analysis of the database by Stanford’s Riana Pfefferkorn found that roughly nine in ten of the sanctioned filings came from solo practitioners and small firms. A fourteen-attorney litigation practice is not the safe end of that distribution. It is the center of it.

The cure was administered at full dose. The patient got sicker.

### The Wrong Instruction

Lawyers did not ignore the rule. Solo practitioners, appellate specialists, and partners at the most prestigious firms in the country have all been sanctioned for the same failure, after the same warnings, holding the same instruction in hand.

The explanation is that “verify harder” was the wrong instruction. The competence question was never whether you verified. It is which architecture you verified.

### Why the Machine Cannot Stop

Start with what the tool actually does.
A large language model does not look up a case. It generates text by predicting the most statistically probable next word, given your prompt and everything it absorbed in training. When it produces a citation, it is not retrieving a record. It is composing a string that resembles one. The string can be formatted perfectly, attributed to a real court, and entirely invented.

Since 2024, three independent teams of computer scientists have proven that this is not a defect a future model will repair. Ziwei Xu, Sanjay Jain, and Mohan Kankanhalli published the first proof, using a method adapted from Georg Cantor’s work on infinity. For any computable language model, they showed, there exists a question the model cannot answer correctly and on which it will fabricate. The result does not depend on model size. It does not depend on the training data. It does not depend on better prompting. Two further proofs, arriving through separate branches of mathematics, reached the same floor. Fabrication is a permanent property of any system that generates text by prediction.

That is the premise the verification mandate skipped. The bar treated fabrication as a temporary engineering flaw, the kind of thing the next version patches. It is structural. It will be present in the version after next, and the version after that.

### Confidence Is Not a Signal

A careful reader will push back here. If the tool fabricates, the lawyer simply catches it. That is what verification is for.

The objection assumes the lawyer can tell which outputs to distrust. She cannot.

In 2025, a research team led by Adi Simhi documented a pattern they named CHOKE: certain hallucinations overriding known evidence. The model produces a confident, fluent, wrong answer to a question it can answer correctly under a slightly reworded prompt. The knowledge is in there. The fabrication is not a gap in what the model knows. It is a sampling accident, delivered with complete composure.
Across the models tested, between 16 and 43 percent of all fabrications fell into this high-confidence category.

That consequence lands directly on the managing partner’s desk. The natural triage instinct, scrutinize the answers that sound shaky and trust the ones that sound certain, is precisely backward. The fabrications that sound most authoritative are the ones least likely to be questioned. Confidence is not a signal of accuracy. It is noise wearing a suit.

So the lawyer cannot verify selectively. She must verify everything. And here the second number arrives. In a preregistered Stanford study, the leading commercial legal research platforms, the purpose-built products sold to attorneys, produced incorrect information on more than 17 percent of queries, climbing to 33 percent on the analytically demanding questions. Verifying every output of a tool that is wrong as often as one time in three is not assistance. It is doing the research twice, after paying for the privilege.

Run the arithmetic the way a managing partner runs it. An associate bills time against a tool that is wrong on as many as one query in three. Every output now demands a full independent check before anyone can trust it, and that check takes roughly as long as the original research would have. The firm has paid a subscription fee to shift the work from the research stage to the verification stage, and added a license cost on top. The efficiency the product promised does not survive contact with the error rate.

### The Story the Profession Told Itself

The early cases came with a comforting narrative. Mata v. Avianca was a solo lawyer who trusted a consumer chatbot, never checked its work, and got caught. Carelessness. A discipline problem, not a structural one.

Then the premium products failed in the same way. In United States v.
Farris, the Sixth Circuit removed an appointed attorney from a criminal appeal and denied him compensation for the entire matter after a premium legal AI product generated false quotations and misrepresented holdings in two appellate briefs. The tool sold for its accuracy fabricated like the free one. [See Your AI Research Tool Fabricated the Quotation, The Technology Blind Spot (2026).]

April 2026 closed the argument. Sullivan & Cromwell, a firm that counts OpenAI itself among its clients, filed an emergency motion in a Chapter 15 bankruptcy proceeding containing roughly two dozen erroneous citations. The firm’s own review process did not catch them. Opposing counsel did.

Picture the discovery. Somewhere in the Boies Schiller offices, a lawyer preparing the opposition pulled the first case Sullivan & Cromwell had cited, meaning only to read it. The case did not exist. The next one did not exist either. By the end of the motion the tally had reached roughly two dozen authorities that no court had ever issued. A routine afternoon of cite-checking had turned into the catch that the most sophisticated review apparatus in American law had missed. Sullivan & Cromwell then wrote to Chief Judge Martin Glenn and conceded, in its own letter, that its internal review had not stopped the fabrications before the motion was filed.

Hold that fact still for a moment. Sullivan & Cromwell commands the deepest verification infrastructure in American law: associates, research librarians, citation software, layered partner review. The firm applied that infrastructure. The fabrications went out the door anyway. If verification at that scale does not catch the problem, verification was never the variable that decided the outcome.

### The Spectrum Nobody Showed You

The variable is architecture. AI tools sit on a spectrum, and the spectrum is defined by one question: how much of the output is generated, and how much is retrieved.

At one end sits the general-purpose chatbot.
ChatGPT, Gemini, the consumer assistants. Every word is generated by prediction, with no step that anchors the output to a verified source. On legal questions, these tools fabricate between 58 and 88 percent of the time. They are the open end of the spectrum, and the mathematics binds them completely.

In the middle sit the products most firms have actually bought: the legal research platforms that retrieve real documents first, then generate an answer from them. Retrieval lowers the fabrication rate. It does not remove it, because the generation step is still there, still predicting, still probabilistic. This is the 17-to-33-percent tier. It is better than the chatbot. It is not safe to file from without checking every line.

At the far end sits deterministic retrieval. A system that returns published authorities indexed in a verified corpus and returns nothing else. It cannot fabricate a case, because composing a new case is not an action the architecture can perform. It hands the lawyer a real record or it hands her nothing. The proofs that bind the generative tiers do not touch this end of the spectrum, because there is no generation step left for them to bind.

Here is the trap most firms have already walked into. The middle tier is marketed in the language of the safe end. A platform that generates text from retrieved documents gets sold as though it merely retrieves them. The buyer believes she purchased deterministic retrieval. She purchased a generative architecture with a retrieval step bolted to the front. The 17 percent floor is the receipt.

### The Competence Duty Has Moved

This is where the ethics rule has quietly shifted. Model Rule 1.1, Comment 8, requires a lawyer to understand the benefits and risks associated with relevant technology. The bar has read that almost entirely as a duty to verify the output.

A second duty lives inside Comment 8, and the bar has not named it.
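The gap between the two ends of the spectrum can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the `Authority` record, the one-entry `VERIFIED_CORPUS`, and the placeholder citation are hypothetical, and `generate_citation` is only a stand-in for a generation step, not a model of how any real product works. The point is narrow: a lookup can only return what is already indexed, while a function that composes strings can emit a well-formed citation to nothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Authority:
    citation: str
    court: str
    year: int


# Hypothetical verified corpus: a fixed index of published authorities.
# The single entry is a placeholder, not a real case.
VERIFIED_CORPUS = {
    "123 Example Rep. 456": Authority("123 Example Rep. 456", "Example Court", 2001),
}


def retrieve(citation: str) -> Optional[Authority]:
    """Deterministic retrieval: return the indexed record, or nothing."""
    return VERIFIED_CORPUS.get(citation)


def generate_citation(volume: int, page: int) -> str:
    """Stand-in for a generation step: composes a well-formed string
    with no check against any corpus."""
    return f"{volume} Example Rep. {page}"


real = retrieve("123 Example Rep. 456")        # an indexed record
invented = generate_citation(999, 1)           # looks like a citation
missing = retrieve(invented)                   # None: nothing is indexed under it
```

In this sketch the retrieval path has no code that could produce a new case, so its only failure mode is returning nothing. The generation path always returns something that looks right.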
Before a lawyer can verify an output, she has to know what kind of tool produced it, because the architecture determines whether verification is a thirty-second check or a three-hour reconstruction. A lawyer who cannot say whether her research tool generates text or retrieves records has not evaluated its risks. She has evaluated its interface.

Vendors have made this harder on purpose. The phrase “hallucination-free” appeared in the marketing of products the Stanford team measured at a 17 percent floor. That phrase is not a description of the technology. It is a description of what the seller wants the buyer to assume.

Knowing the architectural tier of every tool the firm pays for, and refusing to accept a marketing adjective in place of an engineering specification, is now part of the competence obligation. Not a best practice. Part of the rule.

### The Strongest Case Against This

The opposing argument deserves a fair hearing in its strongest form. It runs like this. The Model Rules already require verification. Opinion 512 already requires it. A diligent lawyer who checks every citation is compliant no matter which tool produced the draft. The architecture framework is academic over-engineering that changes nothing a careful practitioner is not already doing.

Every part of that is correct except the conclusion. Verification is required. It is non-negotiable. A lawyer who checks every authority and catches every error has breached no duty, and nothing here suggests otherwise.

But the argument assumes verification is free. It is not. The CHOKE finding means verification cannot be done selectively, and the Stanford rate means doing it completely costs as much as the original research. On a generative tool, “verify everything” is either a fiction nobody actually performs or a practice that erases the tool’s only advantage. The architecture framework does not replace the verification duty.
It is the thing that makes the verification duty survivable, by identifying the tools where the checking has a finite end.

Two honest disclosures belong here. The architectural spectrum described above comes from a scholarly paper I co-authored, now posted to the Social Science Research Network, and a reader is entitled to weigh that. The claims in it are testable against the public record, and every case and study cited above can be checked independently of anything I have written.

The second disclosure is a limit on the argument itself. None of this means a litigation firm should abandon generative AI. A general-purpose model is a genuinely useful drafting partner, a brainstorming surface, a way to pressure-test the structure of an argument. The framework does not say never generate. It says match the tier to the task, and never let a generative tool carry a citation into a filing.

### What to Do Before Thursday

By Thursday, the managing partner can settle this for her own firm. The test has four steps and takes an afternoon.

1. List every AI tool the firm pays for that touches legal substance. The research platform, the chatbot subscriptions, anything AI bundled into software the firm already licenses.
2. For each tool, send the vendor one written question. Does this product generate text by prediction, or does it return retrieved authority from a verified corpus?
3. Require the answer in writing, as a specification rather than a brochure adjective. A vendor that will not state its own architecture in writing has answered the question in a different way.
4. Until that answer is in hand, no associate files a citation that originated in a generative tool without independent confirmation against a deterministic source.

That is one email per vendor and one instruction to the associates. It is executable before the week ends.
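For a firm that wants to keep the result of the four steps as a record, the inventory can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a compliance tool: the `ToolRecord` fields, the three `Architecture` labels, and the tool and vendor names are hypothetical, and the rule in `needs_independent_confirmation` is one conservative reading of step 4. It does not replace the verification duty described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Architecture(Enum):
    GENERATIVE = "generates text by prediction"
    RETRIEVAL_THEN_GENERATION = "retrieves documents, then generates an answer"
    DETERMINISTIC_RETRIEVAL = "returns only records from a verified corpus"


@dataclass
class ToolRecord:
    name: str
    vendor: str
    # None until the vendor has answered the architecture question in writing.
    written_answer: Optional[Architecture] = None


def needs_independent_confirmation(tool: ToolRecord) -> bool:
    """Step 4, read conservatively: a citation needs independent confirmation
    unless the vendor has stated in writing that the tool is deterministic
    retrieval. An unanswered question is treated like a generative tool."""
    return tool.written_answer is not Architecture.DETERMINISTIC_RETRIEVAL


# Hypothetical inventory (step 1); names are placeholders.
inventory = [
    ToolRecord("Research Platform A", "Vendor A", Architecture.RETRIEVAL_THEN_GENERATION),
    ToolRecord("Chatbot B", "Vendor B"),  # vendor has not replied yet
    ToolRecord("Case Index C", "Vendor C", Architecture.DETERMINISTIC_RETRIEVAL),
]

flagged = [t.name for t in inventory if needs_independent_confirmation(t)]
```

Treating a missing answer the same as a generative answer mirrors step 3: a vendor that will not state its architecture in writing is not given the benefit of the doubt.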
And it converts a vague duty to verify AI output into a specific, documented record of architectural competence, which is exactly what Comment 8 has asked for since 2012.

Every piece of the prescription was carried out. The bar wrote the opinions, signed the standing orders, ran the training, and told every lawyer in America to verify. The medicine was administered at full dose, and the fabricated-citation count climbed past 1,450 regardless.

The failure was not discipline. It was diagnosis. The instruction was never “verify harder.” It was “know what you are verifying.” No lawyer can check her way out of a tool that was built to invent.

Read the full paper: The Hallucination Problem And The Architecture Of Trust: The Case for Legal-Specific Language Models in the Practice and on the Bench

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6766598

### About the Author

JD Morris is Co-Founder and COO of LexAxiom, an Agentic AI platform for the business of law. Over a 25-year career, he has built and scaled enterprise technology products across Dell, EMC, VMware, and Cisco, including the first exabyte eDiscovery platform. He holds dual MBAs from Columbia Business School (Finance) and UC Berkeley Haas (Marketing), a Master of Legal Studies in Cybersecurity Law from Texas A&M, and a Master of Engineering from George Washington University. He writes The Technology Blind Spot on the intersection of emerging technology and law. Connect with him on LinkedIn at http://www.linkedin.com/in/jdavidmorris, on X at @JDMorris_LTec.

Originally published on LinkedIn Newsletter: The Technology Blind Spot
