AI Won’t Take Your Job. The Attorney Who Uses It Better Will. Part 2 of 2: The Ten Traps

THE TECHNOLOGY BLIND SPOT

The text message read: “I don’t see these cases anywhere. I think ChatGPT just made them up.”

A Denver attorney had asked ChatGPT to draft a motion, submitted it to the court, and only later asked a paralegal to verify the citations. The paralegal could not find them because they did not exist. The attorney’s text admitting he had never checked the work became Exhibit A in the disciplinary proceeding. He accepted a 90-day suspension. His license, his livelihood, and his professional reputation turned on a single text message about citations he never bothered to verify.

He is not alone. French researcher Damien Charlotin tracks judicial decisions involving AI-generated errors. His database reached 684 cases by December 2025, nearly doubling in two months. Courts in New York, Maryland, South Florida, Texas, and Colorado have sanctioned attorneys for AI-related failures. The pattern across every case is identical: the attorney used the tool, did not understand its limitations, and submitted the output as their own work.

Part 1 of this series established that AI competence is now an ethical obligation under Comment 8 to Model Rule 1.1. This installment delivers the field guide. Ten problems, organized into three categories: how the tools fail you, how the tools expose you, and how you fail yourself. Every item has generated a sanctions order, a bar complaint, a malpractice claim, or a data breach in the past two years.

The Direct Answer

AI tools fail in predictable, documented ways. Attorneys who understand these failure modes can use AI effectively. Attorneys who do not will join the growing list of practitioners who submitted work they did not verify, disclosed information they did not intend to share, or relied on outputs they did not understand.

How the Tools Fail You

The first four problems are the tool’s fault. AI models have structural limitations baked into their architecture. These are not bugs awaiting a fix. They are fundamental characteristics of how large language models work. Understanding them is the difference between using a sharp instrument carefully and handing a scalpel to someone who thinks it’s a butter knife.

1. Hallucinations and Fabricated Citations

AI models generate text by predicting the most likely next word in a sequence. They do not retrieve information from a database. They do not verify facts against source material. When the model encounters a gap in its pattern-matching, it fills that gap with plausible-sounding content. In legal research, this means fabricated case names, invented holdings, fictional docket numbers, and citations to cases that have never existed in any reporter.

Stanford RegLab tested the major platforms in 2024 with over 200 legal queries. The results: Lexis+ AI produced errors more than 17% of the time. Westlaw’s AI-Assisted Research hallucinated at 33%. Ask Practical Law AI delivered accurate responses only 18% of the time. General-purpose GPT-4 hallucinated between 58% and 88% on legal queries. A citation to Smith v. Jones, 482 F.3d 115 (2d Cir. 2019) looks correct because the model has absorbed thousands of similar patterns. The case number, court, and year follow recognizable formats. The model has no mechanism to verify whether that specific combination points to a real decision.

Handle it: Verify every citation through Shepard’s or KeyCite. Read the full text of every case, not the AI’s summary. Check quoted language against the actual opinion word by word. A real case with an invented holding is more dangerous than a fictional citation, because it survives a basic existence check while corrupting the analysis.

2. Inconsistent Outputs

Ask the same question twice. Get different answers. AI models generate outputs probabilistically, introducing controlled randomness through a setting called “temperature.” Higher settings produce more creative responses. Lower settings produce more predictable ones. Neither setting eliminates variance entirely.

For legal work, this creates a specific problem: if the tool summarizes a contract clause differently each time you ask, which summary do you trust? If consecutive risk analyses of the same document flag different issues, the variance itself reveals that the task exceeds the tool’s reliability threshold.

Handle it: Reduce temperature settings when available. Use detailed prompts with explicit constraints and output format specifications. Run critical analyses multiple times and compare results. When outputs diverge materially, the task requires human judgment, not another AI query.

3. Jurisdictional and Procedural Errors

AI models train on legal text from every American jurisdiction simultaneously, plus Canadian, British, Australian, and European sources. They do not maintain boundaries between federal and state law, between neighboring states’ statutes, or between current rules and superseded ones. The result: outputs that blend legal standards from multiple jurisdictions into a single, confident, and wrong analysis.

The model cites a statute from the wrong state. It applies a federal standard when state law controls. It references a procedural rule that the relevant court modified by local order three years ago. Each error reads as authoritative because the underlying legal concepts are real. Only the application is wrong. An estate plan that applies another state’s execution formalities creates liability that surfaces years later, when the testator cannot clarify intent and the drafting attorney cannot explain the mistake.

Handle it: Specify jurisdiction, court level, and applicable law in every prompt. Include explicit constraints: “Apply only Texas state law. Do not reference federal standards or other states.” Verify every jurisdictional element independently. The prompt is your engagement letter with the AI: vague instructions produce vague work product.

4. Training Data Cutoffs and Outdated Information

Every model has a knowledge cutoff date. After that date, the model knows nothing. It does not know about cases decided last month, statutes amended last quarter, or rules that took effect last week. It will not tell you its information is stale. It will generate answers based on superseded law with the same confidence it applies to current authority.

Even legal-specific platforms that supplement their training data with current databases still show 17% to 34% error rates. The integration between the model’s general knowledge and the live database is imperfect. The model may reach for its outdated training data instead of the current database sitting right next to it, and you will never see the seam.

Handle it: Verify all legal authorities through current databases. Check dates on every case, statute, and rule. Run Shepard’s or KeyCite on every citation. Never assume an AI-identified legal standard reflects current law without independent confirmation.

How the Tools Expose You

The next three problems involve security and confidentiality. The tool does not just give you wrong answers. It takes your information, loses your context, and processes your instructions alongside hostile content. These risks are invisible during normal use. You will not know they have materialized until a privilege challenge, a data breach notification, or a client who reads their own case strategy in an AI training dataset.

5. Confidentiality and Privilege Destruction

When you type client information into ChatGPT, that information travels to OpenAI’s servers. OpenAI’s privacy policy states the company collects “input, file uploads, or feedback” and may review conversations for safety and training. Unless you have disabled chat history or signed an enterprise agreement with explicit no-training clauses, your client’s privileged information may become part of the model’s training data. In March 2023, a ChatGPT breach exposed user names, payment information, passwords, and chat histories.

Sam Altman acknowledged in July 2025 that OpenAI has not “figured out” how to handle legal privilege and confidentiality. That admission from the CEO of the company whose product 79% of attorneys are using should stop every practitioner mid-keystroke. Uploading privileged communications to a public platform may constitute voluntary third-party disclosure, destroying the privilege your client hired you to protect.

Handle it: Never upload privileged communications to a public AI platform. Require enterprise agreements with no-training clauses and data processing agreements before using any AI tool on client matters. Disable chat history on consumer tools. Obtain informed client consent and document it. Include AI use provisions in engagement letters.

6. Context Window and Memory Limitations

Every AI model has a finite context window: the maximum text it can hold in working memory during a single interaction. Claude supports up to 200,000 tokens. ChatGPT supports 128,000. These sound generous. They are not. A 400-page contract exceeds most models’ effective processing capacity. A multi-volume deposition transcript overwhelms all of them.

The problem runs deeper than raw capacity. Research on the “lost in the middle” phenomenon confirms that models perform best on content at the beginning and end of their context window, with accuracy degrading sharply for material in between. Harvey AI’s prompt limit drops from 100,000 to 4,000 characters upon document upload, a 96% reduction. In multi-turn conversations, a sliding window drops your earlier messages without notification. The model fills gaps with assumptions and generates outputs that look consistent but may contradict instructions you provided twenty exchanges ago.

Handle it: Break complex tasks into separate sessions. Provide context summaries at the start of each new conversation. Use Projects or Custom Instructions features for persistent guidance. Process long documents in sections rather than uploading entire files. If you would not hand a junior associate 400 pages and say “read this and get back to me,” do not hand it to an AI model either.

7. Instruction Drift and Prompt Injection

Here is a vulnerability most attorneys have never considered: AI models cannot reliably distinguish between your instructions and content embedded in documents you upload. Research presented at ICLR 2025 confirmed this structural limitation. If a contract uploaded for review contains hidden text instructing the AI to produce a favorable summary instead of a critical analysis, the model may follow the embedded instruction rather than yours.

The attack surface is real and expanding. Opposing counsel could embed prompt injections in discovery documents. A counterparty could insert hidden instructions in redlined agreements. An email forwarded for AI analysis could contain malicious prompts invisible to the human reader but fully processed by the model. Meanwhile, AI providers update their models continuously. A prompt that produced reliable results last month may generate different outputs after a silent backend update. OpenAI routes queries between model variants without notification, meaning the same interface may connect you to a different model on Tuesday than it ran on Monday.

Handle it: Use structured prompting with clear separation between instructions and data (the four-block pattern: INSTRUCTIONS / CONTEXT / TASK / OUTPUT FORMAT). Test critical prompts after provider updates. Maintain a prompt library for recurring tasks. Above all, never trust AI analysis of adversary-produced documents without independent human review.

How You Fail Yourself

The final three problems belong to the user, not the tool. Bias, overconfidence, and poor inputs produce poor outputs regardless of how capable the underlying model becomes. These are human failures amplified by machine speed.

8. Over-Reliance and Automation Bias

AI outputs read with the confidence of a seasoned associate and the reliability of an unsupervised intern. The polished prose masks the absence of any mechanism for distinguishing strong arguments from weak ones, binding precedent from dicta, controlling authority from persuasive authority. The model generates text that sounds authoritative regardless of whether the underlying analysis holds. Your brain, wired to trust fluent communication, fills in the credibility gap the model cannot.

The NBER working paper studying 7,000 workplaces found that AI chatbots produced no statistically significant impact on hours or wages in the legal profession. Average time savings: roughly 3%. Many users spent the saved time correcting errors, netting close to zero productivity gain. The tool saves time on the first draft and costs time on verification. Attorneys who skip verification save time and assume all the risk.

Handle it: Treat every AI output as a first draft. Establish firm protocols requiring verification before client delivery or court submission. The attorney who signs the filing bears full responsibility regardless of which tool produced the first draft. If you would not submit an associate’s work without review, do not submit the machine’s.

9. Algorithmic Bias

AI models inherit the biases baked into their training data. Historical legal outcomes reflect racial, gender, and socioeconomic disparities. The model reproduces those patterns. Research confirms that AI systems produce systematically different results based on names, demographics, and fact patterns correlating with protected characteristics. A case assessment tool trained on historical settlement data may undervalue claims from demographics that historically received lower settlements, not because the tool is malicious but because the data it learned from reflects a system that was.

No single fairness metric captures all bias dimensions. A model calibrated for overall accuracy may produce disparate error rates across groups. Vendors claiming their tools are “fair” may measure against one standard while ignoring disparities under another.

Handle it: Audit AI outputs for patterns across demographic categories, particularly in case assessment, risk evaluation, and settlement analysis. Require human oversight for any AI-assisted decision with material consequences. Ask vendors which fairness standard they apply and what tradeoffs it accepts. The ethical obligation to provide competent, unbiased representation does not diminish because you delegated the analysis to software.

10. The Prompt Engineering Gap

Garbage in, garbage out. The principle is decades old. The application to AI is immediate. A prompt that reads “research employment law for my client” will produce a generic overview mixing multiple jurisdictions, court levels, and legal standards. A prompt specifying jurisdiction, court level, relevant statute, client posture, desired format, and applicable timeframe will produce focused analysis requiring far less correction.

OpenAI’s own research acknowledges that current models are “rewarded for guessing” rather than expressing uncertainty. When your prompt leaves gaps, the model fills them with assumptions. Those assumptions may include the wrong jurisdiction, the wrong legal standard, or the wrong procedural posture. The model will not flag its guesses. It will present them as conclusions.

Effective legal prompting follows a four-block pattern. INSTRUCTIONS define the model’s role and constraints. CONTEXT provides jurisdiction, court level, facts, and applicable law. TASK specifies exactly what you need. OUTPUT FORMAT describes the structure, length, and style of the response. This mirrors the structure of a research memo assignment to a junior associate: clear parameters produce better work.

Handle it: Learn structured prompting the way you learned Westlaw and Lexis search syntax. Provide few-shot examples (samples of desired output) for consistency. Build a prompt library for recurring tasks. Specify jurisdiction and court level in every legal research prompt. The five minutes you invest in a precise prompt will save hours of correction on the back end.

The Counter Argument

The list above is a catalog of failures. A fair accounting requires the other side of the ledger.

JP Morgan’s COIN system analyzes thousands of commercial loan agreements in seconds, work that previously consumed 360,000 hours of attorney review annually. AI-powered document review in litigation consistently matches or exceeds human reviewers on recall and precision metrics while completing the work in a fraction of the time. Deloitte research confirms that AI reduces due diligence timelines from months to weeks for routine transactions. These are not theoretical capabilities. They are deployed, measured, and documented.

The attorney who reads this list and concludes “AI is useless” has drawn the wrong lesson. The attorney who reads it and concludes “AI requires discipline” has drawn the right one. Every technology the profession has adopted carried initial risks: email exposed confidential communications to interception, electronic filing created cybersecurity vulnerabilities, cloud storage placed client data on third-party servers. The profession adapted by developing competence standards, security protocols, and ethical guidelines. AI follows the same pattern. The risks are real but manageable. The competence obligation applies to managing these risks, not avoiding the technology entirely.

The Field Guide Standard

Each problem on this list shares a common thread: the AI tool did not fail silently. It failed predictably, in ways that researchers, regulators, and courts had already documented. The Denver attorney’s text message to his paralegal captured the moment of realization. But the realization should have come before the filing, not after.

Comment 8 does not require perfection. It requires awareness. An attorney who understands these ten failure modes, implements reasonable safeguards, and exercises professional judgment about when AI serves the client’s interests has met the standard. An attorney who cannot explain what a hallucination is, why uploading client data to ChatGPT creates privilege risk, or how a context window affects analysis has not.

The tools will improve. The failure modes will evolve. The competence obligation will remain constant. Print this list. Share it with your partners. Train your associates. Two years from now, these ten traps will look different. The principle behind them will not: understand the tool before you trust it with your client’s case.

This blog provides general information for educational purposes only and does not constitute legal advice. Consult qualified counsel for advice on specific situations.

About the Author

JD Morris is Co-Founder and COO of LexAxiom. With over 20 years of enterprise technology experience and credentials including an MLS from Texas A&M, MEng from George Washington University, and dual MBAs from Columbia Business School and Berkeley Haas, JD focuses on the intersection of legal technology, cybersecurity, and professional responsibility.

Connect: LinkedIn | X | Bluesky

References

ABA Model Rules of Professional Conduct, Rule 1.1, Comment 8 (Technology Competence)

ABA Model Rules of Professional Conduct, Rule 1.6(c) (Reasonable Efforts)

ABA Formal Opinion 512 (July 2024): Generative Artificial Intelligence Tools

ABA 2024 Legal Technology Survey Report

Florida Bar Ethics Opinion 24-1: Use of Generative Artificial Intelligence in the Practice of Law (2024)

Stanford RegLab / HAI, “AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries” (2024)

Humlum, A. & Vestergaard, E., “Large Language Models, Small Labor Market Effects,” NBER Working Paper No. 33777 (May 2025)

Charlotin, D., AI Hallucinations in Judicial Decisions Database (December 2025): 684 documented cases

Mata v. Avianca, Inc., No. 22-cv-1461 (S.D.N.Y. 2023): $5,000 sanctions for fabricated AI citations

Chukwuemeka Mezu v. Kristen Mezu, Appellate Court of Maryland (2025): AI misuse referral to Attorney Grievance Commission

Justia/Verdict, “AI’s Limitations in the Practice of Law” (August 2025): Context window and memory analysis

ICLR 2025: Research on LLM instruction-data confusion and prompt injection vulnerabilities

JP Morgan COIN System: Commercial loan agreement analysis (360,000 hours annually)

Deloitte: AI-driven due diligence timeline compression research

OpenAI, “How should AI systems behave?” (2023): Model tendency to guess rather than express uncertainty

OpenAI Privacy Policy: Data collection and training data practices

Altman, Sam (July 2025): Remarks on AI privilege and confidentiality limitations

Tiger, Nick, Associate General Counsel, Pearl.com (2025 remarks on AI verification)

Lady Chief Justice Sue Carr, Courts and Tribunals Judiciary UK (July 2025): AI misuse in legal proceedings

Prior Blog: “AI Won’t Take Your Job. The Attorney Who Uses It Better Will.” Part 1: The Competence Obligation (Morris Legal Technology Blog)

Prior Blog: “Your Password Is the Weakest Link in Your Security Chain” (Morris Legal Technology Blog)

Prior Blog: “The Email Privacy Illusion” Parts 1-3 (Morris Legal Technology Blog)

Prior Blog: “The FBI Says Stop Texting. Here’s the Privilege Problem Nobody’s Discussing.” (Morris Legal Technology Blog)

Prior Blog: “Why Hackers Target Law Firms: Where All the Secrets Are Buried” (Morris Legal Technology Blog)

The Technology Blind Spot

AI Won’t Take Your Job. The Attorney Who Uses It Better Will. Part 2 of 2: The Ten Traps

Like this:

Related

Leave a ReplyCancel reply

Share this:

Like this:

Related

Leave a ReplyCancel reply

Discover more from The Technology Blind Spot