In 2023, a New York lawyer filed a federal brief that cited Varghese v. China Southern Airlines Co., 925 F.3d 1339 (11th Cir. 2019). It was a strong case for his client. It was also entirely made up. ChatGPT had written it: the name, the reporter citation, the quotations, the judges. Opposing counsel went looking for the opinion and found nothing. A federal judge fined the lawyer and his firm $5,000, and Mata v. Avianca became the cautionary tale every litigator now knows by name.
A fabricated Tax Court case is the same object, and it breaks the same way.
If you research with a general-purpose model, you have probably already been handed cases like Varghese. You just may not have checked. The danger is not that the model is dumb. It is that the fake reads exactly like the real thing. The model knows the shape of a citation, not the docket behind it. For a CPA or tax attorney about to put a case in a memo, that gap is where the malpractice lives.
A case citation is a string the model finishes
A language model writes one token at a time, choosing whatever is most likely to come next. That is the entire mechanism, and it is the same one that makes the model invent Internal Revenue Code sections.
Renkemeyer, Campbell & Weaver, LLP v. Commissioner, 136 T.C. 137 (2011) is a near-random string in a shape the model has seen tens of thousands of times: party v. Commissioner, [volume] T.C. [page] ([year]). It has learned the shape perfectly. What it cannot learn from pattern alone is which specific names, volumes, and pages point to opinions that were actually decided.
So when your question calls for a case, the model produces the most plausible continuation of the pattern. Plausible and real are different properties, and it only optimizes for the first.
The model is not remembering a case and getting a detail wrong. It is assembling a citation that has never corresponded to an opinion. It was generated, not retrieved.
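The gap between shape and existence can be made concrete. The sketch below is a minimal illustration, assuming a simplified pattern for regular Tax Court citations; the regex and the `looks_like_tc_citation` helper are hypothetical and not a complete citation grammar. It checks format only, so a placeholder citation with arbitrary numbers passes just as easily as a real one.

```python
import re

# Simplified shape of a regular Tax Court citation:
#   <party> v. Commissioner, <volume> T.C. <page> (<year>)
# Assumption: illustrative pattern only; it ignores many real-world variants.
TC_SHAPE = re.compile(
    r"^(?P<party>.+) v\. Commissioner, "
    r"(?P<volume>\d{1,3}) T\.C\. (?P<page>\d{1,4}) \((?P<year>\d{4})\)$"
)


def looks_like_tc_citation(text: str) -> bool:
    """Return True if the text has the shape of a T.C. citation.

    A match says nothing about whether the opinion exists.
    """
    return TC_SHAPE.match(text) is not None


real = "Renkemeyer, Campbell & Weaver, LLP v. Commissioner, 136 T.C. 137 (2011)"
# Placeholder built for this example; it is not a real opinion.
made_up = "Placeholder Party v. Commissioner, 999 T.C. 9999 (2099)"

print(looks_like_tc_citation(real))     # True
print(looks_like_tc_citation(made_up))  # True
```

Both strings pass, which is the point: a format check cannot separate a decided opinion from a well-formed invention. Only a lookup against real opinions can.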
A real case can be just as dangerous
When Stanford researchers asked leading models specific, checkable questions about federal cases, the models fabricated an answer 69 to 88 percent of the time. But an invented case is only the most obvious failure. Case law adds two quieter ones.
| Citation | Real? | What is actually there |
|---|---|---|
| Varghese v. China Southern Airlines Co., 925 F.3d 1339 (11th Cir. 2019) | No | Invented by ChatGPT and filed in Mata v. Avianca; the opinion, the quotes, and the cite never existed |
| Renkemeyer, Campbell & Weaver, LLP v. Commissioner, 136 T.C. 137 (2011) | Yes | Holds that partners’ shares from performing services are not shielded from self-employment tax by §1402(a)(13) |
| Soroban Capital Partners LP v. Commissioner, 161 T.C. 310 (2023) | Yes | Adopts a functional-analysis test for the limited-partner exception |
The first one does not exist. The other two do, and they are a click from the real text, if your tool can find them.
The two quieter failures both involve real cases.
- A real case, cited for a holding it never reached. The model attaches Renkemeyer to a proposition the opinion does not support, because the name fits the topic.
- A real case that is no longer good law. The model has no sense of what happened after the decision date, so it will cite an opinion that was later reversed or overruled as if it still controls.
A citation that checks out as real can still be wrong.
There is also a wrinkle unique to the Tax Court: a regular “T.C.” opinion carries more precedential weight than a “T.C. Memo.” decision. The model cites both in the same confident register, with no signal that one is worth more than the other.
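One way to surface that difference is to label each citation by opinion type before it reaches a memo. A minimal sketch, assuming the common citation forms `136 T.C. 137` and `T.C. Memo. <year>-<number>`; the `classify_tc_citation` helper and its labels are hypothetical, and the patterns do not cover every variant.

```python
import re

# Assumption: simplified patterns for the two forms discussed here.
REGULAR = re.compile(r"\b\d{1,3} T\.C\. \d{1,4}\b")
MEMO = re.compile(r"\bT\.C\. Memo\. \d{4}-\d{1,3}\b")


def classify_tc_citation(text: str) -> str:
    """Label a citation string by Tax Court opinion type."""
    if MEMO.search(text):
        return "memorandum opinion (T.C. Memo.)"
    if REGULAR.search(text):
        return "regular opinion (T.C.)"
    return "unrecognized"


print(classify_tc_citation(
    "Renkemeyer, Campbell & Weaver, LLP v. Commissioner, 136 T.C. 137 (2011)"
))
# Placeholder memo number used only to show the format.
print(classify_tc_citation("T.C. Memo. 2099-1"))
```

The label only reflects what the string looks like. It does not confirm that the opinion exists or that the model assigned it the right type.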
Before you cite
Until retrieval is part of your workflow, four checks stand between a fabricated case and a client deliverable.
- Open the citation. A real opinion resolves to a real docket and reporter citation; a fabricated one resolves to nothing. Thirty seconds beats an afternoon walking back a position.
- Read the holding, not the model’s summary. A model can describe a real case as standing for something it never held. Pull the opinion and confirm it actually says what you are about to claim.
- Confirm it is still good law. A 2011 opinion can be undercut by 2014. Citation activity hints at currency; the subsequent history settles it.
- Keep drafting and authority in separate lanes. Let the model write the memo. Do not let it be the source of record for what the courts have held.
These steps work. They also put the burden on you to catch every fabrication, on every citation, every time.
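The first three checks can be tracked per citation so that nothing reaches a deliverable unverified. A minimal sketch; the `CitationCheck` record and its field names are hypothetical, and each flag is meant to be set by a person after doing the work, not by the model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CitationCheck:
    """Manual verification state for one citation (hypothetical record)."""

    citation: str
    opened_source: bool = False       # the opinion was located and opened
    holding_confirmed: bool = False   # the opinion says what the memo claims
    good_law_confirmed: bool = False  # subsequent history was reviewed

    def missing(self) -> list[str]:
        """Return the names of checks that have not been completed."""
        flags = {
            "opened_source": self.opened_source,
            "holding_confirmed": self.holding_confirmed,
            "good_law_confirmed": self.good_law_confirmed,
        }
        return [name for name, done in flags.items() if not done]

    def ready_to_cite(self) -> bool:
        """A citation is ready only when every check is done."""
        return not self.missing()


check = CitationCheck("Soroban Capital Partners LP v. Commissioner, 161 T.C. 310 (2023)")
check.opened_source = True
print(check.ready_to_cite())  # False
print(check.missing())        # ['holding_confirmed', 'good_law_confirmed']
```

A record like this does not reduce the work. It only makes a skipped step visible.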
Cite from a corpus you can open
The fix is the same one that kills fake Code sections: change where the citation comes from. If the case is retrieved from an index of real opinions and handed to the model, rather than generated by it, a Varghese can never appear, because it is not in the source to copy.
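The difference between generating and retrieving can be sketched in a few lines. This is an illustration only: the in-memory `OPINION_INDEX` and the `retrieve_citation` helper are hypothetical stand-ins for a real index of opinions, populated here with the two Tax Court cases from the table above. A lookup either returns an entry that exists or returns nothing; it has no way to assemble a new one.

```python
from typing import Optional

# Hypothetical stand-in for an index of real opinions, keyed by reporter cite.
OPINION_INDEX = {
    "136 T.C. 137": "Renkemeyer, Campbell & Weaver, LLP v. Commissioner (2011)",
    "161 T.C. 310": "Soroban Capital Partners LP v. Commissioner (2023)",
}


def retrieve_citation(reporter_cite: str) -> Optional[str]:
    """Return the indexed case for a reporter cite, or None if absent."""
    return OPINION_INDEX.get(reporter_cite.strip())


print(retrieve_citation("136 T.C. 137"))
print(retrieve_citation("925 F.3d 1339"))  # None: not in the index
```

The `None` result is the useful part. A missing entry becomes an explicit signal to stop, rather than a gap filled with fluent text.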
That is what Tax Court case law on Pro+ is for. taxmcp.io indexes more than 49,000 United States Tax Court opinions going back to 1942. Ask for a case by name or citation and get-case returns the actual opinion text with a citation you can open. Want to know whether an opinion still carries weight? get-citing-cases shows how many later Tax Court decisions cite it, and which earlier opinions it relies on.
The citation graph shows influence and currency, not validity. You still confirm good law before you stake a position. The tool gets you to the real opinion fast; your judgment takes over from there.
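What a citation graph can and cannot tell you is easy to see in miniature. The sketch below uses a hypothetical edge list with placeholder opinion names (not real cases, and not the taxmcp.io data model) to count the later opinions citing a given one and to list what it relies on.

```python
# Each pair is (citing opinion, cited opinion). Placeholder names only.
CITES = [
    ("Opinion C", "Opinion A"),
    ("Opinion D", "Opinion A"),
    ("Opinion A", "Opinion B"),
]


def citing_cases(opinion: str) -> list:
    """Opinions that cite the given opinion."""
    return sorted(src for src, dst in CITES if dst == opinion)


def relied_on(opinion: str) -> list:
    """Opinions that the given opinion cites."""
    return sorted(dst for src, dst in CITES if src == opinion)


print(len(citing_cases("Opinion A")))  # 2
print(relied_on("Opinion A"))          # ['Opinion B']
```

The counts measure activity. Nothing in them records whether a citing opinion followed, distinguished, or rejected the case, which is why confirming good law stays a separate step.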
Give it a real one to read from
The lawyers in Mata were not reckless. They trusted a fluent paragraph the way you would trust a colleague, when the thing producing it was a pattern-completer that had never opened a reporter. By late 2025, a public database of these incidents had logged hundreds of court filings built on cases that did not exist.
Tax practice runs on the same trust and the same exposure. The model knows the shape of a citation, not the docket behind it. Give it a real one to read from.
Add Tax Court case law to your research with Pro+ →
Or start with the bigger question: can you trust AI for tax research at all?