Is AI accurate for tax research?

It depends entirely on the task. AI is highly accurate at language work: explaining a rule, framing an issue, drafting prose, summarizing a document you give it. It is unreliable at authority work (telling you which Code section governs, quoting it, or stating a threshold) because a general-purpose model generates those details from learned patterns rather than retrieving them from the law. The same answer can be trustworthy in its explanation and wrong in its citation.

Can I use ChatGPT for tax research?

Use it for the writing, not as the source of record for what the law says. General-purpose assistants are among the best drafting tools available, but they generate citations rather than looking them up, so any section number, quote, or dollar threshold must be verified against primary source before it reaches a client. Treat the model as a writer, not a library.

What makes an AI tax answer trustworthy?

Two things: the citation is retrieved from primary authority rather than generated, and the answer hands you a link you can open to confirm it. When a section number is copied from the real Code and returned with a verifiable URL, a fabricated citation cannot appear, because it was never in the source to copy. Grounding plus an openable link is what separates an answer you can rely on from one you have to re-verify by hand.

Will better AI models eventually fix the citation problem?

Not on their own. Fabricated citations are not a gap in a model's knowledge that scale closes: they come from how the model produces text, one plausible token at a time, with no copy of the Code to check against. A bigger model fabricates less often but never reaches zero, and you still cannot tell which answers were the lucky ones. What removes the failure mode is grounding the model in primary authority, not waiting for a smarter one.
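A back-of-the-envelope calculation shows why "less often" is not "never." Assume, purely for illustration, that each citation in a memo is fabricated independently with probability p (the rates below are made-up examples, not measurements of any model) and that the memo carries n citations:

```latex
P(\text{at least one fabricated citation}) = 1 - (1 - p)^{n}

p = 0.02,\; n = 10:\quad 1 - 0.98^{10} \approx 0.18
p = 0.01,\; n = 10:\quad 1 - 0.99^{10} \approx 0.10
```

Halving the rate roughly halves the exposure, but it never reaches zero, and nothing in the output marks which memo holds the bad cite.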

Can You Trust AI for Tax Research? What Every CPA Should Know

By the Thomson Reuters Institute’s 2026 count, generative-AI use among tax and accounting professionals has roughly doubled in a year, and most firms now either use it or plan to. That makes “can I trust AI for tax research” an incomplete question, because in practice you are probably already using it. The more useful question is narrower: trust it to do what?

The answer splits cleanly. There is one job AI does better than almost any tool on your desk, and one job it is structurally unfit for. Most of the trouble in practice comes from not noticing where the line between them falls.

Two jobs wearing one interface

When you ask a model a tax question, it does two very different things inside a single fluent paragraph. It performs language work: framing the issue, explaining a rule in plain English, organizing an argument, drafting the prose. And it performs authority work: telling you which Code section governs, what it says, and where the number comes from.

On screen these are indistinguishable. They arrive in the same confident voice, in the same paragraph, often in the same sentence. But they are not the same act, and they do not carry the same reliability.

The language work is genuinely excellent. A frontier model will sharpen a vague client question into a precise legal one and explain a gnarly provision more clearly than most treatises. That is what next-token prediction is built for.

The authority work is where it breaks. When Stanford researchers tested leading models on verifiable legal questions, the models gave fabricated answers between 58 and 88 percent of the time, depending on the model. Tax citations fail the same way. Ask about the qualified business income deduction and you may get a confident reference to §199A(d)(4)(C), except that §199A(d) stops at paragraph (3), so the cited provision does not exist. The model did not misremember it. It generated it, the same way it generates prose, as the most plausible continuation of a citation-shaped string. (The full mechanism is its own piece.)

The trap is that both halves wear the same face. The explanation can be flawless and the citation underneath it invented, with no change in tone to warn you.

Where it is reliable, and where it is not

Sort the work by which job it actually is, and the line gets practical:

Task	Can you trust it?
Explaining a concept or rule in plain language	Yes; it is language work
Drafting or restructuring a memo or client letter	Yes; you own the facts and the conclusion
Summarizing a document you supplied	Yes; it works from your text, not its memory
Brainstorming issues or positions to research	Yes, as a starting list, not a final one
Telling you which section governs	No; verify against primary source
Quoting statutory or regulatory text	No; confirm it word-for-word at the source
Stating a threshold, dollar amount, or effective date	No; these are exact facts, and it invents them as fluently as it states real ones

The pattern is one line: trust the model with the words, and verify everything that has to be exactly right. Risk also climbs with depth: a reference to §162 is usually safe, while §162(a)(2)(B)(iv) is exactly the kind of deep string a model is most tempted to assemble.
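The depth heuristic can be made concrete. Below is a minimal Python sketch, assuming citations are written in the §-plus-parentheses style used in this article; the function names and the two-level cutoff are hypothetical choices for illustration. Depth is only a prompt for extra care, not a test of whether a citation is real: a shallow cite still has to be checked.

```python
import re

# Matches one parenthesized subdivision, e.g. "(a)", "(2)", "(B)", "(iv)".
_LEVEL = re.compile(r"\(([^()]+)\)")


def citation_depth(cite: str) -> int:
    """Count the parenthesized subdivisions in a Code citation string."""
    return len(_LEVEL.findall(cite))


def needs_extra_scrutiny(cite: str, threshold: int = 2) -> bool:
    """Flag citations deeper than `threshold` levels (an arbitrary, hypothetical cutoff)."""
    return citation_depth(cite) > threshold


for cite in ["§162", "§199A(d)", "§162(a)(2)(B)(iv)"]:
    print(cite, citation_depth(cite), needs_extra_scrutiny(cite))
# §162 0 False
# §199A(d) 1 False
# §162(a)(2)(B)(iv) 4 True
```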

What “grounded” changes

The reason the authority work fails is simple: a general-purpose model has no copy of the Code to consult. It is not looking anything up. So a sterner prompt is not the durable fix. Telling the model to “only cite real sections” changes its tone, not the mechanism. The durable fix is to change where the citation comes from.

When a tool retrieves the section number and quoted text from primary authority and hands them to the model, rather than letting the model generate them, the failure mode disappears at the root. A fabricated §199A(d)(4)(C) cannot appear in a grounded answer, because it is not in the source to copy. The model goes back to doing the language work it is excellent at, and a retrieval layer supplies the parts that must be exact, with a URL you can open.
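The retrieval step can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product works: `SOURCE` is a hypothetical dictionary standing in for a primary-authority index, its text and link are placeholders rather than statutory language, and the function names are invented for illustration. What it shows is structural: the citation the reader sees is copied out of the source, so a provision that is not in the source has nothing to be copied from.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Authority:
    """One provision as stored in a primary-authority index."""
    citation: str
    text: str
    url: str


# Hypothetical index standing in for the Code. A real retrieval layer would
# load actual statutory text; the text and link here are placeholders.
SOURCE: Dict[str, Authority] = {
    "§199A(d)(3)": Authority(
        citation="§199A(d)(3)",
        text="<statutory text of §199A(d)(3), copied from the source>",
        url="<link to §199A on uscode.house.gov>",
    ),
}


def ground_citation(requested: str) -> Optional[Authority]:
    """Look the citation up; never synthesize one that is not in SOURCE."""
    return SOURCE.get(requested)


def cite_or_refuse(requested: str) -> str:
    """Return text and link copied from the source, or an explicit refusal."""
    hit = ground_citation(requested)
    if hit is None:
        return f"{requested}: not found in primary authority; no citation returned"
    return f"{hit.citation}: {hit.text} ({hit.url})"


print(cite_or_refuse("§199A(d)(3)"))     # copied from SOURCE, with its link
print(cite_or_refuse("§199A(d)(4)(C)"))  # nothing to copy, so nothing is cited
```

In this arrangement the model would only write around whatever `cite_or_refuse` returns; it never types the section number itself.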

What separates a usable AI tax tool from a risky one comes down to two questions: are its citations retrieved or generated, and can you open them to check?

That distinction is the axis along which the whole landscape of AI tools for tax professionals should be read.

A three-question trust test

Before you rely on any AI tax answer, whether from a chatbot, a platform, or a purpose-built assistant, run it through three questions:

Where did the citation come from: retrieved or generated? If the tool cannot tell you it looked the section up in primary authority, assume it generated it.
Can you open it? A real section resolves to a real page on uscode.house.gov or the eCFR. A fabricated one resolves to nothing, or to text plainly about something else. No link, no claim.
Does the quoted text actually say that? A model can paraphrase a fake section into something that sounds right, but it cannot make that language appear in the real Code or CFR. If the quote is not there when you open the cite, trust neither the quote nor the cite.
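The three questions can also be written down as a checklist in code, which makes "no link, no claim" explicit. A minimal Python sketch, assuming you have already opened the link yourself and pasted the page text in as `source_text` (the sketch does no network access). The class, its field names, and the host list (limited to the two sites named above) are illustrative assumptions, not any tool's interface, and the quote check is a deliberately strict verbatim match after collapsing whitespace.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlparse

# The two sites named in the article; a real checklist might allow others.
TRUSTED_HOSTS = {"uscode.house.gov", "www.ecfr.gov", "ecfr.gov"}


@dataclass
class CitedClaim:
    citation: str
    provenance: str       # what the tool reports: "retrieved" or "generated"
    url: Optional[str]    # link returned with the answer, if any
    quote: str            # text the answer attributes to the citation


def _normalize(s: str) -> str:
    """Collapse whitespace so line breaks on the source page don't matter."""
    return " ".join(s.split())


def trust_test(claim: CitedClaim, source_text: Optional[str]) -> Dict[str, bool]:
    """Apply the three questions.

    `source_text` is the page text you saw after opening the link yourself,
    or None if the link did not resolve.
    """
    host = urlparse(claim.url).hostname if claim.url else None
    return {
        "retrieved": claim.provenance == "retrieved",
        "openable": host in TRUSTED_HOSTS and source_text is not None,
        "quote_matches": source_text is not None
        and _normalize(claim.quote) in _normalize(source_text),
    }


def passes(claim: CitedClaim, source_text: Optional[str]) -> bool:
    """No link, no claim: every question must come back True."""
    return all(trust_test(claim, source_text).values())


fabricated = CitedClaim("§199A(d)(4)(C)", "generated", None, "plausible wording")
print(trust_test(fabricated, None))
# {'retrieved': False, 'openable': False, 'quote_matches': False}
```

A "retrieved" label is only what the tool reports about itself, which is why the sketch still requires the link to open and the quote to match.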

Ten seconds per citation beats an hour walking back a memo, and it is the same discipline whether you are drafting freehand or working inside a grounded tool. The full version of this, applied end-to-end, is the division of labor in an AI-drafted research memo: the model writes, primary authority decides what the law says, and you keep the judgment.

So, can you trust it?

Yes, conditionally, and the condition is the whole answer. Trust AI as a writer and an explainer; it is better at that than the tools it is replacing. Do not trust it as the source of record for what the law is unless it is grounded in primary authority and returns a link you can open.

The attorneys sanctioned for filing briefs with invented case citations were, for the most part, not careless people. By late 2025 a public database of these incidents had logged hundreds of filings built on authority that did not exist. The lawyers trusted a fluent paragraph the way they would trust a knowledgeable colleague, not realizing that what produced it was a pattern-completer with no copy of the case reporter. Tax practice runs on the same trust and carries the same exposure.

So the goal is straightforward. Keep using the model for what it does well, and give it a grounded source of authority for the parts that have to be exact.

This article is general information, not tax advice. Capabilities of AI tools change quickly; confirm any citation against primary source before relying on it for client work, and test any tool against the questions above.