AI Hallucinations in Audit: How to Detect, Mitigate, and Document Them

If you use AI tools (ChatGPT, Claude, Perplexity, Grok, or audit-grade alternatives) for any audit work, you will encounter hallucinations. The model will produce confidently-stated, plausible-sounding output that's wrong. Section numbers that don't exist. Paragraph references that don't say what the model claims. Threshold amounts that are off by an order of magnitude.

This isn't a flaw to be eliminated — it's an inherent property of how LLMs work. The audit practitioner's job is to detect, mitigate, and document AI hallucinations — making them a managed risk rather than an undisclosed one.

This post walks through the 7 hallucination patterns Indian CAs encounter most, the detection workflow, and SA 230 documentation requirements when AI is in the audit work-stack.

Why hallucinations happen

LLMs are statistical pattern-matchers, not knowledge databases. They predict the next token based on training data patterns. When asked "What does SA 530 paragraph 9 say?", the model produces what it thinks paragraph 9 should say — often correct, sometimes plausibly wrong.

Three drivers:

Training data gaps — the model may not have seen the actual text of every SA paragraph during training. It generates from general patterns about what SAs look like.
Confidence without uncertainty — LLMs don't have calibrated confidence. A 90%-sure answer and a 50%-sure answer look the same in output.
Plausibility optimisation — models are trained to produce plausible-sounding text. Plausibility ≠ accuracy.

For audit work, where statutory citations, threshold amounts, and timeline obligations matter exactly, hallucination is the central operational risk.

The 7 hallucination patterns CAs encounter most

Pattern 1: Plausibly-wrong section numbers

User: "What section of the Companies Act covers auditor rotation?"

LLM: "Section 139(3) of the Companies Act 2013 covers auditor rotation, requiring audit firms to rotate every 5 years..."

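The correct answer is Section 139(2) read with Rule 5 of the Companies (Audit and Auditors) Rules, 2014. The model said "139(3)". Plausible. Wrong. The "every 5 years" claim is also imprecise: Section 139(2) allows an individual auditor one term of five consecutive years and an audit firm two such terms.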

Detection: Always verify section numbers against the Bare Act. The Section 139(2) Auditor Rotation Tracker cross-checks these.
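
A minimal sketch of that check in Python, assuming the auditor keeps a small reference table keyed in by hand from the Bare Act. The names `VERIFIED_SECTIONS` and `check_section_claim` are hypothetical, and the single entry is the example discussed above.

```python
# Hypothetical reference table, keyed in by hand from the Bare Act.
# Only the entry discussed above is shown; extend it from the source text.
VERIFIED_SECTIONS = {
    "auditor rotation": "139(2)",
}


def check_section_claim(topic: str, claimed_section: str) -> str:
    """Compare an LLM-cited section with the hand-verified one for a topic."""
    verified = VERIFIED_SECTIONS.get(topic.lower())
    if verified is None:
        return "UNVERIFIED: topic not in reference table, check the Bare Act"
    if claimed_section.strip() == verified:
        return "MATCH"
    return f"MISMATCH: LLM said {claimed_section}, verified source says {verified}"


print(check_section_claim("Auditor rotation", "139(3)"))
```

A topic that is missing from the table is reported as unverified rather than passed, so a gap in the table can never look like a confirmation.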

Pattern 2: Confusion between similar provisions

User: "What's the penalty under Section 271DA?"

LLM: "Section 271DA imposes a penalty for failure to maintain books of account..."

Wrong. Section 271DA is the penalty for contravening Section 269ST on cash receipts (penalty = amount received). The model confused it with Section 271A (failure to maintain books of account).

Detection: When the LLM gives a topic that doesn't match the section, verify both the section text and the topic.
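
One rough way to automate the first pass, assuming a keyword table built by hand from the verified section text. `SECTION_KEYWORDS` and `topic_check` are hypothetical names. Keyword overlap is a crude signal that flags candidates for manual reading and does not replace it.

```python
import re

# Hypothetical keyword table, built by hand from the verified section text.
SECTION_KEYWORDS = {
    "271DA": {"269st", "cash", "receipt"},
    "271A": {"books", "account"},
}


def _words(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens of the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def topic_check(section: str, llm_description: str) -> tuple[bool, list[str]]:
    """Return (description fits the cited section, other sections it fits)."""
    words = _words(llm_description)
    fits_cited = bool(SECTION_KEYWORDS[section] & words)
    others = sorted(
        s for s, kw in SECTION_KEYWORDS.items() if s != section and kw & words
    )
    return fits_cited, others


answer = "Section 271DA imposes a penalty for failure to maintain books of account"
print(topic_check("271DA", answer))
```

Here the description does not fit Section 271DA but does fit Section 271A, which is the confusion described above.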

Pattern 3: Threshold amount errors

User: "What's the threshold for tax audit under Section 44AB?"

LLM: "Section 44AB requires tax audit when business turnover exceeds ₹2 crore..."

Wrong — the threshold is ₹1 crore (or ₹10 crore with cash-test relief). The model is making up ₹2 crore.

Detection: All threshold amounts must be verified. Use the Cash Transaction Compliance Checker, Section 188 RPT Calculator, Section 186 Loan Cap Calculator, and other CORAA calculators which are programmed against the verified statute.
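
As an illustration of what "programmed against the verified statute" can look like, here is a small Python sketch. It is not CORAA's implementation. The table `VERIFIED_THRESHOLDS` and both function names are assumptions, and the two figures are the ones quoted above, which should themselves be confirmed against the current Act.

```python
import re
from decimal import Decimal

UNIT_RUPEES = {"lakh": Decimal("100000"), "crore": Decimal("10000000")}

# Hypothetical hand-verified table; the figures are the ones quoted above.
VERIFIED_THRESHOLDS = {
    "44AB business turnover": {
        "base": 1 * UNIT_RUPEES["crore"],
        "with cash-test relief": 10 * UNIT_RUPEES["crore"],
    },
}


def parse_inr(text: str) -> Decimal:
    """Turn the first amount such as '₹2 crore' in the text into rupees."""
    match = re.search(r"₹\s*(\d+(?:\.\d+)?)\s*(lakh|crore)", text)
    if match is None:
        raise ValueError(f"no rupee amount found in: {text!r}")
    return Decimal(match.group(1)) * UNIT_RUPEES[match.group(2)]


def threshold_is_verified(key: str, llm_text: str) -> bool:
    """True only if the LLM's amount equals one of the verified figures."""
    return parse_inr(llm_text) in VERIFIED_THRESHOLDS[key].values()


claim = "Section 44AB requires tax audit when business turnover exceeds ₹2 crore"
print(threshold_is_verified("44AB business turnover", claim))
```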

Pattern 4: Effective date / amendment confusion

User: "When did CARO 2020 become effective?"

LLM: "CARO 2020 was effective from FY 2020-21..."

Partially wrong. CARO 2020 was originally notified to apply from FY 2019-20, then deferred to FY 2020-21 and, because of COVID-19, deferred again to FY 2021-22 by the MCA notification dated 17 December 2020.

Detection: For regulations with amendment history, verify the effective date against the latest MCA notification.

Pattern 5: Indian-international standard conflation

User: "What does ISA 315 require for risk assessment?"

LLM: "ISA 315 requires the auditor to..."

The Indian standard is SA 315 (Revised 2020), not ISA 315 (IAASB international). The substantive requirements are similar but Indian audit work cites SA, not ISA. Citing ISA in an Indian working paper is a peer-review finding.

Detection: Always check whether the model is citing the Indian or international version. Use SA, not ISA.
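
This check is easy to script. A minimal sketch, assuming draft text is available as a plain string; `flag_isa_citations` is a hypothetical helper name.

```python
import re

# Matches "ISA 315", "ISA 240" and so on, but not "SA 315".
ISA_PATTERN = re.compile(r"\bISA\s+(\d{3})\b")


def flag_isa_citations(text: str) -> list[str]:
    """List every international-standard citation found in the text."""
    return [f"ISA {number}" for number in ISA_PATTERN.findall(text)]


draft = "ISA 315 requires risk assessment; see also SA 315 and ISA 240."
print(flag_isa_citations(draft))
```

Each flagged citation still needs a human decision, because the corresponding SA may differ in paragraph numbering or wording.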

Pattern 6: Hallucinated case law

User: "Are there any Supreme Court cases on auditor liability under Section 147?"

LLM: "Yes, Supreme Court in XYZ Auditors vs ICAI (2019) held..."

The case may not exist. The model is making up a case name and holding.

Detection: Case law citations must be verified against an authoritative legal database (Manupatra, SCC Online, LiveLaw archive, Indian Kanoon). Never use an LLM's case citation without independent verification.

Pattern 7: Confidently-wrong arithmetic

User: "If turnover is ₹150 crore and PBT is ₹18 crore, what's the materiality at 5% of PBT?"

LLM: "Materiality at 5% of PBT = ₹15 crore."

Wrong — ₹18 crore × 5% = ₹90 lakh (₹0.9 crore), not ₹15 crore. The model just made an arithmetic error.

Detection: All numerical computations must be cross-checked with a calculator. Use the Materiality Calculator or similar — never trust an LLM's arithmetic.
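
The recomputation itself takes a few lines. A sketch using Python's `decimal` module so that rupee amounts are not subject to binary floating-point rounding; the function name `materiality` is illustrative only.

```python
from decimal import Decimal

CRORE = Decimal("10000000")
LAKH = Decimal("100000")


def materiality(benchmark_rupees: Decimal, rate_percent: Decimal) -> Decimal:
    """Materiality as a percentage of the chosen benchmark, in rupees."""
    return benchmark_rupees * rate_percent / Decimal("100")


pbt = 18 * CRORE
result = materiality(pbt, Decimal("5"))
print(f"Materiality: ₹{result / LAKH} lakh")
```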

The 5-step detection workflow

For any LLM-produced output that's going into audit work, run this verification:

Step 1: Identify the verifiable claims

Read the output. Mark every:

Section / Rule / clause citation
Threshold amount or percentage
Effective date
Case law reference
Numerical calculation

These are the verifiable claims. Everything else is interpretive content (which still needs auditor judgement but isn't a "fact").
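
A first pass at marking can be scripted. The sketch below uses regular expressions for section citations, rupee amounts, percentages, and financial years. The patterns and the name `extract_claims` are assumptions and will miss unusual formats. Case law references and calculations are not covered and still need to be marked by hand.

```python
import re

# Illustrative patterns only; real LLM output will contain other formats.
CLAIM_PATTERNS = {
    "section": re.compile(r"\bSection\s+\d+[A-Z]*(?:\(\d+[A-Za-z]*\))*"),
    "amount": re.compile(r"₹\s*\d[\d,]*(?:\.\d+)?(?:\s*(?:crore|lakh))?"),
    "percentage": re.compile(r"\d+(?:\.\d+)?\s*%"),
    "financial_year": re.compile(r"\bFY\s*\d{4}-\d{2}\b"),
}


def extract_claims(text: str) -> dict[str, list[str]]:
    """Return every match for each claim type, in order of appearance."""
    return {kind: pattern.findall(text) for kind, pattern in CLAIM_PATTERNS.items()}


output = (
    "Section 139(3) applies from FY 2020-21. "
    "Materiality is 5% of PBT where turnover is ₹150 crore."
)
for kind, found in extract_claims(output).items():
    print(kind, found)
```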

Step 2: Source-verify each claim

Section / Rule / clause: check Bare Act. The SA library and CARO 2020 clause pages have verified content.
Threshold amount: cross-check via CORAA calculator or Bare Act.
Effective date: check latest MCA / CBDT / SEBI notification.
Case law: check Indian Kanoon / SCC Online / LiveLaw.
Numerical calculation: re-do in Excel or a verified calculator.

Step 3: Cross-check the interpretive content

If the LLM says "the auditor should issue a qualified opinion because...", verify the reasoning maps to SA 705 paragraph X. Read the SA paragraph yourself. Confirm the LLM's interpretation aligns.

Step 4: Document the verification

Working paper notes:

Original LLM output
Specific verified claims with the verification source
Any claims that were wrong + what the correct answer was
Auditor's final interpretation (which may differ from the LLM)

This documentation is what makes the audit work defensible at peer review or NFRA inspection.
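
One possible shape for such a note, sketched as Python dataclasses. The class and field names are hypothetical and simply mirror the four items listed above; they are not a prescribed working paper format.

```python
from dataclasses import dataclass, field


@dataclass
class VerifiedClaim:
    claim: str                 # what the LLM asserted
    source: str                # where the auditor checked it
    llm_was_correct: bool
    correct_answer: str = ""   # filled in only when the LLM was wrong


@dataclass
class VerificationNote:
    llm_output: str
    claims: list[VerifiedClaim] = field(default_factory=list)
    auditor_interpretation: str = ""

    def errors(self) -> list[VerifiedClaim]:
        """Claims where the LLM was wrong, for the working paper summary."""
        return [c for c in self.claims if not c.llm_was_correct]


note = VerificationNote(llm_output="Section 139(3) covers auditor rotation.")
note.claims.append(
    VerifiedClaim(
        claim="Auditor rotation is in Section 139(3)",
        source="Bare Act, Companies Act 2013",
        llm_was_correct=False,
        correct_answer="Section 139(2) read with Rule 5",
    )
)
print(len(note.errors()))
```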

Step 5: Adjust the working paper based on verified answers

The final working paper reflects what's actually correct — not what the LLM said. The LLM was a tool; the auditor takes responsibility.

SA 230 documentation when AI is used

SA 230 paragraph 8 requires audit documentation sufficient to enable an experienced auditor, having no previous connection with the audit, to understand the nature, timing, and extent of the procedures performed, the results obtained, the significant matters arising, and the conclusions reached.

When AI is used, the documentation should include:

1. What tool was used

"Claude Pro (Anthropic), accessed [date / time], version Sonnet 3.5" or "CORAA Reporting module, version X.Y.Z, accessed [date / time]".

2. What prompt was given

The substantive prompt (or summary) the auditor used. If sensitive client data was NOT shared with the LLM (as it shouldn't be for public LLMs), note that.

3. What output was received

The substantive output. Either preserved verbatim or summarised.

4. What was verified

The verification trail per the 5-step workflow above.

5. What changed in the final output

Where the LLM was wrong, what the correct answer was, why.

6. Who reviewed the work

The auditor's name + date of review. The auditor takes professional responsibility regardless of LLM involvement.

This documentation is more rigorous than for non-AI work because the underlying tool is probabilistic. Documenting the verification provides the audit trail.
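
A simple completeness check can catch a missing item before the file is closed. This sketch assumes each AI-use record is held as a dictionary of strings; the key names in `REQUIRED_ITEMS` are hypothetical labels for the six items above.

```python
# Hypothetical keys for the six documentation items listed above.
REQUIRED_ITEMS = ("tool", "prompt", "output", "verification", "changes", "reviewer")


def missing_items(record: dict[str, str]) -> list[str]:
    """Return the required items that are absent or blank in the record."""
    return [key for key in REQUIRED_ITEMS if not record.get(key, "").strip()]


record = {
    "tool": "Public LLM, accessed [date / time]",
    "prompt": "Which section covers auditor rotation? No client data shared.",
    "output": "Section 139(3) ...",
    "verification": "Checked against Bare Act",
    "changes": "",
}
print(missing_items(record))
```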

For working paper templates that include AI-use documentation, see the SA 230 working paper template.

Mitigation strategies — reducing hallucination probability

Four strategies reduce the hallucination rate up front:

1. Use RAG-enabled tools

A RAG-enabled tool (see RAG for Audit) retrieves the actual source text before answering. Hallucination on citations drops by 80-90% when the source text is in the prompt.

2. Use tools with built-in verification

CORAA's calculators are programmed against the verified statute — they can't hallucinate the threshold for Section 269ST because the threshold is hard-coded. Tools with this kind of structural grounding don't have the hallucination problem for the things they cover.

3. Prompt for citations

When asking an LLM a question, end with: "Cite the specific section / paragraph for every claim. If you don't know, say so."

This pushes the model toward acknowledged uncertainty rather than fabricated confidence. It doesn't eliminate hallucinations but reduces them.
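
If prompts are assembled in a script, the instruction can be appended automatically so it is never forgotten. A minimal sketch; `with_citation_request` is a hypothetical helper and makes no call to any LLM API.

```python
CITATION_SUFFIX = (
    "Cite the specific section / paragraph for every claim. "
    "If you don't know, say so."
)


def with_citation_request(question: str) -> str:
    """Append the citation instruction to the end of a question."""
    return f"{question.rstrip()}\n\n{CITATION_SUFFIX}"


print(with_citation_request("What does SA 530 paragraph 9 say?"))
```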

4. Multi-source verification

Ask the same question to two different LLMs (e.g., Claude + ChatGPT). Where they agree, the answer is more likely right. Where they disagree, investigate.

This doubles cost (two subscriptions, two prompts) but improves accuracy meaningfully for high-stakes claims.
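
The comparison can be partly mechanised. The sketch below compares only the section citations found in two answers, assuming both are already available as text. The pattern and the name `compare_citations` are assumptions. Agreement between two models is still not verification, since both can repeat the same error.

```python
import re

SECTION = re.compile(r"\bSection\s+\d+[A-Z]*(?:\(\d+[A-Za-z]*\))*")


def compare_citations(answer_a: str, answer_b: str) -> dict[str, list[str]]:
    """Split section citations into those both answers give and those only one gives."""
    cited_a = set(SECTION.findall(answer_a))
    cited_b = set(SECTION.findall(answer_b))
    return {
        "agree": sorted(cited_a & cited_b),
        "investigate": sorted(cited_a ^ cited_b),
    }


first = "Auditor rotation is covered by Section 139(2)."
second = "Auditor rotation is covered by Section 139(3)."
print(compare_citations(first, second))
```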

The honest accuracy estimate

For Indian audit-related questions, our internal testing across Claude 3.5 Sonnet, ChatGPT 4o, and Gemini 1.5 Pro (May 2026):

Claim type	Accuracy without verification
General SA description	~85-90%
Specific SA paragraph citation	~70-80%
Section number citation	~75-85%
Threshold amount	~80-90% (depends on how recently the amount changed)
Effective date	~70-80%
Case law citation	~50-70% (worst category)
Numerical calculation	~70-85% (frequent arithmetic errors)

So 15-30% of substantive claims from a public LLM are wrong on first try, without verification. For audit work, this is the central operational risk.

After applying the 5-step verification workflow, error rate drops to near-zero — but the workflow takes time. Roughly 50% of the time-saving from using AI is consumed by verification.

The net is still positive — but the verification cost is real and shouldn't be ignored.

When NOT to use AI for audit-related research

Three scenarios where the verification overhead exceeds the value:

1. Citation-heavy regulatory analysis

If the deliverable is a 20-citation regulatory memo, the verification work to check every LLM citation may exceed the time saved on drafting. Better to research directly.

2. Time-sensitive Section 143(12) decisions

When the 60-day Form ADT-4 clock is ticking, you don't have time to chase down LLM-cited paragraph references that turn out wrong. Use authoritative sources directly.

3. Adversarial / litigation contexts

If your audit work will be scrutinised by opposing counsel in a regulatory inquiry, every citation must be defensible. Don't rely on LLM citations for anything that goes into a Section 143(15) response or NFRA proceeding.

Bottom line

AI hallucinations are an inherent property of LLMs, not a bug. For Indian audit work, where statutory citations and threshold amounts matter exactly, hallucination is the central operational risk.

The mitigation:

Use RAG-enabled tools for any LLM work involving regulations or standards
Use audit-grade calculators for threshold / numerical work (don't trust LLM arithmetic)
Verify every substantive citation against the source
Document the verification in SA 230 working papers
Take professional responsibility — the auditor signs the report, not the LLM

The firms using AI well treat the LLM as a fast-but-fallible analyst. The firms getting in trouble treat LLM output as authoritative. The difference between the two is the verification workflow.

For practitioner tools that avoid the hallucination problem:

22 Calculators — programmed against verified statute, no hallucination on threshold amounts
SA library — verified text + commentary, not LLM-generated
CARO 2020 clauses — verified clause-by-clause text
120 Q&A reference — every answer human-curated and source-cited

Try CORAA: RAG-enabled, citation-grounded, India-hosted. Calculators programmed against verified statute. No hallucination on the threshold amounts that matter most.
