HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs

Tin Nguyen · Logan Bolton · Mohammad Reza Taesiri · Trung Bui · Anh Totti Nguyen

Abstract

An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements is hard for humans to verify and to base decisions on accurately. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the question. That is, given an input question, an LLM first re-formats the question, adding XML tags that highlight key facts, and then generates a response with highlights over the facts referenced from the input. Compared to vanilla chain-of-thought prompting (CoT), HoT reduces the rate of hallucination and, separately, consistently improves the accuracy of 5 LLMs on over 22 tasks ranging from arithmetic and reading comprehension to logical reasoning. Consistent with the success of HoT few-shot prompting, training small LLMs (LLaMA-3.2-1B and Qwen2.5-1.5B) via supervised finetuning on HoT examples improves their accuracy (on 5 out-of-distribution tasks) over the baselines and over finetuning on CoT examples. When humans are asked to verify LLM responses, highlights help time-limited participants recognize more accurately and efficiently when LLMs are correct. Yet, surprisingly, when LLMs are wrong, HoTs tend to fool users into believing that an answer is correct.
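
To make the two-step format concrete, the following is a minimal sketch of how tagged text could be checked after generation. It is not the authors' code. The abstract only says "XML tags", so the `<fact1>`, `<fact2>` naming, the helper names (`extract_facts`, `ungrounded_tags`, `strip_tags`), and the apple example are all assumptions made for illustration. The sketch pulls the highlighted spans out of a re-formatted question and a response, then reports any tag in the response that has no counterpart in the question.

```python
import re

# Hypothetical tag scheme: <fact1>...</fact1>, <fact2>...</fact2>, ...
# The abstract only says "XML tags"; these tag names are an assumption.
TAG_RE = re.compile(r"<(fact\d+)>(.*?)</\1>", re.DOTALL)


def extract_facts(text):
    """Return a dict mapping each tag name to the list of spans it wraps."""
    facts = {}
    for name, span in TAG_RE.findall(text):
        facts.setdefault(name, []).append(span.strip())
    return facts


def ungrounded_tags(tagged_question, tagged_answer):
    """Return tags used in the answer that never appear in the question."""
    question_tags = set(extract_facts(tagged_question))
    answer_tags = set(extract_facts(tagged_answer))
    return sorted(answer_tags - question_tags)


def strip_tags(text):
    """Remove the highlight tags to recover the plain text."""
    return re.sub(r"</?fact\d+>", "", text)


# Made-up example: step 1 output (re-formatted question) and step 2 output
# (response with highlights). The <fact3> span is deliberately unsupported.
tagged_question = (
    "Sam has <fact1>3 apples</fact1> and buys <fact2>4 more</fact2>. "
    "How many apples does Sam have?"
)
tagged_answer = (
    "Sam starts with <fact1>3 apples</fact1> and adds <fact2>4 more</fact2>, "
    "so 3 + 4 = 7. A claim tagged <fact3>5 pears</fact3> has no source."
)

print(extract_facts(tagged_question))
print(ungrounded_tags(tagged_question, tagged_answer))
```

A check like this only confirms that each highlighted span in the response points back to a tagged span in the question. It says nothing about whether the final answer is correct, which matters given the finding that highlights can make wrong answers look convincing.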