While Large Language Models (LLMs) have revolutionized chatbot interactions, they often fall short of aligning their responses with the nuanced preferences of individual users, a challenge rooted in the inherently subjective and proprietary nature of those preferences. Prompt-based learning, though effective at improving factual accuracy because it emphasizes universal correctness, therefore remains insufficient for accurate, personalized response alignment. Because user preferences vary widely across individuals and contexts, aligning responses requires a more personalized and context-aware approach. To address this limitation, we propose Consistent Marginalization (CM), a novel framework that aims to unlearn misalignment by constructing a personalized memory bank of instance-response-dependent discrepancies from a small set of user preference samples. This memory bank equips LLMs with the ability to understand, recall, and adapt to individual preferences, enabling more consistent and personalized responses. Evaluated across a diverse range of domain-specific datasets and model architectures, CM yields notable improvements in response alignment and robustness. We believe Consistent Marginalization is a valuable step toward making LLMs genuinely personable and adaptive conversational agents that understand user preferences and generate responses better aligned with individual expectations.