Methods for Enhancing Factuality of Large Language Models via Retrieval-Augmented Mechanisms
Julian Sterling, Amelia Bennett, Clara Westwood
Department of Computer Science, University of Oxford, Oxford, United Kingdom
Keywords: Large Language Models, Retrieval-Augmented Generation, Dense Retrieval, Neural Hallucination, Factuality Enhancement
Abstract
The rapid proliferation of large language models has fundamentally transformed the landscape of natural language processing, enabling unprecedented capabilities in text generation, summarization, and interactive dialogue. However, a persistent and critical limitation of these generative architectures is their propensity to produce factually incorrect or unverified information, a phenomenon widely characterized as hallucination. This paper presents a comprehensive investigation into methods for mitigating hallucinatory behaviors and enhancing the factuality of large language models through the implementation of advanced retrieval-augmented mechanisms. By dynamically decoupling the parametric memory of the neural network from a non-parametric, externally updatable knowledge base, retrieval-augmented generation paradigms offer a robust solution to the limitations of static pre-training. We provide a deep architectural analysis of the integration between dense passage retrieval systems and autoregressive generation processes. Furthermore, we propose a novel contextual attention mechanism designed to optimize the semantic fusion of retrieved documents with user prompts. Through extensive empirical evaluations on standard knowledge-intensive datasets, we demonstrate that our refined retrieval-augmented framework significantly outperforms conventional parametric baselines and standard heuristic retrieval approaches. The results indicate substantial improvements in exact match metrics and a dramatic reduction in hallucination rates. This research elucidates the theoretical underpinnings of factuality in generative models and establishes a scalable, algorithmically efficient framework for deploying highly reliable artificial intelligence systems in mission-critical applications.
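The retrieval-augmented pipeline described above — embedding a query, retrieving semantically similar passages from a non-parametric knowledge store, and fusing them with the user prompt before generation — can be illustrated with a minimal sketch. This is not the paper's implementation: the toy bag-of-words embedding stands in for a trained dense encoder, and all function names (`embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a trained
    # dense encoder (e.g. a BERT-style bi-encoder) producing float vectors.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, corpus, k=2):
    # Rank every passage in the external knowledge base against the query
    # and return the top-k; real systems use an approximate-nearest-neighbor
    # index instead of a full scan.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, passages):
    # Fuse the retrieved evidence with the user prompt; the generator then
    # conditions on this combined context rather than parametric memory alone.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
    "Paris is the capital city of France.",
]
passages = retrieve("Where is the Eiffel Tower?", corpus)
print(build_prompt("Where is the Eiffel Tower?", passages))
```

Because the knowledge store is external to the model, it can be updated without retraining — the decoupling of parametric and non-parametric memory that the abstract credits with reducing hallucination.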