Text-to-SQL Agents Under Ambiguous User Intent: A Taxonomy, Benchmark, and Repair Strategy
Giulia Kruger
Faculty of Engineering and Information Technology, University of Melbourne, Melbourne, Australia
Katja Meyer
Faculty of Engineering and Information Technology, University of Melbourne, Melbourne, Australia
Keywords: Natural Language Processing, Relational Databases, Conversational Agents, Intent Disambiguation
Abstract
The translation of natural language questions into executable database queries, commonly known as Text-to-SQL, has progressed rapidly with the advent of large language models. However, standard benchmarks implicitly assume that user queries are fully specified, structurally sound, and semantically unambiguous. In real-world enterprise deployments, user requests frequently contain missing constraints, vague terminology, and structural ambiguity, which leads autonomous agents to generate plausible but incorrect SQL queries. This paper investigates the behavior of Text-to-SQL agents operating under ambiguous user intent. We introduce a fine-grained taxonomy that categorizes the linguistic and structural ambiguities specific to relational database querying. To evaluate agent performance empirically, we propose a new benchmark comprising thousands of naturally ambiguous queries, each paired with multiple valid target interpretations. Furthermore, we develop a conversational repair strategy that enables Text-to-SQL agents to detect ambiguity, formulate targeted clarification questions, and iteratively refine the generated queries based on user feedback. Our experiments show that current state-of-the-art models suffer severe performance degradation when exposed to ambiguous inputs. The proposed interactive repair framework recovers a significant portion of this lost accuracy, reducing critical semantic errors while keeping the cognitive burden on the user low.
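The detect, clarify, and refine loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy schema, the data, the candidate SQL readings, and the helper names (`is_ambiguous`, `clarification_question`, `repair`) are all hypothetical, and flagging ambiguity when candidate readings return different results is an assumption made only for this example.

```python
import sqlite3

# Hypothetical toy schema and data, invented purely for illustration.
SCHEMA = """
CREATE TABLE orders (customer TEXT, amount REAL, placed_on TEXT);
INSERT INTO orders VALUES
  ('Ada', 500.0, '2024-01-10'),
  ('Ben', 100.0, '2024-03-01'),
  ('Ben', 150.0, '2024-03-02'),
  ('Ben', 120.0, '2024-03-03');
"""

# Candidate readings of the vague request "Who is our top customer?".
# "top" may mean highest total spend or most orders placed.
CANDIDATES = {
    "by total spend": (
        "SELECT customer FROM orders GROUP BY customer "
        "ORDER BY SUM(amount) DESC LIMIT 1"
    ),
    "by number of orders": (
        "SELECT customer FROM orders GROUP BY customer "
        "ORDER BY COUNT(*) DESC LIMIT 1"
    ),
}


def run(conn, sql):
    """Execute a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


def is_ambiguous(conn, candidates):
    """Assumed heuristic: ambiguous if candidate readings disagree on the result."""
    results = {tuple(run(conn, sql)) for sql in candidates.values()}
    return len(results) > 1


def clarification_question(candidates):
    """Build one targeted question listing the competing readings."""
    return "Did you mean top " + " or ".join(candidates) + "?"


def repair(conn, candidates, answer):
    """Execute the reading the user selected in response to the question."""
    return run(conn, candidates[answer])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

if is_ambiguous(conn, CANDIDATES):
    print(clarification_question(CANDIDATES))
    # Simulated user feedback; a real agent would read this from the dialogue.
    print(repair(conn, CANDIDATES, "by number of orders"))
```

In this toy setting the two readings select different customers, so a single clarification turn is enough to resolve the request. A real agent would also need to generate the candidate readings itself, which this sketch takes as given.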