Reinforcement Learning-Controlled Ensemble Sampling for Imbalanced Complex Data
Alastair Cathrin
Faculty of Engineering and Information Technology, University of Melbourne, Melbourne, Australia
Marta Ferrara
Faculty of Engineering and Information Technology, University of Melbourne, Melbourne, Australia
Keywords: Reinforcement Learning, Ensemble Sampling, Imbalanced Data, Machine Learning, Data Preprocessing
Abstract
Complex, highly imbalanced datasets arise in many domains and challenge traditional machine learning algorithms, whose decision boundaries tend to be biased toward the majority classes. Existing data-level preprocessing techniques, such as static oversampling and undersampling, often fail to capture the underlying manifold structure of heterogeneous data, which leads to severe overfitting, loss of crucial information, and degraded generalization. This paper introduces a framework in which a reinforcement learning agent dynamically controls ensemble sampling strategies for imbalanced complex data. The sampling process is modeled as a Markov Decision Process: the agent interacts with the data environment, observes local data distributions, and iteratively adjusts the sampling ratios and synthetic generation parameters for multiple base classifiers. The reward is designed to optimize global evaluation metrics, emphasizing minority-class predictive performance without sacrificing majority-class accuracy. Theoretical analysis shows how the reinforcement learning formulation overcomes the rigid heuristics of conventional ensemble methods. Experimental validation on diverse, high-dimensional datasets shows that the proposed approach substantially mitigates the adverse effects of extreme class overlap and noise. The findings suggest that sampling configurations can be optimized automatically rather than by fixed heuristics, offering a robust solution for complex predictive modeling tasks across interdisciplinary applications.
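The control loop summarized in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's implementation. It assumes a stateless epsilon-greedy agent in place of the full Markov Decision Process, a single undersampling ratio as the only action, and the geometric mean of minority and majority recall as the reward. The names `g_mean`, `undersample`, and `RatioAgent` are hypothetical and do not come from the paper.

```python
import math
import random


def g_mean(tp, fn, tn, fp):
    """Assumed reward: geometric mean of minority and majority recall.

    It is high only when both classes are predicted well, so the agent
    cannot gain reward by sacrificing either class.
    """
    minority_recall = tp / (tp + fn) if (tp + fn) else 0.0
    majority_recall = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(minority_recall * majority_recall)


def undersample(labels, ratio, rng):
    """Action: keep every minority sample (label 1) and a `ratio`
    fraction of the majority samples (label 0). Returns sorted indices."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), max(1, round(ratio * len(majority))))
    return sorted(minority + rng.sample(majority, keep))


class RatioAgent:
    """Hypothetical epsilon-greedy agent over a discrete set of ratios.

    This is a one-state simplification; a full MDP agent would also
    condition its choice on observed local data distributions.
    """

    def __init__(self, ratios, epsilon=0.1, seed=0):
        self.ratios = list(ratios)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = [0.0] * len(self.ratios)  # running mean reward per ratio
        self.count = [0] * len(self.ratios)

    def act(self):
        # Explore with probability epsilon, otherwise pick the best ratio so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.ratios))
        return max(range(len(self.ratios)), key=self.value.__getitem__)

    def update(self, action, reward):
        # Incremental mean update of the estimated reward for this ratio.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]
```

In use, each round would call `act()` to choose a ratio, resample the training set with `undersample`, train a base classifier on the result, compute `g_mean` from its validation confusion counts, and pass that value to `update`. Repeating the loop for each base classifier yields an ensemble whose sampling configurations are learned rather than fixed in advance.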