United States | Mathematics and Statistics | Volume 15 Issue 1, January 2026 | Pages: 1454 - 1457
Application of the Central Limit Theorem (CLT) for Performance Modeling in AI-Based Inference Systems
Abstract: Modern AI applications, including deep learning inference services, large language model (LLM) serving platforms, and real-time recommendation engines, operate at massive scale and experience high variability in request latency. Individual inference requests exhibit heavily skewed latency distributions due to network jitter, resource contention, memory stalls, and nondeterministic scheduling effects. However, performance engineers routinely analyze aggregated metrics such as mean latency, batch averages, and windowed system statistics, which display surprisingly stable, Gaussian-like behavior. This paper demonstrates that the Central Limit Theorem (CLT) provides the mathematical basis for this stability. We simulate AI inference latency using heavy-tailed distributions, compute aggregated batch means, and show that the distribution of batch means converges to a normal distribution even when individual latencies are non-normal. The findings confirm that the CLT is foundational for AI performance monitoring, AIOps-based anomaly detection, confidence-bound estimation, and large-scale inference stability analysis.
Keywords: Central Limit Theorem, AI Inference, Performance Engineering, Latency Modeling, AIOps, Statistical Methods, Normal Approximation, Batch Means, Large-Scale Systems
How to Cite: Prakasharao Raghavapudi, Rashmi Gupta, "Application of the Central Limit Theorem (CLT) for Performance Modeling in AI-Based Inference Systems", International Journal of Science and Research (IJSR), Volume 15 Issue 1, January 2026, Pages: 1454-1457, https://www.ijsr.net/getabstract.php?paperid=SR26123224009, DOI: https://dx.doi.org/10.21275/SR26123224009
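The batch-mean convergence described in the abstract can be illustrated with a short simulation. The sketch below is not the paper's actual code: the latency distribution (lognormal), batch size (100), sample count, and seed are illustrative assumptions. It draws heavily skewed individual "latencies", aggregates them into batch means, and compares the sample skewness of the two, showing that the batch means are far closer to Gaussian.

```python
# Illustrative sketch of the CLT effect on batched latency metrics.
# Distribution choice and all parameters are assumptions, not taken
# from the paper itself.
import numpy as np

rng = np.random.default_rng(42)

# Individual request latencies: lognormal, heavily right-skewed,
# mimicking tail effects from jitter, contention, and stalls.
latencies = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

# Aggregate into batches of 100 requests and take each batch's mean,
# mimicking windowed/batched performance metrics.
batch_means = latencies.reshape(-1, 100).mean(axis=1)

def skewness(x):
    """Sample skewness: third standardized central moment."""
    x = np.asarray(x)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

# Raw latencies are strongly skewed; batch means are roughly
# symmetric around the true mean exp(0.5) ~= 1.65, as the CLT predicts.
print(f"raw latency skewness: {skewness(latencies):.2f}")
print(f"batch-mean skewness:  {skewness(batch_means):.2f}")
```

By the CLT, the skewness of a mean of n i.i.d. samples shrinks roughly as 1/sqrt(n), so with batches of 100 the batch-mean skewness should be about a tenth of the raw skewness; in monitoring practice this is what makes Gaussian confidence bounds on windowed averages defensible even when per-request latency is not normal.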