International Journal of Science and Research (IJSR)


ISSN: 2319-7064



United States | Data and Knowledge Engineering | Volume 14 Issue 8, August 2025 | Pages: 920 - 949


Scaling Laws in Generative AI: How Model Size and Data Influence Performance and Cost

Chandan Singh Troughia

Abstract: The exponential growth in generative AI capabilities has been governed by predictable mathematical relationships known as scaling laws, fundamentally reshaping how we approach model development and deployment. This paper examines the complex relationships between model size, training data, performance, and cost in generative artificial intelligence systems, spanning models from millions to hundreds of billions of parameters trained on datasets ranging from gigabytes to petabytes. Through systematic analysis of empirical data and case studies, we trace the evolution of scaling laws from the seminal work of Kaplan et al., through the Chinchilla paradigm shift, which revealed that previous models were undertrained by orders of magnitude, to recent developments, providing a comprehensive framework for understanding how these factors interact. Our investigation reveals that while performance improvements follow power-law relationships with both model size and data quantity, the optimal balance between these factors continues to evolve, with significant economic implications. We explore emergent capabilities that appear at specific scale thresholds, the critical role of data quality in determining model performance, comprehensive evaluation methodologies that capture scaling behaviors, and economic considerations that shape practical deployment decisions. The analysis demonstrates that compute-optimal training strategies can achieve equivalent performance with substantially reduced computational costs, fundamentally altering the economics of AI development. By synthesizing insights across these dimensions, we offer evidence-based guidance for researchers and practitioners navigating the trade-offs inherent in generative AI development and deployment. This holistic perspective on scaling laws provides valuable direction for advancing more capable, efficient, and sustainable AI systems in an era of increasing computational demands.
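As a point of reference for the power-law relationships the abstract describes, the Chinchilla parametric loss from Hoffmann et al. (2022) is commonly written as a function of parameter count N and training tokens D; the fitted constants shown are from that published work, not from this paper's own analysis:

L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad E \approx 1.69,\ A \approx 406.4,\ B \approx 410.7,\ \alpha \approx 0.34,\ \beta \approx 0.28

Under the standard approximation C \approx 6ND for training FLOPs, minimizing L at a fixed compute budget C yields N_{\text{opt}} \propto C^{a} and D_{\text{opt}} \propto C^{b} with a \approx b \approx 0.5, i.e. roughly 20 training tokens per parameter. The near-equal exponents are what imply that model size and data should be scaled in tandem, the basis of the "undertrained by orders of magnitude" finding cited above.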

Keywords: Scaling Laws, Generative AI, Model Size, Training Data, Performance, Cost, Large Language Models, Data Quality, Evaluation, Economic Analysis, Chinchilla, Emergent Abilities, Compute-Optimal Training

How to Cite: Chandan Singh Troughia, "Scaling Laws in Generative AI: How Model Size and Data Influence Performance and Cost", Volume 14 Issue 8, August 2025, International Journal of Science and Research (IJSR), Pages: 920-949, https://www.ijsr.net/getabstract.php?paperid=SR25727102615, DOI: https://dx.doi.org/10.21275/SR25727102615



