International Journal of Science and Research (IJSR)

Research Paper | Statistics | Dominican Republic | Volume 8 Issue 9, September 2019

# Correlation and Regression Analyses using Sudoku Grids

Carlos Ml. Rodriguez-Pena | Jose Ramon Martinez Batlle | Willy Marcelo Maurer

Abstract: In this paper, we analysed selected statistical properties of 27,402 Sudoku grids, which we either generated with software packages or obtained from sources on the Internet. We classified the grids into four groups according to their provenance: A (10k grids), B (10k grids), C (6.4k grids) and D (1k grids). For each grid, we calculated the Pearson product-moment correlation coefficient (r), together with the corresponding correlation test, for each of the 36 possible column pairs. We determined that a single Sudoku grid can contain at most 18 significantly correlated column pairs (SCCP). In total, we obtained 42,826 SCCP (8.68 %) out of the 493,236 possible in our sample (at most 18 per grid across 27,402 grids). SCCP with negative r were more common than those with positive r. We fitted linear regression models to the SCCP and obtained 32 models across all matrices, 20 with negative and 12 with positive correlation values. The negative:positive SCCP ratios were 1.00:0.18 for group A, 1.00:0.15 for group B, 1.00:0.16 for group C and 1.00:0.16 for group D, with an overall ratio of 1.00:0.16. The proportion of grids with at least one SCCP was smaller in groups A and B (37.58 % and 14.03 %, respectively) than in groups C and D (89.43 % and 89.02 %, respectively). We hypothesise that the total probability of the models could be obtained if an algorithm were found that builds a set of Sudoku grids containing all SCCP. We transposed each SCCP, turning it into position vectors (points), which for convenience we assumed to lie in the nine-dimensional real space ℝ⁹. We computed the pairwise squared distances between points, formed Euclidean distance matrices, and used them to classify the SCCP into groups by hierarchical cluster analysis.
We conclude that Sudoku grids are ideal matrices for simulation and modelling, with at least 5.47 billion matrices that share the same characteristics but differ in the arrangement of their numbers.
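The per-grid correlation step summarised above can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes a two-sided test of H0: ρ = 0 at α = 0.05 with n = 9 values per column (df = 7), and it hard-codes the critical value t(0.975, 7) ≈ 2.3646 so that only the standard library is needed. The helper names (`pearson_r`, `t_statistic`, `column_pairs`, `significant_pairs`, `fit_line`) are hypothetical, and the example grid is a standard shifted-pattern Sudoku chosen for illustration, not one of the grids in the sample.

```python
import math
from itertools import combinations

# Assumption: two-sided test at alpha = 0.05 with n = 9 cells per column,
# i.e. df = n - 2 = 7. t(0.975, 7) ~= 2.364624 is hard-coded (stdlib only).
T_CRIT_DF7 = 2.364624


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n=9):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def column_pairs(grid):
    """Yield (i, j, col_i, col_j) for the C(9, 2) = 36 column pairs of a 9x9 grid."""
    cols = [[row[c] for row in grid] for c in range(9)]
    for i, j in combinations(range(9), 2):
        yield i, j, cols[i], cols[j]


def significant_pairs(grid, t_crit=T_CRIT_DF7):
    """Return [(i, j, r)] for column pairs whose |t| exceeds the critical value."""
    result = []
    for i, j, x, y in column_pairs(grid):
        r = pearson_r(x, y)
        if abs(t_statistic(r, len(x))) > t_crit:
            result.append((i, j, r))
    return result


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return my - b * mx, b


# A valid Sudoku built from a standard shifting pattern (illustration only).
grid = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
sccp = significant_pairs(grid)
print(len(list(column_pairs(grid))), len(sccp))  # 36 0
```

In this shifted-pattern grid every column is a cyclic relabelling of the first, so |r| never exceeds 0.5 and no pair reaches the critical |r| ≈ 0.666. Grids with other digit arrangements can yield SCCP, and `fit_line` would then be applied to the two columns of each significant pair.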
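The distance-matrix and clustering step can likewise be sketched in Python. The abstract does not state how each transposed SCCP is split into points or which linkage was used, so this sketch assumes that every column of an SCCP becomes one point in ℝ⁹ and uses naive single-linkage agglomeration. `squared_distance_matrix` and `single_linkage` are hypothetical helpers, and the four example points are made-up permutations of 1 to 9, not data from the study.

```python
from itertools import combinations


def squared_distance_matrix(points):
    """Pairwise squared Euclidean distances between points in R^d."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        d[i][j] = d[j][i] = float(s)
    return d


def single_linkage(dist, k):
    """Naive agglomerative clustering (single linkage) until k clusters remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest members of two clusters.
            link = min(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical columns (points in R^9): two near-identity, two near-reversed.
points = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [2, 1, 3, 4, 5, 6, 7, 8, 9],
    [9, 8, 7, 6, 5, 4, 3, 2, 1],
    [8, 9, 7, 6, 5, 4, 3, 2, 1],
]
print(single_linkage(squared_distance_matrix(points), 2))  # [[0, 1], [2, 3]]
```

Because squaring is monotonic, single linkage yields the same merge order on squared distances as on plain Euclidean distances; linkages such as Ward's method, which is defined on squared distances, would need a different merge rule.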

Keywords: correlation · negative:positive ratios · coefficient of determination · cluster analysis · modelling

Edition: Volume 8 Issue 9, September 2019

Pages: 1036–1055