Machine learning is a field of artificial intelligence that enables computers to learn from data and make decisions without being explicitly programmed. Machine learning algorithms are broadly categorized into supervised and unsupervised learning. In supervised learning, a model learns from labeled training data, while unsupervised learning involves finding patterns in data without explicit labels.
Decision trees are a fundamental concept in machine learning. They are flowchart-like structures in which each internal node represents a decision based on a particular feature and each leaf holds a prediction. Boosting algorithms, like XGBoost, enhance the predictive power of decision trees. Boosting works by combining many weak learners in sequence, with each new learner correcting the errors of those before it, to create a strong predictive model. XGBoost, in particular, has proven effective in various applications due to its speed and performance.
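The idea of combining weak learners can be sketched in a few lines. The example below is a minimal illustration of gradient boosting for regression with one-split trees (stumps) on a single feature. It is not how XGBoost is implemented, which adds regularization, second-order gradient information, and many optimizations. The function names, the learning rate, the number of rounds, and the synthetic data are all assumptions made for illustration.

```python
# Minimal gradient boosting sketch: each stump is fitted to the residuals
# left by the stumps before it. Illustrative only, not XGBoost itself.

def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    # Try every split point except the maximum, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then add shrunken stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data (not concrete measurements): y = x^2 on x = 1..20.
xs = [float(x) for x in range(1, 21)]
ys = [x * x for x in xs]

mean_y = sum(ys) / len(ys)
baseline_mse = mse(ys, [mean_y] * len(ys))

one_round = fit_boosted(xs, ys, n_rounds=1)
many_rounds = fit_boosted(xs, ys, n_rounds=50)
one_round_mse = mse(ys, [predict_boosted(one_round, x) for x in xs])
many_rounds_mse = mse(ys, [predict_boosted(many_rounds, x) for x in xs])

print(baseline_mse, one_round_mse, many_rounds_mse)
```

A single stump is a poor model on its own, but the training error falls as more stumps are added, which is the sense in which weak learners combine into a stronger one.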
This tool aims to predict the compressive strength of concrete based on input parameters such as cement, water, coarse aggregate, fine aggregate, fly ash, superplasticizer, blast furnace slag, and age. Compressive strength is influenced by various factors like cement content, water-to-cement ratio, and curing time. Machine learning models, including XGBoost Regressor, were trained to accurately predict concrete compressive strength.
The dataset used for training the models is sourced from the UCI Machine Learning Repository. The dataset contains 1030 concrete data samples, each with eight input variables: cement, water, coarse aggregate, fine aggregate, fly ash, superplasticizer, blast furnace slag, and age.
The following chart illustrates the relative frequency distribution of each parameter in the dataset:
The XGBoost Regressor was found to be the most accurate model for predicting the compressive strength of concrete. Candidate models were compared using the hold-out method. The selected model was then evaluated through hyperparameter tuning, accuracy analysis, and validation using both hold-out and cross-validation techniques. The following are some of the metrics used to evaluate the model's performance:
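The two validation schemes mentioned above differ only in how sample indices are partitioned. The sketch below shows one way to generate a hold-out split and k-fold cross-validation splits using the standard library. The 20% test fraction, the choice of 5 folds, the random seed, and the function names are assumptions for illustration; the source does not state which values were used.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once and reserve a fraction as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def kfold_splits(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists; each sample validates once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# The dataset size is taken from the text; the split sizes are assumptions.
n_samples = 1030
train_idx, test_idx = holdout_split(n_samples)
folds = list(kfold_splits(n_samples))

print(len(train_idx), len(test_idx))
print([len(validation) for _, validation in folds])
```

Hold-out gives a single estimate from one split, while cross-validation averages over several splits, so every sample is used for validation exactly once.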
A correlation coefficient study was performed on the input parameters. The range of the correlation coefficients indicates that the input parameters have low to moderate correlations with each other. When two or more variables are highly correlated, they may carry similar information, which can lead to model instability, distort the importance attributed to those variables, and impair the model's ability to generalize.
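A pairwise correlation study of this kind can be sketched as follows. The example assumes the Pearson correlation coefficient, which the source does not name explicitly, and uses a hypothetical threshold of 0.8 to flag highly correlated pairs. The column names and values are synthetic placeholders, not rows from the dataset.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| at or above threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Synthetic placeholder columns, chosen only to exercise the functions.
columns = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0, 10.0],   # exact multiple of feature_a
    "feature_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}

flagged = highly_correlated_pairs(columns)
for a, b, r in flagged:
    print(a, b, round(r, 3))
```

Pairs that are flagged this way are candidates for closer inspection, since they may carry largely redundant information.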
The XGBoost Regressor achieved an R-squared value of 0.96 and a mean squared error of 11.486 MPa² (equivalent to a root-mean-squared error of about 3.39 MPa). The prediction distribution chart visualizes the relationship between actual and predicted compressive strength.
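For reference, both metrics can be computed directly from actual and predicted values. The sketch below uses the standard definitions; the function names and the small example values are illustrative assumptions, not model output. Because mean squared error is expressed in squared units of the target, its square root gives an error in MPa that is easier to compare with typical strength values.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences, in squared units of the target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only, not predictions from the model.
y_true = [1.0, 2.0, 3.0]
perfect = [1.0, 2.0, 3.0]
mean_only = [2.0, 2.0, 2.0]

print(r_squared(y_true, perfect))     # a perfect fit scores 1.0
print(r_squared(y_true, mean_only))   # predicting the mean scores 0.0

# Root-mean-squared error implied by the reported mean squared error.
reported_mse = 11.486
implied_rmse = math.sqrt(reported_mse)
print(round(implied_rmse, 2))
```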
An analysis of feature importances highlighted the dominant influence of age and cement content on compressive strength; together they account for nearly 70% of the total feature importance.
While machine learning models show high accuracy in predicting laboratory concrete strength, uncertainties in field-placed concrete, influenced by variable environmental conditions and operational uncertainties during curing and construction, may affect prediction accuracy. Interpretation of results should consider data quality, and high-quality field data can enhance model accuracy in practical applications.
Interpreting machine learning models is challenging because of their complexity. Unlike traditional methods, machine learning identifies associations rather than causal relationships. Although this model accurately predicts compressive strength from the input variables, those variables omit much relevant information: cement properties, aggregate characteristics, properties of chemical admixtures and cementitious materials, and curing regimes are not captured. Even so, the model is able to learn implicit patterns in the data that partly reflect the unobserved influence of these underlying factors. Results from this model should thus be interpreted as associations between concrete mix ratios and compressive strength rather than as causal relationships.
The model is trained on only 1030 concrete samples, so high-quality field data is needed to improve its usefulness in practical applications. Developing comprehensive data on field concrete characteristics is essential for enhancing model performance in real-world settings.
While machine learning models like XGBoost Regressor offer powerful tools for predicting concrete compressive strength, users must be aware of deployment challenges, interpretability issues, and the importance of high-quality field data for real-world applications.