USING GRADIENT BOOSTING FOR PREDICTING THE FLASH POINT OF ORGANIC COMPOUNDS
Abstract and keywords
Abstract (English):
A large body of experimental data has accumulated in chemistry, which creates a need for better computational methods for storing and processing it. The flash point of organic compounds is an important parameter for ensuring the safety of chemical production. In the transition to Industry 4.0, the modern chemical industry is undergoing deep digital transformation, driven by increased safety requirements for chemical production. The use of digital twins of processes has significantly changed how chemical production is organized, and Industry 4.0 areas such as additive technologies and the Internet of Things are developing actively. Under these conditions, machine learning algorithms are a key tool for identifying the factors that affect the flash point of organic compounds and for improving the efficiency of predicting this parameter. The database compiled for this work contains flash point temperatures for 1741 organic substances, taken from the PubChem database. To represent the organic compounds, we used 208 RDKit descriptors, as they are among the best descriptors for predicting the properties of chemical compounds. These descriptors are created based on shared substructure keys. In addition, models were built using Morgan molecular fingerprints, also known as circular fingerprints, with a radius of 2. Gradient boosting was implemented as part of this work. XGBoost is built on the principles of gradient boosting and uses tree-based learning algorithms to strengthen predictive modeling. For the training sample, the resulting gradient boosting model reproduced the data without error, with a prediction error of 0.
For the sample, the constructed ridge regression model has the following statistical characteristics: R² = 0.74 and a prediction error of RMSE = 36.36 K.
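The principle behind gradient boosting and the two statistics reported above (R² and RMSE) can be illustrated with a minimal sketch. It is not the XGBoost model used in this work: it assumes a squared-error loss, uses one-split trees (stumps) written in plain Python, and runs on synthetic numbers rather than PubChem data. The function names (`fit_stump`, `fit_boosting`, `predict`, `r2`, `rmse`) and the two toy features are hypothetical.

```python
import math


def fit_stump(xs, residuals):
    """Find the one-split tree (stump) that best fits the residuals."""
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(row[j] for row in xs))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(xs, residuals) if row[j] <= t]
            right = [r for row, r in zip(xs, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Synthetic data: two made-up descriptor columns and a made-up target in kelvin.
# These are NOT measured flash points.
xs = [[float(i), float(i % 3)] for i in range(12)]
ys = [250.0 + 8.0 * a + 5.0 * b for a, b in xs]

model = fit_boosting(xs, ys)
train_pred = [predict(model, row) for row in xs]
train_rmse = rmse(ys, train_pred)
train_r2 = r2(ys, train_pred)
print(f"train RMSE = {train_rmse:.2f} K, train R2 = {train_r2:.3f}")
```

The sketch evaluates on the same rows it was fitted to, so its scores say nothing about generalization. A held-out sample is needed for that, which is why an error of 0 on the training sample and an R² of 0.74 can both be reported for the same study.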

Keywords:
BIG DATA, INDUSTRY 4.0, FLASH POINT, GRADIENT BOOSTING, ARTIFICIAL INTELLIGENCE