ANALYSIS OF THE EFFECTIVENESS OF RECURRENT NEURAL NETWORKS IN THE TASK OF CATEGORIZING MEDIA TEXTS

Kolobova Dar'ya Alekseevna

doi:10.55421/3034-4689_2025_28_4_97

Journal: HERALD OF TECHNOLOGICAL UNIVERSITY Volume 28 № 4

Section: 3. INFORMATION THEORY, COMPUTER TECHNOLOGY AND CONTROL

Kolobova Dar'ya Alekseevna ¹

Authors:

1. Kazan National Research Technical University named after A.N. Tupolev (ASOIU department), engineer

Type:

Article

DOI:

https://doi.org/10.55421/3034-4689_2025_28_4_97

Pages:

97-101

Status:

In progress

Language:

Russian

Keywords:

NEURAL NETWORKS, TEXT CLASSIFICATION, NATURAL LANGUAGE PROCESSING, TOKENIZATION, RECURRENT NEURAL NETWORKS, TEXT PREPROCESSING

Abstract (English):
The article examines modern methods and approaches to classifying news texts, a pressing problem given the large volume of information available to users. News classification plays a key role in streamlining information retrieval, supports the creation of personalized content, and helps analyze social trends, which is especially important in the era of digitalization. The work covers the main concepts and principles of text processing and analysis, including text preprocessing, vocabulary construction, tokenization, batching of text sequences, and text classification. Special attention is paid to the various architectures of recurrent neural networks (RNNs) and to their features, advantages, and disadvantages for text classification. RNNs are a powerful tool for processing sequential data such as text, and they allow classification that takes context into account.

Experiments were conducted with various RNN models, and their parameters were tuned to achieve high accuracy in classifying news texts. The best model, GRU_model512_2layers_dropout_epoch10, consists of two GRU layers with 512 hidden units each and 20% dropout, trained for 10 epochs. Because the GRU cell has a simpler structure than the LSTM cell, this model occupies 10 MB less memory than an LSTM model with the same parameters and also trains faster (by 17 s per epoch). It also achieves higher accuracy (91.6%) than models with simpler architectures, which are prone to overfitting.
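Why a GRU is smaller than an LSTM of the same size can be shown by counting recurrent parameters. The sketch below is a minimal illustration, assuming PyTorch's layout for `nn.GRU` and `nn.LSTM`: per layer, each gate block has an input weight matrix, a recurrent weight matrix and two bias vectors, with three blocks for a GRU and four for an LSTM. The embedding size `EMBED_DIM = 300` is a hypothetical value, since the abstract does not state it. Dropout adds no parameters, and the embedding and output layers are omitted because they are identical in both models. The sketch is not meant to reproduce the 10 MB figure, which depends on details the abstract does not give.

```python
def rnn_layer_params(input_size: int, hidden_size: int, num_gates: int) -> int:
    """Parameters of one unidirectional recurrent layer, PyTorch-style.

    Each gate block has an input weight matrix (hidden x input), a recurrent
    weight matrix (hidden x hidden) and two bias vectors (b_ih and b_hh).
    """
    weights = num_gates * hidden_size * (input_size + hidden_size)
    biases = num_gates * 2 * hidden_size
    return weights + biases


def stacked_rnn_params(input_size: int, hidden_size: int,
                       num_layers: int, num_gates: int) -> int:
    """Total recurrent parameters of a stack of unidirectional layers."""
    total = 0
    for layer in range(num_layers):
        # The first layer reads embeddings; deeper layers read the previous hidden state.
        layer_input = input_size if layer == 0 else hidden_size
        total += rnn_layer_params(layer_input, hidden_size, num_gates)
    return total


GRU_GATES = 3   # blocks: reset gate, update gate, candidate state
LSTM_GATES = 4  # blocks: input, forget and output gates, cell candidate

EMBED_DIM = 300  # hypothetical: the abstract does not give the embedding size
HIDDEN = 512     # hidden units per layer, as in the best model
LAYERS = 2       # two stacked recurrent layers, as in the best model

gru = stacked_rnn_params(EMBED_DIM, HIDDEN, LAYERS, GRU_GATES)
lstm = stacked_rnn_params(EMBED_DIM, HIDDEN, LAYERS, LSTM_GATES)
extra_mib = (lstm - gru) * 4 / 2**20  # float32 stores 4 bytes per parameter

print(f"GRU: {gru:,} params, LSTM: {lstm:,} params, LSTM extra: {extra_mib:.1f} MiB")
```

Under this layout the LSTM stack always has exactly 4/3 as many recurrent parameters as the GRU stack, whatever sizes are chosen, because every weight matrix and bias vector scales with the number of gate blocks. Fewer parameters also mean fewer multiply-adds per time step, which is consistent with the faster training reported for the GRU model.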
The news classification algorithm is implemented in Python using the open-source PyTorch machine learning framework and the NLTK natural language processing library. A news text is classified in the following sequence: the text is loaded, preprocessed, and classified, and the category it belongs to is output. The models are trained and evaluated on a dataset of news texts from four categories.
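The load, preprocess, classify, output sequence, together with the vocabulary, tokenization and batching stages listed earlier, can be sketched in plain Python. This is an illustration under stated assumptions, not the author's code: a regular-expression tokenizer and a small hand-written stop-word set stand in for NLTK (where `nltk.word_tokenize` and the `nltk.corpus.stopwords` corpus would normally be used), and the `model` argument of the hypothetical `classify` function stands in for the trained PyTorch network, assumed to return one list of class scores per sequence in the batch. The label names are placeholders, since the abstract does not name the four categories.

```python
import re
from collections import Counter
from collections.abc import Callable

# Hypothetical stop-word set; NLTK's stopwords corpus would normally supply this.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is"}
PAD, UNK = "<pad>", "<unk>"


def preprocess(text: str) -> list[str]:
    """Lowercase, keep runs of Latin letters as tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocab(corpus: list[str], min_freq: int = 1) -> dict[str, int]:
    """Map tokens seen at least min_freq times to ids; 0 and 1 are reserved."""
    counts = Counter(tok for doc in corpus for tok in preprocess(doc))
    vocab = {PAD: 0, UNK: 1}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Turn a text into token ids, mapping unseen words to <unk>."""
    return [vocab.get(tok, vocab[UNK]) for tok in preprocess(text)]


def make_batch(texts: list[str], vocab: dict[str, int]) -> list[list[int]]:
    """Encode texts and right-pad them with <pad> to the longest sequence."""
    seqs = [encode(t, vocab) for t in texts]
    max_len = max(len(s) for s in seqs)
    return [s + [vocab[PAD]] * (max_len - len(s)) for s in seqs]


def classify(text: str, vocab: dict[str, int],
             model: Callable[[list[list[int]]], list[list[float]]],
             labels: list[str]) -> str:
    """Preprocess one loaded text, score it with the model, return the best label."""
    batch = make_batch([text], vocab)
    scores = model(batch)[0]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


if __name__ == "__main__":
    vocab = build_vocab(["Stocks rose on the market.", "The team won the final match."])
    # Stand-in for the trained network: fixed scores for four placeholder classes.
    fake_model = lambda batch: [[0.1, 0.6, 0.2, 0.1] for _ in batch]
    print(classify("The market fell.", vocab, fake_model,
                   ["class_0", "class_1", "class_2", "class_3"]))
```

Reserving id 0 for padding and id 1 for unknown words is a common convention: one vocabulary then handles both variable-length batches and words that never appeared in the training data.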