Student Satisfaction Classification Algorithm Using the Minority Synthetic Oversampling Technique

Abstract: This study is based on university students’ opinions posted on the social network Twitter, and uses sentiment analysis to assess teaching performance in the context of virtual learning. However, when establishing the classification algorithm, an imbalance was found in the number of opinions rating teaching performance in the satisfied and dissatisfied classes. Therefore, the objective of this investigation is to determine the improvement in the performance of the student satisfaction classification algorithm obtained by balancing the classes with the synthetic minority oversampling technique (SMOTE). Methodologically, the research has a non-experimental design, is of an applied type, and follows a quantitative approach. The data were collected through Twitter over fifteen weeks from a population of mechanical and electrical engineering students. After the SMOTE data balancing technique was applied, the algorithm with the best performance was Logistic Regression. The average improvement of the algorithm was 2.17% in accuracy, 84.78% in precision, 42% in recall (sensitivity), and 58.33% in F1-score. It is therefore demonstrated that the algorithm classifies the students' opinions with high probability.

in the conformity and satisfaction that students experience about the different services provided by the university [3,4]. Students, as the main members of education, are the ones who can best give their opinion about their assessment [5]. The relevance of monitoring satisfaction falls on the favorable impact on the student's training, personal and social development, and their learning and permanence in studies [6][7][8][9].
One of the aspects related to the quality of the educational service offered by university institutions is teaching performance [10]. This aspect is directly related to student satisfaction [11], because the student, as a beneficiary of the educational service, is called upon to comment on the quality of the service received in the different class sessions, expressing agreement or disagreement [12]. In the field of education, with the arrival of data science and data mining, many educational institutions have managed to monitor different indicators linked to the progress of institutional objectives [13,14]. With the current use of information technology, it has been possible to store and manage large volumes of data, even in real time [15,16]. Machine learning tools are widely used in the educational field to generate predictive or classification algorithms from the processing of student data [17]. Machine learning is a branch of artificial intelligence that allows the construction of mathematical models through supervised or unsupervised learning [18][19][20]. Supervised learning focuses on the development of algorithms from input and output data; that is, it requires a training data set [21,22]. Unsupervised learning, in contrast, does not require output data; it seeks to group data according to common characteristics [23,24].
When assessing the student's attitudes, emotions, or perceptions, it is possible to use machine learning techniques linked to data mining called sentiment analysis [25]. Sentiment analysis is a tool that allows extracting subjective information from opinions made by students from, for example, social networks [26]. The social network Twitter has become a means of exchanging opinions, from which it is possible to extract information by applying natural language processing techniques [27]. This makes it possible to evaluate student satisfaction with teaching performance by the generation of opinions from Twitter and the application of the sentiment analysis technique [28].
In the process of identifying learning algorithms, the class imbalance problem often occurs [29,30]; it is also called data imbalance [31,32]. This scenario brings with it a decrease in the quality or performance of the classification algorithms [33,34]. A solution to this problem is the synthetic minority oversampling technique [35], which creates artificial data based on feature-space similarities among the existing minority class samples [36].
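As a rough illustration of how such artificial data can be created, the sketch below interpolates between a minority sample and one of its minority-class neighbors, which is the core step of SMOTE as it is commonly described. The function name `interpolate_minority` and the sample values are hypothetical and are not taken from this study.

```python
import numpy as np

def interpolate_minority(sample, neighbor, lam):
    """Return a synthetic point on the segment between two minority samples.

    lam is a number in [0, 1]; SMOTE usually draws it at random.
    """
    return sample + lam * (neighbor - sample)

# Two hypothetical minority-class feature vectors (illustrative values only).
sample = np.array([1.0, 2.0])
neighbor = np.array([3.0, 6.0])

# A quarter of the way from `sample` towards `neighbor`.
synthetic = interpolate_minority(sample, neighbor, 0.25)
print(synthetic)
```

Because the new point lies between two real minority samples, it stays inside the region of feature space that the minority class already occupies.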
Based on what has been described, the purpose of this article is to determine to what extent the performance of the student satisfaction classification algorithm improves, through the application of the SMOTE technique. Initially, the performance of the Support Vector Machine (SVM), Decision Tree, Gaussian Naive Bayes and Logistic Regression algorithms will be determined, taking into account the "satisfied" and "unsatisfied" class imbalance present in the collected data. Then the technique of oversampling of synthetic minorities will be applied, in order to solve the problem of imbalance of the satisfaction classes. Finally, through a comparative analysis of the accuracy, precision, recall, and F1-score indicators, the impact on the performance of the university student satisfaction classification algorithm will be determined.

II. LITERATURE REVIEW
Regarding the synthetic minority oversampling technique, Ipanaqué investigates data balancing with SMOTE, with the objective of comparing the performance of the linear regression algorithm before and after the application of the class balancing technique [37]. Garcia compares the metrics of different classification algorithms; on identifying the presence of unbalanced data, he uses the SMOTE technique to increase performance [38]. Torres carries out an investigation aimed at obtaining a predictive model for the educational field using machine learning techniques and SMOTE data balancing through the Python imbalanced-learn library [36].
Additionally, Chen et al. pointed out in their research that supervised and unsupervised learning algorithms do not correctly predict indicators such as student performance, due to the imbalance of data produced in the first weeks of class, in which one class can present a greater number of samples than the other classes [39]. Also, Albreiki et al. affirm that SMOTE techniques improve the performance indicators of machine learning algorithms [40], such as the accuracy, precision, recall, and F1-score indicators described by Bhaskaran et al. [41]. In this regard, Marappán et al. point out in their research that an important point in the design of classification algorithms is the data collected, so in their case they refer to the use of improved grouping strategies [42].
On sentiment analysis of Twitter data, Cedeno-Moreno and Vargas-Lombardo [43] carry out research on the application of supervised machine learning to determine the algorithm for classifying opinions generated on the social network Twitter, in order to identify positive and negative sentiments. Chanchí et al. carried out a study on the application of sentiment analysis to identify the perception of students specializing in systems engineering, for which they used Python software libraries to identify the polarity of sentiments, assigning the satisfied class to sentiments with positive polarity and the dissatisfied class to sentiments with negative polarity [44].

III. METHODOLOGY
The data were collected through the social network Twitter over fifteen weeks from mechanical and electrical engineering students. The research was established as a non-experimental design because the data collected from Twitter were processed in their natural state, and no prior action was taken that would alter or influence the opinion of students regarding teaching performance.
Likewise, the research is of an applied type, because the SMOTE technique is used to solve a specific, previously identified problem: the performance of the algorithm for classifying student satisfaction with teacher performance. In addition, the research follows a quantitative approach, because it focuses on comparing the performance of the classification algorithm before and after the application of the SMOTE technique. In other words, it seeks to determine the impact on performance by comparing performance metrics or indicators such as accuracy, precision, sensitivity, and F1-score. Fig. 1 shows the research method used for the application of the SMOTE technique, with which the improvement of the performance of the classification algorithm is sought.
As mentioned, the data collection was carried out through the social network Twitter. To generate a database, authorization was requested from the same social network to download opinions, from which a file of opinions in "Comma Separated Values (CSV)" format was generated. To condition the data prior to processing, we cleaned them using the "stopwords" library from Python's "nltk.corpus", with which duplicate words, empty words, and punctuation were removed. After conditioning the data, we converted them to numerical data using the Python "sklearn" library, applying the vectorization technique through "TfidfVectorizer". From this procedure, it was possible to identify the data imbalance for algorithm training, so the SMOTE technique was used to achieve the data balance and identify the impact on the improvement of performance metrics.
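The cleaning and vectorization steps described in this section could look roughly like the following minimal sketch. It is not the authors' code: the example tweets are invented, and the small `STOPWORDS` set is a hypothetical stand-in for the list provided by "nltk.corpus.stopwords", used here so that the example runs without downloading any corpus.

```python
import string
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for nltk.corpus.stopwords (assumption, see lead-in).
STOPWORDS = {"the", "is", "a", "and", "of", "to"}

def clean(text):
    """Lowercase, strip punctuation, drop stop words and repeated words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    kept = []
    for word in text.split():
        if word not in STOPWORDS and word not in kept:
            kept.append(word)
    return " ".join(kept)

# Invented example opinions, not data from the study.
tweets = [
    "The teacher explains the topic clearly!",
    "The class is boring and the audio is bad.",
    "Great class, great teacher.",
]
cleaned = [clean(t) for t in tweets]

# Turn the cleaned text into a numerical TF-IDF matrix (one row per tweet).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)
print(cleaned)
print(X.shape)
```

Each row of `X` is the numerical representation of one opinion, which is the form the classification algorithms require.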

IV. RESULTS
Once the data collection and vectorization process was completed, the number of tweets in each class of student satisfaction with teacher performance, "satisfied" and "dissatisfied", was identified. Fig. 2 shows that, from a total of 254 tweets with positive and negative polarity, 230 tweets or opinions were obtained for the "satisfied" class and 24 for the "dissatisfied" class, clearly evidencing the imbalance of data between both classes. To identify the algorithm with the best indicators or metrics for classifying student satisfaction with teacher performance, the Python code shown in Fig. 3 was used; it defines the total data, how much is used for training the algorithm, and how much is used for testing it. With "test_size=0.33", 33% of the data is used for testing, while the remaining 67% is used for training. With "random_state=42", the random sample used for training and testing is always the same and does not change each time the code is executed.
Fig. 3. Code in Python for the selection of data for the training and testing of the classification algorithm.
Fig. 4 shows the distribution of data used for testing and training according to the classes defined for student satisfaction with teacher performance. As can be seen, the imbalance identified in the data collection is maintained: in the sample selected for training, the ratio between the "satisfied" and "dissatisfied" classes is 9 to 1. Fig. 5 shows the Python code used to train the classification algorithms.
As part of an initial procedure, training and testing trials were carried out on different machine learning models. In this investigation, however, we decided to focus on the four algorithms with the best performance, in order to delimit the study to machine learning models for which the impact of the SMOTE technique is significant in comparison to other models. The figure shows the code used to train the support vector machine (SVM), decision tree, Gaussian Naive Bayes, and logistic regression algorithms. The objective is to select from among these four the algorithm with the best metrics or performance indicators. It should be noted that in the code the support vector classifier (SVC) is used to obtain the SVM algorithm.
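The split and training steps described above can be sketched as follows. This is an illustrative reconstruction, not the code from Fig. 3 or Fig. 5: the feature matrix is random placeholder data standing in for the TF-IDF vectors, and `max_iter=1000` is an assumption added only to avoid convergence warnings. The class counts (230 and 24), "test_size=0.33", and "random_state=42" come from the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder features (assumption); labels follow the reported class sizes.
rng = np.random.default_rng(0)
X = rng.random((254, 20))
y = np.array([1] * 230 + [0] * 24)  # 1 = satisfied, 0 = dissatisfied

# 33% of the data for testing; a fixed seed keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

models = {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Train each model and record its accuracy on the held-out test set.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))

print(len(X_train), len(X_test))
print(accuracy)
```

With 254 samples this split yields 170 training and 84 test samples. When real TF-IDF output is used, note that `GaussianNB` expects a dense array, so a sparse matrix would first need to be converted, for example with `.toarray()`.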
As a result of the training of the algorithms, Table I shows the "accuracy" metric of the four algorithms. The SVM and Logistic Regression algorithms reached the highest value, 0.92, while the Decision Tree and Gaussian Naive Bayes algorithms reached values of 0.89 and 0.87 respectively. Because two algorithms share the highest level of accuracy (SVM and Logistic Regression), other performance metrics were determined in order to have more elements of judgment to select the better of the two. Table II shows the results of the Precision, Recall, and F1-Score indicators of the SVM and Logistic Regression algorithms, which are the same for both algorithms: the averages of the Precision, Recall, and F1-score performance metrics are 0.46, 0.50, and 0.48 respectively. However, the main problem lies in the imbalance of samples between both classes. As a consequence, in the "Unsatisfied" class the performance metrics Precision, Recall, and F1-score are zero in both algorithms analyzed. This means that neither of the two algorithms chosen manages to correctly classify the "Unsatisfied" class, so the average performance of both algorithms is relatively low.
For this reason, the SMOTE technique was applied to balance the samples of both classes and to improve the performance of the classification algorithm. Fig. 6 shows the Python code in which it is established that the data balance of the satisfied and unsatisfied classes must be at a ratio of 1 (sampling_strategy=1.0); that is, both classes must have the same amount of data.
As can be seen from the figure, after the application of the SMOTE technique both classes contain 153 samples.
The performance metrics for the SVM and Logistic Regression algorithms after balancing are shown in Table III, in which it can be seen how the indicators of both algorithms improve, especially for the "Unsatisfied" class; this is due to the application of the SMOTE minority sample balancing technique. Of the two algorithms under analysis, the one that shows the greatest increase in its indicators is the logistic regression algorithm, which is why it is the algorithm chosen for classifying student satisfaction concerning teacher performance in this research. As an additional element of discrimination to establish the best classification algorithm for student satisfaction concerning teaching performance, the Receiver Operating Characteristic (ROC) curve was analyzed, which establishes the relationship between the sensitivity and the specificity of the algorithm under analysis, summarized through the Area Under the Curve (AUC) indicator. Fig. 7 shows that in the case of the SVM algorithm, the AUC takes a value of 0.86, which can be interpreted as an 86% probability that the algorithm assigns a higher score to a randomly chosen sample of the positive class than to a randomly chosen sample of the negative class. Note that the figure refers to the SVC that contains the SVM algorithm.
International Journal of Information and Education Technology, Vol. 13, No. 7, July 2023
This same analysis was also carried out for the Logistic Regression algorithm. Analyzing the ROC curve shown in Fig. 8, it was identified that the AUC value is equal to 0.87. This means that the Logistic Regression algorithm separates the "satisfied" and "dissatisfied" classes slightly better than the SVM algorithm.
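To make the AUC indicator more concrete, the sketch below computes it in two ways on a tiny invented example: with scikit-learn's `roc_auc_score`, and by directly counting how often a positive sample receives a higher score than a negative one. The labels and scores are hypothetical and unrelated to the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Invented labels and classifier scores (assumption, for illustration only).
y_true = np.array([1, 1, 1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.5, 0.2])

# AUC as reported by scikit-learn.
auc = roc_auc_score(y_true, scores)

# Same quantity by counting positive/negative pairs ranked in the right order
# (ties count as half a correct pair).
positives = scores[y_true == 1]
negatives = scores[y_true == 0]
pairwise = np.mean(
    [float(p > n) + 0.5 * float(p == n) for p in positives for n in negatives]
)

print(auc, pairwise)
```

Both computations agree, which shows that AUC measures how well the scores rank one class above the other rather than the share of samples classified correctly at a fixed threshold.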
Finally, with the purpose of establishing the improvement in the performance metrics of the Logistic Regression algorithm after the application of the SMOTE technique, Table IV shows the average values of the metrics before and after achieving the data balance. Regarding accuracy, the increase was 2.17%, in the case of Precision it was 84.78%, in the case of Recall it was 42% and finally in the case of the F1-Score the increase was 58.33%.
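The reported increases are consistent with a relative change, (after - before) / before x 100. The sketch below applies that formula. The "before" values are the averages reported for Logistic Regression prior to balancing; the "after" values are assumptions back-calculated from the reported percentages, not figures quoted from Table IV.

```python
def relative_increase(before, after):
    """Percentage increase of `after` with respect to `before`."""
    return (after - before) / before * 100

# before: averages reported in the text; after: back-calculated assumptions.
metrics = {
    "accuracy": (0.92, 0.94),
    "precision": (0.46, 0.85),
    "recall": (0.50, 0.71),
    "f1-score": (0.48, 0.76),
}

increase = {
    name: round(relative_increase(before, after), 2)
    for name, (before, after) in metrics.items()
}
print(increase)
```

Read this way, the 84.78% figure for precision means that the metric rose by 84.78% of its original value, not by 84.78 percentage points.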

V. DISCUSSIONS
From the results obtained, it is possible to identify that before the application of the SMOTE technique, the classification algorithms that showed the best performance metrics on average were the SVM and the Logistic Regression algorithms. However, the performance analysis by classes shows that both algorithms fail to correctly classify the "dissatisfied" class, due to an imbalance of training data whose ratio turned out to be 9 to 1. Applying the SMOTE technique made it possible to balance the data in both classes; when the training and testing process of the algorithms under analysis was performed again, the performance of both increased. Comparing the two to identify the algorithm with the best performance for the classification of both classes, it was established that the Logistic Regression algorithm is the best. Regarding the application of the SMOTE technique, Torres-Vásquez points out that, of the different data balancing or oversampling techniques, the SMOTE technique increased the performance metrics of the algorithm [21]. In this regard, Torres points out that in his research on a predictive model applied to the academic field, he used the SMOTE technique as an alternative for balancing minority classes, which also managed to improve the performance metrics of the algorithm [36]. In the same way, Ipanaque points out that in his research on a classification model applied to the academic field, he identified that the algorithm that shows the best performance is the Logistic Regression algorithm, whose performance metrics he was able to improve through the SMOTE technique [37].
It is clear that the minority sample balancing technique contributes to improving the performance metrics of the classification and prediction algorithms, so its use leads to a positive impact on the performance metrics. Focusing on the positive impact obtained through the SMOTE technique, the results show that the accuracy increased by 2.17%, the Precision by 84.78%, the Recall by 42%, and the F1-score by 58.33%. In this regard, Garcia points out that by applying the SMOTE synthetic minority oversampling technique, he achieved a positive impact, increasing the accuracy of his algorithm from 84.1% to 84.8% [38]. Although in the results of this research the Precision, Recall, and F1-Score metrics increase to a greater extent, the impact on accuracy coincides with the level of improvement indicated in the cited reference, since of all the metrics, the one that experienced the smallest increase is the "Accuracy".
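The per-class failure described at the start of this section can be checked with simple arithmetic. The sketch below assumes a test set of 77 "satisfied" and 7 "dissatisfied" opinions, a split derived from the reported counts (230 and 24 in total, 153 and 17 in training) rather than stated in the text, and a classifier that labels every opinion as "satisfied".

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Derived test-set composition (assumption): 77 satisfied, 7 dissatisfied.
y_true = [1] * 77 + [0] * 7   # 1 = satisfied, 0 = dissatisfied
y_pred = [1] * 84             # classifier that always predicts "satisfied"

accuracy = round(accuracy_score(y_true, y_pred), 2)

# Macro average: unweighted mean of the per-class metrics; the minority
# class contributes zeros because it is never predicted.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
precision, recall, f1 = round(precision, 2), round(recall, 2), round(f1, 2)

print(accuracy, precision, recall, f1)
```

Under these assumptions the sketch reproduces the values reported before balancing: an accuracy of 0.92 alongside average precision, recall, and F1-score of 0.46, 0.50, and 0.48. This illustrates why accuracy alone hides the failure on the minority class.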

VI. CONCLUSION
It is concluded that, after the collection, processing, and analysis of the opinions posted on Twitter by students of the professional school of mechanical and electrical engineering about their satisfaction with teaching performance, the sample of satisfied versus dissatisfied students turned out to be unbalanced, in a ratio of 9 to 1 respectively. By training and testing the SVM, Decision Tree, Gaussian Naïve Bayes, and Logistic Regression algorithms, it was identified that the SVM and Logistic Regression algorithms presented the best performance metrics.
Thus, in order to improve the performance achieved by the models under study in a state of sample imbalance, the SMOTE technique was applied, which increased the performance metrics. The algorithm with the best performance is the Logistic Regression algorithm, reaching average increases in its indicators of 2.17% in accuracy, 84.78% in precision, 42% in Recall, and 58.33% in the F1-score, so that the proposed algorithm classifies both classes with high probability.
The novelty of this research lies in the combined application of techniques such as sentiment analysis, text mining, machine learning, and sample balancing with SMOTE synthetic minority oversampling, applied to unstructured data such as messages or opinions from university students regarding teaching performance, expressed on the social network Twitter. One limitation of the proposed model is the amount of data collected: even after applying the SMOTE technique, the total data for training and testing remains low, despite the fact that the balance of data for each class was achieved. This means that although performance is improved, the increase is not significant. Another limitation of the model is that it only allows the binary classification of satisfaction; future studies should consider a model that classifies the opinions of students into multiple classes, such as very satisfied, satisfied, dissatisfied, and very dissatisfied.
questions, as well as defined the objectives of the investigation, and drafted the final conclusions, as the main author. Florcita Aldana-Trejo and Nestor Alvarado-Bravo were jointly in charge of writing the Introduction and reviewing the literature. Constantino Nieves-Barreto and Santiago Aguilar-Loyaga were in charge of writing the methodological aspects of the research work. José Farfán-Aguilar and Almintor Torres-Quiroz developed the stage of data processing and analysis of the results. Alípio Riveros-Cuellar and Manuel Pérez-Samanamud developed the discussion of the results. Finally, Luciano Pérez-Guevara wrote the conclusions and carried out the final review of the edition of the scientific article. The development of the scientific article demanded coordination among all the authors.