Classification Algorithm Accuracy Improvement for Student Graduation Prediction Using Ensemble Model

According to the National Center for Education Statistics, almost half of first-time, full-time freshmen who begin seeking a bachelor's degree do not graduate. The imbalance between student enrolment and student graduation can be addressed by predicting and identifying, early on, students who are at risk of not graduating on time, so that proper remediation and retention policies can be formulated and implemented by institutions. This study focused on the application of ensemble models to predicting student graduation. Ensemble modeling is the process of running two or more related but different analytical models and then synthesizing the results into a single score or spread in order to improve the accuracy of predictive analytics and data mining applications. The study recorded an increase in classification accuracy when predicting student graduation using ensemble models and combinations of multiple algorithms.


I. INTRODUCTION
One main research focus of educational data mining is student graduation [1]. The student graduation rate is the percentage of a school's first-time, first-year undergraduate students who successfully complete their program. Many first-year freshmen enrolled at the tertiary level fail to graduate: according to the National Center for Education Statistics, almost half of first-time, full-time freshmen who begin seeking a bachelor's degree do not graduate. Addressing this problem is crucial, as colleges and universities with high leaver rates suffer losses in fees and in potential alumni contributors [2]. Many researchers have developed decision-based models of student dropout and retention; however, only a few have considered the power of ensemble models in prediction. This study aimed to determine the accuracy of ensemble models and algorithm combinations in student graduation prediction.

II. LITERATURE REVIEW
Manuscript received March 5, 2020; revised June 1, 2020. Authors: Ace C. Lagman, Lourwel P. Alfonso, Marie Luvett I. Goh, Jay-ar P. Lalata.

Data mining is the application of a specific algorithm in order to extract patterns from data and transform the information into a comprehensible structure for further use [3]. KDD has become a very important process for converting this large wealth of data into business intelligence, as manual extraction of patterns has become seemingly impossible over the past few decades [4].
Data mining is a step inside the KDD process that deals with identifying patterns in a large dataset [5]. It is the application of a specific algorithm in service of the overall goal of the KDD process, which is to extract hidden patterns or develop predictive models using machine-learning techniques [6].
Educational data mining is one of the main applications of machine learning; it analyzes students' behaviors and performance so that proper interventions can be provided [7]. There is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data [8].
Ensemble classification is based on the philosophy that a group of experts makes more accurate decisions than a single expert. The literature reveals that predictions from composite tests give better results than a single prediction. This section describes the ensemble techniques used in this paper [9].
Boosting raises the performance of a weak classifier to a strong level. It generates sequential learning classifiers by resampling (reweighting) the data instances. Initially, equal uniform weights are assigned to all instances. During each learning phase a new hypothesis is learned and the instances are reweighted, so that correctly classified instances receive lower weights while instances that were not correctly classified during this phase receive higher weights, letting the system concentrate on them. Boosting thus singles out the wrongly classified instances so that they can be classified correctly during the next learning step. This process continues until the last classifier is constructed. Finally, the results of all the classifiers are combined using majority voting to produce the final prediction. AdaBoost is a more general version of the boosting algorithm [10].
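The reweighting loop described above can be sketched in a short, self-contained Python program. This is a minimal illustration using one-dimensional decision-stump weak learners on toy data, not the implementation used in the study; all function names and the toy data set are invented for the example.

```python
import math

def stump_predict(threshold, polarity, x):
    # weak learner: +1 if polarity * (x - threshold) > 0, else -1
    return 1 if polarity * (x - threshold) > 0 else -1

def train_stump(X, y, w):
    # pick the threshold/polarity pair with the lowest weighted error
    best = None
    for t in sorted(set(X)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(t - 0.5, pol, xi) != yi)
            if best is None or err < best[0]:
                best = (err, t - 0.5, pol)
    return best  # (weighted error, threshold, polarity)

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n            # initially equal uniform weights
    ensemble = []                # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, t, pol = train_stump(X, y, w)
        err = max(err, 1e-10)    # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # reweight: misclassified instances gain weight, correct ones lose it
        w = [wi * math.exp(-alpha * yi * stump_predict(t, pol, xi))
             for xi, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    # final prediction: weighted vote of all weak classifiers
    score = sum(a * stump_predict(t, pol, x) for a, t, pol in ensemble)
    return 1 if score > 0 else -1
```

For instance, `adaboost([1, 2, 3, 4], [-1, -1, 1, 1], rounds=3)` learns an ensemble whose weighted vote reproduces the labels of this separable toy set.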

III. METHODOLOGY
In order to meet the research objectives, the researchers used Knowledge Discovery in Databases (KDD) to extract hidden patterns from the data.

A. Knowledge Discovery in Databases
The researchers used the modified steps of Knowledge Discovery in Databases indicated in Fig. 1 below. The modified version of KDD consists of six steps: understanding the problem, understanding the data, data preparation, data mining, evaluation of the discovered knowledge, and use of the discovered knowledge.

B. Problem and Data Understanding
This phase requires the researcher to understand the problem and the possible solutions that can be proposed. It establishes the rationale of the research and the potential of the data to achieve the researchers' goals.

C. Data Preparation
After the data have been examined, this phase determines which data preprocessing techniques are necessary to improve the accuracy of the algorithm. The researchers used discretization and imputation techniques to normalize the values so that patterns are easier to extract from the students' data.
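As an illustration of these two preprocessing steps, the following Python sketch performs mean imputation of missing values and equal-width discretization. The helper names and grade values are invented for the example; the study does not specify which imputation or binning variant was applied.

```python
def impute_mean(values):
    # replace missing entries (None) with the mean of the observed values
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def discretize(values, bins):
    # equal-width binning: map each value to a bin index 0 .. bins-1
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # guard against a zero-width range
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

For example, `discretize([75, 80, 85, 90], 3)` assigns the four grades to the bins `[0, 1, 2, 2]`, turning a continuous attribute into a small set of categories from which rules are easier to extract.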

D. Bootstrap Algorithm
Methods that combine k learned models to increase accuracy are called ensemble methods. Bagging (bootstrap aggregating) and stacking are the most widely used ensemble methods developed to increase the accuracy of a learned model [6].
Bootstrap aggregating increases accuracy as follows: the new test sets of data were evaluated by the logistic regression learning scheme, and the bootstrap algorithm created an ensemble of models for that learning scheme in which each model gives an equally weighted prediction.
Algorithm:
(1) if a prediction is required then
(2) let each of the k models classify X and return the equally weighted prediction
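A minimal Python sketch of this bagging scheme follows. For the example, a toy threshold-based base learner stands in for the study's logistic regression so the code stays self-contained; the data set and seed are likewise invented.

```python
import random

def bootstrap_sample(data, rng):
    # draw n instances with replacement from the training data
    return [rng.choice(data) for _ in data]

def threshold_model(data):
    # toy base learner: classify 1 if x is above the sample mean, else 0
    xs = [x for x, _ in data]
    mu = sum(xs) / len(xs)
    return lambda x: 1 if x > mu else 0

def bagging_predict(models, x):
    # each of the k models casts an equally weighted vote on instance x
    votes = sum(m(x) for m in models)
    return 1 if votes * 2 >= len(models) else 0

rng = random.Random(42)
data = [(1, 0), (2, 0), (8, 1), (9, 1)]  # (feature, class) pairs
models = [threshold_model(bootstrap_sample(data, rng)) for _ in range(11)]
```

Each bootstrap replicate trains one model on a resampled copy of the data, and `bagging_predict` combines the k votes with equal weight, exactly as in steps (1)-(2) above.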

1) Learned model prediction combination
The confusion matrix is a useful tool for analyzing how well a classifier can recognize tuples of different classes [7]. Given m classes, a confusion matrix is a table of at least size m by m. An entry CM(i, j) in the first m rows and m columns indicates the number of tuples of class i that were labelled by the classifier as class j, as seen in Table I.
The confusion matrix provides a tabular display for evaluating the forecasting precision of a predictive model. The main objective of a predictive model is to maximize the number of correctly classified instances. For binary classification scenarios, the misclassification rate summarizes the overall model performance with respect to the number of correct categorizations in the training data.
To determine the accuracy level of the classification table of the algorithms, the formula Accuracy = (TP + TN) / (total instances) was used, where true positives (TP) refer to the number of actual 'graduated: yes' outcomes accurately classified as predicted 'yes', and true negatives (TN) refer to the number of actual 'graduated: no' outcomes accurately classified as predicted 'no'.
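The accuracy computation from a confusion matrix can be sketched as follows; the class labels and example outcomes are illustrative, not the study's data.

```python
def confusion_matrix(actual, predicted, classes=("yes", "no")):
    # cm[i][j] = number of class-i instances labelled by the classifier as class j
    idx = {c: k for k, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        cm[idx[a]][idx[p]] += 1
    return cm

def accuracy(cm):
    # (TP + TN) / total: the diagonal holds the correctly classified instances
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total
```

With five illustrative outcomes, four of which are classified correctly, the accuracy evaluates to 4/5 = 0.8.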
International Journal of Information and Education Technology, Vol. 10, No. 10, October 2020

IV. RESULTS AND DISCUSSION

A. Accuracy of the Algorithm
The summary of accuracy rates from the different algorithms in Table I reveals that the logistic regression algorithm has the best accuracy rate in predicting student graduation, at 87.4%. Therefore, the values in its coefficients table were used to derive data models, or equations, for predicting student graduation in new test sets of data.
To cross-validate the results, the Knowledge Flow interface of Weka was used to determine which algorithm best predicts student graduation. The data sets were tested simultaneously with the different algorithms, which include decision tree, Naïve Bayes, logistic regression, multilayer perceptron, and neural network. Ten-fold cross-validation was used: the technique divides the data set into ten equal parts, each containing the total number of instances divided by the number of folds, which resulted in 116.4 data instances per fold. Nine of the ten sets are used as the training set to train the classifier, and the remaining set is used to estimate the error rate of the trained classifier. The text viewer generates the prediction accuracy results from the different algorithms.
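The ten-fold procedure just described, in which each fold in turn is held out for error estimation while the other nine train the classifier, can be sketched in Python. The toy learner and data below are invented for the example; the study ran this procedure inside Weka.

```python
def k_fold_indices(n, k=10):
    # split n instance indices into k near-equal folds
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def cross_validate(data, train_fn, k=10):
    # each fold in turn is the test set; the other k-1 folds train the model
    folds = k_fold_indices(len(data), k)
    correct = 0
    for test_fold in folds:
        test_set = set(test_fold)
        train = [data[i] for i in range(len(data)) if i not in test_set]
        model = train_fn(train)
        correct += sum(1 for i in test_fold if model(data[i][0]) == data[i][1])
    return correct / len(data)  # accuracy pooled over all k held-out folds
```

`train_fn` is any function mapping a training set to a classifier, so different algorithms can be compared on identical folds, which is what the Knowledge Flow layout automates.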
The researchers used the Knowledge Flow "data-flow" inspired interface of Weka. At present, all of Weka's classifiers and filters are available in the Knowledge Flow, along with some extra tools. The flow presents the results of the multiple algorithms, which include Naïve Bayes, neural network, decision tree (J48), and logistic regression, in one output using the classifier performance evaluator, a function that evaluates the performance of incrementally trained classifiers. Table III reveals that logistic regression predicted student graduation better than the other algorithms using Knowledge Flow in Weka.

B. Data Model of Logistic Regression in Predicting Test Sets
The data model with the highest predictive accuracy on the training sets, derived from the logistic regression, was used on the test sets. The derived equation is shown below:

prob(graduated) = 1 / (1 + e^-(-5.716 + 0.888X1 - 0.991X2 + 0.307X3 + 0.250X4 + 0.289X5 + 0.430X6 + 0.567X7 + 0.423X8))

To determine and evaluate the goodness of fit of a logistic regression model, it is tested on the simultaneous measures of sensitivity (true positive rate) and specificity (true negative rate) at possible cut-off points through the receiver operating characteristic (ROC) curve. Table IV shows the ROC curve output. The area under the curve is .872, with a 95% confidence interval of (.846, .897). The area under the curve is also significantly different from 0.5 (p < .001), meaning that the logistic regression classifies the groups significantly better than chance.
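Using the reported coefficients, the logistic data model can be evaluated in a few lines of Python. The mapping of the predictors X1-X8 to concrete student attributes is not restated here, so the input vectors in the example are purely illustrative.

```python
import math

# intercept followed by the coefficients of the eight predictors X1..X8,
# as reported for the training-set logistic regression model
COEFS = [-5.716, 0.888, -0.991, 0.307, 0.250, 0.289, 0.430, 0.567, 0.423]

def prob_graduated(x):
    # logistic function: 1 / (1 + e^-(b0 + b1*x1 + ... + b8*x8))
    z = COEFS[0] + sum(b * xi for b, xi in zip(COEFS[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, cutoff=0.5):
    # apply the chosen ROC cut-off point to the predicted probability
    return "graduated" if prob_graduated(x) >= cutoff else "not graduated"
```

A student with all predictors at zero scores near the intercept's low probability, while larger predictor values push the probability toward 1, reflecting the mostly positive coefficients.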

Since the model classifies the groups significantly better than chance, the generated data model of the logistic regression (Fig. 3) was then tested on new testing sets of data. The table below illustrates the model's predictions on the test sets.

C. Improving the Data Model of Logistic Regression by Applying the Bootstrap Algorithm
Bootstrap aggregating (often abbreviated as bagging), boosting, and stacking using a majority of votes, popularly known as ensemble models, were used to increase the accuracy of logistic regression on the test sets. The logistic model equation derived from the training set was tested on the new sets of data.
To determine the accuracy of the newly derived learned model generated by logistic regression, the model was tested using new testing sets of data, with the classification table results shown as follows. The boosted logistic model correctly classified 3 of the 16 instances misclassified by the initial logistic regression data model, raising the accuracy rate for the graduated status from 44.82% to 55.17%. Table VI reveals that after using the bootstrapping technique with the logistic regression model, the accuracy rate on the testing sets increased to 86.82%. A parallel test of the boosted logistic model in Weka also yielded an accuracy rate of 86.82%, as seen in Table VII.

D. Improving Accuracy by Combining Data Model Predictions
To improve the accuracy rate on the test sets, the predictions of the Naïve Bayes, logistic regression, decision tree, and neural network data models were combined to predict student graduation using a majority of votes.
To improve the accuracy rate of logistic regression on the test set instances, combinations of predictions from sets of classifiers were tested. The experiments were carried out in Weka using a majority of votes, with the results shown below.
The combinations of data model predictions reveal that all combinations increased the accuracy rate of logistic regression from 86.04% to 87.6%, 86.82%, and 86.62%, respectively. Notably, the largest improvement was recorded for the combination of logistic regression and Naïve Bayes, with an 87.6% accuracy rate. This combination correctly classified 17 out of 29 instances of students who graduated, boosting the accuracy rate for student graduation to 87.60%.
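A minimal sketch of majority-vote combination follows, assuming each base model is a function from an instance to a class label; the constant toy models below merely stand in for the trained Naïve Bayes, logistic regression, decision tree, and neural network models.

```python
def majority_vote(predictions):
    # predictions: one class label per base classifier for a single instance
    counts = {}
    for p in predictions:
        counts[p] = counts.get(p, 0) + 1
    return max(counts, key=counts.get)  # the label with the most votes wins

def combine_models(models, x):
    # each model casts one equally weighted vote on instance x
    return majority_vote([m(x) for m in models])
```

For example, if two of three base models predict "yes" for an instance, the combined prediction is "yes" even though the third model disagrees, which is how the combination can recover instances a single classifier misclassifies.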

E. Improving Data Model of Logistic Regression by Combining Rule Sets
To improve the accuracy rate for correctly classified graduated-status instances, the 16 misclassified instances (58.62%) were passed through three rule sets generated by the decision tree algorithm.
The rule sets derived from the decision tree algorithm for predicting student graduation are shown as follows. Table IX shows that three instances were correctly classified by the rule sets generated by the decision tree model, contributing to the increase in the logistic regression's accuracy.
The rule sets generated by the decision tree algorithm correctly classified 3 of the 16 instances misclassified by the logistic regression data model, raising the accuracy rate for the graduated status from 44.82% to 55.17% after combining the predictions of the decision tree rule sets. Table X reveals that after combining the predictions of the logistic regression data model and the decision tree rule sets, the accuracy rate on the testing sets increased to 88.3%.
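The rule-set combination described above can be sketched as follows: when a decision-tree rule fires for an instance, its label overrides the fallback logistic-regression prediction. The `gpa` attribute, the rule condition, and the fallback classifier are hypothetical placeholders, not the study's actual rule sets.

```python
def apply_rules(instance, rules, fallback):
    # rules: (condition, label) pairs derived from a decision tree;
    # the first rule whose condition fires decides the class,
    # otherwise the fallback classifier's prediction is kept
    for condition, label in rules:
        if condition(instance):
            return label
    return fallback(instance)

# hypothetical rule and fallback classifier, for illustration only
rules = [(lambda s: s["gpa"] >= 3.5, "graduated")]
fallback = lambda s: "not graduated"
```

An instance that satisfies a rule is reclassified by that rule; all other instances keep the logistic-regression label, so the rule sets can only recover misclassifications on the instances they cover.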

V. CONCLUSION
Adding the boosting technique to logistic regression produced a significant increase in accuracy in predicting student graduation. Results reveal that after using the bootstrapping technique with the logistic regression model, the accuracy rate on the testing sets increased to 86.82%. A parallel test of the boosted logistic model in Weka also yielded an accuracy rate of 86.82%. Model combination is also very efficient in increasing the accuracy of the classifier. Multiple test experiments should be considered by researchers to determine the right combination of algorithms to give a more accurate prediction.