Multi-label classification of feedbacks

Abstract

This work deals with educational text mining, a field of natural language processing applied to education. The objective is to classify the feedback that teachers in online courses give on the activities submitted by students, according to the model of Hattie and Timperley (2007), which considers that feedback may operate at the task, process, regulation, praise and other levels. Four multi-label classification methods of the data transformation approach (binary relevance, classifier chains, label powerset and RakelD) are compared, each combined with the base algorithms SVM, Random Forest, Logistic Regression and Naive Bayes. The methodology was applied to a case study in which 11,013 feedbacks written in Spanish, from 121 online courses of the Law degree at a public university in Mexico, were collected from the Blackboard learning management system. The results show that the random forest and support vector machine algorithms perform best when used with the binary relevance and classifier chains transformation methods.

Keywords

Text mining, multi-label classification, educational data mining, online education

1 Introduction

The online learning and education management systems industry is one of the fastest growing today [15]. In Mexico, online education has also shown an increase as a study alternative [3]. In online education, feedback is essential to show and make sure that the learning process is taking place [14].

Feedback is a product resulting from the review and analysis by the teacher of the activity, contribution or project sent by the student(s) [14]. It is the ability that the teacher develops when sharing specific information with the student about their performance in order to reach their maximum learning potential [21].

Feedback is one of the most important factors influencing student success [2, 17]. It is important for improving student performance and achieving learning objectives, and it is the central part of formative evaluation, so it must be constant, clear, timely, sufficient and pertinent [18]. It is the means that allows students to identify what is needed to achieve success according to what is expected of them [14]. It also helps students to identify and resolve their misconceptions, which improves performance [6].

In online education, feedback plays an important role because of the way courses work: students submit their assignments electronically and usually receive feedback on them. Thus, as soon as the teacher sends feedback on the first attempt at a formative assessment, the student is expected to use it to adjust the activity in order to meet expectations and learning objectives [14].

In online education, feedback is an essential practice, as it helps to scaffold learning and bridges the distance between teacher and student [20]. Research on feedback focuses on assessing when feedback is effective in order to determine whether and how to improve it [8]. Effective feedback is feedback that closes the gap between the current and expected performance of students [11, 20].

In the Hattie and Timperley model [11], effectiveness depends on the four levels at which feedback can operate: the task level, the process level, the regulatory level and the self level, or praise. Based on this classification, Hattie and Timperley [11] point out that the most effective feedbacks are those at the process and regulatory levels, that feedbacks at the task level are only effective if combined with those at the process or regulatory level, and that feedbacks at the self or praise level are not significant for learning.

In this work, a methodology is shown to classify the feedback generated by a teacher to the activities sent by students in online courses according to the levels proposed by the Hattie and Timperley model [11] to distinguish between messages that are in the Task, Process, Regulatory, Praise and Other level where those located in Other are messages that are not located in any level.

Because a feedback can be located at several levels at the same time, four multi-label classification methods of the data transformation approach are used: binary relevance (BR), classifier chains (CC), label powerset (LP) and RakelD. For each method, four base algorithms are used: support vector machines (SVM), random forest (RF), logistic regression (LR) and naive Bayes (NB). At the authors' discretion, recommendations are made on the data transformation method and base algorithm to be used to automatically classify feedbacks using the model of Hattie and Timperley [11].

The work is organized as follows: in section 2 the preliminaries of the feedback model of Hattie and Timperley and multi-label classification are given, section 3 provides details on the applied methodology. The results and discussion are in section 4 followed by the conclusions and future work in section 5.

2 Preliminaries

In this section, we present the preliminary work that describes the feedback model proposed by Hattie and Timperley [11] and the multi-label classification.

2.1 Hattie and Timperley feedback model

In the literature, there are several proposals to determine the effectiveness of feedback. Uribe and Vaughan [20] point out that there are four types: corrective, which focuses on requirements and content; epistemic, which clarifies or explains through prompts and questions; suggestive, which includes tips or ideas for improvement; and epistemic plus suggestive, which includes questions, prompts and ideas for scaffolding learning. Shute [22] proposes classifying feedback into result, correct answer and elaborated answer. Hattie and Timperley [11] point out four levels at which feedback can be found: task, process, self-regulation and praise, the first three being the ones that produce an improvement in learning.

The Hattie and Timperley model [11] is the most referenced and has been used for the development of applications that offer feedback at a specific level [2], for the analysis of comments [1, 17], and as a strategy to improve teacher performance [9].

The Hattie and Timperley model [11] assumes that the purpose of feedback is to reduce discrepancies between what a student has understood or performed and the desired objective.

The authors argue that there are several ways to reduce the gap between what is understood and what is expected, that feedback is not always effective in improving learning, and that effective feedback answers three questions: Where am I going? How am I going? Where to next? The questions work together at the task, process, regulation and self levels, and the level at which the feedback is directed influences its effectiveness.

The levels proposed in the model are: (1) the task level, with comments about the task or product that indicate whether it is correct or incorrect, or that include directions to acquire more, different or correct information; (2) the process level, with comments directed at the process used to create a product or complete a task, that is, at the information processing or learning processes required to understand or complete the task; (3) the regulation level, with comments directed at self-regulation, including strategies to improve self-evaluation or the confidence to engage further with a task; and (4) the self level, with comments directed at the person that are not related to the performance of the task. Examples of feedbacks at each level can be found in [11].

2.2 Multi-label classification

To automate feedback classification according to the feedback model of Hattie and Timperley, multi-label classification techniques were used.

Multi-label classification is a prediction task in which each instance is associated with a vector of outputs instead of a single value, as in traditional classification. The length of the vector equals the number of different labels in the data set, and each element is a binary value indicating whether the corresponding label is relevant to the instance or not. Several labels can be active at the same time [7].
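As a minimal illustration of this representation, the following sketch encodes a set of feedback levels as a binary vector. The label order and the helper name `encode_levels` are assumptions made for the example and are not taken from the study.

```python
# Label order is an assumption for illustration; the study uses these five levels.
LEVELS = ["task", "process", "regulation", "praise", "other"]


def encode_levels(active):
    """Return a 0/1 vector with one position per level in LEVELS."""
    unknown = set(active) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    return [1 if level in active else 0 for level in LEVELS]


# A feedback that addresses both the task and the process levels.
vector = encode_levels({"task", "process"})
print(vector)  # [1, 1, 0, 0, 0]
```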

Multi-label classification can be approached in three ways: data transformation, method adaptation and ensembles of classifiers. Herrera et al. [7] establish that the first is based on transformation methods which, applied to the original multi-label data sets, produce one or more binary or multiclass data sets; once transformed, these can be processed with binary or multiclass classifiers.

The second seeks to adapt existing algorithms so that they can deal with multi-label data sets by producing several outputs instead of one.

The third one combines adapted algorithms or data transformation methods to make predictions.

In this work, the data transformation approach was applied with the following methods:

Binary relevance (BR). It decomposes the multi-label classification problem into k independent binary learning problems [16].

Classifier chains (CC). Like BR, it decomposes the multi-label classification problem into k binary problems. It differs in that the first classifier is trained only with the input attributes, while each subsequent one also receives the outputs of the previous classifiers as inputs [12].

Label Powerset (LP). It transforms the multi-label classification problem into a multi-class problem [16].

RakelD. It divides the label space into disjoint partitions of size k, trains an LP classifier for each partition, and makes the prediction by combining the results of all classifiers [10].
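The four transformations can be sketched on a toy label matrix as follows. This is only an illustration of how the targets are rearranged before a base algorithm is trained; the function names, the toy data and the random seed are assumptions, and the sketch is not the implementation used in the experiments.

```python
import random


def binary_relevance_targets(Y):
    """BR: one independent binary target column per label."""
    return [list(column) for column in zip(*Y)]


def classifier_chain_inputs(X, Y, j):
    """CC: training inputs for the j-th link of the chain, i.e. the original
    features extended with the labels 0..j-1 (true labels at training time)."""
    return [list(x) + list(y[:j]) for x, y in zip(X, Y)]


def label_powerset_targets(Y):
    """LP: each distinct label combination becomes one multiclass target."""
    class_of = {}
    targets = []
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)
        targets.append(class_of[key])
    return targets, class_of


def rakeld_partitions(n_labels, k, seed=0):
    """RakelD: disjoint random partitions of the label indices; the last
    partition is smaller when n_labels is not a multiple of k."""
    labels = list(range(n_labels))
    random.Random(seed).shuffle(labels)
    return [labels[i:i + k] for i in range(0, n_labels, k)]


# Toy data: four instances, one feature, five labels.
X = [[0.1], [0.2], [0.3], [0.4]]
Y = [[1, 0, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [1, 1, 0, 0, 0]]

br = binary_relevance_targets(Y)       # 5 binary problems
cc = classifier_chain_inputs(X, Y, 2)  # inputs for the third classifier in the chain
lp, classes = label_powerset_targets(Y)
parts = rakeld_partitions(5, 3)        # one LP problem per partition
```

In the sketch, LP turns the four instances into a three-class problem because two of them share the same label combination, whereas BR always produces as many binary problems as there are labels.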

To evaluate the classification, there are metrics that are grouped according to two criteria: (1) how the prediction is computed, which distinguishes example-based and label-based metrics; and (2) how the result is provided, which distinguishes binary bipartition and label ranking [7]. The following were used in this work:

Hamming Loss. It measures the fraction of labels that are predicted incorrectly, averaging by labels and then by instances; lower values are better [16].

Accuracy. It measures how well the classifier predicts the label combinations, averaged over instances [16].

F-measure. It is the harmonic mean of precision and recall [16].

Subset Accuracy. It is the strictest metric, since a prediction counts as correct only when the complete set of predicted labels matches the actual set exactly [7].

Average Precision. For each relevant label of an instance, it measures the proportion of relevant labels that are ranked at or above it in the predicted ranking [7].

Coverage. It measures the number of steps to go through the ordering provided by the classifier until all relevant labels are found [7].

F1(macro). It measures the f-measure metric for each label and calculates the unweighted mean [7].

F1(micro). It first adds the counters for all labels and then computes the F-measure only once [7].

Ranking loss. It measures the number of incorrectly ordered labels with respect to the number of correctly ordered [7].

The formulas for the calculation of each metric can be found in [7].
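As an informal illustration of the bipartition metrics, the sketch below computes Hamming loss, exact-match (subset) accuracy and the micro and macro F1 values on a toy example. It follows the usual textbook definitions; the function names and the toy label matrices are assumptions, and the ranking-based metrics are not covered.

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label positions that are predicted incorrectly."""
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / (len(Y_true) * len(Y_true[0]))


def subset_accuracy(Y_true, Y_pred):
    """Fraction of instances whose full label vector is predicted exactly."""
    return sum(yt == yp for yt, yp in zip(Y_true, Y_pred)) / len(Y_true)


def label_counts(Y_true, Y_pred):
    """Per-label (true positives, false positives, false negatives)."""
    counts = []
    for j in range(len(Y_true[0])):
        tp = sum(yt[j] == 1 and yp[j] == 1 for yt, yp in zip(Y_true, Y_pred))
        fp = sum(yt[j] == 0 and yp[j] == 1 for yt, yp in zip(Y_true, Y_pred))
        fn = sum(yt[j] == 1 and yp[j] == 0 for yt, yp in zip(Y_true, Y_pred))
        counts.append((tp, fp, fn))
    return counts


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall; 0 when the label never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro(Y_true, Y_pred):
    """Unweighted mean of the per-label F1 values."""
    scores = [f1(*c) for c in label_counts(Y_true, Y_pred)]
    return sum(scores) / len(scores)


def f1_micro(Y_true, Y_pred):
    """F1 computed once on the counters summed over all labels."""
    counts = label_counts(Y_true, Y_pred)
    return f1(*(sum(c[i] for c in counts) for i in range(3)))


# Toy example: four instances, three labels; the first label is the frequent one.
Y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
Y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 0], [1, 1, 0]]

print(hamming_loss(Y_true, Y_pred))     # 0.25
print(subset_accuracy(Y_true, Y_pred))  # 0.25
print(f1_micro(Y_true, Y_pred), f1_macro(Y_true, Y_pred))
```

In this toy case the frequent label is predicted perfectly and the two rare labels are never predicted correctly, so the micro value (about 0.67) is much higher than the macro value (about 0.33), which shows why the two averages react differently to label imbalance.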

2.3 Automatic feedback analysis

In the literature, several proposals can be found for the automatic analysis of comments to students in online courses. For example, the study in [4] analyzes the feedback process in an online learning environment for a high-school-level course in which English is taught as a second language. It uses unsupervised learning and natural language processing techniques to analyze teacher-student interactions, identify the different types of feedback observed and gain an overview of the most effective strategies for improving student engagement.

The authors of [1] propose a way to analyze comments to students written in Portuguese based on indicators of good feedback. They use the levels proposed by Hattie and Timperley, together with text mining techniques, to train and evaluate a classifier using the random forest algorithm with the binary relevance transformation method.

Unlike the previously mentioned works, this study analyzes feedbacks written in Spanish. It compares four base algorithms (random forest, support vector machines, naive Bayes and logistic regression), each combined with four data transformation methods (binary relevance, classifier chains, label powerset and RakelD), to determine which combinations produce the multi-label classifiers with the best performance in the task of locating feedback at the task, process, regulation, praise and other levels proposed by Hattie and Timperley.

3 Methodology

The methodology followed in this work to automatically classify feedbacks according to the levels of the Hattie and Timperley model [11] consists of four steps: (1) data collection and integration, (2) preprocessing and TF-IDF calculation, (3) classification, and (4) analysis and evaluation.

The methodology is applied in a case study: the feedback generated by teachers on the activities of students in online courses of the Law degree at a public university in Mexico was collected from the Blackboard learning management system.

3.1 Data collection and integration

The data set contains feedback written in Spanish, generated by teachers who published them through the task review tool in the learning management system. The data set is made up of 11,013 feedbacks from 121 online degree courses in law. The average number of words in each feedback is 77.

Each feedback was manually classified by instructional design experts, who followed the model described by Hattie and Timperley [11] to locate each feedback at the levels Task, Process, Regulatory, Praise and Other.

At the end of the manual classification, a multi-label data set was obtained in which each feedback has an associated vector of five elements, one for each of the labels task, process, regulation, praise and other. Each element is a binary value indicating whether the corresponding label is relevant or not; several labels can be active at the same time.

The multi-label data set was randomly divided into two sets, training and testing, containing approximately 70% and 30% of the feedbacks (7,709 and 3,304 instances), respectively. The distribution of feedback classified by each level is shown in Table 1.

Table 1
Distribution of feedback in multi-label training and testing sets

	Training set					Test set
Class / Level	Task	Process	Regulatory	Praise	Other	Task	Process	Regulatory	Praise	Other
Class 0	4746	2742	37	3015	906	2066	1153	26	1263	389
Class 1	2963	4967	7669	4694	6803	1238	2151	3278	2041	2915
Total	7709	7709	7709	7709	7709	3304	3304	3304	3304	3304
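A random partition with the set sizes listed in the Total row of Table 1 can be sketched as follows. The function name and the fixed seed are assumptions for reproducibility of the example; the original study does not report how the random selection was implemented.

```python
import random


def random_split(n_items, n_train, seed=0):
    """Shuffle the instance indices and cut them into training and test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# 11,013 feedbacks in total, 7,709 of them for training (sizes from Table 1).
train_idx, test_idx = random_split(11013, 7709)
print(len(train_idx), len(test_idx))  # 7709 3304
```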

The characteristics of the multi-label training and test sets used in this work are shown in Table 2. Cardinality is the average number of relevant labels per instance in the data set. Density is the cardinality normalized by the total number of possible labels. MeanIR is the mean imbalance ratio, that is, the average over all labels of the ratio between the frequency of the most common label and the frequency of each label. The SCUMBLE value measures the concurrence of frequent and rare labels in the same instances. The formulas for each metric can be found in [7].

Table 2

Characteristics of multi-label data sets

Metric	Training set	Test set
Number of instances	7709	3304
Number of labels	5	5
Number of sub-label sets	23	21
Cardinality	1.484	1.482
Density	0.296	0.296
meanIR	27.562	17.84
Scumble	0.021	0.026
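The simpler characteristics in Table 2 can be computed directly from a binary label matrix. The sketch below uses the usual definitions of cardinality, density and MeanIR on a toy matrix; the function names and the data are assumptions, and SCUMBLE is left out for brevity.

```python
def cardinality(Y):
    """Average number of active labels per instance."""
    return sum(sum(row) for row in Y) / len(Y)


def density(Y):
    """Cardinality divided by the number of labels."""
    return cardinality(Y) / len(Y[0])


def mean_ir(Y):
    """Mean over labels of (count of most frequent label / count of the label).
    Labels that never occur are skipped to avoid division by zero."""
    counts = [sum(column) for column in zip(*Y)]
    most_frequent = max(counts)
    ratios = [most_frequent / c for c in counts if c > 0]
    return sum(ratios) / len(ratios)


def distinct_labelsets(Y):
    """Number of different label combinations present in the data."""
    return len({tuple(row) for row in Y})


# Toy matrix: four instances, three labels with counts 4, 2 and 1.
Y = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]]

print(cardinality(Y))         # 1.75
print(density(Y))             # about 0.583
print(mean_ir(Y))             # (1 + 2 + 4) / 3, about 2.333
print(distinct_labelsets(Y))  # 3
```

With the values in Table 2, density follows from cardinality in the same way: 1.484 / 5 is approximately 0.297, which agrees with the reported 0.296 up to rounding.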

3.2 Preprocessing and TF-IDF calculation

Following the recommendations of Herrera et al. [7], the feedbacks of the training and test sets were preprocessed as follows: (1) HTML/CSS code was removed; (2) cleaning methods were applied to keep only the words and digits of each feedback, and digits, web addresses and file names were replaced by identification keys; (3) feedbacks were converted to lowercase so that different spellings of the same word are comparable; (4) the lemma of each word was obtained; (5) a spell checker based on the Levenshtein distance was applied to words with a frequency lower than 10; (6) stop words were removed; (7) stemming was applied.
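Some of these steps can be sketched with the Python standard library alone. The example below covers steps (1) to (3) and a Levenshtein-based correction in the spirit of step (5). The placeholder keys `URLTOKEN` and `NUMTOKEN`, the `max_distance` threshold and the function names are assumptions made for illustration; lemmatization, stop-word removal and stemming are omitted because they require language resources that the text does not specify.

```python
import re


def clean_feedback(text):
    """Steps (1)-(3): strip markup, replace URLs and digits by keys, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)                         # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " URLTOKEN ", text)  # web addresses
    text = re.sub(r"\d+", " NUMTOKEN ", text)                    # digits
    return re.findall(r"[a-záéíóúüñ]+", text.lower())            # words only


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two words."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_rare(token, freq, min_count=10, max_distance=2):
    """Replace a rare token by the closest frequent word, if it is close enough."""
    if freq.get(token, 0) >= min_count:
        return token
    frequent = [word for word, count in freq.items() if count >= min_count]
    if not frequent:
        return token
    best = min(frequent, key=lambda word: levenshtein(token, word))
    return best if levenshtein(token, best) <= max_distance else token


tokens = clean_feedback("<p>Revisa http://example.com antes del 15</p>")
print(tokens)  # ['revisa', 'urltoken', 'antes', 'del', 'numtoken']

freq = {"trabajo": 50, "tarea": 30, "trabjo": 1}
print(correct_rare("trabjo", freq))  # trabajo
```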

Once the preprocessing stage was completed, the feedbacks were transformed into a term-document matrix representing the frequency of each word in each feedback. The TF-IDF value, which combines the term frequency with the inverse document frequency, was calculated by multiplying the frequency of each term within a feedback by the inverse of the term's frequency across all feedbacks [7].
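A minimal sketch of this weighting is given below, assuming the plain variant in which the raw term count is multiplied by log(N / df), where N is the number of feedbacks and df the number of feedbacks that contain the term. Library implementations often add smoothing and normalization, so their values may differ from this sketch; the function name and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within the document
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


# Three short preprocessed feedbacks (stemmed tokens).
docs = [["graci", "por", "tu", "particip"],
        ["muy", "bien", "buen", "trabaj"],
        ["trabaj", "envi", "ser", "de", "tiemp"]]

weights = tf_idf(docs)
print(round(weights[0]["graci"], 3))   # 1.099, term found in one of three documents
print(round(weights[1]["trabaj"], 3))  # 0.405, term found in two of three documents
```

Terms that occur in many feedbacks receive a lower weight than terms concentrated in a few, which is the effect the inverse factor is meant to achieve.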

3.3 Classification

Once the training and test data sets were preprocessed and the TF-IDF values were obtained, the classification stage followed. This stage consisted of applying data mining algorithms to train multi-label classifiers on the training set, which contains the manual classification of the feedbacks. In total, 16 multi-label classifiers were trained: four for each of the data transformation methods described in section 2.2 (binary relevance, classifier chains, label powerset and RakelD), one per base algorithm (SVM, RF, LR and NB).
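The grid of 16 classifiers can be sketched as follows. The enumeration itself needs only the standard library; the `build_classifier` helper shows one plausible way to instantiate each combination with scikit-learn and scikit-multilearn. The hyperparameters, the linear SVM kernel, the multinomial naive Bayes variant and the `labelset_size` value are assumptions, since the text does not report the configuration that was used.

```python
from itertools import product

METHODS = ["BR", "CC", "LP", "RakelD"]
BASES = ["SVM", "RF", "LR", "NB"]
GRID = list(product(METHODS, BASES))  # 16 (method, base algorithm) pairs


def build_classifier(method, base):
    """Build one multi-label classifier. Imports are done lazily so that GRID
    can be inspected without scikit-learn or scikit-multilearn installed."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import SVC
    from skmultilearn.ensemble import RakelD
    from skmultilearn.problem_transform import (
        BinaryRelevance, ClassifierChain, LabelPowerset)

    # Hyperparameters are illustrative assumptions, not the paper's settings.
    bases = {
        "SVM": lambda: SVC(kernel="linear", probability=True),
        "RF": lambda: RandomForestClassifier(n_estimators=100),
        "LR": lambda: LogisticRegression(max_iter=1000),
        "NB": lambda: MultinomialNB(),
    }
    clf = bases[base]()
    if method == "RakelD":
        return RakelD(base_classifier=clf,
                      base_classifier_require_dense=[False, True],
                      labelset_size=3)
    wrappers = {"BR": BinaryRelevance, "CC": ClassifierChain, "LP": LabelPowerset}
    return wrappers[method](classifier=clf, require_dense=[False, True])


print(len(GRID))  # 16
```

Under these assumptions, a single model would be obtained with `build_classifier("CC", "RF")`, fitted on the TF-IDF matrix and label matrix of the training set with `fit`, and applied to the test set with `predict`.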

3.4 Analysis and evaluation

Each multi-label classifier created in the classification stage was analyzed and evaluated to determine how well it located the feedbacks according to the levels proposed in the Hattie and Timperley model.

The feedbacks from the test set described in section 3.1 were passed to each multi-label classifier, to predict the levels at which the feedbacks were located. The result was a set of predicted labels for each feedback.

To measure how well each multi-label classifier locates the feedbacks according to the levels of the Hattie and Timperley model, the commonly used metrics described in section 2.2 were adopted: accuracy, F1-micro, F1-macro, Hamming loss, subset accuracy, ranking loss, average precision and coverage. These metrics compare the set of predicted labels with the actual labels contained in the test data set. Table 3 shows preprocessed feedbacks of the test set with the actual labels and the labels predicted by one of the multi-label classifiers.

Table 3
Test set feedbacks with actual labels and labels predicted by a classifier

	Real labels					Predicted labels
Preprocessed feedback	Task	Process	Regulatory	Praise	Other	Task	Process	Regulatory	Praise	Other
graci por tu particip sandr	0	0	0	1	0	0	0	0	1	0
trabaj envi ser de tiemp	1	0	0	0	0	0	0	0	0	1
muy bien luz buen trabaj	0	0	0	1	0	0	0	0	1	0
hol carolin esper te encontr bien y recib en t…	1	0	0	0	0	1	0	0	0	0
estim marlen lo calif se deb a que lo activ se…	1	1	0	0	0	1	1	0	0	0
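As a worked illustration of how the bipartition metrics compare predicted and actual labels, consider only the five rows shown in Table 3 (not the full test set). The predictions differ from the actual labels in a single row, the second one, and in two positions there (Task and Other). Under the usual definitions of Hamming loss and exact-match ratio, this gives:

```latex
\text{Hamming loss}
  = \frac{\text{mismatched label positions}}{\text{instances} \times \text{labels}}
  = \frac{2}{5 \times 5} = 0.08,
\qquad
\text{exact-match ratio}
  = \frac{\text{fully correct instances}}{\text{instances}}
  = \frac{4}{5} = 0.8
```

These values describe the five sample rows only and should not be read as results of the study.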

4 Results

The experiments were carried out on an Intel Core i5-3210M CPU at 2.50 GHz running the Windows 10 operating system. The Python scikit-multilearn library [16], which is built on the well-known scikit-learn ecosystem, was used.

The results obtained are shown in Table 4, which reports, for each data transformation method (BR, CC, LP and RakelD), the metrics evaluated using the base algorithms SVM, RF, LR and NB.

Table 4
Results of trained classifiers

	BR				CC				LP				RakelD
Metric / Algorithm	SVM	RF	LR	NB	SVM	RF	LR	NB	SVM	RF	LR	NB	SVM	RF	LR	NB
Accuracy	0.695	0.7	0.631	0.549	0.707	0.713	0.661	0.556	0.698	0.696	0.677	0.587	0.698	0.698	0.659	0.561
F1 (micro)	0.858	0.855	0.832	0.776	0.852	0.852	0.829	0.77	0.84	0.837	0.832	0.772	0.855	0.844	0.836	0.77
F1 (macro)	0.644	0.673	0.614	0.523	0.647	0.673	0.627	0.522	0.632	0.653	0.626	0.529	0.645	0.658	0.634	0.526
Hamming loss	0.08	0.082	0.096	0.126	0.083	0.084	0.097	0.127	0.09	0.092	0.095	0.125	0.083	0.088	0.094	0.13
SubsetAccuracy	0.304	0.299	0.368	0.45	0.292	0.286	0.338	0.443	0.301	0.303	0.322	0.412	0.301	0.301	0.34	0.438
Ranking loss	0.181	0.178	0.216	0.271	0.175	0.173	0.201	0.276	0.188	0.192	0.195	0.267	0.175	0.188	0.196	0.266
Avg Precision	0.589	0.6	0.554	0.479	0.59	0.599	0.562	0.478	0.576	0.581	0.565	0.487	0.586	0.588	0.571	0.478
Coverage	2.293	2.277	2.439	2.696	2.295	2.271	2.402	2.728	2.336	2.346	2.376	2.692	2.276	2.327	2.362	2.676

For Accuracy, the base algorithms that classify best are RF and SVM, which classify about 7 out of 10 instances correctly, compared to NB, with which fewer than 60% of the feedbacks are classified correctly.

Regarding F1-macro, the BR and CC approaches classify best, and the base algorithm that gives the best result is RF. It should be noted that the values are affected by the class imbalance.

For F1-micro, which takes the level of imbalance into account, the BR approach with the SVM algorithm classifies best; the RF algorithm also classifies well in comparison with the LR and NB algorithms. A high value is reached in the detection of relevant labels, where about 8.5 out of 10 predictions marked as positive are correct.

For Hamming loss, the classifiers using BR and CC classify feedbacks best when using the SVM and RF algorithms, with only about 8 out of 100 label assignments misclassified. For the zero-one loss (the SubsetAccuracy row in Table 4), using CC with the RF algorithm gives the best classification, with only 2.8 out of 10 feedbacks misplaced when the complete set of labels is considered.

For the ranking loss metric, the RF algorithm performs best with BR and CC, followed closely by SVM, while with LP and RakelD the SVM algorithm obtains the lowest values. For the average precision metric, the RF algorithm performs best, with SVM showing values very close to those of RF. For the coverage metric, the best values are achieved with BR and CC using the RF base algorithm.

In the radial graphs of Figures 1 to 4, each vertex corresponds to an evaluated metric, and the points that correspond to a single multi-label classifier are connected to form a polygon. The larger the area of the polygon, the better the classification. For BR, CC, LP and RakelD alike, the SVM and RF base algorithms classify better than LR and NB.

Fig. 1

Radial graphs of multi-label classifiers using BR.

Fig. 2

Radial graphs of multi-label classifiers using CC.

Fig. 3

Radial graphs of multi-label classifiers using LP.

Fig. 4

Radial graphs of multi-label classifiers using RakelD.

5 Conclusions

In this work, a methodology is applied to classify the feedback generated by a teacher to the activities sent by students in online courses according to the levels proposed by the Hattie and Timperley model to locate them at the Task, Process, Regulatory, Praise and Other levels. Four multi-label classification approaches are compared each with four base algorithms.

The results show that the best multi-label classifier is obtained using CC with the RF base algorithm, which reaches the best values in 5 of the 7 metrics analyzed. Using the RF and SVM algorithms as the base for the BR, CC, LP and RakelD multi-label classification approaches yields better classification metric values. The NB algorithm shows the lowest classification performance compared to RF and SVM.

As future work, we will try to test whether the classifiers work with feedback obtained from online courses in other areas of knowledge. It will be explored whether the values of the classification metrics can be improved by using different tokenization configurations (bigrams, trigrams,...).

References

1. Cavalcanti A.P., de Mello R.F.L., Rolim, André, Freitas and Gasevic, An analysis of the use of good feedback practices in online learning courses, International Conference on Advanced Learning Technologies 2161 (2019), 153–157.

2. Pardo, Jovanovic, Dawson, Gasevic and Mirriahi, Using learning analytics to scale the provision of personalised feedback, British Journal of Educational Technology 50(1) (2019), 128–138.

3. Asociación de Internet.mx, Educación en línea en México 2018, 2019, https://www.asociaciondeinternet.mx/estudios/educacion-en-linea-mexico

4. Aguerrebere, Cabeza S.G., Kaplan, Marconi, Cobo and Bulger M., Exploring feedback interactions in online learning environments for secondary education, CEUR Workshop Proceedings 2231 (2018), 1–10.

5. Brooks, Carroll, Gillies R.M. and Hattie, A matrix of feedback for learning, Australian Journal of Teacher Education 44(4) (2019), 14–32.

6. Fui C.S. and Liam L.H., The effect of computerized feedback on students misconceptions in algebraic expression, Pertanika Journal of Social Sciences and Humanities 26(3) (2018).

7. Herrera, Charte, Rivera A.J. and Del Jesus M.J., Multilabel classification, Springer, 2016.

8. Van der Kleij F.M., Feskens R.C. and Eggen T.J., Effects of Feedback in a Computer-Based Learning Environment on Students' Learning Outcomes: A Meta-Analysis, Review of Educational Research 85(4) (2015), 475–511.

9. Ramírez G.R. and Lozano D.E.V., El modelo de retroalimentación de Hattie y Timperley como estrategia para favorecer el cambio en las percepciones sobre la evaluación formativa en docentes y alumnos, Revista de Investigación Educativa del Tecnológico de Monterrey 10(19) (2019), 75–87.

10. Tsoumakas, Katakis and Vlahavas, Random k-Labelsets for Multilabel Classification, Transactions on Knowledge and Data Engineering 23(7) (2011), 1079–1089.

11. Hattie and Timperley, The power of feedback, Review of Educational Research 77(1) (2007), 81–112.

12. Read, Pfahringer, Holmes and Frank, Classifier chains for multi-label classification, Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2009), 253–269.

13. Harris L.R., Brown G.T. and Harnett J.A., Analysis of New Zealand primary and secondary student peer- and self-assessment comments: applying Hattie and Timperley's feedback model, Assessment in Education: Principles, Policy and Practice 22(2) (2015), 265–281.

14. García M.A.A., Retroalimentación en educación en línea: una estrategia para la construcción del conocimiento, Revista Iberoamericana de Educación a Distancia 17(2) (2014), 59–73.

15. Markets and Markets, LMS Market by Component (Solution and Services), 2020.

16. Szymanski and Kajdanowicz, A scikit-based Python environment for performing multi-label classification, ArXiv e-prints, 2017.

17. Ajjawi and Boud, Researching feedback dialogue: an interactional analysis approach, Assessment and Evaluation in Higher Education 42(2) (2017), 252–265.

18. Quesada, Evaluación del aprendizaje en la educación a distancia en línea, Revista de Educación a Distancia 15(6) (2006), 1–15.

19. Hernández S.C., El constructivismo social como apoyo en el aprendizaje en línea, Apertura 7(7) (2007), 46–62.

20. Uribe S.N. and Vaughan, Facilitating student learning in distance education: a case study on the development and implementation of a multifaceted feedback system, Distance Education 38(3) (2017), 288–301.

21. Vives and Varela, Realimentación efectiva, Investigación en educación médica 2(6) (2013), 112–114.

22. Shute V.J., Focus on formative feedback, Review of Educational Research 78(1) (2008), 153–189.
