An Empirical Study on Student Evaluations of Teaching Based on Data Mining

Under the influence of big data, many fields have undergone tremendous changes. In education, data hold a wealth of practical value, but data mining and knowledge discovery remain underused, especially in the application of student evaluations of teaching (SET). In this study, the K-means algorithm is used to cluster teachers, based on data for three main teaching evaluation indexes (TEI), namely individual background, course content, and teaching method, into high satisfaction degree (HSD), middle satisfaction degree (MSD), and low satisfaction degree (LSD) groups. The logistic regression results showed that gender was a significant factor in students’ evaluation of teachers and that there were potential connections between teaching evaluation and teachers’ gender, age, and teaching content. In addition, the research shows that the effect of satisfaction degree on students’ academic achievement is limited. The findings of this empirical study support a better understanding of SET reform in higher education.


INTRODUCTION
Student evaluations of teaching (SET) are the most commonly used tool in contemporary higher education (Knapper, 2001). Institutions generally use SET for three purposes: (a) to improve teaching quality through feedback, (b) to determine the promotion and tenure of the faculty, and (c) to demonstrate an institution's accountability (Kember, Leung, & Kwan, 2002). Big data analysis is a popular research topic in many fields. In education, big data will reshape the teaching evaluation method (Picciano, 2012), from the original experiential evaluation into scientific and quantitative evaluation, and from single-dimension evaluation to multidimensional evaluation. With the support of data mining technology, SET based on educational big data will be more scientific and accurate.
Many indexes influence students' evaluations of teachers, but existing research lacks a specific process for selecting effective teaching evaluation indexes (TEI). In traditional teaching evaluation, the data come from a single source and are not comprehensive enough, which affects the fairness of SET. In this paper, we summarize three main indexes and ten sub-indexes covering the whole semester, and analyze the effectiveness of the evaluation indexes as well as the interaction between them.

RELATED WORK
In the past few years, researchers have put forward many methods for SET in universities, and SET has provoked heated discussion in the following four aspects.
(a) Transformation of the SET paradigm. By expanding current criteria (e.g., educational scholarship, academic papers), schools could better inform the selection process and promote evidence-based teaching practices, career promotion, and innovations in education (Kiersma et al., 2016). Alhija (2016) collected data with an internet survey designed to measure students' conceptions of five teaching dimensions: goals to be achieved, long-term student development, teaching methods, relations with students, and assessment. The study found that long-term student development is the most important dimension when students evaluate teachers' teaching ability. Some researchers proposed that teaching should move toward standardization and professional, evaluation-oriented practice (Appleton, Christenson, & Furlong, 2008), and that institutions should pay attention to the use of basic data.
(b) The influence of SET. Through interviews with students, researchers found that students were generally positive about SET and agreed that SET can provide accountability for teaching quality. They also believed that SET might influence future teaching practice when they perceived that the results would be used by the teachers (Spooren & Christiaens, 2017). Burden (2008, 2010) interviewed several award-winning teachers in higher education and observed that they all attached great importance to SET, but only a few considered educational evaluation helpful for improving their teaching quality.
(c) Combining algorithms with SET. Vasconcelos (2012) puts forward a key rule mining method based on data mining technology to find meaningful associations between data, and gives the framework of a teaching evaluation data mining system. Aiming at a more scientific and accurate analysis, Boring (2017) used generalized ordered logistic regression and showed that male students express a bias in favour of male professors.
(d) Indicator selection for SET. Student ratings are an influential measure of teaching effectiveness, and active participation by and meaningful input from students can be critical to the success of such teaching evaluation systems (Sojka, Gupta, & Deeter-Schmelz, 2002). Also, the different teaching dimensions that students value in male and female professors tend to match gender stereotypes (Eagly & Karau, 2002). Spooren and Christiaens (2017) recognize that, based on their conceptions of their roles as evaluators of university professors, students might differ in their perceptions of teaching competence.
At present, teaching evaluation models in our country focus on teaching as a whole and are mainly theoretical. Such work offers important guidance for the comprehensive development of education, but it lacks practical guidance, so the scope of the analytics remains abstract. To address this problem, this paper presents an empirical study on SET based on data mining.

State of the literature
• Most papers focus on the effect of SET on students' learning.
• Realizing that the quantity and quality of evaluation indexes are of vital importance for evaluating the effect of learning, researchers have begun to pay attention to the selection of evaluation indexes.
• In order to make conclusions more scientific and practical, researchers have combined algorithms with SET in recent years.

Contribution of this paper to the literature
• The paper builds a system of indicators covering multiple aspects and dimensions in order to ensure an objective and scientific outcome.
• The paper analyses the effectiveness of the evaluation indexes.
• In this paper, the interaction of evaluation indexes is analyzed.

METHODOLOGY
On the basis of the literature review, we focus on the potential indexes that may influence students' evaluation of teachers and validate whether they influence teaching evaluation in our own research context. Moreover, we construct three models based on the logistic regression algorithm to verify the interaction between the various indexes.

Teaching Evaluation Indexes (TEI)
Teaching evaluation prediction is carried out on the basis of students' evaluation analytics, so selecting which teaching evaluation indexes to analyze is a critical first step. The object of the research is students' subjective impression of the teacher over the whole term, formed not only through face-to-face learning in the classroom but also in other learning settings, such as the Hstar cloud platform (a system that we designed for the integration of education resources). The items were designed to focus specifically on satisfaction degree in education. All survey items were answered using a five-point Likert scale. The course content included four questions on teachers' language skills, cultural knowledge, moral education and assignment. The teaching method comprised three questions each for classroom organization, activities and interaction. In addition, the teacher's personal background information is also an evaluation index, including gender, age and professional title. In this paper, we have constructed three main indexes and 10 sub-indexes, with a total of 21 questions (excluding the teacher's background), which are shown in Table 1.
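As an illustration only, the index hierarchy described above could be held in a small lookup structure. The sketch below is hypothetical: the dictionary name, key names, and helper functions are assumptions, the short codes follow the abbreviations used in the Data Analysis section, and the background fields follow Model 1 (gender, age, professional title).

```python
# Hypothetical layout of the three main indexes and their sub-indexes.
# Names are illustrative; only the grouping itself comes from the paper.
TEI = {
    "individual_background": ["gender", "age", "professional_title"],
    "course_content": ["C_LS", "C_CK", "C_ME", "C_A"],
    "teaching_method": ["T_CO", "T_A", "T_I"],
}


def count_sub_indexes(indexes):
    """Return the total number of sub-indexes across all main indexes."""
    return sum(len(subs) for subs in indexes.values())


def is_valid_likert(score):
    """A survey answer is valid if it is an integer from 1 to 5."""
    return (isinstance(score, int) and not isinstance(score, bool)
            and 1 <= score <= 5)


print(count_sub_indexes(TEI))
```

Keeping the grouping explicit makes it straightforward to select the columns that belong to one main index when the models are built later.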

Teaching Evaluation Model
Based on the extensive literature research, we construct our teaching evaluation model from the students' evaluation scores, as shown in Figure 1.

Data Collection and Pre-processing
This phase comprises two tasks: data collection and data pre-processing. The data on the Hstar cloud platform are stored in different data types in different kinds of databases, including relational and non-relational databases.
In our university, administrators ask all freshmen to fill in the teaching evaluation before the final exam. Students who do not complete their teaching evaluation are not allowed to enter the Hstar cloud platform, check their scores, register for the next semester, or print a diploma. The data need pre-processing because, for various reasons, the database contains a small amount of incomplete and abnormal data. For example, some students may forget to register an account, forget to fill in the evaluation, or fill it in incorrectly. According to the above indexes, we combine the data of the students' actual evaluation scores to pre-process the data.
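The kind of filtering described above can be sketched as follows. This is a minimal illustration rather than the study's actual procedure: the record layout (`student_id`, `teacher_id`, `answers`) and the function name are hypothetical, and simply dropping incomplete or out-of-range records is an assumption, since the paper does not say how such records were handled.

```python
def clean_responses(rows, n_items=21, low=1, high=5):
    """Keep only complete rows whose answers all lie on the Likert scale.

    rows: list of dicts with hypothetical keys
          'student_id', 'teacher_id' and 'answers'.
    """
    cleaned = []
    for row in rows:
        answers = row.get("answers")
        # Incomplete record: missing identifiers or wrong number of answers.
        if not row.get("student_id") or not row.get("teacher_id"):
            continue
        if answers is None or len(answers) != n_items:
            continue
        # Abnormal record: a missing or out-of-range answer.
        if any(a is None or not (low <= a <= high) for a in answers):
            continue
        cleaned.append(row)
    return cleaned


sample = [
    {"student_id": "s1", "teacher_id": "t1", "answers": [4] * 21},
    {"student_id": "s2", "teacher_id": "t1", "answers": [5] * 20},        # one answer missing
    {"student_id": "s3", "teacher_id": "t2", "answers": [3] * 20 + [9]},  # out of range
    {"student_id": "", "teacher_id": "t2", "answers": [2] * 21},          # no account id
]
kept = clean_responses(sample)
print([r["student_id"] for r in kept])
```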

Satisfaction Degree Analytics
After pre-processing, the next step is to analyze the data further. We need to classify teachers according to certain criteria through clustering analysis, then compare teachers with different satisfaction degrees and analyze the TEI of the different groups.
Clustering is an unsupervised data mining technique whose main task is to group data objects into clusters such that objects within a cluster are more similar to each other than to objects in other clusters. The k-means algorithm is a very popular clustering technique for numerical data, and we use it to divide teachers into different groups. The k-means algorithm depends on the given value of k when grouping data. In this study, we use the Gap statistic, one of the cluster selection criteria, to identify the number of groups.
A description of the Gap statistic is given below. Consider a data set $x_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, p$, consisting of $n$ data objects with values of $p$ attributes. Let $d_{XY}$ be the squared Euclidean distance between objects $X$ and $Y$, given by $d_{XY} = \sum_{j=1}^{p} (X_j - Y_j)^2$, where $X_j$ is the value of attribute $j$ for object $X$. If the data set has been clustered into $k$ clusters $C_1, C_2, \ldots, C_k$, where $C_i$ indicates the $i$th cluster, then $n_i = |C_i|$.
Let $D_i = \sum_{X, Y \in C_i} d_{XY}$ be the sum of pair-wise distances for all points in cluster $i$, and let $W_k$ be the collective within-cluster sum of squares around the cluster means, given by Eq. (1). $\mathrm{Gap}_n(k)$ can be defined as the difference between the expected and observed values of $\log(W_k)$, as given in Eq. (2). $k$ can be taken as the value maximizing $\mathrm{Gap}_n(k)$.
$$W_k = \sum_{i=1}^{k} \frac{1}{2 n_i} D_i \tag{1}$$
$$\mathrm{Gap}_n(k) = E_n^{*}\{\log(W_k)\} - \log(W_k) \tag{2}$$
where $E_n^{*}$ denotes the expectation under a sample of size $n$ from the reference distribution.
A brief formal description of the k-means algorithm is given below.
Input: a data set consisting of $n$ data objects; the number of clusters $k$
Output: $k$ clusters
Method:
1. Choose $k$ objects at random from the data set as initial cluster centers.
2. Repeat
3. Assign each data object to the cluster whose center is nearest to it.
4. Update the cluster means, i.e. calculate the mean value of the objects for each cluster.
5. Until no data object changes its cluster membership or any other convergence criterion is met.
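The k-means steps and the Gap statistic described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in the study: the reference distribution is taken to be uniform over the bounding box of the data, the pair-wise sum is taken over ordered pairs (so that the within-cluster dispersion equals the sum of squared distances to the cluster means), and the function names, restart counts, and synthetic points are all hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """Random initial centers, then assign/update until labels are stable."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # no object changed its cluster
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def within_dispersion(points, labels, k):
    """Pooled sum of squared distances to each cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            mean = tuple(sum(col) / len(members) for col in zip(*members))
            total += sum(sq_dist(p, mean) for p in members)
    return total


def gap_statistic(points, k, rng, n_refs=5, n_restarts=5):
    """Mean log-dispersion of uniform reference sets minus that of the data."""
    def best_log_w(data):
        # Several random restarts; keep the lowest dispersion found.
        return math.log(min(within_dispersion(data, kmeans(data, k, rng), k)
                            for _ in range(n_restarts)))

    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    ref_logs = []
    for _ in range(n_refs):
        ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
               for _ in points]
        ref_logs.append(best_log_w(ref))
    return sum(ref_logs) / n_refs - best_log_w(points)


# Synthetic two-feature points around three centers (not the study's data).
rng = random.Random(0)
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in [(1, 1), (3, 4), (5, 1)] for _ in range(20)]
gaps = {k: gap_statistic(data, k, rng) for k in range(1, 6)}
best_k = max(gaps, key=gaps.get)
print(best_k, gaps)
```

Because k-means starts from random centers, the sketch reruns it a few times per value of k and keeps the lowest dispersion; with a single run, a poor initialization could distort the comparison across values of k.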

Three Model Building
To determine whether these differences in TEI were significant, we conducted a statistical analysis using logistic regression.
This process contains three parts: Model 1 analyzes teachers' individual background, including gender, age and professional title; Model 2 analyzes teachers' individual background together with course content, which comprises language skills, cultural knowledge, moral education and assignment; Model 3 analyzes all indexes.
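The nested structure of the three models can be sketched as below. This is a hedged illustration only: the data are synthetic, the column names and binary coding of the background variables are assumptions, and the simple gradient-ascent fit stands in for whatever statistical package the study actually used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient ascent on the mean log-likelihood.

    Returns [intercept, coef_1, ..., coef_m].
    """
    n, m = len(X), len(X[0])
    w = [0.0] * (m + 1)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def log_likelihood(X, y, w):
    """Log-likelihood of the fitted model on (X, y)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


# Hypothetical column groups mirroring the three main indexes.
BACKGROUND = ["gender", "age", "title"]
COURSE = ["C_LS", "C_CK", "C_ME", "C_A"]
METHOD = ["T_CO", "T_A", "T_I"]
MODELS = {
    "Model 1": BACKGROUND,
    "Model 2": BACKGROUND + COURSE,
    "Model 3": BACKGROUND + COURSE + METHOD,
}

# Synthetic teachers (NOT the study's data): binary background codes and
# centered scores; the generating coefficients are arbitrary.
rng = random.Random(0)
rows = []
for _ in range(120):
    row = {c: float(rng.randint(0, 1)) for c in BACKGROUND}
    row.update({c: rng.uniform(-1, 1) for c in COURSE + METHOD})
    score = 0.2 * row["gender"] + 0.5 * row["C_LS"] + 0.4 * row["T_I"] - 0.3
    row["hsd"] = 1 if rng.random() < sigmoid(score) else 0
    rows.append(row)

y = [r["hsd"] for r in rows]
fits = {}
for name, cols in MODELS.items():
    X = [[r[c] for c in cols] for r in rows]
    fits[name] = fit_logistic(X, y)
    print(name, round(log_likelihood(X, y, fits[name]), 3))
```

Each model reuses the columns of the previous one and adds a new group, so changes in a coefficient across models show how much of an effect is absorbed by the added indexes.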

Evaluation
Through the analysis of TEI, we can help administrators build a more meaningful evaluation system so that teachers are evaluated more fairly and scientifically. Teaching evaluation is not a static process but a dynamic cycle: school administrators, teachers, students, statisticians and other teams can analyze the evaluation process to improve the evaluation rules, enhance their abilities and promote the teaching effect.

EXPERIMENTS AND DISCUSSION
We obtained the relevant data from CCNU's Hstar cloud platform, used the Gap statistic and k-means to cluster teachers, and used logistic regression to explore the relationship between the indexes of teaching evaluation.

Data Collection and Pre-Processing
Higher Math, Moral Education, and College English are all compulsory courses for freshmen in their first term at university. The data came from the Hstar Cloud Platform and cover 5402 freshmen who evaluated 149 teachers of the relevant courses in the fall semester of 2016. We used the Eclipse tool to tally the data in the database for 24 items (including teachers' individual background), then dealt with missing values and outliers. After data pre-processing, the study retained valid data for 5204 freshmen and 142 teachers.

Data Analysis
The number of clusters was identified using the Gap statistic as discussed above. The graphs clearly indicate that there are three clusters of teachers: Figure 2 shows that the Gap statistic is maximized at three clusters. Table 2 presents descriptive statistics for the variables measured. Of the 42 teachers who received a high satisfaction degree from students, 58% were male. In contrast, 57% of the teachers who received a low satisfaction degree were female. With respect to age, the results showed a difference favoring younger teachers. For example, more teachers under 50 than teachers over 50 received a high satisfaction degree, while 31% of those who received a middle satisfaction degree and 35% of those who received a low satisfaction degree were in their 50s. Finally, teachers who received a high satisfaction degree showed higher scores on language skills (C_LS), cultural knowledge (C_CK), activities (T_A), and interaction (T_I), but for moral education (C_ME), classroom organization (T_CO) and assignment (C_A) the scores did not differ noticeably.
In sum, descriptive statistics showed a greater tendency toward high scores among younger, male teachers and among those with higher scores on language skills, cultural knowledge, activities, and interaction. As for academic achievement, students of teachers who received a high satisfaction degree achieved similar, or somewhat higher, scores on usual achievement (U_A) and final exams (F_E) than others.
To determine whether these differences were significant, we conducted a statistical analysis using logistic regression. To study what leads to a high evaluation, and for simplicity, we combined MSD and LSD into a single group for comparison with HSD.

In Table 3, the analysis examined the TEI influencing the likelihood that students would evaluate teachers with HSD. The first model consisted of a set of individual background indexes, among which only gender and age were significant predictors of the teachers' likelihood of receiving a high satisfaction degree. Regarding gender, the odds of male teachers receiving HSD were 23% higher (according to the value of the OR) than those of female teachers. Moreover, the odds of teachers in their 30s receiving HSD were 29% higher than those of teachers in their 50s or above. Professional title was not a significant predictor in the model. The pseudo-$R^2$ of the first model was 0.065.
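The percentages quoted above follow from the usual reading of an odds ratio (OR): an OR of 1.23 corresponds to odds that are 23% higher, and the underlying logistic regression coefficient is the natural logarithm of the OR. The helper names in this small sketch are hypothetical.

```python
import math


def pct_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def coef_from_odds_ratio(odds_ratio):
    """Logistic regression coefficient corresponding to an odds ratio."""
    return math.log(odds_ratio)


def odds_ratio_from_coef(beta):
    """Odds ratio corresponding to a logistic regression coefficient."""
    return math.exp(beta)


for label, odds_ratio in [("male vs. female", 1.23), ("30s vs. 50s+", 1.29)]:
    print(label,
          round(pct_change_in_odds(odds_ratio), 1),
          round(coef_from_odds_ratio(odds_ratio), 3))
```

Note that these figures describe odds, not probabilities: odds that are 23% higher do not in general mean a 23% higher probability of receiving HSD.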
In Model 2, in which the course content indexes were taken into account, significant differences in the likelihood of receiving HSD were still observed between male and female teachers. However, when comparing Model 2 with Model 1, the difference between teachers in their 30s and those in their 50s or older was no longer statistically significant after controlling for the course content indexes. Among the course content indexes, language skills and cultural knowledge were significant predictors of the teachers' likelihood of receiving HSD, positively influencing the students' decision. Model 2 increased the pseudo-$R^2$ from 0.065 to 0.136.
In Model 3, the teaching method indexes were also included. Among them, activities and interaction were found to be significant predictors of the teachers' likelihood of receiving HSD. Even after controlling for teaching method, the observed difference between male and female teachers remained significant in Model 3, although the effect narrowed compared with Model 2. The same was true of the effects of language skills and cultural knowledge. Furthermore, one-unit changes in the course content indexes of language skills and cultural knowledge increased the odds of receiving HSD by 13% and 16%, respectively. In addition, assignment was found to be a marginally significant predictor that increased the odds by 8% with a one-unit change. Finally, a one-unit change in interaction and activities, two teaching method indexes, increased the odds of receiving HSD by 12% and 11%, respectively. Overall, Model 3 increased the pseudo-$R^2$ from 0.136 to 0.147.
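The paper does not state which pseudo-R² it reports. Purely as an assumption, the sketch below uses McFadden's definition (one minus the ratio of the model log-likelihood to the null log-likelihood) and then compares the gains between the three reported values; the function names are hypothetical.

```python
import math


def null_log_likelihood(y):
    """Log-likelihood of an intercept-only model for binary outcomes y."""
    n, positives = len(y), sum(y)
    p = positives / n
    return positives * math.log(p) + (n - positives) * math.log(1 - p)


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R^2 (assumed here; the paper does not name its measure)."""
    return 1.0 - ll_model / ll_null


# Values reported in the text for Models 1, 2 and 3.
reported = [0.065, 0.136, 0.147]
gains = [round(b - a, 3) for a, b in zip(reported[:-1], reported[1:])]
print(gains)
```

On the reported values, adding the course content indexes raises the pseudo-R² by considerably more than adding the teaching method indexes does.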

CONCLUSIONS
The current study provides empirical evidence that enables a better understanding of students' evaluation of teachers. The study not only found an effective approach to cluster analysis, but also investigated the relationships among indexes pertaining to individual background, course content, and teaching method.
By analyzing the relationships among indexes more closely, we can gain insights into how teachers can improve their teaching quality and how administrators can set more scientific and fair teaching evaluation criteria. For example, the higher satisfaction degree of male teachers compared with female teachers is probably due to students' stereotyped impressions of men (as authoritative and knowledgeable). Teachers could also pay attention to language skills and interact with students in the teaching process; humorous teaching may make it easier for students to accept difficult knowledge. Teachers should also improve their cultural knowledge, so as to help students concentrate on learning and reduce the probability of skipping classes. More effective activities should be carried out as well.
SET is affected by many factors. Although we analyzed three main indexes including ten sub-indexes, they cover only part of learning evaluation, and more indexes should be considered. In addition, the relationship between satisfaction degree and students' academic performance deserves further exploration.

Figure 2. Three models building process

Table 1. Teaching evaluation indexes