Educational Evaluation Based on Apriori-Gen Algorithm

The issue of educational evaluation has long been a research hotspot. Using big data analysis methods to conduct educational evaluation can improve the pertinence and effectiveness of education. The conventional Apriori algorithm has certain limitations when applied to educational evaluation. This paper introduces an improved Apriori-Gen algorithm and describes its application in evaluating the actual effectiveness of ideological and political courses in colleges and universities. By performing association analysis on online questionnaire data, the study requirements of college students can be identified, so as to improve the teaching effectiveness of ideological and political courses. The results show that the proposed method is effective for educational evaluation.


INTRODUCTION
Sound educational evaluation is the premise of educational decision making. An effective educational evaluation relies on a comprehensive and solid evaluation basis. Big data emphasizes in-depth mining and analysis of multidimensional data to uncover the implicit relations and value behind the data, which helps shift educational evaluation from inference based on small data to evidence-based decisions based on comprehensive data. With the aid of big data technology, educational evaluation no longer serves only the decision-making needs of education management departments or education institutions, but also all groups and individuals who are concerned with or take part in education. By analyzing students' study requirements via big data, the pertinence and effectiveness of education can be improved.
The information found by data mining is normally meaningful knowledge that would be impossible to find manually. There are many data mining algorithms, including Apriori (Agrawal and Shafer, 1996; D'Angelo et al., 2016), K-means (Scitovski and Sabo, 2014), SVM (Support Vector Machine) (Hu et al., 2015; Mu et al., 2017), EM (Expectation-Maximization) (Enders, 2003), PageRank (Chen et al., 2007), AdaBoost (Adaptive Boosting) (Hu, 2017a), KNN (K-Nearest Neighbor) (Hu, 2017b), Naive Bayes (Sitthi et al., 2016), etc. Data are regarded as associated when there is a certain regularity among them. There are various types of association, including simple association, temporal association, causal association, quantitative association, etc. The purpose of association analysis is to find the correlations hidden in the data, and association rule mining aims to find meaningful and valuable relations between items and itemsets in a database. In 1993, Agrawal et al. first proposed mining association relations between items in a database. Since then, many researchers have studied the association rules proposed by Agrawal et al. further, for example by optimizing the algorithm and by introducing sampling and parallel computation to improve its efficiency.
Association rule mining with Apriori consists of two main steps: (1) finding all frequent itemsets in the database according to a given minimum support; (2) generating association rules from these itemsets. The key to the first step is to efficiently list all qualified frequent itemsets, which is also the most important issue in association rule mining. Improvements to association rule mining algorithms therefore focus on finding all frequent itemsets that meet the minimum support threshold more efficiently.
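As an illustration of the first step, the following base-R sketch enumerates frequent itemsets level by level on a small set of hypothetical transactions. The item labels and the helper names support() and frequent_itemsets() are assumptions for illustration, not the code used in this study. Candidates of size k+1 are formed by merging pairs of frequent k-itemsets, and only candidates meeting the minimum support are kept; the subset-pruning step of Apriori-Gen is omitted here for brevity.

```r
# Hypothetical toy transactions (illustrative only), items coded "question=answer"
transactions <- list(
  c("Q1=A", "Q2=B", "Q3=A"),
  c("Q1=A", "Q2=B"),
  c("Q1=A", "Q3=A"),
  c("Q2=B", "Q3=A"),
  c("Q1=A", "Q2=B", "Q3=A")
)

# Support of an itemset: fraction of transactions containing all of its items
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Level-wise search: frequent k-itemsets are merged into (k+1)-candidates,
# and candidates below min_sup are discarded (no subset pruning in this sketch)
frequent_itemsets <- function(transactions, min_sup) {
  items <- sort(unique(unlist(transactions)))
  keep <- function(sets) Filter(function(s) support(s, transactions) >= min_sup, sets)
  level <- keep(as.list(items))
  result <- level
  while (length(level) > 1) {
    k <- length(level[[1]])
    candidates <- list()
    for (i in seq_len(length(level) - 1)) {
      for (j in (i + 1):length(level)) {
        cand <- sort(union(level[[i]], level[[j]]))
        if (length(cand) == k + 1) candidates[[length(candidates) + 1]] <- cand
      }
    }
    level <- keep(unique(candidates))
    result <- c(result, level)
  }
  result
}

fi <- frequent_itemsets(transactions, min_sup = 0.5)
```

With min_sup = 0.5, the three single items (support 0.8) and the three pairs (support 0.6) are frequent, while the only triple has support 0.4 and is dropped.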
To optimize Apriori, many research teams have successively proposed various improvements. Holt and Chung proposed the IHP algorithm (Holt and Chung, 2002), which hashes the TID lists (transaction-identifier lists) of items into a hash table; Zaki proposed the MaxClique algorithm (Zaki, 1997), which uses a clustering technique; Orlando et al. proposed the DCP algorithm (Orlando et al., 2001), which stores and counts candidate itemsets in a new way and integrates a more efficient pruning technique; Park et al. proposed the DHP algorithm (Park et al., 1995), which uses hashing to reduce the cost of generating candidate itemsets; Agarwal et al. proposed the Tree Projection algorithm (Agarwal et al., 2001), which constructs a lexicographic tree and mines frequent itemsets by projecting the database onto it; Savasere et al. proposed the PARTITION algorithm (Savasere et al., 1995), which divides the database into blocks so that frequent itemsets can be generated for each block separately; Toivonen proposed the Sampling algorithm (Toivonen, 1996), which uses sampling to reduce the cost of mining frequent itemsets. All these algorithms improve data mining efficiency to a certain extent.
The conventional Apriori algorithm may encounter a bottleneck in itemset generation, because it can generate too many candidate itemsets and must scan the database repeatedly, and it may also produce a massive number of rules (Song et al., 2006). How to select interesting and valuable rules for practical use has therefore become a difficult issue. On the basis of analyzing the conventional algorithm, this paper proposes an improved Apriori algorithm and elaborates its design idea, main problems, and implementation method. Finally, the application of the improved Apriori association rule algorithm is illustrated in the actual teaching environment of ideological and political courses.
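For reference, the standard Apriori-Gen candidate generation (join and prune) can be sketched in base R as below. This is the textbook procedure, not the improved variant proposed in this paper, whose details are not reproduced here. The join step merges two frequent k-itemsets that share their first k-1 items, and the prune step discards any candidate that has an infrequent k-subset, which is the usual way to curb the candidate explosion described above. The function name apriori_gen() and the toy itemsets are illustrative assumptions.

```r
# Standard Apriori-Gen (join + prune), sketched for illustration.
# freq_k: list of frequent k-itemsets, each a sorted character vector.
apriori_gen <- function(freq_k) {
  n <- length(freq_k)
  if (n < 2) return(list())
  k <- length(freq_k[[1]])
  key <- function(s) paste(s, collapse = "|")
  freq_keys <- vapply(freq_k, key, character(1))
  candidates <- list()
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      a <- freq_k[[i]]
      b <- freq_k[[j]]
      # Join step: both itemsets must share their first k - 1 items
      if (!identical(a[seq_len(k - 1)], b[seq_len(k - 1)])) next
      cand <- sort(c(a, b[k]))
      # Prune step: every k-subset of the candidate must itself be frequent
      subsets <- lapply(seq_along(cand), function(m) cand[-m])
      if (all(vapply(subsets, key, character(1)) %in% freq_keys)) {
        candidates[[length(candidates) + 1]] <- cand
      }
    }
  }
  candidates
}

# Hypothetical frequent 2-itemsets
freq_2 <- list(c("A", "B"), c("A", "C"), c("B", "C"), c("B", "D"))
cands <- apriori_gen(freq_2)
```

Here {A, B, C} survives because all of its 2-subsets are frequent, whereas the joined candidate {B, C, D} is pruned because {C, D} is not frequent.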

TECHNOLOGIES RELEVANT TO BIG DATA ANALYSIS
Big data mining extracts valuable and potentially useful information and knowledge from massive, incomplete, noisy, fuzzy, and random data; it is also a decision support process. Big data mining is mainly based on artificial intelligence, machine learning, pattern recognition, and statistics. Common big data mining methods include classification, regression analysis, clustering, association rules, neural networks, Web data mining, etc. These methods realize data mining from different perspectives. Association rules describe the associations and mutual relations between data items, meaning that the occurrence of one data item can be used to infer the occurrence of another. Association rule mining mainly consists of two stages: the first stage searches for all high-frequency itemsets in massive amounts of original data; the second stage generates association rules from these high-frequency itemsets.
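The second stage can be illustrated with a short base-R sketch that derives every rule X -> (I \ X) from one frequent itemset I and keeps those whose confidence, supp(I) / supp(X), reaches a threshold. The transactions, item names, and the helper rules_from_itemset() are hypothetical. Note that apriori() in the arules package only produces rules with a single item on the right-hand side, whereas this sketch also allows multi-item right-hand sides.

```r
# Hypothetical toy transactions (illustrative only)
tx <- list(c("A", "B"), c("A", "B"), c("A", "B", "C"), c("A", "C"), c("B", "C"))

# Support: fraction of transactions containing every item of the itemset
support <- function(items, tx) mean(vapply(tx, function(t) all(items %in% t), logical(1)))

# Stage 2: derive rules X -> (I \ X) from one frequent itemset I and keep those
# whose confidence supp(I) / supp(X) reaches min_conf
rules_from_itemset <- function(itemset, tx, min_conf) {
  sup_i <- support(itemset, tx)
  out <- list()
  for (size in seq_len(length(itemset) - 1)) {
    for (lhs in utils::combn(itemset, size, simplify = FALSE)) {
      conf <- sup_i / support(lhs, tx)
      if (conf >= min_conf) {
        out[[length(out) + 1]] <- data.frame(
          lhs = paste(lhs, collapse = ","),
          rhs = paste(setdiff(itemset, lhs), collapse = ","),
          confidence = conf,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  do.call(rbind, out)
}

r <- rules_from_itemset(c("A", "B", "C"), tx, min_conf = 0.4)
```

For the itemset {A, B, C} (support 0.2) and a threshold of 0.4, only A,C -> B and B,C -> A are kept, each with confidence 0.5; rules with a single-item left-hand side reach at most 1/3 here.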

Contribution of this paper to the literature
• This paper introduces an improved Apriori-Gen algorithm and describes its application in evaluating the actual effectiveness of ideological and political courses in colleges and universities.
• The improved Apriori-Gen algorithm corrected biases in the teaching process and improved the teaching effectiveness of ideological and political education.

Collection of Data
College students' basic evaluations of the teaching effectiveness of the ideological and political course were obtained through a questionnaire. The questionnaire included 47 questions covering learner factors, teacher factors, environmental factors, etc. The questionnaire was administered on a website, and the resulting data were exported as an Excel spreadsheet, as shown in Figure 2.
The first row contains the name of each question, and the rows below contain the index values of the selected items. To reduce storage redundancy, the content of each item is stored in a separate file.

Data Preprocessing
First, the questionnaire data in Excel were preprocessed by adding the question code in front of the index value of each answer, so that index values from different questions can be distinguished from each other. The questionnaire data were then imported into RStudio using an Excel package for R. The data set is shown in Figure 3. Before association analysis, the questionnaire data must be converted into transaction form. Therefore, the data were first converted into a list and then into transactions. The key code is shown below:

    library(arules)

    # f: grouping factor defined elsewhere (assumed to identify each respondent)
    dataList <- split(data, f)
    # Flatten each respondent's answers into a vector of unique items
    dataList <- lapply(dataList, function(x) {
      rst <- as.character(unlist(x))
      names(rst) <- NULL
      unique(rst)
    })
    transaction <- as(dataList, "transactions")
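The question-code prefixing described above can be sketched in base R as follows, assuming the exported sheet is a wide table with one row per respondent and one column per question holding the chosen option index. This layout, the column names, and the helper to_transactions() are assumptions for illustration. The result is a list with one character vector of items per respondent, the same shape that as(..., "transactions") in arules accepts.

```r
# Hypothetical wide-format answers: one row per respondent, one column per
# question, each cell holding the index of the chosen option
answers <- data.frame(Q1 = c(1, 1, 2), Q2 = c(3, 1, 3), Q3 = c(2, 2, 2))

to_transactions <- function(df) {
  # Prefix each option index with its question code, e.g. option 1 of Q1 -> "Q1-1",
  # so equal index values from different questions become distinct items
  coded <- do.call(cbind, Map(function(col, name) paste(name, col, sep = "-"),
                              df, names(df)))
  # One vector of unique items per respondent (row)
  lapply(seq_len(nrow(coded)), function(i) unique(unname(coded[i, ])))
}

tx <- to_transactions(answers)
```

Without the prefix, option 1 of Q1 and option 1 of Q2 would collapse into the same item "1", which is exactly the ambiguity the preprocessing step removes.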

Data Analysis
After conversion into transactions, the association analysis was conducted. The key statement of the Apriori algorithm is:

    rules <- apriori(transaction, parameter = list(supp = 0.2, conf = 0.9))

where the minimum support was set to 0.2 and the minimum confidence to 0.9. The output shows that there are over 160,000 qualified association rules, of which only 75 have a length of 2 and more than 2000 have a length of 3; rules longer than 3 are too numerous and are not analyzed in this research. The algorithm parameters were therefore modified to set the maximum rule length (maxlen) to 3, so that only rules of the form A->B and A&B->C are exported.
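For reference, the two thresholds in the apriori() call above correspond to the standard definitions below, where D is the set of transactions. The bound on rule counts is a simple counting argument added here to illustrate why long rules become so numerous; it assumes, as in apriori() from arules, that each rule has a single item on its right-hand side, and the symbols m (number of frequent items), k (rule length), and N_k (number of rules of length k) are introduced for illustration only.

```latex
\operatorname{supp}(X) = \frac{|\{ t \in D : X \subseteq t \}|}{|D|}, \qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}

N_k \le k \binom{m}{k}, \qquad \text{e.g. } m = 20:\; N_2 \le 380,\; N_3 \le 3420,\; N_4 \le 19380
```

A rule X => Y is retained when supp(X ∪ Y) ≥ 0.2 and conf(X => Y) ≥ 0.9. Since each frequent k-itemset yields at most k single-consequent rules, the bound grows quickly with k, which is why capping the rule length keeps the rule set manageable.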
The key code for the modified run is:

    myrules <- apriori(transaction, parameter = list(maxlen = 3, supp = 0.2, conf = 0.9))
    myrules.sorted <- sort(myrules, by = "lift")
    inspect(myrules.sorted)

The resulting rules were ranked by lift. The first 15 rules in descending order of lift were obtained, with the largest lift reaching 3.800.
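Lift, by which the rules above are ranked, is a rule's confidence divided by the support of its right-hand side; values above 1 indicate that the antecedent raises the likelihood of the consequent. The base-R sketch below computes support, confidence, and lift for a few hypothetical single-item rules on toy transactions and sorts them by lift in decreasing order, as sort(..., by = "lift") does in arules. The data and the helper rule_quality() are assumptions for illustration.

```r
# Hypothetical toy transactions (illustrative only)
tx <- list(c("A", "B"), c("A", "B"), c("A", "B", "C"), c("C"), c("C", "D"), c("D"))

support <- function(items, tx) mean(vapply(tx, function(t) all(items %in% t), logical(1)))

# Support, confidence and lift of a rule lhs -> rhs
rule_quality <- function(lhs, rhs, tx) {
  sup_rule <- support(c(lhs, rhs), tx)
  conf <- sup_rule / support(lhs, tx)
  c(support = sup_rule, confidence = conf, lift = conf / support(rhs, tx))
}

# Hypothetical single-item rules to score
rules <- data.frame(lhs = c("A", "C", "A"), rhs = c("B", "D", "C"), stringsAsFactors = FALSE)
q <- vapply(seq_len(nrow(rules)),
            function(i) rule_quality(rules$lhs[i], rules$rhs[i], tx),
            c(support = 0, confidence = 0, lift = 0))
rules$support <- q["support", ]
rules$confidence <- q["confidence", ]
rules$lift <- q["lift", ]

# Rank by lift, highest first
sorted <- rules[order(rules$lift, decreasing = TRUE), ]
```

Here A -> B has lift 2 (B is twice as likely when A is present), C -> D has lift 1 (no association), and A -> C has lift of about 0.67 (A makes C less likely).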

RESULTS
First, all rules were visualized: the 2507 valid rules were grouped and displayed as a bubble diagram.
As shown in Figure 4, the x-axis represents the grouped left-hand sides (LHS) of the rules and the y-axis the right-hand sides (RHS); the circle size represents support (the larger the circle, the higher the support), and the color represents lift (the darker the color, the higher the lift). The relative frequencies of rules with support over 0.5 were recorded, as shown in Figure 5. Rules can also be analyzed for a single particular answer. For example, item 1-1 indicates that the respondent is a college student, and the association rules involving this item were analyzed specifically. Figure 6 is a directed graph of a few key rules for a college freshman, where the arrows point in the direction of rule deduction. Figure 7 is a scatter plot of the rules, where the x-axis represents support, the y-axis represents confidence, and the color of each point represents lift (the darker the color, the higher the lift). Based on these rule analyses, combined with the questionnaire results, the association relations between questions and answers can be obtained.
The evaluation of educational effectiveness has always been a focus of research. Varank et al. investigated the effectiveness of an online automated evaluation and feedback system that assessed students' word processing assignments (Varank et al., 2014). Öztürkler examined the current state of quality improvement in higher education institutions (Öztürkler, 2017).
Owing to limited evaluation conditions, traditional educational evaluation normally collects only fragmentary evaluation information and may therefore overlook some evaluation points. During implementation, it also tends to rely excessively on subjective judgment because it lacks a reliable evaluation basis. In contrast, big data-based educational evaluation does not rely on a one-dimensional evaluation of a single object; it covers all contextual data related to education, using not only evaluation data but also process data. Seeking associations with big data techniques meets educational evaluation's need for a rich basis and valid evidence. Big data expands the content and function of educational evaluation, making it not only an evaluation but also important evidence for educational decision making. Peng applied big data processing technology to an online learning behavior analysis model (Peng, 2017).
However, owing to the complexity of educational data, it is difficult to describe and analyze educational big data with general data analysis tools. With the development of higher education, it is increasingly necessary to analyze and evaluate educational data in order to guide the formulation of educational policy and students' learning behavior. In this paper, more than 160,000 association rules were extracted from the educational data, which is too many to analyze and evaluate, so the rule set must be condensed. By limiting the maximum rule length in the Apriori run, these rules can be reduced to 2507 (rules of length up to 3) or even to 75 (rules of length 2). Although the 75-rule set is simpler, this study uses the 2507 rules. After streamlining, the rules must be analyzed and grouped to determine which rules are more effective for evaluation, which are redundant, which can be merged, and what weights they should be given. The results of this study are very satisfactory.
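One common way to detect redundant rules, used for example by is.redundant() in arules, treats a rule as redundant when a more general rule (same right-hand side, left-hand side a proper subset of its own) has equal or higher confidence. The base-R sketch below applies that criterion to a few hypothetical rules; the rules and the helper is_redundant() are illustrative assumptions, offered to show how the rule set could be streamlined further rather than as the procedure used in this study.

```r
# Hypothetical rules: LHS item vectors, single-item RHS, and confidence
rules <- list(
  list(lhs = c("A"),      rhs = "C", conf = 0.92),
  list(lhs = c("A", "B"), rhs = "C", conf = 0.90),
  list(lhs = c("A", "D"), rhs = "C", conf = 0.97),
  list(lhs = c("B"),      rhs = "E", conf = 0.91)
)

# Hypothetical helper: a rule is flagged as redundant when another rule with the
# same RHS and a strictly smaller LHS contained in its own LHS has equal or
# higher confidence
is_redundant <- function(rules) {
  vapply(seq_along(rules), function(i) {
    r <- rules[[i]]
    any(vapply(rules[-i], function(g) {
      identical(g$rhs, r$rhs) &&
        length(g$lhs) < length(r$lhs) &&
        all(g$lhs %in% r$lhs) &&
        g$conf >= r$conf
    }, logical(1)))
  }, logical(1))
}

red <- is_redundant(rules)
```

The rule A,B -> C is flagged because the shorter rule A -> C is at least as confident, whereas A,D -> C is kept because its extra item raises the confidence.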
In this paper, an improved Apriori-Gen algorithm was applied to evaluate the effectiveness of ideological and political education, enabling not only an overall evaluation but also an evaluation of effectiveness in individual cases. The improved Apriori-Gen algorithm corrected biases in the teaching process and improved the teaching effectiveness of ideological and political education. By making full use of technical approaches, the new algorithm collected data on both students' study processes and results, and integrated various evaluation data (expert evaluation,

Figure 6. Digraph of specialized analysis of rules

Figure 7. Scatter plot of rule analysis