Bayesian Assessment of Undergraduate Students About the Real Function Mathematical Concept

The evaluation of learning in mathematics is a worldwide problem; therefore, new methods are required to assess the understanding of mathematical concepts. In this paper, we propose using Item Response Theory to analyze the level of understanding that undergraduate students have of the mathematical concept of real function. A Bayesian approach was used to make inferences about the parameters of interest. We designed a test containing twelve items, to which a reliability analysis and a validation test were applied. The experiment consisted of administering the test to 48 undergraduate students (18-20 years old) enrolled in a mathematics degree program. We concluded that 25% of the students reached a high level of understanding, 39.6% a medium level, and 35.4% a low level. Furthermore, students obtained low levels of understanding for tasks with high cognitive demand and high levels of understanding for tasks with low cognitive demand.


INTRODUCTION
Evaluation is a fundamental component of education, because teaching and learning strategies are designed, and public policies are dictated, on the basis of it, with the purpose of improving students' understanding. Elton and Laurillard (1979) say that the quickest way to change student learning is to change the assessment system. There is a need to develop new ways to assess the level of understanding of university students in order to intervene accordingly. Competing demands on assessment, including measurement and learning, pose an ongoing challenge for higher education (Brunker, Spandagou, & Grice, 2019). To a certain extent, the terms evaluation and assessment can be seen as synonymous; in this paper, we prefer to use assessment to mean a method for measuring, or giving a value judgment of, a characteristic of interest.
In Mexico, there are different standardized tests that, among other aspects, evaluate mathematical skills. For example, for high school students, the PISA (Programme for International Student Assessment) test is applied to assess an individual's ability to identify and understand the role of mathematics in today's world, and the PLANEA test (a national test) is applied to provide a diagnosis of the students' ability to interpret, understand, analyze, evaluate and solve different problems using their mathematical learning. There are also national standardized tests, such as EXANI, that provide a diagnosis of the level of performance for admission to a bachelor's degree.
In mathematics specifically, the National Governors Association Center for Best Practices and Council of Chief State School Officers (NGA Center & CCSSO, 2010) proposed assessment as a part of mathematics instruction, which includes connections between mathematical practices and mathematical content. They highlighted that teachers should know how to assess students' mathematical understanding and defined "Mathematical understanding as the ability to justify, at a level corresponding to the student's maturity, why a particular mathematical statement is true or where a mathematical rule comes from" (p. 4).
For researchers in mathematics education, a means of assessing students' understanding could be important for evaluating the effectiveness of mathematics instruction; furthermore, assessment instruments could inform teachers about which specific aspects of a body of knowledge students understand and which they do not (Mejía, Fuller, Weber, Rhoads, & Samkoff, 2012). It is therefore important and necessary to develop tests and methods for assessing undergraduate students' understanding of mathematical concepts in particular.
Our research aims to assess the understanding of the mathematical concept of function using Item Response Theory (IRT) from a Bayesian perspective. In this framework, we take the ability parameter of the model as the level of understanding of the mathematical concept. To carry out this task, an assessment instrument was designed based on Sierpinska's criteria.

Mathematical Understanding
Studies of understanding in mathematics are of notable interest to the mathematics education community (Afriyani, Sa'dijah, Subanji, & Muksar, 2018; Albert & Kim, 2015; Doruk, 2019; Haylock & Cockburn, 2013; Jinfa & Meixia, 2017; Kastberg, 2002; Malatjie & Machaba, 2019; Pirie & Kieren, 1994; Sierpinska, 1990; Skemp, 1976). Understanding in mathematics has been studied since the 1970s. By way of overview, Skemp (1976) identified relational understanding and instrumental understanding; Michener (1978) recognized understanding mathematics as a complement to the process of problem solving; later, Sierpinska (1990, 1992) proposed understanding as an act that involves a process of interpretation. Likewise, Nickerson (1985) gave some characteristics of understanding, such as being able to see deeper properties of a concept, looking for specific information in a situation more quickly, being able to represent situations, and envisioning a situation using mental models. Nickerson highlighted that the more one knows about a subject, the better one understands it, showing the relationship between knowledge and understanding. On the other hand, Hiebert and Carpenter (1992) mentioned that the level of understanding is determined by the strength of the connections between mathematical ideas, procedures or facts; in this sense, Wilkerson and Wilensky (2011) describe the structure of mathematical knowledge as a network of relations between different properties, objects and procedures that come to bear on a mathematical idea.
In this sense, teaching and learning with understanding are accepted as a desirable and priority objective, which has motivated an increase in initiatives that are essentially concerned with the development of understanding in the mathematics classroom. However, these initiatives are often affected by major difficulties and constraints when full understanding is not taken into account (Sierpinska, 2000).
In summary, understanding allows the attributes of an object to be interpreted from the functionality it represents (Pecharromán, 2014). Thus, the student must know the different characteristics of the mathematical object, as well as its origin and the precise moment of its applicability or use in various situations, including how the object relates to others. In other words, the understanding of mathematical objects starts from the identification of characteristic elements and, in turn, of the organizational or interpretive functionality of the context.
For this study, the analytical tool was taken from Sierpinska (1992) and consists of four categories of acts of understanding. According to Sierpinska (1992, 1994), focusing on acts of understanding can be significant because they mark a transfer to a different level of thinking, and because in teaching they are the main concern of both teachers and students. Students acquire certain ways of understanding and knowledge that help them to experience acts of understanding. Sierpinska stated that a good understanding of a mathematical situation, such as understanding a concept, is achieved if the process of understanding contains a number of especially significant acts. The four categories of acts of understanding are: 1) Identification, which refers to the identification of an object amongst other objects; 2) Discrimination, which allows one to recognize the difference between two distinct objects and helps to recognize their relevant properties; 3) Generalization, which opens the possibility of extending the range of application and the universe of objects of the same family; and 4) Synthesis, which is the perception of links between concepts that join them into a consistent whole.

Bayesian IRT Model
Contribution to the literature
• The paper proposes using Bayesian item response theory as an innovative application in mathematics education.
• The paper provides a quantitative assessment of students' understanding when solving mathematical problems on the concept of real function.
• The paper shows the development and validation of an instrument to measure undergraduate students' understanding of a mathematical concept.
Item response data come from applying a test to a group of individuals. A test is composed of a number of items. These tests are used extensively in schools, industry and government for various purposes (Baker & Kim, 2004; Fox, 2010; van der Linden & Hambleton, 1997). Item Response Theory (IRT) is a general framework for specifying mathematical functions that describe the interactions of persons and test items. One-dimensional IRT assumes that the interactions of a person with test items can be adequately represented by a mathematical expression containing a single parameter describing the characteristics of the person. This parameter represents an unobservable hypothetical construct (latent variable), such as ability, skill, intelligence or cognitive abilities. Such latent variables can only be modelled through the measurement of other, manifest variables. Lord (1952, 1980) developed, described, and applied item response models, and he established the basis of IRT, also called modern test theory. Traditionally, frequentist analysis has been used in IRT; however, the Bayesian approach has become very attractive for modelling item response data (Fox, 2010).
We consider $Y_{ik}$ to be a random variable denoting the response of individual $i$ to item $k$, with $Y_{ik} = 1$ for a correct answer and $Y_{ik} = 0$ otherwise. We model the probability of a correct answer of the $i$-th individual on the $k$-th item as
$$P(Y_{ik} = 1 \mid \theta_i, a_k, b_k, c_k) = c_k + (1 - c_k)\, F\big(a_k(\theta_i - b_k)\big),$$
where $F(\cdot)$ is the cumulative distribution function (CDF) of a known parametric family. In the context of IRT, this function is called the Item Characteristic Curve (ICC); $a_k$, $b_k$ and $c_k$ are item parameters (called discrimination, difficulty and guessing, respectively), and $\theta_i$ is the ability level of individual $i$.
In this work, we take $F(\cdot)$ to be the standard normal CDF $\Phi(\cdot)$ and use two item parameters, $a_k$ and $b_k$. The probability of a correct answer of the $i$-th individual on the $k$-th item is then given by
$$P(Y_{ik} = 1 \mid \theta_i, a_k, b_k) = \Phi\big(a_k(\theta_i - b_k)\big). \tag{1}$$
The model above is commonly called the probit model. Note that if the person's ability is greater than the difficulty of the item, then the probability of success is higher than the probability of failure. The model in (1) represents the conditional probability that the $i$-th individual responds correctly to the $k$-th item given an understanding level $\theta_i$, and it assumes that the responses to a pair of items are statistically independent given $\theta_i$. In this paper, we model dichotomous response data from a Bayesian point of view, where $\theta$ is the parameter of interest, which is considered a random variable and has a prior distribution. In the framework of this paper, $\theta$ is the parameter associated with the understanding level of an individual about the mathematical concept of real function of a real variable. Let $y = (y_{11}, \ldots, y_{nK})$ denote the observed item responses of $n$ individuals to $K$ items; then the probability density of $y$ given the parameters is
$$p(y \mid \theta) = \prod_{i=1}^{n} \prod_{k=1}^{K} p_{ik}^{\,y_{ik}} (1 - p_{ik})^{1 - y_{ik}},$$
where $p_{ik}$ denotes the probability in (1). As a function of $\theta$, $p(y \mid \theta)$ is called the likelihood function. Based on the available information, the posterior distribution of $\theta$ is obtained through Bayes' theorem,
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}, \tag{2}$$
where $p(\theta)$ is the prior distribution of $\theta$ and $p(y)$ is the marginal distribution of the observations. In this case, the posterior distribution is analytically intractable, and thus we use Markov Chain Monte Carlo (MCMC) methods to obtain samples from (2). Gibbs sampling (Casella & George, 1992) and the Metropolis-Hastings algorithm (Chib & Greenberg, 1995) are the most commonly used MCMC methods. Nowadays, these methods are already implemented in computer programs, for example JAGS (Plummer, 2012).
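As an illustration of the two-parameter probit model and its likelihood, the following minimal Python sketch evaluates the ICC and the log-likelihood of a small response matrix. It assumes the parameterization $\Phi(a_k(\theta_i - b_k))$; the function names and all numerical values are hypothetical and are not estimates from this study.

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_correct(theta, a, b):
    """Two-parameter probit ICC, assuming the form Phi(a * (theta - b))."""
    return std_normal_cdf(a * (theta - b))

def log_likelihood(responses, thetas, a, b):
    """Log-likelihood of a 0/1 response matrix (rows: persons, columns: items),
    assuming responses are independent given each person's theta."""
    total = 0.0
    for i, row in enumerate(responses):
        for k, y in enumerate(row):
            p = prob_correct(thetas[i], a[k], b[k])
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical values, for illustration only (not estimates from the study).
thetas = [-1.0, 0.0, 1.5]             # understanding levels of three persons
a = [1.0, 0.8]                        # discrimination of two items
b = [-0.5, 0.5]                       # difficulty of two items
responses = [[0, 0], [1, 0], [1, 1]]  # 1 = correct, 0 = incorrect

p_equal = prob_correct(0.5, 0.8, 0.5)  # ability equal to difficulty
p_above = prob_correct(1.5, 0.8, 0.5)  # ability above difficulty
ll = log_likelihood(responses, thetas, a, b)
```

Under this parameterization, the probability of a correct answer is exactly 0.5 when ability equals difficulty and exceeds 0.5 when ability is greater, which is the property noted in the text.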

Items Development
Items were designed considering Stein's taxonomy (Mellor, Clark, & Essien, 2018). The items were thus classified in terms of the level of cognitive demand that students require to solve them satisfactorily. Items whose resolution only requires students to memorize or to perform procedures without connections were considered items of low cognitive demand. Items whose resolution requires students to perform procedures with connections or to construct mathematics were considered items of high cognitive demand. According to Hiebert and Lefevre (1986), procedural knowledge can be obtained by memorized learning and can exist without being connected to any scheme; that is, it corresponds to tasks with low cognitive demand. Conceptual knowledge, that is, knowledge that is rich in links, with connections to pre-existing knowledge, that is obtained through meaningful learning and that promotes the integration of knowledge into existing schemes, corresponds to tasks with high cognitive demand.
Therefore, each item was designed with a different level of cognitive demand and according to the acts of understanding described by Sierpinska (1994). The items corresponding to the act of Identification were designed so that students would recognize the definition of the function concept or recognize its invariant components, that is, that a function consists of a domain, a codomain and a correspondence rule. They were also intended for students to recognize that this correspondence rule can be a relationship that associates an element of the domain with a single element of the codomain, and in this way to reveal their conceptions about it. The items corresponding to the act of Discrimination had the objective that students could differentiate between what is a function and what is not a function. In this sense, they were expected to recognize that when an element of the domain is matched with two or more elements of the codomain, the correspondence rule is not a function by definition. The items corresponding to the act of Generalization had the main objective that students could identify, in a given situation, a particular case of another, more general situation. The items corresponding to the act of Synthesis were designed to identify the understanding of students when they must consider the relations between two or more properties, facts or objects and organize them into a consistent whole regarding the function concept. Twelve items (see Appendix 1) were designed from this focus and evaluated by experts to ensure their appropriateness with respect to the academic content, the language and the academic level (see Table 1).

Data Analysis
We administered a test containing 12 items to 48 undergraduate students in mathematics (18-20 years old) at the University of Guerrero in Mexico. We randomly selected students who had already completed basic Algebra and Calculus courses, because the topic of functions is part of the study program of both courses. The test was administered in a 90-minute session.
The data set contains the responses to the test, where 1 indicates a correct answer and 0 an incorrect answer. We assume that the twelve items measure a unidimensional ability (the level of understanding of a mathematical concept: real function of a real variable) represented by $\theta$, which is a continuous latent variable that takes values on the real line. We estimate the item parameters of the probit IRT model with two item parameters using the Markov Chain Monte Carlo (MCMC) methodology. The model is implemented using the JAGS (Plummer, 2012) package within the R software (R Core Team, 2017).
The examinees are assumed to have been sampled independently from the population of students, and a normal prior density with mean zero and variance one is specified for the understanding parameters. The prior densities for the item parameters are also normal. The discrimination parameter is restricted to be positive, with prior mean set to one, which indicates a moderate level of discrimination. For the difficulty parameter, we use a prior mean of zero, which indicates an average level of difficulty. Both variance parameters are fixed to one. For the guessing parameter, we use a uniform distribution on the interval [0,1]. The JAGS output contains sampled values from each parameter's marginal posterior density. For Bayesian inference, the sampled values were used to compute summary statistics of the posterior densities of the parameters of interest.
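The prior specification described above can be sketched in Python as a log prior density, evaluated up to an additive constant. This is only an illustrative sketch and not the JAGS model used in the study: the function names are hypothetical, the normalizing constant of the truncated normal is omitted, and the uniform prior on the guessing parameter is left out because its log density is constant on [0,1].

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def log_prior(thetas, a, b):
    """Log prior up to an additive constant, under the stated assumptions:
    theta_i ~ N(0, 1), a_k ~ N(1, 1) restricted to a_k > 0, b_k ~ N(0, 1)."""
    if any(ak <= 0.0 for ak in a):
        return -math.inf  # discrimination must be positive
    lp = sum(log_normal_pdf(t, 0.0, 1.0) for t in thetas)
    lp += sum(log_normal_pdf(ak, 1.0, 1.0) for ak in a)  # truncation constant omitted
    lp += sum(log_normal_pdf(bk, 0.0, 1.0) for bk in b)
    return lp

# Hypothetical parameter values, for illustration only.
lp_valid = log_prior([0.0], [1.0], [0.0])
lp_invalid = log_prior([0.0], [-0.1], [0.0])
```

Adding this log prior to the log-likelihood of the response data gives the unnormalized log posterior that an MCMC sampler explores. Note that JAGS parameterizes the normal distribution by its precision rather than its variance; with a variance of one, the two coincide.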

Reliability and Validity of the Test
The reliability of the test was analyzed using a measure of internal consistency called Cronbach's alpha. For the 12 items of the mathematical test, Cronbach's alpha was 0.76, which indicates that the test has acceptable reliability. A statistical validation was also performed to assess the relevance of the items. An exploratory factor analysis was applied to the data set, which resulted in four factors. The factor loadings are presented in Table 2, and the cumulative percentage of variance explained is 63.2%. Note that the number of factors obtained matches the number of acts of understanding in Sierpinska's criteria in Table 1. In Table 2 we can observe that the items with the highest loadings on each factor correspond to the classification according to the acts of understanding in Table 1.
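For reference, Cronbach's alpha for a test with $K$ items is $\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_k s_k^2}{s_T^2}\right)$, where $s_k^2$ is the variance of item $k$ and $s_T^2$ is the variance of the total scores. The following minimal Python sketch computes it for a small, hypothetical 0/1 response matrix; the data are invented for illustration and are not the responses collected in this study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows: examinees, columns: items),
    using sample variances of the items and of the total scores."""
    n_items = len(scores[0])
    item_vars = [variance([row[k] for row in scores]) for k in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses of 5 examinees to 4 items (1 = correct, 0 = incorrect).
toy_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
alpha = cronbach_alpha(toy_scores)
```

Values of alpha closer to 1 indicate that the items vary together, which is what is meant by internal consistency.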

Bayesian Estimation
The probit model was fitted to the experimental data. For this purpose, we used two chains, each with 9000 iterations; the first 1000 iterations were discarded and a thinning rate of 8 was applied, so 2000 posterior samples were used to obtain the summary statistics of the parameters of interest. Convergence diagnostics were also carried out. The posterior mean provides information on where most of the posterior density is located; the reported posterior mean is the expected value of the parameter of interest under the marginal posterior density. These values were used as point estimates of the parameters of interest. We also report the posterior standard deviation (Sd) of the estimators and Rhat, which is the potential scale reduction factor. An Rhat close to 1 indicates convergence of the MCMC procedure employed. The estimates of the person parameters, which correspond to the students' level of understanding of the mathematical concept, are summarized in Figure 1.
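The number of retained draws and the idea behind Rhat can be illustrated with a short Python sketch. The Rhat function below is the basic (non-split) Gelman-Rubin statistic for chains of equal length; it is given as an assumption-laden illustration, and the value reported by the software used in the study may be computed with a refined variant. The function names and the toy chains are hypothetical.

```python
from statistics import mean, variance

def retained_draws(n_chains, n_iter, burn_in, thin):
    """Number of posterior draws kept after burn-in and thinning."""
    return n_chains * ((n_iter - burn_in) // thin)

def gelman_rubin_rhat(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # average within-chain variance
    between = n * variance(chain_means)            # between-chain variance
    var_plus = (n - 1) / n * within + between / n  # pooled variance estimate
    return (var_plus / within) ** 0.5

# Settings reported in the text: 2 chains, 9000 iterations, burn-in 1000, thinning 8.
n_draws = retained_draws(2, 9000, 1000, 8)

# Hypothetical toy chains: the first pair overlaps, the second pair does not.
rhat_mixed = gelman_rubin_rhat([[0, 1, 0, 1], [1, 0, 1, 0]])
rhat_apart = gelman_rubin_rhat([[0, 1, 0, 1], [10, 11, 10, 11]])
```

With the reported settings, each chain keeps (9000 - 1000) / 8 = 1000 draws, which gives the 2000 posterior samples mentioned above. Chains that explore the same region give an Rhat near 1, whereas chains stuck in different regions give a value well above 1.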
Overall, the level of understanding of the students varied from -1.7 to 2.03, with mean 0 and standard deviation (Sd) 1. Rakkapao et al. (2016) proposed using the interval (mean ± 0.5 Sd) to divide students into three groups by level of understanding. We therefore distributed the students as shown in Table 3. We can observe that 25% of the students reach a high level of understanding of the mathematical concept of real function of one variable, in contrast with the 40% who fall in the interval of a low level of understanding.
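The grouping rule based on the interval (mean ± 0.5 Sd) can be sketched in Python as follows. The function name, the treatment of values exactly on a cut point, and the sample values are assumptions made for illustration; they are not the estimates obtained in this study.

```python
from statistics import mean, stdev

def classify_levels(thetas, width=0.5):
    """Label each value as low, medium or high using mean +/- width * Sd as cut points.
    Values exactly on a cut point are labeled medium (an assumption of this sketch)."""
    m = mean(thetas)
    s = stdev(thetas)
    low_cut, high_cut = m - width * s, m + width * s
    labels = []
    for t in thetas:
        if t < low_cut:
            labels.append("low")
        elif t > high_cut:
            labels.append("high")
        else:
            labels.append("medium")
    return labels

# Hypothetical understanding levels, for illustration only.
example_thetas = [-1.7, -0.4, 0.0, 0.3, 2.03]
example_labels = classify_levels(example_thetas)
```

With the mean of 0 and Sd of 1 reported above, the cut points are -0.5 and 0.5, so students below -0.5 are placed in the low group and students above 0.5 in the high group.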
In Table 4, we show the item parameter estimates for the proposed model. The items with the highest discrimination parameters were, in increasing order, 8, 4 and 12. The estimated average discrimination level is 0.98, which is slightly smaller than the prior mean. The quantiles show that the posterior densities are asymmetric and positively skewed. For the difficulty parameters, five items have negative values and seven have positive values, which means that the number of easy items is slightly smaller than the number of difficult items. The most difficult items were, in increasing order, 2, 8, 10, 11, 4, 1 and 12; these are items that would be answered incorrectly by more than 50% of the examinees given an average population ability level of zero. The items with the lowest difficulty parameters, also in increasing order, were 3, 7, 9, 5 and 6. The proportions of correct responses are shown in Figure 2.
In Figure 3, we show the ICCs for each item, grouped by the act of understanding studied.

DISCUSSION AND CONCLUSIONS
In this work, we propose a methodology that allows us to evaluate the understanding of a mathematical concept using a Bayesian Item Response model. For this task, we designed and applied a test to assess undergraduate students' understanding of the concept of function, taking into account Sierpinska's criteria on acts of understanding. Likewise, we show that it is a content-valid and reliable evaluation instrument with satisfactory discriminatory power.
According to the ICCs obtained from the proposed model, we can observe the following. For the items of the act of Identification, item Q3 is more likely to be answered correctly than items Q9 and Q1 at a given level of understanding $\theta$. The model suggests that a higher level of understanding is required to correctly answer item Q1 than items Q9 and Q3.
Given that items Q9 and Q1 are classified as items of high cognitive demand, this means that students are less likely to achieve the required acts of identification. For example, item Q1 was answered incorrectly by more than 60% of the students; it could be considered a problematic item, since knowledge of other mathematical concepts, such as injectivity, is needed to recognize the definition of function in notational terms and in the non-formal notations that are generally presented at previous educational levels. From the ICCs in Figure 3, strategies can be implemented to improve the students' level of understanding. For example, the ICC of item Q9 shows that for students' levels of understanding greater than 0.5 there is a probability of more than 0.5 of obtaining a correct answer; however, this could be improved if students were able to relate a verbal representation to its graphic representation, discriminate between domain and codomain, and recognize the model associated with a verbal and/or graphic representation.
In general, similar behavior is observed for the rest of the acts of understanding; that is, items that demand higher cognitive levels also demand higher levels of understanding, and items that demand low cognitive levels require lower levels of understanding. The results for the sample show that a quantitative increase is needed in order to reach an understanding of the mathematical concept of real function of one real variable. For instance, students need high levels of understanding (greater than 2) to have a high probability (close to 1) of correctly answering any of the 12 items.

APPENDIX 1
Instruction: Circle the correct answer. You can do the operations or graphs that you consider necessary.
Let ℝ be the set of real numbers and consider the functions of a subset of ℝ×ℝ.
(1) Which of the following expressions represents a function of a real variable?
represents a real function of a real variable if:
is a continuous function in all real numbers.
(8) Observe the next points which belong to a graph of a real function of a real variable.
(9) Octavio celebrated his fifth birthday on July 22, 2015. His parents say that he is going to be 5 years old for a year, so it will be until July 22, 2016, that he will turn 6 years old, and until July 22, 2017, that he will turn 7 years old, and so on. Which one of the following function graphs models the context?