Pupils’ Summative Assessments in Mathematics as Dependent on Selected Factors

School assessment is a determining factor for a pupil across all of its functions. The aim of the research was to determine the correlation between pupils’ logical thinking and their school assessment and between pupils’ mathematics skills and their school assessment. The sample consisted of 252 high school students (117 boys and 135 girls). The research tools were tests of mathematical reasoning, logical thinking and mathematical skills. The Kruskal-Wallis test was used as the statistical method. A significant effect was found at the 1% significance level between these factors and school assessment, where the greatest differences are caused mainly by evaluations in mathematics that were perceived either as the best or the worst. The research also monitored the use of the newly developed GTOLT (Group Test of Logical Thinking), which is based on the GALT (Group Assessment of Logical Thinking) and TOLT (Test of Logical Thinking) tests and is thus rooted in standardized tools.


INTRODUCTION
It has been shown that school assessment is a determining factor for a pupil in all of its functions: motivational, informative, regulating, educational, prognostic and differential. In the present article and the research described below, school evaluation in mathematics (at the end of a semester) is considered a form of summative evaluation summarizing the pupils' learning results after completion of an activity (Brookhart, 2013; Hoover & Abrams, 2013). Questions of formative and summative assessment are addressed in the studies of Mohamadi (2018), Puddy et al. (2014), Wei (2014), and Wholey (1996). Summative assessment summarizes the pupils' learning outcomes, and its aim is to obtain a holistic view of a pupil's performance. In the Czech Republic, summative assessment takes the form of grades 1-5, which is similar to A-E. Sewall and Santaga (1986) draw attention to the fact that summative evaluation is often given to pupils at a moment when they can no longer change their results. It is believed that formative assessment is a much more effective tool for adapting teaching to help students master the material (Garrison & Ehringhaus, 2007). It is appropriate to mention briefly here what formative assessment is. It takes place during the learners' activity in order to understand where learners are in the process of forming a concept (Kuh et al., 2014). Formative and summative assessment serve similar purposes, but their aims are different. Summative assessment has been used for a long time (Greenstein, 2010). Currently, there is no indication that this assessment should change in any way, although it is increasingly criticized. Based on formative assessment, students can understand the issue and justify why this or that subject is so important (Brophy, 1999).
The influence of formative assessment on students' learning is described by Weurlander et al. (2012), who state that formative assessment: i) promotes motivation, especially in the long run, and ii) leads to more precise study by students. The present article points out that, although summative evaluation is criticized, it is encouraging that the marks given by teachers reflect the pupil's performance, both in terms of the level of logical thinking and in terms of success in the mathematical diagnostic test. Marinho, Leite and Fernandes (2017) also summarized the positive effect of summative assessment: when it has a classification and measurement purpose, it simultaneously assumes a formative function.
Moreover, in a discipline like mathematics, assessment is of great importance not only for learners and their parents but also for teachers, policy-makers, curriculum-makers and future employers (e.g., Boud & Soler, 2016). Similar statements can be found in the studies of Brown and Lally (2018), King et al. (2017), Potvin et al. (2020), and Rakoczy et al. (2019). Summative assessment is also questionable in terms of reliability, as teachers must be sure that retesting a student using the same assessment will produce consistent results (Popham, 2014). Summative assessment is considered by pupils themselves to be a satisfactory form of assessment and is preferred over other assessment methods (Rokos et al., 2019). There are many disadvantages to quantitative assessment; pupils may tend to learn "for grades" only, even though grades represent an extremely simplified, abstract way of evaluating pupils' performance. Abandoning the practice of quantitative assessment has also created a certain discourse in the context of inclusive education (Smetáčková, 2018).
The main goal of the paper is to find out whether school assessment outcomes differ in relation to pupils' logical thinking and mathematics skills. Skills for the 21st century require the use of different kinds of thinking (inductive reasoning, deductive reasoning, etc.). Liu, Ludu and Holton (2015) support this view and consider valid logical reasoning to be the key element of healthy critical thinking. Although reasoning in mathematics differs significantly from day-to-day reasoning, the rationale in a mathematical proof is not just a formal procedure but also involves discussion, exploration and examination, which shows the need for a more informal method of solving formal reasoning tasks (Bronkhorst et al., 2020). Widana et al. (2018) analyzed the effectiveness of thinking skills assessment for the critical thinking skills of high school students in mathematics lessons. The authors used a quasi-experimental design, and the results showed that thinking skills assessment could effectively improve students' critical thinking skills in mathematics lessons. Similar results can be found in the study of Ramdani, Syamsuddin and Sirajuddin (2019).
It has been shown that the key problem in research on logical thinking is a certain inconsistency in its definition; different definitions and interpretations were provided by Albrecht (1984), Chuechote et al. (2020), and Labouvie (1992). None of these definitions covers logical thinking in the way it is understood for the purposes of this paper. We have thus created a new definition by combining them: Logical thinking is a process in which an individual looks away from the contents of individual statements and thoroughly exploits his or her own judgements to ensure correct conclusions. The intermediate, individual steps of this process form a relation between preconditions and conclusions by demonstrating their connection to the judgements. With regard to these findings, other authors show that the development of formal argumentation should be the key priority in science education (DeCarcer et al., 1978; Lawson, 1982). This is an important issue because, for instance, logical reasoning creates a certain style of thinking that affects pupils' ability to solve tasks in physics. This claim is based in particular on the fact that tasks in physics include problems which require various types of logical reasoning, mathematical operations and experiments, and these in turn promote the formation of learners' thinking (Korsun, 2019). Logical thinking also helps the pupil to work with interactive multimedia, which can improve skills for more advanced thinking (Hartini et al., 2017). In order to use the potential of logical thinking, efforts have been made to quantify it, as described, for example, in Fadiana et al. (2019). Cresswell and Speelmann (2019) focused on the effect of logical thinking on students' achievement in mathematics. The authors reported a positive relationship between these two variables. Similar results can be found in the research of Jeon and Park (2014) and of Sartika and Fatmanissa (2020).
Insorio and Librada (2021) and Yang and Chang (2013) demonstrated significant improvements in critical thinking skills and academic achievement. Ali (2010) investigated the difference in academic achievement between students with high and low critical thinking dispositions, and whether this difference changes with students' gender. The study found no statistically significant difference in academic achievement with respect to critical thinking. Jawad, Maiwall and Hussein (2019) focused on the effect of logical thinking on the achievement of biology students with respect to gender. The authors used an experimental design and concluded that there is no effect on achievement with respect to gender. Research studies on the relationship between mathematical skills and pupils' school assessment are less frequent. Some findings can be found in the study of Finn et al. (2014), which reported a positive relationship between these two variables. Similar results can be found in the research of Tirpakova (2018, 2021) and Watt et al. (2014). Guo and Yan (2019) found negative affective attitudes towards summative assessment. Girls had more positive instrumental attitudes towards this kind of assessment than boys. It was found that students' affective and instrumental attitudes to formative assessment positively predicted their affective and instrumental attitudes to summative assessment. As can be seen, the number of research studies related to the presented topic is rather small. This statement is supported by other studies such as Martinovic and Manizade (2018), Nortvedt and Buchholtz (2018), and Ubuz and Aydin (2018).

Contribution to the literature
• The article focuses on the determination of the correlation between pupils' logical thinking and their school assessment and between pupils' mathematics skills and their school assessment.
• Pupils with better school results in mathematics achieve higher levels of logical thinking, and pupils with better school results in mathematics have better mathematics skills.
• The study provides a spectrum of statistical methods which could be used for the analysis of the presented variables.

PROBLEM OF RESEARCH
The research problem was to determine the correlation between school assessment and pupils' logical thinking and mathematics skills; it is specified in the research questions. The research questions are rooted in our belief that assessment should provide opportunities for all pupils to prove their mathematical skills and should respond to a variety of students (Klieme et al., 2004; Klinger et al., 2015). Within these research questions, the issues of logical thinking and school evaluation in mathematics are also emphasized. The aim of the research was to find the answers to the following research questions:
RQ1: What is the correlation between pupils' logical thinking and their school assessment?
RQ2: What is the correlation between pupils' mathematics skills and their school assessment?
The research problems are linked to two hypotheses:
H1: Pupils with better school results in mathematics achieve higher levels of logical thinking.
H2: Pupils with better school results in mathematics have better mathematics skills.

Sample of Research
The research was carried out with pupils aged 13-16 (M = 15, SD = 0.49). The main method used was a questionnaire, supplemented by further partial methods (e.g., based on qualitative research). The data were collected from a total of 252 respondents (117 boys and 135 girls). In terms of gender, the sample is balanced. The test tools were distributed to more than 300 pupils from randomly selected classes in the Czech Republic; without this selection, it would not be possible to verify the statistical level of significance. The data were collected by university students who were informed in detail about the individual research steps. The research used the school evaluation from the end of the last semester. The obtained values were first entered into Excel 2015 (Microsoft, 2016) and then transferred to Statistica 12 (Statsoft, 2016). The basic unit of the research sample was not the individual pupil but the whole school class. A multi-stage random selection with the pupil as the basic unit would have been very time-consuming and very difficult to organize, because the main data collection was carried out by means of a questionnaire that takes about two hours for each pupil. During the data analysis, incomplete cases were removed: if a student did not complete the whole research tool, he or she was excluded from further analysis. In the case of the school evaluation, it was not possible to take grade 5 into consideration, as it was achieved by almost none of the students.

Statistical Methods Used
Data cleaning preceded the detailed descriptive and frequency analysis. When outliers were identified, for instance using a quartile graph, it was preferable to examine why they had occurred at all rather than to apply very rigorous statistical methods. Each identified outlier was checked for a possible measurement error. Given the size of the sample, outlier values can always be expected, and it is not possible to proceed mechanically when removing outliers or extreme values.
A large variety of data was acquired. It was necessary to distinguish dependent and independent samples and nominal, ordinal and metric random variables, and to assess normality in the metric variables in order to choose between parametric and non-parametric statistical methods. The following statistical methods and techniques were used throughout the analysis of the survey:
• Normality test
• Non-parametric hypothesis testing
• Non-parametric dispersion analysis followed by post hoc analysis (multiple comparisons).
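The non-parametric dispersion analysis listed above corresponds to the Kruskal-Wallis test reported in the Results. As a rough illustration of what its H statistic measures, the following minimal sketch computes H with the usual tie correction in plain Python. The function names and the toy data are hypothetical and are not taken from the study, which performed its analysis in Statistica 12.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups of scores."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical test scores for three grade groups (illustrative only).
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_value = kruskal_wallis_h(example_groups)
```

The resulting H is compared against a chi-square distribution with k - 1 degrees of freedom (k being the number of groups) to obtain the p-value; that step is left out here because it needs a statistics library.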
The data analysis followed the two-step model of Robinson and Levin (1997). As a guideline for assessing the significance of the results that is not affected by the size of the analyzed data set, effect size coefficients were used (Cohen, 1988; Sheskin, 2003; Thomas & Nelson, 2001). These coefficients eliminate the dependence of statistical significance on sample size (Rosenthal, Rosnow, & Rubin, 2000). Eta squared (η²) and Cohen's d were used to measure effect size.
A further statistical method was regression analysis, with summative assessment as the independent variable and achievement in the didactic test in mathematics and in the test of logical thinking as dependent variables. The regression analysis was performed using the ENTER model.

Logical thinking test
The authors chose research tools that could be administered in pencil-and-paper form (Lawson, 1978; Tobin & Capie, 1980). The preferred tools are GALT and TOLT. The TOLT is focused on formal operational thinking (Piaget & Inhelder, 1955); every item is paired, and the answer has to be justified, because justification is needed for understanding (Weber et al., 2014). The GTOLT is focused primarily on academic achievement in connection with formal justification. This cognitive dimension has received only marginal attention from researchers in the last two decades. In our previous research (Chytry, 2015) we described pilot testing of the GALT (Group Assessment of Logical Thinking) for the Czech Republic; it should be noted that the study was published in Czech. Our current research includes features of the TOLT (Test of Logical Thinking) as it was presented in important international journals. We should mention that not only GALT and TOLT were used, as they do not cover the full scope of logical thinking in the way we understand the problem. The newly created test is labelled GTOLT (Group Test of Logical Thinking). It includes 20 items focused on various attributes of logical thinking. The answers were assessed dichotomously: 0 if the pupil's answer was incorrect and 1 if it was correct. If the pupil did not answer a question, the value was left empty in the coding. This type of coding allows the results to be interpreted in the following way: the arithmetic mean of the measured values is an adequate point estimate of the parameter p of the alternative (Bernoulli) distribution, that is, the probability that a randomly selected pupil answers the question correctly. Some of the test items required the pupil's reasoning. In such cases, the pupil scored only if both the answer and the reasoning were correct.
Such items were then evaluated as a single unit, including the reasoning. The maximum score was 20 points. Reliability was determined in two ways: a) split-half with Spearman-Brown adjustment (0.68), b) KR-20 = 0.71. The research tool can therefore be considered reliable. Some basic characteristics of the research tool are presented in Table 1.
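The two reliability coefficients named above can be illustrated with a minimal sketch for dichotomously scored items. It assumes the common textbook forms, KR-20 = k/(k-1) · (1 - Σ p_i q_i / σ²) with the population variance of total scores, and the Spearman-Brown adjustment 2r/(1 + r) applied to the correlation r between two test halves. The source does not state which variance convention its software used, and the small score matrix below is hypothetical.

```python
from statistics import pvariance


def kr20(scores):
    """Kuder-Richardson 20 for a pupils x items matrix of 0/1 scores.

    Assumes the population variance of the total scores (a convention choice).
    """
    n_pupils = len(scores)
    n_items = len(scores[0])
    # p_i: proportion of pupils who answered item i correctly.
    p = [sum(row[i] for row in scores) / n_pupils for i in range(n_items)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    total_variance = pvariance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum_pq / total_variance)


def spearman_brown(half_correlation):
    """Step a split-half correlation up to full test length."""
    return 2 * half_correlation / (1 + half_correlation)


# Hypothetical answers of four pupils to three items (1 = correct, 0 = incorrect).
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
example_kr20 = kr20(example_scores)
example_adjusted = spearman_brown(0.5)
```

Pupils with unanswered items would have to be removed before calling `kr20`, which matches the exclusion of incomplete cases described in the sample section.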
Items 16, 18 and 19 were problematic, but after consultation with experts in the didactics of mathematics they were kept in the analyses for two reasons: a) work with control variables and probability is under-represented in mathematics teaching; b) the research tool was designed as a whole to match the definition of logical thinking. In the initial phase, the research tool was piloted with 30 respondents to check content validity. On the basis of their results and comments, the tasks were revised, mainly stylistically.

Testing mathematics skills
To analyze the mathematics skills of pupils, we used a released CERMAT test (Centre for High School Graduation Exam Reform, test M9PID15C0T01). The test involves 17 tasks, and the equipment allowed included pens and geometry instruments. The test covers three general topics: i) number and variable, ii) processing data, dependences and relations, iii) geometry (computational and constructional geometry on the planar and spatial levels). The original evaluation of the CERMAT test allowed the respondent to score as many as 50 points; we adapted the assessment so that each answer had a value of one point if it was correct and no points if it was incorrect. The respondents in our testing were able to score up to 28 points. This is not the school evaluation mentioned above (on the scale A-E), but success in the didactic test in mathematics. Reliability was tested in two ways: a) split-half with Spearman-Brown adjustment (0.92), b) KR-20 = 0.91. The research tool can therefore be considered reliable. An item analysis was carried out to establish the basic psychometric characteristics of the test. Table 2 presents these characteristics; as can be seen, some items were outside the standard level of difficulty.
The tasks were taken from a standardized test, and the items had been tested on respondents of a similar age to those in this research. The coding was dichotomous (0 for an incorrect answer, 1 for a correct answer). The authors are aware of the limitations of this method, but it is the most suitable one for the design of this study.

RESULTS
As mentioned above, the tables and analyses below do not cover the fifth grade of assessment (E), as only a small number of respondents received that grade. The descriptive analysis (Table 1) reveals the following trend: the less satisfactory the school performance (evaluated at levels 1-4 or A-D), the lower the pupil's success rate in the GTOLT and the poorer the mathematics skills. The correlation matrices are presented below. The relevant p-values for the Shapiro-Wilk normality test are also added to Table 3.
We tested hypotheses H1 and H2 using the Kruskal-Wallis test. The general formulation of the null hypothesis in this case is that the medians are identical across the separate school performance grades, for each of the two tools. The p-value found for logical thinking (Kruskal-Wallis test: H(3, N = 233) = 31.92, p < 0.001) shows that the dependence is significant at the 1% level. The effect sizes are η² = 0.14 and Cohen's d = 0.80, so we conclude that the effect is large (Cohen, 1988). We reached similar conclusions for mathematics skills (Kruskal-Wallis test: H(3, N = 237) = 54.75, p < 0.001), with η² = 0.28 and Cohen's d = 1.24; here too the effect is large. It is interesting to compare the results of the post hoc analysis of both tests. Table 2 shows that in terms of statistical significance, the findings are identical for both studied domains (logical thinking and mathematical skills). In both cases, there is no statistically significant difference between the two best ratings (1 and 2 / A and B) or between the two worst ratings (3 and 4 / C and D). In all other cases the differences are statistically significant.
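The article does not state how the effect sizes were derived from the Kruskal-Wallis statistic. One plausible route, shown here purely as an assumption, is η² = H / (N - 1) followed by the standard conversion d = 2·√(η² / (1 - η²)). Applied to the logical thinking result (H = 31.92, N = 233), these two formulas give values that round to the reported 0.14 and 0.80. The function names are hypothetical.

```python
import math


def eta_squared_from_h(h, n):
    """Effect size from a Kruskal-Wallis H; assumes eta^2 = H / (N - 1)."""
    return h / (n - 1)


def cohens_d_from_eta_squared(eta_sq):
    """Standard conversion between eta squared and Cohen's d."""
    return 2 * math.sqrt(eta_sq / (1 - eta_sq))


# Values reported in the text for the logical thinking test.
eta_sq_logic = eta_squared_from_h(31.92, 233)
d_logic = cohens_d_from_eta_squared(eta_sq_logic)
```

Other estimators of η² for the Kruskal-Wallis test exist, for example (H - k + 1) / (N - k) with k groups, and they yield somewhat different values, so the sketch should be read as one possible reconstruction rather than the authors' procedure.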
The figure shows that the worse a pupil's summative school assessment in mathematics is, the less successful the pupil is in the logical thinking test as well as in the mathematics skills test.
Another approach is to use correlation and regression analyses, in which summative school assessment was the dependent variable and the independent variables were: a) achievement in the didactic test in mathematics; b) achievement in the logical thinking test. The values of the regression analysis are presented in Table 5. Table 3 shows that all variables had a significant effect. The next step was the calculation of the Spearman correlation coefficient (ρ). The value between summative assessment and achievement in the didactic test in mathematics was ρ = -0.48 (p < 0.001), and between summative assessment and achievement in the logical thinking test it was ρ = 0.37 (p < 0.001).
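For readers unfamiliar with the coefficient, Spearman's ρ is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below is a minimal plain-Python illustration; the function names and the grade and score lists are hypothetical and do not come from the study's data. Under the coding used here, where grade 1 is the best, a negative ρ means that better grades go together with higher test scores.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: school grades (1 = best) and test scores of eight pupils.
grades = [1, 1, 2, 2, 3, 3, 4, 4]
scores = [20, 18, 15, 16, 11, 12, 7, 6]
rho = spearman_rho(grades, scores)
```

Because grades are heavily tied (only four distinct values), the average-rank treatment matters here; a shortcut formula based on squared rank differences would only be exact without ties.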

DISCUSSION
The first research question is rooted in the assumption that assessment in mathematics is an adequate tool for monitoring pupils' skills in solving mathematics problems (Rosli et al., 2013). The second research question deals with a similar issue: mathematics skills in relation to school performance assessment, which should consider both the content of and the procedures for solving problems, as suggested in NCTM (2014) and Pellegrino et al. (2001). We claim that both our hypotheses have been confirmed (H1: Pupils with better school results in mathematics achieve higher levels of logical thinking; H2: Pupils with better school results in mathematics have better mathematics skills) and that pupils' assessments in mathematics reflect their logical thinking skills together with their mathematics skills, even when this type of evaluation is implemented for monitoring reasons only (e.g., in Parke et al., 2003). Similar findings can be found in the study of Firdaus et al. (2015), which reported positive effects of a mathematical learning module based on problem-based learning on learning skills. Pupils then have better skills in argumentation and evaluation (Pitchford et al., 2016). The mean values / medians for the individual grades in both tests (Table 1) are as follows: for logical thinking, grade A (1) (9.84/8.75), grade B (2) (9.51/9.90), grade C (3) (7.15/8.00), grade D (4) (6.65/6.50); and for mathematics skills, grade A (1) (15.18/15.00), grade B (2) (12.38/12.00), grade C (3) (9.26/9.00), grade D (4) (7.50/6.00), suggesting that with a higher grade number (a worse evaluation) the pupil's ability to think logically decreases along with their mathematics skills. This is not a surprising conclusion. What is interesting is the diversification itself and the possibilities of generalizing the conclusions with respect to the statistically significant differences.
Based on the research, it can be said that pupils are (in terms of school evaluation in mathematics) divided by teachers into two groups: the first consists of those with grades A and B, and the second of those with grades C or D. Taking the Gaussian normal distribution of frequencies into consideration, it could be expected that the same differences as between pupils with A and B grades would be found between all pairs. However, the research shows that differences are more likely to be found between pupils with B and C, and between any other groups (meaning the groups where the difference in school evaluation is one grade). This could be supported by the theory that pupils who are average in summative assessment have better logical thinking skills than the best and the worst pupils. This statement is supported by the study of Eckstein and Shemesh (1989) and remains valid today. Simonton (1992) also stated that success in mathematics is not so important for pupils' future careers, whereas the level of logical thinking skills is a predictor of their career and success in life.
The research shows that pupils belong to two distinct groups, where the first group is assessed with A and B and the second with C or D. From the point of view of logical thinking, the conclusions are in line with our earlier research, which tested elementary school respondents (162 in total), 108 grammar school students, and 23 students of other secondary schools (Chytry, 2015). We saw that a pupil's ability to find numerical rules and the laws of geometry, as well as correct judgement in working with logical connectives, depends on their assessment (grade). That is, the cognitive level of understanding these concepts was higher when the summative assessment was better. Such an outcome was not applicable to students outside elementary schools and eight-year grammar schools. Zaman (2011) reached the same conclusion in the field of mathematics skills. He carried out research which found a correlation between the mathematical thinking of pupils and their success rate in school performance assessment in mathematics. The testing involved a specially designed test focusing on mathematical thinking in pupils. The sample involved 500 randomly selected respondents. The statistical analysis was carried out using regression and correlation techniques. A strong dependence was revealed between mathematical thinking and school assessment in the subject of mathematics.

CONCLUSION
The most significant limitation of our research is the size of the sample (n = 252). The size of the base set does not guarantee strong representativeness. Maximum effort was put into balancing proportions within the participating sample, and effect sizes, which eliminate the influence of sample size, were therefore used. Only a minimal number of pupils received grade "E" (failing). For this reason, these students were excluded from the data matrix. Another limitation concerns the respondents' answers: it is very hard to distinguish genuine skill from a remembered correct answer, or to tell whether an incorrect answer was caused only by a single mistake. These difficulties in the evaluation of data are common in social science research.
As seen in the Results and Discussion sections, there is a positive correlation between summative assessment and the level of logical thinking. All those involved in education should therefore focus on how to improve the level of logical thinking among learners. Logical thinking is connected with the creativity of every person, so teachers at every school level should focus in the teaching process on developing creativity and logical thinking. More concretely, it is possible to use different educational games which support divergent thinking, creativity and, as mentioned above, logical thinking. It is also advisable to reflect on the connection between theory and practice. In many schools in the Czech Republic, the learning process is focused on the theoretical presentation of a topic without any practical connection.
Mathematics skills are likewise closely related to summative assessment in mathematics. The ways to improve mathematics skills among learners are similar to those presented in the previous paragraph. Many learners find it difficult to find and understand the connection of mathematics with real life, so a higher number of tasks connected with everyday problems could lead to the development of mathematical skills and then to better summative assessment. There are other ways to improve mathematics skills, for example focusing on misunderstandings in mathematics until they are fully resolved, or offering more than one possible method of solution.