A Quantitative Analysis of Uncertainty in the Grading of Written Exams in Mathematics and Physics

Hugo Lewi Hammer; Laurence Habib

doi:10.12973/eurasia.2016.1240a

Hugo Lewi Hammer ¹ *, Laurence Habib ¹

¹ Oslo and Akershus University College of Applied Sciences, Norway
* Corresponding author

Abstract

The most common way to grade students in courses at university and university college level is to use final written exams. The aim of a final exam is generally to provide a reliable and valid measurement of the extent to which a student has achieved the learning outcomes for the course. One source of uncertainty in grading students on the basis of an exam is that the exam consists of only a limited number of exercises. We investigate the extent of this uncertainty by means of a statistical analysis of the results of 23 different examinations taken by 2788 students. The uncertainty is substantial and typically ranges over three grades. Increasing the duration of the examination, however, decreases the uncertainty.

Keywords

examination duration
grading
quantitative research
uncertainty
written exam

License

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Article Type: Research Article

EURASIA J Math Sci Tech Ed, Volume 12, Issue 4, April 2016, 975-989

https://doi.org/10.12973/eurasia.2016.1240a

Publication date: 17 Jun 2016

How to cite this article

APA

Hammer, H. L., & Habib, L. (2016). A Quantitative Analysis of Uncertainty in the Grading of Written Exams in Mathematics and Physics. Eurasia Journal of Mathematics, Science and Technology Education, 12(4), 975-989. https://doi.org/10.12973/eurasia.2016.1240a

Vancouver

Hammer HL, Habib L. A Quantitative Analysis of Uncertainty in the Grading of Written Exams in Mathematics and Physics. EURASIA J Math Sci Tech Ed. 2016;12(4):975-89. https://doi.org/10.12973/eurasia.2016.1240a

AMA

Hammer HL, Habib L. A Quantitative Analysis of Uncertainty in the Grading of Written Exams in Mathematics and Physics. EURASIA J Math Sci Tech Ed. 2016;12(4):975-989. https://doi.org/10.12973/eurasia.2016.1240a

Chicago

Hammer, Hugo Lewi, and Laurence Habib. "A Quantitative Analysis of Uncertainty in the Grading of Written Exams in Mathematics and Physics." Eurasia Journal of Mathematics, Science and Technology Education 12, no. 4 (2016): 975-989. https://doi.org/10.12973/eurasia.2016.1240a

Harvard

Hammer, H. L., and Habib, L. (2016). A Quantitative Analysis of Uncertainty in the Grading of Written Exams in Mathematics and Physics. Eurasia Journal of Mathematics, Science and Technology Education, 12(4), pp. 975-989. https://doi.org/10.12973/eurasia.2016.1240a

MLA

Hammer, Hugo Lewi, and Laurence Habib. "A Quantitative Analysis of Uncertainty in the Grading of Written Exams in Mathematics and Physics." Eurasia Journal of Mathematics, Science and Technology Education, vol. 12, no. 4, 2016, pp. 975-989. https://doi.org/10.12973/eurasia.2016.1240a