Research Article

Establishing a physics concept inventory using computer marked free-response questions

Mark A. J. Parker¹*, Holly Hedgeland¹˒², Sally E. Jordan¹, Nicholas St. J. Braithwaite¹

¹ School of Physical Sciences, The Open University, Walton Hall, Milton Keynes, MK7 6AA, UK
² Clare Hall, University of Cambridge, Herschel Road, Cambridge, CB3 9AL, UK
* Corresponding author
European Journal of Science and Mathematics Education, 11(2), April 2023, 360-375, https://doi.org/10.30935/scimath/12680
Published Online: 05 December 2022, Published: 01 April 2023
OPEN ACCESS

ABSTRACT

This study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI) that uses automatically marked free-response questions. Data were collected over three academic years from 611 participants taking physics classes at high school and university level, yielding a total of 8,091 question responses with which to develop and test the AMS. The AMS questions were tested for reliability using classical test theory (CTT), and the AMS computer marking rules were tested for reliability using inter-rater reliability (IRR). Findings from the CTT and IRR studies demonstrated that the AMS questions and marking rules were reliable overall. The AMS was therefore established as a physics concept inventory that uses automatically marked free-response questions, and the approach used to develop and test it could inform further attempts to build concept inventories of this kind.
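The inter-rater reliability check described above compares two sets of marks for the same responses, here between the computer marking rules and a human marker. A standard statistic for this is Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below is purely illustrative (the marks are hypothetical, and this is not the authors' code); it shows how kappa can be computed for binary correct/incorrect marks:

```python
# Illustrative sketch: Cohen's kappa for inter-rater reliability
# between human and computer marking. The marks below are hypothetical.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters marking the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: assume each rater's marks are drawn
    # independently from their own marginal distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical marks for 10 responses: 1 = marked correct, 0 = incorrect.
human    = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
computer = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
print(round(cohens_kappa(human, computer), 3))  # → 0.8
```

Values of kappa near 1 indicate near-perfect agreement between the marking rules and the human marker, while values near 0 indicate agreement no better than chance.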

CITATION (APA)

Parker, M. A. J., Hedgeland, H., Jordan, S. E., & Braithwaite, N. S. J. (2023). Establishing a physics concept inventory using computer marked free-response questions. European Journal of Science and Mathematics Education, 11(2), 360-375. https://doi.org/10.30935/scimath/12680
