- Avoid negative and double-negative statements. These can unnecessarily confuse students.
- Keep the proportion of false statements slightly higher than true statements, since students tend to guess "true" when uncertain.
- Avoid trivia. Make sure that your true/false questions directly assess learning goals.
- Many of the disadvantages of true/false questions can be mitigated by asking students to explain their answers. However, this approach has its own disadvantages (e.g., increased grading workload).
From: Multiple-true-false questions reveal more thoroughly the complexity of student thinking than multiple-choice questions: a Bayesian item response model comparison
| Model | Model components | WAIC | ΔWAIC vs. Model A | – | – |
| --- | --- | --- | --- | --- | --- |
| A | Mastery, TTFF partial mastery, informed reasoning based on attractiveness, informed reasoning with double-T endorsement bias, individual student performance | 18,520.7 | 0 | 200.0 | 342.1 |
| B | − remove question-level mastery | 18,927.0 | −406.3 | 199.1 | 288.7 |
| C | − remove TTFF partial mastery | 18,566.0 | −45.3 | 200.2 | 328.9 |
| D | − remove TTFF partial mastery + replace with TTFF-TFTF partial mastery | 18,534.3 | −13.6 | 199.7 | 341.8 |
| E | − remove TTFF partial mastery + replace with TTFF-TFTF-TFFT partial mastery | 18,536.5 | −15.8 | 199.5 | 333.5 |
| F | − remove informed reasoning based on attractiveness + replace with random guessing | 19,582.4 | −1061.7 | 203.4 | 268.4 |
| G | − remove double-T bias | 18,656.1 | −135.4 | 201.4 | 330.3 |
| H | − remove double-T bias + replace with multi-T bias | 18,533.4 | −12.7 | 200.3 | 344.6 |
| I | − remove question-level double-T bias + replace with global double-T bias for each student | 18,539.6 | −18.9 | 200.0 | 325.1 |
| J | + add random guessing for students not in mastery, partial mastery, or informed reasoning | 18,519.9 | +0.8 | 199.9 | 356.1 |
| K | − remove individual student performance | 19,448.5 | −927.8 | 196.7 | 180.2 |
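The table compares candidate item response models by WAIC, with ΔWAIC computed relative to the full model (Model A); more negative values indicate worse out-of-sample predictive fit. As a minimal sketch of how such values are obtained, the snippet below computes WAIC from a matrix of posterior pointwise log-likelihoods (for example, extracted from a Stan fit); the function name `waic` and the toy input are illustrative, not taken from the paper's code.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale from an (S draws x N observations)
    matrix of pointwise log-likelihoods.

    Returns (waic, p_waic), where p_waic is the effective number
    of parameters."""
    S = log_lik.shape[0]
    # log pointwise predictive density, computed stably via log-sum-exp
    lppd = np.sum(np.logaddexp.reduce(log_lik, axis=0) - np.log(S))
    # effective number of parameters: posterior variance of the
    # pointwise log-likelihood, summed over observations
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), p_waic

# Comparing two models as in the table: delta = waic_A - waic_B,
# which is negative when the reduced model predicts worse than Model A.
```

With matrices `ll_A` and `ll_B` from two fitted models, `waic(ll_A)[0] - waic(ll_B)[0]` reproduces the ΔWAIC column's sign convention.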