Using Three-Tier Diagnostic Test to Assess Students’ Misconceptions of States of Matter

This study involves the development of a three-tier diagnostic test to measure high school students’ understanding of states of matter concepts. The States of Matter Diagnostic Test (SMDT) is a 19-item three-tier diagnostic test for assessing students’ understanding of states of matter concepts. The SMDT was administered to 195 10th grade high school students in the pilot study and 102 10th grade high school students in the main study. Cronbach’s alpha reliability coefficients for the SMDT were estimated to be .78 and .83 for the pilot and main study, respectively. Point-biserial coefficients ranged from .20 to .69 with an average of .44 for the pilot study and with an average of .49 for the main study.


INTRODUCTION
In the science education research literature, several studies have been conducted on students' difficulties in learning about various phenomena (see Duit 2007 for a bibliography of literature on students' conceptions in science education). There has been a debate about the term used to describe students' ideas of science concepts that differ from scientifically accepted understandings. Researchers have characterized such ideas in various ways, such as misconceptions (Nakhleh, 1992), alternative conceptions (Abimbola, 1988), and children's science (Gilbert et al., 1982). Although there are some differences among these definitions, in this study the term misconceptions is used for students' ideas that differ from the definitions accepted by experts. Numerous studies indicate that students' misconceptions have considerable influence on their learning of fundamental science concepts and of subsequent, more advanced concepts (Artdej et al., 2010; Ayas et al., 2010; Gabel et al., 1987; Tytler, 2000; Voska & Heikkinen, 2000). Therefore, identifying students' misconceptions is crucial for planning effective instruction and remediating students' difficulties in understanding science concepts.
Interviews (Bou Jaoude, 1991; Griffiths & Preston, 1992; Osborne, 1980; Osborne & Gilbert, 1980), concept maps (Ingec, 2009; Kaya, 2008; Novak & Gowin, 1984), and multiple-choice tests (Hestenes et al., 1992; Ingram & Nelson, 2006) are popular tools for identifying students' misconceptions. However, these tools have several limitations. Interviews are used for exploring students' ideas because they provide a detailed understanding of students' conceptions. Yet, apart from the bias resulting from the personal involvement of the interviewer, interviews are very time-consuming to conduct, transcribe, and analyze (Marshall & Rossman, 2006). Concept maps are also useful tools for identifying misconceptions (Martin et al., 2000; Novak, 1990). Yet, concept maps have disadvantages, such as the need to train both teachers and students in how to use them and the large amount of classroom time they require (Kaya, 2008). Multiple-choice tests are another common tool, often used for their better content domain sampling and mechanical scoring. They can also be efficiently administered to large samples of students (Haladyna, 1997). However, multiple-choice tests do not reveal the reasons why students hold a particular conception. A student can give a correct answer with a wrong reason or a wrong answer with a correct reason.
Because of these limitations, researchers have developed two-tier multiple-choice instruments (e.g., Odom & Barrow, 1995; Peterson et al., 1986; Tan et al., 2002; Treagust, 1986; Voska & Heikkinen, 2000). The first part of each item is a conventional multiple-choice question, and the second part contains a set of possible reasons for the answer given in the first part. Two-tier tests are generally superior to conventional multiple-choice tests, since they provide researchers with an understanding of students' reasoning behind their answers (Peterson et al., 1986). Hestenes and Halloun (1995) indicated that the major problem in using conventional multiple-choice tests is minimizing false positives and false negatives. Students can give correct answers with wrong reasoning ("false positives") and wrong answers with correct reasoning ("false negatives"). They argued that minimizing false positives and negatives yields a more valid test. Although a two-tier test eliminates this drawback of a conventional multiple-choice test, it has a limitation: it cannot differentiate misconceptions from lack of knowledge. Three-tier tests address this limitation by adding an extra tier that requires students to state whether or not they are sure about their answers to the first two tiers (Caleon & Subramaniam, 2010; Pesman & Eryilmaz, 2010). Three-tier tests are valid tests that can be used efficiently with large samples of students. They help researchers to understand the reasoning behind students' answers without conducting interviews, to distinguish misconceptions from lack of knowledge, and to estimate the percentages of false positives and negatives (Kutluay, 2005; Pesman & Eryilmaz, 2010).
Three-tier tests are novel in the research literature (Pesman & Eryilmaz, 2010). There are only a few studies in physics on the development and application of three-tier tests (Caleon & Subramaniam, 2010; Kutluay, 2005; Pesman & Eryilmaz, 2010). No study on the development and application of a three-tier test in chemistry has been reported in the literature. Therefore, this study describes the development and application of a three-tier diagnostic test to measure 10th grade high school students' understanding of states of matter concepts after they were taught that subject.
States of matter is one of the crucial subjects in the 10th grade Turkish chemistry curriculum. It includes fundamental concepts such as solids and liquids, gases, evaporation, condensation, boiling, and vapor pressure, which are conceptually related to each other and helpful in explaining several everyday phenomena. Students' understanding of these concepts has attracted considerable research interest over the past 30 years (e.g., Aydeniz & Kotowski, 2012; Bar & Galili, 1994; Bar & Travis, 1991; Canpolat, 2006; de Berg, 1995; Gopal et al., 2004; Johnson, 1998a, b; Novick & Nussbaum, 1978; Osborne & Cosgrove, 1983). Osborne and Cosgrove (1983) examined the conceptions of students aged 8 to 17 years about the changes of the states of water by using a clinical interview methodology. They found that the conceptions of students of all ages are similar; some nonscientific ideas were common among older as well as younger students. For example, both younger and older students held the idea that water breaks apart into hydrogen and oxygen gases when boiling. In line with this study, Gopal et al. (2004) conducted interviews with second-year chemical engineering students to investigate their conceptions of evaporation, condensation, and vapor pressure. They found that students held the following misconceptions: i) evaporation and condensation require a temperature gradient, ii) evaporation only occurs in a closed system, and iii) the higher the vapor pressure, the faster the evaporation. Bar and Travis (1991)

Sample
In the pilot study, the SMDT was administered to 195 10th grade Turkish high school students aged 15-16 years (49% females and 51% males) after they were taught about the states of matter. In the main study, the SMDT was administered to 102 10th grade Turkish high school students aged 15-16 years (60% females and 40% males) after they were taught the subject. Two types of schools (general high schools and Anatolian high schools) were included in the pilot study since students in these schools are known to differ in achievement. Any student can register at a general high school after graduating from junior high school. However, only students who are successful in the Secondary Schools Student Selection Examination can register at an Anatolian high school. For convenience, three Anatolian high schools and two general high schools were selected for this study.

Procedure
The SMDT was developed using the procedures employed by Kutluay (2005), Pesman and Eryilmaz (2009), and Treagust (1986). The development of the SMDT followed five stages: i) defining content boundaries, ii) identifying the misconceptions reported in the literature, iii) conducting interviews to explore whether or not students hold misconceptions different from the reported ones, iv) administering open-ended questions so that students' responses could be categorized for writing the distracters of the items, and v) developing and administering the SMDT for the pilot study.
The content boundaries were defined based on the chemistry curriculum and textbooks, with a list of objectives (see Table 1) that were examined by four chemistry educators and one chemistry teacher. The appropriateness and accuracy of the content and its content validity were established by expert agreement. Students' misconceptions were identified by examining the related literature, conducting interviews, and administering open-ended questions. The interview was semi-structured and consisted of 13 questions and follow-up probes to investigate high school chemistry students' understanding of states of matter, evaporation, condensation, boiling, and vapor pressure. The interview protocol was piloted and revised for face validity. A total of 12 interviews were conducted, each lasting up to 50 minutes.
Table 1. Objectives of the SMDT

| Objectives | Items |
|---|---|
| To explain the relationship between temperature and volume of an amount of gas at constant pressure (Charles's Law). | 1, 3, 5 |
| To apply the law of conservation of mass in different contexts. | 2, 8 |
| To explain the relationship between temperature and pressure of an amount of gas at constant volume (Gay-Lussac's Law). | 4, 6 |
| To explain the relationship between pressure and volume of an amount of gas at constant temperature (Boyle's Law). | 7, 9 |
| To interpret evaporation operationally. | |

In the light of the findings from the interviews and related literature, 13 multiple-choice items were constructed with open-ended questions requiring reasons for the selection of a particular response to an item. Most of the questions were the same as the questions in the interview. The questions were examined by the experts (four chemistry educators and one chemistry teacher) to ensure that the questions were appropriate and unproblematic, and that the objectives and misconceptions intended to be examined were assessed. The 13 questions were administered to 54 high school students in one class hour lasting up to 45 minutes.
Students' answers to the questions were categorized, and the categories with high frequencies were written as the distracters of the second tier of the items to produce 13 two-tier multiple-choice items in the SMDT. The distracters were selected from students' common misconceptions. In the third tier, the students were asked whether they were confident about their answers to the first two tiers, with the aim of differentiating misconceptions from lack of knowledge.
An additional six questions were written by the researchers using the follow-up questions in the interview guide and the questions in chemistry textbooks. The content validity of the SMDT was established by the experts (four chemistry educators and one chemistry teacher) in terms of the objectives and misconceptions intended to be assessed, and whether the questions were appropriate for the grade level and unproblematic. The 19-item three-tier diagnostic test was administered to 195 10th grade students in the pilot study. In the main study, 102 10th grade students were given the SMDT. The students completed the SMDT in one class hour lasting up to 45 minutes.

Instrument
The SMDT is a 19-item three-tier diagnostic test for assessing students' understanding of states of matter concepts. The first tier consists of a conventional multiple-choice question with three or four choices. The second tier includes one correct reason and alternative reasons. The alternative reasons are the misconceptions identified from the semi-structured interviews and open-ended questions. In addition to the alternative reasons, a blank space is provided for students to write their own reason if their reasoning differs from the given reasons. The third tier requires students to state how confident they are about their answers to the first two tiers.
The SMDT examines the conceptual areas of Charles's Law (three items), Boyle's Law (two items), Gay-Lussac's Law (two items), conservation of matter (two items), evaporation (three items), condensation (two items), boiling (three items), and vapor pressure (two items). Example items of the SMDT are available in the appendix. The original version of the SMDT is in Turkish; however, it was translated into English in order to present the items for this journal submission. The misconceptions that were probed by the SMDT are shown in Table 2.

Data Analysis
The SMDT scores of students were entered into a Microsoft Excel datasheet. Variables were written in the columns and students' names were written in the rows of the datasheet. Seven variables were produced: i) one-tier scores, ii) two-tier scores, iii) three-tier scores, iv) confidence tiers, v) misconception one-tier, vi) misconception two-tier, and vii) misconception three-tier.
One-tier scores: This score was created by using students' answers to only the first tiers of items. Correct answers were coded as 1 and others were coded as 0.
Two-tier scores: This variable was based on the first two tiers of items. When a student's answers to both the first and second tiers were correct, it was coded as 1; otherwise, 0.
Three-tier scores: This score was produced by taking all three tiers into account. It was coded as 1 if the student answered the first two tiers correctly and selected "I am sure" in the third tier; otherwise, 0.
Confidence tiers: This variable was based on students' answers to only the third tiers. When a student was confident about her/his answers to the first two tiers, it was coded as 1; otherwise, 0.
Misconception one-tier: This variable was created according to students' answers to the first tiers of items for each misconception in Table 2. When a student's answer to the first tier was the misconception indicated in Table 2, it was coded as 1; otherwise, 0.
Misconception two-tier: This variable was based on students' answers to the first two tiers of items for each misconception in Table 2. When a student's answers to both the first and second tiers were the misconception indicated in Table 2, it was coded as 1; otherwise, 0.
Misconception three-tier: This variable was produced by considering students' answers to all tiers of items for each misconception in Table 2. When a student's answers to the first two tiers were the misconception indicated in Table 2 and s/he selected "I am sure" in the third tier, it was coded as 1; otherwise, 0.
The Cronbach alpha reliability was calculated for one-tier scores, two-tier scores, and three-tier scores. Descriptive statistics of the SMDT for three-tier scores were reported (see Table 3). In addition, false negatives and false positives were calculated based on all three tiers. For "false positives", if a student who was confident about the responses given to the first two tiers gave a correct response to the first tier with incorrect reasoning in the second tier, it was coded as 1; otherwise, 0. For "false negatives", if a student who was confident about the responses to the first two tiers gave an incorrect response to the first tier with correct reasoning in the second tier, it was coded as 1; otherwise, 0. Furthermore, the correlation between two-tier scores and confidence tiers was investigated for the validity of the SMDT.
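The coding rules above can be illustrated with a minimal Python sketch for a single item response. The names `Response` and `code_item` are hypothetical and are not part of the original Excel-based procedure; the misconception variables are omitted because they depend on the response patterns listed in Table 2.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One student's response to one three-tier item (hypothetical structure)."""
    tier1_correct: bool  # first tier: content answer is correct
    tier2_correct: bool  # second tier: selected reason is correct
    sure: bool           # third tier: student selected "I am sure"


def code_item(r: Response) -> dict:
    """Apply the 0/1 coding rules described in the Data Analysis section."""
    return {
        "one_tier": int(r.tier1_correct),
        "two_tier": int(r.tier1_correct and r.tier2_correct),
        "three_tier": int(r.tier1_correct and r.tier2_correct and r.sure),
        "confidence": int(r.sure),
        # confident, correct answer, incorrect reasoning
        "false_positive": int(r.sure and r.tier1_correct and not r.tier2_correct),
        # confident, incorrect answer, correct reasoning
        "false_negative": int(r.sure and not r.tier1_correct and r.tier2_correct),
    }


# A confident student who answers correctly but for the wrong reason.
example = code_item(Response(tier1_correct=True, tier2_correct=False, sure=True))
print(example)
```

In this toy case the response earns a one-tier score of 1 but two-tier and three-tier scores of 0, and it is flagged as a false positive.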

RESULTS AND DISCUSSION
In this part, the results of the pilot study are given first, followed by the results of the main study. Cronbach's alpha reliability coefficients of the SMDT were estimated to be .61, .70, and .78 for one-tier, two-tier, and three-tier scores, respectively, in the pilot study. Table 3 summarizes the descriptive statistics of the SMDT for the three-tier scores in the pilot study. Table 3 shows that the point-biserial coefficients based on ITEMAN, except for two items (items 6 and 16), are good, with an average of .44 (Ebel, as cited in Crocker & Algina, 1986). This shows that the items are functioning quite satisfactorily. Ebel (as cited in Crocker & Algina, 1986) proposed that if the item-scale correlation is greater than .40, the item is functioning quite satisfactorily. If it is between .30 and .40, the item is functioning reasonably well. If it is between .20 and .30, the item needs revision. If it is below .19, the item should be deleted or completely revised. Items 6 and 16 were revised after the pilot study; the Turkish grammar of these items was changed. It was also seen that the difficulty levels of all items except one were below .40, with an average of .23. The mean score was found to be 4.57 out of a possible maximum score of 19. The skewness of the three-tier scores was found to be .90. The difficulty level and positive skewness explain the low mean value of 4.57.
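For readers who wish to reproduce a reliability estimate of this kind, the following is a minimal sketch of the standard Cronbach's alpha formula applied to a student-by-item matrix of 0/1 scores. The function name and the toy data are hypothetical; the coefficients reported in this study were not necessarily computed with this code.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (students), each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: 4 students x 3 items (not data from this study).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy)
print(alpha)  # 0.75 for this toy matrix
```

The same function could be applied separately to the one-tier, two-tier, and three-tier score matrices to compare their reliabilities.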
In order to check the validity of the SMDT, the relationship between the two-tier scores and the confidence tier scores was investigated in the pilot study. In addition, the probabilities of false negatives and positives were calculated. The correlation between two-tier scores and confidence tier scores was examined as a quantitative approach to provide evidence for the validity of the SMDT (Cataloglu, 2002; Pesman & Eryilmaz, 2010). Cataloglu (2002) and Pesman and Eryilmaz (2010) reported that there should be at least a moderate positive correlation between two-tier scores and confidence tier scores since students with high scores are expected to be more confident than students with low scores. The Pearson product-moment correlation coefficient between two-tier scores and confidence tier scores of the SMDT was calculated. It was found that there was a moderate positive correlation between two-tier scores and confidence tier scores (r = .34, n = 195, p < .01). The moderate positive correlation provides validity evidence in that more confident students have higher scores on the SMDT.
Hestenes and Halloun (1995) reported that minimizing the probabilities of false negatives and positives is important for the validity of a test. They suggested that the probability of false negatives needs to be less than ten percent. They added that minimizing the probability of false positives is more difficult because of the chance factor. Table 4 shows the percentages of false negatives and positives in the pilot study. When the items were checked for false negatives, it was found that all the items, except for item 14, were below 10 percent, with an average of 4.0. Item 14 is related to condensation in an open system. When item 14 was examined, it was seen that most of the students chose one of the wrong alternatives for the first tier ("hot air condenses and water droplets form on the outer surface of the bottle") although they gave the correct answer for the second tier. The correct answer for the first tier is "water vapor in the air condenses and water droplets form on the outer surface of the bottle". This could have resulted from students' carelessness, in that they did not differentiate between those two alternatives. When the percentages of false positives were checked, it was seen that items 1 and 5 had the highest percentages. These items were checked and no problem was found with them. However, this result could be attributed to the students' misunderstanding of constant pressure in a system. The students selected the correct answers for the first tier of items 1 and 5; however, they did not give a correct reason for their answers, since they may think that "in a closed container filled with a gas, when temperature increases/decreases, the gas pressure always increases/decreases".
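The per-item percentages of false negatives and the ten percent criterion of Hestenes and Halloun (1995) can be illustrated with a short Python sketch. The function name and the flag matrix are hypothetical toy values, not results from this study.

```python
def percentage_flagged(flags):
    """Percentage of students flagged (coded 1) on each item.

    flags: list of rows (students), each a list of 0/1 flags per item,
    for example false-negative or false-positive codes.
    """
    n_students = len(flags)
    n_items = len(flags[0])
    return [100 * sum(row[j] for row in flags) / n_students for j in range(n_items)]


# Toy false-negative flags: 4 students x 2 items (illustrative only).
false_negative_flags = [
    [0, 1],
    [0, 0],
    [0, 1],
    [0, 0],
]
per_item = percentage_flagged(false_negative_flags)
average = sum(per_item) / len(per_item)

# Items (1-based) whose percentage is not below the ten percent criterion.
items_to_inspect = [j + 1 for j, p in enumerate(per_item) if p >= 10]
print(per_item, average, items_to_inspect)
```

Items flagged in this way would then be examined individually, as was done for item 14 in the pilot study.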
In terms of lack of knowledge values, it was found that all values were high, with an average of 38.8. This ... Kutluay (2005) and Pesman and Eryilmaz (2010).
After items 6 and 16 were revised by changing their grammar and wording following the pilot study, the SMDT was administered to 102 10th grade students. Cronbach's alpha reliability coefficients for the main study were estimated to be .62, .73, and .83 for one-tier, two-tier, and three-tier scores, respectively. Table 6 shows the descriptive statistics of the SMDT for three-tier scores in the main study.
Table 6 shows that the point-biserial coefficients based on ITEMAN are good, with an average of .49 (Ebel, as cited in Crocker & Algina, 1986). This shows that the items are functioning quite satisfactorily. It was also seen that the difficulty level of the items was medium, with an average of .42. The mean was found to be 7.96 out of a possible maximum score of 19. The mean reflects the difficulty level of the items. The skewness of the three-tier scores was found to be 0.02. Since the skewness value is close to 0, the distribution of the scores is nearly symmetrical. The kurtosis value is negative, which means that the distribution of the scores is rather flat.
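The item statistics discussed above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of ITEMAN: the point-biserial coefficient is computed here as the Pearson correlation between a 0/1 item score and the total score (without correcting for item overlap), item difficulty is taken as the proportion of students scoring 1, and the handling of the category boundaries (including values between .19 and .20, which the guidelines above leave unspecified) is an assumption. All function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ebel_label(r_pb):
    """Classify an item-scale correlation using Ebel's guidelines as summarized above."""
    if r_pb > 0.40:
        return "functioning quite satisfactorily"
    if r_pb >= 0.30:
        return "functioning reasonably well"
    if r_pb >= 0.20:
        return "needs revision"
    # Assumption: everything below .20 falls in the lowest category.
    return "delete or completely revise"


def item_analysis(scores):
    """Return (difficulty, point-biserial, label) for each item."""
    totals = [sum(row) for row in scores]
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        difficulty = sum(item) / len(item)  # proportion scoring 1
        r_pb = pearson(item, totals)
        results.append((difficulty, r_pb, ebel_label(r_pb)))
    return results


# Toy three-tier scores: 4 students x 3 items (not data from this study).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
results = item_analysis(toy)
for difficulty, r_pb, label in results:
    print(round(difficulty, 2), round(r_pb, 2), label)
```

The `pearson` helper could equally be applied to students' two-tier totals and confidence-tier totals to obtain the kind of validity correlation reported in this study.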
Table 7 shows the percentages of false negatives, false positives, and lack of knowledge for the SMDT for three-tier scores in the main study. When the items were checked for false negatives, it was found that the average value of false negatives was 3.8. When the percentages of false positives were checked, it was seen that the average value of false positives for all items was 8.9. The average values of false negatives and positives were below 10, which supports the validity of the test (Hestenes & Halloun, 1995). In terms of lack of knowledge values, it was found that the average of lack of knowledge scores was 25.2. The Pearson product-moment correlation coefficient between two-tier scores and confidence tiers of the SMDT was also calculated. It was found that there was a high positive correlation between two-tier scores and confidence tiers (r = .57, n = 102, p < .01). The high positive correlation provides validity evidence in that more confident students have higher scores on the SMDT.
There are various reasons to use three-tier diagnostic instruments. Apart from their objectivity in scoring, broad content domain sampling, mechanical scoring, and generalizability, they have advantages in terms of enabling researchers to examine the validity of the instrument and estimate misconception scores. The correlation between two-tier scores and confidence tiers and the percentages of false negatives and false positives provide evidence for the validity of the test. Three-tier tests estimate misconception scores more accurately than one-tier and two-tier tests since they differentiate misconceptions from lack of knowledge (Caleon & Subramaniam, 2010; Kutluay, 2005; Pesman & Eryilmaz, 2010).
It could be concluded that the SMDT provides a valid and reliable three-tier diagnostic instrument for evaluating students' misconceptions and conceptual understanding of states of matter concepts, as Caleon and Subramaniam (2010), Kutluay (2005), and Pesman and Eryilmaz (2010) indicated. Furthermore, this study suggests that the three-tier test is the most reliable of these instrument types, since the reliability coefficients for the SMDT in the pilot study were estimated to be .61, .70, and .78 for one-tier, two-tier, and three-tier scores, respectively, and Cronbach's alpha reliability coefficients for the main study were estimated to be .62, .73, and .83 for one-tier, two-tier, and three-tier scores, respectively.
Consequently, three-tier tests are superior to tools such as interviews, multiple-choice tests, and two-tier tests due to their broad content domain sampling, mechanical scoring, validity evidence, and differentiation of misconceptions from lack of knowledge (Kutluay, 2005; Pesman & Eryilmaz, 2010). Further studies could use the SMDT as a tool for assessing students' misconceptions of the states of matter. In line with Caleon and Subramaniam's (2010) study, the SMDT could be used as a pre- and post-test to assess students' understanding of the subject. Researchers may prefer the SMDT, a valid and reliable diagnostic instrument, to evaluate the effectiveness of instruction designed to help students remediate their misconceptions about the states of matter and acquire a better understanding. Teachers may also wish to use the SMDT for similar purposes. Three-tier tests give teachers the opportunity to gain deeper insight into their students' understanding. By using the percentages of lack of knowledge, teachers can evaluate their instruction. A large percentage of lack of knowledge may mean that the instruction did not facilitate students' understanding of the related concepts. The science education research literature lacks three-tier tests. In this study, in order to measure 10th grade high school students' understanding of states of matter concepts, the SMDT was developed

Table 3. Descriptive Statistics of the SMDT for Three-Tier Scores in the Pilot Study

Table 4. The Percentages of False Negatives, False Positives, and Lack of Knowledge in the Pilot Study

Table 5. The Percentages of Misconceptions for One-tier, Two-tier, and Three-tier Scores in the Pilot Study

Table 6. Descriptive Statistics of the SMDT for Three-Tier Scores in the Main Study

Table 7. The Percentages of False Negatives, False Positives, and Lack of Knowledge in the Main Study