What Do We Know about Teachers' Knowledge? Assessing Primary Science Teachers' Content Knowledge in the Jewish and Arab Sectors

This study concerns the importance of assessing teachers' content knowledge (CK), and it is one of the very few international studies that directly measure science teachers' CK. We developed a comprehensive content knowledge test (CKT) relating to the content strand states of matter. The study compared teachers from the Jewish and the Arab sectors in three stages of professional development: pre-service science teachers, in-service non-specialized science teachers, and in-service specialized science teachers. Results showed that all participants' CK level was low (M = 61.3%). Comparison between the sectors showed that CK scores among the Arab pre-service and specialized science teachers were significantly higher than CK scores among their Jewish counterparts. Our study is an example of a direct CK assessment of teachers that provides valuable diagnostic information. Its dismal findings illuminate the need for innovative methods to measure teachers' CK directly in various disciplines, and contribute strong impetus for a renewed focus on the training and professional development of teachers.


INTRODUCTION
A little learning is a dangerous thing; Drink deep, or taste not the Pierian spring: There shallow draughts intoxicate the brain, And drinking largely sobers us again (Pope, A., 1711, An Essay on Criticism)
Ball (1991) stated, "Teachers cannot help children learn things they themselves do not understand" (p. 5). Studies have supported this idea: of all factors that influence classroom learning processes, teachers are the most important. Moreover, low levels of teacher knowledge lead to low levels of knowledge among their students (Boothe, Barnard, Peterson, & Coppola, 2018; Campbell et al., 2014; Demirdogen, Hanuscin, Uzuntiryaki-Kondakci, & Koseglu, 2016).
The term subject matter knowledge has been discussed in the research literature in the context of the structure of the disciplines (Bruner, 1977; Schwab, 1964). Shulman (1986) distinguished between content knowledge (CK), often called subject matter knowledge (SMK), and pedagogical content knowledge (PCK), which "goes beyond knowledge of subject matter per se to the dimension of subject matter knowledge for teaching" (p. 9). Numerous studies have focused on PCK; their aim has been to understand the kinds of knowledge needed for teaching (e.g., Jüttner, Boone, Park, & Neuhaus, 2013; McConnell, Parker, & Eberhardt, 2013; Scheiner, Montes, Godino, Carrillo, & Pino-Fan, 2019). In contrast to PCK, few studies have explored CK and the changes in teachers' CK over the years (Boothe et al., 2018; Rice, 2005; van Driel, Berry, & Meirink, 2014). Our research is based on the consensus that there are connections and even overlap between CK and PCK, but that in the field of science education it is important to measure them as separate knowledge domains (Großschedl, Welter, & Harms, 2019; Jüttner et al., 2013).
Teachers' knowledge is usually evaluated by background variables related to their professional studies (Arzi & White, 2008; Baumert et al., 2010; Sadler, Sonnet, Coyle, Cook-Smith, & Miller, 2013). These data are easily accessible, yet they do not directly examine the teacher's comprehension of scientific concepts or principles. The present study is thus important, as it directly examined the CK of three groups at varied stages of their professional development. The CK assessment of teachers is exemplified in our study of primary school science teachers (STs) and focused on the topic of states of matter, a central content strand in the science curriculum. Our findings emphasize the need to employ a variety of strategies to measure teachers' CK.

THEORETICAL FRAMEWORK

Theoretical Conceptualizations of Teachers' Professional Knowledge
Most current conceptualizations stem from Shulman's (1986, 1987) model of professional knowledge. Over time, the term "teachers' knowledge" was significantly broadened (Ben-Peretz, 2011, p. 8). Accordingly, the definition of CK was also extended, yet in the context of education, "content knowledge" remains an elusive concept. And although it is a truism that teachers "need" content knowledge to teach effectively, it is still much less clear what kind of content knowledge is needed (Etkina et al., 2018). Despite the multiple perspectives, there is agreement that CK comprises two dimensions: declarative knowledge (i.e., knowledge of facts and concepts), and procedural knowledge (i.e., knowledge of methods and strategies) (de Jong & Ferguson-Hessler, 1996; Fenstermacher, 1994; Jüttner et al., 2013; Krathwohl, 2002). In our study we adopt Shulman's definition and relate to the concept CK solely as disciplinary academic knowledge held by teachers, which may differ from that of scientists. The two knowledge dimensions were operationalized specifically to science, as knowledge about everyday life situations relating to states of matter phenomena. We focused on core concepts and principles that describe how chemical processes occur.
Research on science teaching emphasizes that CK advances teachers' practices (McConnell et al., 2013; Rice, 2005; Scheiner et al., 2019). A high level of CK appears to be linked to teachers' efficacy in integrating interactive and challenging teaching strategies, which help students construct knowledge and improve achievement. A deep understanding of CK is needed in constructing a meaningful curriculum, including appropriate demonstrations and models. STs who exhibit inadequate CK tend to use limited teaching strategies. They rely heavily on course textbooks, teach frontally, and give students few intellectual challenges. Inexact use of language and partial or erroneous answers are possible sources of students' misconceptions (Kirbulut & Geban, 2014; Sadler et al., 2013; van Driel et al., 2014). These findings highlight STs' CK as basic for effective teaching, and thus indicate the importance of finding ways to evaluate this knowledge.

Assessing Science Teachers' Content Knowledge
Teachers' CK, especially in mathematics, has been conceptualized and measured over the years, while research on STs' CK has developed primarily in the last two decades (e.g., Baumert et al., 2010; Campbell et al., 2014). Most of the methodologies used to assess STs' CK have been indirect and qualitative, while quantitative tools are less frequently used (Jüttner et al., 2013). Indirect assessment has usually been based on number of college subject matter courses completed, grades, and the kind of certificate that the teacher has (Arzi & White, 2008; Baumert et al., 2010; Kind, 2014; Sadler et al., 2013). Another indirect method of measuring CK in science teaching applies tools that expose misconceptions, such as appropriate interviews or tests. Nevertheless, study of teachers' misconceptions is relatively minimal; teachers' CK level is indicated, but this is not a direct measure of their knowledge (Ayas, Özmen, & Çalik, 2010; Tatar, 2011; van Driel et al., 2014).
Measuring teachers' CK directly is challenging, as teachers often sense that their level of professionalism is being threatened (Arzi & White, 2008; Jüttner et al., 2013). Various strategies have been used to directly measure teachers' science CK. Researchers have conducted tests utilizing true/false questions (e.g., Schmidt et al., 2007), multiple-choice items (e.g., Hill et al., 2008; Schmidt et al., 2007), open-ended items (e.g., Baumert et al., 2010), and check-marking items that required explanations of the chosen answer. In some tests, a combination of strategies was introduced (e.g., McConnell et al., 2013).
One of the challenges in measuring teachers' CK is identifying the context in which teachers would be willing to participate in such a test. One of the most prevalent contexts in research gauging STs' CK is science Professional Development (PD) programs. These programs are geared to individuals who teach the

Contribution to the literature
• The impact of teachers' content knowledge (CK) on their students' knowledge is inestimable.
• Few studies have focused directly on measuring teachers' content knowledge.
• This study uses direct measurement of content knowledge of basic chemistry concepts on the topic of states of matter.
• It compares teachers in three different stages of professional development: pre-service science teachers (STs), non-specialized STs, and specialized STs who teach science in primary schools.
• It compares the CK of different cultures: the Jewish sector and the Arab sector.

In the present study, we conducted a CK test utilizing true/false questions in one content strand only: states of matter. We used this strategy to construct a test with a large number of questions that could be answered in a relatively short time, and that would also indicate mastery of the CK that is essential for science teaching in primary school.

Difficulties in Acquiring Content Knowledge about States of Matter
"Science education for all" is one of the manifest educational goals of many countries, such as the United States, Great Britain and Israel (Israel's Ministry of Education, 2015;Demirdogen et al., 2016). In science education, the topic of states of matter is one of the basic subjects; it is first taught in primary school and continues through science education at all levels. The topic is also emphasized in the document on national American standards (National Research Council, 1996), which refers to the properties and the changes in the structure of matter (Rice, 2005;Sadler et al., 2013;Skamp, 2009). This makes clear that teaching about states of matter is of vital importance. Understanding the content strand of states of matter requires a grasp of the particulate nature of matter, which relates to atoms and molecules; this demands a high level of abstract thinking, as particles are invisible.
Studies have shown that misconceptions about the states of matter exist at all levels of learners. Thus, even in the third year of college study and after a few courses in the sciences, pre-service STs still hold misconceptions about the states of matter (Ayas et al., 2010; Karsli, Ayas, & Çalik, 2020; Özmen & Naseriazar, 2018). Studies that expose such misconceptions are, in fact, an indirect measurement of CK and emphasize the need for direct measurement of CK in this vital content strand. This need is particularly crucial, as we will see below, in light of international data indicating that teachers are often not well-prepared for science teaching (e.g., Rollnick & Mavhunga, 2016).

Varying Levels of Teacher Training among Israeli Science Teachers
A variety of routes for training STs exist worldwide (Rollnick & Mavhunga, 2016). In Israel there are two formal frameworks for science teacher training: teaching programs for university graduates who major in chemistry, physics or biology, and science teaching programs at colleges of education. In this paper, graduates of programs who specialized in teaching science in primary schools will be termed in-service specialized STs. However, because there are not enough specialized teachers of science for all Israeli primary schools, some teachers teach science without having received any specialized training. We use the term in-service non-specialized STs for this group.
The National Academies in the U.S. (2006) found that low student achievement in science is linked to a dearth of highly qualified STs. Similar findings were presented in the OECD report (2010) regarding European countries. In addition, it was found that students of non-specialized STs scored approximately 20% less academic growth per year compared to students of specialized STs (Tretter et al., 2013). The recommendation was to develop various interventions of professional development programs to improve science education. Similarly, in Israel, low achievement levels in international tests led to the establishment of science PD programs for non-specialized STs, offering courses to provide teachers with basic knowledge of natural sciences (RAMA, 2016).
This study measures and compares the CK of three teacher profiles: (a) pre-service STs; (b) in-service non-specialized STs who participate in a PD program; and (c) in-service specialized STs. To obtain a comprehensive picture of STs' CK concerning our chosen topic, we need to distinguish between two educational subsystems that exist in Israel: the Jewish and the Arab. Each educational subsystem is responsible for the education of a distinct ethnic sub-population: the Hebrew-speaking majority and Arabic-speaking citizens. In the Arabic educational subsystem, the students, teachers and principals in schools are all Arab citizens of Israel and the language of instruction is Arabic. Both ethnic groups have the same curriculum in science and final examinations (matriculation exams). As for higher education, Arabic-speaking students can choose to attend either of the educational frameworks.
In terms of demographics, the Arab population tends to have larger families, lower levels of parental education, and lower income levels than the Jewish population (Zuzovsky, 2010). The separation between the two educational systems enables each sector to tailor its educational program to its unique culture and heritage. The Arabic educational subsystem tends to use traditional teaching methods and has made slow progress in the application of student-oriented pedagogies (Reichel & Arnon, 2009). Recent research has also found differences in attitudes toward chemistry teaching among these sectors (Markic et al., 2016).
The state of Israel does not administer CK tests for in-service STs. In fact, across the globe, CK tests for teachers on chemistry topics are relatively rare compared to the fields of biology and physics (Gurel, Eryılmaz, & McDermott, 2015). In light of dissatisfaction in Israel with the results of PISA tests (OECD, 2016), and inquiry into the causes for low achievement, direct measurement of STs' CK is a high priority.
We, the authors, teach chemistry at colleges of education and serve as pedagogical advisors in chemistry and science in the Jewish sector. Our collaboration with a college of education in the Arab sector made it possible for us to compare one content strand of CK among teachers from the Jewish and the Arab sectors.
We developed a Content Knowledge Test (CKT) on the content strand of states of matter, as a representative central topic that STs are required to teach in primary schools. The present study has three unique characteristics. (1) It directly measured the CK of basic chemistry concepts through a comprehensive test among in-service STs in Israeli primary schools. (2) It compared STs at three different stages of professional development: pre-service STs, non-specialized STs and specialized STs. (3) It compared CK in different cultures, the Jewish sector and the Arab sector, in two of these groups: pre-service STs and specialized STs.

Research Questions
a. What level of CK concerning states of matter do teachers have at various stages of professional development: pre-service STs, non-specialized STs and specialized STs?
b. Are there differences in levels of CK among the three professional groups?
c. Are there differences between teachers from the Jewish and Arab sectors and within each sector?
d. What characterizes the study participants' level of CK, and what differences, in terms of the thinking levels of Bloom's taxonomy (Bloom et al., 1956), are there between the various groups and sectors?

METHOD

Participants
We conducted convenience sampling of teachers living in the center of Israel. There were 423 participants at three stages of professional development: pre-service STs (N = 224); in-service non-specialized STs, who are currently taking a PD course for certification in teaching science (N = 119); and in-service specialized STs (N = 80). The pre-service STs were students from colleges of education who had finished a course in basic chemistry. The in-service non-specialized STs received teaching certificates in disciplines other than science; they neither majored in science nor were trained to teach science. This group of teachers was in the process of receiving certificates for teaching science in primary schools. The in-service specialized STs are teachers who have been trained to teach science in primary schools (grades 1-6). In the Arab sector, PD programs for teachers do not exist, because there are enough teachers in the Arab sector with a certificate in science. As a result, only Jewish non-specialized STs were included in this study. The characteristics of all the participants appear in Table 1. As can be seen, the majority of the Arab pre-service STs and the Arab specialized STs had taken high school matriculation exams in science (chemistry/physics/biology). In comparison, only half of the Jewish pre-service STs and specialized STs had taken matriculation exams in science in high school. The Arab specialized teachers had higher levels of education in the sciences (19.4% had undergraduate university degrees in science) than their Jewish counterparts did.

Instruments
The CKT for this study was developed in stages. We began by identifying key concepts and disciplinary core ideas concerning states of matter through interviews with expert chemistry teachers. To construct effective items, we included prevalent alternative conceptions in this content strand. Some stem from the research literature and others from the Chemical Concepts Inventory (CCI) (e.g., Gabel, Samuel, & Hunn, 1987; Kirbulut & Geban, 2014; Mulford & Robinson, 2002). That 22-item CCI, developed in 2002, is one of the main sources for tests in the field of chemical education (Gurel et al., 2015; Kruse & Roehrig, 2005). Only three items in it related to the concept of phase change of matter (Schwartz & Barbera, 2014); we thus constructed additional items to cover the content related to the states of matter. This enabled us to conduct a test containing a relatively large number of items on a specific and important content strand.
Rather than multiple-choice questions, we opted for true/false items. This made it easier to fill out the CKT, respected the time investment of volunteer respondents and encouraged them to collaborate. In general, this type of CKT is less threatening, and can be conducted relatively quickly. It has clear advantages for researchers as well: a large number of questions can be administered to a large number of participants, and this type of test is easily scored. In contrast to multiple-choice testing, it is easy to develop without the need to construct precise distracters. Conversely, this type of test also has disadvantages. Examinees' guessing contributes to the error variance and reduces the reliability of the test, and the selected answers do not provide deeper insight into participants' ideas or conceptual understanding (Gurel et al., 2015). In our study, the potential of enhanced collaboration on the part of teacher participants and the ability to build a comprehensive test led us to choose the true/false format. The CKT was a "paper-and-pencil" assessment and took about 45 minutes to complete.
The CKT was first administered in a pilot study to pre-service STs (N = 90). Based on the results, we revised the closed-ended CKT. The final CKT included 35 items. The instrument was translated to Arabic via the back-translation method (Brislin, 1970) by two professional bilingual translators; in cases of discrepancies between translations, we consulted a third academic translator. The research participants were asked to mark items as true/false. The score on the CKT, from 0 to 100, represented the percentage of correct responses.
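The scoring rule described above (percentage of correct true/false responses) can be illustrated with a minimal sketch. The function name `ckt_score` and the five-item answer key are hypothetical and serve only as an illustration; they are not the actual CKT key, which has 35 items.

```python
def ckt_score(responses, answer_key):
    """Return the percentage (0-100) of items answered correctly."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return 100.0 * correct / len(answer_key)


# Hypothetical 5-item key for illustration only; the real CKT has 35 items.
answer_key = [False, True, False, False, True]
responses = [False, True, True, False, True]

print(ckt_score(responses, answer_key))  # 4 of 5 correct -> 80.0
```

With 35 items, each item contributes roughly 2.86 percentage points to the score.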
We performed content analysis of the 35 items through a category extraction process (Corbin & Strauss, 2015 [1998]). The mapping process was done by five science education professionals who served as independent judges (inter-rater reliability). The 35 items were divided structurally by two core categories: (1) content knowledge and (2) Bloom's taxonomy (see Figure 1).

Figure 1. The Content Knowledge Test (CKT)
Notably, in the process of category extraction it was difficult at times to classify certain items into subcategories in a definitive manner. To validate the content analysis, inter-rater reliability was tested. Following discussion between the judges until consensus was reached, the reliability was calculated and ranged from 94% to 96%.
The 35 items (α = .80) were grouped by content as follows: 12 directly tapped knowledge about the microscopic particulate structure of matter, 9 dealt with the gaseous state of matter, 12 related to phase transitions, and 5 related to conservation of mass. Three items were assigned to more than one category.
The CKT items were then sorted into thinking levels based on Bloom's taxonomy (1956): knowledge, comprehension and application. Reaching a higher cognitive level thus depends on successfully achieving the level that precedes it. For example, on the lower level, knowledge, a teacher uses terminology correctly and has memorized definitions and scientific facts; comprehension is dependent on knowledge of scientific facts, and represents the ability to engage in two-way translation from one symbolic level to a different representative language (i.e., from a macroscopic description to a microscopic one). Application is the teacher's ability to explain the relationship between a given concept and new situations.
Our categorization yielded six knowledge items, 20 comprehension items and nine application items. An example of an item that reflects knowledge: "All substances boil at 100℃." An example of an item that represents comprehension and requires the ability to move from a macroscopic property to a microscopic description: "Molecules of the same substance are larger when the substance is in a solid form than when it is in a gaseous form." An example that reflects application: "A large amount of iron has a higher melting point than a small amount of iron." To evaluate the discriminant validity of the categories, we calculated Pearson's correlations. The results yielded medium-strength correlations between knowledge and the other factors. The correlation between knowledge and comprehension was r(423) = 0.55, p < .001. The correlation between knowledge and application was r(423) = 0.45, p < .001.
As Bloom asserted, categories of educational objectives, regardless of discipline, are hierarchical. Each category is based on the previous category and includes all of the previous category's traits. As a result, we expect that the categories will correlate with one another (Bloom et al., 1956). In our research, we found that the correlations reflect the relations noted by Guttman: 0.45 < 0.8, and the product 0.55 × 0.8 is close to 0.45.
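The arithmetic behind the Guttman relation can be checked with a short sketch. It assumes that the value 0.8 is the correlation between comprehension and application; the text gives this number but does not label it, so that reading is an assumption. The variable names are illustrative.

```python
# Correlations reported in the text (knowledge vs. the other two levels).
r_knowledge_comprehension = 0.55
r_knowledge_application = 0.45
# Assumption: 0.8 is the comprehension-application correlation; the text
# gives the value but does not say which pair of categories it belongs to.
r_comprehension_application = 0.8

# In a hierarchical (simplex-like) ordering, the correlation between the two
# most distant levels is the smallest and is roughly the product of the
# correlations between adjacent levels.
predicted = r_knowledge_comprehension * r_comprehension_application
gap = abs(predicted - r_knowledge_application)
most_distant_is_smallest = r_knowledge_application < min(
    r_knowledge_comprehension, r_comprehension_application
)

print(round(predicted, 2), round(gap, 2), most_distant_is_smallest)
# prints: 0.44 0.01 True
```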

Procedure
Data were collected with strict adherence to ethical principles. The goal of the research was explained to the participants and they gave their consent to participate in the study. It was emphasized that the research data would remain confidential and anonymous and would be used for research purposes only. The CKT was administered to respondents in various settings, with the researcher (the second author) present. The pre-service STs were given the CKT in their classes in the colleges of education. The non-specialized STs were given the CKT in their course for science teaching certification, and the specialized STs received it in their professional programs. About 90% of each group volunteered to participate in the study.

Achievement Level for All Participants
In the first stage, the level of achievement of all the participants was calculated using descriptive statistics. We calculated the percentage of correct responses to the 35 items. The mean score (the mean percentage of correct responses) was 61.03 for all the participants (SD = 18.40). The two items with the highest scores tapped knowledge according to Bloom's taxonomy: "The melting point is the point at which matter turns from a solid into liquid" and "All substances boil at 100℃" (the percentage of correct responses was 93.03 and 95.40, respectively).
Focusing on the topics covered in the CKT, the mean score of the items that dealt with the microscopic structure of matter was 71.10 (SD = 17.70). The items that dealt with the gaseous state of matter had a mean score of 62.71 (SD = 19.60). Of the 12 items that received the lowest scores, eight items tapped knowledge on the gaseous state.

Achievement Level According to Group and Sector
In order to examine possible differences in level of achievement among the groups, we undertook a one-way analysis of variance (ANOVA), in which the independent variable was the group and the dependent variable was the score. We found significant differences between the groups (F(2, 411) = 6.3, p < .01). To identify the source of the differences, we undertook a Tukey post-hoc test, which showed that the achievements were significantly higher among the specialized STs in comparison to the non-specialized STs (M = 65.94, SD = 19.18 versus M = 56.68, SD = 19.05). The differences between the pre-service STs and the two other groups were not statistically significant (M = 61.66, SD = 19.18).
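For readers who wish to see how a one-way ANOVA F statistic follows from group summaries, here is a minimal sketch that rebuilds it from the group sizes, means and standard deviations reported above. It assumes the SDs are sample standard deviations and that all 423 participants contributed scores. The reported degrees of freedom (2, 411) differ from what N = 423 would give, so exact agreement with the reported F should not be expected. The function name `anova_from_summary` is illustrative.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) triples.

    Assumes each sd is a sample standard deviation (n - 1 denominator).
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares, recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (n, mean, sd) as reported in the text for the three professional groups.
groups = [
    (224, 61.66, 19.18),  # pre-service STs
    (119, 56.68, 19.05),  # in-service non-specialized STs
    (80, 65.94, 19.18),   # in-service specialized STs
]

f_stat, df_b, df_w = anova_from_summary(groups)
print(df_b, df_w, round(f_stat, 2))
```

Under these assumptions the sketch gives an F of about 5.8 with (2, 420) degrees of freedom, which is close to, but not the same as, the reported F(2, 411) = 6.3.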
To examine the differences in levels of achievement according to this classification, we also undertook a one-way analysis of variance of each sector (see Table 2). No significant differences were found among the professional groups in the Jewish sector (F(2, 312) = 1.54, p > .05).
As Table 2 shows, in the Arab sector there were significant differences between the levels of achievement of the pre-service STs and the specialized STs, as was expected (F(1, 99) = 12.3, p < .01). The mean score of the pre-service teachers was significantly lower than the mean score of the specialized STs. In the Jewish sector, the mean score of the in-service non-specialized group was lower than the mean score of the two other groups; but the differences were not significant, as was expected.
To evaluate differences in achievement based on group and sector, we undertook a two-way analysis of variance. The independent variables were group and sector, and the dependent variable was level of achievement. This statistical test was only carried out on pre-service and specialized STs, since there were no non-specialized STs in the Arab sector.

Level of Achievement According to Bloom's Taxonomy
As noted above, the CKT items were analyzed based on Bloom's categories: knowledge, comprehension and application. In order to examine the level of achievement for all the participants with reference to Bloom's categories, we undertook an analysis of variance with repeated measures. The results were significant (F(2, 646) = 208.31, p < .001). We then undertook a Bonferroni correction, finding significant differences among the three categories. The score on knowledge (M = 72.54, SD = 20.44) was higher than the score on comprehension (M = 60.07, SD = 19.63). The score on comprehension was higher than the score on application (M = 50.64, SD = 29.98). These findings are consistent with the hierarchy proposed by Bloom, and thus they also strengthen the CKT instrument on which this study is based.

Level of Achievement Based on Bloom's Categories According to Group
To examine the different levels of thinking reflected in the various categories, we undertook a one-way analysis of variance in each separate group. There were differences between the groups concerning comprehension (F(2, 420) = 3.18, p < .05) and application (F(2, 420) = 6.48, p < .01). After a Bonferroni correction, we found the reason for the differences: the level of achievement of the specialized STs was higher than that of the non-specialized STs in both categories, comprehension and application. The mean scores appear in Figure 2.
To examine the influence of sector (Jewish/Arab) and group (three stages of professional development) on the level of achievement in Bloom's three categories, we undertook a Multivariate Analysis of Variance (MANOVA), which pointed to significant differences between the sectors (F(3, 416) = 6.87, p < .001) and between the groups (F(6, 832) = 3.56, p < .01). No interaction effect was found.

Knowledge.
In the category of knowledge, there was a significant effect of sector (F(1, 300) = 5.22, p < .05), due to the fact that the Arab participants' (pre-service and specialized STs) score on knowledge was higher than scores in the Jewish sector (M = 74.9, SD = 17.10 vs. M = 71.8, SD = 21.34, respectively). No significant interactional effect was found. That is, the differences between Arab and Jewish participants existed among both the pre-service and the specialized STs.

Comprehension.
In the category of comprehension, the results showed a significant effect for sector (F(1, 300) = 13.11, p < .001). Here, too, the Arab teachers' score was higher than that of the Jewish teachers (M = 66.29, SD = 11.97 and M = 58.12, SD = 21.12, respectively). Moreover, results yielded a significant effect of interaction between sector and group (F(1, 300) = 5.05, p < .01). To understand this interaction, we undertook t-tests for independent samples that examined the difference between the sectors in each of the separate groups. A significant difference was found between the Arab and the Jewish specialized STs (see Table 3). Here, the Arab teachers' score was higher. However, no significant difference was found among the pre-service STs in the two sectors (see Figure 3).

Application.
In this category, a significant effect of sector was found (F(1, 300) = 22.10, p < .001): for application, Arab participants scored higher than Jewish participants (M = 63.37, SD = 19.22 vs. M = 49.15, SD = 31.62, respectively). Moreover, there was a significant interactional effect between sector and group (F(1, 300) = 4.86, p < .01). To understand the interaction, t-tests for independent samples were undertaken, which examined sector differences in each separate group. The results showed that in both groups, there was a significant difference between the sectors. The findings, presented in Table 4, show that the level of application among Arab teachers was higher than that of Jewish teachers in both groups. However, the difference between the sectors was greater among the in-service teachers than among the pre-service STs (see Figure 4).

DISCUSSION
Our research focused on the concept of content knowledge (CK), a term coined by Shulman (1986) in relation to discipline knowledge. "CK can be said to be at the very heart of teachers' practice […]. Hence a starting point to equip all teachers would be to ensure that they know the material they have to teach" (Rollnick & Mavhunga, 2016, p. 6).
Most studies worldwide on teachers' knowledge focus on PCK. This is one of the few international studies to measure STs' CK directly (e.g., Boothe et al., 2018; Kind, 2014; Rice, 2005; Sadler et al., 2013). CK measurements have been conducted primarily in PD programs for teacher professionalization, due both to the need to evaluate these programs and to the availability of the teachers involved. Most of the research on CK has been done in the field of mathematics, a small part in the field of sciences, and rarely in regard to chemistry concepts. With the help of experts, we developed a comprehensive assessment tool, CKT, comprised of a relatively large number of items relating to the content strand states of matter. We chose this content strand for the CKT as an example of a representative subject in the science curriculum, a subject studied from primary to higher education. Our study was conducted on a population from central Israel, by no means an underprivileged society. Its dismal findings illuminate the need to find ways to measure teachers' knowledge directly and provide strong impetus for a renewed focus on the training and professional development of teachers.
All three groups (N = 423) exhibited low levels of achievement (mean score = 61.03, SD = 14.80) in their responses to our CKT. These achievements were much lower than expected. Particularly surprising was the specialized STs' low achievement (M = 65.94, SD = 19.18). We had assumed that these teachers had attended continuing education courses during their years of teaching, enabling them to expand their disciplinary CK. In addition, their teaching experience, logically speaking, should have promoted their CK (e.g., Hill et al., 2008). Against our expectations, however, their CK was poor.
In contrast, the findings of the analysis concern Bloom's levels of thinking did correspond to our expectations. There were significant differences among the three categories. The score on knowledge was higher than the score on comprehension. The score on comprehension was higher than the score on application. These findings support the strength of the CKT tool developed for this study.
We begin our discussion with some examples of the difficulties involved in acquiring CK concerning states of matter. We then compare Jewish and Arab teachers on their achievements and levels of thinking according to Bloom's taxonomy. We conclude with practical recommendations for improving the CK of primary school science.

Lack of Content Knowledge
The purpose of the study was to measure CK directly. Analysis of incorrect answers also revealed some profound misconceptions, which have been discussed in the research literature. Following are some examples illustrating insufficient CK.
The first example relates to the particulate nature of matter and its connection to everyday life. To the statement, "In the transition from liquid to gas, matter breaks down into its elements," 63.8% of the respondents gave the correct answer (false). The rate of correct responses to the statement "Boiling water is bubbles of hydrogen and oxygen" (false) was much lower, 34.5%, even though both statements refer to the same phenomenon. Examining these two items, we see that the first is general, concerning the transition of matter from liquid to gas on the microscopic level, while the second describes a specific instance of transition in the case of water. We had expected a higher percentage of correct answers and greater similarity between the percentages. Even though the phase transitions of water are recognized from everyday life experience, the respondents did not make the expected link between the two items. That is, they did not transfer their knowledge of one well-known phenomenon (bubbles in boiling water) to the general process addressed in the first item (the phase transition). These findings reflect the common misconception that "in the transition from liquid to gas, water decomposes into its elements, hydrogen and oxygen" (Aydeniz & Kotowski, 2012; Cooper, Corley, & Underwood, 2013). Here, it became clear that this misconception indicates a lack of basic knowledge that is essential for understanding the core chemical principles and processes needed for teaching the science curriculum.
Figure 4. Application - Comparison of Achievement between Groups, according to Sector
Poor knowledge was also found in our CKT in items concerning the gaseous state and the microscopic structure of matter. For example, "Molecules of the same substance are larger when the substance is in a solid form than when it is in a gaseous form" (false; answered correctly by 74.4%), and "When a substance changes state from a gas to a liquid, it increases in mass" (false; answered correctly by 63.5%). Molecules of a substance are the same size in all three states of matter, and the mass of a substance does not change in the transition between phases. Yet because most gases are invisible, people often mistakenly assume that they are weightless. Understanding the microscopic structure relating to atoms and molecules and its relationship to macroscopic properties requires a high degree of abstraction (Demircioglu & Yadigaroglu, 2014; Gabel et al., 1987; Karsli et al., 2020; Pabuccu & Erduran, 2016; Rice, 2005; Tatar, 2011; Yalçınkaya & Boz, 2015). It is therefore a source of inadequate knowledge, as can be seen from this example as well as from other items in our CKT.
Some of the difficulties in acquiring CK about states of matter actually tie into everyday experiences (Aydeniz & Kotowski, 2012; Cooper et al., 2013; Rice, 2005; Tatar, 2011). Learners often construct their knowledge on a small base of examples, especially those from daily life. For instance, one CKT item was: "Oxygen is gas at room temperature, since the temperature of its boiling point is lower than room temperature" (true; answered correctly by 58.1%). People boil water every day, and, because of our experiences with water, we expect that we need to heat something for it to boil. Rice (2005) offered a similar explanation of a finding in her study: more than 50% of 400 pre-service and 70 in-service primary school STs suggested that oxygen (like water) boils at 100 °C.
As we have noted, declarative knowledge and procedural knowledge were operationalized in our study as knowledge about core concepts and principles that describe how chemical processes occur in everyday life situations. Difficulty in formulating a scientific principle that refers to a familiar phenomenon from everyday life is illustrated in the following example: A relatively low percentage of respondents (62%) correctly answered the item, "Evaporation occurs only when the matter is boiling" (false). Yet these same individuals teach about the cycle of water in nature, linked to evaporation and condensation, and they are well aware that water evaporates from a puddle and from wet laundry with no boiling involved. In other words, incorrect interpretation of everyday phenomena may impede people from constructing scientific principles correctly.
Our findings showed that in the content strand states of matter, participants had insufficient knowledge of four topics: the particulate structure of matter, the gaseous state, phase transitions, and conservation of mass. These topics, especially the particulate structure of matter, are fundamental and comprehensive subjects that bear implications for all science studies (Cooper et al., 2013;Tsaparlis, 2018). These findings indicate a high probability of deficient knowledge in other topics taught in the science curriculum of primary school as well, and should ring warning bells for those who deal with teacher education.

A Comparison of the Jewish and Arab Sectors
The achievement levels of the Jewish teachers were lower than those of the Arab teachers, both when the groups were compared to one another on CK, and when they were compared on Bloom's taxonomy. Moreover, it is important to note that the spread of scores was consistently greater within the Jewish sector. In other words, overall the Arab STs were a more homogeneous group than the Jewish ones. In the Jewish sector, the standard deviations were high, a sign of large gaps in knowledge within this sub-group. For example, in the category of application, Jewish teachers had a mean score of 43.97 with an SD of 35.51. Another significant finding relates to differences in achievement between the specialized STs and the pre-service STs. In the Arab sector, the achievements of the specialized STs were much higher than those of the pre-service STs, in accordance with expectations. In the Jewish sector, in contrast, no significant difference was found between the achievements of the specialized STs and the pre-service STs. This is a startling finding, and deserves serious attention.
In the analysis of Bloom's taxonomy, the following significant results were obtained. In the three categories of knowledge, comprehension and application, there was a significant effect of sector. The Arab teachers' achievements were significantly higher than those of the Jewish teachers, a finding that ran contrary to our hypothesis. In the category of knowledge, there was no significant interaction effect between sector and group. In the category of comprehension, there was a significant interaction effect between sector and group: the comprehension scores among the Arab STs were significantly higher than those among the Jewish specialized STs. When we examined the source of the differences in the category of application, we found that the two Arab groups had a significantly higher level of achievement in this category than the Jewish participants. The differences were greater, however, between the groups of specialized STs.
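To make the size of this spread concrete, the reported mean and SD for the application category can be expressed as a coefficient of variation (SD divided by mean). The following is only an illustrative sketch: the function name is hypothetical, the calculation was not part of the study's analysis, and for scores bounded between 0 and 100 the ratio should be read as a rough descriptive indicator rather than a formal test.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return SD as a fraction of the mean (illustrative helper, not from the study)."""
    if mean <= 0:
        raise ValueError("mean must be positive for the ratio to be meaningful")
    return sd / mean


# Values reported in the text for Jewish teachers, application category.
jewish_application_cv = coefficient_of_variation(mean=43.97, sd=35.51)

# The SD is roughly four fifths of the mean, i.e. scores are very widely spread.
print(f"Coefficient of variation: {jewish_application_cv:.2f}")
```

An SD this close to the mean is consistent with the interpretation above that knowledge gaps within this sub-group were large.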
One explanation for the gap in CK between the Jewish and Arab sectors is linked to the academic characteristics of the participants: the level of training and expertise of both groups of respondents in the Arab sector was higher than in the Jewish sector. Higher percentages of Arab pre-service STs had taken matriculation exams in physics, chemistry and/or biology. The academic characteristics of the specialized STs also showed that the level of training of the Arab teachers was higher than that of the Jewish teachers. In other words, higher percentages of Arab teachers had studied science in universities and colleges of education, in comparison to the Jewish teachers, who had studied science in other professional programs. These figures can explain the substantial gap in the achievement levels between the two sectors. Furthermore, the gap can be explained by the homogeneity of the Arab sector teachers, who for the most part had studied science systematically.
Notably, in addition, the two sectors have dissimilar perceptions of the status of the teacher (Reichel & Arnon, 2009). A substantial portion of Jewish high school and university graduates who major in science do not go into teaching. They continue to graduate degrees and/or to employment in high-tech companies. In comparison, in the Arab sector, more science majors become teachers due to cultural factors, employment opportunities, and other reasons (Zuzovsky, 2010).

CONCLUSIONS
Our study is an example of a direct CK assessment of teachers in the subjects they are required to teach and provides valuable diagnostic information. The results, which indicate participants' insufficient CK of basic concepts in chemistry, emphasize how problematic it is to assume that teachers necessarily master the required CK, and highlight the need for direct assessment of teachers' CK in various disciplines.
While the findings of this study provide a picture of the CK of STs from the Jewish and Arab educational subsystems, only a limited convenience sample was examined; it is recommended to expand the study to a broader population base.
The CKT we developed was a true/false test. This type of test cannot distinguish between correct answers based on adequate knowledge and those reached by guesswork. The possibility of answering even without understanding means that the participants' scores may be overestimated (Gurel et al., 2015). This drawback, inherent in the test type, implies that lack of knowledge may be even more severe than the results obtained.
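The possible size of this overestimation can be illustrated with the classical correction-for-guessing model. The sketch below is not part of the study's analysis: it assumes, as a deliberate simplification, that a respondent either knows an item or guesses blindly between the two options, and the function name is hypothetical. Under that assumption, an observed proportion correct p corresponds to a known fraction of (p - 0.5) / 0.5.

```python
def knowledge_fraction(observed_correct: float, n_options: int = 2) -> float:
    """Estimate the fraction of items truly known under a blind-guessing model.

    Assumption (not from the study): unknown items are answered by uniform
    random guessing among n_options, with no partial knowledge.
    """
    if not 0.0 <= observed_correct <= 1.0:
        raise ValueError("observed_correct must be a proportion in [0, 1]")
    if n_options < 2:
        raise ValueError("n_options must be at least 2")
    chance = 1.0 / n_options
    return (observed_correct - chance) / (1.0 - chance)


# Overall mean score reported in the text (61.03%).
overall = knowledge_fraction(0.6103)

# Single item "Boiling water is bubbles of hydrogen and oxygen" (34.5% correct).
boiling_item = knowledge_fraction(0.345)

print(f"Overall estimate of items known: {overall:.1%}")
print(f"Boiling-water item estimate: {boiling_item:.1%}")
```

Under this crude model, the overall mean of 61.03% would correspond to only about 22% of items genuinely known. The estimate for the boiling-water item is negative, meaning that responses were worse than chance; this cannot be produced by guessing alone and fits the interpretation that a systematic misconception, rather than mere absence of knowledge, drove the answers to that item.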
As our CKT revealed teachers' low CK on the basic topic of states of matter, there are grounds for concern that teachers may also lack the required levels of CK in other vital subjects in the curriculum. How, then, can teacher training be improved to ensure that primary school teachers gain better CK of basic content strands? To address this issue, we propose looking at the CK of states of matter in a metaphorical way. Knowledge itself might be conceived in three different states of matter: solid, gaseous, and liquid. In the "solid" state, knowledge is fixed in the context in which it was first learned. Students are unable to transfer concepts, ideas or rules from that framework to new contexts; in other words, no transfer in learning occurs. In the "gaseous" state, fragments of knowledge are mutually disconnected, distant from one another like gas particles. Students hold them in memory in isolation, with no logical connections or concepts joining them. Such knowledge is pragmatic yet meaningless. In "liquid" knowledge, by contrast, fragments of knowledge are connected to one another in a flexible manner; they adapt themselves to their container, to the context into which they flow. Learners have flexible thinking and can transfer concepts, ideas and rules to the relevant contexts. This, finally, is the form that CK should have in the learner's mind.
We conclude with some practical recommendations to aid STs, as well as teachers in other disciplines, in constructing "liquid" CK:

Make Conscious Connections between Fragments of Information
Connections between Content Knowledge in chemistry and other fields. Understanding of the molecular basis of processes and phenomena must be deepened, with emphasis on the basic chemistry principle of the connection between macroscopic and microscopic structures. It is also important to connect chemistry principles with everyday phenomena and underline the connection between principles of chemistry, biology and physics. This helps learners perceive the CK of chemistry as being part of a wider world of scientific concepts. Similarly, interdisciplinary connections with other fields of instruction should be highlighted.
Connections between Content Knowledge and common misconceptions in the discipline. As deficient and faulty knowledge are associated (among other factors) with misconceptions, special attention must be paid in all teacher training frameworks to the common misconceptions that teachers might transmit to their students on each topic. Moreover, an important component of the curriculum should be learning how to cope with such misconceptions. Teaching strategies recommended in the academic literature to address misconceptions should be integrated (e.g., Ayas et al., 2010; Özmen & Naseriazar, 2018; Pitjeng-Mosabala & Rollnick, 2018; Tatar, 2011; Treagust et al., 2010; van Driel et al., 2014).

Assessing Teachers' Content Knowledge
Our study indicates the need for mandatory exams that tap into teachers' knowledge of basic science topics, as well as teachers' CK in other disciplines. These exams would require teachers, at different stages of their professional careers, to refresh their knowledge and engage in transferring this knowledge to varied contexts. While this recommendation obligates the educational system to make the needed arrangements, it has the potential to improve teaching and learning processes. Teachers' CK should also be strictly evaluated by setting 76 as the minimum grade that would allow the teacher to continue teaching in the school.

Changes in Training of Primary School Science Teachers
In the curriculum for pre-service Science Teachers, more chemistry studies are needed. In most countries, primary school teachers receive relatively poor CK preparation, in terms both of level and proportion of time allocated for science (Lederman & Lederman, 2015;Pitjeng-Mosabala & Rollnick, 2018;Rollnick & Mavhunga, 2016). In the Israeli curriculum in colleges of education, the students are required to take only one year-long course (30 hours) in chemistry. One course does not meet the needs; college requirements thus need to be expanded in order to give the pre-service STs a broader base in the discipline.
The curriculum for the non-specialized Science Teachers needs revision. Our research presented the problem that exists among teachers from the Jewish sector who did not undergo disciplinary training in the subjects they teach. It is important to deepen the disciplinary academic knowledge of these teachers in all fields of science, and especially in basic chemistry topics. This is a challenging mission for PD programs internationally (Jüttner et al., 2013;Rollnick & Mavhunga, 2016), in light of the fact that these teachers are teaching science without any prior science knowledge.

Research Cooperation between the Educational Systems in the Jewish and Arab Sectors
The results of the study show the need for more collaborative studies that compare the CK of teachers and pre-service teachers in the Jewish and Arab sectors. The goal would be to improve student achievements, bridge between cultures in Israel, and broaden our understanding of the cultural factors related to CK. The present study could serve as an example for other countries coping with cultural diversity.
In sum, our research proposes a strategic-systemic way of thinking that can help primary STs gain broad-based content knowledge in a "liquid state." This discourse has ramifications for the shape of CK in teacher education in science, math, and other disciplines as well. As early as 1709, the English writer Alexander Pope stressed the importance of having broad, deep-seated knowledge. In the epigraph that opened this article, from Pope's An Essay on Criticism, he warns about the dangers of "intoxication," that is, of having a little knowledge about a subject. It is better to know nothing than to have shallow knowledge. But best of all is to "drink deep" and truly understand your subject.