How Chinese and American Students Construct Explanations of Carbon-Transforming Processes

Previous studies reported a learning progression that described the development of American students' explanations of carbon-transforming processes. This study examined the validity of this learning progression for Chinese middle school students. A comparison of American and Chinese students' performances showed both similarities and differences between the two groups. Both groups followed similar general trends in their learning progressions, from simple force-dynamic accounts to scientific model-based reasoning. Most students did not construct model-based explanations: (1) they did not trace matter and energy separately, and (2) they did not connect phenomena at the macroscopic scale to mechanisms at the cellular and atomic-molecular scales. There were also key differences: Chinese students were more likely to leave constructed responses blank, wrote less about the environmental impact of human behaviors, and used chemical equations more often. These differences might be due to culture, exam systems, or other aspects of science education in the two countries. Implications for improving science education in each country are discussed.


INTRODUCTION
This article reports our exploration of how Chinese middle school students construct model-based explanations of carbon-transforming processes and compares Chinese and American students' performances.
Little is currently known about how Chinese middle school students understand carbon cycling. A search of the Education Resources Information Center (ERIC) and China National Knowledge Infrastructure (CNKI) using 'carbon cycling' and 'secondary education in China' as title keywords returned few results. We use a carbon cycling learning progression framework to describe the development of students' explanations of carbon-transforming processes.

This study:
• Provides evidence that the American carbon cycling learning progression framework and American assessments could be used in China, implying that learning progression (LP) frameworks and assessments developed in one country could be used in another.
• Gives deeper insight into the similarities and the main problems that American and Chinese students share when explaining carbon-transforming processes, suggesting that the Carbon TIME curriculum could help Chinese students achieve environmental science literacy.
• Presents the key differences between American and Chinese students' explanations of carbon-transforming processes.

Model-based Explanations of Carbon-transforming Processes
With the release of the Next Generation Science Standards (NGSS) in the United States, science teachers, curriculum developers, and researchers are working to integrate practices, core ideas, and crosscutting concepts into science classrooms. In this study, we focus on the practice of constructing explanations because it is an important measure of students' understanding of scientific concepts and a central aspect of science education (Achieve, 2013; NRC, 2012).
Previous studies refer to scientific explanations generated from constructed models as model-based explanations, a powerful sense-making practice (Zangori et al., 2015). Models are idealized structures that we use to represent the world through resemblance relations between the model and real-world target systems (Giere, 1988). A critical purpose of a scientific model is to help explain natural phenomena (Justi et al., 2002a; Zangori et al., 2015). Students can use models as sense-making tools to understand the scientific theory underlying phenomena and to generate scientific explanations (Zangori et al., 2015).
When students consider both visible and non-visible components in a system, they can use models as explanatory tools to identify the underlying causes and effects in the system and to generate model-based explanations connecting what happened with how and why it occurred (Gilbert, 2004). When students develop models of their ideas about phenomena and use these models to articulate model-based explanations, their thinking becomes visible and their understanding of the phenomena deepens.
In this study, we focus on students' explanations of carbon-transforming processes. Carbon-transforming processes include organic carbon generation, transformation and oxidation (Jin, Zhan, & Anderson, 2013).
Organic carbon generation refers to the process that generates organic carbon compounds from inorganic substances (carbon dioxide, water, etc.). In this framework, photosynthesis is the only process that generates organic carbon compounds. Organic carbon transformation refers to the processes that pass on chemical energy within ecosystems and from ecosystems to human socio-economic systems. For example, a child needs materials from other organisms within ecosystems to live and grow. Digestion and biosynthesis are processes that transform organic carbon. Organic carbon oxidation refers to the processes that release energy by oxidizing organic carbon compounds; cellular respiration and combustion fulfill this role.
Understanding carbon-transforming processes is central to scientific literacy because explanations of carbon-transforming processes are examples of applying specific ways of scientific reasoning to real-life situations (Jin, Zhan, & Anderson, 2013). Atomic-molecular models of carbon-transforming processes (photosynthesis, digestion, biosynthesis, cellular respiration, and combustion) are used to explain macroscopic phenomena (plant growth and movement, animal growth and movement, decay, burning, etc.). Therefore, constructing explanations of carbon-transforming processes helps students apply scientific reasoning to solve problems in their daily lives and improves their scientific literacy.
To construct scientific explanations of carbon-transforming processes, students must also understand and apply crosscutting concepts (NRC, 2012). In our work, we pay particular attention to three crosscutting concepts: (1) Energy and Matter: Flows, Cycles and Conservation, (2) Systems and System Models, and (3) Scale, Proportion, and Quantity. These crosscutting concepts are necessary for two aspects of thorough explanations: (1) connecting systems at different scales and (2) tracing matter and energy. When students connect systems at different scales, they construct explanations that describe carbon-transforming processes at the large, macroscopic, cellular, and atomic-molecular scales, giving a more complete account of how a phenomenon happens.
Tracing matter and energy means that students use the principles of matter and energy conservation as rules for interpreting familiar and unfamiliar natural phenomena and apply these principles consistently across contexts.
However, prior research shows that American students struggle to connect systems at different scales and to trace matter and energy when constructing thorough explanations. For example, regarding connecting systems at different scales, although students learn about the cellular work that supports organism function and study ecosystem structures and functions, they still struggle to describe materials and functions at the cellular level and to explain carbon-transforming processes at multiple scales, especially at the middle school level. Regarding tracing matter and energy, research shows that students often describe the matter cycle as atoms or molecules moving without changing form, and they describe energy flow as an energy cycle without degradation (Lin & Hu, 2003). In addition, curriculum materials used in secondary schools often address the reactants and products of carbon-transforming processes without articulating the big ideas of how matter and energy are transformed (Stern & Ahlgren, 2002). Therefore, learning how to construct explanations of carbon-transforming processes is both useful for students and an important issue for educators to explore.

Carbon Cycling Learning Progression Framework
Learning is an ongoing developmental process. A learning progression approach makes it possible to trace how students' explanations increase in sophistication with experience. Previous work investigating students' ideas about carbon cycling has led to a learning progression that focuses on carbon-transforming processes and was developed from data on American elementary, middle school, high school, and college students who participated in the Carbon TIME project (Jin & Anderson, 2012).
The carbon cycle learning progression framework contains three important dimensions: processes, progress variables, and levels of achievement.
1. The learning progression is organized around key processes that tie systems together: the generation of organic carbon (photosynthesis), the transformation of organic carbon (digestion, biosynthesis), and the oxidation of organic carbon (cellular respiration, combustion).
2. Progress variables include four key elements of scientific explanations in the learning progression: context-specific knowledge, orientation towards principles of matter and energy, precision in matter and energy words, and scale. "Context-specific knowledge" refers to factual components of the disciplinary core ideas.
"Orientation towards principles of matter and energy" refers to how students use the crosscutting concept of Matter and Energy Conservation as principles or rules that can be applied across contexts. "Precision in matter and energy words" identifies how clearly students distinguish matter and energy words. "Scale" identifies how students connect the large scale, macroscopic scale, cellular scale, and atomic-molecular scale (Miller, Johnson, Freed, Doherty, & Anderson, 2017).
3. Previous studies (Jin et al., 2012) of the carbon cycle learning progression framework have identified four levels of achievement that describe students' progress toward more sophisticated reasoning about matter and energy within these processes.
The four learning progression levels are as follows:

Level 1:
Students at level 1 use force-dynamic reasoning to explain how enablers help actors fulfill their natural tendencies and how antagonists prevent actors from fulfilling their goals. For example, a cow has a natural tendency to grow. Enablers such as water and grass support this natural tendency because they are necessary for the cow's growth, but growth may be prevented by antagonists (e.g., no water or no grass).
Students pay attention to the phenomena at a macroscopic scale without recognizing the underlying matter movement and chemical change in them.

Level 2:
Students at level 2 use elaborated force-dynamic reasoning to explain processes beyond natural tendencies. They still focus on actors, enablers, antagonists, and natural tendencies, but they attempt to explain processes using "hidden mechanisms", add more detailed and complex information to their explanations of processes at larger and smaller scales, and begin to trace materials and energy forms that are visible.

Level 3:
Students at level 3 use incomplete scientific reasoning to explain processes. They are aware of chemical reactions and pay attention to how processes work at the cellular scale. They show awareness of important scientific principles and of models at smaller and larger scales. However, they have difficulty connecting accounts at different scales and applying principles consistently, and they often interconvert matter and energy to account for matter movement and energy change.

Level 4:
Students at level 4 use coherent scientific reasoning to explain processes. They successfully apply fundamental principles, such as conservation of matter and energy, to phenomena at multiple scales and construct scientific model-based explanations about carbon-transforming processes.

Research Questions
In this article, we characterize the ways Chinese students construct explanations of macroscopic carbon-transforming phenomena (growth of plants and animals, movement and functioning of organisms, decay, combustion) that are linked to atomic-molecular processes that generate, transform, and oxidize organic carbon (photosynthesis, digestion and biosynthesis, cellular respiration, and combustion), and we compare their explanatory approaches to those of American students. Our work was guided by the following research questions:
(1) How well do the American learning progression framework and American assessments describe and measure the proficiency of Chinese students?
(2) What are the differences among Chinese students in grades 7, 8 and 9 in their explanations of carbon-transforming processes?
(3) What are the similarities and differences between American and Chinese students in their explanations?

Participants
In this study, we collected written responses to assessment instruments from 337 students (grades 7, 8, and 9) at one public school in Sichuan province in November 2015. Table 1 lists the number of Chinese students at each grade level. We compared the Chinese students' written responses with those of American middle school students who participated in the Carbon TIME project during the 2015-16 school year. American teachers administered the 2015-16 Carbon TIME Assessments to their students as baseline tests (at the end of the school year before the teachers participated in the Carbon TIME program: 2,287 students), pretests (at the beginning of the school year in which the teachers participated in Carbon TIME: 3,200 students), and posttests (at the end of Carbon TIME instruction: 2,106 students). In addition, we interviewed 12 Chinese middle school students from Beijing and Guangzhou and compared their responses with those of 50 American middle and high school students from Carbon TIME (Miller, Johnson, Freed, Doherty, & Anderson, 2017).
Our data come from a convenience sample and provide only a glimpse of how Chinese and American students construct model-based explanations of carbon-transforming processes.
Therefore, our conclusions may not apply to all Chinese and American students.

Instruments
The written assessments and interview protocol from the Carbon TIME project were translated into Chinese by the first author and used to collect data from Chinese students. To make sure the translation was accurate and easy for Chinese students to understand, two additional researchers from the Carbon TIME project reviewed the translations: one was a native Chinese speaker, and the other was a native English speaker who had worked in China for several years and was proficient in Chinese. We revised the translations based on their comments. After confirming that the tests and interview protocol were fully understandable, we administered them to Chinese students.
The written assessments were designed to assess students' progress along the carbon cycling learning progression and included three alternate forms: Form A had 13 assessment items, and Forms B and C had 12 items each. Some items served as linking anchor items and appeared on more than one form. Most items had two parts: a multiple-choice or multiple true/false part, followed by a constructed-response part that required students to explain their choice. Each form included three types of items: Carbon LP (Learning Progression) items, Inquiry LP items, and Large-scale-LP items. Because the IRT analyses of the Carbon LP items in the 2015-16 datasets were more reliable, we chose the Carbon LP items to compare how American and Chinese students explain carbon-transforming processes. The interview protocol (see Appendix A) was an expanded version of the written tests and included seven tasks designed to elicit more thorough student reasoning about carbon-transforming processes.

Scoring Process
Chinese students' responses to the written assessments were scored using rubrics that correspond to the response characteristics described at each of the four learning progression levels in the Carbon TIME project. An example scoring rubric is given in Appendix B. Students' written responses were coded at levels 2 through 4. (Test items were not designed to elicit Level 1 responses, which are more common among elementary students than among middle school students.) Ten percent of the responses were randomly selected, translated into English by the first author, and double scored by a second rater to check scoring quality and reduce scoring errors. If agreement on the assigned codes for this 10% sample was below 90% for an item, the raters met to discuss issues with the scoring rubric or its interpretation and re-scored the responses to that item. Inter-rater reliability (agreement) between the first and second raters exceeded 90% for all items.
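The agreement check described above can be sketched as follows. This is a minimal illustration, assuming the check is simple percent agreement on the double-scored sample; the function names and the rater codes are hypothetical and are not study data.

```python
from typing import Sequence


def percent_agreement(rater1: Sequence[int], rater2: Sequence[int]) -> float:
    """Proportion of responses to which both raters assigned the same level."""
    if len(rater1) != len(rater2) or len(rater1) == 0:
        raise ValueError("code lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(rater1, rater2) if a == b)
    return matches / len(rater1)


def needs_rescoring(rater1: Sequence[int], rater2: Sequence[int],
                    threshold: float = 0.90) -> bool:
    """Flag an item for discussion and re-scoring when agreement is below the threshold."""
    return percent_agreement(rater1, rater2) < threshold


# Hypothetical level codes (2-4) from two raters on ten double-scored responses.
first_rater = [2, 3, 3, 4, 2, 3, 2, 3, 4, 3]
second_rater = [2, 3, 3, 4, 2, 3, 2, 2, 4, 3]
print(percent_agreement(first_rater, second_rater))  # 0.9
print(needs_rescoring(first_rater, second_rater))    # False
```

Percent agreement does not correct for chance agreement; it is used here only because the text reports a 90% agreement criterion.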
The first and second raters coded students' responses based on the indicators for each learning progression level in the scoring rubric. Chinese responses were classified according to whether they matched the indicators in the American scoring rubric. We found some mismatches between Chinese students' responses and the American scoring rubric. For example, some Chinese students attributed heat generation to photosynthesis or to a reaction between the leaves and oxygen, explanations that did not appear in American students' responses. To account for this discrepancy, we assigned level 3 to "photosynthesis" responses and level 2 to "reaction between the leaves and oxygen" responses, and classified these Chinese responses as "different".
Most of the American students' written responses were scored by computer. Scoring composite items that include both forced-choice (FC) and constructed-response (CR) components is expensive and time-consuming because human coders must read and score each CR component. Therefore, machine learning was used for automated scoring of student responses (Thomas, Kim, & Draney, 2018). A machine learning engine (LightSide Researcher's Workbench) was used to extract information at the SUBLEVEL category and was able to detect patterns in FC responses that were not part of the human scoring rubric. A model was considered acceptable if it produced a quadratic weighted kappa (QWK) greater than 0.7 with the training set (Fleiss & Cohen, 1973). If an acceptable model could not be built, problematic answers were back-checked by an expert human coder to determine whether the error was of human or computer origin. Once the revised human codes replaced the problematic codes, the model was rebuilt and tested (Thomas, Kim, & Draney, 2018). For the 2015-16 data, 30 of 31 models had at least a marginal QWK of 0.6 or greater when back-coded by human scorers using a stratified random sample.
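The QWK acceptance rule can be illustrated with a short sketch. It is a plain-Python implementation of the standard quadratic weighted kappa, not the LightSide engine or the Carbon TIME scoring pipeline; the function name, the 2-4 score range, and the toy human and machine codes are hypothetical. scikit-learn's `sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")` should return the same value.

```python
from typing import List, Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             min_score: int, max_score: int) -> float:
    """Chance-corrected agreement between two ordinal codings; large disagreements cost more."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("need at least two score categories")
    n = max_score - min_score + 1
    total = len(a)
    # Observed joint counts of (score from a, score from b).
    observed: List[List[int]] = [[0] * n for _ in range(n)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1
    # Marginal counts for each coder, used to build the chance-expected matrix.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0:
        return 1.0  # degenerate case: both codings use a single identical category
    return 1.0 - numerator / denominator


# Hypothetical human and machine level codes for four responses.
human = [2, 2, 3, 4]
machine = [2, 3, 3, 4]
qwk = quadratic_weighted_kappa(human, machine, min_score=2, max_score=4)
print(round(qwk, 3), qwk > 0.7)  # compare against the 0.7 acceptance cutoff
```

The quadratic weights make a level 2 versus level 4 disagreement four times as costly as a one-level disagreement, which suits ordinal learning progression codes.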
All Chinese students' interviews were translated into English and coded using a framework developed in a previous American study (Miller, Johnson, Freed, Doherty, & Anderson, 2017) that identified the four progress variables described above. Because the main purpose of analyzing the interviews was to characterize similarities and differences in how American and Chinese students reasoned, we did not calculate inter-rater reliability for this analysis.

Data Analysis
Descriptive statistics and qualitative analyses were used to summarize the general patterns in American and Chinese students' responses to the written assessments and interviews. To address the three research questions, we conducted three sets of Item Response Theory (IRT) analyses. First, we fitted the unidimensional Partial Credit Model (PCM; Masters, 1982) to the ordinally scored student responses, which were coded by learning progression level. In the PCM, the conditional probability that person $p$ with ability $\theta_p$ responds with category score $m$ on item $i$, which has step difficulty parameters $\delta_{ik}$, is defined as:

$$P(X_{pi} = m \mid \theta_p) = \frac{\exp\left(\sum_{k=1}^{m} (\theta_p - \delta_{ik})\right)}{\sum_{h=0}^{M_i} \exp\left(\sum_{k=1}^{h} (\theta_p - \delta_{ik})\right)}, \quad m = 0, 1, \ldots, M_i,$$

where $M_i$ is the maximum score on item $i$ and the empty sum for $m = 0$ (or $h = 0$) is defined as 0. This IRT model produced estimates of the step difficulties between levels of each item, proficiency estimates for students, and fit statistics for individual items and persons (students). We generated a Wright map, an item fit plot, and a person fit histogram to present these results, which provide validity evidence based on the internal structure of the Carbon assessments. Second, we fitted a unidimensional latent regression IRT model based on the PCM to the Chinese data to examine differences in overall proficiency among the three grades. Third, we conducted a differential item functioning (DIF) IRT analysis on the merged American and Chinese data to compare performance on individual items between the two countries; DIF shows whether Chinese and American students of the same overall proficiency respond differently to a given item. This analysis sets the overall average proficiency of both American and Chinese students to 0 and then compares item difficulties, so a negative DIF parameter means that the item was easier for that group.
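The PCM category probabilities can be sketched in a few lines of Python. This only evaluates the model for given parameters; it does not estimate them as ConQuest does. It assumes that an item's observed levels 2, 3, and 4 are coded as categories 0, 1, and 2, and the function name and step difficulties are hypothetical.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, deltas: Sequence[float]) -> List[float]:
    """Category probabilities P(X = m | theta), m = 0..len(deltas), under the Partial Credit Model.

    deltas[k - 1] is the step difficulty for moving from category k - 1 to category k.
    """
    # Cumulative sums psi_m = sum_{k=1}^{m} (theta - delta_k), with psi_0 = 0 (the empty sum).
    psi = [0.0]
    for delta in deltas:
        psi.append(psi[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating to avoid overflow.
    largest = max(psi)
    weights = [math.exp(p - largest) for p in psi]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item scored at levels 2, 3, 4 (categories 0, 1, 2)
# with step difficulties of -0.5 and 1.0 logits.
probs = pcm_probabilities(theta=0.0, deltas=[-0.5, 1.0])
for level, p in zip([2, 3, 4], probs):
    print(f"level {level}: {p:.3f}")
```

When $\theta_p$ equals a step difficulty $\delta_{ik}$, the two adjacent categories $k-1$ and $k$ are equally likely, which is how the step parameters are usually read.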
The ConQuest software (Wu, Adams, Wilson, & Haldane, 2007) was used for all IRT analyses, and the R package 'WrightMap' was used to draw Wright maps and item fit plots (Irribarra & Freund, 2014).

Research Question 1: Validity of the Learning Progression Framework for Chinese Students
We provide three kinds of evidence to verify whether or not the American assessments and carbon cycle learning progression framework work for Chinese students: IRT analysis, comparisons of American and Chinese students' constructed responses, and patterns in students' interviews.

Evidence from IRT Analysis
In Figures 1 and 2, the round and triangle dots are the thresholds, which are indicators of "score difficulties". The threshold for a score category is defined as the ability at which the probability of achieving that score or higher reaches 50%. The round dots represent the difficulty for achieving a score of 3 and above, or the thresholds from level 2 to 3. The triangle dots represent the difficulty for achieving a score of 4 and above, or the thresholds from level 3 to 4.
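The threshold definition above can be computed numerically. This sketch assumes PCM step difficulties are already known and finds the ability at which P(score ≥ k) = 0.5 by bisection. With levels 2-4 coded as categories 0-2, k = 1 corresponds to the round dots (level 2 to 3) and k = 2 to the triangles (level 3 to 4). The function names, search bounds, and step values are hypothetical, and ConQuest's own routine may differ in numerical detail.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, deltas: Sequence[float]) -> List[float]:
    """Partial Credit Model category probabilities for categories 0..len(deltas)."""
    psi = [0.0]
    for delta in deltas:
        psi.append(psi[-1] + (theta - delta))
    largest = max(psi)
    weights = [math.exp(p - largest) for p in psi]
    total = sum(weights)
    return [w / total for w in weights]


def prob_at_least(theta: float, deltas: Sequence[float], k: int) -> float:
    """P(X >= k | theta): probability of reaching category k or higher."""
    return sum(pcm_probabilities(theta, deltas)[k:])


def thurstonian_threshold(deltas: Sequence[float], k: int,
                          lo: float = -10.0, hi: float = 10.0,
                          tol: float = 1e-8) -> float:
    """Ability at which P(X >= k) reaches 0.5, found by bisection.

    Under the PCM, P(X >= k) increases with theta, so the 0.5 crossing is unique.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least(mid, deltas, k) < 0.5:
            lo = mid  # still below 50%: the threshold lies at higher ability
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical step difficulties for one item (categories 0, 1, 2 = levels 2, 3, 4).
deltas = [-0.5, 1.0]
print(round(thurstonian_threshold(deltas, k=1), 3))  # level 2 -> 3 threshold
print(round(thurstonian_threshold(deltas, k=2), 3))  # level 3 -> 4 threshold
```

These cumulative thresholds are always ordered, even when the PCM step parameters themselves are not, which makes them convenient for plotting on a Wright map.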

Figure 2 The Wright Map for American Students on Carbon LP Items
Based on our IRT analyses, we found that the patterns of learning progression levels were generally similar for American and Chinese students (Figures 1 and 2). Most of the thresholds from level 2 to 3 fell in the same logit range and were well separated from the thresholds from level 3 to 4, suggesting that the carbon cycle learning progression framework from the Carbon TIME project can validly classify the responses of the Chinese student sample and measure Chinese students' reasoning on carbon-cycling items. Only a few items had level 2-to-3 and level 3-to-4 thresholds that were not well separated.
Item fit statistics (Figure 3) indicate how well the data for an individual item fit the IRT model. Each dot represents the fit statistic for one item. Items with low fit statistics show less random variation than expected and are usually not a concern. Items with high fit statistics, however, show more random variation than expected, meaning that many high-performing students are doing worse than expected on the item and/or many low-performing students are doing better than expected. Item fit statistics therefore provide evidence for internal-structure validity. The mean-square statistics (MNSQs) of all items were within the acceptable range of 0.6 to 1.4 (Wright, Linacre, Gustafsson, & Martin-Löf, 1994), suggesting that the items fit acceptably and can be used for pretest and posttest analysis.
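The mean-square statistics can be sketched as follows, assuming PCM probabilities and known person and item parameters. The sketch computes the unweighted (outfit) mean square, the average squared standardized residual, and the weighted (infit) mean square, the sum of squared residuals divided by the sum of model variances. The text does not specify which version Figure 3 reports, so both are checked against the 0.6-1.4 range; function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pcm_probabilities(theta: float, deltas: Sequence[float]) -> List[float]:
    """Partial Credit Model category probabilities for categories 0..len(deltas)."""
    psi = [0.0]
    for delta in deltas:
        psi.append(psi[-1] + (theta - delta))
    largest = max(psi)
    weights = [math.exp(p - largest) for p in psi]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(probs: Sequence[float]) -> Tuple[float, float]:
    """Model-expected score and its variance for one person on one item."""
    expected = sum(m * p for m, p in enumerate(probs))
    variance = sum((m - expected) ** 2 * p for m, p in enumerate(probs))
    return expected, variance


def mean_square_fit(scores: Sequence[int], thetas: Sequence[float],
                    deltas: Sequence[float]) -> Tuple[float, float]:
    """Unweighted (outfit) and weighted (infit) mean-square statistics for one item."""
    squared_z = []
    squared_residual_sum = 0.0
    variance_sum = 0.0
    for score, theta in zip(scores, thetas):
        expected, variance = expected_and_variance(pcm_probabilities(theta, deltas))
        residual = score - expected
        squared_z.append(residual ** 2 / variance)  # squared standardized residual
        squared_residual_sum += residual ** 2
        variance_sum += variance
    outfit = sum(squared_z) / len(squared_z)
    infit = squared_residual_sum / variance_sum
    return outfit, infit


def within_range(mnsq: float, lower: float = 0.6, upper: float = 1.4) -> bool:
    """Check a mean-square value against the acceptable range."""
    return lower <= mnsq <= upper


# Hypothetical data: five students' scores on one item (categories 0-2) and their abilities.
scores = [0, 1, 1, 2, 2]
thetas = [-1.5, -0.2, 0.3, 1.1, 2.0]
outfit, infit = mean_square_fit(scores, thetas, deltas=[-0.5, 1.0])
print(round(outfit, 2), round(infit, 2), within_range(outfit) and within_range(infit))
```

The same formulas give the student fit statistics discussed below when the sums run over one student's items instead of over the students who answered one item. Values near 1 indicate variation close to what the model predicts.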

Figure 3 The Item Fit Plot for Carbon LP Items
Low student fit statistics indicate very regular sets of responses, and high student fit statistics indicate more random variation than expected. Students with fit statistics below the left blue line (the lower bound of the acceptable range) show very regular responses. A large proportion of students (40.36%) fall in this range (Figure 4), which is additional evidence that student reasoning is consistent across a wide variety of contexts.

Figure 4 The Overall Student Fit Statistics
Based on the evidence from Wright maps, item fit statistics and student fit statistics, we could make a preliminary conclusion that the American assessments and American carbon cycle learning progression framework appropriately describe and measure the proficiency of Chinese students.

Evidence from comparisons of constructed responses by American and Chinese students
We compared the written responses from American students with those from Chinese students for specific items to further explore whether American assessments and the American carbon cycle learning progression framework were appropriate for Chinese students. To do this, we chose one to two items from each phenomenon (plant growth and movement, animal growth and movement, burning and decay) and compared the kinds of written responses given by both sets of students. We found that the majority (71%) of Chinese responses (878/1231) were qualitatively similar to the American ones (Table 2). Very few Chinese responses were not codeable. Notably, 24% of the total Chinese responses were left blank (i.e., "non-response"), compared to only 9% of total American responses.

Evidence from the patterns in interviews with American and Chinese students
Like the written assessments, the interviews generally showed that the American learning progression framework could be used to analyze the Chinese students' ways of talking about carbon-transforming processes.
Chinese and American students had similar reasoning patterns for each progress variable. This shows that the similarities were apparent in spoken language as well as written responses. A more detailed comparison of American and Chinese students' interview responses is included in the results for Research Question 3, below.

Research Question 2: Comparing Students in Grades 7, 8 and 9
We compare the overall proficiency of Chinese students in different grades and provide the mean percentage of students' constructed responses at each grade for each learning progression level to explore the differences among them.
The overall proficiency of students in grades 7, 8 and 9

Figure 5 Estimated Latent Mean of Student Proficiency
We observed a significant increase in Chinese students' proficiency from grade 7 to grade 9 (Figure 5). The IRT analyses estimate student proficiency in logits (log-odds units), which express how likely a student of a given proficiency is to succeed on a particular item; the zero logit is set to the student average.

Overall Changes in Estimated Latent Mean of Student Proficiency
Note: Mean Difference = (1) mean of grade 7 students' proficiency − mean of grade 8 students' proficiency; (2) mean of grade 7 students' proficiency − mean of grade 9 students' proficiency; (3) mean of grade 8 students' proficiency − mean of grade 9 students' proficiency.
Interestingly, the largest increase in proficiency between successive grades occurred between grades 8 and 9, with a smaller, though still significant, increase between grades 7 and 8 (Table 3). The data in Table 4 were calculated from the percentage of Chinese students at each learning progression level (see Appendix C). The majority of grade 7 students were at level 2, the majority of grade 9 students were at level 3, and grade 8 students were divided between levels 2 and 3.

Research Question 3: Similarities and Differences between American and Chinese Students
We summarize the similarities and differences between American and Chinese students based on quantitative and qualitative analyses. The quantitative analyses of written tests included comparisons of estimates of students' overall proficiency and analyses of differential item functioning (DIF). Qualitative analyses of students' written and spoken language showed patterns of both similarity and difference.

Similarities and Differences between American and Chinese Students on Written Assessments
American middle school students who completed at least three units of the Carbon TIME curricula and then took the posttests were approaching Level 4, while American students' pretest performance was at Level 2, suggesting that the Carbon TIME curricula considerably improved students' understanding of carbon-transforming processes (Figure 6). The majority of Chinese students (who took only the pretests) were at Level 2, suggesting that Chinese students and American students at pretest had similar average proficiencies (Figure 6). Table 5 summarizes the DIF analysis: two forced-choice (FC) items were easier for American students, while eight FC items were easier for Chinese students. In addition to the quantitative analyses, we made qualitative comparisons between the constructed responses of American and Chinese students. Key similarities and differences are summarized below.

Similarities
Difficulty connecting systems at different scales. First, many American and Chinese students in our sample could not connect the large or macroscopic scale to the atomic-molecular scale. For example: • The item BRNMATCHEN asks students where the heat and light energy comes from when a match burns. 53% of the American students and 29% of the Chinese students mentioned that because a flame needs air to burn, the air must have the most energy, and the heat and light energy comes from the air; or because the match would never have started to burn without the person, the heat and light energy comes from the person who struck the match. To them, if something is needed, it will provide energy.
They only paid attention to the phenomenon at the macroscopic scale.
• The item BODYHEAT2 asks students how food contributes to people's body heat. 53% of the American students and 32% of the Chinese students thought that since we eat food, food is changed to energy. They did not connect the food that people eat at the macroscopic scale to the chemical energy stored in the food at the atomic-molecular scale, explaining this phenomenon only at the macroscopic scale.
Attributing macroscopic properties to atoms and molecules. Second, when American and Chinese students tried to explain a phenomenon by connecting the atomic-molecular scale with the large or macroscopic scale, many students failed to recognize the special properties of atoms and molecules. For example, some American students responded to the item FATLOSS by saying that the atoms in the fat of a person who loses weight shrink or get smaller or decrease in size. Some Chinese students had similar responses. These students used what they know or see at the macroscopic scale to explain what they do not know or cannot see at the atomic-molecular scale. In essence, they interpreted the macroscopic scale and the atomic-molecular scale as being the same.
Distinguishing matter from energy. Third, many American and Chinese students in our sample were unable to trace matter and energy separately and instead described implicit or explicit matter-energy conversions. For example: • The BRNMATCHMAT item asks students why the ashes of a match weigh less than the original match after it burns. 46% of the American students and 41% of the Chinese students said that the match or some of the match would turn into heat and light energy as it burns.
• The item FATLOSS asks students what happens to the atoms in a man's fat when he exercises and loses weight. 49% of the American students and 36% of the Chinese students thought the atoms in the fat of a person who loses weight are converted into energy or heat. For example, some American students said, "The fat turns into energy when he exercises." Some Chinese students said, "Those atoms will be transformed into energy." Although American and Chinese students in our sample were generally aware of the conservation of matter and energy, many of them did not account for matter and energy separately or apply these principles correctly to everyday contexts.

Differences
The key differences between the American and Chinese students are (1) Chinese students were reluctant to write their ideas when they did not know a scientifically correct answer, while American students were more willing to express their ideas, (2) American students paid more attention to the environmental impact of human behaviors than Chinese students, and (3) Chinese students used chemical equations to explain chemical changes much more often than American students. These key differences are as follows:

Expressing ideas in constructed responses. Overall, 26% of Chinese students left open-response items blank, compared to 9% of American students for the same items. Moreover, the DIF analysis shows that American students were relatively more successful on the constructed-response explanation portions of items, while Chinese students were relatively more successful on forced-choice responses. Both findings suggest that Chinese students were reluctant to write their ideas when they did not feel confident that they knew a scientifically correct answer.

Environmental awareness.
American students wrote more about the environmental impact of human behaviors. For example, the item FLBULBS1112 asks whether replacing incandescent light bulbs with fluorescent light bulbs, which use less energy, can reduce the amount of carbon dioxide going into our atmosphere. Around 12% of the American students connected the burning of fossil fuels or coal with the release of CO2, compared to only 6% of the Chinese students.
The items ARCTICICEONE and ARCTICICEFIVE ask students to predict how the extent of Arctic sea ice will change one year and five years after November 2013, and to explain why. Some Chinese students attributed the decrease in Arctic sea ice extent to global warming or the greenhouse effect without explaining how this happens, while more American students explained the reasons or described global warming in detail. For example, some American students said, "Due to global temperatures rising at an alarming rate, sea ice has dramatically decreased in coverage. This means that in the future arctic sea ice will occur further and further north over a small area." Another wrote, "Earth-friendly inventions are in the process of getting tested. Electric cars and solar panels are in more use now, however, it takes a long time to get sea ice to form again, therefore 11.0 msg is possible, but not likely yet."

Using chemical equations.
Chinese students used chemical equations to explain chemical changes much more often than American students did. For example, the item OCTAMOLE asks students what happens to the atoms in octane when it burns inside a car. About 5% of the Chinese students used chemical equations to explain the chemical change that occurs when octane burns, while only 0.3% of the American students did so. Notably, Chinese students at level 3 or 4 in our sample often wrote partially correct or correct chemical equations, while American students at level 3 or 4 rarely did.

Similarities and Differences between American and Chinese Students in Interviews
The majority of interviews with Chinese and American students showed similar patterns for each of the four progress variables in the interview protocol, although there were two key differences in the ways these groups of students reasoned (Tables 6 and 7).
For the first progress variable, "context-specific knowledge", the majority of American and Chinese students focused on actors and enablers for a purpose (growth, movement, survival) instead of offering scientific details, and offered limited or incorrect knowledge (Table 6). For the second, "orientation towards principles of matter and energy", students in both groups violated the conservation of matter and energy. For the third, "precision in matter and energy words", students in both countries lacked understanding of molecules and atoms, misused "nutrients", and conflated matter, energy, and nutrients. For the fourth, "scale", students in both groups omitted the atomic-molecular scale and used the language of the atomic-molecular scale in ways that treat molecules as macroscopic materials.

Example:
I: Okay. So could you divide the pictures into groups in terms of how matter changes during the event? S: They're kind of -they would probably be in the same -in about the same way since they would be using -their matter may be going -they would be creating matter.

Example:
I: Of the three [carbon dioxide, gasoline, and dead wood], which would you say has the most stored energy? S: [carbon dioxide, gasoline] have more stored energy maybe because the deadwood now that it is dead is like lost its energy.

Example:
I: Can you explain each group [baby girl growing, tree growing, girl jumping; flame burning, car running; tree decaying]? S: The matter during baby girl growing, tree growing, and girl jumping changes and increases. The matter during flame burning and car running burns up and disappears. The matter during the tree decaying also decreases.

Example:
I: Why did you think dead wood, sand, glass, oxygen and CO2 have less stored energy? S: Because the dead wood is already dead, I don't think it plays a great role in the nature.

Scale
Patterns:
(1) Omit the atomic-molecular scale and rely on macroscopic descriptions by naming only materials and processes to explain phenomena.
(2) Use the language of the atomic-molecular scale, but in ways that treat molecules as macroscopic materials that either stay intact or change from one material to another during processes.

Example:
I: So where does the energy go inside the tree? S: Well, when it is absorbed through the leaves it has photosynthesis in the leaves and it basically becomes food for the tree.

Example:
I: So how does [the tree] use CO2 to grow? S: Well that's like our oxygen. We breathe in and we breathe out carbon dioxide. It breathes in the carbon dioxide and then breathes out oxygen. It's kind of like its lungs takes it in and then uses that as part of its nutrients and for it to stay alive longer.

Example:
I: How does the tree use sunlight to grow? S: It uses sunlight to do photosynthesis to produce nutrients.

Example:
I: Do you think that anything is going out of the leaf cells as the leaf grows? S: Waste and oxygen. I: How is the oxygen produced? S: The plant breathes in CO2 and turns it into oxygen in chloroplasts.
There were two key differences in the ways Chinese and American students reasoned about carbon-transforming processes (Table 7).

Example 2: I: Okay. Do you think that anything is going into or out of the leaf cells as the leaf grows? S: Yes. I think when the cells expand, it gets a space in between like the cells walls are getting thinner and thinner and thinner because they're expanding...

Example 1: I: What do the leaf cells do as the leaf grows? S: The numbers of cells increase. The substances in cells increase.
I: Which substances will increase? S: For example, vacuole and chloroplast.

Example 2: I: What do the leaf cells do as the leaf grows? S: The numbers of cells will increase.
Note: "I" represents "Interviewer", "S" represents "Student".

Structure and function vs. hierarchy of structures:
One general pattern is that American students gave more explanations that related structures to functions, while Chinese students frequently described hierarchies of structures; that is, their answers named more structures at different scales rather than explaining how those structures work.

Size vs. Numbers:
Another general pattern is that, when explaining how organisms grow, American students mainly said that things grow because they expand or get bigger, while Chinese students mainly said that organisms grow because the number of substances in them increases.

DISCUSSION AND IMPLICATIONS
The results suggest that both American and Chinese students have difficulty constructing model-based explanations of carbon-transforming processes. In this section, we hypothesize about reasons for these difficulties, suggest ways to improve students' understanding of carbon-transforming processes, and propose improvements in science education in each country.

Discussion
We draw several conclusions from the similarities and differences between the American and Chinese students and discuss their possible underlying causes.
Most American and Chinese students do not trace matter and energy separately when they explain carbon-transforming processes.
One reason students struggle to trace matter and energy separately is that they have difficulty connecting knowledge in one discipline to knowledge in another. This is especially true for energy: students learn energy concepts and the principle of energy conservation in their physical science classes, but then have trouble applying this energy-related knowledge to biology. Second, it is difficult for students to understand how energy transformation works. They learn about simple organic reactions and about chemical changes in simple physical contexts, but chemical changes in complex biological contexts are challenging and require teacher assistance. Finally, the crosscutting concept "energy and matter: flows, cycles, and conservation" is not emphasized in the curriculum, so teachers rarely focus on helping students understand how energy flows in biological systems. As a result, students struggle to trace energy in biological contexts.

American and Chinese students need to learn how to connect systems at different scales when they construct explanations of carbon-transforming processes.
We suggest that one reason students have problems connecting systems at different scales is that the crosscutting concepts (especially "scale, proportion, and quantity" and "systems and system models") are less privileged in the curriculum, and students have only limited opportunities to understand and apply those crosscutting concepts in biological systems. To think scientifically about processes and systems, students need to recognize that biological processes occur within a hierarchy of systems at different scales.

American and Chinese students have different explanatory ideals for structures and growth of organisms.
Previous research (Toulmin, 1961; Hesse & Anderson, 1992) […] all structures at different scales inside cells, while American students think that naming core structures and describing the functions of those structures is a good explanation. The difference in how students explained the growth of organisms suggests that size-related explanations of growth are explanatory ideals for American students, while for Chinese students number matters much more than size.
American and Chinese students therefore hold somewhat different explanatory ideals when they explain the structures and growth of organisms. When we think about what a good explanation is, we need to consider the effect of school curricula in different countries on students' explanatory ideals.
Chinese students are more reluctant to write their informal ideas.
The results showed that one of the differences between American and Chinese students is that more Chinese students chose to leave a question blank when they did not know the answer, while more American students were willing to answer questions, even if incorrectly. American and Chinese students have different cultures and science education systems. We hypothesize that Chinese culture and the Chinese exam system may have a strong impact on Chinese students' responses.
Education is aligned with culture because education is itself a component of culture. Since traditional Chinese culture holds Confucian culture at its core, Confucianism influences every aspect of traditional Chinese education, from educational philosophies and values to educational content and methodologies of education (Gu, 2013). The Confucian-heritage culture and educational values have both positive and negative sides.
Regarding the negative side, some scholars have commented that one word could summarize Chinese education over thousands of years: obedience (Gu, 2013). Overemphasis on obedience means that students are hesitant to think, speak, or explore new paths. We therefore speculate that many Chinese students in our study may have felt uncomfortable venturing a new idea, worried that it might be incorrect, and so left the answer blank.
Additionally, the exam-oriented education system in China emphasizes high-stakes testing, and learning is aimed mainly at passing examinations. In China, students face numerous examinations as soon as they start school (Qi, 2004). Examinations play a pivotal role in student success. Focusing solely on exams often comes at the cost of students' critical thinking, imagination, and creativity (Schmitz, 2011). In the Chinese exam system, mastery of scientific knowledge is privileged over innovation. In addition, exams in China are primarily summative assessments that focus on formal scientific knowledge. Although some classroom tasks and tests elicit students' informal understanding and explore their initial thinking about everyday phenomena, students' informal understanding and formative assessment are less privileged.
However, the tests in the Carbon TIME project focus on students' informal and formal reasoning, innovative thinking, explanations about new phenomena, and using models to address new problems. The tests in the Carbon TIME project and typical Chinese exams thus have different goals and expectations for students.
Therefore, Chinese students who are used to exams that emphasize stating scientific knowledge in formal language may be more likely to leave a question blank than to venture an idea based on their informal knowledge.
We must also note that the assessment instructions given to American and Chinese students were different, which may partly explain why many Chinese students left items blank. The instructions for the American students encouraged them to express their ideas and write anything they wanted to say, while the instructions for the Chinese students only stated that students needed to write their answers on the answer sheet, without encouraging them to express their thinking and ideas boldly.
Chinese students need to know more about the connections between science and environmental or social impacts through some science topics.
Another difference between the American and Chinese students in this work was that Chinese students wrote less about the relationship between ecosystems and human beings than American students did. The relationship between human activities and the release of carbon dioxide, and the negative effect of excess carbon dioxide on the global ecosystem, are rarely included in Chinese students' textbooks. To better develop students' environmental literacy, it is imperative for Chinese science educators to emphasize science topics that have important environmental and social impacts.

Implications
There are many similarities between the ways that Chinese and American students make sense of carbon-transforming processes, suggesting that learning progression (LP) frameworks and assessments developed in one country can be useful in the other. Moreover, neither country has middle school curricula and teaching strategies that successfully enable three-dimensional learning for most students, which creates opportunities for the two countries to work together to develop three-dimensional learning. The significant differences between pretest and posttest performances of American students in this research suggest that the Carbon TIME curriculum is effective for improving American students' three-dimensional learning, implying that this curriculum may also work for Chinese students.
The American science education reform movement emphasizes three-dimensional learning, which is the integration of content knowledge, crosscutting concepts and practices. Our research indicates two notable challenges for students when constructing model-based explanations: They do not trace matter and energy separately or connect systems at different scales, which are both crosscutting concepts in the three-dimensional learning framework that is used for the American Next Generation Science Standards (Achieve, 2013).
Currently, curriculum reform movements in the United States emphasize that students need opportunities to use crosscutting concepts and practices in multiple contexts in order to develop their capacity to address new problems, which is an important goal of science learning (National Research Council, 2000, 2007). Our work suggests that curricula should integrate crosscutting concepts and engage students in practices that use those concepts.
Based on our work, Chinese science teaching and learning can also benefit from this approach, because many Chinese students also struggle to use crosscutting concepts and to apply unfamiliar concepts or rules to novel problems they have never encountered before. Chinese educators need to consider whether to adopt and implement three-dimensional learning.
Furthermore, to improve science education in each country, another needed reform is to improve educational assessment systems. One way to do this is to use learning progressions. The assessments in the Carbon TIME project interpret students' responses using the learning progression framework; the project also uses tools and activities to score students' responses and to administer summative tests.
Learning progressions focus on students' informal and formal scientific ideas and provide useful tools for formative and summative assessments.
However, educational assessment is increasingly perceived less as a technical matter of measurement and more as a sociocultural practice of teachers and students in the classroom, so assessment systems are embedded in social and cultural contexts. Many American teachers still struggle to understand three-dimensional learning and to apply it effectively in their classrooms in a scientifically rigorous way (National Research Council, 2000, 2007; Thompson, Hagenah, Kang, Stroupe, Braaten, Colley, & Windschitl, 2016). For example, regarding formative assessment, we need to consider how to transfer the formative assessment practices used to improve science teaching and learning in Western educational systems to the Chinese context.

REFERENCES
Thomas, J., Kim, J. H., & Draney, K. (2018). Machine scoring and IRT analysis. Paper presented at the annual meeting of NARST, Atlanta, GA.
Thompson, J., Hagenah, S., Kang, H., Stroupe, D., Braaten, M., Colley, C., & Windschitl, M. (2016). Rigor and responsiveness in classroom activity. Teachers College Record, 118(5).

Appendix A: Interview Protocol
Interview Protocol for Carbon TIME Project

Preparation for Interview
Before you start, clearly articulate to yourself what specific information needs to be gathered about the student's learning in order to be useful for research. This helps you keep a clear focus on the intent of each question. As an interviewer, it may be useful to ask the student unscripted clarifying and follow-up questions in order to fully investigate their thinking. Examples of good questions are "What do you mean by that?", "Could you summarize that answer for me again?", and "Could you tell me in terms of [matter and energy] or [atoms and molecules]?" Examples of bad questions are "And the name of that process is…?" and "Remember how we did something like this in class?" These are very leading questions and tend to preclude many possible types of student responses.
Choose a quiet room with little distraction, and don't conduct multiple interviews in one room. Please make sure that the camcorder captures your interviewee's voice clearly; an interview with very bad sound quality will not be useful for analysis.
If you use an external microphone with the camera, don't forget to turn it on! If you have time, check for sound quality by recording a short segment and playing it back. Be sure that both you and the student can be heard clearly.
Possible things to explain to the student:
• The purpose of the interview is to understand how you explain processes in nature at this point in your learning. Don't be concerned about answering the questions correctly, as this will not be graded. I'd like you to tell me all that you can about what you know, and how you know it, even if you're unsure.
• I can't give you feedback about right and wrong during the interview, but at the end of the interview I can answer any questions you may have about our conversation.
• I will ask you sets of related questions, and I might take some time while I think about your response. I may also ask questions that sound repetitive, but I'm just trying to be sure that I've covered all the questions that I need to ask.
• This interview should take about 45 minutes. Do you have any questions before we get started?

I. Matter Association Questions: How Materials are Alike and Different
Purpose of the questions: How do students connect and relate different items? What kinds of materials do students relate to one another, and why? Do they associate organic and inorganic materials together? Or do they see items as connected in other ways? How do they understand the role of energy in common systems?
Reminder: Be sure to say aloud how the student grouped the pictures so that it will be recorded in the written transcript. Also say the name of the picture aloud any time a student refers to one card in particular.
ii. "Can you think of ways that these materials are different with respect to energy? How?"

II. General Tracing Questions: Tree Growing
Purpose of the questions: We want to know how students explain the growth of trees (and plants more generally). What "enablers" do students think a tree needs to grow? How do students think about trees and gas exchange? Upper-level students will know that the majority of tree biomass comes from carbon in the air. Water from the soil is an important input, as are small amounts of minerals from the soil (many students dramatically overestimate the amount of soil minerals taken up by plants).
[Use the cards for cross-process questions (6 cards: car running, tree growing, baby girl growing, girl jumping, tree decaying, flame burning). Show the 6 cards and tell the student what is happening in each card. Explain: "Each of these pictures is about an event: Something is happening."]

VI. Cross Process Questions: Ecosphere
Purpose of the questions: Students may be able to talk about particular processes like photosynthesis or respiration, but have a difficult time applying these processes to an ecosystem setting. Connections between multiple processes are important to understanding the movement of matter and the flow of energy in ecosystems. Upper-level students may be able to say that: The algae, shrimp, and bacteria rely on each other to survive. The algae photosynthesize by taking CO2 from the air and create new cells as they grow. The shrimp eat the algae and produce CO2 as they respire, and nutrients as waste. The bacteria eat the waste from the shrimp and also produce CO2 as they respire. All of the organisms need water, oxygen, and mineral nutrients, all of which are recycled throughout the ecosphere.
The ecosphere does exchange energy with the outside environment rather than just recycling energy within the ecosphere. Light allows the algae to photosynthesize and store energy in carbon bonds. All of the organisms use energy as they grow, move, and metabolize, and energy is released from the ecosphere as heat that is produced whenever organic carbon changes form.
[If the answer is yes] "What energy goes into the EcoSphere? What energy comes out?"
5. "If I put the EcoSphere in a dark room for one week, what do you think will happen? Why?"

VII. Vocabulary and Plant Structure Questions
Purpose of the questions: Students often struggle to talk about the way that organisms are organized into different systems at different scales, including cells, molecules, and atoms. In these questions you find out how well a student can "dig in" to a leaf down through the hierarchy of scales, and how they understand the processes of growth and gas exchange at different scales. During growth, leaf cells "make themselves" from water, minerals, and CO2. During gas exchange in the leaf, carbon atoms from CO2 are incorporated into sugars, which are then made into many different types of molecules that make up the plant.