Interactive computer assessment and analysis of students' ability in scientific modeling

Scientific modeling (SM) is a core scientific practice and critical for students' scientific literacy. Previous research has not used interactive computer assessment to investigate students' SM ability. This study aimed to explore an effective way in human-computer interaction to reveal the challenges faced by students in the four-element process of constructing, using, evaluating, and revising models. Contextualized in the solar system, eleven interactive tasks assessed 419 students in grades 4, 7


INTRODUCTION
Since information technology (IT) now penetrates all aspects of everyday life, its revolutionary influence naturally also affects educational assessment. The Program for International Student Assessment (PISA) (OECD, 2019), Trends in International Mathematics and Science Study (TIMSS) (Mullis & Martin, 2017), and National Assessment of Educational Progress (NAEP) (NAGB, 2019) have implemented a human-computer interaction (HCI)-based method, also known as interactive computer assessment (ICA), which is indicative of the international reform in educational assessment (Farrell & Rushby, 2016; Zainuddin et al., 2020). Compared with the paper-and-pencil tests used for decades, ICA is still relatively new. However, ICA did not appear spontaneously; it results from many empirical studies over the past half century. Studies have shown ICA's unique value in assessing higher-order thinking and abilities, and even attitudes (e.g., Baker & O'Neil, 2002; Kuo et al., 2019; Seifried et al., 2020). ICA has advantages over traditional modes of testing with respect to economy, safety, and convenience in implementation (e.g., Ko & Cheng, 2008; Nissen et al., 2018; Terzis et al., 2012). In the past 30 years, international education assessment reform has highlighted the interaction between artificial intelligence and the human brain in ICA tasks (Burkhardt & Pead, 2003; Silva et al., 2021; Thelwall, 2000; Thurlow et al., 2010). HCI in ICA enables a more intelligent assessment of student literacy, which can promote quality education in the 21st century more than ever before.
Scientific modeling (SM) is a scientific practice that involves higher-order thinking and is often viewed as a core component of scientific literacy (Schwarz et al., 2009). Science has been described as ''the process of constructing predictive and conceptual models for predicting or explaining relevant phenomena,'' which means that science is essentially an SM process (Gilbert & Justi, 2016). In this view, to improve students' scientific literacy, science education and assessment should focus on students' SM ability (Schwarz et al., 2022). Accordingly, the PISA framework for science assessment has the specific requirement of identifying, using, and generating explanatory models and representations (OECD, 2019, p. 108). The TIMSS science assessment requires students to collect data, present and organize relevant data in various intuitive ways, and explore the relationship between variables in the data (Mullis & Martin, 2017, p. 55). NAEP requires explaining relevant scientific principles, connecting different principles, and identifying theoretical models' data types (NAGB, 2019). China's science curriculum standards also include requirements regarding students' SM ability. For example, the Physics Curriculum Standards for high schools emphasize model-based teaching to contribute to students' key competencies of scientific thinking (MOE, 2017). China's national science education quality monitoring used a paper-and-pencil test in 2017 to assess students' ability to understand and use scientific models for explaining phenomena and solving problems. However, the number of items assessing SM was small and students' performance was poor (MOE, 2018).
Previous research has indicated that providing students with visual representations can facilitate students' SM (Barak & Hussein-Farraj, 2013; Chang, 2022; Chittleborough & Treagust, 2008). Computer-based interactive tasks can present visual representations of phenomena and provide timely feedback on students' performance, which can create a supportive SM environment for students. This study used an ICA tool around SM with Chinese students in compulsory education to investigate and promote the students' SM ability through a data-driven approach. Furthermore, previous studies have indicated that students' SM ability develops with maturity (Fortus et al., 2016; Pierson et al., 2017; Schwarz et al., 2012), but the relationship between other individual factors and SM is largely unclear to date. It is worthwhile to examine individual differences across the four-element process of SM. Hence, the research questions for this study were:
1. How do students in grades 4, 7, and 10 perform in the process of SM?
2. Does the SM performance of students vary by individual differences in gender, grade, time devoted to science learning, and interest in science learning?
3. Is the timely feedback in the ICA tool effective in promoting student modeling?

Cognitive processes of scientific modeling
SM broadly refers to the practice of constructing and using models to explain or predict corresponding phenomena (NRC, 2012). Models are a systematic abstraction, simplification, and characterization of a phenomenon (Justi & Gilbert, 2002). Judging from preexisting research, SM is complicated and challenging for students (Dori & Kaberman, 2012; Ruppert et al., 2019). Halloun (1996) divided students' SM process into five elements: the selection, construction, validation, analysis, and deployment of models. Students must first select a model framework according to their purpose of use, then reorganize the components and structure of the selected model to construct their own models, then use a variety of methods to assess these models, then use the models to analyze data collected about the phenomena or problems, and finally use the models to explain the phenomena or solve the problems. Schwarz et al. (2009) further refined the SM process of students from the perspective of cognitive processing as a ''four-element'' process of construction, usage, evaluation, and revision of a scientific model. Schwarz et al. (2009) defined the first element as model construction, which integrates the model selection and construction of Halloun's (1996) theory. Schwarz et al. (2009) deemed that students build an initial model according to their prior knowledge. Students may or may not refer to an existing similar model. They may need to detect the related components in the phenomenon and define the structure to build a new model. Second, students need to use their initial models to explain or predict the relevant phenomena, i.e., model use. Students should clarify the structure and function of the model they have constructed through practices such as phenomenon interpretation and problem-solving. Third, students evaluate their own models based on their explanatory or predictive power, i.e., model evaluation.
Students should apply the model to a new problem to identify the strengths and weaknesses of their own models. Fourth, students consider various evidence or new phenomena to revise their models according to the findings of model evaluation, i.e., model revision. Students should summarize and reflect on their models to further refine and improve their modeling products. In other words, students start from their own knowledge to develop their own models of a phenomenon through preliminary construction, usage, evaluation, and final revision.

Contribution to the literature
• This study developed and validated an effective ICA tool diagnosing students' performance during the four-element process of scientific modeling.
• Model evaluation and model revision were more challenging for students than model use and model construction.
• Only grade significantly predicted student modeling ability, while student interest in and time devoted to science learning did not, revealing a need for change in Chinese science classrooms.

This "four-element" process is consistent with the concept of interaction learning based on practice proposed by evidence-based practice and design-based practice (Cohen, 2011). The four elements also support high-quality instruction. Model-based instruction is not a manual activity for students or the studying of teacher-made models. Only by constructing, using, and correcting their own models can students develop their understanding of related phenomena (Peel et al., 2019; Zangori et al., 2017). For example, when students evaluate and revise models, they should consider the components that capture the phenomenon's important characteristics, the relationships between these components, and the rules governing these relationships (Krell et al., 2015). Therefore, the theory of the four elements of SM is pedagogically sound and practical. Schwarz et al. (2009, 2012) assessed students' SM from two dimensions, generative and dynamic. The generative dimension describes "the reflective practice for how models predict or explain aspects of phenomena when models are constructed or used," which takes SM as a generative tool for predicting or explaining the phenomena. The dynamic dimension describes "reflective practice for when and how models need to change when students evaluate and revise them," which reflects that SM is related to an individual's understanding of scientific knowledge and the nature of science. Schwarz et al. (2012) created four sub-dimensions that fit both the generative and dynamic dimensions. The first sub-dimension is ''attention to the model's level of abstraction,'' which describes models constructed by students as literal depictions of, or abstract generalizations of, phenomena. The second is ''attention to audience and communication clarity,'' which describes how well students attend to their audience's knowledge of the phenomena being modeled and the components that comprise their models.
The third is ''attention to evidence or authority,'' which describes the nature of the evidence or support used in students' models. The last sub-dimension is the ''nature of the relationship between model and phenomena,'' which describes how well the model could explain and predict the mechanism and process of phenomena.
Drawing on the generative and dynamic dimensions of Schwarz et al. (2009, 2012), and follow-up studies that presented some more detailed sub-dimensions (e.g., Fortus et al., 2016; Pierson et al., 2017), this study took the four-element theory as the process of SM and also referred to the two dimensions of Schwarz et al. to assess students' SM ability.

Assessment methods for scientific modeling
Preexisting research into students' model-based learning is mainly based on cognitive information processing theory (Gagné, 1975) and has used interviews (e.g., Grosslight et al., 1991; Justi & Gilbert, 2002), questionnaires (e.g., Chang & Chiu, 2009; Fortus et al., 2016), and classroom observations (e.g., Ke & Schwarz, 2021; Pierson et al., 2017). In early studies, some researchers (e.g., Chang & Chiu, 2009; Halloun, 1996; Lin & Chiu, 2008) used questionnaires to assess students' SM ability and their disciplinary knowledge. Based on the study of Halloun (1996), Chang and Chiu (2009) constructed a modeling structure with six elements: model selection, model construction, model validation, model analysis, model deployment, and model reconstruction. The researchers used this structure to test grade 10 students' SM ability and their knowledge of the ''galvanic cell,'' then analyzed and described students' performance level.
To assess grade 8 students' SM process and products, Pierson et al. (2017) used a combination of interviews, paper-and-pencil tests, classroom observations, and performance-based assessments. The study followed the students for a semester, during which students focused on the questions: (1) why is the soil in a garden moister in some places than in others? (2) how do roots interact with the surrounding environment? (3) how can plants help our communities? During the course, students learnt in the classroom, participated in the measurement of data in a garden, and used the data to construct, evaluate, and revise their own models. The longitudinal data collected by the researchers included video recordings of classroom, audio recordings of the researcher's conversations with students, students' written reflections on SM, students' SM products, and the researcher's field notes during the classroom observation. Students were also interviewed and tested at the end of the course.
In the study by Pierson et al. (2017), students constructed multiple different models, including diagrammatic models which reflected the relationship between roots and the environment, physical models which represented the physiological activities of plants, and simulation-based computational models developed by runnable simulation, which simulated the growth of roots. Pierson et al. (2017) indicated that students' computational models reflected most of their SM ability.
To summarize, most studies assessed students' knowledge of models rather than their SM practices and ability. Some studies have indicated that context is important in the assessment of SM ability and that the computational models are useful for assessing SM ability. Therefore, this study used computers to simulate specific contexts and designed interactive tasks to test students' SM ability to engage in the four-element process of SM.

Key Points for Designing Interactive Computer Assessment
Unlike paper-and-pencil tests, ICA provides two-way feedback: it can not only provide users feedback on their performance, but also ''learn'' from the users' responses and make changes accordingly (Carlson, 1994). Hence, the purpose of ICA is not to simply digitize paper-and-pencil tests, nor to replace performance-based assessments in real situations, but to make full use of HCI's advantages to assess students' higher-order thinking and practices more effectively (Kuo et al., 2019; Zainuddin et al., 2020). To do so, the ICA tool needs to match the assessment content and its representations with users' cognition to reflect their abilities.

Task analysis
A challenge for ICA is developing the interactive tasks, which includes determining the indicators and requirements for assessment and conducting ''task analysis'' (Crystal & Ellington, 2004) to design the script for the ICA software development. Three main task analysis methods have been used to date: technical methods, conceptual methods, and work process methods.
Technical methods take a behavior-based perspective (Annett & Duncan, 1967; Kadir & Broberg, 2021) and use hierarchical task analysis (HTA). HTA analyzes and represents the behavioral aspects of tasks (Fyiaz et al., 2018), then categorizes the tasks, breaks them into subtasks, and checks the tasks' overall accuracy.
Conceptual methods focus on analyzing the ''black box'' of cognition (Mason et al., 2019). One of the main techniques is cognitive modeling analysis (Norman, 2008), which creates valuable insights into ''natural mappings'' between cognition and interface. Another technique is the model human processor (Card et al., 1983;Kitajima & Toyota, 2012), consisting of three interacting systems (i.e., perceptual, motor, and cognitive), which aims to map out the restrictions imposed on behavior by the nature and features of the task environment and to define what and when users know about the task. Cognitive task analysis (Hoffman & Militello, 2012) targets more abstract, advanced cognitive functions of the tasks, which requires some subject-matter experts to deeply engage in the particular knowledge domain to analyze various tasks (Chipman et al., 2000).
Work process methods take the perspective of ''activity'' to do task analysis (Bedny & Meister, 1999;Panteli & Kirschen, 2015). Activities are inherently context-sensitive (Nieuwenhuis et al., 2005;Otto & Vassena, 2021). The technique of activity analysis focuses on the entire activity process in a specific context, which needs to balance the activity's efficiency and effectiveness and establish the most economically optimized context through empirical research (Hashim & Jones, 2014).
This study integrated the perspectives of the above three main methods for task analysis. The perspective of technical methods was taken in the hierarchical decomposition of ICA tasks based on the cognitive process of SM. The perspective of conceptual methods was adopted to match the tasks to the participants' cognitive interaction according to the participants' existing cognitive level. The perspective of work process methods was applied to the arrangement of the overall task flow according to the four-element process of SM.

Usability test
Interactive task development should fully consider users' features, such as age, experience, and interest, to make tasks useful, easy, and fun (Nikou & Economides, 2019). If the task interface is complicated and confusing, the users are prone to negative psychological effects, and their responses will be affected (Guler et al., 2014). Both boys and girls are willing to engage with a test if its contents are useful, interesting, and clear (Terzis et al., 2012). In this study, attractive colors, simple and clear wording, and readable tasks were employed to build a friendly interface that engaged students in the ICA.
The ''usability test'' (Albert et al., 2010) was conducted during the process of ICA task development in this study. Usability testing is a ''user-centric'' technical approach (Preece et al., 2002; Stragier et al., 2013). By observing, recording, analyzing, and judging the users' responses, usability tests detect the users' effectiveness, efficiency, and satisfaction in finishing the tasks. Effectiveness indicates the tasks' reliability and validity. Efficiency analysis assesses whether the resources used by the ICA software are optimized to help users achieve their performance. Satisfaction analysis observes the users' experience and feelings to eliminate discomfort, tension, anxiety, and other interfering factors. The ''think aloud'' method (Padilla & Leighton, 2017) was used in one-on-one interviews in this study to let users express their feelings about the test, their thoughts on problem-solving, their evaluation of the tasks, etc. The ICA software was modified according to the analysis of the usability test results. Another round of usability testing then followed, and the cycle was repeated until a satisfactory software product was produced.

Influencing factors
During the task development process, this study also focused on controlling factors that affect ICA assessments, such as gender, computer familiarity, computer self-efficacy, computer-based test expectations, computer anxiety, and computer hardware features (Bahar & Asil, 2018; Skryabin et al., 2015). Boys are more attracted to ICA if the assessment has practicality. For girls, the simpler the operation system, the more they can be motivated to invest in ICA (Terzis & Economides, 2011). The participants of this study had experience in computer operation and computer-based assessments, because the first experience with ICA will affect students' expectations of success, which in turn influences the effort students invest in the assessment process (Hewson & Charlton, 2019; Meyer et al., 2016; Timmers & Veldkamp, 2011). Students with different computer self-efficacy levels show significantly different ICA performance (Hewson & Charlton, 2019; Liaw & Huang, 2012). Computer anxiety can interact with test anxiety, leading to poorer performance (Lu et al., 2016; Norris et al., 2007). For the same ICA assessment, the higher the image resolution, the better the students' performance (Guler et al., 2014; Wang et al., 2018). Hence, in this study images were preferred in designing the task interfaces, and the operation requirements were kept simple, with students dragging the mouse to answer questions.

METHOD

Participants
Participants were recruited via convenience sampling. First, a call for volunteer schools was issued to partner schools of the first corresponding author's university, elaborating the objective of the test and the requirements for participating schools and students. Twenty-one schools responded initially. After further confirmation of the computer facilities and available dates, 10 schools were selected from the five Chinese provinces of Heilongjiang, Hunan, Shandong, Shanxi, and Zhejiang. All participating schools had classrooms equipped with internet-accessible computers, and their students had experienced computer-based assessment. The teachers who were specifically responsible for the test received an instruction manual and online training. In addition, each school was required to double-check the network and computers the day before the test and to confirm that there were no problems with the facilities.
Second, another call for research volunteers was delivered to students at participating schools. Given the time and labor costs, we only released the call to 4th-, 7th-, and 10th-grade students, who represented primary, middle, and high school students respectively, to explore the grade differences in student SM performance. A total of 603 students volunteered to take the test. However, because some schools lost their network connection during the test or the interface was too slow to load, certain students did not submit their answers successfully. We finally received responses from 462 students. After excluding blank or repeated submissions, 419 valid student responses were obtained, with a reliability of Cronbach's α=.732. There were 168 students in grade 4, 199 in grade 7, and 52 in grade 10; 230 were girls, and 189 were boys.
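For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of how Cronbach's α can be computed from a respondents-by-items score matrix. The function name `cronbach_alpha` and the demo scores are illustrative assumptions, not the authors' SPSS procedure or the study's data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])  # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up scores for three respondents on two items (not study data).
demo = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(demo), 3))  # 0.667
```

Values closer to 1 indicate that the items vary together; the same formula applies to the 11 SM items and to the six interest items.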

Construct of Scientific Modeling Assessment
The construct of the ICA tasks for assessing SM, combined with specific scenarios and tasks, is illustrated in Table 1. The solar system was employed as the context for the test because galaxies and planets are a type of macroscopic model in space science. It has been indicated that SM ability is crucial for understanding space science phenomena and concepts (Plummer et al., 2016, 2022; Sung & Oh, 2018), which means that modeling the solar system is a valid way to test students' SM ability. In China, space science is not one of the high school entrance examination disciplines, and its presence in the university entrance examination is also very low. As a result, most students know little about space science in general and the solar system in particular. Thus, modeling the solar system can serve as an SM task that reliably assesses students' modeling rather than rote memorization. Furthermore, students were informed of the brightness of the planets with different spatial positions and sizes through computer prompts. With these prompts, students' tasks in the assessment were to construct, use, evaluate, and revise corresponding models of the phenomena. As shown in Table 1, the interactive tasks scaffolded students' SM practice. Students first constructed their own initial solar system model based on their prior knowledge (i.e., items 1 and 2 assess students' model construction). Then, students used their models to explain a relevant phenomenon (i.e., items 3, 4, and 9 assess model use). Next, students used new information to reflect on and evaluate their own models (i.e., items 5 and 7 assess model evaluation). Finally, students built on this new information to revise their models (i.e., items 6, 8, 10, and 11 assess model revision). Students were required to consider eight planets comprehensively to respond to the series of tasks.
In each task, students got one point for each planet placed in the right position and with the right association in the solar system. The full score was 8 for each of items 1 to 10 and 9 for item 11, because a new (virtual) planet was added to the scene.
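The item-to-element mapping and scoring rule described above can be sketched as follows. This is a minimal illustration, assuming item scores arrive as a dictionary keyed by item number; the names `ELEMENT_ITEMS`, `max_score`, and `element_scores` are hypothetical and do not come from the ICA software.

```python
# Items assigned to each SM element, as stated in the text.
ELEMENT_ITEMS = {
    "construction": [1, 2],
    "use": [3, 4, 9],
    "evaluation": [5, 7],
    "revision": [6, 8, 10, 11],
}


def max_score(item):
    """Full score per item: 8 planets, plus one virtual planet in item 11."""
    return 9 if item == 11 else 8


def element_scores(item_scores):
    """Sum one student's item scores (dict: item number -> points) by element."""
    totals = {}
    for element, items in ELEMENT_ITEMS.items():
        total = 0
        for item in items:
            score = item_scores[item]
            if not 0 <= score <= max_score(item):
                raise ValueError(f"item {item}: score {score} out of range")
            total += score
        totals[element] = total
    return totals


# A hypothetical student who earns full marks on every item.
full_marks = {item: max_score(item) for item in range(1, 12)}
print(element_scores(full_marks))
# {'construction': 16, 'use': 24, 'evaluation': 16, 'revision': 33}
```

Because the elements contain different numbers of items, element totals are not directly comparable; comparing mean scores per item or percentages of the element maximum avoids that problem.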

Interactive Computer Assessment Tasks Design
The ICA development process of this study is shown in Figure 1. The first step was task analysis. The assessment was divided into four major parts with a total of 11 items (see Table 1). Students were asked first to consider the planets' far and near positions to construct the model and gradually combine various planetary distance and brightness information. The students then used the model to predict and interpret newly discovered planets in the solar system, including considering the planets' sizes to evaluate and revise the model.
In the second step, to promote students' comfort with the assessment, the interface was designed to be simple, the text was concise and clear, the pictures were soft in color, the feedback was positive and gentle, and there were question-and-answer prompts to avoid students' anxiety (see Figure 2 and Figure 3 for the test guide interface and operation description).
Except for the two model evaluation items, which required students to type text, all other items only required students to move and click a computer mouse. In this way we minimized the influence of the students' computer familiarity on their performance. The computer provided students with information in three ways: first, it provided task information to support students' modeling process; second, it provided timely feedback for the students to reflect upon when they used the model; and third, it provided an opportunity to re-answer to promote students' reflection and revision during the SM practice.
There were three rounds of the usability tests. After the task script was formed, two students in grade 4, two students in grade 7, and two students in grade 10 were tested individually and interviewed by the researchers to revise the items' assessment points and textual expressions. After converting the script into ICA software, the second round of tests and interviews was conducted, this time focusing on students' thoughts on the interface, answering experience, problem understanding, problem-solving ideas, etc. When the software was developed, a pilot test (α=.701) was done with a selection of students from grades 4, 7, and 10 using a think aloud approach. Further revisions were made to the tasks based on the pilot test data.

Items for Interest in Science Learning
Interest in science learning was measured with six four-point Likert-type items to examine students' attitudes towards science and science learning (Cronbach's α=.881). Two of the items assessed attitudes towards science, and four were about making judgments on their own science learning. For example, students were asked to self-report their attitude towards "I think science is interesting" or their judgment of "Learning science is important to me because it always stimulates my thinking" on a scale from 1=''totally disagree'' to 4=''totally agree.''

Data Collection
The ICA tool was deployed with automatic recording. Participants answered in their classrooms equipped with computers for 40 minutes. In addition to the 11 items, there were three sections which inquired into students' basic information (school, gender, grade), the hours they devoted to science learning every day and their interest in science learning. IT teachers in charge of the test opened each computer to present the task interface before the test. During the test, the teachers asked students to raise their hands to indicate that they had finished answering all items and waited for teachers to check their submission before leaving their seats.

Data Analysis
Data were first summarized by means and standard deviations (SDs) for the four-element scores of SM. Then, multiple linear regression was used to test the dependency of the four-element scores of SM on gender, grade, science learning time, and science learning interest. The Shapiro-Wilk test was conducted to test the normality of the scores obtained before and after feedback. The Wilcoxon rank-sum test was used to test the effect of the feedback information of the ICA. All analyses were conducted with SPSS 25.0.
Table 2 indicates that the mean scores of students in "model construction" and "model use" were higher than those of "model evaluation" and "model revision" and that the mean score of "model evaluation" was clearly the lowest, for all grades. Figure 5 shows students' time for and interest in science learning. It can be seen that the number of students in grade 10 who studied science for 1~2 hours and for more than 3 hours per day was relatively high, and that their belief that science and learning science were important was also stronger than that of students in the other grades.

Analysis of Related Factors of Scientific Modeling
The results of the correlation analysis showed that students' grade, science learning time, and science learning interest were significantly and positively correlated with their SM ability (p<.05), while gender was not significantly correlated with SM ability (p=.804).
The results of the multiple linear regression analysis are shown in Table 3. The Durbin-Watson value was 1.605, within the range of 1.5 to 2.5, indicating no substantial autocorrelation among the residuals, so the observations could be treated as independent. All VIF values were less than five, indicating no serious multicollinearity among the variables. The adjusted R² was .204, indicating that the four independent variables in the study explained 20.4% of the variance in students' SM ability; there were likely other important factors that were not addressed in this study. Among the four independent variables, only grade significantly predicted SM ability (p<.001).
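Two of the regression diagnostics mentioned above have simple closed forms, sketched here under stated assumptions: the function names are illustrative, the inputs are made-up numbers rather than study data, and the study itself obtained these values from SPSS.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by their sum of squares. Values near 2 suggest no
    first-order autocorrelation."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up residuals that alternate in sign (strong negative autocorrelation).
print(durbin_watson([1, -1, 1, -1]))  # 3.0
# Made-up R^2 with 11 observations and 4 predictors.
print(round(adjusted_r2(0.5, 11, 4), 3))  # 0.167
```

The adjustment penalizes the raw R² for the number of predictors, which matters little with a sample of 419 and four predictors but more in small samples.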
One-way analysis of variance (ANOVA) was used to further test the differences among the three grades. The F-test of homogeneity of variance was not significant (p=.613>.05), indicating that ANOVA was appropriate. The ANOVA results were significant (F=54.344; df=2, 416; p<.001), and a post hoc comparison using LSD indicated that all three groups were significantly different from one another (p<.001). The SM ability of the senior high school students (grade 10) was significantly higher than that of the junior high school students (grade 7), which was in turn significantly higher than that of the primary school students (grade 4).
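The F statistic and degrees of freedom reported above follow from the standard one-way ANOVA decomposition, which can be sketched as follows. The function name `one_way_anova_f` and the three small groups are illustrative assumptions, not study data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: squared deviations from each group's mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Three made-up groups of three scores each.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # (3.0, 2, 6)
```

With three grades and 419 students, the same formulas give df = 3 - 1 = 2 and 419 - 3 = 416, matching the degrees of freedom reported in the text.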

The Effect of Feedback Information
The option of "answer again" in the test allowed students to improve their models based on the feedback that the system gave to them. Students' second answers on the items that had a re-answer option (items 3, 4, 6, 8, 9, and 10) were compared to the original responses to determine whether students improved their performance when answering again.
The total number of re-answered responses was 930. A Shapiro-Wilk normality test was performed on the data, indicating that the data were non-normally distributed (p<.001). Therefore, the initial and repeated response results were compared using the Wilcoxon rank-sum test (see Table 4). A significant difference between the two responses was identified (p<.001). When students answered the second time, after receiving feedback, their models improved.
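A nonparametric before/after comparison of this kind can be sketched as below. Note the assumption: the text names the Wilcoxon rank-sum test, whereas this sketch uses the signed-rank variant, which is the usual choice when the first and second responses are paired within the same student. The function name and the scores are illustrative, and only the rank sums are computed, not a p-value.

```python
def signed_rank_sums(before, after):
    """Return (W_plus, W_minus) for paired scores.

    Zero differences are dropped and tied absolute differences share
    their average rank.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of differences tied in absolute value.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for idx in range(i, j):
            ranks[idx] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Made-up first and second scores for five responses (not study data).
first = [3, 5, 4, 6, 2]
second = [5, 5, 7, 5, 6]
print(signed_rank_sums(first, second))  # (9.0, 1.0)
```

A W_plus much larger than W_minus indicates that most changes were improvements; a statistics package would convert the smaller rank sum into a p-value.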

CONCLUSIONS AND DISCUSSION
The originality of the study lies in developing an ICA to assess students' SM ability based on the four-element process of SM through task analysis and usability test approaches. The results show that the most challenging task in the process of SM is model evaluation, followed by model revision. Considering that the test in this study required students to enter text in response to the model evaluation tasks, which may have made these items more difficult than the multiple-choice tasks in the other three elements, the conclusion drawn about the SM challenge for students is that model evaluation and model revision are more difficult than model construction and model use. Previous studies have also found that grade 10 students perform poorly in evaluating a model (Chang & Chiu, 2009) and that very few grade 6 students are able to revise models (Bamberger & Davis, 2013).
The results of this study also show that individual maturity contributes to students' SM ability, whereas gender, science learning time, and science learning interest do not. These results seem to indicate that spending more time learning science does not improve students' SM ability. One possibility is that students have few learning opportunities for SM. The 2017 report of China's national science education quality monitoring states that 4th and 8th grade students have insufficient opportunities for scientific practice and perform poorly in SM (MOE, 2018). The results also seem to indicate that liking or disliking science plays little role in SM ability. This finding is plausible if students have relatively few opportunities for scientific practices in their science learning, which may lead them to disconnect SM from science. The findings may indicate that what is taught in classrooms needs to change.
Previous studies have found that reflection is crucial for SM (e.g., Fortus et al., 2016; Pierson et al., 2017; Schwarz et al., 2009, 2012). The results of this study indicate that interaction in the form of re-answering after feedback can promote students' in-depth reflection and performance in SM. The findings can motivate science educators to take more advantage of HCI to create immediate feedback and re-answer opportunities that advance students' learning, and thus promote the cultivation of scientific practices for each student in response to the mission at this historical moment.
The study has several limitations. First, it did not collect or analyze data on the influence of individual computer self-efficacy, computer anxiety, and other factors on the test from a psychophysiological perspective, so it is impossible to know whether these factors affect SM ability across grades. Second, many school computers could not be used for the test because of the browser requirements of the test interface, which limited the sample. Third, the item types are limited. In the four-element process of SM, multiple-choice questions can be added to the model evaluation part, and the model use part can be designed
This study is initial research on the SM ICA. Further studies need to focus on the following two aspects. One is to integrate multiple technical methods to collect multi-modal data to explore the complicated SM process. For example, computerized wearable psychophysiological measurement equipment, such as a heart rate variability monitoring and feedback system, can record changes in the variability of students' heartbeat cycles to assist in analyzing the cognitive challenges students encounter in the SM process. Non-invasive brain imaging techniques, such as functional near-infrared spectroscopy, can also collect students' cerebral cortex responses, which can be combined with the ICA test results to analyze students' cognitive load during the four-element process. In addition, automated scoring systems and teamwork ICA for SM can also be studied in the future. All in all, we look forward to more research on the in-depth development of quality education with smarter assessment to promote the cultivation of scientific literacy for each student.
Funding: This study was supported by the National Natural Science Foundation of China with a grant awarded to Jing Lin (Grant No. 62077008), and by the Collaborative Innovation Center of Assessment for Basic Education Quality Foundation (China) with a grant awarded to Jing Lin (Grant No. 2021-01-103-BZK01).
Acknowledgements: The authors would like to thank the students, teachers, and principals involved in this study for their collaboration. The authors would also like to thank Beijing Normal University and the National Natural Science Foundation of China for their support.
Ethical statement: Informed consent was obtained from participants.
Declaration of interest: No conflict of interest is declared by the authors.
Data sharing statement: Data supporting the findings and conclusions are available upon request from the corresponding author.