Rasch Analysis for Disposition Levels of Computational Thinking Instrument Among Secondary School Students

Computational thinking is a way of thinking used to tackle complex problems. There is a paucity of conceptualizations and instruments that address computational thinking dispositions and attitudes. This study responds to this gap by developing an instrument to measure dispositions and attitudes related to computational thinking. The Computational Thinking Disposition Instrument indicates students' disposition towards computational thinking in daily life. The objective of this study is to investigate the instrument's psychometric properties using the Rasch model. Data were obtained from 535 Form Four computer science students in Malaysia. The instrument consists of 55 core items in three domains: cognitive, affective and conative. The Rasch analysis indicated good psychometric properties: in all three domains no items showed disordered thresholds, and reliability was good. The Rasch analysis therefore provides a basis for cautious optimism and permits more detailed, finer-grained investigation of the instrument.


INTRODUCTION
Computational Thinking (CT) is a universal attitude and skill set that should be included in every child's repertoire, making it a critical competency that spans practically all subjects. CT involves solving problems, designing systems and understanding human behavior by drawing on concepts fundamental to computer science (Wing, 2006). Computer-based learning is one teaching and learning approach that has been shown to improve Higher-Order Thinking Skills (HOTS) (Salihuddin et al., 2016); nevertheless, thinking abilities alone are insufficient (Chongo et al., 2020). Students also need a problem-solving strategy and tools for solving problems with CT. In general, the objectives of computer science education include inspiring students to go beyond the screen and explore how computers work and how to solve a variety of problems (Syso & Kwiatkowska, 2015). Olabe et al. (2014) found that novel teaching approaches, such as simple Scratch programming, demonstrated a capacity to solve real-world challenges.
In recent years, emphasis has been placed on teaching children to think like computer scientists (i.e., computational thinking) and on prioritizing computer science principles in elementary and secondary schools (Barr & Stephenson, 2011; College Board, 2014; Wing, 2006; Yadav et al., 2014). However, despite the high interest in developing CT among school children and the large investment in CT initiatives, incorporating CT into the school curriculum raises a range of issues and challenges (Bocconi et al., 2016). Consequently, the education sector faces rising pressure to demonstrate how computational thinking is used in everyday life through computing tools, as CT is one of the competencies students need before entering industry. Students who are exposed to CT as part of their education may begin to perceive connections between academic subjects and life in and out of the classroom. Since 2006, the view of CT as a capability, a set of skills and a mindset that every child should acquire has gained traction and importance.
However, adapting CT concepts to everyday life is not easy and requires thorough study (Sondakh et al., 2020a). Over the past decade, most efforts to embed CT have centered on developing students' CT skills, with little attention to their perceptions, feelings or attitudes towards applying CT to problem solving across various disciplines or, specifically, in daily life (Sondakh et al., 2020b). An instrument to measure students' disposition towards CT is therefore required, and the new CT disposition instrument is being developed for this purpose. Empirical evidence is particularly important when building a new instrument. The purpose of this study is to determine the validity and reliability of a scale that can be used to assess the disposition levels of secondary school students.

LITERATURE REVIEW
CT is a relatively new curriculum field in today's digital culture and has had to be adapted into classrooms very quickly, so researchers were unable to anticipate all the issues that arise before implementation (Belanger et al., 2018). Although there has been broad discussion of the pedagogical aspects of CT, research on assessing CT skills and attitudes is still ongoing. The attitudes developed through CT should be strengthened so that students can analyze complex problems with systematic approaches (Qiu, 2009). A review of studies from the past five years (2016 to 2020) showed that few were devoted to students' CT dispositions (Haseski et al., 2018; Jong et al., 2020). CT is characterized not only by skills but also by attitudes (Wing, 2006). While some researchers view CT as a subset of the critical thinking skills required in today's society (Tang et al., 2019), the complexity of CT prompts others to dig deeper, implying a more comprehensive understanding of CT as a disposition (Wing, 2008). Nurturing students to be self-directed problem solvers in a digital world by arming them with CT skills and knowledge may not be sufficient. While CT makes use of coding knowledge to solve problems, skills alone do not account for the disposition to apply these competencies to pertinent problems. Thus, researchers argue for the importance of CT dispositions as a motivator for persistently identifying complex real-world problems and seeking efficient solutions via coding (Abdullah et al., 2012; Denning, 2009). In other words, the CT disposition concept encompasses both the psychological and cognitive dimensions of computational problem solving. The National Research Council (NRC) stated that specific thinking skills are positively correlated with an internal motivation to think and are constituents of specific thinking dispositions (NRC, 2011).
As a result, good thinkers typically possess both thinking abilities and a disposition toward that way of thinking. Although it has been suggested that CT should be integrated into K-12 classrooms to foster students' dispositions toward CT, a validated measure of CT dispositions appears to be lacking.
Measuring CT-related attitudes is necessary because no widely adopted standardized assessment yet exists (Haseski et al., 2018; Sondakh et al., 2020a; Weese, 2016). It is therefore unsurprising that CT evaluation remains a significant weakness in this field. There is no commonly agreed method for assessing CT, which makes it difficult to assess the impact of interventions correctly and objectively (Grover & Pea, 2013; Kim et al., 2013; Settle et al., 2012; Shute et al., 2017). Nevertheless, there is a persistent need to find ways of measuring CT across all disciplines. Consequently, studies on assessment remain scarce compared with studies investigating approaches to teaching CT (Sondakh et al., 2020b). Moreover, an instrument developed in Western countries may not be suitable for use in Malaysia because of cross-cultural differences. Indeed, the obstacles confronting individuals from different countries, institutions and levels of education are unique.
Hence, a total of 241 items were created, of which 143 remained after the content validity phase. Next, 87 items underwent a first pilot test using factor analysis, followed by a second pilot test using the Rasch model, which left a final set of 55 items. This study determines which final items best fit the requirements of the Rasch model. Georg Rasch founded the Rasch measurement model in 1960. It is a comprehensive statistical method with unique mathematical properties, based on a parametric model that places item difficulty, respondents' abilities and the interaction between the two on a common logit scale (Aziz et al., 2015).
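The idea of placing persons and items on one logit scale can be illustrated with the dichotomous (two-category) form of the Rasch model. This is only a sketch: four-point Likert data such as those in this study would in practice be analyzed with a polytomous extension (for example, Andrich's rating scale model), and the function names below are illustrative, not part of any library.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model.

    theta: person measure (logits); delta: item difficulty (logits).
    Only the difference theta - delta enters the model, which is what
    allows persons and items to be placed on one common logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def logit(p: float) -> float:
    """Log-odds of a probability; the inverse of the logistic link above."""
    return math.log(p / (1.0 - p))


# A person located exactly at the item's difficulty: 50% chance.
print(rasch_probability(1.0, 1.0))            # 0.5
# One logit above the item: roughly 73%.
print(round(rasch_probability(1.0, 0.0), 3))  # 0.731
```

Because only the difference between person and item locations matters, shifting both by the same amount leaves every probability unchanged, which is why item measures are conventionally centered at 0.00 logits.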
Additionally, the Rasch measurement model can convert the ordinal responses to each four-point Likert item into measures on an interval logit scale. As a result, pupils would have no difficulty making the judgments that are most representative of themselves. At the Rasch analysis stage, calibrating students against their responses to the items verifies how well each item fits the model, thereby avoiding item redundancy on the same measure (Wright & Mok, 2004). Only high-quality items are retained for subsequent testing, based on MNSQ values in the range of 0.60 to 1.40 (Bond & Fox, 2007), the PTMEA CORR value and local independence. Local independence ensures that each item contributes distinct information about the latent construct and does not overlap with other items (Baghaei, 2008).
As a result, the research gap can be addressed by testing the psychometric properties of the developed instrument items with rigorous empirical analytic techniques such as the Rasch model. The Rasch model has attracted the attention of numerous researchers, both domestically and internationally, for validating items during instrument development (Balsamo et al., 2014; Othman et al., 2014). This article examines the validity and reliability of the measuring instrument using the Rasch model's three core assumptions, namely item fit, unidimensionality and local independence. The primary objective of this research is to ascertain the dispositions that secondary school students exhibit when practicing CT in their daily lives. Thus, endorsing the items for each CT disposition construct with the Rasch model is expected to improve the quality of item measurement.

Contribution to the literature
• To raise awareness of the disposition level of computational thinking. We have clarified the concept of "computational thinking" within a disposition framework.
• We have developed an instrument with good psychometric properties to measure students' disposition level of computational thinking.
• The study revealed the need to develop an instrument to measure students' disposition to computational thinking. This study has the potential to generate more knowledge and literature on students' CT disposition, an area with very few empirical studies.

CT DISPOSITION FRAMEWORK
Long-term involvement in computational practices with an emphasis on the CT process, as well as ample learning opportunities in a motivating environment, are required to cultivate CT dispositions (Brennan & Resnick, 2012). CT dispositions are recognized as the values, motivations, feelings, stereotypes and attitudes applicable to CT (Barr & Stephenson, 2011). Furthermore, a disposition is a person's consistent internal motivation to act toward CT, or to respond to persons, events or circumstances in habitual, yet potentially malleable, ways (CSTA, 2017).
Since dispositions are recognized as a psychological construct, social psychologists classify disposition as "an attitudinal tendency" (Facione, 2000; Facione et al., 1994; Sands et al., 2018). Dispositions are also often defined as a "cast or habit of the mind" or "frame of mind" that is necessary for exercising critical thinking (Beyer, 1995). In addition, theorists argue that thinking requires something more fundamental than knowledge or skill, i.e., a set of dispositions (Beyer, 1988; Norris & Ennis, 1989). Therefore, in this study, disposition is defined as internal motivation and a combination of attitudes, values and beliefs, and the theoretical framework integrates dispositional thinking theory (Beyer, 1995), the tripartite classification of mind (Hilgard, 1980) and the tricomponent attitude model (Schiffman et al., 2012). Figure 1 describes how the cognitive, affective and conative components, which correspond both to the three modes of mental functioning and to the three components of attitude, are integrated to define the disposition construct towards CT.

Research Design
The study took a quantitative approach using a cross-sectional survey. A quantitative technique was used because it enables data to be collected and analyzed in a numerical framework to explain the phenomena being studied (Gay & Mills, 2018). The data were collected via a self-administered online survey because it is less expensive, simple to administer and capable of collecting detailed, ordered data (Creswell, 2012; Creswell & Creswell, 2018). The data are thus almost immediately ready for statistical analysis (Hair et al., 2017). Participants in the online survey were required to respond to all items before submitting their responses, which eliminated the possibility of missing data.

Study Sample
In this study, the researchers employed probability sampling. Sampling is intended to target selected individuals because they have experiences at the center of the phenomenon (Creswell, 2009). Probability sampling techniques employ some type of random selection and allow sampling error to be calculated, hence reducing selection bias. A total of 535 secondary school students with a background in computer science were surveyed, comprising 247 males (46%) and 288 females (54%). They were selected using simple random sampling from four zones: north (80; 15%), east (80; 15%), west (252; 47%) and south (123; 23%). Authorization from the Ministry of Education was required before data collection could proceed, and the researchers then had to obtain permission from each school principal before meeting the respondents. Respondents were invited to participate until the required sample size was attained. Respondents had to meet the following criteria: (1) have a background in computer science, (2) be willing to complete the questionnaire, and (3) be able to complete the items online.
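The reported subgroup percentages can be checked directly from the counts. The sketch below uses only the numbers stated in this paragraph; the helper name is illustrative.

```python
# Counts as reported in the Study Sample paragraph.
gender = {"male": 247, "female": 288}
zones = {"north": 80, "east": 80, "west": 252, "south": 123}


def percentages(counts: dict) -> dict:
    """Share of the total for each group, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


print(sum(gender.values()), sum(zones.values()))  # 535 535
print(percentages(gender))  # {'male': 46.2, 'female': 53.8}
print(percentages(zones))   # {'north': 15.0, 'east': 15.0, 'west': 47.1, 'south': 23.0}
```

Both breakdowns sum to the full sample of 535, and the rounded shares agree with the whole-number percentages in the text.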
The number of respondents in the field study is sufficient according to Linacre (1994), who specified a minimum of 108 respondents for polytomous data to obtain item calibrations stable within ±0.5 logits with 99 percent confidence when applying the Rasch measurement model. The Rasch measurement model was used to evaluate item fit, polarity, local independence, unidimensionality, the item-person map, reliability, and the separation index for both items and respondents. The overall response rate for the distributed surveys was 92 percent. This response rate is deemed sufficient because it exceeds 90% and is consistent with previous research (Marret & Choo, 2017; Masek & Nasaruddin, 2016).

Instrument Development
To develop the instrument, we adopted Miller and Lovler's (2019) scale development guideline, which divides the test development process into 11 steps. The first step is to define the testing universe, the target audience and the purpose of the test. The testing universe is the body of knowledge that the test represents, the target audience is the group of individuals who will take the test, and the purpose of the test is the information that the test will provide to the test user. This stage provides the foundation for all other development activities. Next, the test plan specifies the characteristics of the test, including an operational definition of the construct and content to be measured, the format of the questions, and the administration and scoring of the test. In the third step, we chose the item format, objective or subjective, based on the information in the test plan. After writing the test items, we administered them as a test, with appropriate instructions for the test administrator and test taker, to a sample of the target audience. This test provides objective data to help determine whether the items yield the desired information, so that only effective items are chosen. Administration instructions are designed for three users: the person administering the test, the person taking the test, and the person scoring the test and interpreting its results.
We then followed up the pilot test with further studies that provide the data needed for validation and norming. Thus, conducting the pilot test and analyzing its data are an integral part of the test development process. Quantitative item analysis examines how well each test item performs. Subsequently, in the test revision step, items are dropped based on their consistency, difficulty, discrimination and bias until a final form of the test is reached. After the test has been revised, we conduct the validation study by administering the test to another sample of pupils. Standards for the validation study are similar to those for designing the pilot study.
The validation process provides sufficient information on reliability and validity. After validation is complete, we develop norms (distribution of test scores used for interpreting an individual's test score) and cut scores. At the end of the validation process, the test manual is assembled and finalized. Figure 2 depicts the development process.
Each student completed the self-report instrument in Malay. The instrument is composed of 55 items covering three constructs of CT disposition (cognitive, affective and conative). Items are scored on a four-point Likert scale, with 1 indicating "strongly disagree" and 4 indicating "strongly agree." One week was allotted for completing the questionnaire. Raw scale scores are calculated as mean scores. To begin constructing the scale, a literature review and interviews with experts (professional and lay) were conducted, and a list of the characteristics that a 'person' must possess was compiled. The list was then transformed into statements describing behaviors that students could evaluate. Students indicate their level of agreement with the items using a four-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree). A midpoint was omitted from this instrument (Sumintono & Widhiarso, 2014) to prevent respondents from answering without making a choice (Fisher, 2006). This scale is better suited to the Rasch model used in this study than the standard scoring approach.
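The mean-score rule for raw scale scores can be sketched as follows. The assignment of the 55 items to constructs is not listed here, so the short item lists below are hypothetical, as is the helper name; the validation step simply enforces the four-point scale without a midpoint.

```python
# Hypothetical responses of one student (1 = strongly disagree ... 4 = strongly agree).
student = {
    "cognitive": [3, 4, 3, 3],
    "affective": [4, 4, 3, 4],
    "conative": [3, 2, 3, 4],
}

VALID = {1, 2, 3, 4}  # four-point scale with no midpoint


def scale_scores(responses: dict) -> dict:
    """Raw score per construct = mean of that construct's item responses."""
    scores = {}
    for construct, items in responses.items():
        if not items or any(r not in VALID for r in items):
            raise ValueError(f"{construct}: responses must be on the 1-4 scale")
        scores[construct] = sum(items) / len(items)
    return scores


print(scale_scores(student))  # {'cognitive': 3.25, 'affective': 3.75, 'conative': 3.0}
```

These raw means remain ordinal summaries; the Rasch analysis that follows converts the same responses into logit measures.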
With 55 items, the scale is intended to assess three primary domains of disposition. A linguist and two educational professionals reviewed the drafted scale for clarity, language, spelling, and punctuation issues. After making the necessary modifications, an instrument consisting of 55 items was constructed. The following are the findings from the validity and reliability evaluations of the data.

Rasch Model
The data were analyzed using the Rasch measurement model, which is suitable for evaluating an instrument's psychometric qualities in terms of validity and reliability. WINSTEPS version 3.71.0 (Linacre, 2007) was used to analyze aspects of item functionality, beginning with item fit based on infit and outfit MNSQ values in the range of 0.60 to 1.40 (Bond & Fox, 2007). Reliability can also be expressed as an internal consistency value (Cronbach's alpha), which is considered acceptable when it exceeds 0.7 (Nunnally & Bernstein, 1994). The item separation index describes the range of item difficulty levels, whereas the person separation index describes the range of students' ability levels when responding to the questionnaire.
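Cronbach's alpha, mentioned above as the internal consistency criterion, can be computed from a complete persons-by-items matrix; the online survey required an answer to every item, so complete data is a reasonable assumption here. This is a minimal sketch with an illustrative function name and a small hypothetical data set, not the WINSTEPS computation.

```python
from statistics import pvariance


def cronbach_alpha(data: list) -> float:
    """Cronbach's alpha for a complete persons x items response matrix.

    data[p][i] is person p's response to item i. Population variances are
    used throughout; the ratio is unchanged if sample variances are used
    consistently instead.
    """
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*data))  # transpose: one tuple per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in data])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical 4 persons x 2 items on the 1-4 scale.
print(cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]]))  # 0.75
```

Alpha rises as the item variances become small relative to the variance of total scores, that is, as items covary more strongly.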

RESULTS AND ANALYSIS
The psychometric properties of the instrument were determined using the Rasch measurement model. A total of 55 items across three constructs were evaluated for item fit, polarity, local independence, unidimensionality, reliability index and separation index. Items that fit and contribute to the psychometric properties of the instrument were kept, while items that did not fit were submitted for revision or elimination (Linacre, 2010). Additionally, the Rasch measurement model can be used to assess the adequacy of the Likert scale employed in this study using Linacre's six criteria (Linacre, 2002). Table 1 contains the mean and standard deviation of the three CT disposition constructs. The highest level of CT disposition was found for the affective construct (M = 3.2604, SD = 0.5213), followed by cognitive (M = 3.2407, SD = 0.4885). Conative, on the other hand, shows the lowest level (M = 3.1409, SD = 0.5473). The highest mean score, on the affective dimension, reflects respondents' excitement, interest, awareness and empowerment to learn and apply CT in daily life. The cognitive scores indicate that respondents are capable of acquiring CT knowledge, creative thinking, values and perceptions. The conative dimension, with the lowest mean score, reflects persistence, tolerance, collaboration abilities and self-confidence in practicing CT in daily life. This score indicates an individual's eagerness to learn more about CT in depth. Thus, the findings support construct validity, that is, the degree to which the questions on an instrument correspond to the theoretical construct (DeVon et al., 2007). Additionally, they inform conclusions regarding the dimensionality of the subconstructs and the validation of the conceptual framework's structure.

Item Fit
Item fit refers to how well each item in a questionnaire fits the Rasch measurement model (Ariffin, 2008). Fit statistics use the mean square (MNSQ) criterion in two forms: information-weighted (infit) and outlier-sensitive (outfit). MNSQ values range from zero to infinity, with an expected value of 1. Items outside the acceptable MNSQ range are considered overfitting or misfitting: overfit means the items are too predictable, while misfit means the items are erratic (Bond & Fox, 2007). For the Likert-scaled polytomous data used in this investigation, the acceptable MNSQ range was set between 0.60 and 1.40 (Bond & Fox, 2007). Meanwhile, the acceptable ZSTD range is -2.0 to +2.0 (Bond & Fox, 2007), and ZSTD can be disregarded if the MNSQ value is acceptable (Linacre, 2005). MNSQ is evaluated using both infit and outfit values. This ensures that items that fit the model are carried into subsequent analysis, while misfitting items, which do not contribute to the measurement of constructs, are considered weak. All values were within the suggested range, from 0.81 to 1.31, so no item needed to be removed. The standard errors of the item measures ranged from 0.08 to 0.10 logits, reflecting the precision of the estimates (Linacre, 2005), a range that Fisher (2007) considered excellent.
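The two fit statistics can be sketched from their usual definitions: outfit is the unweighted mean of squared standardized residuals, and infit is the sum of squared residuals divided by the sum of model variances. The sketch assumes the expected scores and variances come from an already estimated Rasch model (estimation is not shown); function names and the example data are hypothetical.

```python
def fit_mnsq(observed, expected, variance):
    """Infit and outfit mean-squares for one item across persons.

    observed[n]: person n's response; expected[n] and variance[n]: the model
    expectation and variance of that response, taken from an already
    estimated Rasch model (estimation itself is not shown here).
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals (outlier-sensitive).
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    # Infit: squared residuals weighted by information (model variance).
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


def classify(mnsq, low=0.60, high=1.40):
    """Label a mean-square against the 0.60-1.40 range used in this study."""
    if mnsq < low:
        return "overfit (too predictable)"
    if mnsq > high:
        return "misfit (erratic)"
    return "acceptable"


# Hypothetical dichotomous illustration: E = p and W = p(1 - p).
p = [0.2, 0.5, 0.8, 0.9]
obs = [0, 1, 1, 1]
infit, outfit = fit_mnsq(obs, p, [q * (1 - q) for q in p])
print(classify(infit), classify(outfit))  # both overfit: responses are very predictable
```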

Polarity
Additionally, item fit can be examined through item polarity by calculating the PTMEA CORR (point-measure correlation) value. This value indicates whether the items all measure the same construct, in which case each item should correlate positively with the overall measure (Bond & Fox, 2007). The PTMEA CORR values in this study ranged from 0.43 to 0.68, all above the minimum value of 0.3 (Wu & Adam, 2007). Item C42, "I am confident of being able to solve difficult problems by thinking computationally", has the maximum polarity index, while the minimum PTMEA CORR of 0.43 belongs to item C1, "I appreciate the group members' contributions during problem solving", which is from the same construct, conative. The positive PTMEA CORR values indicate that the retained items contribute to the instrument's psychometric properties, allowing it to distinguish among computer science students. In addition, this indicates that all the items measure CT disposition in the same direction (Table 2).
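Conceptually, the point-measure correlation is the Pearson correlation between respondents' scores on one item and their Rasch person measures. The sketch below assumes that definition; WINSTEPS may apply refinements not shown here, and the function name and data are hypothetical.

```python
import math


def point_measure_corr(responses, measures):
    """Pearson correlation between one item's responses and person measures.

    responses[n]: person n's score on the item; measures[n]: person n's
    Rasch measure in logits.
    """
    n = len(responses)
    mx = sum(responses) / n
    my = sum(measures) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(responses, measures))
    sxx = sum((x - mx) ** 2 for x in responses)
    syy = sum((y - my) ** 2 for y in measures)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: five persons' responses to one item and their measures.
resp = [2, 3, 3, 4, 4]
meas = [-0.5, 0.8, 1.9, 2.6, 3.4]
r = point_measure_corr(resp, meas)
print(round(r, 2), "meets 0.3 minimum" if r >= 0.3 else "check polarity")
```

A negative value would signal an item scored in the opposite direction to the construct, which is what the polarity check is designed to catch.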

Local Independence
The next aspect of item measurement is local item dependence. Local item dependence is often quantified using the standardized residual correlation between two items, which should not exceed 0.3 (Balsamo et al., 2014). If the correlation between two items is greater than 0.7, only one item is kept and the other is excluded from the model (Linacre, 2005). The item to retain is chosen by its MNSQ value, which should be close to or equal to 1.0 (Bond & Fox, 2015; Linacre, 2005), as this is the expected value under model fit (Aziz et al., 2015). This procedure ensures that retained items do not duplicate existing ones (Matore et al., 2020). Table 3 shows ten standardized residual correlation coefficients ranging from 0.29 to 0.47. Pairs with correlations greater than 0.3 were retained because the correlations remained below the 0.7 cut-off (Aziz et al., 2015) and each pair of items belonged to the same construct. No pair exceeded the 0.70 limit, indicating local independence of the items in the instrument.
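The two thresholds and the retention rule described above can be expressed as a small screening function. Item labels and values below are hypothetical and do not reproduce Table 3; the sketch treats only large positive residual correlations as signs of dependence.

```python
def screen_local_dependence(pair_corr, mnsq, review=0.30, drop=0.70):
    """Apply the residual-correlation rules described in the text.

    pair_corr: {(item_a, item_b): standardized residual correlation}
    mnsq: {item: MNSQ value used to decide which item of a pair to keep}
    Returns (pairs_to_review, items_to_drop).
    """
    to_review, to_drop = [], set()
    for (a, b), r in pair_corr.items():
        if r > drop:
            # Keep the item whose MNSQ is closer to the model expectation of 1.0.
            worse = a if abs(mnsq[a] - 1.0) > abs(mnsq[b] - 1.0) else b
            to_drop.add(worse)
        elif r > review:
            to_review.append((a, b, r))
    return to_review, to_drop


# Hypothetical item pairs and MNSQ values (not the study's Table 3).
pairs = {("K1", "K2"): 0.47, ("A3", "A4"): 0.29, ("C5", "C6"): 0.75}
mnsq = {"K1": 1.02, "K2": 0.95, "A3": 1.10, "A4": 0.88, "C5": 0.97, "C6": 1.25}
print(screen_local_dependence(pairs, mnsq))
# ([('K1', 'K2', 0.47)], {'C6'})
```

Pairs between 0.3 and 0.7 are only flagged for review, mirroring the study's decision to keep such pairs when they fall within the same construct.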

Gender Differential Item Functioning (GDIF)
The purpose of this analysis was to determine whether gender differential item functioning (GDIF) is present in the instrument. When analyzing GDIF, Winsteps uses a two-tailed t-test to determine the significance of the difference between the two groups' item difficulty estimates. For all DIF analyses, the confidence level is 95 percent and the critical t value is 2.0. Additionally, the GDIF contrast index is used to show the size of the difference in endorsement difficulty between males and females. According to Lai and Eton (2002), a DIF contrast of 0.5 logits is required for Likert-scale items. Meanwhile, Wright and Panchapakesan, as cited in Pallant and Tennant (2007), suggest that GDIF smaller than 0.5 logits is inconsequential. A low GDIF contrast index indicates that the item is more easily affirmed by female respondents. The DIF measure is the difficulty of the item for a given group while all other variables are held constant. The DIF results indicate that 11 of the 55 items show statistically significant GDIF (|t| ≥ 2). However, the GDIF contrast criterion (±0.5 logits) indicates that these 11 items do not exhibit substantive GDIF, as their contrasts are smaller than 0.5 logits; GDIF contrast values range from -0.41 to 0.49. Therefore, all 55 items were retained. Items that passed this GDIF analysis satisfy the fairness requirement of disposition testing. The DIF analysis thus shows that none of the 55 items exhibited evidence of bias when male and female students of the same ability level were compared.
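The two GDIF criteria can be combined in a short screening function. This is a simplified sketch: it uses the common approximation t = contrast / sqrt(SE_male² + SE_female²) and does not reproduce Winsteps' exact computation (which also reports degrees of freedom). Function name and item values are hypothetical.

```python
import math


def gender_dif(d_male, se_male, d_female, se_female, t_crit=2.0, min_contrast=0.5):
    """Screen one item for gender DIF using the two criteria in the text.

    d_*: item difficulty estimated separately for each group (logits);
    se_*: standard errors of those estimates.
    Returns (contrast, t, substantive), where substantive requires both
    |t| >= t_crit and |contrast| >= min_contrast.
    """
    contrast = d_male - d_female
    t = contrast / math.sqrt(se_male ** 2 + se_female ** 2)
    substantive = abs(t) >= t_crit and abs(contrast) >= min_contrast
    return contrast, t, substantive


# Hypothetical item: statistically significant, but contrast below 0.5 logits.
c, t, flag = gender_dif(0.30, 0.10, -0.15, 0.10)
print(round(c, 2), round(t, 2), flag)  # 0.45 3.18 False
```

This mirrors the pattern reported above: an item can pass the significance test yet still fall short of the 0.5-logit size criterion.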

Unidimensionality
Compliance with the unidimensionality assumption indicates that the items in the instrument measure only a single construct (Wright & Masters, 1982). Unidimensionality provides evidence of the construct validity of the developed test: items should measure a single dimension only. Unidimensionality, which underpins the internal consistency of the instrument, was examined using principal component analysis (PCA) of the Rasch residuals. The PCA showed that 40.7% of the variance was explained by the measures, close to the model expectation of 40.5% and above the minimum of 40% recommended by Linacre (2012). The unexplained variance in the first contrast was 4.4%, and values below 5% are well accepted (Linacre, 2007, 2016). The eigenvalue of the first contrast was 4.1, below 5.0 (Linacre, 2005), indicating the absence of a second dimension. Table 4 indicates that the unexplained variance in the first to fifth contrasts was between 3% and 5%, which is also considered very good (Linacre, 2007). Likewise, the ratio of the variance explained by item measures (17.9%) to the variance explained by the first contrast (4.4%) was 4.07, exceeding the minimum ratio of three (Conrad et al., 2012; Embretson & Reise, 2000).
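The eigenvalue and percentage figures above are linked by simple arithmetic. The sketch assumes the Winsteps reporting convention in which the unexplained variance equals the number of items in eigenvalue units (one unit per item); under that assumption, the reported 55 items, 40.7% explained variance and first-contrast eigenvalue of 4.1 reproduce the 4.4% figure, and 17.9% / 4.4% gives the stated ratio. The function name is illustrative.

```python
def residual_pca_summary(n_items, explained, first_contrast_eigen,
                         item_explained_pct, first_contrast_pct):
    """Relate the eigenvalue and percentage figures of a Rasch residual PCA.

    Assumes, as in Winsteps' reporting convention, that the unexplained
    variance equals n_items eigenvalue units (one unit per item).
    """
    total_units = n_items / (1.0 - explained)          # total raw variance
    first_contrast_share = first_contrast_eigen / total_units
    ratio = item_explained_pct / first_contrast_pct    # should exceed 3
    return total_units, first_contrast_share, ratio


total, share, ratio = residual_pca_summary(55, 0.407, 4.1, 17.9, 4.4)
print(round(total, 1), round(100 * share, 1), round(ratio, 2))  # 92.7 4.4 4.07
```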

Reliability Index
The instrument's reliability indices are provided in Table 4. Person reliability is interpreted in the same way as Cronbach's alpha or KR-20 (Wright & Masters, 1982). Cronbach's alpha is 0.97 and item reliability is 0.98, both considered excellent (Nunnally & Bernstein, 1994). In this study, the person reliability index is 0.94, which is within an acceptable range (Pallant, 2001; Sekaran, 2003); it indicates the consistency expected in the ordering of persons on the logit scale if this sample answered a different set of items measuring the same construct (Wright & Masters, 1982).

Separation Index
The reliability of the instrument was also examined using the person separation index, which is comparable to Cronbach's alpha. Person separation refers to classifying persons, that is, estimating how well a measure can separate individuals on a construct. A high degree of person separation or stratification (at least two distinct levels of performance, i.e., high and low, that can be distinguished by test scores; person reliability of 0.7) indicates that the measure may be sensitive enough to distinguish between high and low performers. Item separation is used to verify the item hierarchy. A high degree of item separation or stratification (at least three distinct levels of item difficulty, namely high, medium and low; item reliability of 0.9) indicates that the person sample is large enough to confirm the item difficulty hierarchy (Linacre, 2017). The person separation index was 6.88 (Table 5), interpreted as about seven levels of respondent ability; this is regarded as adequate because it exceeds 2 (Fox & Jones, 1998; Linacre, 2012). Meanwhile, Table 6 shows an item separation index of 3.97, which is considered acceptable and means the scale can be statistically differentiated into four difficulty levels. Higher item separation values imply more effective separation of items of varying difficulty. Separation is directly related to reliability (Wright & Masters, 1982). This outcome confirms Linacre's (2005) assertion that separation indices of two and above indicate greater reliability. As a result, the instrument spreads respondents widely when determining the level of CT disposition. This also supports the claim that the tool measures what it is designed to measure, thereby supporting its validity.
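Separation (G) and Rasch reliability (R) express the same ratio of true to error variance, linked by the standard relations G = sqrt(R / (1 - R)) and R = G² / (1 + G²). The sketch converts the separation indices reported in Tables 5 and 6 into reliabilities so they can be compared with the coefficients in the Reliability Index section; the function names are illustrative.

```python
import math


def separation_from_reliability(r):
    """Separation G = sqrt(R / (1 - R)) for a Rasch reliability R in (0, 1)."""
    return math.sqrt(r / (1.0 - r))


def reliability_from_separation(g):
    """Inverse relation: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


for g in (6.88, 3.97):        # separation indices reported in Tables 5 and 6
    print(g, round(reliability_from_separation(g), 2))
# 6.88 0.98
# 3.97 0.94
```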

Scale Review
The Rasch measurement model can be used to determine the effectiveness of the rating scale used in an instrument based on six criteria (Linacre, 2002). The first criterion is that each category contains at least ten observations, which was met in this study. For the second criterion, each category must exhibit a distinct peak in its probability curve, as illustrated in Figure 3.
The study's four-point Likert scale also met the third criterion, as the average measure of each category increased in step with the category level: (1) -3.65 logits, (2) -1.42 logits, (3) 1.26 logits and (4) 3.94 logits. This demonstrates typical, consistent and steadily rising response patterns (Matore et al., 2020). All outfit MNSQ values were within the range of 0.97 to 1.20, meeting the fourth criterion that outfit MNSQ values be less than 2.00. Concerning the fifth criterion, the threshold values of -2.47, -0.34 and +2.81 were ordered, indicating no disordering in the use of any category of the scale, as shown in Table 7 and Table 8. The sixth criterion specifies that the distance between adjacent thresholds should be between 1.00 and 5.00 logits for a four-point scale; the results in Table 9 revealed that the difference between each pair of adjacent categories exceeded one and was within five.
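Criteria three, five and six can be checked mechanically. The sketch assumes the three Andrich thresholds are -2.47, -0.34 and +2.81 (Andrich thresholds conventionally sum to zero, which these values do) and uses the average category measures given in this paragraph; helper names are illustrative.

```python
# Category statistics for the four-point scale, as described in the text.
avg_measures = [-3.65, -1.42, 1.26, 3.94]   # categories 1-4
thresholds = [-2.47, -0.34, 2.81]           # boundaries 1|2, 2|3, 3|4 (assumed signs)


def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


def threshold_gaps(values):
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


gaps = threshold_gaps(thresholds)
print(strictly_increasing(avg_measures))   # True  (criterion 3)
print(strictly_increasing(thresholds))     # True  (criterion 5)
print(gaps, all(1.0 <= g <= 5.0 for g in gaps))  # [2.13, 3.15] True (criterion 6)
```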

Item-Individual Map
The item-individual map shows the distribution of the items and respondents in this study on a common logit scale following calibration. In the context of instrument or questionnaire development, the items and respondents depicted in Figure 4 show varying degrees of difficulty in terms of respondents' agreement with each item (Perera et al., 2018). This mapping is highly useful because it enables researchers to optimize the psychometric qualities of the instrument they have constructed (Perera et al., 2018).
The item measures ranged from -1.488 to +1.05 logits, which is within the acceptable range of -3.00 to +3.00 (Andrich & Styles, 2004; Hill & Koekemoer, 2013; Linacre, 1994). The highest-positioned student, a male, had a measure of +8.17 logits; this student found it easiest to agree with all of the instrument's items. Meanwhile, the lowest-positioned student, a female, had a measure of -2.07 logits, indicating that she found it most difficult to agree with these items. Based on the item hierarchy, item K38 (+1.05 logits), which belongs to the cognitive construct, was the most difficult for students to agree with. Item K38 states, "I am capable of suggesting a concept that has never been considered by anyone." At the other end, the lowest item position was held by item C1 (-1.488 logits) from the conative construct, making it the item that students agreed with most easily. Item C1 states, "I appreciate the group members' contributions during problem solving." This demonstrates that pupils are capable of expressing gratitude and respect for their peers. According to Figure 4, the left side of the logit scale represents the order of item difficulty levels, while the right side shows the respondents' positions, namely, computer science students. In general, the mean of respondents (2.23 logits) was greater than the mean of items (0.00 logits), indicating that the students generally found it easy to agree with the instrument's items.
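The gap between the person mean (2.23 logits) and the item mean (0.00 logits) can be read as odds of agreement. The Python sketch below uses the dichotomous Rasch model as a simplification of the four-point scale, so the probabilities are illustrative only. The function names, the intermediate item measure 0.0, and the list of person measures in the last line are hypothetical; only the extreme person and item values come from the text.

```python
import math
from statistics import mean


def agreement_probability(theta, delta):
    """Dichotomous Rasch model: P(agree) = exp(theta - delta) / (1 + exp(theta - delta))."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def targeting_summary(person_measures, item_measures):
    """Compare the person and item distributions on the shared logit scale."""
    hardest, easiest = max(item_measures), min(item_measures)
    return {
        "mean difference": mean(person_measures) - mean(item_measures),
        "persons above hardest item": sum(1 for b in person_measures if b > hardest),
        "persons below easiest item": sum(1 for b in person_measures if b < easiest),
    }


if __name__ == "__main__":
    # Reported values: person mean 2.23, item mean 0.00.
    print(f"Average person vs average item: {agreement_probability(2.23, 0.0):.2f}")
    # Highest person (+8.17) vs hardest item K38 (+1.05).
    print(f"Highest person vs K38: {agreement_probability(8.17, 1.05):.3f}")
    # Lowest person (-2.07) vs easiest item C1 (-1.488).
    print(f"Lowest person vs C1: {agreement_probability(-2.07, -1.488):.2f}")
    # Hypothetical person measures; only -2.07 and 8.17 come from the text.
    print(targeting_summary([-2.07, 0.5, 1.8, 2.23, 3.4, 8.17], [-1.488, 0.0, 1.05]))
```

Under this simplification, a student at the sample mean has roughly a 0.90 probability of agreeing with an item of average difficulty. Counting persons located above the hardest item is one simple way to quantify the ceiling effect raised in the Discussion.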

DISCUSSION
We have detailed the development of a scale to assess secondary school students' disposition toward CT. The Rasch measurement model, an item response theory (IRT) approach, was applied to analyze each item and to determine the validity and reliability of the instrument. Overall, the instrument showed good psychometric qualities, and the study established strong reliabilities for the construct. During the pilot test, the CT disposition constructs were empirically validated in Malaysia using exploratory factor analysis (EFA) and Rasch analysis.
Our findings substantiated unidimensionality at the scale level. Summing raw item scores into an interpretable total scale score is therefore acceptable, since the items of each component all measure the same latent characteristic. Compliance with the unidimensionality assumption is critical in the Rasch measurement model, which rests on the premise that the items in an instrument measure a single latent trait (Sumintono & Widhiarso, 2014). This provides early evidence of construct validity at the field-study level.
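A common way to check unidimensionality in Rasch analysis, reported by software such as Winsteps, is a principal component analysis of standardized residuals: the largest eigenvalue of the residual correlation matrix (the "first contrast") is often expected to stay below about 2. The section does not state which check was used, so the Python sketch below is only an illustration under the Andrich rating scale model, with categories coded 0 to m and with person measures, item difficulties, and thresholds assumed to be already estimated. All function names are hypothetical.

```python
import math
import random


def category_probabilities(theta, delta, taus):
    """Andrich rating scale model probabilities for categories 0..len(taus)."""
    cumulative = [0.0]
    for tau in taus:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def standardized_residuals(responses, thetas, deltas, taus):
    """z = (x - E[x]) / sqrt(Var[x]) for every person (row) and item (column)."""
    residuals = []
    for theta, row in zip(thetas, responses):
        z_row = []
        for delta, x in zip(deltas, row):
            probs = category_probabilities(theta, delta, taus)
            expected = sum(k * p for k, p in enumerate(probs))
            variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
            z_row.append((x - expected) / math.sqrt(variance))
        residuals.append(z_row)
    return residuals


def correlation_matrix(rows):
    """Pearson correlations between columns; assumes no column is constant."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    k = len(cols)
    return [
        [
            sum((cols[a][r] - means[a]) * (cols[b][r] - means[b]) for r in range(n))
            / (n * sds[a] * sds[b])
            for b in range(k)
        ]
        for a in range(k)
    ]


def largest_eigenvalue(matrix, iterations=1000, seed=0):
    """Power iteration with a final Rayleigh quotient; suited to symmetric PSD matrices."""
    k = len(matrix)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]  # random start avoids orthogonal guesses
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv))


def first_contrast(responses, thetas, deltas, taus):
    """Eigenvalue (in item units) of the first contrast in the standardized residuals."""
    z = standardized_residuals(responses, thetas, deltas, taus)
    return largest_eigenvalue(correlation_matrix(z))
```

With real data, the inputs would come from the calibrated Rasch model; a first-contrast eigenvalue well above 2 would suggest a possible secondary dimension worth inspecting, while a value near 1 is what random noise alone tends to produce.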
In general, the estimated person abilities and item difficulties were spread fairly evenly along the logit continuum. Psychometrically, however, the items on this scale were insufficient to capture participants at high ability levels, so additional "difficult" items targeting the upper ability tiers are required. Given that the CTDI was developed for use in computer science education, there is a strong need to clearly differentiate participants at the most self- or other-focused levels.
We explored gender differences in differential item functioning (DIF). In brief, DIF occurs when individuals with the same ability level respond differently to an item simply because they belong to different groups. In other words, a DIF item is one that functions differently for a particular group of respondents. All items in this study were DIF-free, allowing for meaningful comparisons across groups. These findings establish a foundation for further DIF testing with larger samples, and researchers should proceed cautiously when conducting international comparisons using this instrument.

Table 9. Revision scale check
Scale | Gap calculation | Range of acceptance | Decision
S1-S2 | 0.00 - (-2.47) | 1.00 < 2.47 < 5.00 | Accepted
S2-S3 | -0.34 - (-2.47) | 1.00 < 2.13 < 5.00 | Accepted
S3-S4 | 2.81 - (-0.34) | 1.00 < 3.15 < 5.00 | Accepted

Figure 4. Item-individual map

In addition, the person separation index (PSI) and Cronbach's alpha values were within acceptable margins of reliability. PSI is derived from logit-transformed person estimates, whereas Cronbach's alpha is calculated from raw scores; the two are approximately equivalent when the distribution is normal. PSI and alpha values greater than 0.7 are typically regarded as adequate (Fisher, 1992; Tavakol & Dennick, 2011). Taken together, the findings address each of the criteria designed to examine the suitability of the items. The high item reliability indicates that the item difficulty estimates are stable.
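The section does not state which DIF procedure was used. One widely used Rasch approach (for example, in Winsteps) estimates an item's difficulty separately for each group, here males and females: the DIF contrast is the difference between the two estimates, and a commonly cited rule of thumb flags an item when the contrast is at least 0.5 logits and statistically significant. The Python sketch below implements that rule with a large-sample normal approximation to the t statistic, alongside Cronbach's alpha computed from raw scores for comparison with PSI. The function names, cut-offs, and example inputs are assumptions for illustration, not values from this study.

```python
import math
from statistics import pvariance


def dif_contrast(d_group1, se_group1, d_group2, se_group2, min_contrast=0.5, z_crit=1.96):
    """Flag DIF when |contrast| >= min_contrast logits and |t| > z_crit.

    d_*: item difficulty estimated within each group; se_*: its standard error.
    """
    contrast = d_group1 - d_group2
    t = contrast / math.sqrt(se_group1 ** 2 + se_group2 ** 2)
    flagged = abs(contrast) >= min_contrast and abs(t) > z_crit
    return contrast, t, flagged


def cronbach_alpha(scores):
    """Cronbach's alpha from raw scores; rows are persons, columns are items."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance(col) for col in zip(*scores)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical difficulties for one item, estimated separately by gender.
    print(dif_contrast(0.9, 0.15, 0.2, 0.15))
    # Hypothetical 4-point responses from five persons on three items.
    data = [[4, 3, 4], [2, 2, 3], [3, 3, 3], [1, 2, 1], [4, 4, 3]]
    print(f"alpha = {cronbach_alpha(data):.2f}")
```

Reporting alpha next to PSI, as this section does, is a quick way to confirm that the raw-score and logit-based reliabilities tell the same story.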

Limitations and Future Directions
The current research had some limitations, which also point to directions for further research. First, the study was limited to secondary school pupils in a single country, Malaysia. However, we drew our conclusions from highly cited literature on CT in a variety of domains, so the instrument may be applicable to additional domains. Second, caution is advised when applying this instrument to other situations, and more testing with samples from other cultural groups is necessary. When extending this instrument to other contexts, it is also necessary to investigate differential item functioning in order to draw meaningful comparisons, and replications in other nations would strengthen the relevance of the study across diverse countries. Finally, other types of validity, such as convergent and discriminant validity, could be investigated in future studies; however, they are beyond the scope of this research. Comparing results across different tests may also provide a more holistic psychometric assessment from multiple angles. Such comparisons would not only inform subsequent analyses but may also improve the psychometric qualities of the items. Most critically, researchers must match appropriate dispositions to pupils in the Malaysian context. Nonetheless, this questionnaire does not yet cover all of the characteristics listed in the literature, and some pertinent variables may have been omitted. Future studies could build on this work by examining additional elements of CT dispositions.

CONCLUSIONS
In summary, the data from each item of the CTDI met the Rasch model's assumptions, and all 55 items were retained. Each item demonstrated good item fit, polarity, and local independence. This work adds to the body of research on CT teaching and learning by providing a more comprehensive overview of CT dispositions and attitudes, as well as their influence on students' readiness to participate in digital workplaces. This is important for aligning with the variety of computational thinking concepts across the K-12 curriculum. The Rasch analysis supported the applicability of the CTDI as an instrument for assessing students' emerging attitudes toward CT in daily life, particularly in the educational context.