The Physics Problem-Solving Taxonomy (PPST): Development and Application for Evaluating Student Learning

This study addresses the development and evaluation of the Physics Problem-Solving Taxonomy (PPST), comprising five levels: retrieval, diagnosis, strategy, conceptual, and creative thinking. The taxonomy draws on Bloom’s revised taxonomy in the cognitive domain, the Types of Knowledge Taxonomy, and the Problem-Solving Taxonomy in engineering. The study includes applying PPST to analyze the content of the Israeli national physics exam (the Bagrut), student Bagrut scores (n = 18,000), and student answers to a school-level physics exam (n = 164). The findings indicate that in both the Bagrut and the school exam, the higher an item ranks on PPST, the lower the students’ grades on this item. In addition, the distribution of student scores on the two exams is similar, indicating high reliability and validity of the PPST scale. This tool could help physics teachers to rank difficulty levels of the high school physics exam questions, and create high school physics questions, to foster students’ proficiency in physics problem solving.


INTRODUCTION
It is widely agreed that a major objective of science education, especially physics teaching, is to foster student proficiency in problem solving. Current educational requirements indicate that introductory science courses in particular should emphasize skill building in quantitative and qualitative problem solving, along with developing a knowledge base (American Association for the Advancement of Science, 1993; National Research Council, 1996; Teodorescu, Bennhold, & Feldman, 2008). However, research in the cognitive sciences over the years has revealed the complex and dynamic character of the problem-solving process. Fabby and Koenig (2015) pointed out that many physics problems are complex: they require students to integrate content knowledge with critical thinking to determine what the problem is asking and to choose the best approach to resolve it. On the other hand, many students enrolled in introductory physics courses are novice problem solvers who may memorize problem types or simply apply a set of solutions to problems with similar surface features. To address these issues, it is essential to provide educators with tools that can help them carefully design the types and difficulty levels of problems presented to students in class, homework, or exams, so that learners can progress gradually from solving simple problems to coping with more challenging questions and problems. One way to handle this topic is to use an educational taxonomy to design class teaching and evaluation. In this paper, we present the theoretical framework and practical process of developing the Physics Problem-Solving Taxonomy (PPST) and its application for analyzing high-school students' achievements in the Israeli national physics exam, called the Bagrut, and in a school physics exam. Recommendations about using the PPST for curriculum development and school teaching are also proposed.

LITERATURE REVIEW
In this chapter, we review a number of educational taxonomies and discuss the rationale for the development of PPST addressed in this paper.

Revised Bloom's Taxonomy in the Cognitive Dimension
In the 1990s, Bloom's student Anderson initiated a revision meeting in which cognitive psychologists and curriculum and instruction experts worked to improve the classifications of educational objectives in the cognitive domain. In 2001, a new version of cognitive classification was published (Anderson & Krathwohl, 2001) in the hope of strengthening the links among curriculum design, instructional activities, and assessment. The revised version consists of a two-dimensional table: the horizontal dimension is a modification of Bloom's taxonomy in the cognitive domain, with verb forms replacing the noun forms of the original category labels. It includes the following six categories: 1. Remembering: Retrieving, recognizing, and recalling relevant knowledge from long-term memory; 2. Understanding: Constructing meaning from oral, written, and graphic messages through interpreting, exemplifying, classifying, summarizing, inferring, comparing, and explaining; 3. Applying: Carrying out or using a procedure through executing or implementing; 4. Analyzing: Breaking material into constituent parts, determining how the parts relate to one another and to an overall structure or purpose through differentiating, organizing, and attributing; 5. Evaluating: Making judgments based on criteria and standards through checking and critiquing; 6. Creating: Putting elements together to form a coherent or functional whole; reorganizing.

Contribution of this paper to the literature
• This study addresses development and evaluation of the Physics Problem Solving Taxonomy (PPST) and contributes a new taxonomy to the literature for teaching science.
• The paper demonstrates examples of PPST ranking for common physics questions and how the taxonomy was used to analyze student achievement on the Israeli national high-school physics exam and a school physics exam.
• The PPST could help teachers design problems of different difficulty levels and improve learners' abilities to solve problems based on conceptual understanding of physics rather than procedural solutions of exercises.

EURASIA J Math Sci and Tech Ed

Types of Knowledge Taxonomy
The scholars who developed the revised version of Bloom's taxonomy (Anderson & Krathwohl, 2001) also proposed a second dimension to the original cognitive scale, the knowledge dimension, comprising four categories: factual (declarative) knowledge, procedural knowledge, conceptual knowledge, and metacognitive knowledge. However, unlike the six levels of the cognitive scale, the four categories in the knowledge taxonomy are not hierarchical (Blumberg, 2009). Later, several authors used this taxonomy or close versions of it separately from the original scale in the cognitive domain to analyze teaching and learning subjects, such as mathematics (Voutsina, 2012), science (Leppävirta, Kettunen, & Sihvola, 2011), and technology (McCormick, 1997, 2004). Let us closely examine the definitions of the four categories in the knowledge taxonomy and how they apply to teaching science and technology (Barak, 2013).

• Factual knowledge. Factual knowledge (also called declarative knowledge, descriptive knowledge, or propositional knowledge) is the part of knowledge that describes information, such as names of people, places, dates, and events. In the context of science and technology, factual knowledge includes knowledge of terminology, names or symbols of components, technical vocabulary, or names of processes. Although factual knowledge appears as surface-level knowledge, it is the foundation upon which all other types of knowledge are built. The educators who articulated this taxonomy emphasized the need for instructors to help students use factual knowledge in constructing or enhancing their conceptual and procedural knowledge (Anderson & Krathwohl, 2001).
• Procedural knowledge. Procedural knowledge is the discipline-specific knowledge of skills, algorithms, techniques, or methods. It often involves a series of logical steps and includes knowledge of the criteria that determine when to use various procedures. According to Hiebert and Lefevre (1986), procedural knowledge includes formal language and symbol representation systems, as well as the algorithms and rules involved in doing something. In mathematics, engineering, and technology, this type of knowledge includes, for instance, techniques, procedures, routines, protocols, and given courses of action. For example, there are common methods and procedures to calculate the current in an electric circuit, choose a motor for an electro-mechanical system, or write a computer program in a specific programming environment. Procedural knowledge also includes methods to compare different solutions to a problem and choose the optimal one.
• Conceptual knowledge. Conceptual knowledge is the knowledge of classifications and categories, principles, generalizations, theories, models, and structures. It is more complex and organized than factual knowledge and reflects a deep understanding of content. Rittle-Johnson and Koedinger (2005) articulated that properly structured knowledge requires people to integrate their contextual, conceptual, and procedural knowledge within a domain. Whereas conceptual knowledge provides an abstract understanding of the principles and relations between pieces of knowledge in a certain domain, procedural knowledge is about "how to do" something, enabling us to quickly and efficiently solve problems. According to Kilpatrick, Swafford, and Findell (2001), procedural knowledge in mathematics is the knowledge of when and how to use a procedure; conceptual knowledge is about understanding concepts, operations, and connections among interrelated constructs. Conceptual understanding within the area of mathematical physical functions, for example, involves the ability to translate the different representations, tables, graphs, symbols, or real-world situations of a function (Davis, 2005). Hiebert and Lefevre (1986, p. 3) described conceptual knowledge as "rich in relationships, a connected web of knowledge, a network in which both the links and clusters of knowledge are important." McCormick (1997), as well as Groth and Bergner (2006), pointed out that conceptual knowledge is needed to understand problems, adapt known strategies to solve original problems, and generate new strategies. In science, technology, and engineering, the term conceptual knowledge relates to understanding broad concepts such as force, momentum, energy, waves, feedback, amplification, noise, or oscillations, and how these concepts or phenomena appear in different fields such as mechanics, electronics, or electromagnetics.
• Metacognitive knowledge. Metacognitive knowledge relates to awareness of one's own cognition and particular cognitive processes. It is strategic or reflective knowledge of how to go about solving problems and cognitive tasks and includes contextual and conditional knowledge and knowledge of self. Pintrich (2002) distinguished three types of metacognitive knowledge: strategic knowledge of general strategies to learn, think, and problem solve; knowledge about cognitive tasks and awareness that different tasks can be more or less difficult and may require different cognitive strategies; and self-knowledge of one's strengths and weaknesses.

Taxonomy of Problem Solving in Engineering
Problem solving, which unquestionably lies at the heart of science, engineering, and technology, is one of the most complex intellectual functions. Although psychologists have studied the nature of human problem solving over the past century, the term has remained rather ambiguous. Questions often discussed regarding problem solving include: What characterizes a good problem solver? To what extent can people learn problem-solving methods and improve their competencies in this regard? How can we distinguish "simple" or "easy" problems from "complex" or "difficult" problems? We describe this topic by presenting the Problem-Solving Taxonomy (PST) proposed by Plants, Dean, Sears, and Venable (1980) more than 35 years ago in a book on teaching engineering. Several studies on engineering education (Heywood, 2018; Waks & Barak, 1988; Waks & Sabag, 2004; Wankat & Oreovicz, 1993; Yokomoto & Rizkalla, 2002) used or mentioned Plants et al.'s (1980) book, which includes five complexity levels of assignments or activities in learning engineering and technology: routine, diagnosis, strategy, interpretation, and generation. Following are definitions of these levels in the PST and examples of related activities.
• Routine. Problems at this level are those that afford little opportunity for decision but proceed by simple or complex steps to a unique solution. For example, solving a quadratic equation, finding the current in an electrical circuit, or calculating the energy required to heat an iron mass from one temperature to another are ordinarily routine problems for learners of these subjects. Solving routine problems often requires the use of factual and procedural knowledge mentioned earlier in this section.
• Diagnosis. Problems at this level often require selecting the correct routine or routines to solve a particular problem. Learners need to identify the problem type or characteristics and the solution method. In mechanics, for example, deciding on the flexure formula to determine stress at a given point in a beam is diagnosis. In electronics, designing the parameters of a high-pass filter for a given frequency range can be considered a diagnosis problem. In both cases, there is a reasonable solution that designers are likely to suggest, and they have to find it.
• Strategy. This level requires the problem solver to choose a particular routine or routines for a problem that may be treated in many ways, all of which are known to the learner. It has to do with optimizing either the problem-solving method or the result; for example, choosing between solving a problem based on energy conservation or momentum considerations. Solving a mechanical system that includes masses, forces, friction, constant speed, or acceleration is often at the strategy level because the solver must design the steps to reach a solution.
• Interpretation. Interpretation consists of solving real-world, open-ended problems. For example, designing a solar energy system for a greenhouse requires considering many aspects in physics and engineering, such as lighting, air-conditioning, or irrigation systems. At this level, solving problems often involves using factual, procedural, and conceptual knowledge, as discussed in the previous section.
• Generation. Generation implies solving problems that require developing routines or methods new to the problem solver and bringing together previously unrelated ideas to spark a new attack on a problem in a way never before learned. This description of problem solving aligns with the definition of creative thinking as the ability to produce work that is both novel (original, unexpected, imaginative) and appropriate (useful, adaptive regarding task constraints; Guilford, 1967; Simonton, 2000; Sternberg & Lubart, 1996).
Plants et al. (1980, p. 23) wrote: By focusing on groups of behaviors leading to a particular outcome, rather than on individual behaviors, the problem-solving taxonomy (PST) cuts across Bloom's taxonomy and groups behaviors as they occur in the solution of problem. For instance, Diagnosis, an activity in PST, may combine knowledge, comprehension, and application as identified by Bloom. In addition, each activity may combine factual knowledge, procedural knowledge, or conceptual knowledge, as described earlier in this paper.
Teodorescu, Bennhold and Feldman (2008) noted that physics education studies have made great efforts in recent years in adopting important findings from expert-novice literature and cognitive science. Asking students to solve high-level thinking problems can help them become more expert-like problem-solvers. To achieve this goal, educators need to understand the relationship between the problems themselves, and the thinking processes and knowledge that they involve. To this end, these authors proposed the Taxonomy of Introductory Physics Problems (TIPP), which combines:

• The Cognitive domain, including four levels of cognitive processes: Retrieval, Comprehension, Analysis, and Knowledge utilization
• The Knowledge domain, including two levels: Declarative knowledge and Procedural knowledge
Later, Teodorescu, Bennhold, Feldman and Medsker (2013) proposed the New Taxonomy of Educational Objectives (NTEO), which includes the TIPP model mentioned above plus aspects of the self-esteem and metacognitive system. This work was derived from the Taxonomy of Educational Objectives developed by Marzano and Kendall (2007).

The Structure of the Observed Learning Outcome (SOLO) Taxonomy
Another taxonomy that is of interest to the present study is the SOLO taxonomy (Biggs and Collis, 1982), which stands for the Structure of the Observed Learning Outcome. This taxonomy classifies learning outcomes in terms of their complexity, enabling educators to assess the quality of students' work and not how many 'bits' they have got right. Hook and Mills (2011) describe the five levels of the SOLO taxonomy as follows:
- Prestructural level: the task is inappropriately attacked, and the student has missed the point or needs help to start.
- Unistructural level: one aspect of the task is picked up, and student understanding is disconnected and limited.
- Multistructural level: several aspects of the task are known but their relationships to each other and the whole are missed.
- Relational level: the aspects are linked and integrated, and contribute to a deeper and more coherent understanding of the whole.
- Extended abstract level: a new understanding at the relational level of the task is presented, and the learner uses this for prediction, generalization, reflection, or creation of new understanding.

Rationale for Developing the Physics Problem-Solving Taxonomy (PPST)
O'Neill and Murphy (2010) note that learning taxonomies are classification tools for describing different kinds of learning behaviors and characteristics that we wish our students to develop, to identify different stages of learning development, or to determine the appropriateness of learning outcomes for particular module levels within our programs. However, we can regard these taxonomies as educational aids or heuristics rather than precise tools for teaching a specific curriculum.
We found it necessary to develop the Physics Problem-Solving Taxonomy (PPST) discussed in this study because our experience showed that novice physics teachers, very experienced ones, and even official pedagogic instructors for teaching physics require a tool to gauge the difficulty level of a physics problem and to design physics problems of different difficulty levels.
Although the various taxonomies reviewed in the above section certainly informed the development of the new taxonomy, PPST was designed to fit as much as possible the specific needs of high school physics teachers. For example, PPST is unique in that it addresses both conceptual understanding and procedural knowledge in physics problem solving, as required in teaching physics today. Throughout the paper, we will refer to the validity of the taxonomy in terms of its suitability to teaching high school physics, and to the reliability of PPST, that is, to what extent the results of applying the taxonomy are repeated in similar contexts.

RESEARCH OBJECTIVES AND METHOD
The objective of this study was to develop a problem-solving taxonomy for teaching high-school physics and explore its effectiveness in the context of national- and school-level physics exams. To this end, the research methodology included the following stages:
1. Develop the PPST as a tool for curriculum design, teaching, and evaluation of student achievement in the high-school physics class, in collaboration with academic experts and senior high-school physics teachers.
2. Apply the PPST to analyze the content of the Israeli national physics exams (the Bagrut), which consist of 10 questions (49 sub-items) in the subjects of mechanics, electricity, and magnetics.
3. Explore a possible correlation between students' scores (n = 18,000) in the Bagrut exam sub-items and the items' ranks on the PPST scale.
4. Develop a physics multilevel thinking exam for high schools in collaboration with 14 experienced teachers, deliver the exam to eight high-school classes (n = 164), and analyze student achievements in the exam items according to the PPST scale.
5. Compare student achievement in the physics Bagrut exam and the multilevel thinking exam developed in the present study.
Stages 1-3 described above have to do with checking the validity of PPST by examining whether the ranking of the questions in the matriculation exam in physics according to the taxonomy is correlated with the student scores in this exam, which is considered a tool of high validity and reliability in the Israeli education system. Stages 4-5 also contribute to exploring the validity of PPST by applying it directly in work with schoolteachers. These stages also enable observing the reliability of the taxonomy by comparing its findings in the school context to those obtained by analyzing the scores in the national Bagrut exam (repeated measure in a different context).

Developing the PPST
We developed the PPST as a tool for teachers to design problems, questions, and exercises for students in the physics class. The PPST was derived from taxonomies discussed earlier (e.g., Anderson & Krathwohl, 2001; Barak, 2013; Bloom & Krathwohl, 1956; Plants, Dean, Sears, and Venable, 1980) and adapted to the special context of teaching physics. It consists of five levels, as illustrated in Figure 1. Following are examples of the thinking processes required of students in solving problems at each level of the PPST.

Problems at the Retrieval Level
At the retrieval level, students are required to cope with questions that are very close to what they have already solved many times. They have to identify the problem, retrieve a close solution from memory, and adapt it to the given case. Figure 2 is an example of a question at the retrieval level. We can assume that the students have already solved many questions asking them to mark the gravity force and normal force acting on bodies and show that ΣF = 0. Therefore, they can answer the question in Figure 2 easily.

Problems at the Diagnosis Level
At the diagnosis level, students are required to cope with questions that are close to ones they already know but include certain changes, such as the way the question is presented, existing data, or required problem-solving processes. Figure 3 presents an example question to calculate the acceleration of a body moving on an inclined plane with no friction.
In the question presented in Figure 3, a student must calculate the gravity force FG acting on mass m, the force F1 acting opposite F, the equivalent force FT = F - F1, and the mass acceleration α = FT/m. Students often encounter questions similar to this example in physics class, homework, or textbooks. In each case, they must diagnose the system structure, identify the given parameters such as masses and forces existing in the system, and apply a sequence of steps to solve the problem.

Problems at the Strategy Level
At the strategy level, the student must process complex relationships by making a decision about finding a solution, such as a verbal solution, a graphical presentation of the solution, or a presentation using an algebraic calculation. Sometimes the student must choose a method to solve the problem, for example, dividing the solution into stages, dividing a subsystem into its components, radicalizing, or comparing with a borderline value. Following is an example of a question at the strategy level. Figure 4 shows a box of mass M1 = 1 kg that slides on an inclined plane of angle α = 17°. The box is connected to a cord that passes over a pulley and carries a number of weights. The mass of a single weight is 100 g. The coefficient of static friction between the box and the surface is μs = 0.35. The mass of the cord is negligible. Find the number of weights to be hung on the cord so that the box will not start moving up or down the slope.
To solve the problem presented in Figure 4, a student must identify all forces operating in the system, break down the forces on the mass M1, find an expression for the equivalent force, and draw the answer. The learner must identify that the mass can move either up or down and analyze the forces acting on the system in each case. Because the learner is required to make a decision in a series of stages to solve the problem, the problem is ranked at the strategy level on the PPST scale.
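The decision steps described above can be illustrated with a short numerical sketch. It rests on assumptions about the geometry that the text does not spell out: the cord runs parallel to the incline, the pulley is ideal, and the weights hang vertically. The function name and its parameters are illustrative and not part of the study.

```python
import math

def allowed_weight_counts(box_mass_kg, angle_deg, mu_static, weight_mass_kg, max_count=50):
    """Return the numbers of hanging weights for which the box stays at rest.

    Assumed model (not confirmed by the source figure): cord parallel to the
    incline, ideal pulley, weights hanging vertically. The box does not move
    while the net pull along the slope stays within the maximum static friction:
        |m_hanging - M * sin(angle)| <= mu_s * M * cos(angle)
    The gravitational acceleration g cancels on both sides.
    """
    angle = math.radians(angle_deg)
    pull_down_slope = box_mass_kg * math.sin(angle)          # gravity component along the slope, divided by g
    max_friction = mu_static * box_mass_kg * math.cos(angle) # maximum static friction, divided by g
    lower = pull_down_slope - max_friction  # with less hanging mass, the box slides down
    upper = pull_down_slope + max_friction  # with more hanging mass, the box is pulled up
    return [n for n in range(max_count + 1) if lower <= n * weight_mass_kg <= upper]

# Values taken from the question: M1 = 1 kg, 17 degrees, mu_s = 0.35, 100 g per weight.
counts = allowed_weight_counts(1.0, 17, 0.35, 0.1)
print(counts)
```

Under these assumptions the lower bound is negative, so friction alone holds the box when no weights are hung, and the box stays at rest for 0 to 6 weights. The two bounds correspond to the two cases (sliding down, being pulled up) that the learner has to recognize.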

Problems at the Conceptual Level
In a problem at the conceptual level, a learner is required to solve a problem by explicit reference to one or several physical principles, as demonstrated in the example in Figure 5.
Solving the question presented in Figure 5 requires the learner to analyze a compound system composed of the trolley and the weight (a pendulum) and understand the cases of velocity and acceleration the graph shows. A learner can answer the question only by relying on physics principles in kinematics (motion properties: position, velocity, acceleration) and dynamics (the rules according to which a body moves in these forms). This example demonstrates the frequent case of a qualitative question that students can answer only if they understand physics concepts well and can integrate a number of concepts into answering the question. Solving a problem at the conceptual level of the PPST requires the learner to produce a solution at the Relational level of the SOLO taxonomy mentioned above. At this level, the learner has to address and integrate a range of aspects related to the problem to come to a deeper and more coherent understanding of the whole.
Since ranking or creating physics questions at the conceptual level is often a challenge for teachers, we show below a second example of a physics question at this level on the subject of electricity and magnetism (from the physics Bagrut exam, 2009).
The scale shows N1, so N1 < mg.
a. Draw a diagram of the forces acting on rod PS.
b. The direction of the current in the rod PS is from P to S. What is the direction of the current that passes through the RQ rod: from R to Q or from Q to R? Explain your answer.
c. Write an expression for calculating the distance d between the two rods using the parameters I, I1, m, L and N1. Use physical constants, if needed.
d. It is given that I1 > 4I. Point A is in the plane of the rods, and the magnetic field strength at this point is zero.
(1) Is point A between the PS and RQ rods, above the rod RQ, or under the rod PS? Explain your answer.
(2) Express the distance between point A and RQ by d.
e. The rod PS is replaced by a magnetized rod (with similar dimensions). Can the RQ conductor rod, through which a current flows, exert force on the permanent magnet? Explain your answer.
Items a and b in the question relating to Figure 6 could be ranked at the diagnosis level because students often cope with similar questions in learning the mechanics chapter in physics. Items c, d and e, however, are on the conceptual level because students must understand profoundly the electromagnetic phenomena relating to a wire carrying current and the interaction between a pair of wires carrying current. A student might remember that two wires carrying a current in the same direction attract each other, but this will not be enough for him/her to solve these questions.

Problems at the Creative-Thinking Level
Some problems require a student to invent a completely new solution, a path he or she has not encountered in the past. This might entail applying laws from other fields in a new way or using knowledge learned in one topic to solve a problem in another topic and in a new way. The example in Figure 7 demonstrates questions at this level. The question in Figure 7 corresponds to the creative-thinking level because it is very unusual to analyze the acceleration, speed, and distance of a body in reference to another one. The student must develop a solution and draw a graph that is new to him or her. Questions of this sort appear frequently in science competitions for excellent students but less often in regular science classes in schools.
In summary, the examples presented in this section demonstrate that the PPST can help identify or design problems at different difficulty levels for the physics class. The PPST is unique in that a level assigned to a specific problem or question relates to learners' previous experiences solving similar problems. The more experienced the learner is in solving similar problems, the lower the task or question ranks on the PPST. Consequently, the PPST can rank the same question low for one class and high for another.
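As a compact illustration, the five PPST levels can be written as an ordered enumeration. This is only a sketch: the numeric codes 1 to 5 are assumed to follow the order in which the levels are introduced, and the class name, helper function, and class labels are hypothetical.

```python
from enum import IntEnum

class PPSTLevel(IntEnum):
    """The five PPST levels, ordered from lowest to highest (codes are assumed)."""
    RETRIEVAL = 1
    DIAGNOSIS = 2
    STRATEGY = 3
    CONCEPTUAL = 4
    CREATIVE_THINKING = 5

def is_higher_thinking(level):
    """Strategy, conceptual, and creative thinking are treated as higher-thinking levels."""
    return level >= PPSTLevel.STRATEGY

# The same question may be ranked differently for classes with different
# prior experience (hypothetical class labels and ranks).
rank_by_class = {
    "class_with_prior_practice": PPSTLevel.DIAGNOSIS,
    "class_without_prior_practice": PPSTLevel.STRATEGY,
}

for label, level in rank_by_class.items():
    print(label, level.name, is_higher_thinking(level))
```

Keeping the rank attached to a class rather than to the question alone mirrors the point that a PPST level depends on the learners' previous experience.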

APPLYING THE PPST TO ANALYZE THE ISRAELI NATIONAL PHYSICS EXAM (THE BAGRUT)
In Israel, all high-school students take a range of final national matriculation exams (the Bagrut) in compulsory subjects such as Hebrew, English, and mathematics, as well as elective subjects such as physics, biology, or electronics. The Bagrut exam in physics is considered one of the most important and difficult exams because it is the key for higher education enrollment in subjects such as medicine or engineering.
In this study, the researchers and a colleague, a very experienced physics teacher with a PhD in science teaching, analyzed the thinking level of the questions in the physics Bagrut exam from one year in the chapters of "Mechanics" and "Electricity and Magnetics." These chapters included 10 questions, each having four to six subsections (a total of 49 items).
The two researchers ranked each item in the exam independently and then compared the results. In cases of discrepancy, they conducted a second review and consulted with a third experienced physics teacher. Figure 8 is an example of a question from the Bagrut physics exam.
A box with a mass of 5 kg moves along a straight line on a rough horizontal surface in the positive direction of the x-axis. The coefficient of kinetic friction between the box and the surface is μk = 0.1. The graph in Figure 8 shows the kinetic energy Ek versus the distance x.
a. In the first 20 meters of the box's movement, is there another force acting on the box in addition to the friction force? Explain your answer.
b. During the movement of the box from point x = 20 m to point x = 50 m, a constant horizontal force F is applied to the box in addition to the friction force. Calculate the force F.
The question presented in Figure 8 is especially difficult because the kinetic energy Ek is described versus the distance x and not versus the time t, as is customary. To solve the problem, a learner is required to understand the given system, correctly read the graph, identify the physical concepts in kinematics and dynamics upon which the solution is based, write the relevant formulas, and use algebra to find the answer. Therefore, this question was ranked at the strategy level on the PPST scale.
The composition of the entire exam (49 items) according to the PPST scale is presented in Table 1. Table 1 shows that items of the higher-thinking levels of strategy, conceptual, and creative-thinking comprised 67% of the total score on the Mechanics exam (16 of 24 items) and 48% of the total score on the Electricity and Magnetics exam (12 of 25 items). On the Mechanics exam, 8 of 24 items related to conceptual thinking. This picture reflects the need for teachers to stress solving problems and questions at the conceptual level, rather than teaching procedures to solve physics problems.
In summary, some questions on the Israeli Bagrut physics exam might be quite complicated for students. The PPST could help educators identify the difficulty level of each question in the Bagrut exam or questions from a physics book and other resources, design questions of different levels for their class and teach students how to cope with these questions.
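To make the chain of steps in item b of the box question concrete, the following sketch applies the work-energy theorem to the stretch from x = 20 m to x = 50 m. It assumes that F acts along the direction of motion and that g = 9.8 m/s². The change in kinetic energy has to be read from the graph; the value used below is a placeholder for illustration and is not taken from the exam.

```python
def applied_force(delta_ek_joule, distance_m, mass_kg, mu_kinetic, g=9.8):
    """Constant horizontal force F obtained from the work-energy theorem.

    (F - mu_k * m * g) * distance = change in kinetic energy,
    assuming F points along the direction of motion on a horizontal surface.
    """
    friction = mu_kinetic * mass_kg * g
    return delta_ek_joule / distance_m + friction

# mass, friction coefficient, and distance come from the question text;
# delta_ek_joule = 60.0 is a PLACEHOLDER, not a value from the exam graph.
force = applied_force(delta_ek_joule=60.0, distance_m=30.0, mass_kg=5.0, mu_kinetic=0.1)
print(round(force, 2))
```

Because energy is plotted against distance, the slope of the graph directly gives the net force, which is the insight the learner needs before any algebra.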

EXPLORING STUDENT ACHIEVEMENTS ON THE BAGRUT PHYSICS EXAM ACCORDING TO PPST
The Israeli national Bagrut exams in physics, as in all subjects learned in high school, are prepared by experts in each field. Experienced teachers check students' answers according to a detailed scale, and two independent reviewers calculate the final score for each item as a mean score. These procedures afford high reliability and validity to the exams. In the present study, we received the final scores of 18,000 students for each item (subquestion) in the physics exams in Mechanics and in Electricity and Magnetics for a specific year. As previously noted, the two exams included 49 items, which we ranked on the PPST scale. To examine the validity of this ranking, we calculated Spearman's rho correlation coefficient between student scores on the test items, which is a continuous ratio variable of normal distribution in the range of 0-100, and the PPST ranks of the items, which is an ordinal variable in the range of 1-5. In such a case, the Spearman coefficient is obtained by first converting the scores (continuous data) to an ordinal scale and then calculating the Pearson coefficient between the two sets of ordinal data. The result of Spearman's rho = -0.7033 indicates a strong negative correlation between the two variables. The negative sign of rho indicates that students' scores were high on "easy" items (ranked low on the PPST) and low on "difficult" items (ranked high on the PPST). This outcome is presented in Table 2 and Figure 9.
Table 2 and Figure 9 show the mean scores of items at the strategy and conceptual levels were 64.44% and 57.69%, respectively. These scores indicate that about half of the students faced difficulties solving problems at these levels. Data on the creative-thinking level are not presented because they were marginal to the present study. Similar findings were also obtained in a school exam, as shown in the following section.

Developing the Physics Multilevel Thinking Exam
With the objective of examining student achievements in solving physics problems at different thinking levels, we developed the physics multilevel thinking exam, comprising 12 questions in mechanics taken from common physics books and other learning materials. The items were designed to meet four levels defined in the PPST: retrieval, diagnosis, strategy, and conceptual. An advanced draft of the exam was presented to a group of 14 experienced teachers who serve as regional Ministry of Education supervisors for teaching physics. They worked in small groups of two to three participants to check the exam item contents and the grading of each item on the PPST scale. The participants upgraded several items and suggested including two questions in the exam relating to students' metacognition in learning physics instead of items on creative thinking, which were found to be less relevant to teaching physics. In summary, the participants ranked the questions on the PPST after a thorough discussion, without significant difficulties or disagreements that warranted documentation.
The final version of the physics multilevel thinking exam included items on four PPST levels (two retrieval-level items, three diagnosis-level items, three strategy-level items, and four conceptual-level items), as well as feedback on two metacognition items. These exam items were similar to those in the Bagrut exam but, this time, the researchers formulated the exam with the teachers.

Student Achievement on the Physics Multilevel Thinking Exam
The physics multilevel thinking exam was given to eight 11th-grade classes (164 students in total). To ensure reliability of the findings, the researcher and three class teachers independently checked all exams for four of the classes and compared the results. For the remaining classes, only the researcher checked the exams. Table 3 presents the mean scores of all students who took the exam.
In Table 3, the mean scores for items at the retrieval level (Items 1, 2) are 82.2% and 86.15%, whereas the mean scores for items at the conceptual level (Items 3b, 8b, 9, 10) range from 37.35% to 64.44%. These findings indicate students' difficulties in coping with items ranked at high PPST levels. The mean scores for the items in the four categories are retrieval, 84.18%; diagnosis, 57.78%; strategy, 53.59%; and conceptual, 50.38%.

Comparing Student Achievement on the Bagrut Physics Exam to the Multilevel Thinking Exam
In this study, the PPST was developed and applied to analyze achievements of high-school students majoring in physics within two contexts: (1) the Bagrut physics exam, composed and checked by experts under the responsibility of the Ministry of Education (n = 18,000) and (2) the physics multilevel thinking exam, developed and checked within the framework of the present study (n = 164).
We compared the students' scores on these exams in relation to the four main PPST-scale categories. The findings of this comparison, based on data presented in Tables 2 and 3, are shown in Figure 10. Figure 10 shows a very similar distribution of students' scores in the four categories on the PPST scale on the two independent exams. In both cases, it is clear that the higher an item ranks on the PPST scale, the lower the students' grade on the item. This indicates the validity of the PPST scale. The correlation between the scores in the four categories of the two exams is r = 0.99, indicating the high reliability of the PPST tool.
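As an illustration of how a correlation between the four category means of two exams can be computed, the following Python sketch applies the Pearson formula to two four-element lists. The school-exam means are the values reported above for the multilevel thinking exam; the Bagrut category means come from Table 2, which is not reproduced here, so clearly labeled placeholder values stand in for them. The function name is hypothetical, and the sketch is not expected to reproduce the reported r = 0.99.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Category order: retrieval, diagnosis, strategy, conceptual.
# School-exam category means as reported in the text (percent).
school_exam_means = [84.18, 57.78, 53.59, 50.38]

# PLACEHOLDER values only: the real Bagrut category means are in Table 2.
placeholder_bagrut_means = [80.0, 65.0, 60.0, 55.0]

r = pearson_r(school_exam_means, placeholder_bagrut_means)
print(f"Pearson r (placeholder Bagrut data): {r:.2f}")
```

With only four category means per exam, the coefficient rests on four data points, which is worth keeping in mind when reading any value computed this way.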

DISCUSSION
The PPST developed in this study draws mainly on three taxonomies, as illustrated in Figure 11: the revised version of Bloom's taxonomy; the Types of Knowledge taxonomy; and the Problem-Solving Taxonomy (PST) in engineering (Anderson & Krathwohl, 2001; Krathwohl, 2002; Plants et al., 1980). Other taxonomies also contribute to PPST. Although Bloom's taxonomy, in either its original or revised version, is widely known, it is rarely used in the physics class. As we mentioned in the literature review, teachers often find it difficult to agree whether a specific physics question belongs to the level of remembering, understanding, applying, analyzing, evaluating, or creating in Bloom's revised taxonomy. In contrast, it could be much easier to rank a physics question on the PPST scale, with its levels of retrieval, diagnosis, strategy, and conceptual knowledge, because these PPST levels have been defined especially for physics teaching. For example, as demonstrated earlier in this paper, the strategy level has to do with questions that can be solved using more than one approach, and the conceptual knowledge level refers to questions in which the learner needs to explain the scientific concepts on which a solution is based.
The Types of Knowledge Taxonomy, presented as the knowledge domain of Bloom's revised taxonomy (Anderson & Krathwohl, 2001; Krathwohl, 2002), describes four types of knowledge in learning a subject: factual, procedural, conceptual, and metacognitive. This taxonomy complements Bloom's taxonomy in the cognitive domain, but it still says little about learners' previous experiences and familiarity with solving problems in a given subject. Obviously, a certain problem could be easy for a learner who is experienced in solving similar problems, but difficult for another learner who encounters the problem or question for the first time. To address this issue, we applied PPST using the Problem-Solving Taxonomy (PST) (Plants et al., 1980), which describes five problem-complexity levels with which a learner copes: routine, diagnosis, strategy, interpretation, and generation. The PST relates not only to problem content, but also to the learners' experience in solving similar problems.
The development of PPST also borrowed from the work of Teodorescu, Bennhold, and Feldman (2008), who suggested the Taxonomy of Introductory Physics Problems (TIPP). These researchers also aimed at categorizing physics problems in the cognitive process and knowledge domain, which is part of PPST. We also learned from the New Taxonomy of Educational Objectives (NATO) (Teodorescu, Bennhold, Feldman, & Medsker, 2013), which added to TIPP the categories of metacognitive system and self-system. As previously mentioned, some of the experienced teachers who worked on the present study suggested including the category of metacognition of physics problem solving as the highest level of PPST, instead of creative thinking, which was found to be less relevant to the high school physics class. In addition, it is worth mentioning that the Structure of the Observed Learning Outcome (SOLO) taxonomy (Biggs & Collis, 1982; Hook & Mills, 2011) also has a point in common with PPST. Solving a problem on the Conceptual level according to PPST is quite analogous to the Relational level according to SOLO. In both cases, the learner has to address a range of aspects related to a problem and come to a deep understanding of the system or phenomenon under discussion. Importantly, the PPST was developed in collaboration with academic experts and 14 experienced teachers and instructors from the Ministry of Education. These partners took part in defining the categories of the PPST, analyzing the findings from the national Bagrut physics exam, and applying the PPST to develop and evaluate the physics multilevel thinking exam. Therefore, it is not surprising that the PPST was introduced in many in-service courses for physics teachers, including programs to train engineers from high-tech industries as physics teachers.

CONCLUDING REMARKS
Educators often consider teaching high-school physics one of the most challenging issues in school. We hope that the PPST introduced in this study will help curriculum developers and teachers carefully design the types and difficulty levels of problems presented to students and accordingly improve learners' ability to deal with problem solving based on a conceptual understanding of physics rather than the procedural solutions of exercises. However, it is important to note that taxonomies of all types, including the PPST, are just educational aids or heuristics, rather than precise tools for teaching a specific curriculum. An excellent teacher is one who naturally combines the ideas behind the taxonomy with classroom teaching.
A limitation of the present study is the issue of fostering and evaluating creativity in the physics class. The original version of PPST, derived from the literature, included creative thinking as the highest level of the PPST scale. However, we did not present data on creative thinking because it is difficult to address creativity in physics classes, which are fully committed to teaching obligatory subject matter for the matriculation exam. More work is required to address aspects of creativity and metacognition in the physics class and PPST.