Are Concept Maps a Valid Measurement Tool for Conceptual Learning? A Cross-case Study

The approaches of “problem-based learning” and “writing to learn” are known for facilitating the apprehension of concepts and better retention of knowledge. In educational research, concept maps are sometimes used to assess the learners’ level of knowledge. The main aim of this paper is to investigate the validity of concept maps as an instrument for the assessment of learning. To this end, six students were observed for more than a year and their learning process was documented in various ways. The concept maps were used in the form of a pre-post-test, and the different students’ results were compared in a cross-case analysis using a master concept map. The results presented in this study indicate that the validity of concept maps compared to interviews and reports is questionable. It is possible to measure some parts of the learning process with concept maps, but conceptual learning seems to be hidden from the instrument. Therefore, concept maps might not be the most useful tool to measure conceptual change.


INTRODUCTION
Mind maps and concept maps have been widely employed in educational research and in science education in the past thirty years. Their use as a learning tool has been evaluated and tested in various settings (Ruiz-Primo & Shavelson, 1996). They are also used as an instrument to measure conceptual understanding for a wide range of subjects. This measurement is based on the hypothesis that a better understanding leads to a more complex and more structured map. According to Chi, Glaser, and Farr (2014), experts in a field construct a considerably more extensive and denser map than novices.
In the majority of the existing literature the validity of concept maps is not questioned. Additionally, there are studies to support the hypothesis from above with empirical evidence. However, this assumption should be corroborated (Ruiz-Primo & Shavelson, 1996). For that reason, this study aims to respond to this main research question: Are concept maps a valid instrument to measure conceptual learning?
As part of a bigger project on students' conceptions and learning processes about radiation, concept maps were used as a measurement tool. During the evaluation process, we made the observation that the results from the concept map evaluation were not congruent with those of the other instruments. Our findings show that the assumption of changing concept maps only works for some parts of the learning process. There might be an observable change in the mind maps before and after learning about a particular subject. After more detailed analysis of our results, a different picture on the validity of concept maps as a learning assessment tool can be drawn. Most students in the current study were not able to improve their conceptual understanding about radiation even when they showed improvement of their mind map and vice versa. Therefore, drawing a concept map may not be a valid instrument for measurement of conceptual change. According to the results presented in this study, mind maps seem to be more valid in measuring the change of surface knowledge (surface learning) as compared to deeper changes in conceptual understanding.

In this paper, the results of a cross-case comparison are presented. To this end, the setting of the study, the treatment for the students and the cases are described before the results of the study are delineated.

THEORETICAL PERSPECTIVE
This section is divided into two parts. The first part focuses on mind maps and concept maps and their different uses in the literature. As a conclusion, the gap in the existing literature concerning the validity of concept maps is pointed out. The research question is based on this gap and discussed in the following section. In the second part, the theoretical framework for the intervention used in this study is described. An overview of the existing literature on writing to learn (WTL) and on problem-based learning (PBL) is provided, and conclusions on the effect of the intervention are drawn from that literature.

Mind- and Concept Maps as Measurement Tool
According to the literature, mind and concept maps are used in a wide variety of ways: as a learning tool for better comprehension in lessons, as a research tool in qualitative studies and as a measurement tool for knowledge. Nesbit and Adesope (2006) found in their meta-analysis that concept maps appeared in over 500 peer-reviewed articles in education and psychology. Concept maps were first described and used by Novak and Gowin (1984), and described extensively by Novak (2010). According to Novak (2010), they consist of two different parts, i.e. concepts and propositions. A concept is defined as "a perceived regularity in events or objects, or records of events or objects, designated by a label" (Novak & Cañas, 2008, p. 1), with the label referring to one or more words. Those concepts are connected by propositions. These are described as "statements about some object or event in the universe, either naturally occurring or constructed […] contain[ing] two or more concepts connected using linking words or phrases to form a meaningful statement" (Novak & Cañas, 2008, p. 1). Novak and Cañas (2008) also described a scoring system for concept maps that was simplified by Thompson and Mintzes (2002).
Mind maps were first introduced as a possibility to improve learning by . The main difference between mind maps and concept maps is the structure of the map. Concept maps should include a hierarchy, and the connections between the nodes describe different relations between them. Mind maps are well known to students and teachers. They are often used to structure a brainstorming process. We found no literature that used them as a measurement tool. Hence, we focus on concept maps.
Concept maps are frequently used in different fields of science education, especially in biology. Edmondson (2000) pointed out that concept maps are more effective at revealing different dimensions of students' thinking than traditional approaches, but that there should be a focus on the validity and reliability of this instrument. The first argument is also claimed by  in the following statement: "Construct-a-map scores most accurately reflected the differences across students' knowledge structure" (p. 275). Llinás, Macías, and Márquez (2018) strengthen this point in their paper. They argue that concept maps are generative and not simply responsive, and that the students are required "to understand content with precision and to express that understanding explicitly" (Llinás et al., 2018, p. 3). According to Liu (2013), concept maps assess how concepts and relations are organized, and the construction of those maps enables the students to express their holistic view on a given topic.

Contribution of this paper to the literature
• In the past 30 years, concept maps and mind maps were widely used in different research areas. Only a small fraction of studies focused on the validity of those instruments to measure the learning process. Hence, this study expands the discussion on the topic and contributes to the existing body of knowledge on the subject.
• By comparing how both concept maps and other data sources (interview, reports) measure learning, it was possible to gain insights that contradict the initial hypotheses. The results of this study raise validity concerns in connection with concept maps.
• Concept maps are helpful instruments for researchers and teachers. Both groups should be aware of the problem that the validity of this instrument is doubtful.
• A master concept map for the topic of 'radiation' was developed.

EURASIA J Math Sci and Tech Ed
Mintzes, Wandersee, and Novak (2001) mention concept maps as one tool to measure knowledge of students ranging in age from seven to senior students. They emphasize that it is mandatory to teach the creating of concept maps before using them in class. With this premise, they highly recommend the use of this tool and refer to it as the "most powerful assessment strategy" (Mintzes et al., 2001). In addition, Conradty and Bogner (2012) highlight the fact that "concept maps very likely are capable of representing students' knowledge" (p. 351). Ruiz-Primo (2004) supports this point of view writing that "concept maps scores can consistently rank students relative to one another and provide a good estimate of a student's level of performance, independently of how well their classmates performed" (p. 5). However, in the same paper she wrote that "there are still some issues that need to be resolved before we can conclude that they can reliably and validly evaluate students' connected understanding" (Ruiz-Primo, 2004, p. 7).
Although Ruiz-Primo and Shavelson (1996) addressed the problem of the reliability and the validity of concept maps, they emphasize the great variety of maps and their different use as an assessment tool. Schecker and Klieme (2000) were also concerned about the validity of concept maps compared to other methods, as they found a big dependence of the results on the task formats and the scoring system. This point was also addressed before by Pearsall, Skipper, and Mintzes (1997) who cite a large number of studies examining the reliability and validity of concept maps.
So what does the existing literature tell us about the validity and the reliability of concept maps? Hollenbeck, Twyman, and Tindal (2006) reported in their study a low correlation between student-generated concept maps and problem-solving essays. Himangshu and Cassata-Widera (2010, p. 63) investigated the problem of reliability further and found a connection between the given task and the reliability of the results (see also Figure 1): "The more directed the task the easier the assessment is to grade and thus reliability and task directedness increase proportionally" (Himangshu & Cassata-Widera, 2010, p. 59). Hahn-Laudenberg (2017) argues that the problem of the small correlations between concept map scores and other traditional formats (multiple choice, essays, open tasks, …) emerges due to the weakness of the traditional instruments with regard to measuring conceptual understanding. Graf (2014) and Aguiar, Lannes, Garcia, and Ferreira (2014) also support this claim.
There seems to be a disagreement in the literature concerning the validity of concept maps. Krabbe (2014) concluded in his book chapter that "it is not remarkable that results in research concerning reliability and validity of concept mapping are not consistent (amongst others see McClure, Sonak, and Suen (1999), , Ruiz-Primo and Shavelson (1996))". In an earlier study Ley, Krabbe, and Fischer (2012) found a "slight positive connection between the two instruments -competence test and concept maps" (p. 6). Referring to the dependence of the connection on the kind of concept map and scoring system, they concluded that the concept maps were representations of competences of the basic concept "energy". Llinás et al. (2018) tackled the problem from another angle. They tried to find a better way of scoring a concept map to obtain comparable results between concept maps and traditional multiple-choice tests. They assumed that the problem is not the general validity itself but the scoring of the maps. In the end they concluded that "the results presented in this work suggest that we are on the right track to obtain an objective and easy-to-use tool to measure a student's conceptual understanding of a particular topic" (p. 12). In contradiction to those studies, Ozdemir (2005) found no correlation between the score on the concept map and the score from multiple-choice tests. He investigated math pupils and found a connection between their marks on traditional tests and their score on concept maps. This connection was also found by other studies (Åhlberg and Ahoranta (2008) or Ciliberti and Galagovsky (1999)). İngeç (2009) strengthens the point made by Ozdemir (2005) in his study about the knowledge of pre-service teachers. He found a weak correlation between tests and concept maps, but states that the pre-service teachers had knowledge yet were not able to establish the necessary relationships between the concepts in their concept maps.
Reviewing the existing literature, research exists for both the argument that concept maps are a valid measurement tool and also that this is not the case. Hence, this study strives to make a contribution to the existing body of literature by providing additional input on the validity of concept maps.

Theoretical Framework for the Intervention
The learning environment in which this study was conducted consisted of a small research project that the students had to carry out on their own and is described in detail in the research design section. This intervention is set within the framework of two different theoretical approaches: an inquiry point of view and a "write to learn" point of view. As shown in Figure 2 the two approaches impact the intervention and are the theoretical framework for this part of the study. Both offer broad descriptions, forms and varieties and are deeply ingrained in a constructivist theory of learning, which indicates that they should therefore work quite well together. With regard to the aforementioned findings, a consensus throughout all studies can be seen. PBL/IBL and WTL are suitable arrangements to increase students' learning. Combining these learning arrangements should lead to an overall increase in content knowledge. This is a crucial hypothesis to this study because the validation of concept maps can only be done if the intervention generates a difference in the content knowledge. In the following section, a brief overview of the existing literature for both learning approaches is provided.

Inquiry-based learning
Numerous different definitions for the term inquiry learning exist, ranging from authentic pedagogy to project- or problem-based learning (PBL). All those definitions have several points in common: The students deal with authentic problems, issues or questions, work collaboratively and present their results or their newly gained knowledge (Busan, 1974). Therefore, the students work in an inquiry-learning context to discover fundamental and known principles of science. So, the focus of this brief overview will be on literature for PBL. Following the criteria formulated by Friesen and Scott (2013), the task of conducting a small research project corresponds to PBL. First, the students are responsible for their own learning. Second, they work on an ill-structured problem and can or should integrate different subjects into their work. Third, students are working on a valuable real-world problem. Thus, the students' results are possibly new and might not exist in the literature prior to these research projects, which could mean that the students see significance in their work beyond the school context.
Figure 2. Visualization of the interplay between the different theoretical approaches
Looking at the literature focusing on the effects of PBL, there are numerous studies showing a significant effect of this educational strategy on students' learning outcome. Savery (2006), Holm (2011) and Thomas (2000) reviewed the body of PBL literature. Overall, they certify the positive effect of PBL on the learning outcome. They also identify different conditions that are necessary to construct meaningful PBL. As one critical point, the effect of different scaffolding conditions is stated. The better the scaffolding is constructed or integrated in the arrangement, the better the learning outcome. Additionally, as Barron and Darling-Hammond (2008) pointed out in their meta-analysis, inquiry-based science teaching has a higher impact on the students' learning if teacher-led activities are included. The situation in this study provided no planned scaffolding for the students and included little teacher activity. Keeping the two results from above in mind, only a small gain in content knowledge is expected.

Writing to Learn (WTL)
This approach to learning has been well reviewed in the last decade and various studies investigated the effect on students' learning. Within the last decade, three meta-analyses investigated the effect of WTL-arrangements. All studies found small to medium positive effects on the students' learning in those arrangements. Furtak, Seidel, Iverson, and Briggs (2012) conducted a meta-analysis including 48 studies from 1994 to 2004. Their analysis points out three main findings of which the most important two are that writing to learn typically generates small effects and the length of the treatment moderates the effect (the longer the treatment the higher the impact). Bangert-Drowns, Hurley, and Wilkinson (2004) examined 26 studies and confirm the improvement of the students across different subjects (science, mathematics, social science…). The latest meta-analysis by Graham and Perin (2007) reviewed 66 studies and focused on understanding information in the written text. Most studies (94%) endorse that students' understanding of the content increases. Like in the study before, the effect was similarly independent of the subject. Each of the abovementioned studies shows a trend that WTL is an effective way of learning content. Graham and Hebert (2011) reported that most teachers in secondary school endorse the importance of reading and writing for their students. However, most science teachers do not know how to incorporate writing into their classes in a meaningful way. They follow the misconception that reading and writing belong to English lessons. According to Pearson, Moje, and Greenleaf (2010), the majority of writing tasks in the classroom are mechanical. To ensure learning through writing, it is necessary to write in a more informal way, as is done in reports or reflections. Therefore, students are not well prepared to write a text where they are expected to organize and reformulate their knowledge.

Conclusion
Reflecting on the results above in connection with this study, a positive effect on students' learning about radiation can be expected after they have successfully completed the intervention. Because the students have to write a longer report over the timespan of half a year, a moderate positive effect on their learning may be expected. However, the students' inexperience with writing might interfere, as they receive no prior training in writing those reports.

Research Question
As mentioned above, the validity of concept maps is an open question in the field of science education. According to the literature above, a growth in the complexity of the maps and a better knowledge about the specific topic, which in this case is radiation, is expected. The aim of this study is to respond to the following questions:
• Do concept maps depict the learning progression of the students in a valid way in comparison to interviews and reports?
• Does the change in the concept maps reflect a conceptual change process?
• Can the positive effects of WTL and PBL be confirmed with the results of this study?
A research design was set up to investigate and respond to those questions and is described in the next section.

RESEARCH DESIGN
The study was designed in a pre-post format with a problem-based intervention. Before and after the intervention, a semi-structured interview was conducted with each learner. During these interviews the students produced the mind maps. A detailed description of the interview setting is given below.

Description of the Intervention and the Sample
In fall 2013, six students participated in this study. At that time, these students were at the age of 17 and in their last year of their school education. There were two girls and four boys from three different schools in Vienna. Due to the fact that the students were volunteers, the sample cannot be considered random.
The intervention in this study consisted of a task for which the students had to conduct a small research project including a final report and a presentation. Due to a change in the graduation procedure this is now mandatory for every student in Austria. According to the ministry of education, the aim of this task is to show that students have the ability and the knowledge to investigate, communicate and discuss a topic. The intervention was open in the sense that the researcher was not able to control the actions of the students and their involvement in the task.
According to Banchi and Bell (2008), the task fulfils the criteria of an inquiry level 4 task. The students had control over the research question and all parameters of their research. The students were allowed to write a theoretical research paper, conduct experiments or investigate social questions. The report had to contain between 40,000 and 60,000 characters. The students participating in this study had the advantage that they could ask the author of this paper for help with finding a research question. Therefore, they were tasked with investigating pupils' conceptions of electromagnetic radiation in their research (excluding visible light and nuclear radiation). Within these boundaries, they were free to investigate whatever they were interested in. The author operated in a double role. On the one hand, he provided his knowledge for the students and acted as coach when the students asked for help. On the other hand, he investigated the students and their conceptions and knowledge about radiation. No scaffolding in the traditional sense was provided for the students. However, they had the opportunity to contact the author at any time during the process.
After conducting their research, the written report was evaluated by the students' teacher. This report fits into the theoretical description of a WTL-treatment. Writing the report is a long-lasting and not only a mechanical writing task. The final step for the students was the presentation of their work in front of an examination board including the school principal and various teachers.

Description of the Data Collection Process
During the whole period (from November 2013 till April 2015) different sorts of data were collected. After the first meeting with the students, the researcher wrote notes from memory describing the students and their interactions and motivations. The next step was the first interview. Further details on the interview are provided in the next section. In the third phase of data collection, the students were on their own, investigating other pupils' conceptions. Some of the students asked the author for advice on possible methods of investigation or the research question itself.
In this phase, the author compiled verbatim records of the meetings and the e-mails. During the writing phase, several students sent pieces of their reports asking for feedback, which was provided. In the end, the written reports of all students were collected, and the presentation was recorded on video.
The last phase consisted of an interview, which used the same procedure as the first interview, to enable a comparison between the two interviews. Because the intervention in between had no given structure, the interviews were conducted before and after the students worked on their assignments. In Figure 3, an overview of the project is given.

The Interview Setting
The interview was semi-structured and took place at the University of Vienna in the office of the author. The student and the interviewer were alone in the room and sat at a table. The interview always started with the question "Can you tell me everything you associate with the term 'electromagnetic radiation'?" The students then wrote down their associations with the term on paper notes. These notes were the centre of the second part of the interview.
In this second part the students were asked to arrange the words into a concept map. The students had no training in constructing concept maps before the interview. The aim of this procedure was to get an unfiltered image of the conceptual structure. Afterwards, they explained the map to the author. This step helped to understand the connections between the content, although the students did not draw physical lines on their maps.
In the third part of the interview, the students had to explain different kinds of radiation (UV radiation, IR radiation, X-rays, microwaves) in their own words. The mind map was still in front of them and they often referred to it. The last two questions addressed the conceptions about the danger of radiation.
The second interview had the same structure as the first one to allow a comparison of the two interviews. There were no additional questions about preconceptions from the first interview. The students were not shown pictures of their first mind maps. All interviews were taped, and the second interview was also filmed to obtain a broader database. The interviews took roughly an hour.

Extended Mind Maps
According to the literature, the students did not produce classical concept maps, as there was no top-down hierarchy or defined relations in the maps the students made. Due to the lack of training in making concept maps, it was not possible to use this approach directly. However, the maps created by the students cannot be considered mind maps either. There is a structure, and the students explained the relations between the nodes verbally in the interview setting. Therefore, it is reasonable to refer to them as extended mind maps, because the maps provide more information than regular mind maps without requiring the students to learn the procedure of making a concept map.
The students created the maps with their own words. After thinking about the term "electromagnetic radiation", they wrote down their associations on paper notes. It was crucial that the notes were moveable, so that they could easily be ordered in different ways. The students were free to choose an order they agreed with for their map. After creating their personal map, they were asked to explain the organization of the notes to the author.
There were two aims of the investigation linked to the map. First, the mind map was used as an assessment tool. The goal was to measure the change in the knowledge structure (Edmondson, 2000). The hypothesis was that, due to an increase in knowledge, a denser, bigger and more connected map would be drawn in the course of the second interview. The assumption was driven by literature on IBL which indicates a gain in knowledge even if there is little guidance. The second aim was to use the mind map as an opportunity to talk about radiation. It helped the students to talk about radiation and they often referred to the map during the interview.

Case Analyses
To analyse the cases, different steps were taken to be able to answer the research question. First, we analysed the interviews and the reports written by the students using a grounded theory approach (Plotz, 2017a; Plotz & Hopf, 2016). This analysis helped us gain an insight into the learning progression of the students during their projects. In a second step, the concept maps were analysed. We have to keep in mind that "[a]ssessing the quality of a concept map is a complex issue" (Cañas, Novak, & Reiska, 2015, p. 17) and that there are many different methods to evaluate the quality of concept maps. In the analysis of the maps, we used a more holistic approach rather than a scoring system like that of Mintzes et al. (2001). Cañas et al. (2015) also endorsed this approach by writing about the importance of looking "at both the content and the structure" (p. 8) to determine the quality of a concept map.

Master Concept Map
To be able to analyse the concept maps from the students, a master concept map was created. This idea was raised by Ruiz-Primo, Schultz, et al. (2001), who also introduced a procedure to construct such a map. Following their procedure (Ruiz-Primo, Schultz, et al., 2001, p. 276), different steps were taken:
1. Selection of the panel. It was composed of experts in the content domain to be tested, teachers, and the researchers or assessors. In this case there were 14 people on the panel.
2. Each panel participant provided a list of the most important concepts in the subject domain.
3. Two to three panel participants compared and discussed their lists of selected concepts until a consensus was reached about which are the most important concepts.
4. Each group constructed a concept map with the key concepts.
5. The author constructed a concept map with relations that appear in at least 80% of the participants' concept maps.
6. The resulting map was discussed and modified with participants until a consensus was reached about which relations should be present in the map.
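Step 5 can be read as a simple frequency filter over the maps produced in step 4. The following Python sketch is only one possible illustration of that filter, not the procedure used by the panel: it assumes that each map has been reduced to a list of unlabeled, undirected concept pairs, and the function name `consensus_relations` and the toy maps are hypothetical and do not come from the study.

```python
from collections import Counter


def consensus_relations(maps, threshold=0.8):
    """Return the relations that occur in at least `threshold` of the maps.

    `maps` is a list of concept maps; each map is an iterable of
    (concept_a, concept_b) pairs. Pairs are treated as undirected.
    """
    if not maps:
        return set()
    counts = Counter()
    for relations in maps:
        # Count every relation at most once per map, ignoring direction.
        counts.update({frozenset(pair) for pair in relations})
    total = len(maps)
    return {rel for rel, n in counts.items() if n / total >= threshold}


# Hypothetical toy data: five maps, NOT taken from the study.
group_maps = [
    [("spectrum", "energy"), ("spectrum", "wavelength"), ("energy", "danger")],
    [("energy", "spectrum"), ("spectrum", "wavelength")],
    [("spectrum", "energy"), ("spectrum", "wavelength"), ("sources", "sun")],
    [("spectrum", "energy"), ("wavelength", "spectrum")],
    [("spectrum", "energy"), ("energy", "danger")],
]

master_relations = consensus_relations(group_maps)
for relation in sorted(sorted(rel) for rel in master_relations):
    print(" - ".join(relation))
```

In this toy example only the two relations that appear in at least four of the five maps survive the filter. The discussion and modification described in step 6 remains a matter of panel judgment and is not captured by such a count.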
The resulting concept map and its different parts can be seen in Figures 4-8. The map encompasses four main parts: (1) the spectrum, (2) a theoretical part for students, (3) a theoretical part for experts, and (4) a part with different applications. Those parts are described in the following paragraphs.

The spectrum
The spectrum is located in the centre of the concept map. This unique position is rooted in the importance of the spectrum for the understanding of electromagnetic radiation. On the one hand, the spectrum is a tool to order the different kinds of radiation using only one variable (energy, wavelength or frequency). On the other hand, the spectrum is a unifying representation of different forms of radiation.

Theoretical knowledge for school
The concepts in this part of the map form the structure that is vital for understanding the arrangement of the different kinds of radiation on the spectrum. It is very important to teach the link between energy and the spectrum to enable students to understand the various levels of danger linked to the different forms of radiation. This part is relatively small and there are key ideas that support this concentration on the spectrum (Plotz, 2017b).

Theoretical knowledge for experts
This large part of the master concept map contains all the concepts an expert in the field should know in order to fully grasp electromagnetic radiation. It is possible to divide this big field into two smaller ones. On the left side, there is the theoretical foundation for radiation, namely the Maxwell equations and the electric and magnetic fields. On the right side, the properties of radiation and the interdependency between radiation and matter at a very fundamental level of understanding (for example, absorption should be understood on an atomic level) are depicted.

Applications
The big field on the right side of the master map includes all kinds of applications for radiation. The concepts shown in this map are nowhere near complete. There are countless applications for radiation and the students should be able to name some of them and cluster them together in an appropriate way. One crucial cluster is the one with the different sources, because all the participants of the panel agreed that students should be able to name at least some sources of radiation.

Data Analyses
First, the structure of the mind maps was interpreted and the position of the cards relative to each other was analysed. In addition, the words on the cards were conceptualized and put into the context of radiation. To analyse the maps, they were compared by their structure before and after the intervention. The author also searched for patterns that fit the physical concepts within the master concept map, like the 'spectrum'. The concept maps were also compared to the master concept map in terms of concepts (content) and structure. To illustrate this process, six maps will be presented afterwards. The three cases were chosen because they represent different perceptions of the subject matter and the students' different learning sets.
In a second step, the results from the maps were triangulated and compared with the results from the interviews and the written reports. The additional data complementing the extended mind maps make it possible to get an impression of the learning process that occurred during the intervention. To this end, the interviews and the reports were analysed by means of content analysis. These data and their analysis can be regarded as a valid instrument to track a learning process. Hence, they offer a way to validate the extended mind maps as a feasible instrument to measure learning. The hypothesis was that an increase in knowledge measured with the interview and the report should also appear as an increase in knowledge measured with the mind maps, and vice versa.

RESULTS
To answer the research question, the results of the study are presented in a special form. First, three different cases (Lilly, Maria and Carl¹) are described and analysed. Afterwards, these three cases are compared to each other.

Lilly Mind Map 1
In her first map² (Figure 9), a kind of structure is visible. Lilly explained this structure in her own words: "physics must be on top, because everything below belongs to it". It is therefore reasonable to assume a top-down structure within her mind map. Superordinate words are placed on top, and the terms become more specific further down. Lilly built matching and ordered categories. She placed the card with the nuclear power plant on the left side together with the term dangerous. Later in the interview, Lilly mentioned that other forms of radiation, like UV rays, exist which are also dangerous. She also talked about infrared radiation and its applications, like infrared lamps. However, she never included the terms UV or infrared radiation in her map. On the left side of the map, there are two characteristics, non-ionizing and invisible radiation. These two form the categories for the terms below. Lilly explained this connection by saying that "every term shares these two characteristics". She did not explain the contradiction between the terms invisible and sun.
The large group of terms inside the red box³ is not well structured. The words were grouped together because of their connection to non-ionizing and invisible. Lilly did not explain the connections between the terms. It is interesting that the card with the word electrons was put in the middle of the map. Lilly explained the double role of this term by saying that "electrons play a role on both sides". It seems that she sees electrons as a major factor in the field of radiation.
Overall, Lilly presented bits and pieces of information about radiation in her first interview. However, she was not able to present a coherent conception of radiation. The lack of knowledge about physical concepts like frequency or energy is visible.

¹ Names were changed by the author.
² The original maps were written in German and the author translated them into English.
³ Box made by the author.

Lilly Mind Map 2
On the left side of the map (Figure 10), we see different forms of radiation, ranging from X-rays to infrared radiation (red box), and the terms frequency and wavelength at the top of the map. Lilly explained this as follows: "I put these terms on top, because they characterize the whole thing". In the interview, Lilly talked about the group in the red box as different sorts of radiation. Shortly afterwards, Lilly changed the map and included light in her mind map. She did this without being asked to do so. The impulse came from the explanation process, when she talked about different sorts of radiation and their connection to wavelength. Lilly explained the concept of the spectrum correctly, including wavelength and frequency. However, it is easy to see that Lilly did not get the order of the spectrum completely right. Taken together, these pieces of evidence clearly point to a gain in knowledge. The concepts behind the map are sound, and only small errors were found (order of the spectrum).
Comparing the mind maps from the two interviews, the incorporation of structure based on scientific definitions and knowledge in the second map is obvious. In terms of the size and the depth of the map, however, Lilly showed a decrease in concepts and relations between the first and the second map. In her second interview and in her report, Lilly was able to explain the different forms of radiation and had a sound understanding of physical facts. Therefore, the results from the mind map and the interview are contradictory.

Maria Mind Map 1
In Maria's mind map (Figure 11), there are several interesting regions. First, there is a large section that is related to radioactivity (red box). In her explanation, Maria referred to Fukushima and cancer as closely related to radioactivity. When talking about the terms grouped in this box, she explained that this is so "because gamma-rays makes cells mutate" and that "Fukushima belongs to that because it is a broken nuclear power plant". On top of the map and beside the radioactivity group, there are three groups of terms. In Maria's opinion, the terms within each group match thematically. Mobile phone is linked to infrared via the data-link port in older mobiles (yellow box). Wave and microwave are connected via the word stem (green box), because, as she puts it, "thinking of microwaves, er, they are waves". The terms UV, light and laptop are grouped because the student connected UV with light and light with laptop (blue box). She also talked about the sun in this group but did not write the term on a note: "So the sun gives us light and also UV that makes the skin turn brown." The connections between the different regions are very vague and not linked to any physical system. Electricity and the heat lamp do not fit into the system. This is easy to spot because of their different, rotated position in the map. She also commented on this, saying that "those two do not fit to any other term". Overall, there is a loose structure in the map, which cannot be attached to a physics framework. Knowing the terms is not nearly enough to understand and explain the theoretical concept of radiation. This was confirmed in the interview, where Maria showed little knowledge about radiation. The different pieces of knowledge were often pieced together in the wrong way. For example, she explained the connection between UV and radioactivity thus: "UV comes from the sun and there are different kinds like alpha-, beta- and gamma-radiation."

Maria Mind Map 2
Maria shows a significant increase in knowledge in her second mind map (Figure 12) compared to her first one (Figure 11). Maria was also able to include a proper order of the physical terms and concepts in her map, and she divided the different kinds of radiation into ionizing and non-ionizing. However, Maria is not able to explain microwaves correctly. She struggles with the concept of transmitting energy to particles during the interview as well as in the report. According to her, "the microwaves set the particles in motion. This motion generates friction energy and the food gets warm", which is known as a typical misconception about microwaves.
There is evidence of learning in her mind map. She classified radiation into non-ionizing (red box), ionizing (green box) and visible light (yellow box). In this structure, visible light is a kind of bridge between ionizing and non-ionizing radiation. In the red box, she introduced a subcategory for microwaves: "The oven, the radar and the mobile application are different technical applications for microwaves." When she talked about the yellow box, she had trouble bringing the different terms together: "The sun, because sun and light and UV belong to each other. And light and photons also, but I am not sure about the photons." At the top of the map, there is the blue box that contains overall concepts for radiation. In the interview, Maria supported this reading: "I put up what belongs to everything…" Additionally, there are many different examples of radiation, not only those that she investigated in her study. However, the physical concepts in her explanations are not correct. It is not clear why Maria is unable to clarify the conception of microwaves, even though microwaves were the focus of her research project.
In the analysis of her report, there are some major errors concerning radiation. This is the most alarming sign, because the students had enough time to correct those mistakes in their report.
The case of Maria is the opposite of the case of Lilly. In contrast to Lilly's maps, Maria's mind maps show an immense increase in knowledge. The problem is that this measurement is contradicted by the interview and the report.

Carl Mind Map 1
The last case presented is that of Carl. Carl was very motivated at the beginning of the project. Due to his difficulties in school, however, it appears that he shifted his focus away from the project.
In Carl's first mind map (Figure 13), an interesting structure unfolds. There are three clearly unconnected columns. He came up with a lot of terms dealing with nuclear radiation (green box), connecting his knowledge of comic books (Hulk) to the same concept as the pollution that unfolded because of the events in Fukushima. He also placed this column in the centre of his map, the most important and prominent place.
In the red box, different devices connected to radiation were mentioned. Carl labelled them as "common objects". When analysing his map, the connection between radar and mobile phones is of interest. Carl explained here that "[r]adar fits because mobile phones have a navigation system". However, he was not able to explain whether those two technologies use the same type of radiation or different ones. He was also the only student who came up with a connection to the magnetic field (left column). He connected this column to magnetism by saying "Earth and compass because of magnetism." Carl was able to distinguish between the different columns, but his map lacks any sign of structured physics concepts.
In the interview, Carl showed little knowledge about the scientific concepts connected to radiation. However, he exhibited an interesting concept of radiation in his explanations. When he talked about observing X-rays closely, he conceptualized the beam as a sequence of particles (see Figure 14). This model of "particles" was later transferred to microwaves. He also used the model to explain the heating process in the microwave oven: "[i]t gets warmer because the particles collide with each other".

Carl Mind Map 2
In his second map (Figure 15), Carl mentioned many more terms related to radiation. His structure is a little bulky and not very clear from a hierarchical point of view. Nevertheless, the spectrum is clearly visible, although the order of the different kinds of radiation is messy and not correct.
During the interview, Carl changed the order of the different kinds of radiation and ordered them according to their dangerousness, starting with the most dangerous one: radioactivity followed by UV-radiation, X-rays, microwaves, LASER and infrared radiation. This order was also incorrect. Carl clearly knows that an order in the spectrum exists, but he is not able to produce a correct sequence. Carl also mentioned many different terms representing concepts, like the lack of a medium to propagate or the propagation of radiation as a wave. He mentioned in his explanation that "[o]n the left side, there are things that every type of radiation is capable of".
The comparison of the two maps shows a medium growth in knowledge about radiation. In his second map, Carl was able to incorporate several correct concepts about radiation like "speed of light" or "not dependent on medium". On the other hand, there were false concepts in the interview and in the report, such as a false value for the speed of light. Overall, Carl represents the average of the group. Although there is a slight growth, there are gaps in his conceptual knowledge.

Cross Case Comparison
Comparing the different developments of the mind maps, there is a shift towards the spectrum in every mind map. The different cases show different forms of the spectrum and not every representation of the spectrum is accurate. A suitable conclusion might be that there is a gain in knowledge grounded on those results. However, looking more closely at the reports and interviews, a different picture emerges.
Carl's case is a good example for the three other boys whose ideas are not presented. His gain in knowledge can be estimated as average. Whilst their maps showed a gain in size and depth, the reports and interviews included errors made on a conceptual level. Other concepts are pointed out in a rudimentary manner. Lilly was the top performer in the sample. Therefore, her mind map was chosen. On the other hand, there is Maria. She underperformed in her report and in the presentation and was chosen for those reasons.
The results highlight an interesting mismatch between the data from the interview, the written report and the mind maps. According to the reports, Lilly made good progress in different fields. She showed an increase in knowledge about radiation. The evidence lies in the fact that she is able to talk about different physical concepts in the correct way even months after finishing the writing process. Lilly was able to discuss and interpret the results of her questionnaire in a scientifically acceptable way and communicated her knowledge in the presentation. Her second mind map, however, gives very little indication of her increased conceptual understanding and knowledge. The map is smaller and does not include many connections.
In contrast to Lilly's results, we have Maria. She shows little gain in knowledge about radiation according to the interview and the report. In the report, Maria was still not able to get all concepts about radiation right. Those false concepts were also repeated over and over again in the interview. Obviously, Maria learned something, because she was able to implement a physics-related structure in her second mind map. She did not, however, use the spectrum as a tool to arrange the terms in the first place, but mentioned it as a tool to structure the terms. The second map included many terms, and all terms fitted into it. According to the map, a gain in knowledge is evident. Similar conclusions could be drawn for Carl. He had a great lack of knowledge concerning radiation in the first interview. Although his knowledge increased, he did not show definitive signs of a conceptual change.
The comparison of the cases is also shown in Figure 16. The x-axis shows the diagnosed increase in knowledge based on the report and the interview. The y-axis shows the growth based on the analyses of the concept map. The axes should not be read as continuous scales; they are categorized into three levels (low, medium and high). Following the initial hypothesis, the different cases should align on the 45-degree line. Surprisingly, they do not. Looking at the conclusion above, the research question "Are explained mind maps a suitable and valid instrument to measure knowledge and especially the change in knowledge?" can be answered. According to our findings, the answer to this question is negative for this study. Relying only on the extended mind maps is not enough to measure a gain in content knowledge.
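The diagonal check behind this comparison can be sketched in a few lines. The levels assigned below are our reading of the case descriptions in the text (Lilly: high gain in interview and report, low in the map; Maria: the reverse; Carl: medium on both) and are an assumption, since the figure itself is not reproduced here. The function name `off_diagonal` and the numeric coding of the levels are likewise hypothetical.

```python
# Ordinal coding of the three levels used on both axes (an assumption).
LEVELS = {"low": 0, "medium": 1, "high": 2}


def off_diagonal(cases):
    """Return {name: map level minus reference level} for cases off the diagonal."""
    gaps = {}
    for name, (reference_gain, map_gain) in cases.items():
        gap = LEVELS[map_gain] - LEVELS[reference_gain]
        if gap != 0:
            gaps[name] = gap
    return gaps


# (gain per interview and report, gain per concept map); our reading of the prose.
cases = {
    "Lilly": ("high", "low"),
    "Maria": ("low", "high"),
    "Carl": ("medium", "medium"),
}

mismatches = off_diagonal(cases)
print(mismatches)
```

A negative value means the map understates the gain seen in the interview and report, and a positive value means it overstates it. With only three ordinal levels and a handful of cases, this is a bookkeeping aid rather than a statistical test.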
The three cases that are not presented in this paper (Paul, Albert and Erich) did not differ significantly from the case of Carl. He stands as a surrogate for these three students, whose cases are similar to his in structure. According to Yin (2009), it is beneficial to only use extreme cases in a cross-case comparison.

DISCUSSION AND CONCLUSION
In all investigated cases, a physics-related structure appeared in the second mind map. So, it seems that there is a gain in content knowledge when focusing on the structure alone. This first result corresponds with results presented in the literature (Edmondson, 2000; Mintzes & Quinn, 2007; Mintzes, Wandersee, & Novak, 2000). Looking more closely at the two special cases (Lilly and Maria) of this paper, a gain in content knowledge could be estimated for both. The difference appears in the development of more deeply rooted concepts. Lilly's knowledge about radiation seems to have changed not only on the surface, that is, in her basic knowledge (terms, simple connections), but also in her understanding on the conceptual level. The results from the analyses of the interview and the report she wrote contain evidence that she was able to overcome her misconceptions about the topic and additionally acquired the ability to identify the conceptions of other pupils. Comparing her success to the learning gain of Maria, there are three main differences in the way they treated the task that may explain the different outcomes. The first difference is the time on task. Lilly started the work on her project early in the year and finished her first report four months before the deadline. Maria started her work (making the questionnaire, finding pupils to join the project) a month before the end of the project. The second difference is the scaffolding and the help that was provided by the teachers and the author. All students were intentionally treated the same. Everyone had the same chance to ask for help. Lilly took those chances several times in different situations during her research project. In contrast, Maria reached out for help only once, at the "last minute" of the project. The third difference is related to the first. Because of the time issue, Maria could neither revise her report nor consult her teacher or the author to ask for feedback. Therefore, the report was apparently her first draft.
So, to answer one of the research questions, there is no evidence that a conceptual change is also visible in the concept maps. Lilly is the prime example, with her lack of concepts in the map, but her very good understanding of the topic in the second interview compared to the first.
In a second research question, we focused on the effects of PBL and WTL. According to the literature (Bangert-Drowns et al., 2004; Furtak et al., 2012; Graham & Hebert, 2011; Holm, 2011; Savery, 2006), the effects of PBL and WTL should also be visible in the results of this study. Every student was able to incorporate a variation of the spectrum into the concept map. In the current arrangement, the teacher is not much involved in the process. Therefore, the outcomes should not be regarded as large. However, the positive effect of some form of scaffolding is also evident, because Lilly was the only student who used the offer of consulting the author.
Finally, it is time to address the last and most important question: Do concept maps picture the learning progression of the students in a valid way in comparison to interviews and reports?
Having all the limitations of this study in mind, the answer to the validity question has to be "no" for this study. Considering our results, the instrument 'concept map' has shown two false results, Lilly and Maria. This would not be a problem if we had 20 students, but two out of six is a lot. One interpretation of this result could be that the study itself was not suitable to measure the validity of the instrument. However, the design of the study and the results regarding the assumption that PBL and WTL are effective methods seem sound. This leads to the second interpretation, that concept maps are not valid for measuring the conceptual learning process. This is in accordance with various results in the literature (e.g. Hollenbeck et al., 2006) and in contradiction to the results by Graf (2014) or Aguiar et al. (2014). Kern and Crippen (2008) concluded that concept mapping as a learning tool and conceptual change fit together very well. However, there is no clear correlation between the knowledge measured with the concept maps and, for example, the academic performance of students (Aguiar et al., 2014). In addition, Cañas, Novak, and Reiska (2012) wrote: "The success of a concept map activity depends greatly on the kind of concept map chosen and the skilfulness of the implementation". Hence, there may be too many possible variations in how a concept map is made. Standardizing the process may solve this problem.
The limitations and problems of the actual design are easy to spot. First, the main focus of the overarching project was not to investigate the validity of concept maps. This result was a surprise to us. Nevertheless, those results might be a small piece to answer the question of the validity of concept maps.
Second, there is the issue that the students had no formal training in generating concept maps. This lack of training resulted in the kind of concept maps described in detail in the research design section. To what extent the lack of knowledge about concept maps held them back from producing more complex maps is hard to estimate. The way the students had to produce their concept maps (open ended, no given concepts or linking words) may have influenced the reliability of the concept maps, as stated by Himangshu and Cassata-Widera (2010). However, this specific use of concept maps is not uncommon, and the students were all familiar with the method of mind mapping, which is very similar to the approach we chose.
Because this study is a case study, the results are not generalizable. They are clues pointing to a larger issue, that of the validity of concept maps, which should be monitored and investigated further. Another problem is inherent in the design of the research task. When students start to investigate conceptions, they have their own conceptions too and, contrary to experienced scientists, they are not aware of them. Therefore, there is a possibility that through their own investigation students find confirmation for their own conceptions, no matter whether these were right or wrong from a scientific perspective. Maria and Carl are examples of this possibility in a non-physics-conforming way.
The obligatory task of conducting a small research project was not intended to be a learning instrument. Nevertheless, this study shows that students learn something from a content perspective. They tackle the task in different ways, and the results indicate some consequences of this different behaviour. Overall, installing this task in the final exams as a change in educational policy opens the possibility to investigate the combination of WTL and PBL further. This study identified some conditions for learning instruments that help promote a change in conceptions. Subsequent investigations may focus on identifying other conditions.
In the end, the study is a small but, in our perspective, important contribution to the existing body of literature. Although concept maps have been used for more than 30 years, the most important question for their use as an