Identifying Students' Interests in Biology Using a Decade of Self-Generated Questions

Identifying students’ interests in biology can help teachers better engage their pupils and meet their needs. To this end, over 28,000 self-generated biological questions raised by students from kindergarten through graduate school were analyzed according to age and gender. The sample demonstrated a dominance of female contributions among K-12 students. However, girls’ interest in submitting questions dropped as they grew older. Topics popular among different age groups of males and females were identified, and the development of interest was described. Ways in which students’ interests can be incorporated into a standards-based curriculum are discussed, mainly as a trigger for the learning of less popular subjects which are required by the curricula.


INTRODUCTION
Teaching students what they want to know can be a very beneficial pedagogical strategy. However, curriculum developers and teachers often lack the knowledge needed to teach in a way that responds to students' genuine interests and informational needs. To create such a lesson, the teacher needs prior knowledge of how students' interest in biological issues develops, as well as familiarity with their self-generated questions on the topic he or she is about to teach. The aim of this study is to shed light on both the development of students' interest and their specific questions on biological topics, based on a decade's worth of self-generated biology questions submitted to an online Ask-A-Scientist site.

The role of students' interest in science education
Positive relationships have been reported between interest and a wide range of learning indicators (Pintrich & Schunk, 2002; Schiefele, 1998). When allowed to pursue their own interests, students participate more, stay involved for longer periods, and exhibit creative practices in doing science (Seiler, 2006). Interest has also been found to influence future educational training (Krapp, 2000) and career choices (Kahle, Parker, Rennie, & Riley, 1993). Beyond being a useful and pragmatic practice, involving students in decisions about their lives in school is an important moral and educational principle (Davie & Galloway, 1996). Jenkins (1999) examined the implications of "citizen science", i.e., science which relates in reflexive ways to the concerns, interests and activities of citizens as they go about their everyday lives, for the form and content of school science education. He suggested constructing science curricula that enable young people to engage in science-related issues that are likely to be of interest and concern to them (Jenkins, 1999). This idea also appears in the recommendations of several organizations, including the National Research Council (1996) and the American Association for the Advancement of Science (1993), which have proposed that science curricula provide a common basis of knowledge while addressing the particular needs and interests of students.
Listening to the students is still a frequently overlooked approach to improving academic success (Conboy & Fonseca, 2009). Many scholars have pointed to the importance of relevance to curriculum development (e.g., Edelson & Joseph, 2004; Kember, Ho, & Hong, 2008) and science teaching (e.g., Darby, 2009). However, when aiming to create relevant learning materials, developers frequently rely on an adult notion of what should be relevant and interesting to students (e.g., Bulte, Westbroek, de Jong, & Pilot, 2006; Chamany, Allen, & Tanner, 2008; Edelson & Joseph, 2004). Yet for science to be relevant to its practitioners, the origin of the questions being investigated is of great importance (Tippins & Ritchie, 2006). Therefore, the ability to identify students' own interests in biology may be used to contextualize and personalize some of the formal biology curriculum.

Students' Interest in Biology
Research has provided some insight into students' interest in biology. It is the most popular science subject among students and adults (Baram-Tsabari, Sethi, Bry, & Yarden, 2006; Baram-Tsabari & Yarden, 2005; Baram-Tsabari & Yarden, 2009; Dawson, 2000; Falchetti, Caravita, & Sperduti, 2003; Murray & Reiss, 2005; Osborne & Collins, 2000; Qualter, 1993), and especially among females. Ayalon (1995) describes biology as an emerging "feminine niche" in science. It is the only science subject that has escaped a masculine image. Differences exist between the topics that males and females find interesting within biology. According to results from the international project 'Relevance of Science Education' (ROSE) in Denmark (Busch, 2005), England (Jenkins & Nelson, 2005), and Norway (Schreiner, 2006), girls are most interested in biological topics dealing with health, mind and well-being. Moreover, interest in biology is not a constant trait: interest in zoology, for example, decreases with age, while interest in human biology increases. This trend has been identified among young (<14-year-old) Israeli children (Baram-Tsabari & Yarden, 2005) as well as adolescents from various countries (Baram-Tsabari, Sethi, Bry, & Yarden, 2006), and it continues among adults (Baram-Tsabari & Yarden, 2007). The increased interest in human biology among adolescents is probably due to the approach of puberty and the related increasing interest in one's body. Adults seem to be more interested in human biology because they are more concerned with health issues. Older pupils' interest in human biology is well attested by a number of other studies, including some conducted in England (Osborne & Collins, 2000), Israel (Tamir & Gardner, 1989), and Poland (Stawinski, 1984).
Questions are an important part of the ongoing scientific research process and have an important educational role (Biddulph, Symington, & Osborne, 1986; Brill & Yarden, 2003; Keeling, Polacek, & Ingram, 2009; Scardamalia & Bereiter, 1992). However, it is difficult to use children's questions in a classroom setting, as they are frequently a negligible component of general classroom learning. As Dillon (1988) plainly states, "Children qua students do not ask questions. They may be raising questions in their own mind…but they do not ask questions aloud in the classroom." Researchers attribute this situation to a classroom atmosphere in which revealing a misunderstanding may render the student vulnerable, open to embarrassment, censure or ridicule (Pedrosa de Jesus, Teixeira-Dias, & Watts, 2003; Rop, 2003).
However, students do pose science questions in free-choice science-learning environments, such as the World Wide Web. One option open to children trying to find complex answers on the Web is to submit their questions to asynchronous human-mediated question-and-answer services, sometimes referred to as "Ask-A" services, such as "Ask a Scientist". This study is part of a larger project in which a decade's worth of questions was collected from an Ask-A-Scientist site, in order to use children's self-generated questions as an indication of their interest in science (Baram-Tsabari, Sethi, Bry, & Yarden, 2009). This study focuses on the development of K-graduate students' interest in different biological topics, as mirrored by their questions.

METHODOLOGY

Data source
MadSci Network is an award-winning independent non-profit organization operating from a server in Scottsdale, Arizona, USA (http://www.madsci.org). MadSci Network receives 90 to 150 questions daily, most of which are answered automatically by the site's search engine. Fewer than 20% of the questions are answered by nearly 800 globally distributed volunteer scientists, usually within two weeks.
MadSci Network covers all branches of science. It collects and stores key demographic information, allowing ready mining of the archives. Many other English-language Ask-A-Scientist services are available on the net, but none were found suitable for this study. The reasons varied: some sites did not ask for the age of the asker or did not record all the information in their archives, while others served a limited age group or had a rather small database.
The webmasters of MadSci Network, who are two of the authors of this study (R.J.S. and L.B.), anonymized and provided for analysis the questions submitted to the site between 1995 and mid-2006. These data include all the questions received by the site, not only those sent to the scientists or published online.

Sample characteristics
Over 146,000 questions were sent to MadSci Network between its establishment at the end of 1995 and the first half of 2006. Almost 79,000 of the surfers disclosed their grade level and country of origin, and filled in the name and subject fields. An analysis of all of the questions in this sample is reported in another paper (Baram-Tsabari, Sethi, Bry, & Yarden, 2009). This study reports a more comprehensive analysis of the questions allocated to the biological topics.
Users submitted their questions under one of 25 topics. Of these, the following 18 were biology topics: Biochemistry, General Biology, Zoology, Botany, Anatomy, Cell Biology, Environment and Ecology, Medicine, Genetics, Microbiology, Neuroscience, Agricultural Sciences, Evolution, Molecular Biology, Development, Virology, Immunology, and Biophysics (for examples of questions see Table 1). The topics 'Environment and Ecology' and 'Biophysics' include some questions which are not biological in nature (e.g., "Can the millions of miles of black roads be increasing global warming?").
Questions on these topics made up 37.65% of the overall sample, making biology the most popular field of interest. Of these, 1,205 questions asked by teachers were not included. The resulting sample was made up of 28,484 biology questions asked by students from kindergarten through graduate school. A few questions were missing some of the data, and therefore the n values differ between variables.
Age split: Submission of questions to MadSci Network requires that the user enter a grade level. Of the inquirers, 28,480 provided their grade level; 68.3% of the surfers were school students: 2.8% were K-3 students, 9.5% were 4th-6th graders, 26.2% were junior-high-school students, and 29.8% were senior-high-school students. Undergraduates contributed 20% of the questions, science graduates 7.7%, and non-science graduates 4%.
Gender split: Gender identification was based on the asker's first name. Initial classification was done semiautomatically using an English name gender finder (epublishing.nademoya.biz/japan/names_in_english.php?nid=A). In the next step, the names that were not automatically classified and appeared twice or more in the data were analyzed individually using baby name guesser (www.gpeters.com/names/baby-names.php), which operates by analyzing popular usage on the internet. In this way, we were able to identify the gender of the asker for 17,840 of the questions. The rest were either names that could equally belong to boys or girls, meaningless scrambles, or names that appeared only once in the database. Of the gender-identifiable questions, 55.7% were asked by girls (n = 9,943) and 44.3% were asked by boys (n = 7,897).
Split by country of origin: 28,402 of the inquirers indicated their country of origin. The surfers originated from 126 countries. The great majority of the questions (81%) originated from the USA, UK, and Canada. An additional 10% originated from another five English-speaking countries (not necessarily as mother tongue): Australia, India, Singapore, Philippines, and New Zealand.
Statistical analysis: Unless otherwise indicated, a two-tailed Pearson chi-square test was used to calculate probabilities. Not all the inquirers provided their full details; therefore, sample sizes differ from graph to graph and are indicated by n values. Significant differences within proportions were determined according to a cell chi-square test.
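The figures in this paragraph can be cross-checked against the "almost 79,000" questions with full details reported earlier. The short sketch below is only a consistency check, and it rests on an assumption the text does not state explicitly: that the 37.65% share was computed before the teachers' questions were excluded.

```python
# Consistency check on the sample sizes reported in the text.
students = 28_484        # biology questions asked by students (reported)
teachers = 1_205         # biology questions asked by teachers, excluded (reported)
biology_share = 0.3765   # biology share of the overall sample (reported)

# All biology questions, before teachers were excluded.
biology_total = students + teachers

# Assumption: the 37.65% share refers to biology_total, not to students only.
implied_overall = biology_total / biology_share

print(biology_total)           # 29689
print(round(implied_overall))  # just under 79,000 if the assumption holds
```

Under that assumption the implied overall sample falls slightly below 79,000, which agrees with the rounded figure given for the number of surfers who disclosed their details.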
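The two-step procedure described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the two dictionaries are hypothetical stand-ins for the online name tools, and the example names are invented. Only the logic follows the text: a first automatic pass, then a second pass restricted to names that appear at least twice.

```python
from collections import Counter

# Hypothetical lookup tables standing in for the two online name tools.
PRIMARY = {"emily": "f", "james": "m"}
SECONDARY = {"priya": "f", "arjun": "m"}

def classify(first_names, primary=PRIMARY, secondary=SECONDARY, min_count=2):
    """Return one label per name: 'f', 'm', or None when unidentified."""
    normalized = [name.strip().lower() for name in first_names]
    counts = Counter(normalized)
    labels = []
    for name in normalized:
        label = primary.get(name)  # step 1: automatic classification
        if label is None and counts[name] >= min_count:
            label = secondary.get(name)  # step 2: only for repeated names
        labels.append(label)
    return labels

# Invented example: "Arjun" appears once, so it is never sent to step 2.
names = ["Emily", "James", "Priya", "Priya", "Arjun", "xqzt"]
labels = classify(names)
print(labels)  # ['f', 'm', 'f', 'f', None, None]

# The reported split follows directly from the reported counts.
girls, boys = 9_943, 7_897
print(round(100 * girls / (girls + boys), 1))  # 55.7
print(round(100 * boys / (girls + boys), 1))   # 44.3
```

Names that occur only once stay unclassified even when the second table could resolve them, which mirrors the rule stated in the text.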
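As an illustration of the test named above, the following sketch computes a Pearson chi-square statistic for a contingency table, together with each cell's contribution to it. The counts are invented for illustration and are not data from this study, and reading the per-cell contributions as the "cell chi-square" is an assumption, since the exact cell-level procedure is not spelled out here. The p-value helper is valid only for one degree of freedom; a statistics library would be needed for larger tables.

```python
import math

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom, per-cell contributions)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    cells = []
    for i, row in enumerate(table):
        cell_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            cell_row.append((observed - expected) ** 2 / expected)
        cells.append(cell_row)
    statistic = sum(sum(cell_row) for cell_row in cells)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, cells

def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2))

# Invented counts: rows = gender, columns = two hypothetical topics.
illustrative = [[10, 20],
                [20, 10]]
statistic, df, cells = pearson_chi_square(illustrative)
print(round(statistic, 3), df)  # 6.667 1
print(p_value_df1(statistic) < 0.05)  # True
```

Cells with large contributions are the ones that depart most from the counts expected if topic and gender were independent.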

RESULTS AND DISCUSSION
A decade of biology questions sent to an Ask-A-Scientist internet site were analyzed by age and gender in order to learn about the interests of students in biological topics.

Age distribution of female participants
Overall, females used the site more than males to ask biology questions (55.7% vs. 44.3%, respectively). This surprising majority of females should be viewed in the context of females' general reluctance to use media that foster informal learning about science (National Science Foundation [NSF], 2004; Nisbet, Scheufele, Shanahan, Moy, Brossard, & Lewenstein, 2002) or to take part in extracurricular science experiences (Greenfield, 1998), and their relative lack of formal and out-of-school experience in using computers and the World Wide Web (Kafai & Sutton, 1999; Shashaani, 1994). Two factors together explain why females, who are traditionally found to be less interested in science than males, formed the majority of contributors: the general interest of female students in the field of biology, and the attractive and secure science-learning environment provided by the internet. This female dominance was not consistent among age groups. Girls participated in the sample more than boys while in school (K-12), especially during the middle-school and high-school years, but their number dropped dramatically upon moving to college and even more so at the graduate level, making the males the more dominant group in this latter sample (Figure 1). Although it is known that students, especially females, tend to lose interest in science as they grow older (Friedler & Tamir, 1990; George, 2006; Greenfield, 1998; Kahle & Lakes, 1983), this decrease usually takes place during the middle-school and high-school years. In this free-choice online setting, the decrease seems to have been postponed (Figure 1).

Identifying interest in biological topics
Not all topics demonstrated the same level of popularity. The most popular biology topics were biochemistry, general biology, botany and zoology, each receiving approximately 10% of the questions. Anatomy, cell biology, environment and ecology, medicine, genetics, and microbiology each received 6 to 7.5% of the questions. Neuroscience, agricultural sciences, evolution, and molecular biology each received 2 to 4.5% of the questions. The least popular topics were development, virology, immunology, and biophysics, with around 1% of the questions each (the full list of frequency and percentage of questions for each topic can be seen in Table 1). Male and female students differed significantly in their interest in some of the topics (p < 0.0001). Females were more interested than males in asking questions about botany, cell biology, and genetics, while males were more interested than females in asking questions about medicine, neuroscience, evolution, virology, immunology and biophysics.
Although all of the questions in this sample were self-generated by the askers, it is important to note that some of them were raised by the students as a consequence of a school assignment. In a previous study, we learned that topics such as anatomy and physiology, sickness and medicine, and genetics and reproduction are all characterized by relatively more 'spontaneous' than school-related questions (Baram-Tsabari, Sethi, Bry, & Yarden, 2006). Botany and mycology, microbiology, virology, and cell biology yielded many more teacher- and textbook-generated questions than spontaneous ones. Topics such as [...] Bry, & Yarden, 2006). From the current analysis, we learned that both males and females used the site to get help with their school-work as well as to satisfy their own curiosity, since both spontaneous and school-related topics appear to be more 'masculine' or 'feminine'. Student interest in the various topics differed significantly among the various age groups (p < 0.0001). For example, interest in medicine increased with age (Figure 2), while interest in zoology decreased as students matured (Figure 3). This trend is in agreement with the known pattern of increased interest in human biology and decreased interest in zoology with age, which had been previously identified in several Ask-A-Scientist sites (Baram-Tsabari, Sethi, Bry, & Yarden, 2006; Baram-Tsabari, Sethi, Bry, & Yarden, 2009; Baram-Tsabari & Yarden, 2005).

Other topics which were characterized by a decrease in interest with age were environment and ecology (Figure 4), botany (Figure 5), and agricultural sciences (data not shown). Botany was a relatively popular topic among K-9 students. It was previously found to be a topic that elicits many questions regarding school assignments (Baram-Tsabari, Sethi, Bry, & Yarden, 2006). Thus, it can be assumed that this is the reason for the relatively high percentage of questions on this topic elicited by school children.
Four additional topics showed an increase in the percentage of questions with age: genetics (Figure 6), evolution (Figure 7), neuroscience, and biochemistry (data not shown). The first three were previously found to elicit a large number of children's spontaneous questions (Baram-Tsabari, Sethi, Bry, & Yarden, 2006); therefore, the increase is probably not due to school assignments. The increase was not identical for males and females. While females developed an interest in genetics (Figure 6), males asked more about evolution (Figure 7) and neuroscience (data not shown). Biochemistry, on the other hand, appealed equally to both genders. It became popular among high-school students and retained its popularity among the older age groups (data not shown). The reason for this increase may be related to the formal study of biochemistry.

Overall, it seems that the topics which were most popular among young age groups have to do with macroscopic levels of organization and concrete entities, such as plants and animals, while topics popular among older students have to do with microscopic levels of organization and molecular entities, such as DNA, neurotransmitters and proteins, and with abstract concepts such as genes and phylogeny.
Cell biology (Figure 8) and microbiology (data not shown) garnered an increase in interest during middle school and high school, followed by a decrease in the older age groups. This finding is in agreement with the results of previous research, which found them to be topics that elicit many questions regarding school assignments and fewer spontaneous questions (Baram-Tsabari, Sethi, Bry, & Yarden, 2006).

Research limitations
As early as the fall of 2003, nearly 100% of public schools in the US had access to the internet (National Center for Education Statistics, 2005). There have been virtually no differences in school access to the Internet by school characteristics since 1999 (National Center for Education Statistics, 2006), theoretically allowing all students to be part of the sample. In 2009, there were over 172 million active home users (users who have logged on from home in the previous 30 days) in the US alone (Marshall, 2009). As access to and use of the Internet becomes more widely and representatively distributed worldwide, new opportunities exist for data collection online (Rhodes, Bowie, & Hergenrather, 2003). Massive multi-player online games, for example, are used as a platform for science education research (Bainbridge, 2007), such as evaluation of scientific habits of the mind (Steinkuehler & Chmiel, 2006), and infecting avatars with a virtual epidemic as a model of educational intervention (Kafai, Feldon, Fields, Giang, & Quintero, 2007). However, online data mining also has methodological drawbacks, which will be discussed here.
Non-representative sample: This research made use of a self-selected, non-control sample. There is a positive correlation between knowing about science and being interested in it (Ziman, 1991). Therefore, students who send questions to science web sites are probably more interested in and more knowledgeable about science than the general student population. Furthermore, there is also a marked difference in ease of access to the internet, which was our source for the questions, among children of different socioeconomic statuses.
The validity of the study can be supported by the notion of using data that originate from the researched population itself, not as a response to a stimulus from a researcher, thus ensuring high ecological validity. Another way to achieve validation is by comparing any conclusions drawn with other independent observations. Reliability may be assured by the use of a very large sample (Reid, 2006).
Potential for multiple questions from the same user: Surfers in MadSci Network are not provided with user IDs. As a result, multiple questions from the same user would have been recorded as arriving from different users. We assume that the number of multiple questions does not differ between genders and age groups; however, this uncertainty is a limitation of our research. Gosling, Vazire, Srivastava, and John (2004) found that internet samples are relatively diverse, generalize across presentation formats, are not adversely affected by nonserious or repeat responders, and are consistent with findings from traditional methods. It is also true that a surfer can fake his or her identity online, or ask a question he or she is not really interested in. A key issue, then, is whether the subject would have a good reason to want to fake (Anderson, Ball, & Murphy, 1975). Rhodes, Bowie et al. (2003) conclude that many of the criticisms of online data collection are common to other survey research methodologies.
Allocation to topics: The classification of the questions to the various topics was performed by the surfers. In some cases questions were misplaced, either because the surfer did not recognize the right topic or did not pay attention to the process. We assume that most of these misplacements were distributed evenly among the topics, and therefore did not cause a major bias.
Although web-based experiments of the kind used here are more difficult to control than experiments conducted in a formal setting, they present an important methodological advantage for studying interest-driven science learning, given that data of this kind and amount do not exist anywhere outside the web (Baram-Tsabari, Sethi, Bry, & Yarden, 2009). Other limitations, however, are not exclusively related to the data collection approach used in this study, but rather to its pedagogical implications.
Formalizing free-choice learning: Asking a question in a free-choice environment does not guarantee willingness to invest time and effort in learning the answer in a school setting. It is not clear what would happen if students' interests were implemented into the school science curriculum. Would free-choice learning lose all of its appeal once it became compulsory?
The role of students' interests in determining the curriculum: Even if we had a clear-cut understanding of what students really wish to know, the biology curriculum would not rely solely on students' interests. Principles in biology should be taught, even if they do not spontaneously elicit questions from the students. On the other hand, how can a curriculum claim to be 'relevant' to the students if it does not incorporate any of their interests?

Implications for teaching
There are several ways in which students' interests can be incorporated into a standards-based curriculum. To list a few: a teacher can present a new principle or concept using a context which is relatively engaging rather than alienating for the target audience (e.g., in biology: zoology vs. human health; for physics, see Haussler & Hoffmann, 2002); allow students to create their own research questions within a given topic in project-based learning, or use their questions as a starting point for inquiry-based learning (Yerrick, 2000); construct a lesson based on students' questions, or even teach a whole topic using a tailor-made question-based curriculum (Gallas, 1995). Knowledge regarding the development of students' interests in different topics may be used for choosing an engaging context for different groups of learners (Baram-Tsabari & Yarden, 2007). In the following, we discuss another way of using students' individual interests in class, as a trigger for the learning of less popular subjects which are required by the curricula.
Let us imagine a novice biology teacher. Her goal for the lesson is to teach the fundamental classification of cells into prokaryotes and eukaryotes, but she is unexpectedly asked about a very daily-life aspect of reproduction in birds: "Is it true that if you leave an egg outside the fridge a chick will hatch from it?" This seemingly unrelated question can be used as a trigger for discussing some of the differences between prokaryotes and eukaryotes: the former are simply uni-cellular creatures that usually reproduce by division, while the latter are the building blocks of all multi-cellular creatures, many of which use sexual reproduction, and ultimately, this is why unfertilized eggs do not hatch. Thus, a spontaneous question about reproduction in the context of zoology can be converted into a formal discussion on cell biology. Seiler (2006) notes that many students' connections with science take the form of questions that a teacher might consider offhanded or even off-task, but they represent significant intellectual efforts by the students to connect science with their lives and experiences. These questions may be used as student input for the development of a student-interest-focused curriculum (Seiler, 2006).
The teacher could also have planned in advance. Since she knows that students at this age are increasingly interested in medicine, she could have started by asking the students why they think antibiotics kill bacteria, but not the person who takes them. The students would probably not be able to answer the question at that point in their education, but the question may engage and interest them.
Teachers who are attentive listeners are able to recognize and extract their students' questions and interests (Seiler, 2006), but ideas for triggering questions can also be found in the "frequently asked questions" (FAQs) sections presented by some of the Ask-A-Scientist sites, or just by browsing their archives. Questions such as "Where does the fat go when a person loses weight?", "Why do males have nipples?", "Can lions become vegetarians?" and "Are dogs color-blind?", all asked by students at Ask-A-Scientist sites, may serve as triggers for standard biology-curriculum issues, such as nutrition, evolution, ecology and the senses (respectively). When choosing questions, the age of the target audience should be taken into consideration, since topic popularity varies with age. Ask-A-Scientist sites seem to be an attractive environment for girls, allowing the teacher to choose from a variety of girls' questions, which are usually rare in a school-science setting.
At Ask-A-Scientist sites the questions are asked by the learners, but the locus of control over the learning process is external, since the answers are given by asynchronous human experts (Nachmias & Tuvi, 2001). When used in class, the locus of control over the learning process is transferred to the teacher. If the questions which are used originate from the students themselves, then they receive some control over their learning, along with the engagement and interest that characterize the process of learning something that one really wants to know.

Figure 1. Distribution of biology questions according to gender and age group (n = 17,838).

Figure 7. Interest in evolution among boys and girls in different age groups. Percentage is calculated out of the total boys' or girls' questions.

Table 1. Examples of questions in biological topics, their frequency and percentage (n = 28,484).

Topic (a) | Frequency | Percent | Examples (b) (gender, age group, country) (c)
… | … | … | How can I measure the water retention in soil? (f, 7-9, US); Can air pollution affect the size of insects? (m, 7-9, US)
Medicine | 2,036 | 7.1 | Is there a high percentage for a boy to get diabetes if his mother has it? (f, 7-9, US); Why does an adult recover from a fracture much slower than a child? (f, undergrad)
Genetics | 1,750 | 6.1 | Can a DNA test distinguish paternity between brothers? (f, non-science graduate); Is there a genetic element that determines the sounds of our voices? (m, 10-12, US)
Microbiology | 1,676 | 5.9 | Are there bacteria that eat lava and will they destroy the earth? (m, 7-9, US); How fast do bacteria, mold-fungi or viruses grow on your body? (f, 4-6, US)
Neuroscience | 1,283 | 4.5 | How do alcohol/drugs lower inhibitions? (m, 7-9, US); how come when I flunk a test food doesn't taste good? (f, undergrad, US)
Agricultural Sciences | 1,282 | 4.5 | What are the scientific names of weeds? (f, 10-12); How would I design an experiment about the effects of gray water? (f, 10-12, Australia)
Evolution | 774 | 2.7 | What selected for, groups, organisms, or genes? (m, undergraduate, Canada); Could an herbivore evolve from a carnivore? (f, 10-12, US)
Molecular Biology | 529 | 1.9 | 2 Strands of DNA-do these make 2 different batches of proteins? (f, undergraduate, Australia); Why can't there be more number of binding sequences for the given primer? (m, science graduate, India)
… | … | … | Can you tan under black lights? (m, undergrad, US); At what temperature does popcorn pop? (m, 4-6, US)

(a) The topics are listed in order of popularity.
(b) These are verbatim quotes. In some cases only part of the question is shown.
(c) Where data are available. m = male; f = female; US = United States.
(d) Not all of the questions in this topic are strictly "biological".
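The Percent column in Table 1 appears to be each topic's frequency divided by the total number of student biology questions (n = 28,484). The sketch below recomputes it for the rows whose frequencies are legible here; it assumes rounding to one decimal place, which the table does not state explicitly.

```python
N = 28_484  # total student biology questions (reported)

# (frequency, reported percent) for the rows with legible frequencies.
reported = {
    "Medicine": (2_036, 7.1),
    "Genetics": (1_750, 6.1),
    "Microbiology": (1_676, 5.9),
    "Neuroscience": (1_283, 4.5),
    "Agricultural Sciences": (1_282, 4.5),
    "Evolution": (774, 2.7),
    "Molecular Biology": (529, 1.9),
}

# Collect any topic whose recomputed percent differs from the reported one.
mismatches = {
    topic: (round(100 * freq / N, 1), pct)
    for topic, (freq, pct) in reported.items()
    if round(100 * freq / N, 1) != pct
}
print(mismatches)  # {} when every reported percent is reproduced
```

All seven rows are reproduced under this reading, which supports taking n = 28,484 as the denominator for the topic percentages.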