Digital Interactive Quantitative Curiosity Assessment Tool: Questions Worlds

Curiosity is a 21st century skill and is of paramount importance in the digital age. However, the assessment of curiosity is often based on self-report or subjective observations. We present the development and evaluation of a digital quantitative assessment game for question-asking-based exploration. The student navigates a graphically presented question graph by selecting questions about a series of virtual alien worlds. The game extracts question-related quantitative measures, e.g., the breadth, depth and specificity of the answers to the questions. We conducted a study with Youth University students and administered a curiosity-based questionnaire to their class teachers as an external validation. Our results show that the measure of total question specificity in the last presented world is a significant predictor of children’s curiosity, as rated by their teachers. This suggests that curiosity can be quantitatively assessed by an entertaining digital question-based game.


I. INTRODUCTION
In the information age, where knowledge is only a click away, curiosity is becoming one of the most significant aspects of human learning [1]. The intrinsic drive to learn as much as possible, to discover new things and to resolve uncertainty [2], [3] is the cornerstone for self-regulated learning, especially in the digital world [4]. One of the main activities associated with curious people is question-asking [5]. Children often ask many questions as an expression of their curiosity, but they do so less as they grow older and enter the formal educational system [6]. On one extreme, though, talented and gifted children express more curiosity [7] and are thus expected to ask more questions.
How does one assess curiosity? Adults' curiosity has been mainly assessed through self-reported questionnaires [3], [8]. However, these, like all self-reported questionnaires, have inherent flaws. Children's curiosity has been mainly assessed via subjective observations and reporting [9]. Only recently have novel types of assessments surfaced, namely, a game to assess children's uncertainty seeking [10] and a human-robot interaction setup to assess adults' physically expressed curiosity [11].
Manuscript received January 16, 2020; revised April 11, 2020. The work is supported by the National Institute for Testing and Evaluation, Israel.
Noam Tor and Goren Gordon are with the Curiosity Lab, Department of Industrial Engineering, Tel Aviv University, Tel Aviv, Israel (e-mail: noamtor@gmail.com, goren@gorengordon.com).
However, none of the aforementioned assessment tools has addressed the issue of question-asking. To address this topic, one has to quantify the following: "what is a good question?" In the binary case of yes/no questions, information theory provides a mathematical answer, namely, a good question is one that divides the possible answer space in half [12]. This is easily demonstrated in the famous game of "20 questions" [13], where it has been shown that children adapt their search to more efficient questions.
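The half-split criterion for yes/no questions can be illustrated with a short calculation. The sketch below is not part of the original study; it assumes that all candidate answers are equally likely, and the function name `expected_information_bits` is hypothetical.

```python
import math


def expected_information_bits(n_yes: int, n_total: int) -> float:
    """Expected information (in bits) gained from one yes/no question.

    Assumes the n_total candidate answers are equally likely and that the
    question is answered "yes" for exactly n_yes of them.
    """
    if n_total <= 0 or not 0 <= n_yes <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_yes <= n_total")
    p_yes = n_yes / n_total
    bits = 0.0
    for p in (p_yes, 1.0 - p_yes):
        if p > 0.0:
            bits -= p * math.log2(p)  # one term of the binary entropy
    return bits


# With 16 equally likely candidates, a question that splits them in half
# yields one full bit; a question that singles out one candidate yields less.
half_split = expected_information_bits(8, 16)
single_guess = expected_information_bits(1, 16)
```

Under this assumption the half-split question is the most informative one, which is why it serves as the benchmark for a "good" binary question.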
What is a good WH question (who, what, when, where, why, how, which, whose)? Here, we use graph theory analysis and quantify several aspects of WH questions. By creating a question graph, wherein a node is an answer and edges are questions, we use graph-theoretic attributes to define the following measures: (i) breadth -the number of answers to a question; (ii) depth -the number of questions the answer exposes; and (iii) specificity -the number of answers that cannot be reached through other questions. In other words, breadth exposes novel information, depth exposes more uncertainty and specificity measures the uniqueness of the question.
The goal of this study was to develop and evaluate a novel digital curiosity assessment tool, where we use the following definition of curiosity: "the intrinsic drive to learn as much as possible and the accompanying behavior" [1]. The development process was based on the hypothesis that curious children select different question types than less curious children [5]. Our curiosity assessment tool, dubbed Questions Worlds, aimed to characterize question-asking behavior in children, with the current research focusing on individual differences rather than an educational outcome assessment tool. By visualizing a question graph in an entertaining fashion, we facilitated a novel digital interaction for children that quantified their question-based exploration path along the graph. Moreover, we focused on a challenging group, namely, talented and gifted middle schoolers who are known to already be highly curious [7].
We hypothesize that curious children will choose "good questions", as measured by our tool, and, more specifically, questions of high specificity (H1) [13], [14]. However, based on prior research on curiosity [11], we hypothesize that this effect will manifest only after familiarizing themselves with the game (H2). As divergent validation for our tool, we hypothesize that these measures will not be correlated with perceived intelligence but rather only with curiosity (H3).
To validate our digital curiosity assessment tool, we conducted two studies in Youth University, which admits talented and gifted middle school students. The students played with Questions Worlds and were then subjectively assessed by their teachers using questionnaires. We show that students externally assessed as having high curiosity indeed chose significantly higher-specificity questions in the last stage of the game. Furthermore, we show that the same measures are not significantly correlated with perceived intelligence.
The contributions of this paper are threefold: (i) a novel graph-theoretic quantitative measure for questions, (ii) a novel tablet game for the assessment of curiosity and (iii) external validation of the tool via teachers' questionnaires.
The results presented here suggest that the utilization of digital media, combined with novel graph-theoretic measures that are visualized in an entertaining fashion, could be used as an assessment tool for question-based curiosity.

A. Curiosity
The elusive definition of curiosity has plagued its study for millennia [10], [15]. There have been several definitions of curiosity, ranging from "lust for knowledge" to the more modern "an intense, intrinsically motivated appetite for information" [16]. The most recent and relevant theory of curiosity is Loewenstein's information gap theory, which suggests that curiosity is a result of feelings of deprivation, which are unpleasant and motivate information seeking to reduce these feelings [16]. Litman and colleagues have recently extended Loewenstein's information gap theory of curiosity to include both deprivation (D) and interest (I) dimensions [17]-[19]. Research on curiosity has shown its great effect on learning processes. Curiosity drives the curious person to actively explore and seek new information, i.e., ask questions, test hypotheses, etc. [10], [20]. As a result of this active learning, the person's learning process and information acquisition will usually be much greater and more effective [21], [22]. This effect was also demonstrated in brain research, which showed that the more curious people are while learning new information, the better they will remember it [23].
This strong relation between curiosity and effective learning has an important meaning for the educational system. Curiosity is usually expressed in behaviors (such as active information seeking, concentration, and visible interest), which are deeply related to improved academic performance [24]. It has been shown that even though intelligence and effort play a great part in predicting scholarly success, curiosity is no less of an important, strong, and distinct predictor of it [25]. Furthermore, these mentioned visible behaviors that derive from curiosity also lead to higher teacher ratings of attention, motivation, competence and persistence [10], as the curious child is much more engaged in class. These results suggest that being curious in school can greatly and positively affect academic performance.

B. Curiosity Assessment
The misalignment between the current goals of the educational system and the need to foster curiosity in order to make learning more effective creates a need for tools to assess and measure curiosity. Curiosity measures today are mostly based on a personal point of view, limited to self-report questionnaires, such as the curiosity and exploration inventory-II [26] and its more recent version [3], as well as some behavioral methods that mostly focus on spontaneous exploration [27], [28]. While these self-report questionnaires and observation methods have been validated and improved over time [3], they cannot be used to assess people's curiosity in testable scenarios and do not base their assessments on a person's authentic actions. In addition, most of the questionnaires are less suited for young children.
In recent years, several digital curiosity assessment tools have been developed for children. These tools, usually tablet applications, are intended to be more objective and behavioral-based than the current questionnaire-type assessments. One application is Underwater Exploration! [10], in which curiosity is indicated by the amount of uncertainty the child prefers throughout a specific task, in a repetitive yet stepwise manner. A second application is Free Exploration [29], [30], in which children can move different characters on the tablet and receive information about them. Measures such as exploration time were used as a proxy of curiosity.
Yet another novel assessment tool has been recently presented, in which a fully autonomous humanoid robot has been used as the experimenter and assessment tool for physically expressed curiosity [11]. It has been shown that initial exploratory behavior correlated more with shyness, whereas later exploratory behavior correlated more with self-reported curiosity.
These games represent the beginning of a solution for the current subjective measurement methods. However, one important expression of curiosity does not appear in these games, namely, question-asking [5], [6], [31].

A. Digital Game-Based Assessment Tool
We developed a quantitative, objective model-based assessment tool for children ages 11-15 in the form of a tablet game we called Questions Worlds. In Questions Worlds, the players encounter different virtual alien worlds that they can explore, as shown in Fig. 1. Their interaction with the worlds comes from selecting different objects within them, e.g., aliens, technology or indigenous plants, and selecting which questions to ask: How does it work? What is it made of? Why is it here? Each object-question pair results in a verbal utterance of an answer, which is part of a different story arc for each alien world, and the appearance of more objects with which the child can interact.

B. Question Graph
We have created an acyclic directed graph-based model that represents the questions-answers connections between the different objects in the world, which we called the "question graph". In our game, the nodes of the graph represent the world's objects, and the edges, i.e., the directed connections between two nodes, represent the questions that can be asked about the object (that lead to new objects). Thus, the question graph is a directed graph (a question about an object "points" to the object that represents the answer, but this does not mean that it also points in the other direction). The question graph (i.e., the graph of objects and questions) is identical for all worlds, but the story differs. The network was constructed such that graph parameters of each question type are different, with the assumption that these parameters reflect a basic curiosity-based behavior goal.
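As a minimal sketch of how such a question graph could be represented (not the authors' actual data structure), the snippet below stores, for every object, the objects revealed by each question type. Only the two Alien entries are taken from examples given in this paper; the names `question_graph`, `ask` and `is_acyclic`, as well as all other entries, are hypothetical.

```python
# question_graph[obj][question_type] -> objects revealed by the answer.
question_graph = {
    "Alien": {
        "what": ["Part 1", "Part 2", "Part 3", "Part 4"],  # example from the paper
        "why": ["Grand City"],                              # example from the paper
    },
    "Grand City": {"how": ["Machine"]},  # hypothetical edge
    "Machine": {},
    "Part 1": {},
    "Part 2": {},
    "Part 3": {},
    "Part 4": {},
}


def ask(graph, obj, question):
    """Return the objects that appear when `question` is asked about `obj`."""
    return list(graph.get(obj, {}).get(question, []))


def is_acyclic(graph):
    """Depth-first check that no chain of questions leads back to its start."""
    unvisited, in_progress, done = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = in_progress
        for targets in graph.get(node, {}).values():
            for nxt in targets:
                nxt_state = state.get(nxt, unvisited)
                if nxt_state == in_progress:
                    return False  # back edge, so the graph has a cycle
                if nxt_state == unvisited and not visit(nxt):
                    return False
        state[node] = done
        return True

    return all(visit(n) for n in graph if state.get(n, unvisited) == unvisited)
```

A nested dictionary keeps the direction of each edge explicit: a question points from an object to its answers, and nothing points back unless it is listed separately.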
In total, a player visits 5 alien worlds: Worlds 1, 2 and 5 are time-limited; World 3 is limited to only five questions; and World 4 is limited to one question type (what/how/why).
The question graph was built in such a way that each question asked about a different object has its own values of parameters, as shown in Fig. 2. These parameters are as follows: Breadth, which is the number of answers to the question; Depth, which is the number of new questions that are potentially available from the given answers; and Specificity, which is a parameter representing the uniqueness of the answer, i.e., a high specificity means that there are few other questions leading to this answer, and vice versa.
The base case for the recursive formula is an object that has no questions to be asked about it, in which case $D_{i,j} = 0$.
Fig. 2. Question graph: the arrows represent the questions that can be asked about an object and that lead to the discovery of new items. For instance, asking "What is it made of?" about the Alien will make Parts 1, 2, 3 and 4 appear on the screen.
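The three measures can be sketched on a toy graph as follows. This is one possible reading of the verbal definitions, not the paper's exact formulas: depth is assumed to count every question reachable below the answers, with 0 for an object that has no questions, and specificity is assumed to be the fraction of a question's answers that no other question reaches. The toy graph and all function names are hypothetical.

```python
# graph[obj][question_type] -> objects revealed by the answer (toy data).
graph = {
    "Alien": {"what": ["Part 1", "Part 2"], "why": ["Grand City"]},
    "Grand City": {"how": ["Machine"], "what": ["Part 1"]},
    "Machine": {},
    "Part 1": {},
    "Part 2": {},
}


def breadth(g, obj, question):
    """Number of answers (revealed objects) to the question."""
    return len(g[obj][question])


def depth(g, obj, question):
    """Assumed recursion: questions opened by the answers, counted downwards.

    An object with no questions contributes 0, matching the stated base case.
    """
    total = 0
    for answer in g[obj][question]:
        for next_question in g.get(answer, {}):
            total += 1 + depth(g, answer, next_question)
    return total


def specificity(g, obj, question):
    """Assumed normalization: share of answers not reachable via other questions."""
    answers = g[obj][question]
    if not answers:
        return 0.0
    unique = 0
    for answer in answers:
        reached_elsewhere = any(
            answer in targets
            for other_obj, questions in g.items()
            for other_question, targets in questions.items()
            if (other_obj, other_question) != (obj, question)
        )
        if not reached_elsewhere:
            unique += 1
    return unique / len(answers)
```

In this toy graph, "Part 1" is reachable through two different questions, so the question that reveals only "Part 1" has specificity 0, whereas the question that reveals "Grand City" has specificity 1.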

A. Teachers' Evaluation Questionnaire
To externally validate the Questions Worlds measures, we have created an evaluation tool for the students' teachers, namely, a 10-item questionnaire designed to assess students' perceived curiosity by their teachers. It is important to note that the teachers met the students during more than 10 lessons, each lasting 3 to 5 hours. During summer school, classes took place during two consecutive weeks, whereas during the semester, they took place once a week for three months.
The process of creating the questionnaire included interviews with senior and experienced teachers from the Youth University. From these interviews, we gained insights into the relationship and familiarity of the teachers with the students, the types of interactions between them during the course (lectures, workshops, group activities, etc.), their ability to assess curiosity (as defined earlier in this paper), and possible ways for us to guide the teachers during the course to notice students' behaviors that might reflect a curiosity drive.
International Journal of Information and Education Technology, Vol. 10, No. 8, August 2020
The questionnaire was written following these insights and was influenced by the validated curiosity assessment questionnaire I and D Type Epistemic Curiosity Scale for Young Children (I/D-YC) [32], even though that questionnaire was designed for much younger children. The questionnaire included the following questions, with the instructions: "Please rate the following statements according to their relevance to the student you are assessing. Base your ratings on your familiarity with the student from the Youth University course. 1 = Very Low, 2 = Low, 3 = Medium, 4 = High, or 5 = Very High."
1) How often is the student active in lectures?
2) How often is the student active in the group activities (if there are any in the course)?
3) How often does the student ask you to elaborate on the course material?
4) How often does the student address you with questions about the material following lectures?
5) Does the student spend time in addition to the lecture times to learn more about the course material?
6) Do you believe that the student asks questions from a curiosity drive, i.e., a real desire for knowledge, or for another reason? (Curiosity drive / Another reason / I do not know).
7) When the student encounters material that he/she does not understand, he/she will try hard to make sense of it until he/she understands it.
8) The student shows visible enjoyment when discovering or understanding something new.
9) What is your assessment of the student's intellectual abilities?
10) What is your assessment of the student's curiosity level?
During the first study, we allowed teachers not to answer questions they were not confident about. This resulted in several missing values (see Table I). In the second study, we requested full answers to all the items in the questionnaire for all students.
We presented the questionnaire to all the teachers prior to the course during a meeting that included a lecture on curiosity in general and an assessment of curiosity specifically. The goal was to have the teachers pay attention to specific behaviors of students throughout the course.

B. Experimental Setup
Questions Worlds was programmed in Python and is suitable for Android devices. The application includes a logging system, which writes every action that is made during the game (touch, object pressing, swiping, etc.), with extra information (timestamps, coordinates, etc.). For this experiment, the game's text and narration were written and recorded in Hebrew, the native language of the students.
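A logging system of this kind could be sketched as below. This is an illustrative assumption rather than the application's actual code: the class name `ActionLogger`, the JSON-lines format and all field names are hypothetical.

```python
import io
import json
import time


class ActionLogger:
    """Append one JSON record per player action to a text stream."""

    def __init__(self, stream, participant_id, clock=time.time):
        self._stream = stream
        self._participant_id = participant_id
        self._clock = clock  # injectable so that tests can use a fixed time

    def log(self, action, **details):
        record = {
            "participant_id": self._participant_id,
            "timestamp": self._clock(),
            "action": action,
        }
        record.update(details)  # e.g. screen coordinates or the chosen question
        self._stream.write(json.dumps(record) + "\n")


# Usage with an in-memory stream and a fixed clock (hypothetical values).
buffer = io.StringIO()
logger = ActionLogger(buffer, "P001", clock=lambda: 12.5)
logger.log("touch", x=120, y=340)
logger.log("question", target="Alien", question="why")
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

Writing one self-contained record per line makes it straightforward to rebuild each player's sequence of questions afterwards, which is what the game measures are computed from.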
The experimental setup consisted of tablets and earphones, with the Questions Worlds application installed on them; see Fig. 3. Each session included between 10 and 20 students within the same classroom and lasted less than an hour.

C. Experiment Protocol
After giving the researcher a signed parental consent form and a participant's signed assent form, each participant received a participant ID, a tablet and earphones and was given a short instruction about the tablet usage. The student entered his or her participant ID and started the game. As the first screen appeared, the following voiced narration started: "You are space travelers, searching for new worlds. You can press on anything you find in order to ask questions about it. You have landed on the first planet. You have 60 seconds to learn about it." If the student pressed an object, the 3 question icons "How?", "What?" and "Why?" appeared at the bottom of the screen, with the following narration: "You can ask a question about this object. How does it work? What is it made of? Why is it here?". If the student pressed the question icon, a new object (or objects) appeared on the screen (according to the question graph), and a narration with information about the new object started. For example, in World 1, pressing the Alien and the "Why is it here" button made the "Grand City" object appear, with the following narration: "The aliens are here to build their grand city in the mountains".
Only after the first new object appeared (in World 1 only) did the narrator say the following: "A new object appeared; you can ask questions about it too".
The student was able to independently press on different objects and choose different questions until the 60 seconds ended. After 60 seconds, the game automatically changed to a new screen, namely, World 2 (with a different background and objects). The narrator said, "You have reached a new planet. You can press on anything you find in order to ask questions about it. You have 60 seconds to learn about the planet." Worlds 3 and 4 followed.
Then, a new screen appeared (World 5). The narrator said, "Space travelers! You have reached the last planet. You have 60 seconds to learn about it before you start your journey back home". After 60 seconds of playing, the narrator said, "What a great adventure! We hope you enjoyed it!" The game then closed automatically.

D. Participants
The participants were students in Tel-Aviv University's Youth University courses. We ran the study twice, once during the university's summer camp and once during the semester. Both times, we performed the exact same protocol with the same target participant groups, yet with different participants. In what follows, we present the results from both studies.
In the first study, 131 students participated, age=13±2.5 yrs, female=59. Twelve teachers answered the evaluation questionnaire about 70 students. Seven students were evaluated by two teachers, and their evaluations were averaged. In the second study, 52 students participated, age=11±3.5 yrs, female=21. Ten teachers answered the evaluation questionnaire about 52 students. One student was evaluated by two teachers, and his evaluations were averaged.
In total, N = 122 different students were evaluated, age=12±3 yrs, female=58. Out of these 122 students, only 114 completed the Questions Worlds game.
All students signed assent forms, and their parents signed consent forms, in accordance with the IRB of the university.

A. Teachers' Evaluation Measures
As mentioned earlier, in the first study only, the teachers were not obligated to answer all the questions and could choose not to answer questions that they were not confident about regarding the evaluated student; see Table I. This resulted in only 76 students for which we had fully completed questionnaires.
We conducted an exploratory factor analysis (EFA), performed in SPSS using maximum-likelihood estimation with promax rotation. Analysing the eigenvalue decomposition (not shown), we concluded that the optimal number of factors is 2; see Table I. The two factors were found to be the following:
1) Factor 1 is composed of items asking directly about the following: intrinsic drive to learn (Q5, Q7), enjoyment of learning (Q8), question-asking expressions (Q3, Q4), intelligence (Q9) and curiosity (Q10). This factor aligns with the definition of curiosity mentioned above: "the intrinsic drive to learn as much as possible and the accompanying behavior"; hence, we named it the "curiosity factor".
2) Factor 2 is a combination of items related to activity, either group activity (Q2) or class activity (Q1). Hence, we named this factor the "activity factor".
The two factors are highly correlated but do not fully explain each other's variance (R=0.591, p<0.001 Pearson correlation).
Next, we analysed the correlation of demographics with these factors. First, the Kruskal-Wallis test showed no significant difference between the genders in the two factors (Curiosity: H=2.0, p=0.158, Activity: H=0.275, p=0.6). Second, we performed a linear regression to analyse the factors' dependency on age. We found no correlation between the age of the students and the curiosity factor (F(1,74)=0, p=0.90, $R^2$=0) and a significant negative correlation between age and the activity factor (F(1,74)=4.02, p=0.044, $R^2$=0.054, β=−0.14). These findings may shed light on the previously reported decline in curiosity with age [6], even in gifted children [7], where curiosity was previously assessed by activity in the classroom.
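The age dependency above is an ordinary least-squares fit with a single predictor. The pure-Python sketch below shows how the slope and $R^2$ of such a fit are obtained; the ages and factor scores are made-up illustrative numbers, not study data, and the function name is hypothetical.

```python
def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Made-up example: a factor score that decreases with age.
ages = [11, 12, 13, 14, 15]
scores = [5.0, 4.0, 4.0, 3.0, 2.0]
slope, intercept, r_squared = simple_linear_regression(ages, scores)
```

A negative slope together with a small $R^2$, as reported for the activity factor, means that the score tends to decrease with age while age explains only a small share of its variance.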

B. Game Measures
We first present descriptive statistics of the game measures. Based on our hypothesis (H2) [11], we focused on the last world in the game (see below for the supporting evidence for this hypothesis). Depth and specificity followed a normal distribution, whereas breadth was more uniform (p=0.089, 0.532, 0.03 Shapiro-Wilk normality test for depth, specificity and breadth, respectively).
We next analysed the demographics dependence of the measures. We found that only specificity had significant dependence on gender (F(1,112)=4.05, 2.43, 3.07, p=0.046, 0.121, 0.082 for specificity, breadth and depth, respectively, one-way ANOVA test; specificity male=0.45±0.09, female=0.42±0.09). We found that all three measures did not correlate with age.

C. Curiosity Assessment
We next set out to directly test our hypotheses, namely, that (i) the last world exploration, as measured by our game measures, will be correlated with perceived curiosity (H2) and (ii) that specificity in that world is the most contributing measure (H1).
We performed a multiple linear regression analysis with the breadth, depth and specificity of the first world as predictors of the curiosity factor. We found no significant contribution of these measures (F(3,69)=0.225, p=0.879, $R^2$=0.01). We then performed a multiple linear regression analysis with the breadth, depth and specificity of the fifth and last world as predictors of the curiosity factor. We found that specificity in the last world is the only significant predictor of the curiosity factor (F(3,69)=3.69, p=0.016, $R^2$=0.138, with β(p)=9.35(0.014), −2.935(0.055), −1.39(0.628) for specificity, breadth and depth, respectively). Given that depth is highly correlated with specificity (R=0.919, p<0.0001 Pearson correlation) and makes a non-significant contribution, we excluded it from the linear regression (F(2,70)=5.48, p=0.006, $R^2$=0.135, with β(p)=7.98(0.002), −3.24(0.020) for specificity and breadth, respectively). These results support hypotheses (H1) and (H2).
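Dropping depth because it is highly correlated with specificity is a collinearity check between two predictors. The sketch below computes the Pearson correlation in pure Python; the values are made up for illustration, and both the function name and the 0.9 cut-off are assumptions rather than the procedure used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Made-up per-student measures in one world.
specificity_scores = [0.30, 0.40, 0.45, 0.50, 0.60]
depth_scores = [2, 3, 3, 4, 5]

r = pearson_r(specificity_scores, depth_scores)
COLLINEARITY_CUTOFF = 0.9  # illustrative threshold, not taken from the paper
drop_depth = abs(r) > COLLINEARITY_CUTOFF
```

When two predictors carry nearly the same information, their individual coefficients become unstable, so keeping only one of them gives a more interpretable model.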
To strengthen the support for our hypotheses, we conducted another statistical test. We divided the students into three groups, according to the evaluation received by their teachers in question number 10 (assessed curiosity) in the questionnaire; see Fig. 4. As suspected, due to the target group of talented students, a low percentage achieved 1 and 2 on the Likert scale. Hence, we divided Q10 into low (1-3), medium (4) and high (5) curiosity; N=32, 38 and 29, respectively. Analysing only world-5 specificity, we conducted an ANOVA test, which resulted in a significant difference between the three groups, with an increasing mean with curiosity level (F(2,96)=5.02, p=0.008), where a post-hoc Tukey test revealed significant difference between the low and high curiosity groups (Means=0.39,0.43,0.45, for low, medium and high, respectively. p = 0.007 for low-high groups).
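The grouping and test described above can be sketched as follows. The binning of Q10 follows the text (1-3 low, 4 medium, 5 high); the specificity values are made up for illustration, and the function names are hypothetical. Only the F statistic and its degrees of freedom are computed here; the p-value and the post-hoc Tukey test are left to a statistics package.

```python
def curiosity_group(q10):
    """Map a 1-5 Likert rating of Q10 to the three groups used in the analysis."""
    if q10 not in (1, 2, 3, 4, 5):
        raise ValueError("Q10 must be an integer from 1 to 5")
    if q10 <= 3:
        return "low"
    return "medium" if q10 == 4 else "high"


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    values = [v for g in groups for v in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up (Q10 rating, world-5 specificity) pairs.
records = [(2, 0.1), (3, 0.2), (1, 0.3),
           (4, 0.2), (4, 0.3), (4, 0.4),
           (5, 0.5), (5, 0.6), (5, 0.7)]
groups = {"low": [], "medium": [], "high": []}
for q10, specificity in records:
    groups[curiosity_group(q10)].append(specificity)

f_value, df_between, df_within = one_way_anova_f(list(groups.values()))
```

With three groups and N = 99 students, the same computation gives 2 and 96 degrees of freedom, which matches the reported F(2,96).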
These results show that the measures computed using the Questions Worlds application, namely, the specificity and breadth of the questions in the last world, are significant predictors of students' curiosity, as assessed by their teachers, supporting our hypothesis (H1).
As divergent validation, we tested whether the measures computed by the Questions Worlds application are predictive of other factors. Specifically, we tested whether specificity differs significantly across levels of perceived intelligence, using the same analysis as above. We again grouped the students into low (1-3), medium (4) and high (5) intelligence and found that there is no significant difference in specificity between the groups (F(2,91)=2.65, p=0.076). This supports our hypothesis (H3).

V. DISCUSSION
In this study, we have presented a novel digital tablet game that was used to assess talented students' curiosity.

A. Intrinsic Motivation-Oriented Design
We have made several design decisions that were specific for the implementation of an assessment tool for curiosity. The main decision was not to add any external rewards, e.g., stars or points, to the game. This was an important design decision due to the nature and definition of curiosity as an intrinsic reward [1], [33]. We chose not to add another dimension to the exploration, e.g., points for discovering more, to restrict the motivational component as much as possible to the intrinsic drive [34].

B. Curiosity Aspects
Curiosity is a multi-faceted construct [1], [2]. The novel tool presented here was designed to address only a limited number of curiosity aspects.
State-trait curiosity: Trait curiosity is considered a stable aspect of a person, e.g., that person is curious [8]. On the other hand, state curiosity is a fleeting thing and can drastically change from one moment to the next and from one topic to the next, e.g., that person is curious now about that thing. The tool, by design, is limited to assess only state curiosity, as it measures behavioral aspects in the moment, with respect to the presented content. For example, a person with high trait curiosity may be completely uninterested in aliens and thus exhibit low state curiosity during gameplay.
Specific-diverse curiosity: Curiosity can be expressed regarding a specific topic, e.g., cars, or as a diverse behavior, e.g., wanting to learn more about many things [35]. We attempted to address this curiosity axis via introducing aliens, technology and plants, which the students can explore. Furthermore, children could have been curious about the tablet itself and not the content of the game. However, our results, namely, that the last world's measures predict teachers' reported curiosity, suggest that this is not the case.
Information gap theory: Information gap theory states that curiosity is aimed at filling a perceived information gap. Many computational models have recently been developed under this assumption [33]. However, all the models address the issue of the quantity of information and not necessarily what type of information is missing. In this contribution, we have designed a question graph that enabled us to distinguish between types of information sought, i.e., uncertainty-resolving measured by breadth; uncertainty-seeking measured by depth; and uniqueness measured by specificity. We suggest that information gap theory may require an extension to include such variance in information types.

C. Study Limitations
While this study has produced supporting evidence for our hypotheses, it has several limitations. The first is a lack of self-reported questionnaires to correlate students' own perspective with their behavioral measures. Another limitation of the study is due to the population, which is on the high end of curiosity and intelligence. This may limit the generalization of our results to the general population. Finally, due to the nature of Youth University, there were no external measures, e.g., grades, to correlate with the extracted behavioral measures.

VI. CONCLUSIONS AND FUTURE WORK
We presented a novel digital interactive curiosity assessment tool and conducted a study in an attempt to validate it. A novel approach to quantify question-asking behavior was also presented, wherein questions are measured based on the breadth, depth and specificity of their answers. We have shown that there is a significant correlation between the hypothesized question-based exploration parameter, namely, specificity, and externally perceived curiosity, as assessed by students' teachers.
In future work, we intend to expand our validation test of this novel tool to include not only talented and gifted children and to expand the target age to elementary school students. Moreover, using Questions Worlds as a pre-post test to assess the effectiveness of curiosity-promoting pedagogies will require changing the structure and content of the question graph to enable repeatability for the same child. Furthermore, tailoring the content, e.g., alien worlds, to each student's interests can facilitate a much more personalized and accurate tool. These changes can easily be made, since the code structure enables insertion of novel content and graph structure via a simple text file.
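The paper does not specify the format of this text file. As a purely hypothetical illustration, the sketch below assumes one edge per line in the form `object | question | answer1, answer2`, with `#` starting a comment, and parses it into a nested dictionary; the format and the function name `parse_question_graph` are assumptions.

```python
def parse_question_graph(text):
    """Parse lines of the assumed form 'object | question | answer1, answer2'."""
    graph = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 'object | question | answers'")
        obj, question, answers = parts
        targets = [a.strip() for a in answers.split(",") if a.strip()]
        graph.setdefault(obj, {})[question] = targets
        for target in targets:
            graph.setdefault(target, {})  # answers are objects in their own right
    return graph


example_file = """
# hypothetical world definition
Alien | what | Part 1, Part 2
Alien | why  | Grand City
Grand City | how | Machine
"""
parsed = parse_question_graph(example_file)
```

Keeping the content in a plain file of this kind would let new worlds or alternative graph structures be added without changing the game code.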
To conclude, our results suggest that there is considerable promise in the possibility of creating novel interactive digital tools for the assessment of important 21st century skills, such as curiosity.