1 Introduction
“There are three kinds of lies: lies, damned lies, and statistics” is a remark attributed to the British Prime Minister Benjamin Disraeli and later popularised by Mark Twain. Indeed, statistics has frequently been discredited through misunderstanding or mistrust. A lack of statistical literacy can easily lead to “misunderstandings, misperceptions, mistrusts and misgivings about the value of statistics for guidance in public and private choices” (Wallman, 1993). In today’s complex information society, an understanding of statistical information and techniques has become essential both for everyday life and for effective participation in the workplace, leading to calls for increased attention to statistics and statistical literacy (see Shaughnessy and Pfannkuch, 2004; Shaughnessy, 2007; Makar and Rubin, 2009). The quality of available statistics can vary considerably, so an understanding of sampling techniques and sources of bias helps, first, to assess what has been done and, second, to adopt a critical stance on statistics. Raising public awareness of the quality of the information consumed through television or newspapers is crucial given the “overwhelming amount of unregulated, unrestricted information being thrust upon a public that is generally ill equipped to consume the information” (Rumsey, 2002, p.33). Indeed, the current phenomenon is twofold: progress in the use of statistics goes hand in hand with an increase in misuses and statistical fallacies (Hooke, 1983). A large body of literature, built by teachers, education researchers, statisticians and professional organisations, thus calls for improving and measuring statistical literacy, with a special focus on the student population. Begg et al. (2004), for example, underlined the societal motive behind the call for a greater emphasis on statistical literacy in school curricula: that students can become active and critical citizens.
Callingham (2007) stressed the importance for students of adopting a critical stance towards data, referred to as applying statistical literacy.
The call for statistical literacy has recently been echoed by the international community. In its report “A World That Counts”, the Independent Expert Advisory Group (IEAG) on the 'Data Revolution for Sustainable Development', appointed by the United Nations Secretary-General, recommended that more be done to increase global data literacy. Specifically, the group called for “A proposal for a special investment to increase global data literacy. To close the gap between people able to benefit from data and those who cannot, in 2015 the UN should work with other organisations to develop an education program and promote new learning approaches to improve peoples’, infomediaries’ and public servants’ data literacy. Special efforts should be made to reach people living in poverty through dedicated programmes.” The Synthesis Report of the UN Secretary-General on the Post-2015 Agenda, “The Road to Dignity by 2030”, itself called for a transformative agenda where we “base our analysis in credible data and evidence, enhancing data capacity, availability, disaggregation, literacy and sharing”. It stressed that “the world must acquire a new ‘data literacy’ in order to be equipped with the tools, methodologies, capacities, and information necessary to shine a light on the challenges of responding to the new agenda”.
To inform this debate, the PARIS21 Secretariat established a task team to develop and report on a global indicator measuring the current state of, and future progress in, global statistical literacy. This paper presents the outcome of that consultative process and proposes a novel measure of statistical literacy based on the use of and critical engagement with statistics in national newspapers. The use of text mining techniques bridges current data gaps in this area and allows the statistical literacy of the adult population to be assessed on a day-to-day basis in more than one hundred developing and developed countries.
The paper is structured as follows. Section 2 reviews the literature. Section 3 presents the text mining methodologies developed to measure statistical literacy and provides a brief overview of the keyword analysis. Section 4 describes the data and presents the results. Section 5 presents robustness checks. Section 6 concludes.
2 Literature Review
Conceptualisation of the notion of statistical literacy
The present paper contributes to a body of literature that addresses the need for a concrete measure of statistical literacy. Despite an international consensus on the value of understanding data and improving global statistical literacy, there is no general agreement on its conceptualisation. While the need for a common definition of statistical literacy has been recognised in the literature (see Ben-Zvi and Garfield, 2004), Batanero (2002, p.37) summarises that “we have not reached a general consensus about what are the basic building blocks that constitute statistical literacy or about how we can help citizens construct and acquire the abilities”. These definitional issues have led to an expanding conception of statistical literacy, from a purely conceptual to a more applied concept.
Early work tried to provide a comprehensive definition of statistical literacy. Wallman (1993, p.1), for example, defines statistical literacy as “the ability to understand and critically evaluate statistical results that permeate our daily lives—coupled with the ability to appreciate the contributions that statistical thinking can make in public and private, professional and personal decisions” (see also Trewin (2005) for similarly broad, generic definitions). The concept directly introduces both a personal and a societal need to develop statistical literacy skills. Callingham (2007) endorsed this definition, underlining that it also requires an appreciation of the social context. These studies, however, suffer from the lack of methodological tools that would help to quantify levels of statistical literacy or to identify useful skills and competencies to develop. In response, Ben-Zvi and Garfield (2004) add more active components to the definition of statistical literacy. They define statistical literacy as a set of skills that students may actively use in understanding statistical information, among them organising data, constructing tables and working with different varieties of data representations. Most of these definitions are strongly linked to the field of education, identifying statistical literacy as a primary goal of, and need for, statistics instruction, because “most students are more likely to be consumers of data than researchers” (Garfield and Gal, 1999, p.4). In that regard, Gal (2004, p.1) sees statistical literacy as a prominent prerequisite for participation in society, the “key ability expected of citizens in information-laden societies”. His statistical literacy concept involves both cognitive and dispositional components, where some components are shared with numeracy and literacy while others are unique to statistical literacy.
This definition encompasses both the critical evaluation of statistics and the ability to express one’s opinions or data-related arguments about it. Likewise, Schield (2004) and Watson (2006) see the ability to question claims in social contexts as a fundamental element of statistical literacy.
Within this nebula of definitions, the concepts of data literacy and statistical literacy are often used interchangeably. Our paper adopts the division used by the Oxford Dictionary of Statistical Terms (Dodge, 2003)[2], which treats data literacy as a subcomponent of statistical literacy. Data literacy, as called for in the Synthesis Report, can therefore be seen as a component of statistical literacy, which Dodge (2003) defines as “the ability to critically evaluate statistical material and to appreciate the relevance of statistically-based approaches to all aspects of life in general”. Statistical literacy can indeed be seen as an encompassing concept, implying comfort and competence with a large variety of forms and representations. Statistics is about data analysis processes, but also number sense, understanding variables and symbols, interpreting tables and graphics, grasping notions of sampling, data collection methods and questionnaire design, probabilities and inferential reasoning (Scheaffer, Watkins & Landwehr, 1998). In particular, the Australian Bureau of Statistics’ Education Services considers four criteria essential for statistical literacy: data awareness; the ability to understand statistical concepts; the ability to analyse, interpret and evaluate statistical information; and the ability to communicate statistical information and understandings. This emphasis on the ability to understand and communicate about statistics is recurring in the recent literature: “Statistics requires the basic understanding of statistical concepts…whereas literacy requires the ability to express that understanding in words, not in mathematical formulas” (Watson and Kelly, 2003).
Milo Schield (2004, p.9), for instance, argues that statistical literacy is “typically more about words than number, more about evidence than about formulas”.[3] The complexity of the statistical literacy construct, with its emphasis on critical thinking, contextual understanding and students’ dispositions, poses a real challenge for assessment. Despite these challenges in terminology, several frameworks have attempted to model the features of statistical literacy, focusing mainly on student populations. Our indicator builds on these models to provide a reliable and more widely applicable measure of statistical literacy.
Empirical frameworks
Gal (2004) developed one of the first models evaluating the understanding of statistics by adults. In this model, cognitive and dispositional components interact. In particular, statistical literacy presupposes the use of five interrelated cognitive elements: mathematical knowledge, statistical knowledge, literacy skills, knowledge of the context and critical questions. An important claim in Gal’s model is that all the components that together lead to statistically literate behaviour constitute a dynamic set of knowledge and dispositions: strongly context-dependent and interrelated entities. Gal particularly examines how a person’s dispositions or attitudes toward data and statistics interact with these knowledge bases to motivate critical thinking about statistics. Once a certain level of statistical literacy is reached, individuals would be able to transfer their skills automatically to evaluating the everyday statistical information they encounter. Gal’s model carries the important implication that anyone lacking these skills is functionally illiterate as a responsible, informed and productive citizen and worker. As suggested by Batanero (2002), Gal’s model is useful for understanding what statistical literacy involves and for helping policy makers take decisions at a macro-level of analysis. Its strength is that Gal offers a full definition along with the components necessary to achieve statistical literacy. However, analysing statistical concepts related to this notion requires more specific micro-level models (and a somewhat less demanding definition).
A second model, the Statistical Literacy Construct of Watson and Callingham (2003), builds on the Structure of Observed Learning Outcomes (SOLO) taxonomy developed by Biggs and Collis (1982) to hierarchise statistical thinking into six stages of skills, which can be viewed as a progression of levels of statistical understanding. As suggested by Callingham (2007), the boundaries between levels are not rigid. The strength of this model is that its statistical literacy scale has been widely validated by researchers, based on responses from a large number of students in Australia. At the top two levels of the Watson and Callingham (2003) construct, students display skills matching the critical-thinking skills of the third tier of the Statistical Literacy Hierarchy in Biggs and Collis (1982). This model of measuring statistical literacy was developed to address the lack of research proposing methods to measure students’ progress, despite statistical literacy being part of the school curriculum.
The differences in approach between the Gal (2004) and Watson and Callingham (2003) models can partly be explained by the fact that Gal’s construct was developed for an adult population while that of Watson and Callingham was developed for students. These two main frameworks for statistical literacy are by no means the only ones. Wild and Pfannkuch (1999) proposed a model for statistical thinking in empirical enquiry, built upon the statistics education literature and interviews with statisticians and undergraduate students. Reading (2002) relies on the SOLO taxonomy across five areas of statistics to build a “profile for statistical understanding”. This methodology, like the one developed by Jones et al. (2000), is very similar to the hierarchical model of Watson and Callingham (2003).
As a framework, we adapt the taxonomy developed in Watson and Callingham (2003). The main methodological contribution of our paper is that it develops text mining methods to measure literacy and critical thinking based on articles from the RSS feeds of national newspapers. By relying on text mining techniques, our methodology mainly targets journalists and newspaper readers. This excludes the illiterate population and those without access to print or online media. Moreover, this paper contributes to the existing literature by targeting a population that is not limited to students. The original statistical literacy construct of Watson and Callingham (2003) involves six stages: idiosyncratic, informal, inconsistent, consistent non-critical, critical, and critical mathematical. This taxonomy was developed for students in grades 5 to 10. Our interest is in an adult population and attention is therefore on the top three levels of the taxonomy, as characterised in Table 1.
Table 1. Statistical literacy construct, adapted from Watson and Callingham (2003)
Level 1: Consistent non-critical
Appropriate but non-critical engagement with context, multiple aspects of terminology usage.

Level 2: Critical
Critical, questioning engagement in contexts that do not involve proportional reasoning, but which do involve appropriate use of terminology.

Level 3: Critical mathematical
Critical, questioning engagement with context, using proportional reasoning particularly in chance contexts, showing appreciation of the need for uncertainty in making predictions, and interpreting subtle aspects of language.
Existing data sources
There are several existing data sources and indicators that were initially considered for this study but not implemented, mainly because they are reported infrequently and/or are not comparable across countries. These indicators and their limitations are discussed below.
A first indicator considered would have built on an occupational classification related to statistics, providing a measure of the percentage of the population working as statisticians or in related occupations that require similar skills. The data source for this indicator would have been the Demographic and Health Surveys (DHS), which offer the advantage of being publicly available, dating back to the 1990s, and being conducted in approximately 90% of International Development Association (IDA) borrowing countries every five years. However, mathematicians and statisticians often account for fewer than 10 respondents per survey, so changes in this figure are more likely driven by sampling variation than by a shift in the number of mathematicians. It is therefore useful to widen the focus from statisticians to related occupations. This can be achieved with the O*NET database, which contains detailed information on career changers for 858 job categories, based on responses to surveys of large representative samples of workers. For every job category, the database gives the top 10 related job categories. To construct an indicator, one could have used the top 10 categories related to "Statisticians" together with the top 10 categories related to each of those ten categories, which results in a list of 53 job categories. This approach, however, would have suffered from three limitations. First, DHS are conducted only every five years, although the indicator could have been calculated annually on a rolling basis; further, only 20 countries use the International Standard Classification of Occupations (ISCO, 1998), which would have made it very difficult to compare occupations across countries. Second, there were general issues with using this indicator for advocacy purposes: it is not clear that a society benefits from more mathematicians at the expense of, say, teachers or doctors. Finally, there was a conceptual issue, in that one would need to assume that having more mathematicians also makes teachers and doctors more numerate.
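The related-occupation expansion described above (top 10 categories related to "Statisticians", plus the top 10 related to each of those) can be sketched as follows. The mapping used here is an illustrative stand-in, not real O*NET content, and the toy lists are shorter than the ten entries per job in the actual database.

```python
# Sketch of the related-occupation expansion, assuming the O*NET
# related-occupations table is available as a dict mapping each job
# title to its top related titles. Data below is illustrative only.

def expand_related(related, seed):
    """Return the seed's related occupations plus their own related ones."""
    first_ring = related.get(seed, [])
    expanded = set(first_ring)
    for job in first_ring:
        expanded.update(related.get(job, []))
    expanded.discard(seed)  # the seed itself is not part of the list
    return sorted(expanded)

# Toy table (the real O*NET database lists ten related titles per job).
related = {
    "Statisticians": ["Mathematicians", "Data Scientists"],
    "Mathematicians": ["Actuaries", "Statisticians"],
    "Data Scientists": ["Database Architects"],
}

jobs = expand_related(related, "Statisticians")
```

Because related categories overlap, deduplication via a set is what brings the 10 + 10 × 10 candidate categories down to the 53 distinct categories mentioned above.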
A second indicator that was considered would have relied on a global dataset on educational achievement. It would have measured improvements in primary and secondary school mathematics test scores, based on the Global Achievement Data (Angrist et al., 2013). This data source is a panel for 128 countries in five-year intervals. It links the international achievement tests PISA and TIMSS with regional ones such as SACMEQ, PASEC and LLECE to make student achievement globally comparable. A clear limitation of such an indicator is that the dataset covers only 22 of the 77 IDA/Blend countries. In addition, for 2010 there is only one data point available: all IDA/Blend countries except the Kyrgyz Republic are missing. The availability of data only in five-year intervals, the lack of IDA country coverage and the missing data for the 2010 round would have made this indicator insufficient for use as a logical framework indicator. Furthermore, the dataset was assembled as a one-off exercise and it is not clear whether data for 2015 will be available from the same source.
3 Methodology
Data sources
To measure statistical literacy empirically, we turn to references to statistics and statistical fallacies in national newspaper articles that are accessible online, in line with the work done by Watson and Callingham (2003) in terms of scaling. This is essentially for three reasons.
- First and foremost, while there is some gap between journalists' perception of statistics, as reflected by the statistics reported in news articles, and the audience's demand for statistics, the writing of journalists can be seen as a proxy for a nation's demand for statistical facts as well as for the depth of critical analysis. In any case, in most parts of the world it largely reflects the nation's consumption of statistical facts as well as the level of critical analysis of statistics offered to a country's population.
- Second, newspaper articles are generally available, most of them online, which makes them representative of a country's literate population and easily accessible for text analysis.
- Lastly, alternative data sources are either not representative (e.g. Google Trends searches related to statistics; downloads of statistical software packages) or are reported infrequently and/or not comparable across countries (e.g. job categories related to statistics; regional numeracy assessments).
The indicator used is a three-dimensional composite indicator of the equally weighted percentages of national newspaper articles that contain references to statistics at statistical literacy levels 1, 2 or 3, respectively, following the scale defined in Table 1. The three levels are not mutually exclusive. For each of the three levels, we obtain the share of documents that match the classification, country by country. An overall measure of statistical literacy is then obtained as the sum over the three shares. Specifically, the methodology classifies keywords used in each article into literacy levels 1 to 3 based on three corresponding keyword lists, so that for each of the three levels a different denominator of newspaper articles is analysed (see below for a precise description of the keyword analysis). Each keyword list contains different terms referring to statistics and statistical fallacies, and the use of one precise category of keywords by a newspaper article defines one level of statistical literacy.
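The classification step can be sketched as follows. The keyword lists below are illustrative stand-ins for those in Appendix A, and a single pool of articles is used for simplicity, whereas the actual methodology draws on a different article pool (and hence denominator) per level.

```python
# Minimal sketch: an article counts at a given level if it contains at
# least one keyword from that level's list; levels may overlap.
# Keyword lists are invented stand-ins for the Appendix A lists.

LEVEL_KEYWORDS = {
    1: {"census", "survey", "gdp"},           # data sources / indicators
    2: {"reliable", "inconclusive"},          # critical, non-mathematical
    3: {"sampling error", "margin of error"}, # critical mathematical
}

def levels_matched(article_text):
    """Set of literacy levels whose keyword list matches the article."""
    text = article_text.lower()
    return {lvl for lvl, kws in LEVEL_KEYWORDS.items()
            if any(kw in text for kw in kws)}

def level_shares(articles):
    """Percentage of articles matching each level (levels can overlap)."""
    n = len(articles)
    counts = {1: 0, 2: 0, 3: 0}
    for a in articles:
        for lvl in levels_matched(a):
            counts[lvl] += 1
    return {lvl: 100.0 * c / n for lvl, c in counts.items()}

articles = [
    "The census shows GDP grew, but the margin of error is wide.",
    "Officials called the report inconclusive.",
]
shares = level_shares(articles)
```

Because an article can match several lists at once, the three shares are computed independently rather than partitioning the articles.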
The empirical instrument is still under construction and the preliminary results described here are helpful to improve the quality of measurement. To establish the validity of the measure, the classification of articles will be further validated by analysts at National Statistical Offices (NSOs).
Text mining techniques
This subsection summarises the keywords used in the analysis and the sources used to define them. It also provides examples of keywords defined for each level of statistical literacy. Keywords are derived from major statistical data sources and refer to wide categories of indicators, based on standards internationally adopted by NSOs and International Organisations, as well as on books, articles and glossaries specialised in statistics and statistical fallacies (examples are the OECD Glossary of Statistical Terms or the Glossary of Statistical Terms by the University of California, Berkeley, for English keywords; or the Glossário Inglês-Português de Estatística for Portuguese keywords). The detailed list of keywords used in the analysis, data sources and preliminary results are available in Appendix A.
The study further used the World Bank's World Development Indicators (WDI) database to extend the initial keyword list, and added a blacklist of keywords to disentangle ambiguous meanings of acronyms (such as IPC, which stands both for 'indice des prix à la consommation' and for 'International Paralympic Committee'). The reliability and validity of the keyword lists will be further tested during the implementation of validity checks (see below).
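The blacklist mechanism can be sketched as follows: an ambiguous acronym only counts as a statistical keyword if none of its blacklisted alternative readings appear in the same article. The blacklist entries below are illustrative, not the actual list used in the study.

```python
# Sketch of acronym disambiguation via a blacklist: 'ipc' is dropped
# when a blacklisted term signalling the non-statistical meaning
# (here, the International Paralympic Committee) co-occurs in the text.

BLACKLIST = {
    "ipc": ["paralympic"],  # illustrative; real list covers more cases
}

def keyword_hits(text, keywords):
    """Keywords found in the text, excluding blacklisted contexts."""
    text = text.lower()
    hits = []
    for kw in keywords:
        if kw in text:
            blocked = any(term in text for term in BLACKLIST.get(kw, []))
            if not blocked:
                hits.append(kw)
    return hits

hits = keyword_hits("L'IPC a augmenté de 2% selon l'INSEE.", ["ipc"])
blocked = keyword_hits("The IPC announced the Paralympic schedule.", ["ipc"])
```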
Note: Keywords have been translated into all four languages used for the indicator. Text mining techniques, such as word stemming, were applied to all keyword lists and news articles before proceeding with the analysis. For articles, stop words were removed and characters were converted to lower case.
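The preprocessing order (lower-casing, stop-word removal, stemming) can be sketched as follows. The suffix-stripping stemmer and the tiny stop-word list are deliberately naive stand-ins; the actual pipeline would use a real language-specific stemmer (e.g. a Snowball stemmer) and full stop-word lists per language.

```python
# Illustrative preprocessing applied to both keyword lists and articles:
# lower-case, drop stop words, then stem. The stemmer here is a crude
# stand-in for a proper language-specific stemmer.
import re

STOP_WORDS = {"the", "a", "of", "and", "to"}  # tiny illustrative list

def naive_stem(token):
    """Strip a few common English suffixes (stand-in for Snowball)."""
    for suffix in ("ings", "ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The samplings of the household surveys")
```

Applying the same normalisation to keywords and articles is what lets a stemmed keyword such as 'survey' match inflected forms like 'surveys' in the text.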
LEVEL 1: CONSISTENT, NON-CRITICAL USE OF STATISTICS
Data source: Daily, top 100 news articles from Google News for publishers who
 have registered their RSS feeds with this service,
 publish in either English, French, Spanish or Portuguese, and
 use the country's top-level domain, e.g. '.sn' for Senegal, for their website.
Keywords: articles are considered a good fit for this category if they contain words from one of the following lists:
1. Keywords indicating data sources
 a. word sequences of length two, derived from list of all NSO names worldwide
 b. main statistical data sources, such as 'population census', 'household survey', 'geospatial data', etc (cf. Espey et al., 2015)
2. Keywords indicating a statistical indicator:
 a. GDP, CPI, etc. based on the World Development Indicator database’s ‘Economy and Growth’ category. This list is currently being extended using additional keywords from other categories.
3. Keyword list from statistical capacity building projects
Example: Level 1. consistent, non-critical
 Sentence: “The report indicates tobacco use has increased since the Kenya Demographic Health Survey conducted in 2008-09, which found 19 per cent of men and 1.8 per cent of women use tobacco.”
 Source: The Star, Kenya
LEVELS 2 AND 3: CRITICAL ENGAGEMENT WITH STATISTICS
Data source: Daily, top 100 news articles from a Google News search for any of 'statistics', 'data', 'study', 'research' or 'report', for publishers who
 have registered their RSS feeds with this service,
 publish in either English, French, Spanish or Portuguese, and
 use the country's top-level domain, e.g. '.sn' for Senegal, for their website.
Keywords: articles are considered a good fit for this category if they contain words from one of the following lists:
 1. Critical mathematical engagement:
 a. List of statistical fallacies: based on books, articles and websites that discuss statistical biases and fallacies
 2. Critical non-mathematical engagement:
 a. List of adjectives to assess the quality of research studies: based on synonyms and antonyms for 'accuracy', 'reliability' and 'validity' (cf. Pierce, 2008)
Examples:
Level 2. critical
 Sentence: “Dr Barres admits a definitive scientific conclusion for how these epigenetic changes affect the gene is not yet scientifically known.”
 Source: Citizen Digital, Kenya
Level 3. critical mathematical
 Sentences: “Without going to the details of the statistics, the final results found […]. Sample sizes were calculated at regional level in order to estimate global acute malnutrition with a desired precision of between 2-4 percent with a design effect of 1.5.”
 Source: Daily News, Tanzania
For further examples, see the interactive tables at http://paris21.org/literacy [password: literacy].
Limitations
The data source has several limitations that are worth addressing. First and foremost, our hierarchy of statistical thinking into three stages of skills (a progression of non-rigid levels of statistical understanding based on the SOLO taxonomy) draws on a scale that has been widely validated empirically as a measure of statistical literacy. Nevertheless, the indicator measures a count of terms specifically referring to each level of literacy, whereas literacy would also need to be tested against the “appropriateness” of the terms used, in context. The measure is therefore conditional on the assumption that statistical terms are appropriate for the context in which they are used. This assumption is essential to a fully automated process allowing the daily collection and analysis of newspaper articles.
Second, the current implementation is limited to the four most widely spoken languages globally (English, French, Spanish and Portuguese) and thereby ignores local languages. Extending the analysis would require software that allows word stemming and stop-word removal in these local languages. An initial analysis of newspaper coverage nevertheless reveals that the vast majority of countries have national newspapers available through RSS feeds and written in one or several of these four languages.
Third, newspapers and blogs are only a subset of national media. Radio and TV, however, cannot easily be captured in machine-readable format. Promising new tools, such as the Radio Analysis tools developed by Pulse Lab Kampala and the United Nations in Uganda, may fill this gap in the coming years. Radio data could, for instance, be useful in future robustness checks of how the use of statistics in urban areas, which have access to (online) newspapers, differs from that in rural areas and among illiterate populations. Moreover, automated text analysis does not cover visualised data, such as graphics and tables, an important way of presenting statistics in news media.
Finally, while based on highlevel glossaries and internationally acknowledged statistical data sources, the keyword lists used for the analysis are subjective.
4 Results
Scope of the indicator
The purpose of the indicator is to set and monitor targets and report on them annually. Target countries comprise all International Development Association (IDA) borrower countries, of which 65 countries were analysed this year.
From January 2020 to March 2021, a total of 131,213 articles were analysed for the use of statistics. This corresponds to an average of 1,800 articles per country for the period until 1 March 2021.
The aggregate score for each country is simply the sum over the three dimensions of the composite indicator, i.e. the equally weighted percentages of national newspaper articles that contain references to statistics at statistical literacy levels 1, 2 or 3, respectively, following the scale defined in Table 1. For each of the three levels of statistical literacy, the resulting score gives the percentage of articles that contain at least one search term from the keyword lists defined previously. The score for each level thus ranges between 0 and 100 and the maximum total score over all three levels is 300. The results in Figure 1 are presented by language group to allow a direct comparison between countries to which the same keyword list was applied.
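The aggregation step is a straightforward unweighted sum, which can be sketched as follows; the per-level percentages used in the example are invented, not actual country results.

```python
# Sketch of the country aggregation: the overall score is the unweighted
# sum of the three per-level percentages, so it ranges from 0 (no
# statistical references at any level) to 300. Figures are illustrative.

def total_score(level_shares):
    """Sum of per-level percentages; each share lies in [0, 100]."""
    assert set(level_shares) == {1, 2, 3}  # exactly the three levels
    return sum(level_shares.values())

score = total_score({1: 5.3, 2: 9.6, 3: 3.3})
```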
There are 976 general news articles (corresponding to 5.32 percent of all articles) that cite statistics (Level 1), and 2,492 research-related articles (equivalent to 12.89 percent of all articles) that demonstrate a critical engagement with statistics (Levels 2 and 3). The global distribution is visualised in Figure 1 below.
Figure 1: Statistical literacy score at three levels by country in 2021
The use of data in news articles increased substantially in 2020, mostly due to intensive reporting on COVID-19-related stories. If COVID-19-related content is excluded, the level of statistical literacy actually remained the same in most regions except Oceania.
Figure 2: Trend of statistical literacy score by region
Whether this change is long-term is difficult to determine. It is possible that COVID-19 incentivised data users, especially journalists, to improve their ability to understand and interpret data. This substantial increase in statistical literacy was reflected in the overall scores. The increase in data-related content also reflects a stronger demand for data from media consumers. From this perspective, it is plausible to argue that COVID-19 and the related news coverage helped improve the overall statistical literacy score.
On the other hand, one can also argue that statistical literacy skills did not actually improve in 2020. The improvement in the score may only reflect a temporary increase in the use of statistics, which could fall back to pre-pandemic levels once the special demand for data is over. Follow-up analysis in the next round should focus on how the literacy skills exhibited in COVID-related content translate to more generic topics.
References
Angrist, N., H.A. Patrinos and M. Schlotter (2013). An Expansion of a Global Data Set on Educational Quality: A Focus on Achievement in Developing Countries, Policy Research Working Paper Series 6536, The World Bank. Accessed at http://datatopics.worldbank.org/Education/wDataQuery/QAchievement.aspx
Batanero, C. (2002). Discussion: The role of models in understanding and improving statistical literacy. International Statistical Review, 70, 37–40.
Begg, A., Pfannkuch, M., Camden, M., Hughes, P., Noble, A., & Wild, C. (2004). The school statistics curriculum: Statistics and probability education literature review. Auckland, New Zealand: Auckland Uniservices Ltd., University of Auckland.
Ben-Zvi, D., & Garfield, J. B. (2004). Statistical literacy, reasoning and thinking: Goals, definitions, and challenges. In D. Ben-Zvi & J. B. Garfield (Eds.), The challenge of developing statistical literacy, reasoning, and thinking (pp. 3–16). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Biggs, J., & Collis, K. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York, NY: Academic Press.
Callingham, R. (2007). Assessing statistical literacy: A question of interpretation. Retrieved from www.stat.auckland.ac.nz/~iase/publications/17/6D1_CALL.pdf
Dodge, Y. (2003). Oxford Dictionary of Statistical Terms. Oxford University Press.
Espey, J., Swanson, E., Badiee, S., Christensen, Z., Fischer, A., Levy, M., Yetman, G., de Sherbinin, A., Chen, R., Qiu, Y., Greenwell, G., Klein, T., Jutting, J., Jerven, M., Cameron, G., Aguilar Rivera, A.M., Arias, V.C., Lantei Mills, S. and Motivans, A. (2015). Data for Development: A Needs Assessment for SDG Monitoring and Statistical Capacity Development.
Gal, I. (1995). Statistical tools and statistical literacy: The case of the average. Teaching Statistics, 17, 97–99.
Gal, I. (2004). Statistical literacy: Meanings, components, responsibilities. In D. Ben-Zvi & J. B. Garfield (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 47–78). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Garfield, J. (1999). Thinking about statistical reasoning, thinking, and literacy. Paper presented at the First Annual Roundtable on Statistical Thinking, Reasoning, and Literacy.
Hooke, R. (1983). How to tell the liars from the statisticians. New York, NY: Marcel Dekker.
Makar, K., & Rubin, A. (2009). A framework for thinking about informal statistical inference. Statistics Education Research Journal, 8(1), 82–105.
PARIS21 (2015). Partner Report on Support to Statistics 2015. Accessed at http://www.paris21.org/PRESS2015
Pierce, R. (2008). Research Methods in Politics. SAGE Publications.
Reading, C. (2002). Profile for statistical understanding. Proceedings of the Sixth International Conference on Teaching Statistics, Cape Town, South Africa.
Rumsey, D. J. (2002). Discussion: Statistical literacy: Implications for teaching, research and practice. International Statistical Review, 70, 32–36.
Scheaffer, R. L., Watkins, A. E., & Landwehr, J. M. (1998). What every high-school graduate should know about statistics. In S. P. Lajoie (Ed.), Reflections on statistics: Learning, teaching and assessment in grades K–12 (pp. 3–31). Mahwah, NJ: Lawrence Erlbaum.
Schield, M. (2004). Statistical literacy and liberal education at Augsburg College. Peer Review, 6(4), 16–18.
Shaughnessy, J. M. (2007). Research on statistics learning and reasoning. In F. K. Lester (Ed.), Second handbook of research on mathematics teaching and learning (pp. 957–1009). Reston, VA: The National Council of Teachers of Mathematics.
Shaughnessy, J. M., & Pfannkuch, M. (2004). How faithful is Old Faithful? Statistical thinking: A story of variation and prediction. Mathematics Teacher, 95(4), 252–259.
Trewin, D. (2005). Making Maths Vital. Keynote speech, AAMT conference.
Wallman, K. K. (1993). Enhancing statistical literacy: Enriching our society. Journal of the American Statistical Association, 88(421), 1.
Watson, J. M. (1997). Assessing statistical literacy using the media. In I. Gal & J. B. Garfield (Eds.), The assessment challenge in statistics education (pp. 107–121). Amsterdam, The Netherlands: IOS Press & The International Statistical Institute.
Watson, J. M., & Callingham, R. A. (2003). Statistical literacy: A complex hierarchical construct. Statistics Education Research Journal, 2(2), 3–46.
Wild, C. J. & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry (with discussion). International Statistical Review, 67(3), 223–265.
Appendix A. Keyword lists
Level 1. Consistent, non-critical use of statistics
Keywords indicating data sources: popul census, agricultur census, vital statist, household survey, agricultur survey, administr data, econom statist, forc survey, establish survey, trade statist, geospati data
Keywords indicating a statistical indicator: gross domest, gross nation, gdp, gni, price index, cpi, unemploy rate, inflat rate, …
Keyword list from statistical capacity building projects: birth registr, vital registr, civil registr, death registr, administr databas, data portal, devinfo, data archiv, archiv data, data dissemin, dissemin data, dissemin statist, statist dissemin, microdata, metadata, data manag, data document, survey data, qualiti statist, statist qualiti, qualiti survey, survey qualiti, qualiti data, data qualiti, access data, open data, use data, produc data, product data, data user, data produc, data outreach, data awar, data product, statist product, statist busi, data collect, data process, data access, statist harmon, survey harmon, data harmon, harmon data, statist system, nsds, develop statist, mdg indic, statist standard, data standard, statist capac, data curat, curat data, statist offic, offic statist, statist bureau, bureau statist, statist train, institut statist, demograph data, demograph statist, mdg monitor, monitor mdg, releas data, data releas, nation survey, survey programm, survey program, agenc statist, statist agenc, survey catalogu, survey catalog, afristat, ckan, prsp monitor, data revolut, lfs questionnair, govern statist, govt statist, statist law, statist legisl, disaggreg data, data disaggreg, disaggreg sex, disaggreg gender, gender disaggreg, sex disaggreg, statist studi, collect method, busi registr, registr busi, survey design, data compil, survey system, statistician, statist program, statist programm, minimum statist, statist data, data entri, statist oper, questionnair design, design questionnair, statist survey, statist questionnair, sampl plan, multipl indic, cluster survey, busi survey, health survey, partnership statist, region statist, nation statist, metadata exchang, mdg assess, assess mdg, measur indic, indic measur, statist methodolog, evalu methodolog, survey methodolog, data improv, improv data, improv statist, statist improv, gender statist, disaggreg indic, disaggreg statist, region survey, nation data, statist databas, statist db, 
nation account, data avail, avail data, statist avail, avail statist, data develop, develop data, central statist, statist depart, statist austria, azerbaijan statist, depart statist, director statist, barbado statist, statist belgium, statist institut, statist canada, statist denmark, mobilis statist, statist finland, state statist, feder statist, ghana statist, statist greenland, statist plan, statist iceland, bps statsit, istat, statist republ, kostat, statist committe, lao statist, statist lithuania, plan statist, statist budget, statist econom, statist new, statist niue, statist divis, statist norway, center statist, philippin statist, qatar statist, statist rwanda, statist south, census statist, statist sweden, turkish statist, forecast turkmenistan, emir statist, uk statist, us census, statist organ, statist afghanistan, statist servic, statist popul, statist bosnia, statsit indonesia, statist korea, statist nz, statist inform, statist author, rosstat, depart census, statist forecast, statist centr, census bureau, popul registri
Level 2. Critical non-mathematical engagement
List of adjectives to assess the quality of research studies: mislead, deceiv, deceit, inaccur, accur, decept, reliabl, unreli, generaliz, erron, error, limit, ambigu, delud, fallaci, sound, unsound, rigor, scientif, unscientif, proper, improp, solid, imprecis, precis, exact, inexact, vagu, mistaken, fake, manipul, bias, unbias, spurious, invalid, valid, untrustworthi, trustworthi
Level 3. Critical mathematical engagement
List of statistical fallacies: sampl bias, select bias, sampl select, nonrepres, undercoverag, nonrespons bias, respons bias, miss observ, spurious relationship, confound, report bias, statist fallaci, lead question, load question, social desir, measur error, statist bias, sampl error, survey bias, data dredg, overgeneralis, gambler fallaci, baser fallaci, conjunct fallaci, prosecutor fallaci, ludic fallaci, compar appl, regress toward, regress mean, misus statist, misreport, causal, fals causal, data manipul, nonrandom, randomis, causat, causal effect, causal relationship, multipl hypothes, signific level, sampl size, power test, fals posit, fals negat, omit variabl, data accuraci
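The three stem lists above can be applied with a simple prefix-matching scheme: each keyword is a sequence of word stems (e.g. "sampl bias"), and an article matches a keyword when consecutive tokens in its text start with those stems. The sketch below illustrates this idea under stated assumptions; the stem excerpts, the prefix-matching rule and the `literacy_level` scoring function are hypothetical simplifications of the paper's actual procedure, not its implementation.

```python
import re

# Hypothetical excerpts of the Appendix A stem lists (the full lists appear above).
LEVEL_STEMS = {
    1: ["popul census", "household survey", "unemploy rate", "gdp"],
    2: ["mislead", "inaccur", "unreli", "fallaci"],
    3: ["sampl bias", "confound", "fals posit", "regress mean"],
}

def tokens(text):
    """Lowercase alphabetic tokens of the article text."""
    return re.findall(r"[a-z]+", text.lower())

def matches(stem_phrase, toks):
    """True if the stems match consecutive tokens by prefix, e.g. 'sampl bias' ~ 'sampling bias'."""
    stems = stem_phrase.split()
    for i in range(len(toks) - len(stems) + 1):
        if all(toks[i + j].startswith(s) for j, s in enumerate(stems)):
            return True
    return False

def literacy_level(article):
    """Highest level whose keyword list matches the article; 0 if none matches."""
    toks = tokens(article)
    level = 0
    for lvl, stems in LEVEL_STEMS.items():
        if any(matches(s, toks) for s in stems):
            level = lvl
    return level

print(literacy_level("The unemployment rate rose sharply last quarter."))       # 1
print(literacy_level("Critics called the poll misleading and unreliable."))     # 2
print(literacy_level("The study suffered from sampling bias and confounding."))  # 3
```

Prefix matching against stems is one cheap way to catch morphological variants (e.g. "unemployment" matches the stem "unemploy"); a production pipeline would more likely apply a proper stemmer such as Porter's to both keywords and text.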
Footnotes
[1] This paper was developed in collaboration with a task team set up by the PARIS21 Secretariat. We are thankful to the task team members: Kenneth Bambrick and Anthony Higney (DfID, Chair), Maurice Nsabimana (World Bank), Adele Atkinson (OECD), Freeman Amegashie (Afristat), Yanhong Zhang (UNESCAP), Reija Helenius (IASE), Barnabe Okouda (INS Cameroun), Pedro Campos (ISLP), Lisa Bersales (Philippine Statistics Authority), Innocent Ngalinda and Albina Chuwa (East African Statistics Training Centre), Margarita Guerrero (UNESCAP/SIAP), Scott Keir (Royal Statistical Society).
[2] A critical perspective towards statistics is promoted by numerous professional organisations, among them the National Council of Teachers of Mathematics (NCTM), and by national curriculum policies such as The New Zealand Curriculum (Ministry of Education of New Zealand, 2007). In particular, the International Statistical Institute has launched the International Statistical Literacy Project.
[3] Namely data collection, data tabulation and representation, data reduction, probability, and interpretation and inference.
[4] Proposing four levels of thinking across four key constructs for young children’s thinking.
[5] O*NET is a labor market information tool intended to facilitate matches between jobseekers and employers in the United States.
[6] The United Nations initiative Pulse Lab Kampala is developing a tool to analyse radio content, currently being tested in Uganda. The tool involves the development of speech technology for three African languages (Ugandan English, Luganda and Acholi). For more information on this project, see http://radio.unglobalpulse.net/uganda/