Brazil is one of the world's largest and most populous nations, and its census is a huge undertaking by any measure. After a two-year delay caused by the COVID-19 pandemic, ensuring a smooth census process is more crucial than ever. Through an initiative led by the UK Office for National Statistics, PARIS21 and the Brazilian Institute of Geography and Statistics (IBGE) came together to improve the census enumeration process for better data and statistics and, ultimately, for better policies.
The challenge of census activities in Brazil
For many countries, the COVID-19 pandemic brought about unprecedented changes: high excess death rates, profound economic impacts, and rapid shifts in immigration and migration as borders closed and migrants were disproportionately affected by loss of employment. Restrictions on movement in many countries have also meant that ways of working and interacting have changed markedly and very quickly. Furthermore, the era of the Sustainable Development Goals (SDGs) has driven greater demand than ever for data and statistics. At the halfway point to 2030, the policies that countries make now will determine how successful they are in achieving the development goals, and timely, high-quality data and statistics are key to this. With a population of over 212 million people, Brazil is the seventh most populous country on earth and the fifth largest by area, with extraordinary geographical and social diversity. As a consequence, some populations are harder to reach and are at risk of being underrepresented. The data collected through Brazil’s census will be the basis for many critical policy choices between now and the end of the SDG period and will likely determine whether the country meets its development goals. Accuracy is therefore of the utmost importance.
The Data Science Accelerator Programme
Thanks to an initiative spearheaded by the Office for National Statistics of the United Kingdom, the Data Science Accelerator Programme brought together PARIS21 and the Brazilian Institute of Geography and Statistics (IBGE) in April 2022. The programme connects national statistical organisations (NSOs) with partner agencies to address specific issues that the NSO is facing. Together, these partners develop solutions using non-traditional data sources, new techniques, and innovative tools. For Brazil, this was an opportunity to look into the census preparation process and identify what could be done to obtain better results. While IBGE was carrying out its final rehearsal for the census that month, PARIS21 and IBGE set out to gain insights from the data produced by digital survey tools (e.g. mobile devices and web-based surveys). These ‘paradata’ (log data about the process by which data are collected) are a common byproduct of digitalised surveys but are rarely explored. Experimenting with these data could not only help inform Brazil's census in the short term but also set an important example for other surveys around the world.
How process mining can help
After initial familiarisation with the available data, PARIS21 and IBGE pursued two directions. First, they used the geolocation and timestamps in the data set to identify anomalies in the responses collected by census enumerators, which can help to spot fraud, and to visualise the responses to certain questions on a map. Second, they analysed how census questionnaires were answered using an innovative technique called process mining. Process mining can be described as a set of tools that allow a user to gain insights into how a specific process actually works. A simple analogy is a patient's visit to a hospital: during the visit, the patient goes through various stations. They might pass, for example, from the front desk to a doctor, through an MRI, back to the doctor and lastly back to reception. Recording the time that patients spend at each station might help to identify capacity bottlenecks in the hospital, for instance, time spent on processing documents at the reception desk. In the case of the census, the patient is a single questionnaire, and the stations are its questions. This makes it easy to identify the questions on which people spent the most time, to reveal whether respondents tend to skip certain questions or revisit them multiple times, and to recommend a schedule for return visits when respondents are absent. These insights can help to improve census questionnaire design after the rehearsal and inform future surveys.
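The questionnaire-as-patient analogy can be illustrated with a minimal Python sketch. It is not the code developed by PARIS21 and IBGE: the event log, its field layout, the question identifiers and the function name summarise are all hypothetical, and it assumes that each paradata record marks the moment a question was opened, with a final "END" record when the questionnaire was closed.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical paradata: (questionnaire id, question id, time the question was opened).
# "END" marks the moment the questionnaire was closed.
EVENTS = [
    ("q1", "Q01", "2022-04-04T09:00:00"),
    ("q1", "Q02", "2022-04-04T09:00:20"),
    ("q1", "Q03", "2022-04-04T09:01:30"),
    ("q1", "Q02", "2022-04-04T09:01:40"),  # respondent goes back to Q02
    ("q1", "END", "2022-04-04T09:02:00"),
    ("q2", "Q01", "2022-04-04T10:00:00"),
    ("q2", "Q02", "2022-04-04T10:00:10"),
    ("q2", "END", "2022-04-04T10:00:50"),
]


def summarise(events):
    """Return (seconds spent per question, revisit count per question)."""
    # Group events into one trace per questionnaire.
    by_case = defaultdict(list)
    for case, question, ts in events:
        by_case[case].append((datetime.fromisoformat(ts), question))

    seconds = defaultdict(float)
    revisits = defaultdict(int)
    for trace in by_case.values():
        trace.sort()  # chronological order within the questionnaire
        seen = set()
        # Time on a question = gap until the next event in the same trace.
        for (start, question), (end, _) in zip(trace, trace[1:]):
            seconds[question] += (end - start).total_seconds()
            if question in seen:
                revisits[question] += 1
            seen.add(question)
    return dict(seconds), dict(revisits)


if __name__ == "__main__":
    time_spent, revisit_counts = summarise(EVENTS)
    print(time_spent)
    print(revisit_counts)
```

In this toy log, Q02 accumulates the most time and is the only question that is revisited, which is the kind of signal that would prompt a closer look at how that question is worded or placed.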
Findings can help many countries
The first results of IBGE and PARIS21's twelve-week collaboration are already promising. The new dashboard, built on open-source tools, can reveal valuable information about the questionnaire: how much time respondents spent on each question, which steps in the questionnaire led to no response, and how response patterns and sequences differed from the intended design. By plotting geolocation data on an interactive map, the dashboard also visualises activities in each enumeration area on a daily basis to spot possible inefficiencies or anomalies in data collection during the census. The output from this collaboration can also be used in other country contexts to inform their survey and census processes. For this purpose, a methodology document with key lessons learned will be produced. The code developed will also be stored in a public repository for future adaptation by countries. The lessons learned from this collaboration can provide a good model for the future engagement of PARIS21 with many partners around the globe. The approach is scalable and useful, and can ultimately contribute to stronger statistical systems in developing countries.
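Two of the dashboard views described above, deviations from the intended sequence and the steps that led to no response, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the dashboard's actual code: the intended order, the traces and both function names are hypothetical, and the design is assumed to be a simple linear sequence of questions.

```python
from collections import Counter

# Hypothetical designed order of questions.
INTENDED = ["Q01", "Q02", "Q03", "Q04"]

# Hypothetical observed order of questions per questionnaire.
TRACES = {
    "q1": ["Q01", "Q02", "Q03", "Q04"],  # follows the design
    "q2": ["Q01", "Q03", "Q02", "Q04"],  # answers out of order
    "q3": ["Q01", "Q02"],                # abandoned after Q02
}


def unexpected_transitions(traces, intended):
    """Count question-to-question steps that the design does not foresee."""
    allowed = set(zip(intended, intended[1:]))
    counts = Counter()
    for trace in traces.values():
        for step in zip(trace, trace[1:]):
            if step not in allowed:
                counts[step] += 1
    return counts


def drop_off_points(traces, intended):
    """Count the last question reached in questionnaires that never got to the end."""
    final = intended[-1]
    return Counter(t[-1] for t in traces.values() if t and t[-1] != final)


if __name__ == "__main__":
    print(unexpected_transitions(TRACES, INTENDED))
    print(drop_off_points(TRACES, INTENDED))
```

A real questionnaire usually contains skip logic, so the set of allowed transitions would need to be derived from the questionnaire specification and not from a single linear list as assumed here.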