1. Background

Indicator 4a, developed in the context of the Busan Action Plan for Statistics (BAPS) logframe, measures references in global summits to statistical development and/or data gaps.

This indicator measures progress towards the BAPS action area that aims to “ensure that outcomes of global summits and high-level forums specifically recognise the need for statistical capacity development” (see the BAPS document).

Definition of Global Summits

A first step towards identifying the data source for this indicator is to define a global summit. A summit is a meeting between heads of government (Oxford Dictionary). For a summit to be global, it must bring together governments from around the world.

Following this definition, global summits are characterised by exchanges between the participating governments. The events closest to this definition are the G20 summits, which bring together the governments of 20 major economies. While the G20 closely fits the concept of a global summit, it can be criticised for not being truly global, as its discussions are driven by a rather select group of governments.

The approach proposed here is therefore to focus on intergovernmental organisations whose membership comprises the widest possible set of sovereign states. At the global level, these are the United Nations and (for each sectoral focus) its Specialised Agencies, such as:
FAO, IFAD, ILO, IMF, IMO, ITU, UNDP, UNESCO, UNIDO, WHO, World Bank, WMO, and WTO
These agencies bring together all their member states to discuss and set norms and policies.


2. Data sources

 

A. USE INFORMATION FROM UN AGENCIES TO SCREEN FOR GLOBAL SUMMITS

List of global summits

A natural second step is to identify suitable outcome documents from UN Specialised Agencies that cover global summits and related discussions. One option is to draw up an exhaustive list of global summits. The immediate concerns with this approach are that the list would change every year and that some of the largest summits are only held every two years (biennially), which would introduce considerable undesirable variation into such a measure.

Output documents from UN Agencies

The method proposed here is therefore to consider more frequent output documents from UN Specialised Agencies.

Use of direct output from UN agencies has two main advantages:

  1. First, it covers all relevant sectors at a global level.
  2. Second, and more crucially, this approach does not require the Secretariat to identify such events itself. Instead, the Secretariat can assume that agencies follow and report on the topics discussed at the most relevant events in their sector.

 

B. COLLECT HIGH FREQUENCY TWITTER DATA

RSS Feeds?

Another option is to capture the agencies' website content, for example through RSS feeds. However, not every site has an active RSS feed, and some (e.g. the World Bank) have dozens of feeds to subscribe to.

Tweets!

  • It is thus proposed to use a standardised and frequent source of outcome documents: the official Twitter accounts of the agencies and their daily tweets, endorsements and retweets[1].
    • Tweets are particularly suitable because they cover the most relevant sectors and are published in an easily accessible format on a daily basis.
  • The proposed indicator thus uses data extracted from the Twitter timelines of these UN Specialised Agencies to measure the extent to which global summits include references to statistical capacity development and data gaps. Tweets are limited to 280 characters, which makes them easy to analyse.

3. Methodology

The methodology proceeds in five steps, as follows.
1. In a first step, it extracts the current year's hashtags for the 15 UN Specialised Agencies as well as for 20 Statistical Agencies. The latter are taken from the website of the Global Partnership for Sustainable Development Data (GPSDD) and include PARIS21, ODW, World Bank Data, FAO Stats, etc.
2. In a second step, the methodology creates a list of hashtags related to statistical development and data gaps. These are hashtags that occur at least three times[2] more often in tweets of Statistical Agencies than in tweets of UN Specialised Agencies.
3. Next, we take the most frequent hashtags (the top 75%) and calculate the relative frequency of their occurrence for each of the 15 UN Specialised Agencies.
4. In the fourth step, we calculate the unweighted average of these relative frequencies across the 15 UN Specialised Agencies.
5. Finally, we show the trend over time of the top six hashtags used by the Statistical Agencies.

 

A. EXTRACT HASHTAGS

Twitter timelines

First, the Twitter timelines of 15 United Nations Specialised Agencies are downloaded to identify hashtags related to statistical development that were used during global summits in the current year.

Then, the same process is applied to 23 accounts that tweet exclusively about issues related to statistical capacity development, i.e. the accounts of Statistics Think Tanks.

Examples are:

  • ContactPARIS21
  • OpenDataWatch
  • worldbankdata
  • FAOstatistics

But not: WorldBank, FAO, etc.

Hashtags used by Stats Think Tanks are for instance: #SDGs, #SDG, #Data, #BigData, #DataRev, #DataRevolution, #OpenData, #Statistics, #PPPs, #GlobalGoals, #Post2015, #Development, #MDGs, #data4sdgs

Major Stats Think Tanks are catalogued from the website of the Global Partnership for Sustainable Development Data. The final database includes tweets from 15 UN agencies and 23 Stats Think Tanks. Because Twitter only returns the 3,200 most recent tweets of an account, we update each agency's timeline dataset every month to capture the activity of its account.
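Because each download is capped at 3,200 tweets, every monthly update has to be merged with the tweets already stored. A minimal sketch in Python, assuming each tweet is a dict with an `id` key (the key name and the `merge_timeline` helper are hypothetical): archived tweets are kept, and a fresh copy of a tweet replaces the stored one.

```python
from typing import Any, Dict, List

Tweet = Dict[str, Any]


def merge_timeline(archive: List[Tweet], new_download: List[Tweet]) -> List[Tweet]:
    """Merge a monthly timeline download into the stored tweets of one account.

    Hypothetical helper: tweets are identified by their "id" key. Tweets that have
    dropped out of the 3,200-tweet window survive in the archive; when the same ID
    appears in both inputs, the newer download wins.
    """
    by_id: Dict[Any, Tweet] = {tweet["id"]: tweet for tweet in archive}
    for tweet in new_download:
        by_id[tweet["id"]] = tweet  # overwrite the stored copy with the fresh one
    # Deterministic order (descending ID) so repeated runs are easy to compare
    return sorted(by_id.values(), key=lambda tweet: tweet["id"], reverse=True)


stored = [{"id": 1, "text": "old"}, {"id": 2, "text": "kept"}]
fresh = [{"id": 2, "text": "kept (refreshed)"}, {"id": 3, "text": "new"}]
print([t["id"] for t in merge_timeline(stored, fresh)])  # [3, 2, 1]
```

Deduplicating on the tweet ID is what keeps the count of unique tweets stable across repeated monthly downloads.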

Identification of hashtags referring to statistical development or data gaps

After loading the timelines of the UN Specialised Agencies, we extract the hashtags, ID and creation date of each tweet and store these observations in a data frame. The information extracted from all UN agencies' tweets is compiled into a single data frame. The same exercise is repeated for the Stats Think Tanks.
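A minimal Python sketch of this extraction step, assuming tweets arrive as Twitter API v1.1 JSON objects, where `id_str`, `created_at` and `entities.hashtags[].text` are standard fields. The helper names and the lowercasing of hashtags are illustrative choices, not necessarily those of the original analysis. Each row of the resulting long-format table holds one hashtag occurrence.

```python
from datetime import datetime
from typing import Any, Dict, List

# created_at format used by Twitter API v1.1, e.g. "Wed Oct 10 20:19:24 +0000 2018"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_to_rows(account: str, tweets: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten one account's tweets into rows of (account, tweet id, date, hashtag)."""
    rows = []
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            rows.append({
                "account": account,
                "id": tweet["id_str"],
                "created_at": created.date().isoformat(),
                # v1.1 stores the hashtag without "#"; lowercase so "#SDGs" == "#sdgs"
                "hashtag": "#" + tag["text"].lower(),
            })
    return rows


def compile_rows(timelines: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, str]]:
    """Stack the rows of all accounts (e.g. all UN agencies) into one table."""
    return [row for account, tweets in timelines.items()
            for row in tweets_to_rows(account, tweets)]
```

The same `compile_rows` call can be run once on the UN agencies' timelines and once on the Stats Think Tanks' timelines.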

 

B. SPOT REFERENCES TO STATISTICAL DEVELOPMENT USING HASHTAGS SEQUENCES

We consider a tweet to make reference to statistical development or data gaps if it contains at least one hashtag from a keyword list related to statistical capacity building.

This keyword list was compiled and extended by comparing the odds ratios of hashtags used in the timelines of (A) Stats Think Tanks with those of (B) other international organisations:

Hashtag                      Frequency  Odds ratio
#sdgs                              254    5.862130
#genderequality                    122    3.369269
#data                               74   27.424102
#sdg                                54    7.984669
#icymi                              46    4.436699
#gender                             42    8.007483
#sdg4                               42   41.885294
#tech                               34    3.804460
#science                            23    5.124075
#betterdata                         22   21.819924
#bigdata                            22   21.689265
#dataviz                            21   16.562486
#digitalidentity                    20    3.880549
#solar                              20    3.736825
#teachers                           20    6.323858
#hlpf2019                           18   10.539764
#earthday                           16    3.413446
#education2030                      16    5.209997
#genderpaygap                       16    4.850687
#hlpf                               16   10.779304
#asiapacific                        15    4.024273
#iwd2020                            15    3.641009
#machinelearning                    15    7.665283
#outofschool                        15   14.180773
#sdg5                               14    4.927682
#gdp                                13    8.402329
#opendata                           13  100.385720
#employment                         12    5.269882
#job                                12   40.003193
#worldteachersday                   11    3.658430
#discover                           10    3.449377
#leavenoonebehind                   10    8.910891
#worlddevelopmentindicators         10    8.048547
#edd19                               9    7.026509
#csw63                               8   11.857234
#davos                               8    7.904823
#covid19                             7    4.927682
#geospatial                          7    6.980882
#refugee                             7    4.517042
#d                                   6    4.790802
#engineering                         6    3.832641
#literacy                            6    8.144363
#resilient                           6    3.353561
#betterjobs                          5    4.599170
#foodwaste                           5    7.473651
#g                                   5    3.449377
#nyc                                 5   21.271159
#vizrisk                             5    5.174066
#bees                                4    4.311721
#buildbetterbefore                   4    5.030342
#chart                               4    4.311721
#cleantransportation                 4    4.311721
#dataday                             4   30.182050
#dataprivacy                         4   20.839987
#energyefficiency                    4    4.311721
#fashion                             4    5.748962
#funddata                            4   71.143404
#fundeducation                       4   10.060683
#girls                               4   10.779304
#landrights                          4    7.186202
#learningcrisis                      4   28.026189
#northafrica                         4    7.904823
#population                          4    9.342063
#sdg14                               4    9.342063
#sdg17                               4    5.748962
#sdg9                                4    4.311721
#statistics                          4   85.515809
#wd2019                              4   82.641328
#youth2030                           4    5.030342
#2020gemreport                       3    3.832641
#afghan                              3    4.790802
#childmarriage                       3    6.707122
#chocolate                           3    5.748962
#construction                        3   35.451932
#eastasia                            3   11.497924
#empowerment                         3    5.748962
#familiesoftoday                     3    3.832641
#globalgoal                          3    4.790802
#offgrid                             3    3.832641
#outnow                              3   23.954008
#retail                              3    5.748962
#sdg16                               3    7.665283
#socialgood                          3   10.539764
#thehague                            3    3.832641
#wind                                3    7.665283

Stats hashtags are identified in a two-stage process:

  1. First, the frequency of each hashtag used by the UN agencies is computed. The same is done for the Stats Think Tanks.
  2. Second, a list of hashtags related to statistics is determined by comparing the odds ratios of the hashtags.
  • The odds ratio of a hashtag is its relative frequency among all hashtags used by the Stats Think Tanks divided by its relative frequency among all hashtags in the Twitter timelines of the UN agencies.
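  • Mathematically, for a hashtag h:
    $$\mathrm{OR}(h) = \frac{f_S(h) / N_S}{f_U(h) / N_U}$$
    where $f_S(h)$ is the frequency of h in the Stats Think Tanks' tweets, $N_S$ is the total number of hashtags in the Stats Think Tanks' tweets, and $f_U(h)$ and $N_U$ are the corresponding counts for the UN agencies' tweets.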
  • Odds ratios are cut off at 3, so that only the relevant hashtags in the UN agencies' timelines are kept, i.e. those in the 25th percentile of the Stats agencies' odds ratios. Hashtags that occur at least three times more often (in relative terms) in tweets of Stats Agencies than in those of UN agencies are considered related to statistical capacity development.
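The two stages can be sketched in Python as follows, assuming each group's hashtags are given as a flat, already lowercased list with one entry per occurrence; the helper names are hypothetical. Hashtags that never appear in the UN agencies' timelines are skipped: their denominator would be zero, and they cannot occur in UN tweets anyway.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def odds_ratios(stats_tags: Iterable[str], un_tags: Iterable[str]) -> Dict[str, float]:
    """Relative frequency among Stats Think Tanks over relative frequency among UN agencies.

    Computed as (f_S * N_U) / (f_U * N_S), which equals (f_S / N_S) / (f_U / N_U).
    Only hashtags used by the UN agencies are scored.
    """
    stats_counts = Counter(stats_tags)
    un_counts = Counter(un_tags)
    n_stats = sum(stats_counts.values())
    n_un = sum(un_counts.values())
    return {
        tag: (stats_counts[tag] * n_un) / (f_un * n_stats)
        for tag, f_un in un_counts.items()
    }


def stats_hashtags(ratios: Dict[str, float], cutoff: float = 3.0) -> Set[str]:
    """Keep hashtags at least `cutoff` times more frequent, relatively, among Stats Think Tanks."""
    return {tag for tag, ratio in ratios.items() if ratio >= cutoff}


stats = ["#data"] * 6 + ["#sdgs"] * 3 + ["#opendata"]
un = ["#data"] + ["#sdgs"] * 5 + ["#health"] * 4
ratios = odds_ratios(stats, un)
print(ratios)                  # {'#data': 6.0, '#sdgs': 0.6, '#health': 0.0}
print(stats_hashtags(ratios))  # {'#data'}
```

The toy counts are illustrative only; with the cut-off of 3, only `#data` qualifies.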

C. UN AGENCIES TWEETS: TOP HASHTAGS RELATED TO STATISTICS

Then, we take the most frequent hashtags (the top 75%) used by the 15 UN Specialised Agencies.
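The text does not spell out how the top 75% is determined. One possible reading, sketched below in Python under that assumption, ranks the statistics-related hashtags by frequency and keeps the first 75% of the ranking, rounded up; `top_share` is a hypothetical helper, and ties at the boundary are broken alphabetically.

```python
import math
from typing import Dict, List


def top_share(counts: Dict[str, int], share: float = 0.75) -> List[str]:
    """Return the most frequent hashtags making up the top `share` of the ranking."""
    # Sort by descending frequency, then alphabetically for a stable tie-break
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    keep = math.ceil(share * len(ranked))
    return [tag for tag, _ in ranked[:keep]]


# Frequencies of four hashtags from the table above
example = {"#sdgs": 254, "#genderequality": 122, "#data": 74, "#tech": 34}
print(top_share(example))  # ['#sdgs', '#genderequality', '#data']
```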

 

[Figure: word cloud of the statistics-related hashtags most frequently used by the UN Specialised Agencies]

The word cloud above shows that #sdgs, #genderequality, #data and #tech are among the most frequent hashtags that the UN Specialised Agencies share with the statistical agencies, which supports the view that the UN Specialised Agencies' tweets carry information on statistical development.

Relative frequency of hashtags related to statistics used per UN agency

Then, the relative frequency of hashtags related to statistics is computed for each UN agency, based on the hashtags identified in the first phase using odds ratios.

For each agency, the relative frequency of hashtags related to statistical development is simply the number of statistics-related hashtags used by that agency divided by its total number of hashtags.
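A minimal Python sketch of this per-agency computation, assuming the compiled table is a list of rows with hypothetical `account` and `hashtag` keys and that the statistics-related hashtags are given as a set:

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def relative_frequency_by_agency(rows: Iterable[Mapping[str, str]],
                                 stats_tags: Set[str]) -> Dict[str, float]:
    """Share of each agency's hashtag occurrences that belong to the statistics list."""
    total: Dict[str, int] = defaultdict(int)
    related: Dict[str, int] = defaultdict(int)
    for row in rows:
        total[row["account"]] += 1
        if row["hashtag"] in stats_tags:
            related[row["account"]] += 1
    return {agency: related[agency] / total[agency] for agency in total}


rows = [{"account": "WHO", "hashtag": t} for t in ("#data", "#health", "#health", "#sdgs")]
rows += [{"account": "ILO", "hashtag": t} for t in ("#jobs", "#jobs", "#jobs", "#data")]
print(relative_frequency_by_agency(rows, {"#data", "#sdgs"}))  # {'WHO': 0.5, 'ILO': 0.25}
```

The agency names and counts here are toy values, not results.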

[Figure: relative frequency of statistics-related hashtags used per UN agency]

This relative frequency is fairly low for all United Nations Specialised Agencies.

Comparison UN agencies and Stats Think Tanks: top 10 hashtags related to stats

[Figure: comparison of UN agencies and Stats Think Tanks, top 10 hashtags related to statistics]

 

D. FINAL INDICATOR AND INITIAL RESULTS

An initial analysis was undertaken based on 53,434 unique tweets containing a total of 150,198 hashtags since 2018. The final indicator is the unweighted average of the relative frequencies of the 15 UN agencies. This average is currently about 1.23%: in other words, on average across the 15 UN agencies, roughly one in every 80 hashtags an agency uses relates to statistical development and/or data gaps.
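Because the indicator is an unweighted mean, each agency counts equally regardless of how many hashtags it tweets; pooling all hashtags instead would let the most prolific accounts dominate. The Python sketch below contrasts the two, assuming a hypothetical `counts` mapping from each agency to its number of statistics-related hashtags and its total number of hashtags; the figures are illustrative, not results.

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # agency -> (stats-related hashtags, all hashtags)


def unweighted_indicator(counts: Counts) -> float:
    """Average of the per-agency relative frequencies; every agency has equal weight."""
    shares = [related / total for related, total in counts.values()]
    return sum(shares) / len(shares)


def pooled_share(counts: Counts) -> float:
    """Share of stats-related hashtags among all hashtags of all agencies combined."""
    related = sum(r for r, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return related / total


example = {"A": (1, 10), "B": (30, 100)}  # illustrative numbers only
print(round(unweighted_indicator(example), 4))  # 0.2
print(round(pooled_share(example), 4))          # 0.2818
```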

 

E. TRENDING OF THE TOP HASHTAGS OVER TIME

The top hashtags used by the Statistical Agencies are #sdgs, #data, #genderdata, #sdg4, #opendata, #covid19 and #data4sdgs. Graphing their trend history shows that different hashtags evolve differently over time. The frequencies of some hashtags increase sharply during global summits and decrease afterwards, while the frequencies of others increase steadily over time. The hashtag #covid19 was the sixth most used hashtag by the Statistical Agencies despite only being used since March 2020. This indicates that the statistical community was able to organise a rapid and effective communication response to the crisis.
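The trend lines can be built by counting, for each hashtag, how often it is used per month. A minimal Python sketch, assuming rows carry a `created_at` date in ISO format (YYYY-MM-DD) and a lowercased `hashtag`; the key names and the `monthly_counts` helper are hypothetical, and months without any use are simply absent from the result.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def monthly_counts(rows: Iterable[Mapping[str, str]], hashtag: str) -> Dict[str, int]:
    """Number of uses of `hashtag` per calendar month, keyed by "YYYY-MM" in order."""
    counts = Counter(row["created_at"][:7] for row in rows if row["hashtag"] == hashtag)
    return dict(sorted(counts.items()))


rows = [
    {"created_at": "2020-04-01", "hashtag": "#covid19"},
    {"created_at": "2020-03-02", "hashtag": "#covid19"},
    {"created_at": "2020-03-20", "hashtag": "#covid19"},
    {"created_at": "2020-03-20", "hashtag": "#data"},
]
print(monthly_counts(rows, "#covid19"))  # {'2020-03': 2, '2020-04': 1}
```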

[Figure: trend over time of the top hashtags used by the Statistical Agencies]

 


4. Limitations

This methodology has several limitations.

a. Tweets are a non-representative data source made available by a private company. The methodology is therefore subject to biases and depends on continued access to the Twitter API, which may not be sustainable in the long term.
b. Difference between UN and country perspective: Although UN Specialised Agencies will by and large reflect the priorities of their member countries, they may not give a balanced view of the positions of countries' heads of state.
c. Difference between frequency and impact of topics: Some topics may trend on Twitter and even induce herding behaviour among the agencies without making it onto the agenda of high-level events.
d. Weighting of summits: Some summits produce no substantive outcome, while others clearly result in decisions. A future version of the methodology could aim to differentiate these and reflect this in the form of a weighting.


5. Next Steps and Conclusion

This indicator provides a unique measure of references in global summits to statistical development and/or data gaps. It identifies hashtags related to statistical development and computes the relative frequency of these stats hashtags per UN agency. It therefore makes it possible to observe the relative weight of each United Nations Specialised Agency in the total number of references to statistical development in global summits.

As part of the implementation, the analysis will feed into the PARIS21 Statistical Capacity Monitor to allow users to track and browse the outcomes of all global summits and high-level forums. PARIS21 will also expand the coverage of this indicator to more development co-operation agencies of donor countries in the 2021 report.

 

[1] A retweet is a forward of individual tweets by other users to their own feed.

[2] An odds ratio of three turned out to be a good cut-off value.
