1. Background

BAPS Logframe

Indicator 4a, developed in the context of the BAPS Logframe, measures references in global summits to statistical development and/or data gaps.

This indicator measures progress towards the BAPS action area that aims to “ensure that outcomes of global summits and high-level forums specifically recognise the need for statistical capacity development”. See the BAPS document.

Definition of Global Summits

A first step towards identifying the data source for this indicator is to define what a global summit is. A summit is a meeting between heads of government (The Oxford Dictionary). For a summit to be global, it must bring together governments from around the world.

Following this definition, global summits can be characterised by exchanges between the participating governments. The events closest to this definition are the G20 summits that bring together governments from the 20 major economies. The G20, while closely fitting the concept of a global summit, can be criticised for not being truly global as discussions are driven by a rather select group of governments.

The approach proposed here is therefore to focus on intergovernmental organisations whose membership comprises the widest possible set of sovereign states. At the global level, these are the United Nations and (for each sectoral focus) the FAO, IFAD, ILO, IMF, IMO, ITU, UNDP, UNESCO, UNIDO, WHO, World Bank, WMO, and WTO. These agencies bring together representatives of all member states to discuss and set norms and policies.

2. Data sources



List of global summits

A natural second step is the identification of suitable outcome documents from UN specialised agencies that cover global summits and related discussions. One option is to draw up an exhaustive list of global summits. Immediate concerns with this approach are that this list would change annually and that some of the largest summits are held only every two years, which would introduce a lot of undesirable variation in such a measure.

Output documents from UN Agencies

The method proposed here is therefore to consider more frequent output documents from UN Specialised Agencies.

Use of direct output from UN agencies has two main advantages:

  1. First, it covers all relevant sectors at a global level.
  2. Second, and more crucially, this approach does not require the Secretariat to identify such events itself.
    • Instead, the Secretariat can assume that agencies follow and report on topics discussed in the most relevant events in their sector.


RSS Feeds?

Another option is to capture the agencies' website content, for example through RSS feeds. However, not every site has an active RSS feed and some (e.g. the World Bank) have dozens of feeds to subscribe to.


  • It is thus proposed to use a standardised and frequent source of outcome documents: the official Twitter accounts of the agencies and their daily tweets, endorsements and retweets [1].
    • Tweets are particularly suitable because they cover the most relevant sectors and are published in an easily accessible format on a daily basis.
  • The proposed indicator thus uses data extracted from the Twitter timelines of these UN specialised agencies to measure the extent to which global summits include references to statistical capacity development and data gaps.
    • These tweets are currently restricted to 280 characters, making them easy to analyse.

3. Methodology

The methodology proceeds in five steps.
1. In a first step, it extracts hashtags for the current year for the 15 UN Specialised Agencies as well as for 20 Statistical Agencies. The latter are taken from the website of the Global Partnership for Sustainable Development Data (GPSDD). These include PARIS21, ODW, World Bank Data, FAO Stats, etc.
2. In a second step, the methodology creates a list of hashtags related to statistical development and data gaps. These are hashtags that occur at least three times [2] more often in tweets of Statistical Agencies than in tweets of UN Specialised Agencies.
3. In a third step, we take the most frequent of these hashtags (the top 75%) and calculate the relative frequency of their occurrence for each of the 15 UN Specialised Agencies.
4. In a fourth step, we calculate the non-weighted average of these relative frequencies over the 15 agencies.
5. Finally, we show the trend history of the top 6 hashtags used by those agencies.
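
The final step can be illustrated with a minimal sketch. It assumes the hashtags have already been extracted into (date, hashtag) pairs; the function name `top_hashtag_trends`, the monthly granularity and the input layout are illustrative assumptions rather than part of the methodology itself.

```python
from collections import Counter
from datetime import date


def top_hashtag_trends(records, top_n=6):
    """Count monthly occurrences of the `top_n` most frequent hashtags.

    `records` is an iterable of (created_on, hashtag) pairs, where
    `created_on` is a datetime.date. This input layout is an assumption.
    """
    # Hashtags are compared case-insensitively.
    records = [(day, tag.lower()) for day, tag in records]
    totals = Counter(tag for _, tag in records)
    top = [tag for tag, _ in totals.most_common(top_n)]

    trends = {tag: Counter() for tag in top}
    for day, tag in records:
        if tag in trends:
            trends[tag][day.strftime("%Y-%m")] += 1
    # "YYYY-MM" keys sort chronologically when sorted as text.
    return {tag: dict(sorted(months.items())) for tag, months in trends.items()}


# Hypothetical example data, for illustration only.
example = [
    (date(2016, 1, 5), "#data"),
    (date(2016, 2, 1), "#data"),
    (date(2016, 2, 3), "#sdg"),
]
print(top_hashtag_trends(example, top_n=1))
```

Grouping by month is only one possible choice; weekly or yearly buckets would follow the same pattern with a different date format.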



Twitter timelines

First, the Twitter timelines of 15 UN Specialised Agencies are downloaded to identify hashtags related to statistical development that were used during global summits in the current year.

Then, the same process is used with 20 accounts that tweet exclusively about issues related to statistical capacity development, i.e. accounts of Statistics Think Tanks.

Examples are:

  • ContactPARIS21
  • OpenDataWatch
  • worldbankdata
  • FAOstatistics

But not: WorldBank, FAO, etc.

Hashtags used by Stats Think Tanks are for instance: #SDGs, #SDG, #Data, #BigData, #DataRev, #DataRevolution, #OpenData, #Statistics, #PPPs, #GlobalGoals, #Post2015, #Development, #MDGs, #data4sdgs

Major Stats Think Tanks are catalogued based on the Global Partnership for Sustainable Development Data’s website. The database includes tweets from 15 UN agencies and 20 Stats Think Tanks. Twitter limits the number of tweets that can be loaded to 3200 per account. We therefore update the timeline dataset of each agency every month to capture the activity of its account.
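
Because each download is capped, successive monthly downloads overlap and need to be combined without double counting. The following is a minimal sketch of such a merge, assuming each tweet is a dictionary with `id` and `created_at` keys (ISO-formatted date strings); these field names and the function `merge_timelines` are hypothetical and do not describe the Secretariat's actual pipeline or the Twitter API.

```python
def merge_timelines(stored, fresh):
    """Merge a stored timeline with a fresh download, one record per tweet id.

    Both arguments are lists of dicts with at least "id" and "created_at"
    keys (assumed layout). When a tweet appears in both, the fresh copy wins.
    """
    merged = {tweet["id"]: tweet for tweet in stored}
    for tweet in fresh:
        merged[tweet["id"]] = tweet
    # ISO-formatted dates sort chronologically when compared as text.
    return sorted(merged.values(), key=lambda tweet: tweet["created_at"])
```

Keying on the tweet id keeps older tweets that have dropped out of the downloadable window while avoiding duplicates for tweets that appear in several monthly snapshots.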

Identification of hashtags referring to statistical development or data gaps

After loading the timelines of the United Nations Specialised Agencies, we extract the hashtags, id and creation date of each tweet and create a data frame with these observations. The information extracted from all the UN agencies’ tweets is compiled into a single data frame. This exercise is repeated for stats think tanks.
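
As an illustration of this extraction, the sketch below turns a list of tweets into one row per hashtag. It assumes each tweet is a dictionary with `id`, `created_at` and `text` keys and finds hashtags with a simple regular expression; the field names, the function `extract_hashtag_rows` and the regex approach are assumptions made for the example.

```python
import re

# A hashtag is approximated here as "#" followed by word characters.
HASHTAG = re.compile(r"#\w+")


def extract_hashtag_rows(tweets, account):
    """Return one row (dict) per hashtag found in the given tweets.

    `tweets` is a list of dicts with "id", "created_at" and "text" keys
    (assumed layout); `account` is the name of the Twitter account.
    """
    rows = []
    for tweet in tweets:
        for tag in HASHTAG.findall(tweet["text"]):
            rows.append({
                "account": account,
                "id": tweet["id"],
                "created_at": tweet["created_at"],
                "hashtag": tag.lower(),  # compare hashtags case-insensitively
            })
    return rows
```

Calling the function once per account and concatenating the resulting lists gives a single table for the UN agencies and another for the stats think tanks.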


We thus consider a tweet to make reference to statistical development or data gaps if it contains hashtags from a keyword list related to statistical capacity building.

This keyword list was compiled and extended by comparing the odds ratios of hashtags used in the timelines of (A) stats think tanks with those of (B) other international organisations:

##               hash freq      odds
## 96           #data   63 16.180250
## 112          #sdg4   58  8.104335
## 119 #education2030   54  3.972765
## 199      #hlpf2016   37  4.025425
## 198     #globaldev   37  3.028301
## 213           #sdg   35  5.895163
## 220       #bigdata   34  8.560273
## 252       #science   31  4.231521
## 301      #opendata   26 99.749289
## 407          #bees   18  3.871548
## 435       #dataviz   17 37.456220
## 458     #data4sdgs   16 28.780403
## 475        #wd2016   16 16.482545
## 499       #sofia16   15  4.281476
## 531          #rice   14  3.220867
## 564           #ods   13  4.414616
## 583        #canada   12  4.099286
## 636      #learning   11 14.782273
## 621       #culture   11  4.099286

Stats hashtags are identified in a two-stage process:

  1. First, the frequency of each hashtag used by UN agencies is computed. This exercise is repeated for stats think tanks.
  2. Secondly, a list of hashtags related to statistics is determined by comparing the odds ratios of hashtags.
  • The odds ratio of a hashtag is its relative frequency among all hashtags used by stats think tanks divided by its relative frequency among all hashtags in the Twitter timelines of UN agencies.
  • Mathematically, this means: $\mathrm{OR}(h) = \dfrac{n_h^{\mathrm{stats}} / N^{\mathrm{stats}}}{n_h^{\mathrm{UN}} / N^{\mathrm{UN}}}$, where $n_h$ is the number of occurrences of hashtag $h$ and $N$ is the total number of hashtags in the respective set of timelines.
  • Odds ratios are cut off at 3, so that we keep only the relevant hashtags in the UN agencies' timelines that are in the 25th percentile of the stats agencies' odds ratios. Hashtags that occur at least three times more often in tweets of stats agencies than in tweets of UN agencies are considered related to statistical capacity development.
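
The odds-ratio filter described above can be sketched as follows. The inputs are plain lists of hashtags, one entry per occurrence; the function names and the decision to skip hashtags that never appear in UN agency timelines (for which the ratio is undefined) are assumptions made for this illustration.

```python
from collections import Counter


def odds_ratios(stats_tags, un_tags):
    """Ratio of relative frequencies: stats think tanks over UN agencies."""
    stats_counts = Counter(stats_tags)
    un_counts = Counter(un_tags)
    n_stats = sum(stats_counts.values())
    n_un = sum(un_counts.values())

    ratios = {}
    for tag, count in stats_counts.items():
        if un_counts[tag] == 0:
            # Undefined ratio; skipped here (assumption of this sketch).
            continue
        ratios[tag] = (count / n_stats) / (un_counts[tag] / n_un)
    return ratios


def stats_hashtags(stats_tags, un_tags, cutoff=3.0):
    """Hashtags used at least `cutoff` times more often by stats think tanks."""
    ratios = odds_ratios(stats_tags, un_tags)
    return {tag for tag, ratio in ratios.items() if ratio >= cutoff}
```

For example, a hashtag that makes up 75% of the stats think tanks' hashtags but only 12.5% of the UN agencies' hashtags has a ratio of 6 and is kept, whereas one with the same share in both groups has a ratio of 1 and is dropped.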

Final indicator and initial results

An initial analysis was undertaken based on 112 000 unique tweets containing a total of over 160 000 hashtags. The final indicator is the average of these relative frequencies over the 15 UN agencies. The relative frequency of hashtags related to statistical capacity development is currently about 1.13%. This means that around 1% of all the hashtags used by the UN agencies are related to statistical development and/or data gaps.
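
The aggregation into a single figure can be sketched as below, assuming the hashtags have been grouped by agency and the list of statistics-related hashtags is already known. The function names and the treatment of an agency without any hashtags (counted as a share of zero) are assumptions of this example.

```python
def agency_share(tags, keywords):
    """Share of an agency's hashtags that belong to the keyword list."""
    if not tags:
        return 0.0  # assumption: an agency without hashtags contributes zero
    return sum(1 for tag in tags if tag in keywords) / len(tags)


def indicator_4a(tags_by_agency, keywords):
    """Non-weighted average of the per-agency shares."""
    shares = [agency_share(tags, keywords) for tags in tags_by_agency.values()]
    return sum(shares) / len(shares)
```

Because the average is not weighted, every agency counts equally regardless of how much it tweets. With one agency at 1 of 4 hashtags and another at 1 of 2, the indicator is (0.25 + 0.5) / 2 = 0.375, whereas pooling all hashtags would give 2 of 6.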


4. Limitations

This methodology has several limitations.

a. Tweets are a non-representative data source made available by a private company. The methodology is therefore subject to biases and depends on continued access to the Twitter API, which may not be sustainable in the long term.
b. Difference between UN and country perspective: Although UN Specialised Agencies will by and large reflect the priorities of their member countries, they may not give a balanced view of the positions of the countries’ heads of state.
c. Difference between frequency and impact of topics: Some topics may trend on Twitter and even induce herding behaviour among the agencies without making it onto the agenda of high-level events.
d. Weighting of summits: Some summits produce no real output, whereas others result in clear decisions. A future version of the methodology could aim to differentiate between these and reflect the difference in the form of a weighting.

5. Next Steps and Conclusion

Indicator 4a provides a unique measure of references in global summits to statistical development and/or data gaps. It identifies hashtags related to statistical development and computes the relative frequency of these stats hashtags per UN agency. It therefore makes it possible to observe the relative weight of each United Nations Specialised Agency in the total number of references to statistical development in global summits.

If approved by the Board, the Secretariat will continue the fully automated data collection for the indicator using the Twitter API and, by the 2017 Board Meeting, will have established a baseline and determined a target for 2018.

As part of the implementation, the analysis will feed into the development of a freely accessible, online-based results monitoring instrument that allows users to track and browse outcomes of all global summits and high level forums.

  1. A retweet is a forward of individual tweets by other users to their own feed.

  2. This odds ratio of three turned out to be a good cut-off value.
