
Nutrition officers measure the height of a child on a scale in the Bokolmayo Nutrition Center, Ethiopia. © UNHCR/Eduardo Soteras Jalil

This blog post is the first in a series of posts about UNHCR’s journey to build a new open data finder. Check back regularly to find the others.

By Lauren Herby

In 2017, the United Nations Statistical Commission adopted an indicator framework of 231 unique indicators to monitor progress towards the Sustainable Development Goals (SDGs). Similarly, the World Bank tracks 1,400 development indicators, and UNICEF over 680 on women and children. But what about the visibility of refugees, stateless people, and others affected by forced displacement?

Member states have committed to “more predictable and equitable responsibility-sharing” in refugee situations, and the Global Compact on Refugees (GCR) outlines a wide-ranging data agenda to support evidence-based responses. In parallel, UNHCR and other agencies, organizations and governments are making commendable efforts to explicitly include forcibly displaced and stateless people in robust data collection and analysis. Yet the systematic publication of reliable statistics disaggregated by forced displacement status is currently limited to population and demographic statistics. These serve as our foundation, but to fully understand the unique experience of forcibly displaced and stateless people, how it compares to that of other population groups, and their progress towards solutions, we need more.

What are we doing?

With the support of the World Bank-UNHCR Joint Data Center on Forced Displacement (JDC), we, at UNHCR’s Global Data Service (GDS), are working to integrate robust thematic statistics on forcibly displaced and stateless people and situations with the population and demographic statistics currently published on the Refugee Data Finder (RDF). The result will be a new data finder that serves a broad range of official statistics in a standardized, interoperable format. Sounds simple? Let me tell you.

Who has been involved?

This project started in mid-2023. With the full support of a project manager, statisticians and business analysts, and ad hoc support from countless other internal and external stakeholders, we have drafted our methodological and system design and are getting ready to start mapping workflows, sifting through semantics, crunching numbers, drawing wireframes and building prototypes. So far, the project has been an important opportunity for engagement across thematic areas and UNHCR stakeholders, not to mention key partners. A big thank you goes to our partners at FAO, ILO, IOM, OCHA, OECD, UNDP, UNCTAD, UNICEF, UNSD, WFP and the World Bank, who have shared the inner workings of their own data systems and the experience that has helped us get this far.

Challenging our assumptions

Our initial assumption was that we could build on the past four years of work curating and publishing survey microdata on UNHCR’s Microdata Library: we would compile aggregated data from these surveys and publish it alongside our existing statistics. Quite straightforward, right?

Not exactly, if we want to ensure that: a) our end users can compare statistics across time, location, population groups, etc., b) the data is coherent across sharing mechanisms, and c) the data finder and the database behind it are interoperable with countless internal and external systems. To start, we need to address each of these points one data source and one statistic at a time.

Mapping thematic data series

We have mapped around 450 thematic data series (or indicators) across internationally recognized frameworks and UNHCR’s standardized surveys. Among others, these include:

If you take one of these, say malnutrition among children under 5 years of age, and multiply it by the type of malnutrition, a couple of reporting units (i.e., number and percentage), disaggregation by gender, and the potential geographic coverage, we are looking at over 5,000 statistics. And that is without all potential disaggregations, for only one data series! So, where do we start?
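The growth described above is multiplicative: the number of statistics for one data series is the product of the sizes of its dimensions. The sketch below illustrates this under made-up assumptions. Only the two reporting units come from the text; the three malnutrition types, three gender categories and 300 geographic areas are hypothetical placeholders, not UNHCR's actual breakdowns.

```python
from itertools import product
from math import prod

# Hypothetical dimension values for a single data series.
# Only "unit" is taken from the post; all other values are assumed.
dimensions = {
    "malnutrition_type": ["wasting", "stunting", "underweight"],  # assumed
    "unit": ["number", "percentage"],
    "gender": ["female", "male", "total"],  # assumed
    "geography": [f"AREA_{i:03d}" for i in range(300)],  # assumed: 300 areas
}

# The count of distinct statistics is the product of each dimension's size.
n_statistics = prod(len(values) for values in dimensions.values())

# Enumerating the cartesian product gives one key per statistic.
keys = list(product(*dimensions.values()))

print(n_statistics)  # 5400, i.e. 3 * 2 * 3 * 300
print(keys[0])       # ('wasting', 'number', 'female', 'AREA_000')
```

Adding any further disaggregation (age group, for example) multiplies the total again by that dimension's size, which is why the count escalates so quickly for even one series.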

Sourcing data

Since each of these statistics needs a data source, a data flow, and alignment with our statistical standards, we are starting small. We are in the midst of consulting subject-matter and statistical experts to identify a priority list of data series to start compiling. We are narrowing in on those that have already been prioritized, are conceptually well defined, and have potential data sources, stratified by key themes of interest as much as possible. We are ready to face a few challenges we know we will encounter, namely: how we approach data sources that deviate from international standards, how we keep data coherent across our internal and external systems, and how we choose terminology that communicates clearly (what speaks to you: data, statistics, indicators, series, none of the above, or “just give me a number, please”?). We do not want to confuse people or promote the misuse of statistics.
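The three selection criteria and the stratification by theme can be expressed as a simple filter-then-group step. This is a minimal sketch, assuming each candidate series is described by a hypothetical record with a theme and three yes/no flags; the series names, themes and the `per_theme` cap are all illustrative, and the real prioritization relies on expert consultation rather than a script.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CandidateSeries:
    """Hypothetical record describing one candidate data series."""
    name: str
    theme: str
    already_prioritized: bool
    well_defined: bool
    has_potential_source: bool


def shortlist(candidates, per_theme):
    """Keep series that meet all three criteria, then take at most
    `per_theme` series from each theme so the list stays stratified."""
    eligible = [
        c for c in candidates
        if c.already_prioritized and c.well_defined and c.has_potential_source
    ]
    by_theme = defaultdict(list)
    for c in eligible:
        by_theme[c.theme].append(c)
    # Cap each theme, preserving the order in which series were listed.
    return {theme: [c.name for c in items[:per_theme]]
            for theme, items in by_theme.items()}


# Illustrative candidates (names and themes are hypothetical).
candidates = [
    CandidateSeries("child_malnutrition", "health", True, True, True),
    CandidateSeries("school_enrolment", "education", True, True, True),
    CandidateSeries("literacy_rate", "education", True, True, True),
    CandidateSeries("new_concept_x", "livelihoods", False, False, True),
]
print(shortlist(candidates, per_theme=1))
# {'health': ['child_malnutrition'], 'education': ['school_enrolment']}
```

Capping per theme is one simple way to keep the first batch balanced across themes; a theme with no eligible series (here, livelihoods) simply drops out until a well-defined, sourced series becomes available.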

Data modeling

By now, we have built multiple iterations of a data model, inspired by, and flexibly aligned with, the SDMX standard. Our goal is a model that is interoperable, scalable, and fit for an optimized user experience (for both back-end data producers and front-end data users). At the time of writing, we are choosing the right database format, building a prototype, and testing it with a narrow set of our initial statistics. This work is a true collaboration between our statisticians, data curators, data engineers, information management colleagues, and IT department, with valued feedback from our key partner, the World Bank, as well as Gartner. As the database will be sandwiched between data production and data discovery and needs to efficiently push data to and pull data from other systems, we are determined to get its design right from the beginning.
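To give a flavour of what SDMX-style modelling involves, here is a minimal sketch of a simplified Data Structure Definition: an ordered set of dimensions, each restricted to a codelist, from which a dot-separated series key is built (the way SDMX identifies series). It is not UNHCR's data model. The structure ID, dimension IDs (`INDICATOR`, `REF_AREA`, `POP_GROUP`, `SEX`) and codes are hypothetical, loosely following common SDMX conventions such as `_T` for "total".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """A dimension and the codelist of values it may take."""
    id: str
    codes: frozenset


@dataclass(frozen=True)
class DataStructure:
    """Simplified stand-in for an SDMX Data Structure Definition:
    an ordered tuple of dimensions plus a single measure."""
    id: str
    dimensions: tuple
    measure: str = "OBS_VALUE"

    def series_key(self, **values):
        """Check each value against its codelist and join the values,
        in dimension order, into a dot-separated series key."""
        parts = []
        for dim in self.dimensions:
            value = values[dim.id]
            if value not in dim.codes:
                raise ValueError(f"{value!r} is not a valid code for {dim.id}")
            parts.append(value)
        return ".".join(parts)


# Hypothetical structure; IDs and codes are illustrative only.
dsd = DataStructure(
    id="HYPOTHETICAL_THEMATIC_DSD",
    dimensions=(
        Dimension("INDICATOR", frozenset({"MALN_U5"})),
        Dimension("REF_AREA", frozenset({"ETH", "KEN"})),
        Dimension("POP_GROUP", frozenset({"REF", "HOST"})),
        Dimension("SEX", frozenset({"F", "M", "_T"})),
    ),
)

key = dsd.series_key(INDICATOR="MALN_U5", REF_AREA="ETH", POP_GROUP="REF", SEX="F")
print(key)  # MALN_U5.ETH.REF.F
```

The design point is that comparability comes from shared dimensions: comparing refugees with host communities in the same place means changing only the `POP_GROUP` code, and validating against codelists keeps every system that exchanges these keys speaking the same vocabulary.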

Data finder design

At the end of last year, we held a collaborative wireframing workshop to get a jump-start on the system requirements from a front-end perspective. We have since translated the designs drafted during the workshop into a list of use cases and functional requirements. This summer, we will map these requirements against potential technological solutions, develop more refined mockups, and test them with a range of potential users.
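Mapping requirements against candidate solutions often boils down to a coverage matrix: for each option, which requirements does it meet, and which are missing? The sketch below shows that idea in its simplest form. The requirement names and the two options are hypothetical examples, not items from our actual list, and a real assessment would also weigh cost, priority and effort.

```python
def coverage(requirements, solutions):
    """For each candidate solution, return the share of requirements it
    meets and the list of requirements it does not meet."""
    report = {}
    for name, supported in solutions.items():
        met = [r for r in requirements if r in supported]
        missing = [r for r in requirements if r not in supported]
        report[name] = (len(met) / len(requirements), missing)
    return report


# Hypothetical functional requirements and candidate solutions.
requirements = [
    "filter_by_population_group",
    "compare_over_time",
    "download_csv",
    "api_access",
]
solutions = {
    "option_a": {"filter_by_population_group", "compare_over_time", "download_csv"},
    "option_b": {"download_csv", "api_access"},
}

for name, (share, missing) in coverage(requirements, solutions).items():
    print(f"{name}: {share:.0%} covered, missing {missing}")
# option_a: 75% covered, missing ['api_access']
# option_b: 50% covered, missing ['filter_by_population_group', 'compare_over_time']
```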

Reach out and stay tuned

Now you know what will keep us busy over the next few months. We welcome your ideas and encouragement as we continue on our journey, and would certainly like to hear from you if you have data that could be relevant for our data finder. Don’t be shy to reach out ([email protected]), and stay tuned as we move ahead.