How to use UNHCR's refugees R package

R is a widely used, free programming language for statistical computing. Users can create packages, which are units of reproducible code, and share them publicly. UNHCR has created the refugees R package to facilitate access to the data in the Refugee Data Finder. It provides an easy-to-use interface to the datasets, which cover forcibly displaced populations, including refugees, asylum-seekers, internally displaced people, stateless people and others, over a span of more than 70 years.
This package provides data from three major sources:
  1. Data from UNHCR’s annual statistical activities dating back to 1951.
  2. Data from the United Nations Relief and Works Agency for Palestine Refugees in the Near East (UNRWA), specifically for registered Palestine refugees under UNRWA’s mandate.
  3. Data from the Internal Displacement Monitoring Centre (IDMC) on people displaced within their country due to conflict or violence.
The data within the refugees package is updated at the same time as the Refugee Data Finder, twice per year.
The refugees package includes eight datasets:
  1. population: Data on forcibly displaced and stateless persons by year, including refugees, asylum-seekers, internally displaced people (IDPs) and stateless people. Detailed definitions of the different population groups can be found on the methodology page of the Refugee Data Finder.
  2. idmc: Data from the Internal Displacement Monitoring Centre on the total number of IDPs displaced due to conflict and violence.
  3. asylum_applications: Data on asylum applications including the procedure type and application type.
  4. asylum_decisions: Data on asylum decisions, including recognitions, rejections, and administrative closures.
  5. demographics: Demographic and sub-national data, where available, including disaggregation by age and sex.
  6. solutions: Data on durable solutions for refugees and IDPs.
  7. unrwa: Data on registered Palestine refugees under UNRWA’s mandate.
  8. flows: Numbers of people forced to flee during each year since 1962. For more information, see the explanation of the forced displacement flow dataset.
The refugees package can be installed from CRAN.
install.packages("refugees")
Alternatively, the development version of the package can be installed from GitHub using the pak package.
How to use the refugees package
The population dataset can be used to easily retrieve data on people forced to flee, as well as stateless people. The data is structured by year, country of asylum and country of origin with separate columns for each population group.

Rows: 120,338
Columns: 16
$ year              <dbl> 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951~
$ coo_name          <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Unknown~
$ coo               <chr> "UKN", "UKN", "UKN", "UKN", "UKN", "UKN", "UKN", "UK~
$ coo_iso           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
$ coa_name          <chr> "Australia", "Austria", "Belgium", "Canada", "Denmar~
$ coa               <chr> "AUL", "AUS", "BEL", "CAN", "DEN", "FRA", "GBR", "GF~
$ coa_iso           <chr> "AUS", "AUT", "BEL", "CAN", "DNK", "FRA", "GBR", "DE~
$ refugees          <dbl> 180000, 282000, 55000, 168511, 2000, 290000, 208000,~
$ asylum_seekers    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ returned_refugees <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ idps              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ returned_idps     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ stateless         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ ooc               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ oip               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
$ hst               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
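Because each row is one combination of year, country of origin and country of asylum, a total for a country of origin has to be summed across countries of asylum. The sketch below illustrates this in base R on a small made-up data frame that only borrows the column names shown above. The country names and figures are invented for illustration and are not UNHCR data; with the package installed, refugees::population could be used in place of toy_population.

```r
# Toy stand-in for refugees::population: same column names, invented values.
toy_population <- data.frame(
  year     = c(2021, 2022, 2022, 2022, 2022),
  coo_name = c("Origin A", "Origin A", "Origin A", "Origin B", "Origin B"),
  coa_name = c("Asylum X", "Asylum X", "Asylum Y", "Asylum X", "Asylum Y"),
  refugees = c(100, 120, 80, 50, NA),
  oip      = c(NA, 10, NA, 5, 5),
  stringsAsFactors = FALSE
)

# Keep a single year, then treat missing counts as zero before adding columns.
latest <- toy_population[toy_population$year == 2022, ]
latest$protected <- ifelse(is.na(latest$refugees), 0, latest$refugees) +
  ifelse(is.na(latest$oip), 0, latest$oip)

# Sum across countries of asylum to get one total per country of origin,
# then sort from largest to smallest.
by_origin <- aggregate(protected ~ coo_name, data = latest, FUN = sum)
by_origin <- by_origin[order(-by_origin$protected), ]
print(by_origin)
```

The same pattern (filter a year, sum the population columns, group by coo_name) is what Example 1 below does with dplyr.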
The three examples below illustrate some of the use cases of the refugees package and its combination with other popular R packages such as the graphing package ggplot2. The unhcrthemes package allows the creation of charts in R according to the UNHCR data visualization guidelines.
Example 1: The ten largest countries of origin of refugees and other people in need of international protection
At the end of 2022, there were 34.6 million refugees and other people in need of international protection around the world. Syrians, Ukrainians and Afghans accounted for 52 per cent of them.

# Packages used below: dplyr (wrangling), ggplot2 (plotting),
# scales (number labels) and unhcrthemes (UNHCR palette and theme)
library(dplyr)
library(ggplot2)
library(scales)
library(unhcrthemes)

# Sum refugees and other people in need of international protection (oip)
# by country of origin, keep the ten largest and draw a horizontal bar chart
ref_coo_10 <- refugees::population |>
  filter(year == 2022) |>
  summarise(refugees = sum(refugees, na.rm = TRUE) + sum(oip, na.rm = TRUE),
            .by = coo_name) |>
  slice_max(order_by = refugees, n = 10) |>
  ggplot(aes(refugees, reorder(coo_name, refugees))) +
  geom_col(fill = unhcr_pal(n = 1, "pal_blue"),
           width = 0.8) +
  geom_text(aes(label = label_comma()(refugees)),
            hjust = -0.2) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Refugees and other people in need of international protection",
       subtitle = "By country of origin | end-2022",
       caption = "Source: UNHCR Refugee Data Finder") +
  theme_unhcr(font_size = 20,
              grid = FALSE,
              axis = FALSE,
              axis_title = FALSE,
              axis_text = "y")


Example 2: Global forcibly displaced population during last ten years
Between 2013 and 2022, the global forcibly displaced population more than doubled from 46.2 million to 108.4 million.

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
library(unhcrthemes)

# Yearly totals per population group for 2013 to 2022, combined across
# the UNHCR, IDMC and UNRWA datasets
fd_last_ten_years <- refugees::population |>
  filter(between(year, 2013, 2022)) |>
  summarise(refugees = sum(refugees, na.rm = TRUE),
            asylum_seekers = sum(asylum_seekers, na.rm = TRUE),
            oip = sum(oip, na.rm = TRUE), .by = year) |>
  left_join(refugees::idmc |>
              filter(between(year, 2013, 2022)) |>
              summarise(idmc = sum(total, na.rm = TRUE), .by = year), by = "year") |>
  left_join(refugees::unrwa |>
              filter(between(year, 2013, 2022)) |>
              summarise(unrwa = sum(total, na.rm = TRUE), .by = year), by = "year") |>
  pivot_longer(cols = -year, names_to = "population_type", values_to = "total") |>
  # Factor levels fix the stacking order; recode() swaps in the legend labels.
  # The recode() call and the asylum_seekers label are missing from the
  # original listing and are reconstructed here as an assumption.
  mutate(population_type = factor(population_type, levels = c("oip", "unrwa", "asylum_seekers", "refugees", "idmc")),
         population_type = recode(population_type,
             refugees = "Refugees under UNHCR's mandate",
             asylum_seekers = "Asylum-seekers",
             oip = "Other people in need of international protection",
             idmc = "Internally displaced persons",
             unrwa = "Palestine refugees under UNRWA's mandate")) |>
  arrange(year, population_type) |>
  ggplot() +
  geom_area(aes(x = year, y = total, fill = population_type)) +
  # One manual fill scale only; a second fill scale would simply be replaced
  scale_fill_manual(values = c("#EF4A60", "#8EBEFF", "#18375F", "#0072BC", "#00B398")) +
  labs(title = "People forced to flee | 2013 - 2022", y = "Number of people", caption = "Source: UNHCR Refugee Data Finder") +
  scale_x_continuous(breaks = pretty_breaks(n = 10)) +
  scale_y_continuous(expand = expansion(c(0, 0.1)), breaks = pretty_breaks(n = 5), labels = label_number(scale_cut = cut_long_scale())) +
  theme_unhcr(font_size = 20,
              grid = "Y",
              axis = "x",
              axis_title = "y") +
  guides(fill = guide_legend(nrow = 3, byrow = TRUE, reverse = TRUE))
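The data preparation in Example 2 has two steps that are easy to miss: yearly totals from the three sources are joined on year, and the result is then reshaped from one column per population group to one row per year and group. The base R sketch below mirrors those steps on invented numbers (not real figures) so that it runs without any extra packages; the object names unhcr, idmc, unrwa, wide and long are illustrative only.

```r
# Invented yearly totals per source, in arbitrary units (not real figures).
unhcr <- data.frame(year = c(2021, 2022),
                    refugees = c(10, 12),
                    asylum_seekers = c(2, 3))
idmc  <- data.frame(year = c(2021, 2022), idmc = c(20, 25))
unrwa <- data.frame(year = c(2021, 2022), unrwa = c(5, 6))

# Left joins on year, mirroring the two left_join() calls in the example.
wide <- merge(unhcr, idmc, by = "year", all.x = TRUE)
wide <- merge(wide, unrwa, by = "year", all.x = TRUE)

# Wide to long: one row per year and population type.
value_cols <- setdiff(names(wide), "year")
long <- data.frame(
  year            = rep(wide$year, times = length(value_cols)),
  population_type = rep(value_cols, each = nrow(wide)),
  total           = unlist(wide[value_cols], use.names = FALSE)
)

# Explicit factor levels control the order in which areas are stacked.
long$population_type <- factor(
  long$population_type,
  levels = c("unrwa", "asylum_seekers", "refugees", "idmc")
)

# Summing the long table by year gives the height of the stacked area.
yearly_total <- tapply(long$total, long$year, sum)
print(yearly_total)
```

Keeping the data in long format is what lets a single geom_area() layer stack the groups by mapping population_type to fill.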

Example 3: Demographics of populations protected and/or assisted by UNHCR in 2022
Children account for 42 per cent of all populations protected and/or assisted by UNHCR in 2022.

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
library(unhcrthemes)

# Collapse the age brackets into three groups per sex, convert the counts to
# shares of the overall total, and draw a population pyramid
demo_2022 <- refugees::demographics |>
  filter(year == 2022 & pop_type %in% c("REF", "ASY", "IDP", "OIP", "RDP", "RET", "STA", "OOC")) |>
  summarise("male 0-17" = sum(m_0_4, na.rm = TRUE) + sum(m_5_11, na.rm = TRUE) + sum(m_12_17, na.rm = TRUE),
            "male 18-59" = sum(m_18_59, na.rm = TRUE),
            "male 60+" = sum(m_60, na.rm = TRUE),
            "female 0-17" = sum(f_0_4, na.rm = TRUE) + sum(f_5_11, na.rm = TRUE) + sum(f_12_17, na.rm = TRUE),
            "female 18-59" = sum(f_18_59, na.rm = TRUE),
            "female 60+" = sum(f_60, na.rm = TRUE)) |>
  # Split names such as "male 0-17" into a sex column (male/female) and an ages column
  pivot_longer(cols = everything(), names_sep = " ", names_to = c(".value", "ages")) |>
  mutate(male_p = round(male / (sum(female) + sum(male)), 2),
         female_p = round(female / (sum(female) + sum(male)), 2)) |>
  ggplot() +
  # Male shares are plotted as negative values so the bars extend to the left
  geom_col(aes(-male_p, ages, fill = "Male"), width = 0.7) +
  geom_col(aes(female_p, ages, fill = "Female"), width = 0.7) +
  geom_text(aes(-male_p, ages, label = percent(abs(male_p))), hjust = 1.25, size = 16 / .pt) +
  geom_text(aes(female_p, ages, label = percent(abs(female_p))), hjust = -0.25, size = 16 / .pt) +
  labs(title = "Demographics of populations protected and/or assisted by UNHCR | 2022",
       caption = "Note: Age and sex disaggregated data is not available for all countries of asylum.\nFigures do not add up to 100 per cent due to rounding.\nSource: UNHCR Refugee Data Finder") +
  scale_x_continuous(expand = expansion(c(0.2, 0.2))) +
  scale_fill_manual(breaks = c("Male", "Female"), values = setNames(unhcr_pal(n = 3, "pal_unhcr")[c(2, 1)], c("Male", "Female"))) +
  theme_unhcr(font_size = 20, grid = FALSE, axis = FALSE, axis_title = FALSE, axis_text = "y")
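The share calculation in Example 3 divides each sex and age count by the combined male and female total, then rounds to two decimals, which is why the chart caption warns that the figures may not add up to 100 per cent. The base R sketch below reproduces that arithmetic on invented counts (not UNHCR data) to show the effect; the counts data frame is a hypothetical stand-in for the summarised demographics table.

```r
# Invented counts by age group and sex (not UNHCR data).
counts <- data.frame(
  ages   = c("0-17", "18-59", "60+"),
  male   = c(214, 264, 34),
  female = c(204, 254, 30)
)

# Denominator: everyone, male and female, across all age groups.
total <- sum(counts$male) + sum(counts$female)

# Same formula as male_p and female_p in the example: share, rounded to 2 dp.
counts$male_p   <- round(counts$male / total, 2)
counts$female_p <- round(counts$female / total, 2)

# The rounded shares need not sum to exactly 1.
rounded_sum <- sum(counts$male_p) + sum(counts$female_p)

# Share of children (0-17), computed from the unrounded counts.
children_share <- (counts$male[1] + counts$female[1]) / total

print(counts)
print(rounded_sum)
print(children_share)
```

With these toy counts the six rounded shares add up to 0.98 rather than 1, while the unrounded share of children is 0.418.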

See also the manual page on how to use the refugees package.