class: center, middle, inverse, title-slide .title[ # Earth observation data in the social sciences ] .subtitle[ ## Methods primer - MethodsNET Launch Conference ] .author[ ### Dennis Abel & Stefan Jünger ] .date[ ### UCLouvain, 2024-10-31 ] --- layout: true --- ## About this course In this primer session, we will provide an introduction to the relevance and usage of **Earth observation** (EO) data for social scientists in `R`. We will address four major challenges (for social scientists): .pull-left[ .small[ - **What**: What is Earth observation data? - **Why**: Why is it useful for social scientists? - **Where**: Where can I get the data? - **How**: How can I process the data? ]] .pull-right[ <img src="data:image/png;base64,#../img/EO_cats.PNG" width="90%" style="display: block; margin: auto;" /> ] --- ## About this course In this primer session, we will provide an introduction to the relevance and usage of **Earth observation** (EO) data for social scientists in `R`. This primer is meant to convince you to join our full two-day online course on ["Advanced Geospatial Data Processing for Social Scientists"](https://training.gesis.org/?site=pDetails&child=full&pID=0x5948F7543A5E42CF9CE2C91E844E33E1) on **28-29 April 2025**. We will remind you of that opportunity several times during this session. No need to type along today. You can find the slides here: [https://github.com/denabel/gxc/tree/main/slides](https://github.com/denabel/gxc/tree/main/slides) --- ## About us **Dennis Abel** .pull-left[ <img src="data:image/png;base64,#../img/Dennis.jpg" width="50%" style="display: block; margin: auto;" /> ] .pull-right[ .small[ - Postdoctoral researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation - Ph.D. 
in Political Economy, University of Cologne
]]

.small[
- Research interests:
  - Geographic Information Systems (GIS)
  - environmental social sciences & policy analysis
  - causal identification via (quasi-)experimental approaches

[dennis.abel@gesis.org](mailto:dennis.abel@gesis.org) | [Personal GESIS website](https://www.gesis.org/institut/mitarbeitendenverzeichnis/person/Dennis.Abel?no_cache=1)
]

---

## About us

**Stefan Jünger**

.pull-left[
<img src="data:image/png;base64,#../img/Stefan-Juenger.png" width="50%" style="display: block; margin: auto;" />
]

.pull-right[
.small[
- Senior Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
- Ph.D. in social sciences, University of Cologne
]
]

.small[
- Research interests:
  - Quantitative methods, Geographic Information Systems (GIS)
  - Social inequalities
  - Attitudes towards minorities
  - Environmental attitudes
  - Reproducible research
]

.small[
[stefan.juenger@gesis.org](mailto:stefan.juenger@gesis.org) | [https://stefanjuenger.github.io](https://stefanjuenger.github.io)
]

---

## Four challenges

- **What**: What is Earth observation data?
- **Why**: Why is it useful for social scientists?
- **Where**: Where can I get the data?
- **How**: How can I process the data?

---

## Four challenges

- **What**: <span style="color:#D20064;">**What is Earth observation data?**</span>
- **Why**: Why is it useful for social scientists?
- **Where**: Where can I get the data?
- **How**: How can I process the data?

---

## What is Earth observation data?

Earth observation (EO) data refers to all collected information about the Earth's physical, chemical, and biological systems. Utilizing Earth observation data allows us to study the Earth's:

- atmosphere,
- land cover,
- oceans and inland waters, as well as
- biological diversity and ecosystems.

---

## What is Earth observation data?

.pull-left[
.small[
There are many crucial Earth system indicators.
With respect to the Earth's climate, for example, the **Global Climate Observing System** (GCOS) maps 55 **Essential Climate Variables** (ECVs). ECVs are seen as the empirical evidence base for the guidance of mitigation and adaptation measures, risk assessment and the understanding of climate services. EO data is essential for systematically measuring these variables. Check out their [documentation of ECVs](https://gcos.wmo.int/en/essential-climate-variables/about). ]] .pull-right[ <img src="data:image/png;base64,#../img/ecvs.png" width="100%" style="display: block; margin: auto;" /> .small[Image: ECVs as proposed by GCOS. Source: GCOS 2024.] ] --- ## What is Earth observation data? Applications of EO data in academia, industry, and policy-making are extensive. It is crucial for the operation of activities in: - environmental protection, - energy management, - urban planning, - agriculture, fisheries and forestry, - public health, - transport and mobility, - civil protection, or - tourism. --- ## What is Earth observation data? There is often confusion about terms such as Earth observation data, geodata, remote sensing, or satellite data. .pull-left[ .small[ - **Earth observation data**: Information collected about the Earth's system - **Geospatial data**: Data that is georeferenced (includes information on the location) - **Remote sensing data**: Information that is acquired by sensors "from a distance" - **Satellite data**: Information that is acquired by sensors via satellites. ]] .pull-right[ <img src="data:image/png;base64,#../img/EO_venn.png" width="100%" style="display: block; margin: auto;" /> ] --- ## What is Earth observation data? There is often confusion about terms such as Earth observation data, geodata, remote sensing, or satellite data. .pull-left[ .small[ As you can see, these four terminologies address different aspects of the data. 
While "EO data" refers to the content of the data (the Earth system), "geospatial data" addresses the location, and "remote sensing" and "satellite data" address the way of obtaining the data. ]] .pull-right[ <img src="data:image/png;base64,#../img/EO_venn.png" width="100%" style="display: block; margin: auto;" /> ] --- ## What is Earth observation data? There is often confusion about terms such as Earth observation data, geodata, remote sensing, or satellite data. .pull-left[ .small[ The main areas when it comes to EO data are **A**, **B**, and **C**. As social scientists interested in working with EO data, **D** and **E** are similarly relevant: Often, we are interested in spatially linking our social indicators (**D**) with EO data. We therefore see these as two sides of the same coin. The geolocation represents the link. ]] .pull-right[ <img src="data:image/png;base64,#../img/EO_venn.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Four challenges - **What**: What is Earth observation data? - **Why**: Why is it useful for social scientists? - **Where**: Where can I get the data? - **How**: How can I process the data? --- ## Four challenges - **What**: What is Earth observation data? - **Why**: <span style="color:#D20064;">**Why is it useful for social scientists?**</span> - **Where**: Where can I get the data? - **How**: How can I process the data? --- ## Why is EO data useful for social scientists? <img src="data:image/png;base64,#../img/cat_measure.PNG" width="60%" style="display: block; margin: auto;" /> --- ## Why is EO data useful for social scientists? A growing interest in economics and the social sciences in Earth observation data has led to a broad spectrum of publications in recent years. We have identified four major subject areas which have been addressed with EO data recently: - Environmental attitudes and behavior, - Economic development and inequality, - Conflict and migration, - Political behavior. 
---

## Why is EO data useful for social scientists?

.pull-left[
.small[
[Hoffmann et al. 2022](https://doi.org/10.1038/s41558-021-01263-8) analyse how the experience of climate anomalies and extremes influences environmental attitudes and vote intention in Europe

.small[
- Data integration of 1. harmonized Eurobarometer data, 2. EU parliamentary electoral data, and 3. climatological data
- Aggregation on regional levels (NUTS-2 and NUTS-3)
- Climatological data from ERA5 reanalysis (C3S)
- Calculations of temperature anomalies and extremes based on reference period (1971-2000)
- Findings suggest an effect of temperature anomalies (heat, "dry spell") on environmental concern and vote intention
]]]

.pull-right[
<img src="data:image/png;base64,#../img/Hoffmann_et_al_2022.png" width="100%" style="display: block; margin: auto;" />
]

---

## Why is EO data useful for social scientists?

.pull-left[
<img src="data:image/png;base64,#../img/Garcia-Leon_et_al_2021.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
.small[
[García-León et al. 2021](https://doi.org/10.1038/s41467-021-26050-z) investigate historical and future economic impacts of recent heatwaves (2003, 2010, 2015, 2018) in Europe

.small[
- Data integration of 1. heatwave data, 2. population data, 3. worker productivity data, and 4. economic accounts from Eurostat
- Aggregation on regional levels
- Temperature data from ERA5 reanalysis
- Calculations of heatwaves based on reference period (1981-2010)
- Findings put total estimated damages attributed to heatwaves at 0.3-0.5% of European GDP, with high spatial variation (GDP impacts beyond 1% in vulnerable regions)
]
]]

---

## Why is EO data useful for social scientists?

.pull-left[
.small[
[Jean et al. 
2016](https://doi.org/10.1126/science.aaf7894) show how nighttime maps can be utilized as estimates of household consumption and assets .small[- Economic indicators are hard to measure in poorer countries - satellite imagery could be an alternative proxy for it - The authors integrate 1. survey data (World Bank’s Living Standards Measurement Surveys - LSMS; and Demographic and Health Surveys - DHS) with 2. nighttime light data in five African countries - Nigeria, Tanzania, Uganda, Malawi, and Rwanda - ML approach for image feature extraction in nighttime maps - Daytime satellite images from Google Static Maps, nighttime lights from US DMSP - Model can explain up to 75% of variation in local-level economic outcomes ]]] .pull-right[ <img src="data:image/png;base64,#../img/Jean_et_al_2016.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Four challenges - **What**: What is Earth observation data? - **Why**: Why is it useful for social scientists? - **Where**: Where can I get the data? - **How**: How can I process the data? --- ## Four challenges - **What**: What is Earth observation data? - **Why**: Why is it useful for social scientists? - **Where**: <span style="color:#D20064;">**Where can I get the data?**</span> - **How**: How can I process the data? --- ## Where can I get EO data? <img src="data:image/png;base64,#../img/cat_api.PNG" width="60%" style="display: block; margin: auto;" /> --- ## Where can I get EO data? Europe's Earth Observation programme is called **[Copernicus](https://www.copernicus.eu/en)**. It is funded and managed by the European Commission and partners like the [European Space Agency](https://www.esa.int/) (ESA) and the [European Organisation for the Exploitation of Meteorological Satellites](https://www.eumetsat.int/) (EUMETSAT). Copernicus has been operational since 2014 and provides **free access** to a wealth of satellite data from ESA's **“Sentinel”** fleet. 
Copernicus combines data from satellites, ground-based as well as air- and sea-borne sensors to track the Earth system and provide this information largely free for all customers. Check out this 5min [video](https://www.youtube.com/watch?v=MGJss4lDaBo) on the Copernicus programme. --- ## Where can I get EO data? .small[ The ESA describes **Copernicus** as the world's most ambitious Earth observation program, which will be further expanded in the coming years. On the [Copernicus homepage](https://www.copernicus.eu/en/access-data.), the daily data collection is estimated at 12 terabytes. Given the complexity of issues, Copernicus has separated its services for public usage along several thematic areas: ] .pull-left[ .small[ - **Atmosphere**: [Copernicus Atmosphere Monitoring Service](https://atmosphere.copernicus.eu/) (CAMS) - **Marine**: [Copernicus Marine Service](https://marine.copernicus.eu/) (CMEMS) - **Land**: [Copernicus Land Monitoring Service](https://land.copernicus.eu/en) (CLMS) - **Climate change**: [Copernicus Climate Change Service](https://climate.copernicus.eu/) (C3S) - **Emergency**: [Copernicus Emergency Management Service](https://emergency.copernicus.eu/) (CEMS). ]] .pull-right[ <img src="data:image/png;base64,#../img/copernicus_services.png" width="90%" style="display: block; margin: auto;" /> ] .small[.small[https://www.copernicus.eu/en/accessing-data-where-and-how/conventional-data-access-hubs]] --- ## Where can I get EO data? This project focuses on the data provided by the Copernicus programme. However, this is not the only relevant source of EO data which you can consider for your projects. The US equivalent, for example, is based on the [Landsat satellite programme](https://landsat.gsfc.nasa.gov/), which is jointly operated by [NASA](https://www.nasa.gov/) and the [US Geological Survey](https://www.usgs.gov/landsat-missions) (USGS). 
[Google's Earth Engine Cloud Computing Platform](https://developers.google.com/earth-engine/datasets/catalog) catalogs an extensive selection of additional data sets from various sources. --- ## Where can I get EO data? Data quality is a major issue for EO data - Major levers are: - Indicator type - Spatial resolution - Temporal resolution - Spatial scope (coverage) - Temporal scope (coverage) - Measurement types (in-situ sensors, remote sensing, reanalysis) - Provider / source / API availability - Costs --- ## Where can I get EO data? Data quality is a major issue for EO data - Higher quality is expensive! <img src="data:image/png;base64,#../img/satellites_comparison.PNG" width="80%" style="display: block; margin: auto;" /> .footnote[https://www.destatis.de/EN/Service/EXSTAT/Datensaetze/satellite-data.html] --- ## Where can I get EO data? Data quality is a major issue for EO data - Higher quality is expensive! <img src="data:image/png;base64,#../img/resolution_comparison.PNG" width="70%" style="display: block; margin: auto;" /> .footnote[https://www.destatis.de/EN/Service/EXSTAT/Datensaetze/satellite-data.html] --- ## Four challenges - **What**: What is Earth observation data? - **Why**: Why is it useful for social scientists? - **Where**: Where can I get the data? - **How**: How can I process the data? --- ## Four challenges - **What**: What is Earth observation data? - **Why**: Why is it useful for social scientists? - **Where**: Where can I get the data? 
- **How**: <span style="color:#D20064;">**How can I process the data?**</span> --- ## Working with EO data in `R` <img src="data:image/png;base64,#../img/cat_scientist.PNG" width="60%" style="display: block; margin: auto;" /> --- ## Working with EO data in `R` .center[ <img src="data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/795c039ba2520455d833b4034befc8cf360a70ba/558a5/diagrams/data-science-explore.png" width="60%" style="display: block; margin: auto;" /> ] .small[ Source: [*R for Data Science*](http://r4ds.had.co.nz/) ] .small[ - **Import**: read in data in different formats (e.g., .csv, .xls, .sav, .dta) - **Tidy**: clean data (1 row = 1 case, 1 column = 1 variable), rename & recode variables, etc. - **Transform**: prepare data for analysis (e.g., by aggregating and/or filtering) - **Visualize**: explore/analyze data through informative plots - **Model**: analyze the data by creating models (e.g, linear regression model) - **Communicate**: present the results (to others) ] --- ## Vector and Raster Data <img src="data:image/png;base64,#../img/fig_geometries.png" width="40%" style="display: block; margin: auto;" /> --- ## General Difference to Vector Data Data Structure: - Other data format(s), different file extensions - geometries do not differ within one dataset Implications: - Other geospatial operations possible Benefits: - can be way more efficient - straightforward processing of raster values and extraction of zonal statistics - it's like working with simple tabular data --- ## What Exactly Are Raster Data? 
- Hold information on (most of the time) evenly shaped grid cells
- Basically, a simple data table
  - each cell represents one observation

<img src="data:image/png;base64,#../img/table_to_raster.png" width="5333" style="display: block; margin: auto;" />

---

## Metadata

- Information about geometries is globally stored
  - they are the same for all observations
  - their location in space is defined by their cell location in the data table
- Without this information, raster data would be simple image files

---

## Important Metadata

- Raster Dimensions
  - number of columns, rows, and cells
- Extent
  - similar to the bounding box in vector data
- Resolution
  - the size of each raster cell
- Coordinate reference system
  - defines where on the earth's surface the raster layer lies

---

## Most important EOD packages 📦

If you want to do anything in `R`, you need to use functions, and functions are provided through packages.

.pull-left[
.small[.small[
[`terra`](https://rspatial.github.io/terra/index.html) (and its predecessor `raster`) by Robert Hijmans is probably the most commonly used package for raster data in `R` (and equally relevant for vector data).

Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data.

The authors have produced a very extensive [tutorial](https://rspatial.org/) which is basically a textbook for spatial data science in R.
]]]

.pull-right[
<img src="data:image/png;base64,#../img/terra_logo.png" width="80%" style="display: block; margin: auto;" />
]

---

## Most important EOD packages 📦

If you want to do anything in `R`, you need to use functions, and functions are provided through packages.
.pull-left[
<img src="data:image/png;base64,#../img/horst_sf_illustration.jpg" width="100%" style="display: block; margin: auto;" />

.small[.small[Illustration by [Allison Horst 2018](https://allisonhorst.com/allison-horst)]]
]

.pull-left[
.small[.small[
[`stars`](https://r-spatial.github.io/stars/index.html) by Edzer Pebesma is equally useful when handling spatiotemporal arrays (datacubes). This R package provides classes and methods for reading, manipulating, plotting and writing such data cubes, to the extent that there are proper formats for doing so.

The `stars` syntax follows the logic of the [`sf`](https://r-spatial.github.io/sf/)-package (also by Edzer Pebesma) and together they can be seen as the emergence of a "spatial tidyverse".

Edzer Pebesma's and Roger Bivand's [Spatial Data Science](https://r-spatial.org/book/) textbook is THE go-to resource for learning the `sf` and `stars` syntax.
]]]

---

## All Beginnings Are... Easy!

``` r
terra::rast()
```

```
## class       : SpatRaster
## dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
```

---

## Feed With Data

``` r
input_data <-
  sample(1:100, 16) |>
  matrix(nrow = 4)

raster_layer <- terra::rast(input_data)

raster_layer
```

```
## class       : SpatRaster
## dimensions  : 4, 4, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 0, 4, 0, 4  (xmin, xmax, ymin, ymax)
## coord. ref. :
## source(s)   : memory
## name        : lyr.1
## min value   : 4
## max value   : 96
```

---

## Plotting

``` r
terra::plot(raster_layer)
```

<img src="data:image/png;base64,#primer_files/figure-html/plot-raster-1.png" width="60%" style="display: block; margin: auto;" />

---

## Plotting with tmap

The `tmap` package is also very suitable for plotting raster data.

``` r
library(tmap)

tm_shape(raster_layer) +
  tm_raster()
```

<img src="data:image/png;base64,#primer_files/figure-html/plot-raster-tmap-1.png" width="40%" style="display: block; margin: auto;" />

---

## Loading your raster files: File Formats/Extensions

- GTiff/GeoTIFF
- JPEG2000
- GRIB
- .grd
- netCDF
- ...
- sometimes, raster data even come in a text format, such as CSV

---
class: middle

## Basic Raster Operations

---

## Loading raster tiffs

In the following example, we will work with simple raster files. We will load two separate files for June and July 2019. The raster files contain only one indicator, air **temperature** at 2m above the surface of land, sea or inland waters. It is measured in Kelvin. The data has been accessed from C3S [ERA5 reanalysis](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land-monthly-means?tab=overview).

``` r
temp_6_2019 <- terra::rast("./data/temp_6_2019.tif")
temp_7_2019 <- terra::rast("./data/temp_7_2019.tif")

temp_6_2019
```

```
## class       : SpatRaster
## dimensions  : 561, 1440, 1  (nrow, ncol, nlyr)
## resolution  : 0.25, 0.25  (x, y)
## extent      : -180.125, 179.875, -56.125, 84.125  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source      : temp_6_2019.tif
## name        : SFC (Ground or water surface); 2 metre temperature [C]
## min value   : 258.1914
## max value   : 313.1426
## time        : 2019-06-01 UTC
```

---

## Compare Layers by Plotting

.pull-left[

``` r
terra::plot(temp_6_2019)
```

<img src="data:image/png;base64,#primer_files/figure-html/plot-temperature-june-1.png" style="display: block; margin: auto;" />
]

.pull-right[

``` r
terra::plot(temp_7_2019)
```

<img src="data:image/png;base64,#primer_files/figure-html/plot-temperature-july-1.png" style="display: block; margin: auto;" />
]

---

## Simple Statistics

Working with raster data is straightforward

- quite speedy
- yet not as comfortable as working with `sf` objects

For example, to calculate the mean we would use:

``` r
terra::global(temp_6_2019, fun = "mean", na.rm = TRUE)
```

```
##                                                            mean
## SFC (Ground or water surface); 2 metre temperature [C] 289.9985
```

---

## Combining Raster Layers

.small[
The temperature is measured in Kelvin (despite the `[C]` in the layer name) and can be converted to degrees Celsius by subtracting 273.15.
]

.pull-left[

``` r
temp_6_2019_celsius <- temp_6_2019 - 273.15
temp_7_2019_celsius <- temp_7_2019 - 273.15

temp_6_2019_celsius
```

```
## class       : SpatRaster
## dimensions  : 561, 1440, 1  (nrow, ncol, nlyr)
## resolution  : 0.25, 0.25  (x, y)
## extent      : -180.125, 179.875, -56.125, 84.125  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source(s)   : memory
## varname     : temp_6_2019
## name        : SFC (Ground or water surface); 2 metre temperature [C]
## min value   : -14.95859
## max value   : 39.99258
## time        : 2019-06-01 UTC
```
]

--

.pull-right[
<img src="data:image/png;base64,#primer_files/figure-html/transformed-raster-plot-1.png" style="display: block; margin: auto;" />
]

---

## Combining Raster Layers

.small[
Similarly, calculations can also be performed with both raster files.
]

.pull-left[

``` r
temp_diff <- temp_7_2019 - temp_6_2019

temp_diff
```

```
## class       : SpatRaster
## dimensions  : 561, 1440, 1  (nrow, ncol, nlyr)
## resolution  : 0.25, 0.25  (x, y)
## extent      : -180.125, 179.875, -56.125, 84.125  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source(s)   : memory
## varname     : temp_7_2019
## name        : SFC (Ground or water surface); 2 metre temperature [C]
## min value   : -6.845703
## max value   : 12.255859
## time        : 2019-07-01 UTC
```
]

--

.pull-right[
<img src="data:image/png;base64,#primer_files/figure-html/combined-raster-plot-1.png" style="display: block; margin: auto;" />
]

---

## 'Subsetting' Raster Layers

We can subset vector data by simply filtering for specific attribute values. For example, to subset a world map only to Belgium, we can use `dplyr` verbs from the `tidyverse` on `sf` data:

.pull-left[

``` r
world <-
  rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")

# Subset to relevant variables
world <-
  world |>
  dplyr::select(admin, geometry)

# Subset to Belgium
belgium <-
  world |>
  dplyr::filter(admin == "Belgium")

sf::st_geometry(belgium)
```

```
## Geometry set for 1 feature
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2.524902 ymin: 49.51089 xmax: 6.364453 ymax: 51.49111
## Geodetic CRS:  WGS 84
```
]

--

.pull-right[
<img src="data:image/png;base64,#primer_files/figure-html/plot-deutz-1.png" style="display: block; margin: auto;" />
]

---

## Cropping

Cropping is a method of cutting out a specific `slice` of a raster layer based on an input dataset or geospatial extent, such as a bounding box. Cropping reduces the spatial extent of a raster to a specified rectangular bounding box.
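
.small[
As a minimal sketch of what cropping does, assuming `terra` is installed: the toy example below builds a 4 x 4 raster with made-up values (not the temperature data) and crops it to a hypothetical 2 x 2 window. The names `toy`, `window`, and `toy_cropped` are illustrative only.
]

```r
# Toy raster with made-up values; its extent is 0-4 in both x and y
toy <- terra::rast(matrix(1:16, nrow = 4))

# Hypothetical bounding box covering the central 2 x 2 cells
window <- terra::ext(1, 3, 1, 3)

# Cropping shrinks the extent to the window; the remaining cell values are unchanged
toy_cropped <- terra::crop(toy, window)

dim(toy)          # rows, columns, layers of the full raster
dim(toy_cropped)  # rows, columns, layers after cropping
```
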
--

.pull-left[

``` r
cropped_temp_6_2019 <-
  terra::crop(temp_6_2019_celsius, belgium)
```
]

--

.pull-right[
<img src="data:image/png;base64,#primer_files/figure-html/crop-raster-map-1.png" style="display: block; margin: auto;" />
]

---

## Masking

Masking is similar to cropping, yet it keeps the raster's extent and sets all values outside the shape to missing (`NA`). Masking creates a precise match between the outline of your shape and the raster values.

--

.pull-left[

``` r
masked_temp_6_2019 <-
  terra::mask(temp_6_2019_celsius, terra::vect(belgium))
```
]

--

.pull-right[
<img src="data:image/png;base64,#primer_files/figure-html/mask-raster-map-1.png" style="display: block; margin: auto;" />
]

---

## Combining Cropping and Masking

Cropping first and masking afterwards combines both processes.

--

.pull-left[

``` r
temp_6_2019_belgium <-
  terra::crop(temp_6_2019_celsius, belgium) |>
  terra::mask(terra::vect(belgium))
```
]

--

.pull-right[
<img src="data:image/png;base64,#primer_files/figure-html/crop-mask-raster-map-1.png" style="display: block; margin: auto;" />
]

---
class: middle

## Raster Extraction / Zonal statistics

---

## Sampling of some points

.pull-left[

``` r
random_points <-
  temp_6_2019_belgium |>
  terra::spatSample(size = 10, na.rm = TRUE, as.points = TRUE) |>
  sf::st_as_sf() |>
  dplyr::select(-1)
```
]

--

.pull-right[

``` r
plot(random_points)
```

<img src="data:image/png;base64,#primer_files/figure-html/plot-random-points-1.png" style="display: block; margin: auto;" />
]

---

## Extract Information From Rasters

.pull-left[
Raster data are helpful when we aim to

- apply calculations that are the same for all geometries in the dataset
- **extract information from rasters quickly and efficiently**
]

.pull-right[

``` r
library(tmap)

tm_shape(temp_6_2019_belgium) +
  tm_raster() +
  tm_shape(belgium) +
  tm_borders(col = "black", lwd = 2) +
  tm_shape(random_points) +
  tm_dots(size = .25)
```

<img src="data:image/png;base64,#primer_files/figure-html/plot-raster-extraction-1.png" style="display: block; margin: auto;" />
]

---

## Raster Extraction

To extract the raster values at specific point locations, we use the following:

``` r
terra::extract(temp_6_2019_belgium, random_points, ID = FALSE)
```

```
##    SFC (Ground or water surface); 2 metre temperature [C]
## 1                                                17.95742
## 2                                                18.20156
## 3                                                17.89297
## 4                                                17.83828
## 5                                                18.46133
## 6                                                18.19375
## 7                                                17.10586
## 8                                                17.52969
## 9                                                18.77969
## 10                                               17.11953
```

---

## Add Results to Existing Dataset

This information can be added to an existing dataset (our points in this example):

``` r
random_points <-
  random_points |>
  dplyr::mutate(
    temp_value =
      as.vector(
        terra::extract(temp_6_2019_belgium, random_points, ID = FALSE, raw = TRUE)
      )
  )

random_points
```

```
## Simple feature collection with 10 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 2.75 ymin: 49.75 xmax: 6 ymax: 51.25
## Geodetic CRS:  WGS 84
##              geometry temp_value
## 1  POINT (5.75 49.75)   17.95742
## 2   POINT (5.25 50.5)   18.20156
## 3   POINT (3.5 50.75)   17.89297
## 4  POINT (5.25 50.25)   17.83828
## 5   POINT (4.5 51.25)   18.46133
## 6  POINT (4.25 51.25)   18.19375
## 7     POINT (6 50.25)   17.10586
## 8   POINT (3.5 51.25)   17.52969
## 9     POINT (6 50.75)   18.77969
## 10    POINT (2.75 51)   17.11953
```

---

## More Elaborate: Spatial Buffers

.pull-left[
Sometimes, extracting information 1:1 is not enough

- too narrow
- missing information about the surroundings of a point
]

.pull-right[

``` r
tm_shape(temp_6_2019_belgium) +
  tm_raster() +
  tm_shape(
    sf::st_buffer(random_points, 5000)
  ) +
  tm_dots(size = .1) +
  tm_borders()
```

<img src="data:image/png;base64,#primer_files/figure-html/plot-buffer-extraction-1.png" width="75%" style="display: block; margin: auto;" />
]

---

## Spatial Linking

Spatial linking is a crucial task in most research projects.
Selecting a spatial (and temporal) buffer size is not trivial (buzzwords: **MAUP**, the modifiable areal unit problem; **UGCoP**, the uncertain geographic context problem; **ecological fallacy**)

<img src="data:image/png;base64,#../img/FIGURE_1.png" width="75%" style="display: block; margin: auto;" />

.footnote[Jünger, 2021]

---

## Spatial Linking

Spatial linking is a crucial task in most research projects. Selecting a spatial (and temporal) buffer size is not trivial (buzzwords: **MAUP**, **UGCoP**, **ecological fallacy**)

<img src="data:image/png;base64,#../img/linking_juenger_2019.png" width="50%" style="display: block; margin: auto;" />

.footnote[Jünger, 2019]

---

## Spatial Linking

Our gxc-package will help you with this process (at least with the technical side, not the theoretical one):

- Access EO data
- Aggregate to your needs
- Link to your social science datasets

<img src="data:image/png;base64,#../img/gxclogo_v1_bright.png" width="30%" style="display: block; margin: auto;" />

---

## Spatial Linking

Our gxc-package will help you with this process:

- Allows flexibility regarding major attributes of EO indicators

<img src="data:image/png;base64,#../img/attribute_tree.PNG" width="100%" style="display: block; margin: auto;" />

---

## Raster Stacks

So far, raster data have been unidimensional in our examples: we only had one attribute for each dataset. But they can also be stacked:

``` r
temp_stack <- c(temp_6_2019, temp_7_2019)

temp_stack
```

```
## class       : SpatRaster
## dimensions  : 561, 1440, 2  (nrow, ncol, nlyr)
## resolution  : 0.25, 0.25  (x, y)
## extent      : -180.125, 179.875, -56.125, 84.125  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## sources     : temp_6_2019.tif
##               temp_7_2019.tif
## names       : SFC (Ground or ~temperature [C], SFC (Ground or ~temperature [C]
## min values  : 258.1914, 258.7051
## max values  : 313.1426, 314.7812
## time        : 2019-06-01 to 2019-07-01 UTC
```

---

## Datacubes

n-dimensional **arrays** or **datacubes** are the backbone of EO data. Unfortunately, humans are not very good at thinking beyond 3D.
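
.small[
One way to make the idea concrete is a sketch in base `R` with made-up values: a three-dimensional array with hypothetical dimensions `x`, `y`, and `time`. Packages such as `stars` add georeferencing on top of this, which the sketch leaves out.
]

```r
# Toy datacube: 4 x 4 grid cells observed at 3 time steps (made-up temperatures)
set.seed(1)
cube <- array(
  rnorm(4 * 4 * 3, mean = 15, sd = 2),
  dim = c(4, 4, 3),
  dimnames = list(
    x = paste0("x", 1:4),
    y = paste0("y", 1:4),
    time = c("2019-06", "2019-07", "2019-08")
  )
)

# Collapse the time dimension: one temporal mean per grid cell (4 x 4 matrix)
cell_means <- apply(cube, c(1, 2), mean)

# Collapse the spatial dimensions: one spatial mean per time step (length 3)
time_means <- apply(cube, 3, mean)
```

.small[
Aggregating over one set of dimensions while keeping the others is the basic operation behind most datacube workflows.
]
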
<img src="data:image/png;base64,#../img/cube2.png" width="70%" style="display: block; margin: auto;" />

.footnote[https://raw.githubusercontent.com/r-spatial/stars/master/images/cube2.png]

---
class: middle

## What's next?

---

## Next steps

Join our mailing list by sending me an email ([dennis.abel@gesis.org](mailto:dennis.abel@gesis.org))

---

## Next steps

.small[
We are organizing an **international expert workshop on EO data in the social sciences** from 27-29 November at GESIS in Cologne, Germany.

There will be keynote speeches by Dr. [Jennifer Marlon](https://geospatial.yale.edu/profile/jennifer-marlon), Executive Director of the Yale Center for Geospatial Solutions, and [Adel Daoud](https://liu.se/en/employee/adeda07), Associate Professor at the Institute for Analytical Sociology, Linköping University.

We will also have a high-level roundtable on data quality issues and many interesting paper presentations on issues like:

- Do earthquakes affect voting outcomes?
- What political and socio-economic factors affect the distribution of pollution-monitoring sensors?
- Can we detect illegal mining operations with satellite data?
- Do autocracies misreport GDP accounts (and can we measure how much)?

Do you want to participate? Send me an email ([dennis.abel@gesis.org](mailto:dennis.abel@gesis.org))!
]

---

## Next steps

This primer was a teaser for our full two-day online course on ["Advanced Geospatial Data Processing for Social Scientists"](https://training.gesis.org/?site=pDetails&child=full&pID=0x5948F7543A5E42CF9CE2C91E844E33E1) on 28-29 April 2025.
We will dig deeper into: - Accessing EO data - Wrangling with EO data (transformation and aggregation of values) - Visualization of EO data - Linking steps based on different spatial buffer specifications If you need a more basic introduction to geospatial data analysis, check out our [introductory course](https://training.gesis.org/?site=pDetails&child=full&pID=0x2C153F4DEA5C4685AB460DEC331C0E80) from 9-10 April 2025. In addition, if you are specifically interested in spatial econometrics, [Tobias Rüttenauer](https://ruettenauer.github.io/) offers exactly the right course for you from 9-11 July 2025: [Geodata and Spatial Regression Analysis](https://training.gesis.org/?site=pDetails&child=full&pID=0xC8DBE5AAA5FC4BE4ADDF1A90012B677E). --- ## Next steps Honestly, you don't need us. `R` is **free** and **open-source** and the user community has generated a lot of helpful content in the last years. We have mentioned the very extensive [`terra` tutorial](https://rspatial.org/) before. Together with Edzer Pebesma's and Roger Bivand's [Spatial Data Science](https://r-spatial.org/book/), you have two excellent textbooks to work with on your own. To advance even further, check out the freely-accessible textbook [Geocomputation with R](https://r.geocompx.org/) by Lovelace, Nowosad and Muenchow (2024). --- class: middle ## Q&A --- ## Backup slides --- ## Non-Regular Grids In this course, we only use regular grid-based raster data. However, be aware that non-regular grid data also exists. <img src="data:image/png;base64,#../img/nonregular_grid.png" width="70%" style="display: block; margin: auto;" /> .footnote[https://r-spatial.github.io/stars/]
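
---

## Buffer-Based Extraction (Sketch)

.small[
The buffer-based linking discussed in the main slides can be sketched with `terra` alone. This is a toy example under stated assumptions, not the gxc workflow: the raster values, the two point locations, the 2 km buffer width, and the choice of EPSG:3035 as a projected CRS in metres are all hypothetical.
]

```r
# Toy raster: 10 x 10 cells of 1 km each in a projected CRS (units: metres)
toy_raster <- terra::rast(
  nrows = 10, ncols = 10,
  xmin = 0, xmax = 10000, ymin = 0, ymax = 10000,
  crs = "EPSG:3035"
)
terra::values(toy_raster) <- seq_len(terra::ncell(toy_raster))

# Two hypothetical respondent locations
toy_points <- terra::vect(
  cbind(x = c(2500, 7500), y = c(2500, 7500)),
  type = "points", crs = "EPSG:3035"
)

# 2 km buffers around each point (width is in CRS units, here metres)
toy_buffers <- terra::buffer(toy_points, width = 2000)

# Mean of the raster cells covered by each buffer: one row per buffer
buffer_means <- terra::extract(toy_raster, toy_buffers, fun = mean, na.rm = TRUE)
buffer_means
```

.small[
Changing `width` changes which cells enter the mean, which is why the choice of buffer size matters for the substantive results.
]
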