Introduction to Earth observation data

What is Earth observation data? 🌐

Journey into EO data for social scientists (AI generated image).

Earth observation (EO) data refers to all collected information about the Earth’s physical, chemical, and biological systems. Utilizing Earth observation data allows to study the Earth’s atmosphere, land cover, near subsurface, oceans and inland waters, as well as biological diversity and ecosystems. Applications of EO data in academia, industry, and policy-making are extensive. It is crucial for the operation of activities in environmental protection, energy management, urban planning, agriculture and fisheries, forestry, public health, risk and hazard management, transport and mobility, civil protection, or tourism.

Additional information on Earth system indicators

There are many crucial Earth system indicators. With respect to the Earth’s climate, for example, the Global Climate Observing System (GCOS) maps 55 Essential Climate Variables (ECVs). ECVs are seen as the empirical evidence base for the guidance of mitigation and adaptation measures, risk assessment and the understanding of climate services. EO data is essential for systematically measuring these variables. Check out their documentation of ECVs.

ECVs as proposed by GCOS. Source: GCOS 2024 (https://gcos.wmo.int/en/essential-climate-variables/)

There is often confusion about terms such as Earth observation data, geodata, remote sensing, or satellite data. Let’s disentangle these different concepts.

  • Earth observation data 🌐: Information collected about the Earth’s system
  • Geospatial data 🗺: Data that is georeferenced (includes information on the location)
  • Remote sensing data 📡: Information that is acquired by sensors “from a distance”. Remote sensing uses electromagnetic radiation from a source (natural, e.g. solar radiation, or artificial, e.g. radar) that interacts with targets on the Earth’s surface in a unique way (spectral signatures).
  • Satellite data 🛰: Information that is acquired by sensors via satellites
Background on Remote Sensing

Electromagnetic radiation: is the transfer of energy from a target through space via waves that vary in wavelength, frequency, amplitude, and speed. It can be captured by sensors for analysis.

Radiance: Is the amount of electromagnetic radiation emitted or reflected by an object in a specific direction. (In contrast, irradiance refers to the radiation that strikes the surface of the object.)

Reflectance: Refers to how much light or electromagnetic radiation a surface bounces back compared to how much it receives. Different materials reflect and absorb light in unique ways, which helps us identify them using their spectral signature—a kind of “fingerprint” based on reflectance at different wavelengths.

  • Vegetation: Healthy plants reflect a lot of green light (why they look green) and near-infrared (NIR) light, but they absorb red and blue light for photosynthesis.

  • Water: Water reflects very little light overall. Most radiation is absorbed or transmitted into the water. However, shorter wavelengths like blue and green are more likely to be reflected—this is why water can appear blue.

  • Soil: Reflectance depends on factors like moisture. Wetter soil reflects less light, while dry soil reflects more.

Have a look at this paper to learn more about the basics of remote sensing.

As you can see, these four terminologies address different aspects of the data. While “EO data” refers to the data content (the Earth system), “geospatial data” addresses the location, and “remote sensing” and “satellite data” address the way of obtaining the data.

The Venn diagram below visualizes these definitions and shows overlaps and gaps between these concepts. Except for satellite data, which can be seen as a subset of remote sensing data, all other concepts have overlaps and gaps with each other.

EO Venn diagram. Source: Abel and Jünger 2024

Let’s disentangle these overlaps a bit further:

  • A: The area A captures all concepts discussed above - Georeferenced Earth observation data derived from satellite sensors. An example for this are nighttime lights 🌇. Night lights are highly informative for social scientists to measure population growth, electrification and light pollution, the expansion of urban areas, the impacts of natural events, and economic activity. An important satellite for nighttime lights is the NASA/NOAA Suomi National Polar-orbiting Partnership (Suomi NPP), which was launched in 2011. On board of Suomi NPP, the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument observes nighttime lights with the day night band (DNB). Luckily for us, we do not have to handle this raw data. NASA’s Black Marble project offers various pre-processed products with high spatial and temporal resolution and temporal coverage since 2012. The data products are free and open access. R user benefit from the BlackMarbleR-package, which is a user-friendly interface to Black Marble data.
  • B: Area B is similar to A except that this data is derived from remote sensing OTHER than satellite data. There are, in fact, other platforms for sensors, like aircrafts or drones. For example, data on gases in the atmosphere are collected that way. Often, airborne data has a smaller geographical coverage and is more case specific. Furthermore, it is much more often proprietary and costly. For example, commercial providers offer airborne measurements of GHG emissions on a very high spatial resolution (<1m) to estimate point-source emissions of factories and power plants.
  • C: C represents EO data which is georeferenced but NOT derived from remote sensing. There are several alternatives to EO data generation: Ground-based sensors (called “in-situ”) are widespread sources of EO data. Weather stations or local air quality sensors are common examples. Projects like Google’s StreetView can also be classified here. But C also captures a data source which is very familiar to social scientists: Field surveys and survey-based methods. Biodiversity monitoring and soil quality projects often employ these methods. This work is not necessarily performed by experts only. Citizen science projects have contributed immensely to mapping local biodiversity, air quality, or land use: For example, scientists work together with “laypeople” to map bird populations like the RSPB in the UK with the annual Big Garden Birdwatch. Check out the European Citizen Science Association website to learn more about this form of data generation.
  • D and E: The two areas D and E deviate from the previous concepts because they are NOT EO data. D captures all geodata which is not derived from remote sensing and does not constitute EO data. This is primarily data which is human-centered - many examples of which are familiar to social scientists: Georeferenced variables on socio-demographics or economic indicators like Census data, electoral outcomes, or environmental attitudes and behavior. Examples for E, which represents data gathered based on remote sensing but is NOT classified as Earth observation, are space exploration and astronomy 🌌. Imaging of Mars’ surface is such an example. Future generations of social scientists might be concerned with interplanetary societal issues - (un)fortunately for us, we will focus on societies from planet Earth 🚀.

As you can see, the main areas when it comes to EO data are A, B, and C. As social scientists interested in working with EO data, D is similarly relevant: Often, we are interested in spatially linking our social indicators (D) with EO data. We therefore see these as two sides of the same coin. The geolocation represents the functional link between our social indicators and the Earth system context.

There are a few additional areas in the Venn diagram which we have not highlighted. These are the fields which do not overlap with the circle for “geodata”. Since we are focussing on geo-referenced data in this project, these parts will not be further considered in our tutorials.

The conceptual distinctions between the ways of obtaining data (in-situ, airborne, or satellite sensors) are important but often not clear cut in practice. Most of the time, we access processed (and/or simulated) data products 🎁, which are generated by integrating datastreams from various sensors to increase data quality. This is done to enhance accuracy, scope or resolution, helps to validate data, or to fill data gaps. We will explore a few common sources and indicators in the next chapter.

Applications in the social sciences

EO data is not just utilized in the Earth system sciences. A growing interest in economics and the social sciences in Earth observation data has increased the number of publications in recent years, which integrate, in one way or another, EO data in the research design.

We have identified 6 major topics in the social sciences which have benefited from this data source in the past:

  1. Environmental social sciences 🌱,
  2. Conflict and peace research 🕊,
  3. Political attitudes and behavior 🗳,
  4. Policy studies 📜,
  5. Economic development and inequality 📈, and
  6. Public health 💪.

The environmental social sciences are a growing research field at the intersection between the Earth system and societies. One particular topic, the role of extreme weather events for people’s environmental attitudes and behavior, has especially benefited from a growing data availability of EO data. A noteworthy study by Hoffmann et al. (2022) analyses how the experience of climate anomalies and extremes influences environmental attitudes and vote intention in Europe. They integrate sources from 1. harmonized Eurobarometer surveys, 2. EU parliamentary electoral data, and 3. climatological data and aggregate it on the regional level (NUTS2 and NUTS3). The climatological data is derived from C3S ERA5 reanalysis and is utilized to calculate temperature anomalies and extremes based on the reference period 1971-2000. Their findings suggest an effect of temperature anomalies (heat, “dry spell”) on environmental concern and vote intention.

Economists and social scientists who study economic development and inequality exploit EO data in various forms to operationalize independent variables such as drivers and barriers to development (e.g. droughts) as well as dependent variables (e.g. night lights as proxies for economic activity or the quality of rooftops as development indicator). García-León et al. (2021), for example, investigate historical and future economic impacts of recent heatwaves (2003, 2010, 2015, 2018) in Europe. They combine and regionally-aggregate 1. heatwave data with 2. population data, 3. worker productivity data, and 4. economic accounts from Eurostat. They utilize the C3S ERA5 data to calculate heatwaves based on the reference period 1981-2010. Their findings indicate total estimated damages attributed to heatwaves to 0.3-0.5% of European GDP with high a geospatial variation (GDP impacts beyond 1% in vulnerable regions). Jean et al. (2016) show how nighttime maps can be utilized as estimates of household consumption and assets. Economic indicators are hard to measure in poorer countries - satellite imagery could be an alternative proxy for it. The authors integrate 1. survey data (World Bank’s Living Standards Measurement Surveys - LSMS; and Demographic and Health Surveys - DHS) with 2. nighttime light data in five African countries - Nigeria, Tanzania, Uganda, Malawi, and Rwanda. They utilize ML approaches for image feature extraction on daytime satellite images from Google Static Maps and nighttime lights from US DMSP. They find that their model can explain up to 75% of variation in local-level economic outcomes.

Beyond single research projects, EO data is increasingly integrated in official statistical accounts. For example, the German Federal Statistical Office (DESTATIS) is currently evaluating the usage of remote sensing and satellite data for official statistical accounts in several projects as part of their experimental statistics. The agency explores, among other things, whether the number of ships and containers in harbors or inland waters can be used as proxies for trading activities or production figures or whether the occupancy of parking spaces adjacent to shops are indicative of sales figures (for more information, see their project on “Smart business cycle statistics based on satellite data”. In the long-term, they plan to integrate indicators, like vehicle manufacturing indices, into GDP flash estimates and nowcasting (see - Satellite-based early estimate of short-term economic development). For the establishment of the register census in 2031, they furthermore develop algorithms for the identification of buildings based on satellite and airborne imagery (Remote sensing data and artifical intelligence in the register census)

Earth observation is a growing field. Over the next few years, many more satellites will be put into orbit to increase data quality, resolution and range. Better data quality will further increase the possibilities for usage in the social sciences. Furthermore, novel applications of machine learning (ML) on satellite and aerial imagery expand the potential of EO data for the construction of new indicators. While lacking data availability, until recently, had severely limited use cases in the social sciences, these innovations will drive social scientists to tackle new research questions. Consider, for example, Stanford’s DeepSolar, which maps solar PV installations in the US at an unprecedented level of detail and enables researchers to study household-level adoption and diffusion of renewable energy technologies.

Relevant sources of Earth observation data

Where do we get EO data? (AI generated image)

Europe’s Earth Observation programme is called Copernicus. It is funded and managed by the European Commission and partners like the European Space Agency (ESA) and the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). It has been operational since 2014 and provides free access to a wealth of satellite data from ESA’s “Sentinel” fleet. Copernicus combines data from satellites, ground-based as well as air- and sea-borne sensors to track the Earth system and provide this information largely free for all customers.

Additional information on the Copernicus programme Check out this 5min video on the Copernicus programme.

The ESA describes Copernicus as the world’s most ambitious Earth observation program, which will be further expanded in the coming years. On the Copernicus homepage, the daily data collection is estimated at 12 terabytes. Given this complexity, Copernicus has separated its services for public usage along several thematic areas:

Copernicus infrastructure and data services. Source: https://www.copernicus.eu/en/accessing-data-where-and-how/conventional-data-access-hubs

This project focuses on the data provided by the Copernicus programme. However, this is not the only relevant source of EO data which you can consider for your projects. The US equivalent, for example, is based on the Landsat satellite programme, which is jointly operated by NASA and the US Geological Survey (USGS). Google’s Earth Engine Cloud Computing Platform catalogs an extensive selection of additional data sets from various sources.

References

García-León, D., Casanueva, A., Standardi, G., Burgstall, A., Flouris, A. D., & Nybo, L. (2021). Current and projected regional economic impacts of heatwaves in Europe. Nature Communications, 12(1), 5807. https://doi.org/10.1038/s41467-021-26050-z
Hoffmann, R., Muttarak, R., Peisker, J., & Stanig, P. (2022). Climate change experiences raise environmental concerns and promote Green voting. Nature Climate Change, 12(2), 148–155. https://doi.org/10.1038/s41558-021-01263-8
Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790–794. https://doi.org/10.1126/science.aaf7894