library(blackmarbler)
Warning: Paket 'blackmarbler' wurde unter R Version 4.4.3 erstellt
Nighttime Lights (NL) data offer a versatile and increasingly important perspective for a wide range of social research topics. At the most basic level, NL data can be used to assess electrification rates, map urban extent, and estimate population density, urbanization, and economic activity. These applications are foundational but can also extend into more complex spatial analyses, such as defining administrative boundaries, monitoring infrastructure development, and analyzing spatial inequalities.
In a case study of Indian rural electrification rates Dugoua (2018) proves that NL data accurately assess rural electrification rates when compared to Census data. However, this study questions the accuracy of NL data for the assessment of socio-economic indicators like financial inclusion or poverty indexes. Also Dugoua (2018) suggests possible improvements of accuracy by using better GIS maps, basic geo-processing tools, and particular aggregations of night-time luminosity.
NL data also provide insights into more dynamic and often critical processes, including conflict detection, disaster impact assessment, population displacement, and refugee movements. Additionally, they have been used to capture economic trends, energy consumption patterns, and the evolution of urban growth over time.
Chen (2016) explores how economic statistics can be accurately assessed using NL data. Additionally, it reveals the possibility of sub-national analysis of urbanization and its implications for poverty rates.
Beyond physical and economic indicators, NL data can also be linked to various socio-economic measures, such as regional GDP, poverty rates, public health, wealth distribution and even progress toward Sustainable Development Goals (SDGs). However, these correlations often require careful interpretation to avoid oversimplification, as the brightness of a location is not always a straightforward proxy for economic well-being.
In a study of African countries for which little statistical information is available, Bruederle (2018) assesses economic activity and development statistics by using NL data. This study shows a positive relationship between NL data and location-specific human development indicators. Changes in NL data can also be used to detect variation in these indicators.
Proville (2017) broadens the scope of NL data applications to a global scale, revealing a significant correlation between NL data and various socio-economic indicators. Additionally, Proville (2017) points out the relationship between NL and multiple greenhouse gas (GHG) emissions, such as CO2, CH4, N2O and F-gas. The study highlights the accuracy of NL data on a local level and in areas with poor statistical inventories.
Moreover, nighttime lights can reflect cultural and social behaviors, such as religious celebrations, tourism, or even the decorative nature of public lighting, providing unique insights into the social importance of specific objects. Finally, NL data can capture natural NIR emitters, including wildfires, volcanic activity, and gas flaring, adding further depth to its environmental monitoring capabilities.
Levin (2020) provides a thorough explanation of NL for most of the named indicators. Regarding the topic of “Holiday and ornamental lights, and political, historical, and cultural differences in lighting”, Levin (2020) discusses temporal, small-scale and individual variations in lightning behaviors during holidays and cultural events. During the Christmas and New Year holidays, the “[…] patterns of total lighting electricity usage (units of Watt · hr) derived from nighttime radiance were shown to uniformly increase across US cities with diverse ethnicity and religious backgrounds […]”. This indicated holiday lighting behaviour can also be detected by NL, which can also detect religious and cultural lighting patterns. For example, “[…] cities with Muslim-majority population exhibit lighting peaks during and slightly after the 30 days of Ramadan compared to non-arab cities in Israel […]”. Moreover, the tourist season in Greece can be detected by a seasonal changes of lighting patterns. Levin (2020) also points out the difference between celebratory/ decorative lights, which are usually brighter than functional lights, such as street lights.
Here is a visualization of a possible workflow:
All these indicators have multiple causes, so it is important not to make quick assumptions. This is why we would like to clarify the accuracy and reliability of the NL data.
The accuracy of nighttime light (NL) data largely depends on a stable electricity supply, as fluctuations in power availability can obscure the true signal of human activity. Additionally, areas without electrification are not captured, leading to significant blind spots in the data (Dugoua, 2018). Higher spatial resolutions improve the ability to capture fine-scale variations in light emissions, enhancing accuracy and interpretability (Bruederle, 2018). Despite this, NL data remains an imperfect proxy for many socioeconomic indicators. It is particularly useful for estimations in regions with limited statistical coverage or where reliable data is difficult to obtain (Proville, 2017) (Chen, 2016). Talking about statistical inventories NL data has the strength of detecting possibly informal activities and their spatial patterns, disclosing regional governance differences (Proville, 2017). However, interpreting these data requires caution, as they can also reflect unintended side effects like artificial light pollution. This “skyglow” not only disrupts human circadian rhythms and wildlife behavior but also complicates astronomical observations by reducing nighttime sky quality.
Finally, while NLs are often interpreted as indicators of modernization, urbanization, economic development, and technological innovation, they can also signify longer working hours and potentially weaker labor protections, reflecting broader social conditions.
“[…] usefulness of nighttime lights as proxy depends on optimal weight of proxy measure […]” (Chen, 2016)
You can find a really elaborate documentation of NL data indication also discussing impacts on ecosystems and health here.
Remote sensing normally relies on the reflection of electromagnetic radiation derived from the Sun on the Earth’s surface. This technique is called passive remote sensing, since the sensor does not actively emit electromagnetic radiation to explore the Earth’s surface. But what happens at night?
One could, of course one could apply active remote sensing such as RADAR (Radio Detection and Ranging), LiDAR (Light Detection and Ranging) or SAR (Synthetic Aperture RADAR) but these sensors present many difficulties in interpretation and require more pre-processing. However, there is still electromagnetic radiation that can be detected at night. Nevertheless, electromagnetic radiation can still be detected at night. At night, it is not the reflection of electromagnetic radiation that is assessed, but solely emissions from the Earth’s surface, e.g. thermal infrared detecting heat or microwave radiation. Artificial lights e.g. street lights are detected since they are situated in the visible infrared and near-infrared part of the radiometric spectrum.
DMSP/OLS (Defense Meteorological Satellite Program’s Operational Line-scan System): Nighttime satellite imagery obtained to detect clouds at night with low light imaging (the moon). This was helpful in the first studies of light pollution in the 70s. However, DMSP/OLS is not very high in quality, blurring an “overglowing” is caused by intense scattering.
VIIRS (Visible Infrared Imaging Radiometer Suite): Is an improved version of DMSP/OLS consisting of 2 MODIS sensors, 36 spectral bands and VIIRS. The sensors are especially sensitive to lower nighttime light levels due to panchromatic DNB (Day/Night Band) that detects visible and near-infrared wavelength. Compared to DMSP the “overglow” is reduced and spatial resolution increased. Due to its high resolution and sensitivity VIIRS can effectively detect seasonal changes in intensity of NLs.
Landsat: Detecting thermal night-time information though only detecting really bright nighttime lights.
Photos from ISS: Digital photos taken from the ISS only have moderate spatial resolution. However, they have been helpful in environmental impact and ecological studies, among others.
Citizen Science: The “Cities at Night” project aimes to allocate and geo-reference nighttime photos from ISS. It works with in 3-steps: classification, allocation and geo-referencing. The broad participation of 20,000 volunteers in project resulted in 190,000 night images tagged, 3000 images located (1 GCP), 700 images geo-referenced (multiple GCP). The resulting photos were used in various studies concerning light pollution monitoring, epidemiological studies and ecological studies.
(Levin, 2020)NASA’s Blackmarble uses VIIRS DNB sensors on board of the Suomi NPP and NOAA-20 satellites. The VIIRS DNB has a high spatial resolution of 500 m and is calibrated and corrected for atmospheric, terrain, lunar BRDF, thermal, and stray light effects. This is resulting in faster retrieval time and less noise, facilitating research based on daily, seasonal and annual changes in NLs (Levin, 2020). Therefore, NASA’s Blackmarble is the most commonly used NL data set in recent studies.
Working with NL data can give insights to natural and human activity, as well as complex socioeconomic phenomenons. In this case study we will focus on the relation between NL and population in Los Angeles County. We will try to visualize correlations and test the accuracy of NL as indicator for socio demographic statistics. The workflow we will follow looks like this: data retrieval using an API, pre-processing (log-transformation), sub-setting (masking & cropping) the AOI, disaggregation (matching resolution), stacking layers, data preparation & model training (randomForest, imputation logics), descriptive statistics.
In case you are not all up to date in working with raster layers and data cubes, please have a look at that topic again.
blackmarbler
blackmarbleR
by Robert Marty and Gabriel Stefanini Vicente (2025) supports easy access to NASA’s Black Marble API. Let’s check out their vignette to set up an account and the data retrieval.
The function bm_raster()
to retrieve the nighttime lights requires as input an sf
object to determine the spatial extent of the downloaded data. The object must be in WGS84.
We will focus on California. Let’s load US states shapefiles with the tigris
package and subset to California.
CA_sf <- tigris::states(progress_bar = FALSE) |>
dplyr::filter(STUSPS == "CA") |>
sf::st_transform(crs = "EPSG:4326")
Retrieving data for the year 2024
Simple feature collection with 1 feature and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -124.482 ymin: 32.52951 xmax: -114.1312 ymax: 42.0095
Geodetic CRS: WGS 84
REGION DIVISION STATEFP STATENS GEOID GEOIDFQ STUSPS NAME LSAD
1 4 9 06 01779778 06 0400000US06 CA California 00
MTFCC FUNCSTAT ALAND AWATER INTPTLAT INTPTLON
1 G4000 A 403673433805 20291632828 +37.1551773 -119.5434183
geometry
1 MULTIPOLYGON (((-119.9999 4...
Once you have set up your profile at NASA’s Earth Data Portal and generated your API token, you can assign it to an object in R
for the data retrieval.
Now we can download the data from NASA’s API.
CA_nl_stack <- bm_raster(
roi_sf = CA_sf,
product_id = "VNP46A4", # for yearly data
date = 2017:2020, # same four years like our population data
bearer = bearer, # your API token
output_location_type = "file", # we want to store geotiff on disk
file_dir = "./data/", # where to store geotiff
file_return_null = FALSE # also create SpatRaster file
)
R
environment (output_location_type = "memory"
). If you want to store it as single GeoTIFFs, specify output_location_type = "file
and the file path with file_dir=
. file_return_null=
further specifies whether the data is additionally loaded to the R
environment.
Let’s have a quick lock at our data:
CA_nl_2020 <- terra::rast("./data/VNP46A4_NearNadir_Composite_Snow_Free_qflag_t2020.tif")
print(CA_nl_2020)
class : SpatRaster
dimensions : 2275, 2485, 1 (nrow, ncol, nlyr)
resolution : 0.004166667, 0.004166667 (x, y)
extent : -124.4833, -114.1292, 32.52917, 42.00833 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : VNP46A4_NearNadir_Composite_Snow_Free_qflag_t2020.tif
name : t2020
min value : 0.0
max value : 269.2
The data is right-skewed. We perform a log-transformation to account for that. Let’s have a look at our data now.
CA_nl_2020[] <- log(CA_nl_2020[] + 1)
ggplot() +
geom_spatraster(data = CA_nl_2020) +
geom_sf(
data = CA_sf,
fill = "transparent",
color = "white",
size = 4
) +
scale_fill_gradient2(
low = "black",
mid = "yellow",
high = "red",
midpoint = 3,
na.value = "transparent"
) +
labs(title = "Nighttime Lights in California 2020") +
coord_sf() +
theme_void() +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
legend.position = "none"
)
<SpatRaster> resampled to 500980 cells.
In order to define our AOI to Los Angeles County 🌃 we first need to:
# Load LA County file
LA_county <- tigris::counties("CA",
progress_bar = FALSE
) |>
filter(NAME == "Los Angeles") |>
st_transform(crs = "EPSG:4326")
Retrieving data for the year 2024
# Subset to "mainland" California and
# exclude the two islands Santa Catalina and San Clemente
LA_county <- LA_county %>%
st_cast("POLYGON") %>%
mutate(area = st_area(.)) %>%
slice_max(area, n = 1)
Warning in st_cast.sf(., "POLYGON"): repeating attributes for all
sub-geometries for which they may not be constant
Simple feature collection with 1 feature and 19 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -118.9517 ymin: 33.65955 xmax: -117.6464 ymax: 34.8233
Geodetic CRS: WGS 84
STATEFP COUNTYFP COUNTYNS GEOID GEOIDFQ NAME
1.2 06 037 00277283 06037 0500000US06037 Los Angeles
NAMELSAD LSAD CLASSFP MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT
1.2 Los Angeles County 06 H1 G4020 348 31080 31084 A
ALAND AWATER INTPTLAT INTPTLON area
1.2 10516008433 1784982940 +34.1963983 -118.2618616 10863687022 [m^2]
geometry
1.2 POLYGON ((-118.0288 33.8733...
Now that we have an sf
file of our AOI, we can prepare our two raster files.
# Create LA raster files for population and night lights in 2020
LA_pop_2020 <- terra::rast("./data/US-CA_ppp_2020_1km.tif") |>
terra::mask(terra::vect(LA_county)) |>
terra::crop(LA_county)
LA_pop_2020[] <- log(LA_pop_2020[] + 1)
LA_nl_2020 <- terra::mask(
CA_nl_2020,
terra::vect(LA_county)
) |>
terra::crop(LA_county)
Our two raster files have the same CRS and (almost same) spatial extent. Unfortunately, the resolution (cell size) differs. Our population data is on an approx. 1km grid and our night lights data on an approx. 500m grid.
class : SpatRaster
dimensions : 140, 157, 1 (nrow, ncol, nlyr)
resolution : 0.008333333, 0.008333333 (x, y)
extent : -118.9512, -117.6429, 33.65792, 34.82458 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source(s) : memory
varname : US-CA_ppp_2020_1km
name : US-CA_ppp_2020_1km
min value : 0.000000
max value : 9.843707
class : SpatRaster
dimensions : 280, 313, 1 (nrow, ncol, nlyr)
resolution : 0.004166667, 0.004166667 (x, y)
extent : -118.95, -117.6458, 33.65833, 34.825 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source(s) : memory
varname : VNP46A4_NearNadir_Composite_Snow_Free_qflag_t2020
name : t2020
min value : 0.000000
max value : 4.400603
So how can we align two layers? There are multiple ways, like always in R. Firstly we will try out terra::disagg()
/ terra::aggregate()
. By splitting each cell into smaller parts we can increase the resolution of a grid to adjust it to another. It is the simplest and fastest way to harmonize multiple cell sizes and increase their resolution without altering original values.
# Increase resolution for population data
LA_pop_2020_high <- terra::disagg(LA_pop_2020,
fact = c(2, 2),
method = "bilinear"
)
# There is still a slight mismatch due to rounding errors (one more ncol)
# Let's crop to the spatial extent of the nightlights data
LA_pop_2020_high <- crop(
LA_pop_2020_high,
terra::ext(LA_nl_2020)
)
[1] 0.004166667 0.004166667
[1] 0.004166667 0.004166667
SpatExtent : -118.9512495092, -117.64708284775, 33.65791673341, 34.82458339541 (xmin, xmax, ymin, ymax)
SpatExtent : -118.95, -117.645833333333, 33.6583333333333, 34.825 (xmin, xmax, ymin, ymax)
# Decrease resolution for nightlights data
LA_nl_2020_low <- terra::aggregate(LA_nl_2020,
fact = c(2, 2),
method = "bilinear"
)
# There is still a slight mismatch due to rounding errors (one more ncol)
# Let's crop to the spatial extent of the population data
LA_nl_2020_low <- crop(
LA_nl_2020_low,
terra::ext(LA_pop_2020)
)
[1] 0.008333333 0.008333333
[1] 0.008333333 0.008333333
SpatExtent : -118.9512495092, -117.6429161811, 33.65791673341, 34.82458339541 (xmin, xmax, ymin, ymax)
SpatExtent : -118.95, -117.641666666667, 33.6583333333333, 34.825 (xmin, xmax, ymin, ymax)
Another way to match different grids is terra::resample()
. By interpolating the resolution of one layer is adjusted to another one. The advantage: this method accounts more for the real spatial pattern.
# Cross-check
LA_pop_2020_resampled <- resample(
x = LA_pop_2020,
y = LA_nl_2020,
method = "bilinear"
)
res(LA_pop_2020_resampled)
[1] 0.004166667 0.004166667
[1] 0.004166667 0.004166667
SpatExtent : -118.95, -117.645833333333, 33.6583333333333, 34.825 (xmin, xmax, ymin, ymax)
SpatExtent : -118.95, -117.645833333333, 33.6583333333333, 34.825 (xmin, xmax, ymin, ymax)
Only when all grids and resolutions of our layers match we can start combining layers into a raster stack.
class : SpatRaster
dimensions : 280, 313, 2 (nrow, ncol, nlyr)
resolution : 0.004166667, 0.004166667 (x, y)
extent : -118.95, -117.6458, 33.65833, 34.825 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source(s) : memory
varnames : VNP46A4_NearNadir_Composite_Snow_Free_qflag_t2020
VNP46A4_NearNadir_Composite_Snow_Free_qflag_t2020
names : US-CA_ppp_2020_1km, t2020
min values : 0.000000, 0.000000
max values : 9.733816, 4.400603
Previous examples follow the idea of interpolating existing data across the spatial domain. Imputation fills in missing values based on a prediction model. Let’s consider our two variables to make up a stylized example:
We know that population density and nightlights is correlated. We could try to predict the missing values for population based on the values of nightlights to generate the higher resolution population data. In order to do that, we train a RandomForest model on the low resolution data of population and nightlights.
Warning: Paket 'randomForest' wurde unter R Version 4.4.3 erstellt
randomForest 4.7-1.2
Type rfNews() to see new features/changes/bug fixes.
Attache Paket: 'randomForest'
Das folgende Objekt ist maskiert 'package:ggplot2':
margin
# Covariates need to be in same size as outcome variable = 1km
LA_nl_2020_resampled <- resample(
x = LA_nl_2020,
y = LA_pop_2020,
method = "bilinear"
)
# Create training data - one row per cell
train_data <- as.data.frame(LA_nl_2020_resampled,
xy = TRUE,
cells = TRUE,
na.rm = FALSE
) |>
left_join(
as.data.frame(LA_pop_2020,
xy = FALSE,
cells = TRUE,
na.rm = FALSE
),
by = "cell"
) |>
rename(
nightlights = t2020,
population = `US-CA_ppp_2020_1km`
)
train_data <- na.omit(train_data)
Ready to fit the model and predict population data on 500m grid.
Now we have a million options to analyse our raster stack.
mean
population 2.9236307
nightlights 0.5277355
population_predicted 2.7920416
$correlation
population nightlights population_predicted
population 1.0000000 0.8245851 0.8760559
nightlights 0.8245851 1.0000000 0.8735970
population_predicted 0.8760559 0.8735970 1.0000000
$mean
population nightlights population_predicted
population NaN 2.923631 2.9236307
nightlights 0.5277355 NaN 0.5277355
population_predicted 2.7920416 2.792042 NaN
$n
population nightlights population_predicted
population NaN 59392 59392
nightlights 59256 NaN 59256
population_predicted 59256 59256 NaN
Bruederle A, Hodler R (2018) Nighttime lights as a proxy for human development at the local level. PLOS ONE 13(9): e0202231. https://doi.org/10.1371/journal.pone.0202231
Chen, X. (2016). Using Nighttime Lights Data as a Proxy in Social Scientific Research. In: Howell, F., Porter, J., Matthews, S. (eds) Recapturing Space: New Middle-Range Theory in Spatial Demography. Spatial Demography Book Series, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-22810-5_15
Dugoua, E., Kennedy, R., & Urpelainen, J. (2018). Satellite data for the social sciences: measuring rural electrification with night-time lights. International Journal of Remote Sensing, 39(9), 2690–2701. https://doi.org/10.1080/01431161.2017.1420936
Freie Universität Berlin. (n.d.). Sensor basics - Remote sensing data [Blog post]. Freie Universität Berlin. Retrieved May 19, 2025, from https://blogs.fu-berlin.de/reseda/sensor-basics/
Levin, N., Kyba, C. C., Zhang, Q., de Miguel, A. S., Román, M. O., Li, X., … & Elvidge, C. D. (2020). Remote sensing of night lights: A review and an outlook for the future. Remote Sensing of Environment, 237, 111443. https://doi.org/10.1016/j.rse.2019.111443
NASA Goddard Space Flight Center. (n.d.). NASA Black Marble. NASA. Retrieved May 14, 2025, from https://blackmarble.gsfc.nasa.gov/
Proville, J., Zavala-Araiza, D., & Wagner, G. (2017). Night-time lights: A global, long term look at links to socio-economic trends. PloS one, 12(3), e0174610. https://doi.org/10.1371/journal.pone.0174610