This study employs two different algorithms to detect land cover change: (1) a theme-based algorithm that identifies loss of estimated vegetation cover fraction and (2) a statistical signal-based algorithm that identifies spectral anomalies in all land covers. Both algorithms compare the current observation to observations from a baseline of the previous three years within the same calendar window (±15 days). The first algorithm flags observations with vegetation fraction less than the baseline minimum, and the second algorithm flags spectral reflectance outside of the previous distribution. These anomalies are then tracked in subsequent observations to build or decrease confidence. High confidence alerts are flagged as confirmed. Every HLS granule is downloaded and evaluated for anomalies; the time-series layers are then updated, and all layers are sent as a new DIST-ALERT granule to the Land Processes Distributed Active Archive Center (LP DAAC) of USGS and NASA, where they are publicly available (https://doi.org/10.5067/SNWG/OPERA_L3_DIST-ALERT-HLS_V1.001). DIST-ALERT has been run for all HLS data from 2023 to the present. Operational production of version 1 DIST-ALERT data began in March 2024, with a median delivery time to LP DAAC of less than 6 h from HLS availability. Events that began in 2022 and persisted into 2023 are captured, but with a later start date and shorter duration reported. This results in higher rates of initial detection of land cover change in early 2023 but does not affect DIST-ALERT data after 2023.
The area estimates for land cover change for 2023 are derived from a probability sample of reference data. The sample was stratified by land cover type as well as by the change detection labels within DIST-ALERT. The selected sample of 30 m pixels was labeled for change throughout a time series of 2023, with 2022 imagery evaluated to mark sampled pixels where the change event began in 2022. The driver and conversion status were also recorded per selected location through consensus interpretation. This time-series reference data was also employed to estimate the accuracy of DIST-ALERT for any date with updated map data in 2023. Additionally, metrics of latency and timeliness are quantified.
Input HLS data
To effectively map land cover change based on individual scenes, there must be radiometrically consistent inputs. The Harmonized Landsat Sentinel-2 (HLS) dataset is a conversion of raw Landsat 8/9 and Sentinel-2A/B/C data to a radiometrically consistent time series of surface reflectance at a standard 30 m with bi-directional reflectance distribution function correction23. Data of bands with matching spectral ranges from any of the five satellites can then be used interchangeably in mapping algorithms. The HLS data also include per-pixel quality flags based on the Fmask 4.2 algorithm (updated from ref. 40), used here to filter out cloud, cloud buffer, cloud shadow, and snow/ice contaminated observations (1 in any of bits 1–4 in Fmask data), leaving clear land and water observations suitable for change monitoring. Given that aerosols cause much greater scattering effects for shorter wavelengths, the coastal aerosol, blue, and green bands were excluded, along with the cirrus band, to allow for more stable alerting. The red, NIR, SWIR1.6, and SWIR2.1 bands were then used throughout this study as they are the remaining bands that are present in both Landsat 8/9 and Sentinel-2 data.
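The quality filter described above can be sketched as a bit test. This is a minimal illustration, assuming the Fmask value is available as an integer per pixel; the names `CONTAMINATION_MASK`, `is_clear`, and `filter_clear` are hypothetical and not part of the DIST-ALERT code.

```python
# Bits 1-4 of the Fmask value flag cloud, cloud buffer, cloud shadow,
# and snow/ice (per the text above). Bit 0 and bits 5-7 are not tested here.
CONTAMINATION_MASK = 0b00011110


def is_clear(fmask_value: int) -> bool:
    """Return True when none of bits 1-4 are set (clear land or water)."""
    return (fmask_value & CONTAMINATION_MASK) == 0


def filter_clear(observations):
    """Keep only (fmask, reflectance) pairs whose Fmask value passes is_clear."""
    return [obs for obs in observations if is_clear(obs[0])]


# Hypothetical example: the second observation has bit 1 set and is dropped.
kept = filter_clear([(0, "obs_a"), (0b10, "obs_b"), (0b100000, "obs_c")])
```

An observation is rejected as soon as any one of the four bits is set, which matches the "1 in any of bits 1–4" rule.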
Baseline
To test for anomalies in the current observation, it is compared against observations from a rolling baseline period of the previous three years within 15 calendar days of the current observation (e.g., for April 10, 2023 this is all observations March 26 through April 25 of 2020, 2021, and 2022, and for April 10, 2025 this would be all observations March 26 through April 25 of 2022, 2023, and 2024). Due to historical data availability during the development of DIST-ALERT, a baseline period of three calendar years was selected as it was the maximum duration of HLS data with combined Landsat and Sentinel-2 coverage (since that time, the whole Sentinel-2 data record has been integrated into HLS). Landsat 9 was included in HLS data beginning in 2022, resulting in more observations within the baseline for 2022 forward.
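The rolling baseline windows in the example above can be reproduced with simple date arithmetic. The sketch below is illustrative only: the function name `baseline_windows` is hypothetical, and the handling of 29 February (mapped to 28 February in non-leap years) is an assumption that the text does not specify.

```python
from datetime import date, timedelta


def baseline_windows(current: date, years: int = 3, half_width_days: int = 15):
    """Return (start, end) date pairs for each baseline year, oldest first."""
    delta = timedelta(days=half_width_days)
    windows = []
    for offset in range(years, 0, -1):
        year = current.year - offset
        try:
            center = current.replace(year=year)
        except ValueError:
            # Assumption: 29 February falls back to 28 February.
            center = date(year, 2, 28)
        windows.append((center - delta, center + delta))
    return windows


windows_2023 = baseline_windows(date(2023, 4, 10))
```

For April 10, 2023 this yields March 26 through April 25 of 2020, 2021, and 2022, as in the text.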
The number of years within the baseline can affect the types of land cover change that are detected. A greater number of years will include more natural variations as well as more anthropogenic disturbances within the baseline. Land cover change events within the baseline period expand the range of variability within the set of baseline observations and cause affected locations to be less likely to be alerted within the current year. If the baseline period is long enough, second change events will be suppressed, regardless of whether the change is abrupt, such as forest clear cuts, or gradual, such as droughts. If the baseline period is very short, all variations will be flagged, even those well within the range of normal fluctuations. The selected three-year period gives relatively high sensitivity to natural variations and disturbance events, whereas a 10-year baseline product would only highlight rare or more anomalous events, such as decadal-scale droughts or primary forest loss. A longer baseline product would be a valuable future addition alongside the current product.
The number of calendar days included in the baseline affects the sensitivity to shifts of the phenological cycles, with fewer days tied to greater sensitivity, and is reliant on data availability. With the goal of maximizing utility to land managers and protectors, such as forest rangers, a baseline window of ±15 days was selected to maximize monitored area while balancing sensitivity and significance of events. A longer window for acquisitions provides more opportunities for observations uncontaminated by cloud, shadow, or snow/ice, and more observations contribute to a more representative set of normal land cover conditions. Leveraging the stratified sample (see below), we estimate that a calendar window of ±5 days would have, on average, 18% fewer current year observations with at least four observations within the baseline (our requirement for calculating the vegetation anomaly) and 47% fewer with at least seven observations within the baseline (our requirement for calculating the spectral anomaly). For a window of ±10 days, there are 4% and 9% fewer observations with a sufficient baseline for the vegetation and spectral anomaly calculations, respectively. Increasing the window to ±20 days or ±30 days provides gains of 2% for the vegetation anomaly calculation and 3% and 6% for the spectral anomaly calculation, respectively. For both applications, we achieve relative stability in reference data richness at ±15 days.
Vegetation fraction
Vegetation cover is defined as the fraction of the ground surface that is blocked by photosynthetically active vegetation when viewed from above, or inversely, the fraction of skylight intercepted by photosynthetically active vegetation within a unit area41. This includes all vegetation over the land or water surface, both woody and herbaceous, as has been done for other vegetation continuous field products42,43.
As it is very difficult to ascertain the precise vegetation cover of a pixel using multi-meter resolution satellite data, we developed training data from very high-resolution multi-spectral unoccupied aerial vehicle (UAV) data. A WingtraOne GEN II UAV equipped with a MicaSense RedEdge-MX camera was flown at about 120 m to collect 8–10 cm data. The sensor has five reflectance bands: blue (475 nm), green (560 nm), red (668 nm), red edge (717 nm), and near infrared (842 nm). The training data for the vegetation fraction model were collected iteratively over 232 flights in Senegal, the Republic of Congo, and 17 different states of the United States (Supplementary Fig. 2). Each flight was composed of dozens of flight lines and mosaicked from thousands of individual images taken by UAV and had an average area of 24 hectares after processing. This data was collected iteratively by assessing spectral plots of training data and filling in regions of sparse data with additional flights. The resulting data cover a wide range of ecosystems, soil types, and land uses in different seasons including: desert, leaf-on and leaf-off arid shrubland, grassland, marsh, forested wetland, evergreen and deciduous temperate forest during various stages from leaf-off to full leaf-out, before and after selective logging events within tropical rainforest, suburban neighborhoods, and various crop types across phenology stages from bare to planted to harvested.
For each of these high-resolution pixels, a simple linear translation of the NDVI to vegetation fraction was applied for the range of 0.10 to 0.80, adapted from previous studies44,45,46. At 8–10 cm, there is a high degree of uniformity in the land cover that allows for a simple scaling of NDVI. However, we do not apply this same translation method directly to 30 m HLS pixels because there can be a wide array of land covers within a single pixel (e.g., grass, bare soil, concrete, asphalt, and a tree could all be within a single 30 m pixel), and NDVI can behave non-linearly with mixed land cover44. Instead, the data from these UAV flights were matched with near-coincident clear-sky HLS data, with each HLS image interpreted visually to avoid cloud or haze. For each 30 m pixel footprint (aligned with HLS data) that had complete UAV data coverage, the >90,000 high-resolution pixels of UAV-based vegetation fraction were averaged to estimate the vegetation fraction at the 30 m scale to employ as training (Supplementary Fig. 3).
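The two steps above, linear NDVI scaling at UAV resolution and averaging to the 30 m footprint, can be sketched as follows. This is an illustration under stated assumptions: values outside the 0.10 to 0.80 range are clamped to 0 and 1, the output is expressed in percent, and the function names are hypothetical.

```python
def ndvi_to_fraction(ndvi: float, low: float = 0.10, high: float = 0.80) -> float:
    """Linearly scale NDVI to a 0-1 vegetation fraction (clamping is assumed)."""
    scaled = (ndvi - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def pixel_vegetation_percent(uav_ndvi_values) -> float:
    """Average the UAV-scale fractions inside one 30 m footprint, in percent."""
    fractions = [ndvi_to_fraction(v) for v in uav_ndvi_values]
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical footprint: half bare (NDVI 0.0), half dense canopy (NDVI 0.9).
mixed_pixel = pixel_vegetation_percent([0.0, 0.9])
```

Averaging the scaled fractions, rather than scaling the averaged NDVI, is what sidesteps the non-linear behavior of NDVI over mixed land cover.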
A K-Nearest Neighbor (KNN) model47 was then built based on the training set of 30 m UAV-derived vegetation fraction pixels with near-coincident data (Supplementary Fig. 3). To acquire representative training covering various land cover/use types, UAV data were collected iteratively at different seasons through 2022–2023 to fill out the HLS reflectance feature space. A principal component analysis (PCA) transformation of the HLS reflectance values of the set of all training pixels was applied to equally weight variance along multiple axes, to have the K-nearest neighbors be distributed relative to the distribution of the whole dataset, and to reduce dimensionality to the top three principal components. A KNN model with K set to 100 was selected to facilitate a smooth gradation of vegetation fractions across the reflectance space without large increases or decreases of vegetation fraction relative to a small reflectance change, as can happen with thresholding-based decision tree models such as Random Forest. This fixed global PCA transformation and model are then applied to all HLS pixels that are identified as clear land or water, and the percent vegetation cover is recorded in the layer VEG-IND. Given our current training data, the algorithm may contain biases for certain land conditions. However, the model will yield consistent estimates for given spectral reflectances, allowing for change to be mapped, with the exception of cases where the estimated pre-change vegetation fraction is incorrectly underestimated.
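The smoothing effect of a large K can be illustrated with a small KNN regression sketch. It assumes the features have already been projected onto the principal components, that distance is Euclidean, and that the prediction is the unweighted mean of the K nearest training values; none of these details are stated in the text, and `knn_predict` is a hypothetical name.

```python
import heapq
import math


def knn_predict(query, train_features, train_targets, k: int = 100) -> float:
    """Predict vegetation cover as the mean target of the k nearest samples."""
    k = min(k, len(train_features))
    nearest = heapq.nsmallest(
        k,
        zip(train_features, train_targets),
        key=lambda pair: math.dist(query, pair[0]),
    )
    return sum(target for _, target in nearest) / k


# Hypothetical training set in a 3-component feature space.
features = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
targets = [10.0, 20.0, 90.0]
estimate = knn_predict((0.4, 0.0, 0.0), features, targets, k=2)
```

Because each estimate averages many neighbors, a small shift in reflectance changes only a few members of the neighbor set, so the prediction moves gradually rather than jumping at a threshold.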
Vegetation loss anomaly
To alert for anomalously low vegetation cover, the vegetation cover must be estimated for the current observation and compared against estimates from the rolling baseline period. The set of baseline vegetation fraction estimates can be aggregated in a number of ways, including the minimum, mean, median, etc. For the vegetation loss algorithm of DIST-ALERT, we have selected the minimum vegetation fraction as the baseline value so that only vegetation cover estimates below the range of what had been previously detected are flagged. Thus, the vegetation cover anomaly is calculated as the current vegetation cover estimate subtracted from the baseline minimum, and only low cover anomaly values are recorded, with the layer VEG-ANOM ranging from 0 to 100% vegetation loss relative to the baseline (Supplementary Fig. 4). This means that if there was a vegetation loss event within the baseline period for a given location, that location can only be flagged as vegetation loss in the current DIST-ALERT if the current vegetation fraction estimate is less than it was during that previous event.
At least four baseline clear-sky observations are set as the requirement to calculate the vegetation anomaly, but often there are far more available. However, in cloud-dominated regions, there may be extended periods without four uncontaminated ground observations, in spite of typically having 23–46 HLS observations (before cloud and shadow masking) within the ±15-day baseline for the tropics and with many more potential observations at higher latitudes. To better monitor these cloud-dominated areas throughout the year, the estimate with the minimum percent vegetation cover in the previous three calendar years is selected, avoiding observations flagged as high aerosol in Fmask. If this three-year minimum is ≥85% vegetation cover, then even when there are fewer than the standard requirement of four vegetation cover estimates within the ±15-day window, we still allow the vegetation cover anomaly to be calculated, because the location has year-round high vegetation cover. In this case, the baseline value is set as the lesser of the annual minimum and any observations within that window. This allows for year-round monitoring in the humid tropical forests.
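The baseline rules of this section can be summarized in a short sketch. It is an illustration only: the function name and arguments are hypothetical, vegetation cover is assumed to be in percent, and returning `None` for an unusable baseline is a choice made here for clarity.

```python
def vegetation_anomaly(current, window_baseline, three_year_min=None,
                       min_obs=4, dense_cover=85.0):
    """Return vegetation loss in percent (0-100), or None without a usable baseline."""
    if len(window_baseline) >= min_obs:
        # Standard case: minimum of the estimates inside the +/-15-day window.
        baseline = min(window_baseline)
    elif three_year_min is not None and three_year_min >= dense_cover:
        # Sparse-data case for year-round dense vegetation: lesser of the
        # three-year minimum and any estimates that do fall in the window.
        baseline = min([three_year_min, *window_baseline])
    else:
        return None
    # Only losses are recorded; gains relative to the baseline map to 0.
    return max(0.0, baseline - current)


loss = vegetation_anomaly(40.0, [70.0, 65.0, 80.0, 90.0])
```

In the example the baseline minimum is 65%, so a current estimate of 40% gives a 25% loss, while any current estimate at or above 65% gives 0.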
Spectral anomaly
To account for land change unrelated to vegetation cover loss, we include a secondary algorithm that employs a spectral distance measure. Reflectance data from observations within the baseline period form a distribution of spectral variation. Most often, there is a larger range for some spectral bands compared to others, and reflectance values of a given spectral band are correlated with those of another. Thus, while various methods of measuring distance exist, the measure chosen was Mahalanobis distance48, as it accounts for covariance in the near-term historical range, unlike Euclidean distance. With this measure, the absolute distance from the mean is scaled by the standard deviation of the baseline in the direction of the change vector. The unit of the distance is the number of standard deviations away from the centroid of the baseline distribution.
Historical observations are included in the baseline based on the same temporal parameters as those used for the vegetation anomaly, and the same four reflectance bands as the vegetation fraction inputs are used: red, NIR, SWIR1.6, and SWIR2.1. Only observations uncontaminated by cloud, shadow, and snow/ice, as identified by the HLS Fmask layer, are included. Current HLS spectral signatures are compared to the normal historical range, and outliers are measured by the Mahalanobis distance from the mean. A minimum of seven historical clear ground observations is needed to calculate the baseline distribution. However, the distance calculation is more robust the more baseline observations are available. The scaled distance from the historical mean is recorded for all valid observations per HLS scene (layer GEN-ANOM). For targets with very stable land cover in the baseline period, the standard deviation in any direction is small, and observations with land cover change will have very high distance values. For targets with fluctuating cover within the baseline, such as some cropland, the distance values for change will be smaller.
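The distance calculation can be sketched in plain Python as below. This is a minimal illustration, assuming the sample covariance (divisor n − 1) and a non-singular covariance matrix; the function names are hypothetical, and the two-band data are invented for the example (the product uses four bands).

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mean):
    """Sample covariance matrix of the baseline observations."""
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular baseline covariance")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis(current, baseline, min_obs=7):
    """Distance of `current` from the baseline centroid, or None if too few obs."""
    if len(baseline) < min_obs:
        return None
    mean = mean_vector(baseline)
    diff = [c - m for c, m in zip(current, mean)]
    weighted = solve(covariance(baseline, mean), diff)
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


# Invented baseline: band 1 varies twice as much as band 2.
baseline = [(2, 0), (-2, 0), (0, 1), (0, -1)] * 2
along_variable_band = mahalanobis((4, 0), baseline)
along_stable_band = mahalanobis((0, 4), baseline)
```

The same offset of 4 reflectance units gives a distance of √7 ≈ 2.6 along the variable band and √28 ≈ 5.3 along the stable band, which is the behavior described for stable versus fluctuating targets.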
Time-series tracking
All anomalies are tracked through time to build or decrease confidence and provide additional contextual information, such as duration and magnitude. DIST-ALERT includes 19 layers, four of which are based only on the current granule (image): current vegetation fraction, current vegetation anomaly, current spectral anomaly, and a mask of pixels with valid data in the current granule (Supplementary Table 1). The other layers all build on the product labels in the most recent previous granule and are updated based on the anomaly values of the current granule. These time-series layers are in two sets, corresponding to the vegetation anomaly and the spectral anomaly, and are independent of one another but largely mirror one another in structure. A minimum of 10% vegetation loss or a minimum distance of 15 is required for inclusion in the time-series layers. The date of the initial anomaly detection, maximum anomaly value, count of anomaly detections, confidence, duration, and the date of the last observation with sufficient baseline are all recorded. The baseline vegetation fraction value of the date with the maximum vegetation anomaly is also provided.
Status layers (one each for vegetation loss and generic change) summarize the confidence of alerts, whether they are ongoing, and whether they are high or low magnitude events (≥50% vegetation loss or a distance ≥50 is considered high magnitude). Initial detections are labeled as ‘first’. If the subsequent valid observation is anomalous, then the alert moves to ‘provisional’; otherwise, the time-series layers are reset to no-disturbance values. A confidence value is computed as the mean of the anomaly values multiplied by the number of anomaly detections squared and is updated with each subsequent observation. Alerts that attain a confidence value of ≥400 are marked as ‘confirmed’. Once there are two consecutive non-anomalies (below the minimum threshold) or one non-anomaly ≥15 days after the last anomaly detection, confirmed alerts are moved to ‘finished’ and provisional alerts are reset to no-disturbance. The values of time-series layers of finished alerts are no longer updated with each HLS observation, but can be overwritten if there is a new anomaly detection. Alerts can persist in the product for a maximum of 365 days. This allows for multiple disturbance detections across years as well as within a single year.
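The status transitions can be sketched as a small state machine. This is a simplified illustration, not the production logic: it assumes that non-anomalous observations leave the confidence value unchanged, that an alert may reach ‘confirmed’ on any anomalous observation after the first, and it omits the 365-day limit. `Alert` and `update` are hypothetical names.

```python
from dataclasses import dataclass

MIN_ANOMALY = 10.0   # minimum vegetation loss (use 15 for spectral distance)
CONFIRM_AT = 400.0   # confidence needed for 'confirmed'
GAP_DAYS = 15        # a single non-anomaly this long after the last anomaly ends the alert


@dataclass
class Alert:
    status: str = "none"
    count: int = 0
    total: float = 0.0
    misses: int = 0
    last_anomaly_day: int = 0

    @property
    def confidence(self) -> float:
        """Mean anomaly value multiplied by the squared detection count."""
        if self.count == 0:
            return 0.0
        return (self.total / self.count) * self.count ** 2


def update(alert: Alert, day: int, anomaly: float) -> Alert:
    """Advance the alert state with one valid observation."""
    if anomaly >= MIN_ANOMALY:
        if alert.status in ("none", "finished"):
            return Alert("first", 1, anomaly, 0, day)  # new detection overwrites
        alert.count += 1
        alert.total += anomaly
        alert.misses = 0
        alert.last_anomaly_day = day
        if alert.confidence >= CONFIRM_AT:
            alert.status = "confirmed"
        elif alert.status == "first":
            alert.status = "provisional"
        return alert
    if alert.status in ("none", "finished"):
        return alert
    if alert.status == "first":
        return Alert()  # isolated detection: reset to no-disturbance
    alert.misses += 1
    if alert.misses >= 2 or day - alert.last_anomaly_day >= GAP_DAYS:
        if alert.status == "confirmed":
            alert.status = "finished"
            return alert
        return Alert()  # provisional alerts are reset
    return alert


alert = Alert()
history = []
for day, value in [(0, 50.0), (5, 50.0), (10, 50.0), (15, 0.0), (20, 0.0)]:
    alert = update(alert, day, value)
    history.append((alert.status, alert.confidence))
```

With three consecutive detections of 50% loss, the confidence grows as 50 × 1² = 50, 50 × 2² = 200, and 50 × 3² = 450, so the alert is confirmed on the third detection and finished after two consecutive non-anomalies.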
Annual summaries
To facilitate multi-year analyses and end-of-year reporting, annual summaries are produced for each calendar year in the product DIST-ANN (also available from LP DAAC). Since a location could have multiple alerts in a year, the per-pixel highest confidence alert that also reaches confirmed status within the given calendar year is included. This means that alerts that remain provisional are excluded from DIST-ANN, but the alerts initially detected late the previous year that were not confirmed until the current calendar year are included (an additional layer is provided to flag whether the initial alert detection was in the previous year). All the time-series layers of DIST-ALERT are included in DIST-ANN with values corresponding to the final update of each included alert. In addition, the maximum annual vegetation cover is calculated for all no-disturbance pixels. The minimum vegetation cover estimate is calculated for all pixels from the given and preceding two years (avoiding high aerosol observations). This three-year minimum becomes input into the next year’s anomaly detection for sparse data periods where there is year-round high vegetation cover (≥85% vegetation cover in this three-year minimum, see “Vegetation loss anomaly”).
Sample design
To estimate the area and attribute the drivers of various change dynamics in 2023 and to assess the accuracy of DIST-ALERT, we employed a probability sample assessment. Our evaluation period is the 2023 calendar year. DIST-ALERT was processed starting from January 1, 2023, so there was no start-up window, and events that began in 2022 were included. All the layers were produced for every HLS granule, resulting in a dense time series of DIST-ALERT granules. The DIST-ANN product was produced from all the 2023 DIST-ALERT data. Both products are assessed with the developed reference sample data.
As land cover change represents only a very small fraction of Earth’s land surface, using a stratified design is essential to target locations of probable change49. Our stratification combined change strata derived from DIST-ALERT and land cover/use strata derived from existing land cover/use maps30,31,50,51,52. The change strata were constructed hierarchically from the whole time series of the VEG-DIST-STATUS and GEN-DIST-STATUS layers of DIST-ALERT. First, all pixels with provisional or confirmed vegetation loss ≥50% for any date were combined into the high vegetation loss, high confidence stratum (Supplementary Fig. 5). Similarly, any pixels with provisional or confirmed vegetation loss <50% were included in the low vegetation loss, high confidence stratum. Then, any remaining pixels that have any dates with an isolated detection of vegetation loss ≥10% (‘first’ only) are included in the possible vegetation loss stratum. Then any pixels with provisional or confirmed spectral anomalies were included in the other change stratum. Finally, all valid unselected pixels were included in the no change stratum. Pixels without anomaly evaluations in the evaluation year (due to no valid current year observations or insufficient observations in the baseline) were excluded. Land cover/use strata were defined from a combination of existing data sources that were used to define the strata hierarchically (Table 1). Within each land cover/use stratum, the four change strata were assigned, and the no change stratum was combined between all the land cover/use strata (Supplementary Fig. 5).
The strata were created for each Military Grid Reference System (MGRS) tile and then aggregated into a strata map for each UTM zone. From each of the four change strata, fifty 30 m pixels were selected for each land cover/use stratum (with other and surface water initially combined as one stratum, and an additional pixel from the tree cover with previous disturbance, high vegetation loss stratum), plus 48 30 m pixels from the no change stratum, resulting in 1649 selected pixels in total. These pixels were selected following existing methods designed to ensure that every point within each stratum had an equal initial inclusion probability53. This protocol yielded an equal inclusion probability stratified random sample of equal area pixels (30 × 30 m) representing the global land surface (excluding Antarctica) (Supplementary Fig. 5). The UTM zone strata maps were converted to geographic coordinates and globally mosaicked to calculate strata areas.
Reference data
For each of the selected sample pixels, a dense time series of disturbance labels was created using information from HLS, monthly true color PlanetScope composites created by Planet Labs, and very high-resolution images from Google Earth. The first cloud and shadow-free HLS image, or if none were clear, just the first HLS image when available, was selected for each 5-day interval within 2023 and for the respective intervals of each year within the three-year baseline. These images were displayed in false color (SWIR1-NIR-Red and NIR-SWIR1-SWIR2) to better highlight vegetation. All these data sources were compiled into an HTML page for each selected pixel that was used for interpretation and labeling (Supplementary Fig. 6). To best understand the context and land cover dynamic of each selected pixel, individual interpreters first evaluated in concert with one another (a) a spectral time-series plot of HLS data for the selected pixel, (b) very high-resolution images in Google Earth, and (c) the monthly Planet true color composites for every month through the entire baseline and evaluation period (2020–2023). Then each interpreter separately assessed a time series of HLS images and labeled the land surface disturbance status for each 5-day interval with viable data.
For each of the 73 five-day intervals of the evaluation year, the current image was compared to the images from the preceding, current, and following 5-day intervals within the previous three years. If there was discernible vegetation loss compared to any of the three baseline five-day intervals, it was assessed as ‘majority vegetation loss’ (≥50%) or ‘minority vegetation loss’ (>0% and <50%) based on the estimated drop in vegetation cover in the 30 m pixel, regardless of whether the loss was dispersed over the entire pixel or just affecting a portion of the pixel. If there was no vegetation loss, but there was other land change, then it was similarly labeled ‘majority other change’ (≥50%) or ‘minority other change’ (>0% and <50%). If there was no discernible change, it was marked as ‘no change’. To aid in correctly determining the date of change, periods with missing or obscured observations directly preceding identifiable change were recorded as ‘no observation’. Finally, after evaluating all the data sources and labeling the time series, each interpreter labeled (a) the overall type/driver of land change dynamic (including ‘no change’), (b) whether it was a human-caused conversion, naturally caused conversion, or no conversion, and (c) whether the change began in the previous year. For a list of the observed dynamics, see Fig. 3. If multiple dynamics were observed, such as natural browning followed by clear-cut logging, the dominant change factor was recorded. However, if there was natural browning and natural greening at different times of the year, it was recorded as natural browning and natural greening.
All selected pixels were evaluated by two interpreters. Where there was disagreement in any of the overall labels (a)–(c) or a difference in the overall presence/absence or magnitude of vegetation loss, the pixels were evaluated collectively in a consensus approach. This final time series of 5-day labels was interpolated to derive reference labels for each date, with gaps up to 15 days filled, else left as ‘no clear observation’. The final set of pixels with conversion was assessed for removal of or damage to natural vegetation. When natural vegetation was observed it was marked as ‘long-lived’ for vegetation with no signs of a different previous land cover within the Landsat and Google Earth record (1985–1998 forward, depending on the location), ‘long-lived secondary’ when there was evidence of previous conversion but with mature vegetation natural to the ecosystem established before 2023, or ‘scrub’ for immature vegetation growing up on idle land following previous land use or conversion. Long-lived vegetation was combined with long-lived secondary vegetation for all of the reported analyses, as both had mature, established vegetation prior to 2023.
Area estimation
Areas were estimated for each of the labeled change dynamic types and conversion types from the reference sample based on the presence or absence of the target class within each selected sample 30 m pixel. No sampled pixel can be double-counted, and each sampled pixel can only have one change dynamic label. All non-conversion changes were required to have a duration of ≥10 days. For area estimation, we define an indicator function \(y_u\), where \(y_u=1\) if the sample unit \(u\) is of the target class and \(y_u=0\) otherwise. Area is then estimated by (Eq. (1) of ref. 53 adapted from ref. 54):
$$\hat{A}=\sum_{h=1}^{H}A_{h}\,\bar{y}_{h} \tag{1}$$
where \(H\) is the number of strata; \(A_h\) is the area of stratum \(h\); \(\bar{y}_h\) is the proportion of stratum \(h\) that is of the target class, estimated from the sample: \(\bar{y}_h=\sum_{u\in h}y_u/n_h\), where \(n_h\) is the number of sampled pixels in stratum \(h\). The estimator of the standard error (SE) is:
$$\mathrm{SE}\left(\hat{A}\right)=A\sqrt{\frac{1}{N^{2}}\sum_{h=1}^{H}N_{h}^{2}\left(1-\frac{n_{h}}{N_{h}}\right)\frac{s_{yh}^{2}}{n_{h}}} \tag{2}$$
where \(A\) is the total area of all strata; \(N\) is the total number of pixels in all strata; \(N_h\) is the number of pixels in stratum \(h\); and \(s_{yh}^{2}\) is the sample variance of \(y_u\) for stratum \(h\): \(s_{yh}^{2}=\sum_{u\in h}(y_u-\bar{y}_h)^{2}/(n_h-1)\) (equations (25) and (26) of ref. 54).
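Equations (1) and (2) can be evaluated with a few lines of code. The sketch below is illustrative: it assumes that \(A\) and \(N\) are the sums of the stratum areas and pixel counts, and the dictionary keys, function name, and example values are hypothetical.

```python
import math


def stratified_area(strata):
    """Return (area estimate, standard error) from per-stratum samples.

    Each stratum is a dict with 'area' (A_h), 'n_pixels' (N_h), and 'y'
    (the 0/1 indicator values of the sampled pixels).
    """
    total_area = sum(s["area"] for s in strata)        # assumed A
    total_pixels = sum(s["n_pixels"] for s in strata)  # assumed N
    area_hat = 0.0
    variance = 0.0
    for s in strata:
        y, big_n = s["y"], s["n_pixels"]
        n_h = len(y)
        y_bar = sum(y) / n_h
        s2_y = sum((v - y_bar) ** 2 for v in y) / (n_h - 1)
        area_hat += s["area"] * y_bar                              # Eq. (1)
        variance += big_n ** 2 * (1 - n_h / big_n) * s2_y / n_h    # Eq. (2) sum
    se = total_area * math.sqrt(variance / total_pixels ** 2)
    return area_hat, se


# Invented example: a small change stratum and a large no-change stratum.
area_hat, se = stratified_area([
    {"area": 100.0, "n_pixels": 1000, "y": [1, 1, 0, 0]},
    {"area": 900.0, "n_pixels": 9000, "y": [0, 0, 0, 0]},
])
```

Here half of the change stratum sample is of the target class, giving an estimated area of 50 units; the no-change stratum contributes nothing to the variance because its sample has no variation.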
Accuracy estimation
User’s and producer’s accuracies were estimated for the DIST-ALERT vegetation disturbance status layer (VEG-DIST-STATUS). The accuracy measures estimate the accuracy of the vegetation disturbance product available at any given date within 2023. As the generic change is measured by the Mahalanobis spectral distance, it is a statistical measure of the reflectance received by the sensors. A conventional accuracy assessment applicable to thematic classes does not apply to validating such a statistical measure as the reported distances. The harmonization of spectral inputs of the HLS time series ensures consistency of the generic change algorithm, though in both algorithms, anomalies can be due to unlabeled cloud or other atmospheric effects in the Fmask layer and improvement of cloud masking would remove many single-detection false positives. However, as we assign the thematic class of change to larger distances, we validate that classification.
Each available date within the DIST-ALERT time series with a valid current observation was compared to the reference time series. This results in a one-stage cluster sample55 where all available dates for a selected pixel form a temporal cluster53. Change labels must persist for 10 days in the reference data to qualify as a change for the validation. This translates to a minimum visible duration of roughly 15 days, given the comparison of adjacent intervals during reference sample labeling and the variability of the date of HLS within a 5-day interval, corresponding to the ±15-day baseline window of DIST-ALERT. The product can have states of no-alert, ongoing-alert (first, provisional, or confirmed), or finished-alert. No-alert and ongoing-alert states are compared against the reference data of the previous 30 days, plus nine subsequent dates to allow for the required 10-day duration for change presence/absence. Thirty days were selected to account for the current state and allow for variability of no-data labels and the persistence of the DIST-ALERT ongoing-alert label until there are two consecutive no-anomaly observations or one ≥15 days after the last anomaly detection. Finished alerts are compared against the entire preceding reference time series plus nine days.
To evaluate the performance of DIST-ALERT, we aggregated the map and reference labels into ‘high-change’, ‘low-change’, and ‘no-change’ labels, separately for the vegetation loss alerts and the generic change alerts. The performance for high magnitude vegetation loss events was evaluated by comparing alerts with ≥50% loss with any loss in the reference and by comparing majority loss in the reference with any loss alerts. Similarly, the performance of the system for low magnitude events was evaluated by comparing alerts with <50% loss with any loss in the reference and by comparing minority loss in the reference with any loss alerts. The performance for low and high magnitude events combined was evaluated by comparing all alerts against all reference loss. For the generic change alerts, these bins are low and high magnitude change in the status layer (GEN-DIST-STATUS), and other change and vegetation loss are combined in the reference. We performed these assessments for all alerts of provisional status and above, and separately for confirmed alerts only. When computing the accuracy of alerts with confidence levels of provisional and confirmed, dates with map labels of ‘first’ were excluded. Because all higher confidence labels start with ‘first’ before progressing to ‘provisional’ and possibly ‘confirmed’, these dates should not be considered ‘no-change’, but they should also not be considered ‘change’ when assessing higher confidence labels. Similarly, ‘provisional’ and ‘first’ were excluded for the assessment of confirmed alerts.
To calculate user’s and producer’s accuracies, we employed indicator functions54 \(y_u\) and \(x_u\), which are defined per statistic and then inserted into the same pair of general formulas (equations (27) and (28) of ref. 54). Depending on the definition of \(y_u\) and \(x_u\), user’s or producer’s accuracies can be estimated by \(\hat{R}\):
$$\hat{R}=\frac{{\sum }_{h=1}^{H}{N}_{h}{\bar{y}}_{h}}{{\sum }_{h=1}^{H}{N}_{h}{\bar{x}}_{h}}$$
(3)
where \(H\) and \({N}_{h}\) are defined as above; \({\bar{y}}_{h}\) and \({\bar{x}}_{h}\) are the stratum-specific sample means of \({y}_{u}\) and \({x}_{u}\) (ref. 56, section 6.11). The variance estimator used to estimate the standard error (SE) is:
$${\rm{SE}}\left(\hat{R}\right)=\sqrt{\frac{1}{{\left({\sum }_{h=1}^{H}{N}_{h}{\bar{x}}_{h}\right)}^{2}}{\sum }_{h=1}^{H}{N}_{h}^{2}\left(1-\frac{{n}_{h}}{{N}_{h}}\right)\frac{{s}_{yh}^{2}+{\hat{R}}^{2}{s}_{xh}^{2}-2\hat{R}{s}_{xyh}}{{n}_{h}}}$$
(4)
where \({s}_{yh}^{2}={\sum }_{u\in h}{({y}_{u}-{\bar{y}}_{h})}^{2}/({n}_{h}-1)\) and \({s}_{xh}^{2}={\sum }_{u\in h}{({x}_{u}-{\bar{x}}_{h})}^{2}/({n}_{h}-1)\) are the stratum-specific sample variances, and \({s}_{xyh}={\sum }_{u\in h}({y}_{u}-{\bar{y}}_{h})({x}_{u}-{\bar{x}}_{h})/({n}_{h}-1)\) is the stratum-specific sample covariance.
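As a worked illustration of equations (3) and (4), the following sketch computes the ratio estimate and its standard error from per-stratum samples. The function name `ratio_estimate`, the dictionary input layout, and the example numbers are assumptions made for illustration; the sketch also assumes at least two sampled pixels per stratum so that the sample variances are defined.

```python
import math
from typing import Dict, List, Tuple


def ratio_estimate(
    strata: Dict[str, Tuple[float, List[float], List[float]]]
) -> Tuple[float, float]:
    """Stratified ratio estimator R_hat (eq. 3) and its SE (eq. 4).

    `strata` maps a stratum name to (N_h, y values, x values), where the
    two lists hold y_u and x_u for the n_h sampled pixels of that stratum.
    """
    numerator = 0.0
    denominator = 0.0
    per_stratum = []
    for n_pop, y, x in strata.values():
        n_h = len(y)
        y_bar = sum(y) / n_h
        x_bar = sum(x) / n_h
        numerator += n_pop * y_bar
        denominator += n_pop * x_bar
        s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n_h - 1)
        s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n_h - 1)
        s_xy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (n_h - 1)
        per_stratum.append((n_pop, n_h, s2_y, s2_x, s_xy))

    r_hat = numerator / denominator
    total = 0.0
    for n_pop, n_h, s2_y, s2_x, s_xy in per_stratum:
        residual_var = s2_y + r_hat ** 2 * s2_x - 2 * r_hat * s_xy
        total += n_pop ** 2 * (1 - n_h / n_pop) * residual_var / n_h
    # Guard against tiny negative values caused by floating-point rounding.
    se = math.sqrt(max(total, 0.0) / denominator ** 2)
    return r_hat, se


# Hypothetical two-stratum example.
example = {
    "a": (100, [1, 0, 1, 1], [1, 1, 1, 1]),
    "b": (300, [0, 0, 1, 0], [1, 0, 1, 0]),
}
r_hat, se = ratio_estimate(example)
```

Here the numerator is 100 × 0.75 + 300 × 0.25 = 150 and the denominator is 100 × 1 + 300 × 0.5 = 250, so the estimated ratio is 0.6.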
For user’s accuracy, a measure of commission, \({x}_{u}\) equals the proportion of all dates with valid reference and map data for a pixel \(u\) where the map label was the target level of change; \({y}_{u}\) equals the proportion of all valid dates where the map labels were the target level of change and the reference was either ‘low-change’ or ‘high-change’. For producer’s accuracy, a measure of omission, \({x}_{u}\) equals the proportion of valid dates where the reference label was the target level of change; and \({y}_{u}\) equals the proportion of all valid dates where the reference labels were the target level of change and the map was either ‘low-change’ or ‘high-change’. Since all dates of DIST-ALERT with both a viable observation and viable reference data are used, there is no additional sampling performed, and thus no additional variance is contributed by the temporal cluster. For DIST-ANN, the reference state is calculated from all dates, and \({y}_{u}\) and \({x}_{u}\) are binary 1 or 0 based on the aggregate labels employing the same criteria as for DIST-ALERT.
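The per-pixel quantities described above can be sketched for a single temporal cluster. The function name `pixel_indicators`, the use of `None` to mark dates without valid data, and the example labels are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

CHANGE = {"low-change", "high-change"}


def pixel_indicators(
    map_labels: List[Optional[str]],
    ref_labels: List[Optional[str]],
    target: str,
    statistic: str,
) -> Tuple[float, float]:
    """Return (y_u, x_u) for one pixel from its date-level labels.

    Only dates where both map and reference labels are valid (not None)
    are counted. `statistic` is 'users' or 'producers'.
    """
    pairs = [(m, r) for m, r in zip(map_labels, ref_labels)
             if m is not None and r is not None]
    if not pairs:
        return 0.0, 0.0
    n_valid = len(pairs)
    if statistic == "users":
        x = sum(m == target for m, _ in pairs)
        y = sum(m == target and r in CHANGE for m, r in pairs)
    elif statistic == "producers":
        x = sum(r == target for _, r in pairs)
        y = sum(r == target and m in CHANGE for m, r in pairs)
    else:
        raise ValueError("statistic must be 'users' or 'producers'")
    return y / n_valid, x / n_valid


# Hypothetical pixel with four dates, one of which has no valid map label.
map_series = ["no-change", "high-change", "high-change", None]
ref_series = ["no-change", "no-change", "low-change", "high-change"]
y_user, x_user = pixel_indicators(map_series, ref_series, "high-change", "users")
```

In this example three dates are valid; the map shows ‘high-change’ on two of them and the reference agrees that change occurred on one, giving \({x}_{u}=2/3\) and \({y}_{u}=1/3\) for user’s accuracy.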
Latency of alerts
For an alert system to be useful, it must be both timely and accurate. However, there is a trade-off between the two, with longer time lags in alert labeling typically allowing greater accuracy due to more post-event observations. Further, there can be a trade-off between omission and commission errors based on the detection algorithm, confidence measure mechanism, and the lag. It is application and user-specific which characteristics are most important.
There have been several approaches to quantifying timeliness, with some tied to accuracy measures (refs. 57, 58) or to the mean detection lag for correctly identified alerts (ref. 32). Here, following the framework of ref. 57, we plot omission errors as a function of time since the actual time of disturbance as determined from the reference date, and commission errors as a function of time since initial detection of possible disturbance (Fig. 4). Given the 2–3 day lag of DIST-ALERT generation due to the lag of source HLS generation and to alert production and delivery (median <6 h from HLS availability), three days were added to the detection date for quantifying omission error (producer’s accuracy/recall) to account for production lag in addition to lag due to observation timing (from satellite overpass schedule and cloud cover) and algorithm performance. This temporal quantification of accuracy gives users guidance on the utility of the product for various time steps since the event occurrence, e.g., the likelihood of it being mapped within 3 days, a week, or a month after the disturbance occurred. It also provides estimates of the false positive rate for mapped alerts at different time steps after initial flagging of a disturbance, e.g., after one month, commission rates are much lower than after two days. Users can obtain the last observed date from DIST-ALERT to assess how many days since the expected event or since initial detection are actually accounted for based on when the last clear ground observation was obtained.
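The idea of omission error as a function of time since disturbance can be sketched with a simplified calculation. The sketch below uses unweighted sample fractions rather than the stratified estimators used for the reported accuracies, and the function name, the constant `PRODUCTION_LAG_DAYS`, and the example lags are assumptions made for illustration.

```python
from typing import List, Optional

# Days added to the detection date to account for production lag.
PRODUCTION_LAG_DAYS = 3


def omission_rate_by_day(lags: List[Optional[int]],
                         horizons: List[int]) -> List[float]:
    """Fraction of reference events not yet mapped at each time horizon.

    `lags` holds, per reference event, the days between the reference
    disturbance date and the detection date (None if never detected).
    """
    n_events = len(lags)
    rates = []
    for t in horizons:
        detected = sum(
            1 for lag in lags
            if lag is not None and lag + PRODUCTION_LAG_DAYS <= t
        )
        rates.append(1 - detected / n_events)
    return rates


# Hypothetical events: detected after 0, 4, and 10 days, and one never detected.
rates = omission_rate_by_day([0, 4, 10, None], horizons=[3, 7, 30])
```

With these four hypothetical events, one is available to users within 3 days, two within a week, and three within a month, so the unweighted omission rate falls from 0.75 to 0.5 to 0.25 across the three horizons.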
We also calculated the mean detection lag of the date label in DIST-ALERT for correctly identified alerts (initial mapped date – initial reference date; ref. 32) through the summation of the mean lag per stratum multiplied by the relative stratum area. The initial mapped date comes from the product and corresponds to the first detection of an anomaly. The initial reference date is taken from the raw reference time series prior to interpolation, and then assessed both with and without adjustment for no-observation gaps. When adjusting the reference date, the initial reference date was adjusted to halfway between the date with the most recent prior no-change label and the date with the first change label, if there was a span of dates with no observation (ref. 32).
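Both steps, the midpoint adjustment of the reference date and the area-weighted mean lag, can be sketched briefly. The function names, the input layout, and the example values are assumptions made for illustration, and the sketch assumes every stratum contains at least one correctly identified alert.

```python
from datetime import date
from typing import Dict, List, Tuple


def adjusted_reference_date(last_no_change: date, first_change: date) -> date:
    """Midpoint between the last no-change label and the first change label.

    Adding a timedelta to a date uses only whole days, so odd-length gaps
    round down to the earlier day.
    """
    return last_no_change + (first_change - last_no_change) / 2


def mean_detection_lag(strata: Dict[str, Tuple[float, List[float]]]) -> float:
    """Sum over strata of (relative stratum area) x (mean lag in days).

    `strata` maps a stratum name to (area, lags), where each lag is
    initial mapped date minus initial reference date, in days.
    """
    total_area = sum(area for area, _ in strata.values())
    return sum(
        (area / total_area) * (sum(lags) / len(lags))
        for area, lags in strata.values()
    )


# Hypothetical observation gap from 1 March to 11 March 2023.
midpoint = adjusted_reference_date(date(2023, 3, 1), date(2023, 3, 11))

# Hypothetical strata: areas 100 and 300 with mean lags of 3 and 10 days.
lag = mean_detection_lag({"a": (100.0, [2, 4]), "b": (300.0, [10])})
```

In this example the adjusted reference date is 6 March 2023, and the weighted mean lag is 0.25 × 3 + 0.75 × 10 = 8.25 days.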