Data source
We used food purchase data from the Nielsen Homescan Consumer Panel dataset as a proxy for the food consumption of the households in the sample. The dataset tracks product purchase records from 40,000–60,000 US households, with the sample size varying by year. The same households are tracked for multiple years before replacement and thus constitute a longitudinal panel. The purchase records contain detailed transaction information, including the Universal Product Code, product name, quantity, price and promotions of each product purchased, as well as the date of the trip and the county of the store. The food items cover all retail outlets across the USA except Alaska and Hawaii, although food consumed at restaurants is not included. The dataset also records the demographic and socio-economic characteristics of the households, such as the income, education and occupation of the male and female heads; the age, sex and ethnic group of each member; and the county of residence. The panel is nationally representative once the sampling weights are applied. The dataset is available from 2004 to 2023; we excluded data after 2019 to avoid confounding effects from COVID-19. Although purchased foods can be stored rather than consumed within a short period, this issue is partially mitigated in our analysis because (1) we focus on monthly food consumption; (2) most products with high added sugar content, such as sweetened beverages, are typically consumed shortly after purchase, unlike items such as flour that are often stored for extended periods before use; and (3) even for items that are likely to be stored, such as sugar, our analysis provides conservative estimates (because the purchase of sugar is less immediately responsive to changes in temperature).
We consider all food items to retrieve the added sugar consumption of each household. Unfortunately, nutrition facts are not directly available in the Nielsen dataset, and looking them up by product name is not feasible because only abbreviated names are available, apart from the 716 product modules, which describe the nature of the products (sweetened beverage, cake, ice cream and so on) and special characteristics (low sugar, zero sugar and so on). Therefore, we matched the products with the items in the Food and Nutrient Database for Dietary Studies (FNDDS)43, which is updated every 2 years and describes the amounts of nutrients/food components in the most common foods and beverages consumed by Americans. Since the majority of food items stay the same across versions, we adopted the 2017–2018 version for matching. We conducted multiple rounds of manual matching against the descriptions of more than 6,000 food items in the FNDDS, after which we randomly sampled Nielsen products and manually inspected their matches to ensure that they best reflect the nutrition facts of the products. For some food ingredients that are not included in the FNDDS, such as cake flour and beverage powders, we completed the matching using the Food Patterns Equivalents Ingredients Database (FPID), an equivalent ingredient database developed from the multi-ingredient foods in the FNDDS. To show the response by product category, we used the What We Eat in America (WWEIA) categories, which provide a concise division of food items with nutritional implications and are directly available for every food item in the FNDDS. For the FPID-matched items, we assigned the WWEIA category manually. The full mapping between Nielsen products and WWEIA categories is available in Supplementary Table 1.
Note that such a method can only quantify added sugar consumption rather than actual intake, a limitation that is examined in the ‘Discussion’ section.
The ground-station-level meteorological records are from the Global Surface Summary of the Day (GSOD)44, which provides daily meteorological records, including atmospheric temperature, precipitation, wind speed, humidity and so on. We matched each household with all the climate stations within a 100 km buffer of the county where it is located, following previous research21. The spatial distributions of the stations and of the counties for which data are available are shown in Supplementary Fig. 1. The records of the matched climate stations were then averaged to indicate the meteorological conditions that the household experienced on a given day.
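The station matching and averaging described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes that each county is represented by a single point (for example, its centroid) instead of a polygon buffer, and the function names `haversine_km` and `county_daily_mean` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def county_daily_mean(county_latlon, stations, radius_km=100.0):
    """Average one day's readings over the stations near a county.

    county_latlon: (lat, lon) of the county point (ASSUMPTION: a centroid
        stands in for the county polygon and its buffer).
    stations: list of (lat, lon, value); value is None when missing.
    Returns None when no station with a valid reading lies within radius_km.
    """
    lat0, lon0 = county_latlon
    values = [
        value
        for lat, lon, value in stations
        if value is not None
        and haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]
    return sum(values) / len(values) if values else None
```

Applied once per county, day and variable, this yields the daily series that is later aggregated to the monthly level.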
Empirical strategy
The specification was constructed as
$${y}_{irt}=f\left({\mathrm{temp}}_{irt}\right)+\delta {\mathbf{W}}_{irt}+\mu {\mathbf{H}}_{irt}+{\alpha }_{i}+{\gamma }_{rc}+{\lambda }_{tc}+{\varepsilon }_{irt},$$
(1)
where \({y}_{{irt}}\) is the indicator related to added sugar consumption per capita per day in household \(i\) in month \(r\) of year \(t\). Although the consumption data are available for every trip that a household makes, the location of the destination store and the date of the trip can be affected by the weather16. In particular, unpleasant weather lowers shopping frequency because people tend to avoid outdoor travel45. As a result, a regression at the trip level may show a smaller correlation between high temperatures and added sugar consumption, potentially leading to underestimation. We therefore aggregated the data to the monthly level, which avoids capturing such confounding effects while retaining the changes in product type and quantity. \(f\left({\mathrm{temp}}_{{irt}}\right)\) is a response function of temperature, describing how each household reacts to a particular temperature level in consuming added sugar and other food and nutrient components. The vector \({\bf{W}}_{{irt}}\) contains other meteorological variables, including precipitation, wind speed, relative humidity and its squared term. \({\bf{H}}_{{irt}}\) is the vector of household characteristics, including household income, household size, the educational level and age of the male and female heads, the presence and age of children, marital status and ethnic group. Among these, income is originally a categorical variable representing nominal annual income ranges. We converted it to a continuous variable by assigning mid-point values, setting high-income categories to 150,000, adjusting for inflation (with 2010 as the base year) and dividing by 10,000, so that the regression uses real income in units of 10,000 (Supplementary Table 2).
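The income conversion can be illustrated with a short sketch. It rests on assumptions that are not stated in the text: open-ended brackets and brackets starting at or above 150,000 are both assigned the value 150,000, and the price index values are supplied by the caller. The function name `real_income_10k` is hypothetical.

```python
def real_income_10k(lower, upper, price_index, base_index, top_code=150_000):
    """Convert a nominal income bracket to real income in units of 10,000.

    lower, upper: nominal bracket bounds; upper=None marks an open-ended
        top bracket.
    price_index: price index in the survey year.
    base_index: price index in the base year (2010 in the text).
    top_code: value assigned to high-income brackets (ASSUMPTION: applied
        to open-ended brackets and to brackets starting at or above it).
    """
    if upper is None or lower >= top_code:
        nominal = float(top_code)
    else:
        nominal = (lower + upper) / 2.0          # bracket mid-point
    real = nominal * base_index / price_index    # deflate to base-year prices
    return real / 10_000.0                       # rescale for the regression
```

For instance, a US$25,000–34,999 bracket observed in the base year maps to roughly 3.0, so a one-unit change in the regressor corresponds to 10,000 in base-year terms.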
Notably, the Nielsen survey employed a stratified sampling design to ensure sample representativeness, balancing nine key household characteristics, including household income, which was categorized into six levels for stratification purposes. Based on this framework, we combined two adjacent middle-income tiers (US$25,000–34,999 and US$35,000–49,999) to create a five-category structure for the subsequent heterogeneity analysis, as illustrated in Fig. 2c. We also included fixed effects for household (\({\alpha }_{i}\)), the interaction between month of year and climate zone (\({\gamma }_{{rc}}\)) and the interaction between year and climate zone (\({\lambda }_{{tc}}\)). Climate zones follow the classification used by the US Department of Energy Building America Program to determine building construction standards22. This classification provides a concise way of separating the background climate. The error term \({\varepsilon }_{{irt}}\) is clustered at the county level.
Multiple indicators related to added sugar consumption are examined as \({y}_{{irt}}\). We first adopted the actual per-capita added sugar consumption, which obscures the discrepancy between the age–gender distribution of average energy requirements and that of sugar consumption thresholds. We thus converted the per-capita consumption to adult male equivalents to allow for better standardization and comparison of consumption across age and gender groups46. Since the Nielsen data provide consumption data for the entire household rather than for each member, we standardized sugar consumption using the ratio of the total energy requirement of all actual household members to that of an adult male equivalent. Here, we defined adult males as men aged 18–29 years and treated infants under 1 year old as being 11 months old. We then calculated the conversion ratio using the average energy requirement (AER) for each age and gender, with a physical activity level of 1.6 (1.4 for children aged 1–3 years), as provided by the European Food Safety Authority, using the following equation:
$${\mathrm{Conversion}}\,{\mathrm{ratio}}=\mathop{\sum }\limits_{i=1}^{{\mathrm{household}}\,{\mathrm{size}}}\frac{{\mathrm{AER}}\,{\mathrm{of}}\,{\mathrm{member}}_{i}}{{\mathrm{AER}}\,{\mathrm{of}}\,{\mathrm{adult}}\,{\mathrm{male}}}.$$
(2)
The adult male equivalent per-capita consumption of added sugar can then be adjusted by dividing the actual per-capita added sugar consumption by this conversion ratio.
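As a worked illustration of equation (2), the sketch below sums each member's AER relative to that of the reference adult male. The AER values used in the example are placeholders chosen for easy arithmetic, not European Food Safety Authority figures, and the function name `conversion_ratio` is hypothetical.

```python
def conversion_ratio(member_aers, adult_male_aer):
    """Equation (2): sum over household members of AER_member / AER_adult_male.

    member_aers: average energy requirement of each household member,
        in the same units as adult_male_aer.
    adult_male_aer: AER of the reference adult male (men aged 18-29 years
        in the text).
    """
    if adult_male_aer <= 0:
        raise ValueError("adult_male_aer must be positive")
    return sum(aer / adult_male_aer for aer in member_aers)


# Placeholder AERs (NOT EFSA values): reference male, another adult, a child.
example_ratio = conversion_ratio([2500.0, 2000.0, 1250.0], 2500.0)
```

With these placeholder values, a three-person household counts as about 2.3 adult male equivalents, whereas a household consisting only of one reference adult male has a ratio of exactly 1.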
Multiple forms of \(f\left({\mathrm{temp}}_{{irt}}\right)\) are included in our analysis. For the main results, we ran a binned model with \(f\left({\mathrm{temp}}_{{irt}}\right)={\sum }_{j}{\beta }_{1j}{\mathrm{temp}}_{{ijrt}}\), in which \({\mathrm{temp}}_{{ijrt}}\) are a set of indicators of whether the monthly average temperature falls into interval \(j\). We set each interval to 2 °C to allow a high resolution in detecting possible nonlinear effects of temperature. The results show that all the outcome variables increase monotonically with temperature (Fig. 1). The reference group in the final regression was set as the lowest temperature level (≤0 °C). We also conducted robustness checks, replacing \({\sum }_{j}{\beta }_{1j}{\mathrm{temp}}_{{ijrt}}\) with cooling degree days (CDDs) and heating degree days (HDDs) aggregated to the monthly level (Supplementary Table 5). These two measures are used to estimate the energy demand for cooling or heating buildings and serve as a proxy for how far the ambient temperature lies outside the range of human thermal comfort. CDDs accumulate when the outdoor temperature exceeds a base temperature (for example, 18 °C or 65 °F), indicating a need for cooling, whereas HDDs accumulate when the temperature falls below the base, indicating a need for heating. In addition, we completed a robustness check using interactions of education (male head and female head) with month and year as fixed effects in the model and found no critical change in the results (Supplementary Tables 6 and 7). We also reran the regression analysis including food price as a control to explore whether weather fluctuations covary with price changes. To include food category prices in the regression analysis and eliminate the impact of price changes, we calculated the monthly average price of each food category based on the purchase data of each household trip.
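The binned and degree-day temperature measures described above can be sketched as follows. Several details are assumptions: bins are taken as right-closed intervals ((0, 2], (2, 4] and so on), the warmest bin is top-coded, and degree days are summed from daily mean temperatures against an 18 °C base. All function names are hypothetical.

```python
import math


def temperature_bin(temp_c, width=2.0):
    """Return 0 for the reference bin (<= 0 degC), else the 1-based bin index."""
    if temp_c <= 0.0:
        return 0
    return math.ceil(temp_c / width)   # (0, 2] -> 1, (2, 4] -> 2, ...


def bin_dummies(temp_c, n_bins, width=2.0):
    """Indicator vector for bins 1..n_bins; the reference bin is omitted.

    ASSUMPTION: temperatures above the last bin are top-coded into it.
    """
    j = min(temperature_bin(temp_c, width), n_bins)
    return [1 if k == j else 0 for k in range(1, n_bins + 1)]


def monthly_degree_days(daily_temps_c, base_c=18.0):
    """Return (CDD, HDD) accumulated over one month of daily mean temperatures."""
    cdd = sum(max(t - base_c, 0.0) for t in daily_temps_c)
    hdd = sum(max(base_c - t, 0.0) for t in daily_temps_c)
    return cdd, hdd
```

Omitting the reference bin from the indicator vector means that each estimated coefficient is read relative to months with an average temperature at or below 0 °C.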
To limit the influence of outliers, we removed the top 5% of all purchase prices and then averaged the monthly average price of each food category within each county, according to the county where the household was located. We also checked the results after removing both the top and bottom 5% of prices (low prices are probably due to discounts and voucher redemption, but can also be due to misrecording); the regression results do not change (Supplementary Table 10). However, some counties have missing prices for a food category because no household purchased it. We first replaced these with the annual average price in the same county; where this was also missing, we used the national average price in the same month. Furthermore, we used the food inflation index, with 2010 as the benchmark, to eliminate the impact of inflation. We evaluated multicollinearity in the price model using variance inflation factor tests. The results show that, except for relative humidity and its squared term, all variables had adjusted generalized variance inflation factor values within acceptable thresholds (<2), indicating no severe multicollinearity issues in the model (Supplementary Table 11).
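The price cleaning steps can be illustrated with a minimal sketch. The exact percentile rule is not given in the text, so the trimming below simply drops the highest 5% of records by count; `trim_top` and `fill_price` are hypothetical names.

```python
def trim_top(prices, top_percent=5):
    """Drop the highest top_percent of price records (by count).

    ASSUMPTION: the cut is made on the number of records, rounded down,
    rather than on an interpolated percentile value.
    """
    ordered = sorted(prices)
    n_drop = len(ordered) * top_percent // 100
    return ordered[:len(ordered) - n_drop]


def fill_price(county_month, county_year, national_month):
    """Fallback chain for a county-month price of one food category.

    Order follows the text: the county's monthly price, then the county's
    annual average, then the national average for the same month.
    Each argument is a float, or None when unavailable.
    """
    for candidate in (county_month, county_year, national_month):
        if candidate is not None:
            return candidate
    return None
```

Deflating the filled prices to the 2010 benchmark would follow as a separate step.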
We then conducted a spline regression, which exhibits higher sensitivity to changes in the independent variables than the binned model and provides more flexibility in capturing the potential nonlinearities of temperature than the CDD and HDD method. Based on the results of the binned model, the spline model divides the temperature range into three segments: 0–12 °C, 12–30 °C and above 30 °C, which allows us to better capture the growth rate of added sugar consumption in different temperature ranges. The spline model also requires fewer temperature variables, which facilitates the tests of heterogeneous effects: these require interaction terms between the temperature indicators and the socio-economic characteristics of the households, which would involve too many regressors in the binned model. Note that we do not interact the segment above 30 °C in the heterogeneity analysis because of the insufficient number of observations, so the slope is the same for all groups in this temperature interval.
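One common way to build such a three-segment model is a piecewise-linear basis, sketched below. The text does not state the exact parameterization, so this is an assumption: each term measures how far the temperature extends into its segment, so that its coefficient is the slope within that segment, and temperatures at or below 0 °C contribute zero to every term. The function name `spline_terms` is hypothetical.

```python
def spline_terms(temp_c, knots=(0.0, 12.0, 30.0)):
    """Piecewise-linear spline terms for segments 0-12, 12-30 and >30 degC.

    Term k equals the part of temp_c lying inside segment k, so the
    regression coefficient on term k is the slope in that segment.
    """
    terms = []
    for k, lower in enumerate(knots):
        upper = knots[k + 1] if k + 1 < len(knots) else float("inf")
        terms.append(min(max(temp_c - lower, 0.0), upper - lower))
    return tuple(terms)
```

For example, a monthly mean of 20 °C yields the terms (12, 8, 0). Interacting only the first two terms with household characteristics leaves a single common slope above 30 °C, as described above.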
Projections of added sugar consumption in future climate scenarios
Meteorological variables for projected climates are from the Coupled Model Intercomparison Project Phase 6 (CMIP6). Here, we used monthly near-surface air temperature (tas), precipitation (pr), near-surface wind speed (sfcWind) and near-surface relative humidity (hurs) from the r1i1p1f1 run of the SSP5-8.5 (2015–2100) experiment from 25 models (ACCESS-CM2, ACCESS-ESM1-5, AWI-CM-1-1-MR, CanESM5, CanESM5-1, CAS-ESM2-0, CESM2-WACCM, CMCC-CM2-SR5, CMCC-ESM2, EC-Earth3, EC-Earth3-CC, EC-Earth3-Veg, EC-Earth3-Veg-LR, FGOALS-f3-L, FGOALS-g3, FIO-ESM-2-0, INM-CM4-8, INM-CM5-0, IPSL-CM6A-LR, MIROC6, MPI-ESM1-2-HR, MPI-ESM1-2-LR, MRI-ESM2-0, NorESM2-LM and NorESM2-MM). For the spatial join, we subsampled the model output at the grid boxes falling within each county and its 100 km buffer. Although other SSP scenarios are available, we used the highest emission scenario (SSP5-8.5) to test the widest possible warming range, and results are also reported as a function of the global warming level47.
When a county covers more than one grid box, we used the averaged value. To account for possible model biases in climatology, we calculated anomalies relative to the simulated 2015–2023 monthly climatology and added the observed climatology over the same period48. Bias-corrected simulations were further averaged using a 9-year running smoother to dampen interannual variability. The projection of added sugar consumption was then calculated using equation (1). Multimodel mean results are shown in Fig. 4, and descriptive statistics for individual models are listed in Supplementary Table 24.
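The bias correction and smoothing steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the correction is additive for every variable, the 9-year smoother is a centred running mean applied to a yearly series (for example, one calendar month across years), and years without a full window are left undefined. All function names are hypothetical.

```python
def monthly_climatology(series, start_year, end_year):
    """Mean value per calendar month over [start_year, end_year].

    series: dict mapping (year, month) -> value.
    """
    sums, counts = {}, {}
    for (year, month), value in series.items():
        if start_year <= year <= end_year:
            sums[month] = sums.get(month, 0.0) + value
            counts[month] = counts.get(month, 0) + 1
    return {month: sums[month] / counts[month] for month in sums}


def bias_correct(simulated, sim_climatology, obs_climatology):
    """Replace the simulated climatology with the observed one (additive)."""
    return {
        (year, month): value - sim_climatology[month] + obs_climatology[month]
        for (year, month), value in simulated.items()
    }


def running_mean(values, window=9):
    """Centred running mean; None where the full window is unavailable.

    ASSUMPTION: edge years are dropped rather than padded.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i - half:i + half + 1]) / window)
    return smoothed
```

The corrected and smoothed series for each model would then be passed through the estimated response function in equation (1) before taking the multimodel mean.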
Disclaimer
The researcher(s)’ own analyses are calculated (or derived) based, in part, on (1) retail measurement/consumer data from Nielsen Consumer LLC (‘NielsenIQ’), (2) media data from The Nielsen Company (US), LLC (‘Nielsen’) and (3) marketing databases provided through the respective NielsenIQ and the Nielsen Datasets at the Kilts Center for Marketing Data Center at The University of Chicago Booth School of Business.
The conclusions drawn from the NielsenIQ and Nielsen data are those of the researcher(s) and do not reflect the views of Nielsen. Nielsen is not responsible for, had no role in and was not involved in analysing and preparing the results reported herein.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.