### Data

For gridded wind data, we use the 2019 and 2020 surface wind ERA5 datasets from the European Centre for Medium-Range Weather Forecasts (ECMWF), with a 1-hour temporal resolution and 1/4° spatial resolution^{37}. For gridded currents, sea temperature and salinity data, we use 2019 and 2020 data from the Hybrid Coordinate Ocean Model, with a 3-hour temporal resolution and 1/12° spatial resolution^{38}. For data on the properties of the oil, we use the National Oceanic and Atmospheric Administration’s (NOAA’s) Oil Library Project^{39}. For data on fuel prices, we use a dataset from the World Food Programme^{40}. We use fuel import data from the United Nations Verification and Inspection Mechanism for Yemen^{41}. We used a variety of sources for data on desalination plant locations and capacity (Supplementary Table 2)^{42,43,44}. For Yemen, locations for all desalination plants were not available, and the water capacities for the known plants were also unavailable. Therefore, to estimate the water capacity for each of the known plants in Yemen, we used the most recent available data on country-wide desalination capacity^{45} and divided it equally among the known plants. Food import estimates were derived from Yemen’s port data^{46}.

### Oil-spill modelling

We modelled the spill using the pyGNOME library from NOAA to use their GNOME model^{47}. GNOME is a widely used Eulerian/Lagrangian spill-trajectory model, modelling spills with Lagrangian elements within flow fields. Like most operational response tools, GNOME is able to model the oil transport and weathering processes of advection, diffusion, dispersion-entrainment, emulsification, evaporation, spreading, oil–shoreline interaction and dissolution. We chose NOAA’s suite of modelling tools due to their history of being implemented operationally and validated against real-life environmental disasters as well as their widespread use among disaster response agencies^{48}. As model inputs, we used the characteristics of the crude oil on board the *Safer*, Marib Light, as well as the historical currents and wind data of the region. We performed 1,000 Monte Carlo simulations each for both summer (June–August) and winter (December–February), varying time of day and date. Seasons were chosen on the basis of known current patterns in the Red Sea^{49}. We restricted simulations to three-week timelines due to predictability limits inherent to oil-spill modelling and uncertainty in clean-up efforts^{50}.

Each simulation had 1,000 particles representing the oil trajectory based on historical weather conditions. Particles were sizeless and represented by surface oil concentration values and coordinates on a latitude–longitude grid rounded to three decimals, corresponding approximately to 100 m × 100 m. For each season, we simulated 1,000 spills with three-week timelines. At the end of weeks 1, 2 and 3, we calculated the average surface concentration value at each point on the grid. We then used bilinear interpolation to calculate values between points. Given the uncertainty in the amount of oil that will be spilled, we converted the average surface concentration values from absolute values (average surface oil thickness, measured in metres) to relative values (average surface oil thickness relative to other exposed areas, measured in percentiles).

Each simulation also had 1,000 uncertainty particles that simulated the oil trajectory through parameter settings that assume extreme weather conditions. According to GNOME documentation, the area enclosed by the uncertainty particles represents where approximately 90% of spill trajectories are expected to fall^{51}. We calculated the area enclosed by the uncertainty particles by computing the convex hull from the locations of all uncertainty particles from all 1,000 simulations and plotted it as the uncertainty region.

We repeated our spill analyses, varying spill duration, season, grid resolution and number of particles to assess how model output would change. Our original models assumed a 7-day spill; we repeated the models for a 24-hour spill. We further repeated the spill models for spring (March–May) and autumn (September–November). We also resimulated the spills using a coarser spatial resolution (latitude and longitude rounded to 2 decimal points, corresponding to approximately 1.1 km × 1.1 km) and higher number of particles (10,000 instead of 1,000) (Supplementary Figs. 3–5).

### Oil fate

To calculate the fate of the oil, we used NOAA’s Automated Data Inquiry for Oil Spills tool. As inputs, we used data on the oil type, gridded winds, gridded currents, water salinity and water temperature. We modelled oil fate for two scenarios: one with no clean-up (evaporation only) and one with extremely optimistic clean-up conditions. We restricted all weathering models to run for six days because they do not account for longer-term factors, such as biodegradation or photo-oxidation, that may affect the weathering rate^{47}. To get a range of evaporation estimates, we performed 1,000 Monte Carlo simulations by randomly varying time and date. We calculated the mean and 95% uncertainty intervals, defined as the 2.5th and 97.5th percentiles across all simulations. For clean-up analyses, we made the optimistic assumptions that (1) clean-up would begin immediately after the spill occurs, and (2) oil recovery amounts over six days would be comparable to those of the 2010 *Deepwater Horizon* oil-spill clean-up. We modelled a skimmer with a recovery rate of 14 barrels per hour and 100% efficiency, in situ burning with an area of 70,000 m^{2} and 50% efficiency, and 15% of oil sprayed with dispersant at 20% efficiency, at 29 °C and 42% salinity. The clean-up parameters were chosen to match the clean-up rate in the *Deepwater Horizon* oil spill. We used the amount of oil recovered by different clean-up methods in the *Deepwater Horizon* oil spill, scaled it from its 85-day clean-up time frame to our 6-day time frame and selected clean-up parameters that would recover this estimated amount of oil over 6 days^{52}. The weather values were selected to match the values from our previous Monte Carlo simulation analysis that maximized oil evaporation.

### Air pollution modelling

For air pollution modelling, we used NOAA’s Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model^{53}. HYSPLIT is an extensively used atmospheric transport and dispersion modelling framework using a hybrid Eulerian/Lagrangian approach to compute the trajectory of airborne particles as well as pollutant concentrations. We conducted simulations across four scenarios, varying seasons (summer and winter) and spill duration (24 hours for a fast-release spill and 72 hours for a slow-release spill). The spill duration is equivalent to the duration of pollutants being emitted. We used HYSPLIT’s daily runs feature to conduct simulations for every three hours from January to March and June to August. Each simulation ran for 144 hours with pollutants emitted at sea level. Each simulation had 2,500 particles, with a concentration value for each particle. For each season and spill duration combination, we calculated the average pollutant concentration value at each point on a latitude–longitude grid rounded to three decimals. We used bilinear interpolation to calculate values between points. We assumed a full spill of 150,000 metric tons in all scenarios.

To get air pollution values in terms of fine particulate matter, we converted the initial oil release from barrels to micrograms, multiplied by the oil-to-particulate matter conversion rate calculated by Middlebrook et al.^{54}, divided by the duration of the spill and multiplied by the concentration values estimated by the model. We calculated the population-weighted average increased risk of cardiovascular and respiratory hospitalization from the air pollution by multiplying the air pollution values by the increased risk and population share at each interpolated grid point. Our risk function relating PM_{2.5} exposure to cardiovascular and respiratory hospitalizations was derived from Burnett et al.^{12}. Risk was averaged over person-days to compare across different spill durations. We calculated person-days by calculating the number of people exposed to at least 10 ug m^{–3} of PM_{2.5} and the duration for which they were exposed.

Due to uncertainty in the particulate matter conversion rate and increased risk of cardiovascular and respiratory mortality from fine particulate matter, we repeated this process through Monte Carlo simulation, varying those two parameters over 1,000 simulations each to propagate uncertainty. To propagate uncertainty, we varied two parameters over the simulations: the rate of conversion from surface oil to fine particulate matter and the increased risk (IR) of fine particulate matter with respect to cardiovascular and respiratory hospitalizations. Drawing on the point estimates and confidence intervals (CIs) reported from existing literature, we adopted the methodology from Khomenko et al.^{55} and computed the IR standard deviations as follows (where qnorm is the inverse of the standard normal cumulative distribution function):

$${{{mathrm{ln}}}}left( {{{{mathrm{IR}}}},{{{mathrm{upper}}}},{{{mathrm{CI}}}}/{{{mathrm{IR}}}},{{{mathrm{lower}}}},{{{mathrm{CI}}}}} right)/left( {2 times {{{mathrm{qnorm}}}}left( {0.975} right)} right)$$

We assumed normal distributions and sampled from the reported mean and estimated standard deviation. The standard deviation for the fine particulate matter conversion rate is provided, so we used the reported estimate. We calculated the mean IR from our simulations and constructed 95% uncertainty intervals. If the lower uncertainty interval for the simulated IR was less than 0, we assumed no increased risk of hospitalization. We also calculated the mean and 95% uncertainty intervals for the population affected over time, measured in person-days.

We repeated our air pollution analyses under several different scenarios. In addition to modelling 24-hour spills and 72-hour spills during different seasons, we modelled air pollution with and without combustion. For estimates of air pollution from combustion, we added the burned oil-to-particulate matter conversion rate from Middlebrook et al.^{54} to the existing evaporative conversion rate. We also varied the increased risk function since the estimate we used from Burnett et al.^{12} may not demographically reflect the population in Yemen. We thus performed the preceding calculations using respiratory hospitalization rates from Wei et al.^{11} and short-term PM_{2.5}-related mortality rates from Kloog et al.^{10}. Wei et al.^{11} used a Medicare population (mostly aged 65+) that is more vulnerable to air pollution, which may reflect some subgroups in the Yemeni population, many of whom are malnourished and lack proper health services. Kloog et al.^{10} used full mortality records from the state of Massachusetts, used more modern methods than Burnett et al.^{12} and may more accurately reflect the younger age structure of Yemen than the Medicare population studied by Wei et al.^{11}. We present results in terms of increased risk of hospitalizations instead of increased hospitalizations because the increased risk we calculate is relative to a baseline level of hospitalization risk, which we do not know for the population of Yemen (Supplementary Tables 4 and 5).

### Supply disruption

We estimated fuel disruption by calculating the average and 95% uncertainty intervals of monthly fuel imports through Red Sea ports from January to May 2020 (*n* = 5), before fuel imports being restricted. We calculated the fuel price increase from the November 2017 port closures in Al Hudaydah by taking the median price among diesel, petrol and gas and comparing it between 15 October 2017 and 15 November 2017 price data.

We estimated disruption to desalination capacity by compiling a dataset of locations and water capacity of all known plants in the region and identifying locations reached by the simulated oil spills (Supplementary Table 1). Total water consumption equivalents were computed by multiplying each affected country’s share of water amounts by their respective per capita daily usage.

We estimated average and 95% uncertainty intervals of food disruption on the basis of historical data of imports at Yemeni ports. To calculate the amount of food aid disruption, we used 2019 data showing percentages of total food aid in Yemen originating from Hudaydah and Aden^{46}. We then multiplied these percentages by the average of total people targeted for food assistance based on available situation reports from the World Food Programme ranging from March 2020 to February 2021 (*n* = 10). We used linear interpolation through the quantile algorithm in R to construct the uncertainty intervals. See Supplementary Table 6 for estimates as reported and as originally calculated.

We estimated fish yield loss on the basis of gridded annual fish yield in the Red Sea and Gulf of Aden from 2016^{56}. We first filtered for fish yield for only Yemen as a fishing entity, then summed over all types of fishing sectors (artisanal, industrial and so on). We then summed Yemen’s fish yield over each gridded cell reached by the simulated spill at each week and season. By default, we included cells only if the oil concentration in them was in the tenth percentile of surface concentration or higher to exclude cells with trace amounts of oil. To assess how threshold values would affect fish yield loss estimates, we repeated the analysis at no threshold and a threshold of the 20th percentile (Supplementary Table 3).

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.