Our modeling framework combines three integrated components to estimate the likelihood of a crop being grown at any location across the Central and Eastern United States.
Random Forest models learn from observed crop distributions and a comprehensive set of biophysical and human-controlled predictors to generate spatially continuous crop likelihood maps.
SHAP (SHapley Additive exPlanations), an explainable AI framework, decomposes each prediction into variable-level contributions, revealing which factors drive crop likelihood and by how much.
Real-world human knowledge, gathered through farmer focus groups and a Delphi expert panel, validates model inputs, ground-truths results, and projects how agricultural conditions may shift under future scenarios.
The following 13 crops are each modeled with a separate random forest model:
Crop presence and absence points are derived from the USDA Cropland Data Layer (CDL) at 30-meter resolution, covering the period 2009–2018. A location is classified as crop-present (value = 1) if the crop appeared in more than two of those ten years; otherwise it is classified as absent (0). This multi-year threshold reduces noise from anomalous single-year plantings.
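The labeling rule above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name `label_presence` and the sample class codes are assumptions made for the example.

```python
def label_presence(yearly_codes, crop_code, threshold_years=2):
    """Return 1 if crop_code appears in more than threshold_years of the years, else 0."""
    years_with_crop = sum(1 for code in yearly_codes if code == crop_code)
    return 1 if years_with_crop > threshold_years else 0


# Illustrative data only: one land-cover class code per year, 2009 through 2018.
TARGET_CROP = 1  # placeholder class code for the crop being modeled
pixel_a = [1, 5, 1, 5, 1, 5, 5, 5, 5, 5]  # crop observed in 3 of 10 years
pixel_b = [1, 5, 1, 5, 5, 5, 5, 5, 5, 5]  # crop observed in 2 of 10 years

print(label_presence(pixel_a, TARGET_CROP))  # 1 (present)
print(label_presence(pixel_b, TARGET_CROP))  # 0 (absent)
```

Note that a location with the crop in exactly two years is labeled absent, because the rule requires more than two years.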
All predictor variables that vary over time are averaged over the 2009–2018 period to represent stable baseline conditions. Once trained, each model is applied to a spatially continuous 2.5-arcminute grid covering the Central and Eastern United States.
To prevent data leakage and ensure the model is evaluated on genuinely unseen locations, the train/test split is performed at the county level. Counties are randomly assigned to training (70%) or test (30%) sets, so the model is never tested on locations from counties it has seen during training. This provides a geographically independent assessment of model accuracy.
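A county-level split of this kind might look like the sketch below. It assumes each sample point carries a county identifier; the helper name `split_by_county`, the fixed seed, and the sample identifiers are hypothetical.

```python
import random


def split_by_county(county_ids, train_fraction=0.7, seed=0):
    """Assign whole counties to train or test and return (train_idx, test_idx)."""
    counties = sorted(set(county_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(counties)
    n_train = round(train_fraction * len(counties))
    train_counties = set(counties[:n_train])
    train_idx = [i for i, c in enumerate(county_ids) if c in train_counties]
    test_idx = [i for i, c in enumerate(county_ids) if c not in train_counties]
    return train_idx, test_idx


# Made-up county identifiers, one per sample point (10 distinct counties).
county_ids = ["A", "A", "B", "C", "C", "C", "D", "E", "F", "G", "H", "I", "J", "J"]
train_idx, test_idx = split_by_county(county_ids)

train_counties = {county_ids[i] for i in train_idx}
test_counties = {county_ids[i] for i in test_idx}
print(len(train_counties), len(test_counties))    # 7 3
print(train_counties.isdisjoint(test_counties))   # True
```

Because the fraction applies to counties rather than to points, the share of sample points in each set can differ from 70/30 when counties contain unequal numbers of points.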
Most future crop distribution models change only the climate, holding human factors fixed. Our framework explicitly projects how farming practices, policy environments, and market conditions are also expected to evolve — producing more realistic and complete projections of future agricultural landscapes.
Each random forest model is trained on 70+ predictor variables drawn from two broad categories: biophysical conditions and human-controlled factors. This dual approach allows us to capture not just where crops can grow based on environmental suitability, but also where they actually get grown based on the economic, institutional, and social conditions farmers operate within.
All 70+ predictor variables are grouped into 9 thematic categories.
Various temperature and precipitation metrics.
Topsoil properties: % silt, sand, clay, gravel, bulk density, and cation exchange capacity.
Slope and elevation.
Crop-specific surface and groundwater withdrawal rates; percent of area irrigated.
Cover crop acres, tillage, tile drainage, CRP & EQIP enrollment, organic farm operations, and cattle inventory.
Fertilizer and chemical expenditure and quantities, machinery value, and labor count.
Agricultural land value, commodity price, rented acreage fraction, small-scale farm share, mega-farm sales, direct retail sales, crop insurance acres.
Urban-rural classification; female, young, beginning, and non-white producers; and median farm size.
Counts of crop-specific agri-food processing establishments.
Training a random forest tells us the predicted crop likelihood at any location, but it does not, by itself, tell us which factors drove that prediction or how much each one contributed. To answer that question, we apply SHAP (SHapley Additive exPlanations), an explainable AI framework.
SHAP assigns each predictor variable a contribution score for every individual prediction. The score — called a Shapley value — represents how much that variable pushed the predicted crop likelihood up (positive value) or down (negative value) relative to the model's average prediction.
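To make the idea concrete, the sketch below computes exact Shapley values by brute force for a small stand-in model. Everything here is an assumption for illustration: `toy_model` is not a random forest, the three features and the background rows are invented, and enumerating every feature subset only scales to a handful of variables. It is meant to show the defining property, that the contributions sum to the prediction minus the average prediction, not how the values were computed in this project.

```python
from itertools import combinations
from math import factorial


def toy_model(row):
    """Stand-in for a trained model: returns a likelihood-like score in [0, 1]."""
    rain, slope, irrigated = row
    score = 0.2 + 0.5 * rain - 0.3 * slope + 0.4 * rain * irrigated
    return max(0.0, min(1.0, score))


def coalition_value(model, x, background, subset):
    """Mean prediction when features in `subset` take x's values and the rest come from background rows."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in subset else b[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley value of each feature for the single prediction model(x)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Invented background rows (rain, slope, irrigated) and one location to explain.
background = [[0.2, 0.1, 0.0], [0.6, 0.5, 1.0], [0.4, 0.3, 0.0], [0.8, 0.2, 1.0]]
x = [0.9, 0.1, 1.0]

phi = shapley_values(toy_model, x, background)
average_prediction = sum(toy_model(b) for b in background) / len(background)

print("contributions:", phi)
print("sum of contributions:", sum(phi))
print("prediction minus average:", toy_model(x) - average_prediction)
```

The last two printed numbers agree up to floating-point error: positive entries in `phi` pushed this location's score above the average prediction, and negative entries pushed it below.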
We conducted focus group sessions with farmers in Georgia, Nebraska, and Ohio. These sessions served two goals: variable validation (before model training, farmers reviewed the list of human-controlled predictors and provided feedback on relevance) and SHAP result validation (we brought the SHAP results back to farmers to ground-truth them).
To project future crop likelihood for 2041–2060, we conducted a structured Delphi elicitation process with agricultural scientists, extension specialists, and policy experts. Panelists projected how key human-controlled variables (irrigation rates, fertilizer use, conservation program enrollment, land tenure structure, and demographic trends) are likely to change by 2050. Combined with CMIP6 climate projections under three scenarios (SSP2-4.5, SSP3-7.0, and SSP5-8.5), these expert projections yield forward-looking crop likelihood maps.
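One way to picture how the two sources of change come together is sketched below. It rests on assumptions the text does not confirm: that panel projections are expressed as percent changes from the baseline, that climate variables are simply replaced by scenario values, and that unlisted variables (such as slope) stay at their baseline. The function name, variable names, and all numbers are placeholders.

```python
def build_future_predictors(baseline, climate_projection, expert_pct_change):
    """Return a new predictor dict with climate values replaced and human-controlled values scaled."""
    future = dict(baseline)            # copy so the baseline is left untouched
    future.update(climate_projection)  # swap in scenario climate values
    for name, pct in expert_pct_change.items():
        # Assumed convention: +20.0 means a 20% increase over the baseline value.
        future[name] = baseline[name] * (1 + pct / 100.0)
    return future


# Placeholder values for a single grid cell; none of these numbers come from the project.
baseline = {
    "mean_temp_c": 12.0,
    "annual_precip_mm": 900.0,
    "pct_irrigated": 10.0,
    "fertilizer_expense": 200.0,
    "slope_deg": 2.0,
}
scenario_climate = {"mean_temp_c": 13.5, "annual_precip_mm": 930.0}
expert_pct_change = {"pct_irrigated": 20.0, "fertilizer_expense": -5.0}

future = build_future_predictors(baseline, scenario_climate, expert_pct_change)
print(future)
```

The resulting dictionary would then be passed to the trained model in place of the baseline predictors, once per climate scenario.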