Our Modeling Approach

Three integrated components estimating crop likelihood across the Central and Eastern United States.

How the framework works

Our modeling framework combines three integrated components to estimate the likelihood of a crop being grown at any location across the Central and Eastern United States.

Figure 1. Overall modeling framework, showing its three integrated components: random forest models, SHAP explainability, and human knowledge integration.

Random Forest models learn from observed crop distributions and a comprehensive set of biophysical and human-controlled predictors to generate spatially continuous crop likelihood maps.

SHAP (SHapley Additive exPlanations), an explainable AI framework, decomposes each prediction into variable-level contributions, revealing which factors drive crop likelihood and by how much.

Real-world human knowledge, gathered through farmer focus groups and a Delphi expert panel, validates model inputs, ground-truths results, and projects how agricultural conditions may shift under future scenarios.

Crops Modeled

The following 13 crops are each modeled with a separate random forest model:

Corn
Wheat
Cotton
Soybean
Sorghum
Hay
Peanuts
Oats
Fruit & Vegetables
Barley
Rye
Pasture
Pecans

Crop presence and absence points are derived from the USDA Cropland Data Layer (CDL) at 30-meter resolution, covering the period 2009–2018. A location is classified as crop-present (value = 1) if the crop appeared in more than two of those ten years; otherwise it is classified as absent (0). This multi-year threshold reduces noise from anomalous single-year plantings.
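The presence rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual pipeline: the function name `label_presence` and the input format (one boolean per year indicating whether the CDL pixel was classified as the crop) are assumptions made for the example.

```python
from typing import Sequence

YEARS = range(2009, 2019)   # the ten CDL years, 2009-2018 inclusive
MIN_YEARS_EXCLUSIVE = 2     # the crop must appear in MORE than two years


def label_presence(crop_seen_by_year: Sequence[bool]) -> int:
    """Return 1 (present) if the crop appeared in more than two years, else 0."""
    if len(crop_seen_by_year) != len(YEARS):
        raise ValueError("expected one observation per year, 2009-2018")
    years_with_crop = sum(bool(seen) for seen in crop_seen_by_year)
    return 1 if years_with_crop > MIN_YEARS_EXCLUSIVE else 0


# Hypothetical pixel histories (True = crop observed in that year).
rotation = [True, False, True, False, True, False, True, False, True, False]
one_off = [False, False, False, True, False, False, False, False, False, False]
two_years = [True, True] + [False] * 8

print(label_presence(rotation))   # 5 of 10 years -> 1
print(label_presence(one_off))    # 1 of 10 years -> 0
print(label_presence(two_years))  # exactly 2 years is not "more than two" -> 0
```

Note that a crop grown in exactly two of the ten years is still labeled absent, because the threshold is strict.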

All predictor variables that vary over time are averaged over the 2009–2018 period to represent stable baseline conditions. Once trained, each model is applied to a spatially continuous 2.5-arcminute grid covering the Central and Eastern United States.
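As a rough sketch of the temporal averaging step, the snippet below computes the 2009–2018 mean of one time-varying predictor at one location. The function name `baseline_mean`, the dictionary input format, and the choice to raise an error on missing years are all assumptions for illustration; the example values are made up.

```python
from statistics import fmean
from typing import Mapping

BASELINE_YEARS = range(2009, 2019)  # 2009-2018 inclusive


def baseline_mean(yearly_values: Mapping[int, float]) -> float:
    """Average a time-varying predictor over the 2009-2018 baseline period.

    Years outside the baseline period are ignored. Missing baseline years
    raise an error (an assumption of this sketch, not a documented rule).
    """
    missing = [year for year in BASELINE_YEARS if year not in yearly_values]
    if missing:
        raise ValueError(f"missing baseline years: {missing}")
    return fmean(yearly_values[year] for year in BASELINE_YEARS)


# Hypothetical annual precipitation (mm) at a single location.
precip_mm = {2009 + i: 500.0 + 10.0 * i for i in range(10)}
precip_mm[2019] = 9999.0  # outside the baseline window, so it is ignored

print(baseline_mean(precip_mm))
```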

500K data points per crop (most crops)
200K data points for Barley & Rye (sparser distributions)
70/30 train/test split by county — strict out-of-sample evaluation
13 independent random forest models

To prevent data leakage and ensure the model is evaluated on genuinely unseen locations, the train/test split is performed at the county level. Counties are randomly assigned to training (70%) or test (30%) sets, so the model is never tested on locations from counties it has seen during training. This provides a geographically independent assessment of model accuracy.
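A county-level split of this kind can be sketched as follows. This is an illustrative example under stated assumptions, not the project's code: the function name `split_counties`, the fixed random seed, and the sample points with county codes are all hypothetical.

```python
import random
from typing import Iterable, Set, Tuple


def split_counties(county_ids: Iterable[str], train_fraction: float = 0.7,
                   seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Randomly assign whole counties to the training or test set."""
    unique = sorted(set(county_ids))   # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_train = round(train_fraction * len(unique))
    return set(unique[:n_train]), set(unique[n_train:])


# Hypothetical sample points: (point_id, county code), five points per county.
counties = ["13001", "13003", "13005", "13007", "13009",
            "13011", "13013", "13015", "13017", "13019"]
points = [(i, counties[i % len(counties)]) for i in range(50)]

train_counties, test_counties = split_counties(c for _, c in points)

# Every point follows its county, so no county contributes to both sets.
train_points = [p for p in points if p[1] in train_counties]
test_points = [p for p in points if p[1] in test_counties]

print(len(train_counties), len(test_counties))
print(len(train_points), len(test_points))
```

Because the 70/30 ratio applies to counties rather than to points, the share of points in each set is only approximately 70/30 when counties contain different numbers of points.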

Why this matters

Most future crop distribution models change only the climate, holding human factors fixed. Our framework explicitly projects how farming practices, policy environments, and market conditions are also expected to evolve — producing more realistic and complete projections of future agricultural landscapes.

Predictor Variables

Each random forest model is trained on 70+ predictor variables drawn from two broad categories: biophysical conditions and human-controlled factors. This dual approach allows us to capture not just where crops can grow based on environmental suitability, but also where they are actually grown, given the economic, institutional, and social conditions farmers operate within.

25 biophysical predictors — climate, soil, topography
45+ human-controlled predictors — policy, markets, demographics, infrastructure

All 70+ predictor variables are grouped into 9 thematic categories:

  • Climate: various temperature and precipitation metrics.
  • Soil: topsoil properties (% silt, sand, clay, and gravel; bulk density; cation exchange capacity).
  • Topography: slope and elevation.
  • Irrigation: crop-specific surface and groundwater withdrawal rates; percent of area irrigated.
  • Farm Management: cover crop acres, tillage, tile drainage, CRP & EQIP enrollment, organic farm operations, and cattle inventory.
  • Farm Inputs: fertilizer and chemical expenditure and quantities, machinery value, and labor count.
  • Economics: agricultural land value, commodity price, rented acreage fraction, small-scale farm share, mega-farm sales, direct retail sales, and crop insurance acres.
  • Demographics: urban-rural classification; female, young, beginning, and non-white producers; and median farm size.
  • Infrastructure: counts of crop-specific agri-food processing establishments.

SHAP Explainability

Training a random forest tells us the predicted crop likelihood at any location, but it does not, by itself, tell us which factors drove that prediction or how much each one contributed. To answer that question, we apply SHAP (SHapley Additive exPlanations), a widely used explainable AI framework.

SHAP assigns each predictor variable a contribution score for every individual prediction. The score — called a Shapley value — represents how much that variable pushed the predicted crop likelihood up (positive value) or down (negative value) relative to the model's average prediction.

  • Additive: the sum of all SHAP values plus the model baseline equals the full predicted crop likelihood for any observation.
  • Fair: SHAP accounts for correlations between predictors and distributes contributions fairly even when predictors are related.
  • Exact: we use the TreeSHAP algorithm, which computes exact Shapley values by exploiting the internal tree structure of random forests.
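The additive property can be checked on a toy example. The sketch below is not TreeSHAP and not the project's code: it computes exact Shapley values by brute-force enumeration of feature coalitions, which is only feasible for a handful of features. The model `toy_likelihood`, its three features, and the background rows are hypothetical, and features left out of a coalition are filled in from the background rows (one common way to define the value of a coalition, assumed here).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one observation x, by enumerating coalitions.

    Features outside a coalition take their values from each background row,
    and the prediction is averaged over those rows.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature j.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (value(s | {j}) - value(s))
        phi.append(contribution)

    baseline = value(frozenset())  # average prediction over the background rows
    return baseline, phi


def toy_likelihood(features):
    """Hypothetical stand-in for a trained model (not a random forest)."""
    rainfall, slope, irrigated = features
    return 0.2 + 0.5 * rainfall - 0.3 * slope + 0.2 * rainfall * irrigated


# Hypothetical background sample and observation: [rainfall, slope, irrigated].
background = [[0.2, 0.5, 0.0], [0.6, 0.3, 1.0], [0.4, 0.1, 0.0], [1.0, 0.7, 1.0]]
x = [0.8, 0.1, 1.0]

baseline, phi = shapley_values(toy_likelihood, x, background)

# Additivity: baseline plus the sum of contributions recovers the prediction.
print(baseline + sum(phi), toy_likelihood(x))
```

In this toy model, the slope contribution is positive because the observation's slope is lower than the background average and the model penalizes slope.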

Human Knowledge Integration

Farmer Focus Groups

We conducted focus group sessions with farmers in Georgia, Nebraska, and Ohio. These sessions served two goals: variable validation (before model training, farmers reviewed the list of human-controlled predictors and provided feedback on relevance) and SHAP result validation (we brought the SHAP results back to farmers to ground-truth them).


Delphi Expert Projections

To project future crop likelihood (2041–2060), we conducted a structured Delphi elicitation process with agricultural scientists, extension specialists, and policy experts. Panelists projected how key human-controlled variables (irrigation rates, fertilizer use, conservation program enrollment, land tenure structure, demographic trends) are likely to change by 2050. These expert projections are combined with CMIP6 climate projections (SSP2-4.5, SSP3-7.0, SSP5-8.5) to generate forward-looking crop likelihood maps for 2041–2060.
