Our model

Our Modeling Approach

Three integrated components estimating crop likelihood across the Central and Eastern United States.

A diagram showing input data curated through focus groups and Delphi, processed by a random forest model, with SHAP explanations to predict map-based crop likelihood in the U.S.

How the framework works

Our modeling framework combines three integrated components to estimate the likelihood of a crop being grown at any location across the Central and Eastern United States.

Random Forest models learn from observed crop distributions and a comprehensive set of biophysical and human-controlled predictors to generate spatially continuous crop likelihood maps.

SHAP (SHapley Additive exPlanations), an explainable AI framework, decomposes each prediction into variable-level contributions, revealing which factors drive crop likelihood and by how much.

Real-world human knowledge, gathered through farmer focus groups and a Delphi expert panel, validates model inputs, ground-truths results, and projects how agricultural conditions may shift under future scenarios.

Crops modeled

The following 13 crops are each modeled with a random forest model: 

Corn
Wheat
Cotton
Soybean
Sorghum
Hay
Peanuts
Oats
Fruit & Vegetables
Barley
Rye
Pasture
Pecans

Crop presence and absence points are derived from the USDA Cropland Data Layer (CDL) at 30-meter resolution, covering 2009–2018. A location is classified as crop-present (1) if the crop appeared in more than two of those ten years; otherwise it is classified as absent (0).
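
The presence rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual processing code: the function name `label_presence`, the dictionary layout, and the pixel history are all hypothetical, and the crop sequence is made up.

```python
def label_presence(crop_by_year, crop, min_years_exclusive=2):
    """Return 1 if `crop` appears in more than `min_years_exclusive` years, else 0."""
    years_present = sum(1 for observed in crop_by_year.values() if observed == crop)
    return 1 if years_present > min_years_exclusive else 0


# Hypothetical CDL history for one 30 m pixel, 2009-2018 (values are made up).
pixel_history = {
    2009: "Corn", 2010: "Soybean", 2011: "Corn", 2012: "Soybean", 2013: "Corn",
    2014: "Soybean", 2015: "Wheat", 2016: "Soybean", 2017: "Wheat", 2018: "Soybean",
}

# One binary label per crop: each crop gets its own presence/absence target.
labels = {crop: label_presence(pixel_history, crop) for crop in ("Corn", "Soybean", "Wheat")}
print(labels)  # {'Corn': 1, 'Soybean': 1, 'Wheat': 0}
```

In this made-up history, corn appears in three years and soybean in five, so both are labeled present. Wheat appears in exactly two years, which is not more than two, so it is labeled absent.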

Predictor Variables

Each random forest model is trained on 70+ predictor variables grouped into 9 thematic categories.

Climate

Mean annual temperature, temperature seasonality, maximum temperature of the warmest month, minimum temperature of the coldest month, mean annual precipitation, precipitation seasonality, and related extremes and variability indices.

Soil

Percent silt, sand, clay, and gravel; bulk density; and cation exchange capacity.

Topography

Slope and elevation

Irrigation

Crop-specific surface and groundwater withdrawal rates; percent of area irrigated. 

Farm Management

Cover crop acres, conservation and conventional tillage acres, tile drainage acres, CRP enrollment, EQIP enrollment, organic farm operations, and cattle inventory. 

Farm Inputs

Fertilizer and chemical expenditure, N and P quantities and rates from fertilizer and manure, machinery value, and labor count.

Economics

Agricultural land value, commodity price, rented acreage fraction, mega-farm sales, direct retail sales, and crop insurance acres.

Demographics

Urban-rural classification, female producer count, non-white producer count, median farm size, young producer count, and beginning producer count.

Infrastructure

Counts of crop-specific agri-food processing establishments (identified by NAICS code) and biofuel use fractions.

SHAP Explainability

Training a random forest tells us the predicted crop likelihood at any location, but it does not, by itself, tell us which factors drove that prediction or how much each one contributed. To answer that question, we apply SHAP (SHapley Additive exPlanations), a widely used explainable AI framework.

SHAP assigns each predictor variable a contribution score for every individual prediction. The score — called a Shapley value — represents how much that variable pushed the predicted crop likelihood up (positive value) or down (negative value) relative to the model’s average prediction.

  • Additive: the sum of all SHAP values plus the model baseline equals the full predicted crop likelihood for any observation.
  • Fair: SHAP accounts for correlations between predictors and distributes contributions fairly even when predictors are related.
  • Exact: we use the TreeSHAP algorithm, which computes exact Shapley values by exploiting the internal tree structure of random forests.
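
To make the additive property concrete, the sketch below computes exact Shapley values by brute force for a tiny stand-in model. It is not TreeSHAP and not the project's code: it enumerates every feature subset, which is only feasible for a handful of features. The function `toy_likelihood`, its coefficients, the three feature names, and the single reference point used as the baseline (in place of the model's average prediction) are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one observation by enumerating every feature subset.

    Features outside a subset are set to their background (reference) value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else background[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_likelihood(features):
    """Hypothetical stand-in for a fitted model; coefficients are made up."""
    precip, slope, irrigated = features
    return 0.2 + 0.5 * precip - 0.3 * slope + 0.2 * precip * irrigated


x = [0.8, 0.1, 1.0]           # the location being explained (made-up values)
background = [0.5, 0.4, 0.0]  # reference point standing in for the model average

baseline = toy_likelihood(background)
phi = shapley_values(toy_likelihood, x, background)

# Additivity: baseline plus all contributions recovers the prediction.
assert abs(baseline + sum(phi) - toy_likelihood(x)) < 1e-9
print(baseline, phi, toy_likelihood(x))
```

Here the lower slope pushes the toy likelihood up, so its contribution is positive. The precipitation and irrigation contributions share the credit for their interaction term. TreeSHAP reaches the same kind of decomposition for tree ensembles without enumerating subsets.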

Human Knowledge Integration

Farmer focus groups

We conducted focus group sessions with farmers in Georgia, Nebraska, and Ohio. These sessions served two goals: variable validation before model training (farmers reviewed the list of human-controlled predictors and provided feedback on relevance) and SHAP result validation afterward (we brought the SHAP results back to farmers to ground-truth them).

Delphi Expert Projections

To project future crop likelihood (2041–2060), we conducted a structured Delphi elicitation process with agricultural scientists, extension specialists, and policy experts. Panelists projected how key human-controlled variables — irrigation rates, fertilizer use, conservation program enrollment, land tenure structure, demographic trends — are likely to change by 2050. Combined with CMIP6 climate projections (SSP2-4.5, SSP3-7.0, SSP5-8.5), this generates forward-looking crop likelihood maps for 2041–2060.
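
One way to picture how the two sources combine is sketched below. It is an illustration under stated assumptions, not the project's pipeline: summarizing panel responses by their median, expressing them as percent changes, and every variable name and number shown are hypothetical. Only the three SSP labels come from the text above.

```python
from statistics import median

SSP_SCENARIOS = ("SSP2-4.5", "SSP3-7.0", "SSP5-8.5")

# Hypothetical panel responses: projected percent change by 2050 (made-up numbers).
panel_responses = {
    "irrigated_area_pct": [10, 15, 5, 20],
    "fertilizer_n_rate": [-5, 0, -10, 5],
}

# Hypothetical present-day values of two human-controlled predictors at one location.
current_human = {"irrigated_area_pct": 12.0, "fertilizer_n_rate": 150.0}

# Hypothetical 2041-2060 climate predictors per scenario (made-up numbers).
future_climate = {
    "SSP2-4.5": {"mean_annual_temp_c": 14.1},
    "SSP3-7.0": {"mean_annual_temp_c": 14.6},
    "SSP5-8.5": {"mean_annual_temp_c": 15.2},
}


def project_human_variables(current, responses):
    """Scale each present-day value by the panel's median percent change (an assumption)."""
    return {
        name: value * (1 + median(responses[name]) / 100)
        for name, value in current.items()
    }


future_human = project_human_variables(current_human, panel_responses)

# One future predictor set per climate scenario: scenario climate plus projected human variables.
future_inputs = {ssp: {**future_climate[ssp], **future_human} for ssp in SSP_SCENARIOS}

for ssp, predictors in future_inputs.items():
    print(ssp, predictors)
```

Each scenario's predictor set would then be passed to the trained crop models to produce that scenario's likelihood map.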
