Three integrated components estimating crop likelihood across the Central and Eastern United States.
Our modeling framework combines three integrated components to estimate the likelihood of a crop being grown at any location across the Central and Eastern United States.
Random Forest models learn from observed crop distributions and a comprehensive set of biophysical and human-controlled predictors to generate spatially continuous crop likelihood maps.
SHAP (SHapley Additive exPlanations), an explainable AI framework, decomposes each prediction into variable-level contributions, revealing which factors drive crop likelihood and by how much.
Real-world human knowledge, gathered through farmer focus groups and a Delphi expert panel, validates model inputs, ground-truths results, and projects how agricultural conditions may shift under future scenarios.
Each of the following 13 crops is modeled with its own random forest:
Corn
Wheat
Cotton
Soybean
Sorghum
Hay
Peanuts
Oats
Fruit & Vegetables
Barley
Rye
Pasture
Pecans
Crop presence and absence points are derived from the USDA Cropland Data Layer (CDL) at 30-meter resolution, covering 2009–2018. A location is classified as crop-present (1) if the crop appeared in more than two of those ten years; otherwise it is classified as absent (0).
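The labeling rule can be sketched in a few lines. This is a minimal illustration, not the study's pipeline. It assumes each pixel's history is a hypothetical mapping from year to crop name, whereas the CDL itself stores integer land-cover class codes for each 30 m pixel.

```python
# Minimal sketch of the presence/absence rule: a pixel is crop-present (1) if
# the crop appears in more than two of the ten CDL years (2009-2018), i.e. in
# at least three of them; otherwise it is absent (0).
# Assumption: crop names as strings stand in for CDL integer class codes.
from typing import Dict, List

YEARS = list(range(2009, 2019))  # ten CDL years, 2009-2018 inclusive
MIN_YEARS_EXCLUSIVE = 2          # "more than two of those ten years"


def label_pixel(crop_by_year: Dict[int, str], crop: str) -> int:
    """Return 1 if `crop` was mapped at this pixel in more than two CDL years."""
    years_present = sum(1 for year in YEARS if crop_by_year.get(year) == crop)
    return 1 if years_present > MIN_YEARS_EXCLUSIVE else 0


def label_pixels(histories: List[Dict[int, str]], crop: str) -> List[int]:
    """Apply the rule to a list of per-pixel year -> crop histories."""
    return [label_pixel(h, crop) for h in histories]


pixel_a = {2010: "corn", 2011: "soybean", 2013: "corn", 2017: "corn"}  # 3 corn years
pixel_b = {2009: "corn", 2012: "corn", 2015: "soybean"}                # 2 corn years
print(label_pixels([pixel_a, pixel_b], "corn"))  # [1, 0]
```

Years outside the 2009–2018 window are ignored, so a pixel's label depends only on the ten-year CDL record.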
Each random forest model is trained on more than 70 predictor variables grouped into nine thematic categories:
Mean annual temperature, temperature seasonality, maximum temperature of the warmest month, minimum temperature of the coldest month, mean annual precipitation, precipitation seasonality, and related extremes and variability indices.
Percent silt, sand, clay, and gravel; bulk density; and cation exchange capacity.
Slope and elevation.
Crop-specific surface water and groundwater withdrawal rates, and percent of area irrigated.
Cover crop acres, conservation and conventional tillage acres, tile drainage acres, Conservation Reserve Program (CRP) enrollment, Environmental Quality Incentives Program (EQIP) enrollment, organic farm operations, and cattle inventory.
Fertilizer and chemical expenditures, nitrogen (N) and phosphorus (P) quantities and application rates from fertilizer and manure, machinery value, and labor count.
Agricultural land value, commodity price, rented acreage fraction, mega-farm sales, direct retail sales, and crop insurance acres.
Urban-rural classification, female producer count, non-white producer count, median farm size, young producer count, and beginning producer count.
Counts of crop-specific agri-food processing establishments (classified by NAICS code) and biofuel use fractions.
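The sketch below shows how a single per-crop random forest can turn presence/absence labels and predictors into a likelihood: the predicted probability of the "present" class, averaged over the trees. It assumes scikit-learn's `RandomForestClassifier`, although the source does not name a library. The four predictor names, the data, and the labeling formula are synthetic placeholders, not the study's variables or values.

```python
# Minimal sketch: one random forest for one crop, with crop likelihood read as
# the predicted probability of class 1 (present). All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a few of the 70+ predictors.
feature_names = ["mean_annual_temp", "annual_precip", "slope", "pct_irrigated"]
n = 500
X = np.column_stack([
    rng.normal(15, 5, n),     # mean annual temperature (deg C)
    rng.normal(900, 250, n),  # mean annual precipitation (mm)
    rng.uniform(0, 20, n),    # slope (percent)
    rng.uniform(0, 1, n),     # fraction of area irrigated
])

# Synthetic labels: warmer, wetter, flatter locations are more often "present".
score = 0.1 * (X[:, 0] - 15) + 0.004 * (X[:, 1] - 900) - 0.15 * X[:, 2] + 1.0
y = (score + rng.normal(0, 1, n) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Likelihood at two new locations = probability of the "present" class.
new_sites = np.array([
    [18.0, 1100.0, 2.0, 0.3],  # warm, wet, flat
    [8.0, 500.0, 18.0, 0.0],   # cool, dry, steep
])
present_col = list(model.classes_).index(1)
likelihood = model.predict_proba(new_sites)[:, present_col]

print({site: round(float(p), 2) for site, p in zip(["site_a", "site_b"], likelihood)})
print({name: round(float(w), 3) for name, w in zip(feature_names, model.feature_importances_)})
```

Evaluating the fitted model on every grid cell yields a continuous likelihood surface. Impurity-based feature importances are global, however, and do not explain individual predictions.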
A trained random forest predicts crop likelihood at any location, but it does not, by itself, tell us which factors drove that prediction or how much each one contributed. To answer that question, we apply SHAP (SHapley Additive exPlanations), a widely used explainable AI framework.
SHAP assigns each predictor variable a contribution score, called a Shapley value, for every individual prediction. The score represents how much that variable pushed the predicted crop likelihood up (positive value) or down (negative value) relative to the model’s average prediction. For any location, the contributions of all variables sum to the difference between that location's prediction and the average prediction.
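To make the definition concrete, the following sketch computes exact Shapley values for one prediction by enumerating every subset of features. It takes the "average prediction" as the mean model output over a small background dataset (the interventional formulation); the source does not say which SHAP variant the study uses. The three-feature `toy_model` and all numbers are hypothetical. Tree ensembles are usually explained with a dedicated algorithm such as TreeSHAP (`TreeExplainer` in the `shap` package), because brute-force enumeration grows exponentially with the number of features.

```python
# Brute-force Shapley values for one prediction (interventional formulation):
# the "average prediction" is the mean model output over a background dataset.
# Cost grows exponentially with the number of features; illustration only.
from itertools import combinations
from math import factorial, isclose
from typing import Callable, FrozenSet, List, Sequence

Model = Callable[[Sequence[float]], float]


def coalition_value(model: Model, x: Sequence[float],
                    background: List[Sequence[float]],
                    coalition: FrozenSet[int]) -> float:
    """Mean prediction when features in `coalition` take x's values and all
    other features take values from each background row in turn."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in coalition else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model: Model, x: Sequence[float],
                   background: List[Sequence[float]]) -> List[float]:
    """Exact Shapley value of each feature for the prediction model(x)."""
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):  # coalitions of 0 .. m-1 other features
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


def toy_model(v: Sequence[float]) -> float:
    """Hypothetical likelihood: temperature, precipitation, and an irrigation
    effect that matters more where precipitation is low."""
    temp, precip, irrigated = v
    return 0.02 * temp + 0.0002 * precip + 0.4 * irrigated * (1 - precip / 1000)


background = [[10.0, 800.0, 0.0], [14.0, 1000.0, 0.2], [12.0, 600.0, 0.1]]
x = [16.0, 500.0, 0.8]

phi = shapley_values(toy_model, x, background)
average = sum(toy_model(row) for row in background) / len(background)
print([round(p, 4) for p in phi])
print(isclose(average + sum(phi), toy_model(x)))  # True (additivity)
```

Because temperature enters the toy model linearly, its Shapley value is simply 0.02 × (16 − 12) = 0.08. The precipitation–irrigation interaction is split between those two features.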
We conducted focus group sessions with farmers in Georgia, Nebraska, and Ohio to serve two goals. Before model training, the sessions provided variable validation: farmers reviewed the list of human-controlled predictors and gave feedback on their relevance. After training, they provided SHAP result validation: we brought the SHAP results back to farmers to ground-truth them.
To project future crop likelihood for 2041–2060, we conducted a structured Delphi elicitation with agricultural scientists, extension specialists, and policy experts. Panelists projected how key human-controlled variables (irrigation rates, fertilizer use, conservation program enrollment, land tenure structure, and demographic trends) are likely to change by 2050. Combining these projections with CMIP6 climate projections under three scenarios (SSP2-4.5, SSP3-7.0, and SSP5-8.5) yields forward-looking crop likelihood maps for 2041–2060.
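One way the two inputs could be combined for a single location is sketched below, under assumptions that the source does not confirm. Delphi answers are treated as percent changes, the panel median serves as the consensus value (a common Delphi summary), and climate variables are replaced outright by scenario values. All variable names and numbers are hypothetical.

```python
# Hypothetical sketch of assembling a 2041-2060 predictor row for one location.
# Assumptions (not from the study): Delphi answers are percent changes, the
# panel median is the consensus, and climate variables are replaced by
# scenario values. All names and numbers are illustrative.
from statistics import median
from typing import Dict, List


def panel_consensus(responses: Dict[str, List[float]]) -> Dict[str, float]:
    """Summarize each variable's panelist estimates (percent change) by the median."""
    return {name: median(values) for name, values in responses.items()}


def build_future_predictors(baseline: Dict[str, float],
                            climate_scenario: Dict[str, float],
                            pct_change: Dict[str, float]) -> Dict[str, float]:
    """Swap in scenario climate values and scale human-controlled variables."""
    future = dict(baseline)  # leave the historical row untouched
    for name, value in climate_scenario.items():
        if name not in baseline:
            raise KeyError(f"unknown climate variable: {name}")
        future[name] = value
    for name, pct in pct_change.items():
        if name not in baseline:
            raise KeyError(f"unknown human-controlled variable: {name}")
        future[name] = baseline[name] * (1 + pct / 100)
    return future


baseline = {"mean_annual_temp": 15.2, "annual_precip": 980.0,
            "pct_irrigated": 12.0, "n_fertilizer_rate": 150.0}
ssp245_2050 = {"mean_annual_temp": 16.9, "annual_precip": 1010.0}
panel = {"pct_irrigated": [10.0, 20.0, 15.0],
         "n_fertilizer_rate": [-5.0, 0.0, -10.0]}

future = build_future_predictors(baseline, ssp245_2050, panel_consensus(panel))
print({k: round(v, 2) for k, v in future.items()})
```

The resulting row would then be scored by the trained crop model in place of the historical predictors. Repeating this for every location and each SSP scenario would give one likelihood map per scenario.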