I want to visualize my regression model predictions and confidence intervals. Guide me: 1) Use ggplot2 in R for publication-quality plots
You're in luck! ggplot2 is exceptionally powerful for visualizing regression models and their confidence intervals, making it perfect for publication-quality plots.
We'll cover:
- Simple Linear Regression (one predictor)
- Linear Regression with a Categorical Variable (grouping/faceting)
- Generalizing to GLMs / More Complex Models (using broom for efficiency)
- Publication-Quality Enhancements
Let's start by loading the necessary libraries and preparing some data.
1. Setup: Libraries and Example Data
We'll use tidyverse for ggplot2 and dplyr, and broom for easily extracting model components (which is a game-changer for this task).
# Install if you haven't already
# install.packages(c("tidyverse", "broom"))
library(tidyverse) # Includes ggplot2, dplyr, etc.
library(broom) # For tidying model outputs

# We'll use the 'diamonds' dataset for our examples
data("diamonds")

# Let's simplify the diamonds dataset for clearer examples
# and take a subset to make plotting faster on some machines
set.seed(123) # for reproducibility
diamonds_sub <- diamonds %>%
  sample_n(5000) %>% # Take a sample
  select(price, carat, depth, cut, color) %>%
  mutate(
    # Convert cut and color to ordered factors for better interpretation if desired
    cut = factor(cut, levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"), ordered = TRUE),
    color = factor(color, levels = c("J", "I", "H", "G", "F", "E", "D"), ordered = TRUE)
  )
head(diamonds_sub)
# A tibble: 6 x 5
price carat depth cut color
<int> <dbl> <dbl> <ord> <ord>
1 8067 1.01 61.5 Ideal J
2 4400 0.9 61.2 Ideal J
3 5522 0.92 61.5 Ideal H
4 1624 0.72 62.1 Ideal J
5 12590 1.5 61.8 Ideal G
6 6366 1.01 61.5 Ideal F
2. Simple Linear Regression (One Predictor)
Let's predict price based on carat.
Step 2.1: Fit the Model
# Fit a linear model
model_simple <- lm(price ~ carat, data = diamonds_sub)
summary(model_simple)
Step 2.2: Generate Predictions and Confidence Intervals
This is crucial. We need to create a newdata data frame that covers the range of our carat predictor, and then use predict() with interval = "confidence".
# Create a sequence of 'carat' values for prediction
# This ensures a smooth line and ribbon
newdata_simple <- tibble(
  carat = seq(min(diamonds_sub$carat), max(diamonds_sub$carat), length.out = 100)
)

# Get predictions with confidence intervals
predictions_simple <- predict(model_simple, newdata = newdata_simple, interval = "confidence") %>%
  as_tibble() %>%
  bind_cols(newdata_simple) # Add the original 'carat' column back
head(predictions_simple)
# A tibble: 6 x 4
fit lwr upr carat
<dbl> <dbl> <dbl> <dbl>
1 -1978. -2043. -1913. 0.2
2 -1763. -1815. -1711. 0.218
3 -1549. -1591. -1507. 0.235
4 -1334. -1367. -1302. 0.253
5 -1120. -1143. -1096. 0.271
6 -905. -920. -890. 0.288
- fit: The predicted mean value.
- lwr: Lower bound of the confidence interval.
- upr: Upper bound of the confidence interval.
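To make the meaning of lwr and upr concrete, here is a minimal base-R sketch that rebuilds the confidence band from the standard error of the fit and a t quantile, then checks it against what predict() returns. It uses the built-in mtcars data and the names fit_demo, grid_demo, band_auto and band_manual purely for illustration; none of these come from the guide above.

```r
# Illustrative model on built-in data (an assumption; any lm behaves the same way)
fit_demo <- lm(mpg ~ wt, data = mtcars)
grid_demo <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))

# Band as returned by predict()
band_auto <- predict(fit_demo, newdata = grid_demo, interval = "confidence", level = 0.95)

# Same band by hand: fit +/- t quantile * standard error of the fitted mean
pred_se <- predict(fit_demo, newdata = grid_demo, se.fit = TRUE)
t_crit <- qt(0.975, df = pred_se$df)
band_manual <- cbind(
  fit = pred_se$fit,
  lwr = pred_se$fit - t_crit * pred_se$se.fit,
  upr = pred_se$fit + t_crit * pred_se$se.fit
)

# The two should agree up to floating-point error
all.equal(unname(band_auto), unname(band_manual))
```

The band describes uncertainty in the estimated mean response at each predictor value, which is why it is narrowest near the centre of the data.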
Step 2.3: Plot with ggplot2
plot_simple <- ggplot(diamonds_sub, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2, size = 0.8) + # Raw data points
  # Prediction line (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_line(data = predictions_simple, aes(y = fit), color = "steelblue", linewidth = 1) +
  # Confidence interval; inherit.aes = FALSE because predictions_simple has no 'price' column
  geom_ribbon(data = predictions_simple, aes(x = carat, ymin = lwr, ymax = upr),
              fill = "steelblue", alpha = 0.3, inherit.aes = FALSE) +
  labs(
    title = "Diamond Price Prediction by Carat",
    subtitle = "Simple Linear Regression with 95% Confidence Interval",
    x = "Carat Weight",
    y = "Price (USD)"
  ) +
  theme_minimal() + # A clean theme
  theme(plot.title = element_text(hjust = 0.5), # Center title
        plot.subtitle = element_text(hjust = 0.5))
print(plot_simple)
Understanding geom_ribbon():
- geom_ribbon() draws a shaded area.
- We map ymin to the lower bound (lwr) and ymax to the upper bound (upr) of our confidence interval.
- fill sets the color of the ribbon, and alpha controls its transparency.
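One practical detail when the ribbon gets its own data frame: a layer inherits the plot-level aesthetics, so a global y mapping must either exist in the ribbon's data or be switched off. The sketch below, again on mtcars with illustrative names (fit_demo, band_demo, p_demo), shows one way to handle this with inherit.aes = FALSE.

```r
library(ggplot2)

fit_demo <- lm(mpg ~ wt, data = mtcars)
grid_demo <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))
band_demo <- cbind(grid_demo, predict(fit_demo, newdata = grid_demo, interval = "confidence"))

p_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # band_demo has no 'mpg' column, so the ribbon must not inherit y = mpg
  geom_ribbon(
    data = band_demo,
    aes(x = wt, ymin = lwr, ymax = upr),
    inherit.aes = FALSE,
    fill = "steelblue", alpha = 0.3
  ) +
  # The line overrides y, so inheriting x = wt is fine here
  geom_line(data = band_demo, aes(y = fit), colour = "steelblue")

# layer_data() builds the plot, which evaluates the ribbon layer's aesthetics
ribbon_data <- layer_data(p_demo, 2)
head(ribbon_data[, c("x", "ymin", "ymax")])
```

Mapping y = fit inside the ribbon's aes() is another common way to get the same result.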
3. Linear Regression with a Categorical Variable
Let's see how cut quality affects the price vs carat relationship. We'll fit a model with an interaction term.
Step 3.1: Fit the Model (with interaction)
# Fit a model including 'cut' as a categorical variable and an interaction
model_interaction <- lm(price ~ carat * cut, data = diamonds_sub)
summary(model_interaction)
Step 3.2: Generate Predictions and Confidence Intervals (grouped)
This time, newdata needs to include all levels of cut and a sequence of carat values for each. expand_grid() is perfect for this.
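As a quick illustration of what expand_grid() produces, here is a small sketch with a hypothetical three-value carat sequence (the values and the name grid_demo are made up for the example). Passing cut as a factor rather than a character vector keeps the level order intact.

```r
library(tidyr)

# Hypothetical small example: 3 carat values crossed with 5 cut levels
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
grid_demo <- expand_grid(
  carat = c(0.5, 1.0, 1.5),
  cut = factor(cut_levels, levels = cut_levels, ordered = TRUE)
)

nrow(grid_demo)        # 3 * 5 = 15 rows, one per combination
levels(grid_demo$cut)  # level order is preserved because cut is passed as a factor
```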
# Create new data for prediction, covering all combinations of 'carat' and 'cut'
newdata_interaction <- expand_grid(
  carat = seq(min(diamonds_sub$carat), max(diamonds_sub$carat), length.out = 100),
  # Keep 'cut' an ordered factor so it matches diamonds_sub in legends and facets
  cut = factor(levels(diamonds_sub$cut), levels = levels(diamonds_sub$cut), ordered = TRUE)
)

# Get predictions with confidence intervals for each group
predictions_interaction <- predict(model_interaction, newdata = newdata_interaction, interval = "confidence") %>%
  as_tibble() %>%
  bind_cols(newdata_interaction) # Add the original columns back
head(predictions_interaction)
Step 3.3: Plot with ggplot2 (with color and facet_wrap)
plot_interaction <- ggplot(diamonds_sub, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.1, size = 0.5) + # Raw data, colored by cut (inherited)
  # Prediction lines (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_line(data = predictions_interaction, aes(y = fit), linewidth = 1) +
  # Confidence intervals, no border; inherit.aes = FALSE because the prediction data has no 'price' column
  geom_ribbon(data = predictions_interaction,
              aes(x = carat, ymin = lwr, ymax = upr, fill = cut),
              alpha = 0.2, linetype = 0, inherit.aes = FALSE) +
  labs(
    title = "Diamond Price Prediction by Carat and Cut",
    subtitle = "Linear Regression with Interaction Term (95% Confidence Interval)",
    x = "Carat Weight",
    y = "Price (USD)",
    color = "Cut Quality",
    fill = "Cut Quality"
  ) +
  scale_color_brewer(palette = "Dark2") + # Use a colorblind-friendly palette
  scale_fill_brewer(palette = "Dark2") + # Match fill palette
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "bottom")
print(plot_interaction)

# You can also use facet_wrap() to separate the plots by cut quality
plot_facet <- ggplot(diamonds_sub, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1, size = 0.5) +
  geom_line(data = predictions_interaction, aes(y = fit), color = "steelblue", linewidth = 1) +
  geom_ribbon(data = predictions_interaction,
              aes(x = carat, ymin = lwr, ymax = upr),
              fill = "steelblue", alpha = 0.3, linetype = 0, inherit.aes = FALSE) +
  facet_wrap(~ cut, scales = "free_y") + # Separate plots by 'cut', allow y-axis to vary
  labs(
    title = "Diamond Price Prediction by Carat, Faceted by Cut",
    subtitle = "Linear Regression with Interaction Term (95% Confidence Interval)",
    x = "Carat Weight",
    y = "Price (USD)"
  ) +
  theme_bw() + # Another clean theme
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))
print(plot_facet)
4. Generalizing with broom::augment() for Cleaner Workflow (Recommended!)
The broom package offers a much cleaner way to get predictions and CIs, especially for more complex models or GLMs. The augment() function adds per-observation model output (.fitted, .resid and, in current broom versions with se_fit = TRUE, .se.fit) to your original data.
For GLMs, predict(type = "response") is often necessary to get predictions on the original response scale. augment() exposes the same choice through its type.predict argument.
Let's demonstrate with a slightly more complex GLM for price (e.g., a Gamma family, because the price distribution is skewed).
Step 4.1: Fit a GLM
# Fit a Gamma GLM (often good for positive, skewed continuous data like price)
model_glm <- glm(price ~ carat * cut + depth, data = diamonds_sub, family = Gamma(link = "log"))
summary(model_glm)
Step 4.2: Generate Predictions and Confidence Intervals with augment()
This is where augment() shines. With se_fit = TRUE it adds .fitted and .se.fit (the standard error of the fit), which you can then use to calculate CIs, and type.predict = "response" puts the fitted values on the response scale. For lm models, augment() can also return interval bounds directly through its interval argument; the glm method does not offer that.
However, for a completely new sequence of data (like we did with newdata), predict() is still the most direct function, although augment() also accepts a newdata argument. augment() is most convenient when you want to see how the model fits the original data.
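For a plain linear model, the augment() route with new data can be sketched as follows. This assumes broom 0.7.0 or later, where the lm method accepts interval = "confidence" and returns the bounds as .lower and .upper; the mtcars model and the names fit_demo, grid_demo and aug_demo are illustrative only.

```r
library(broom)

fit_demo <- lm(mpg ~ wt, data = mtcars)
grid_demo <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))

# augment() on new data, asking for confidence bounds (lm models only)
aug_demo <- augment(fit_demo, newdata = grid_demo, interval = "confidence")
names(aug_demo)

# Cross-check against predict(): the lower bounds should match
pred_demo <- predict(fit_demo, newdata = grid_demo, interval = "confidence")
all.equal(aug_demo$.lower, unname(pred_demo[, "lwr"]))
```

The resulting tibble can go straight into geom_ribbon() with ymin = .lower and ymax = .upper.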
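Let's combine the best of both worlds: use expand_grid for newdata and then predict (with type = "response" and se.fit = TRUE) and manually calculate CIs, because predict() for GLMs has no interval argument. Note that a symmetric interval built on the response scale is only an approximation, which is why the lower bound has to be clamped at zero below.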
# Create new data for prediction, covering all combinations
newdata_glm <- expand_grid(
  carat = seq(min(diamonds_sub$carat), max(diamonds_sub$carat), length.out = 100),
  # Keep 'cut' an ordered factor so it matches diamonds_sub in the plot
  cut = factor(levels(diamonds_sub$cut), levels = levels(diamonds_sub$cut), ordered = TRUE),
  depth = mean(diamonds_sub$depth) # Hold 'depth' constant at its mean
)

# Get predictions (on the response scale) and standard errors
predictions_glm_raw <- predict(model_glm, newdata = newdata_glm, type = "response", se.fit = TRUE)

# Calculate approximate 95% confidence intervals manually for GLMs
# CI = estimate +/- (z-score * SE)
# For 95% CI, z-score is approx 1.96
z_score <- qnorm(0.975)
predictions_glm <- newdata_glm %>%
  mutate(
    fit = predictions_glm_raw$fit,
    se = predictions_glm_raw$se.fit,
    lwr = fit - z_score * se, # Lower bound
    upr = fit + z_score * se # Upper bound
  ) %>%
  # Ensure bounds don't go below 0 (for price)
  mutate(
    lwr = pmax(0, lwr) # price cannot be negative
  )
head(predictions_glm)
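A commonly used alternative to the response-scale interval above is to build the interval on the link scale and back-transform it, which keeps the bounds inside the valid range without clamping. The sketch below is not the approach used in this guide; it swaps in a small Gamma GLM on mtcars, with illustrative names (glm_demo, link_pred, band_glm), just to show the mechanics.

```r
# Illustrative Gamma GLM on built-in data (an assumption, not the diamonds model)
glm_demo <- glm(mpg ~ wt, data = mtcars, family = Gamma(link = "log"))
grid_demo <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))

# Predict on the link (log) scale and get standard errors there
link_pred <- predict(glm_demo, newdata = grid_demo, type = "link", se.fit = TRUE)
z_crit <- qnorm(0.975)
inv_link <- family(glm_demo)$linkinv # exp() for a log link

# Back-transform the fit and both bounds to the response scale
band_glm <- data.frame(
  wt = grid_demo$wt,
  fit = inv_link(link_pred$fit),
  lwr = inv_link(link_pred$fit - z_crit * link_pred$se.fit),
  upr = inv_link(link_pred$fit + z_crit * link_pred$se.fit)
)

# Bounds are positive by construction, so no clamping at zero is needed
range(band_glm$lwr)
```

The back-transformed band is asymmetric around the fitted curve, which is expected for a log link.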
Step 4.3: Plot the GLM Predictions
plot_glm <- ggplot(diamonds_sub, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.1, size = 0.5) +
  # linewidth needs ggplot2 >= 3.4.0; use size on older versions
  geom_line(data = predictions_glm, aes(y = fit), linewidth = 1) +
  # inherit.aes = FALSE because predictions_glm has no 'price' column
  geom_ribbon(data = predictions_glm,
              aes(x = carat, ymin = lwr, ymax = upr, fill = cut),
              alpha = 0.2, linetype = 0, inherit.aes = FALSE) +
  labs(
    title = "Diamond Price Prediction by Carat, Cut, and Depth (GLM)",
    subtitle = "Gamma GLM with 95% Confidence Interval (Depth held at mean)",
    x = "Carat Weight",
    y = "Price (USD)",
    color = "Cut Quality",
    fill = "Cut Quality"
  ) +
  scale_color_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1") +
  theme_classic() + # Yet another theme
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "bottom")
print(plot_glm)
5. Publication-Quality Enhancements
Beyond the basics, here are tips for making your plots truly publication-ready:
- Themes:
  - theme_minimal(): Clean, simple, good default.
  - theme_bw(): Black and white, often preferred for publications to save ink/focus on data.
  - theme_classic(): Similar to theme_bw() but without gridlines or a panel border.
  - theme_void(): A blank canvas for maximum customization.
  - You can also define your own theme() elements for specific fonts, sizes, etc.
# Example of a custom theme for publication
my_pub_theme <- theme_minimal() +
  theme(
    text = element_text(family = "serif", size = 12), # Specific font and size
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = "black"),
    legend.title = element_text(face = "bold"),
    panel.grid.major = element_line(linetype = "dotted", color = "gray80"),
    panel.grid.minor = element_blank(), # Remove minor grid lines
    plot.background = element_rect(fill = "white", color = NA) # Ensure white background
  )
plot_glm + my_pub_theme # Apply to any plot
- Color Palettes:
  - scale_color_brewer() and scale_fill_brewer(): Use RColorBrewer palettes for distinct colors. Among the qualitative palettes, "Dark2", "Set2" and "Paired" are flagged as colorblind-friendly; "Set1" is not.
  - scale_color_viridis_d() / scale_color_viridis_c(): Viridis palettes are perceptually uniform and colorblind-friendly.
  - Manually specify colors with scale_color_manual() or scale_fill_manual().
- Labels and Titles:
  - labs() is your friend. Be descriptive, avoid jargon where possible.
  - Adjust plot.title = element_text(hjust = 0.5) to center titles.
- Annotations:
  - annotate(): Add text, segments, rectangles, etc., at specific coordinates.
  - geom_text() / geom_label(): Add labels from your data.
- Line Types, Sizes, Alpha:
  - size: Controls the size of points (and, before ggplot2 3.4.0, the thickness of lines; newer versions use linewidth for lines).
  - alpha: Controls transparency, useful for dense scatter plots or overlapping ribbons.
  - linetype: For different lines if you don't want to rely solely on color.
- Saving Plots:
  - ggsave() allows you to save your plot in various formats (PNG, JPG, PDF, SVG) with specified dimensions and resolution.
ggsave("diamond_price_prediction.png", plot_glm, width = 8, height = 6, dpi = 300)
ggsave("diamond_price_prediction.pdf", plot_glm, width = 8, height = 6) # Vector graphics for best quality
dpi = 300 or 600 is standard for print publications.
This guide provides a robust framework for visualizing your regression model predictions and confidence intervals using ggplot2. Remember to always tailor your plots to the specific story your data and model are telling!
I'll guide you through creating publication-quality plots for regression model predictions and confidence intervals using ggplot2 in R. Here's a comprehensive step-by-step approach:
1. Basic Setup and Data Preparation
First, load the required libraries and prepare your model:
library(ggplot2)
library(dplyr)
library(broom)
library(gridExtra) # For combining plots
library(ggpubr) # Provides theme_pubr(), used below

# Example: Fit a linear regression model
# Replace with your actual model
data(mtcars)
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Create prediction data for smooth curves
# (hp and qsec are held at their means while wt varies)
pred_data <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100),
  hp = mean(mtcars$hp),
  qsec = mean(mtcars$qsec)
)
2. Generate Predictions with Confidence Intervals
# Get predictions with confidence intervals
predictions <- predict(model,
                       newdata = pred_data,
                       interval = "confidence",
                       level = 0.95) %>%
  as.data.frame() %>%
  mutate(wt = pred_data$wt)

# For prediction intervals (individual predictions):
pred_intervals <- predict(model,
                          newdata = pred_data,
                          interval = "prediction",
                          level = 0.95) %>%
  as.data.frame() %>%
  mutate(wt = pred_data$wt)
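The difference between the two interval types can be checked numerically. The following base-R sketch refits the same mtcars model under illustrative names (fit_demo, grid_demo, ci_demo, pi_demo) and compares band widths: the fitted line is identical, but the prediction interval is wider because it also accounts for residual scatter around the mean.

```r
fit_demo <- lm(mpg ~ wt + hp + qsec, data = mtcars)
grid_demo <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100),
  hp = mean(mtcars$hp),
  qsec = mean(mtcars$qsec)
)

ci_demo <- predict(fit_demo, newdata = grid_demo, interval = "confidence", level = 0.95)
pi_demo <- predict(fit_demo, newdata = grid_demo, interval = "prediction", level = 0.95)

# Width of each band at every grid point
ci_width <- ci_demo[, "upr"] - ci_demo[, "lwr"]
pi_width <- pi_demo[, "upr"] - pi_demo[, "lwr"]

all.equal(ci_demo[, "fit"], pi_demo[, "fit"]) # same fitted line
all(pi_width > ci_width)                      # prediction band is wider everywhere
```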
3. Basic Plot with Confidence Intervals
# Create the main plot
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)),
             size = 2.5, alpha = 0.7) +
  geom_ribbon(data = predictions,
              aes(x = wt, ymin = lwr, ymax = upr),
              fill = "steelblue", alpha = 0.3,
              inherit.aes = FALSE) +
  # linewidth needs ggplot2 >= 3.4.0; use size on older versions
  geom_line(data = predictions,
            aes(x = wt, y = fit),
            color = "steelblue", linewidth = 1) +
  labs(title = "Regression Model with 95% Confidence Intervals",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon",
       color = "Cylinders") +
  ggpubr::theme_pubr() + # Publication theme from the ggpubr package
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))
print(p1)
4. Advanced Publication-Quality Plot
# Enhanced version with publication quality
p2 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # Add points with refined styling
  geom_point(aes(color = factor(cyl)),
             size = 3, alpha = 0.8,
             shape = 16) +
  # Confidence interval ribbon
  # (inherit.aes = FALSE because 'predictions' has no 'mpg' column)
  geom_ribbon(data = predictions,
              aes(x = wt, ymin = lwr, ymax = upr),
              fill = "#2E86AB", alpha = 0.25,
              color = NA, inherit.aes = FALSE) +
  # Predicted line (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_line(data = predictions,
            aes(x = wt, y = fit),
            color = "#2E86AB", linewidth = 1.2) +
  # Optional: Prediction intervals (wider band)
  geom_ribbon(data = pred_intervals,
              aes(x = wt, ymin = lwr, ymax = upr),
              fill = "#A23B72", alpha = 0.15,
              color = NA, inherit.aes = FALSE) +
  # Facets for different cylinder counts (if relevant);
  # the model layers have no 'cyl' column, so they repeat in every panel
  facet_wrap(~ factor(cyl), scales = "free_y", ncol = 2) +
  scale_color_manual(values = c("4" = "#F18F01",
                                "6" = "#C73E1D",
                                "8" = "#2E86AB")) +
  labs(title = "MPG vs Weight: Regression with Confidence and Prediction Intervals",
       subtitle = "Fitted model: mpg ~ wt + hp + qsec",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon (MPG)",
       color = "Cylinders",
       caption = "Data: mtcars | 95% confidence and prediction intervals") +
  # Publication-quality theme
  theme_classic() +
  theme(
    # Title and labels
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, color = "grey50", hjust = 0.5),
    plot.caption = element_text(size = 10, color = "grey70", hjust = 1),
    # Axis titles and text
    axis.title = element_text(size = 12, face = "bold"),
    axis.text = element_text(size = 10),
    # Legend
    legend.title = element_text(size = 11, face = "bold"),
    legend.text = element_text(size = 10),
    legend.position = "bottom",
    legend.box = "horizontal",
    # Facet labels
    strip.background = element_rect(fill = "grey95", color = "grey80"),
    strip.text = element_text(size = 11, face = "bold")
  ) +
  # Add R² and other statistics as annotation (drawn in every facet)
  annotate("text", x = Inf, y = Inf,
           label = paste("R² =", round(summary(model)$r.squared, 3)),
           hjust = 1.1, vjust = 1.1,
           size = 4, fontface = "bold")
print(p2)
5. Multiple Variable Plots (Partial Dependence)
For models with multiple predictors, create plots for each:
# Function to create plots for each predictor
create_reg_plot <- function(model, data, predictor_var) {
  response_var <- as.character(formula(model)[[2]])
  # Create prediction data for this variable
  pred_range <- seq(min(data[[predictor_var]], na.rm = TRUE),
                    max(data[[predictor_var]], na.rm = TRUE),
                    length.out = 100)
  # Hold every other numeric variable at its mean, one row per grid value
  pred_data <- data %>%
    summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
    slice(rep(1, length(pred_range))) %>%
    mutate(!!predictor_var := pred_range)
  # Get predictions
  preds <- predict(model, newdata = pred_data, interval = "confidence") %>%
    as.data.frame() %>%
    mutate(!!predictor_var := pred_range)
  # Create plot (.data[[...]] replaces the deprecated aes_string())
  ggplot(data, aes(x = .data[[predictor_var]], y = .data[[response_var]])) +
    geom_point(alpha = 0.6, size = 2) +
    geom_ribbon(data = preds,
                aes(x = .data[[predictor_var]], ymin = lwr, ymax = upr),
                fill = "#2E86AB", alpha = 0.25, inherit.aes = FALSE) +
    geom_line(data = preds,
              aes(y = fit),
              color = "#2E86AB", linewidth = 1) +
    labs(title = paste("Effect of", predictor_var),
         x = predictor_var,
         y = response_var) +
    ggpubr::theme_pubr() + # Publication theme from the ggpubr package
    theme(plot.title = element_text(hjust = 0.5))
}

# Create plots for each predictor
predictors <- c("wt", "hp", "qsec")
plots <- lapply(predictors, function(p) create_reg_plot(model, mtcars, p))
# Combine plots
combined_plot <- grid.arrange(grobs = plots, ncol = 2)
6. Saving Publication-Quality Figures
# Save with high resolution
ggsave("regression_plot.png", plot = p2,
       width = 10, height = 8, dpi = 300,
       bg = "white")

# For vector graphics (PDF/EPS for publications)
ggsave("regression_plot.pdf", plot = p2,
       width = 10, height = 8, device = "pdf")

# For presentations: ggsave() cannot write .pptx files, but on Windows
# a metafile can be saved and then inserted into a slide
ggsave("regression_plot.wmf", plot = p2,
       width = 12, height = 9,
       device = "wmf") # Windows only

# High-quality TIFF for journals
ggsave("regression_plot.tiff", plot = p2,
       width = 6.5, height = 5, dpi = 600,
       compression = "lzw", bg = "white")
7. Custom Themes for Publications
```r
# Create a custom publication theme
theme_publication <- function(base_size = 11, base_family = "") {
  theme_classic(base_size = base_size, base_family = base_family) %+replace%
    theme(
      # Text elements
      plot.title = element_text(size = rel(1.2), hjust = 0.5,
                                face = "bold", margin = margin(b = 10)),
      plot.subtitle = element_text(size = rel(1.0), hjust = 0.5,
                                   margin = margin(b = 8)),
      plot.caption = element_text(hjust = 1, size = rel(0.8),
                                  color = "grey60", margin = margin(t = 10)),
      # Axis
      axis.title = element_text(size = rel(1.0), face = "bold"),
      axis.text =
```
Analysis
This comparison demonstrates the different approaches each AI model takes when responding to the same prompt. Here are the key differences observed:
Response Characteristics
Gemini: Provides a direct response with 176 sentences.
Grok: Provides a direct response with 76 sentences.
Key Takeaways
- Each model brings unique strengths to this type of query
- Response styles vary significantly between models
- Consider your specific use case when choosing between these models
This comparison was generated using the SNEOS AI Comparison Tool.
Published: October 15, 2025 | Models: Gemini, Grok