Title: | CURE (Cumulative Residual) Plots |
---|---|
Description: | Creates 'ggplot2' Cumulative Residual (CURE) plots to check the goodness-of-fit of a count model; or the tables to create a customized version. A dataset of crashes in Washington state is available for illustrative purposes. |
Authors: | Jonathan Wood [aut] |
Maintainer: | Guillermo Basulto-Elias <[email protected]> |
License: | AGPL (>= 3) |
Version: | 1.1.1 |
Built: | 2025-03-01 06:53:38 UTC |
Source: | https://github.com/gbasulto/cureplots |
Calculate CURE Dataframe
calculate_cure_dataframe(covariate_values, residuals)
covariate_values |
Covariate to be plotted, given by name with or without quotes. |
residuals |
Residuals. |
A data frame with five columns: independent variable, residuals, cumulative residuals, lower confidence interval limit, and upper confidence interval limit.
library(cureplots)

set.seed(2000)

## Define parameters
beta <- c(-1, 0.3, 3)

## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)

## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)

## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)

## Calculate residuals
res <- residuals(mod, type = "response")

## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)
head(cure_df)
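To make the five columns described above concrete, the following base R sketch builds a similar table by hand. It is not the package's internal code: `manual_cure_table`, its `z` argument, and the column names `x` and `residual` are hypothetical, and the limits assume the commonly used form ±z·σ*(i) with σ*(i)² = s²(i)·(1 − s²(i)/s²(n)), where s²(i) is the running sum of squared residuals. The package's own computation may differ in detail.

```r
## Hand-rolled sketch of a CURE table (hypothetical helper, not part of cureplots).
## Assumption: limits are +/- z * sqrt(s2 * (1 - s2 / s2[n])), with s2 the
## cumulative sum of squared residuals after sorting by the covariate.
manual_cure_table <- function(covariate_values, residuals, z = 1.96) {
  ord <- order(covariate_values)           # sort observations by the covariate
  x <- covariate_values[ord]
  res <- residuals[ord]
  cumres <- cumsum(res)                    # cumulative residuals
  s2 <- cumsum(res^2)                      # cumulative squared residuals
  sd_star <- sqrt(s2 * (1 - s2 / s2[length(s2)]))
  data.frame(
    x = x,
    residual = res,
    cumres = cumres,
    lower = -z * sd_star,
    upper = z * sd_star
  )
}

## Small simulated Poisson example (simplified from the example above)
set.seed(2000)
n <- 900
AADT <- runif(n, min = 2000, max = 150000)
LNAADT <- log(AADT)
y <- rpois(n, exp(-1 + 0.3 * LNAADT))
mod <- glm(y ~ LNAADT, family = poisson)
res <- residuals(mod, type = "response")

manual_tbl <- manual_cure_table(AADT, res)
head(manual_tbl)
```

Under this formulation the limits collapse to zero at the last observation, because the cumulative residuals there equal the total, which is fixed once the model is fitted.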
CURE Plot
cure_plot(x, covariate = NULL, n_resamples = 0)
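x |
Either a data frame produced with calculate_cure_dataframe() or a fitted count model, such as the glm object in the example. |
covariate |
Required when x is a fitted model: the name of the covariate to plot against, e.g. "LNAADT". |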
n_resamples |
Number of resamples to overlay on CURE plot. Zero is the default. |
A CURE plot generated with ggplot2.
## basic example code
library(cureplots)

set.seed(2000)

## Define parameters
beta <- c(-1, 0.3, 3)

## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)

## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)

## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)

## Calculate residuals
res <- residuals(mod, type = "response")

## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)
head(cure_df)

## Providing CURE data frame
cure_plot(cure_df)

## Providing glm object
cure_plot(mod, "LNAADT")

## Providing glm object and overlaying resampled cumulative residuals
cure_plot(mod, "LNAADT", n_resamples = 3)
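A CURE plot is usually read by eye: a well-fitting model keeps the cumulative residuals oscillating around zero and inside the limits. As a rough numeric companion, the sketch below computes the share of points that fall outside the limits. `share_outside_limits` and `toy_cure_df` are hypothetical; the column names `cumres`, `lower`, and `upper` are taken from the ggplot2 example in this manual, and the toy values are made up for illustration only.

```r
## Hypothetical helper (not part of cureplots): share of points whose
## cumulative residual lies outside the [lower, upper] limits.
share_outside_limits <- function(cure_df) {
  mean(cure_df$cumres < cure_df$lower | cure_df$cumres > cure_df$upper)
}

## Made-up toy table with the assumed column names
toy_cure_df <- data.frame(
  cumres = c(0.5, 1.8, -2.6, 0.4, 0.0),
  lower  = c(-1.0, -2.0, -2.0, -1.0, 0.0),
  upper  = c(1.0, 2.0, 2.0, 1.0, 0.0)
)

outside_share <- share_outside_limits(toy_cure_df)
outside_share
```

In the toy table only the third point leaves the band, so the share is one in five. The same helper could be applied to a table returned by calculate_cure_dataframe(), provided the column names match.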
Resamples the residuals to compute several cumulative residual curves. The function takes the covariate values, the residuals, and the number of resamples; shuffles the residuals (i.e., draws a sample of the same size without replacement) once per resample; and returns a stacked data frame.
resample_residuals(covariate_values, residuals, n_resamples)
covariate_values |
Covariate values. |
residuals |
Residuals. |
n_resamples |
Number of times to sample the residuals. |
Data frame of stacked
library(cureplots)
library(ggplot2)

## basic example
set.seed(2000)

## Define parameters.
beta <- c(-1, 0.3, 3)

## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)

## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)

## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)

## Calculate residuals
res <- residuals(mod, type = "response")

## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)

resampled_residuals_tbl <- resample_residuals(AADT, res, n_resamples = 3)

ggplot(data = cure_df) +
  aes(AADT, cumres) +
  geom_line(
    data = resampled_residuals_tbl,
    aes(group = sample),
    col = "grey"
  ) +
  geom_line(color = "darkgreen", linewidth = 0.8) +
  geom_line(
    aes(y = lower),
    color = "magenta", linetype = "dashed", linewidth = 0.8) +
  geom_line(
    aes(y = upper),
    color = "blue", linetype = "dashed", linewidth = 0.8) +
  theme_bw()
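The shuffling step described for resample_residuals() can be illustrated in base R. This is a sketch of the idea only: `shuffle_cumres` is a hypothetical helper, the output columns `sample`, `x`, and `cumres` are assumptions, and the package's actual implementation and column layout may differ.

```r
## Sketch of the resampling idea (hypothetical helper, not part of cureplots).
## Each resample permutes the residuals (same size, without replacement) and
## accumulates them along the sorted covariate; results are stacked row-wise.
shuffle_cumres <- function(covariate_values, residuals, n_resamples) {
  x <- sort(covariate_values)
  one_resample <- function(k) {
    shuffled <- residuals[sample.int(length(residuals))]
    data.frame(sample = k, x = x, cumres = cumsum(shuffled))
  }
  do.call(rbind, lapply(seq_len(n_resamples), one_resample))
}

## Toy inputs, simulated for illustration only
set.seed(2000)
x <- runif(50, min = 2000, max = 150000)
res <- rnorm(50)

stacked <- shuffle_cumres(x, res, n_resamples = 3)
table(stacked$sample)
```

Because a permutation keeps every residual exactly once, each resampled curve ends at the same total as the original cumulative residuals; only the path in between changes, which is what makes the grey curves a useful visual reference.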
Crashes on Washington primary roads from 2016, 2017, and 2018. Data acquired from Washington Department of Transportation through the Highway Safety Information System (HSIS).
washington_roads
The data frame washington_roads has 1,501 rows and 9 columns:
Anonymized road ID. Factor.
Year. Integer.
Annual Average Daily Traffic (AADT). Double.
Segment length in miles. Double.
Total crashes. Integer.
Natural logarithm of AADT. Double.
Natural logarithm of length in miles. Double.
Indicator of whether the speed limit is 50 mph or greater. Binary.
Indicator of whether the shoulder is 4 feet or wider. Binary.
<https://highways.dot.gov/research/safety/hsis>