Package 'cureplots'

Title: CURE (Cumulative Residual) Plots
Description: Creates 'ggplot2' Cumulative Residual (CURE) plots to check the goodness of fit of a count model, or the tables needed to build a customized version. A dataset of crashes in Washington state is included for illustrative purposes.
Authors: Jonathan Wood [aut], Guillermo Basulto-Elias [aut, cre]
Maintainer: Guillermo Basulto-Elias <[email protected]>
License: AGPL (>= 3)
Version: 1.1.1
Built: 2025-03-01 06:53:38 UTC
Source: https://github.com/gbasulto/cureplots

Help Index


Calculate CURE Dataframe

Description

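Calculates the table behind a CURE plot from covariate values and model residuals: the residuals, their cumulative sum, and the lower and upper confidence limits.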

Usage

calculate_cure_dataframe(covariate_values, residuals)

Arguments

covariate_values

Covariate values to plot against, passed as a vector (as with AADT in the example below).

residuals

Residuals.

Value

A data frame with five columns: independent variable, residuals, cumulative residuals, lower confidence interval limit, and upper confidence interval limit.
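
The Value section lists the columns but not how they are computed. The base R sketch below shows one common construction from the CURE plot literature: residuals are ordered by the covariate, accumulated, and bounded by ±1.96 times a standard error that shrinks to zero at the last observation. The function name cure_table_sketch, the column names covariate and residual, and the exact limit formula are assumptions made for illustration; the package's own implementation may differ.

```r
## Hypothetical helper, NOT part of cureplots: builds a CURE-style table
## under the assumptions stated above.
cure_table_sketch <- function(covariate_values, residuals, z = 1.96) {
  ## Order residuals by increasing covariate value
  ord <- order(covariate_values)
  x <- covariate_values[ord]
  r <- residuals[ord]

  ## Cumulative residuals and cumulative squared residuals
  cumres <- cumsum(r)
  cum_sq <- cumsum(r^2)
  total_sq <- cum_sq[length(cum_sq)]

  ## Assumed standard error: zero at the end, where the cumulative sum
  ## of residuals is fixed by the model fit
  se <- sqrt(cum_sq) * sqrt(pmax(0, 1 - cum_sq / total_sq))

  data.frame(
    covariate = x,
    residual = r,
    cumres = cumres,
    lower = -z * se,
    upper = z * se
  )
}
```

Taking the total from the last element of the running sum, rather than from a separate sum(), makes the limits close to exactly zero at the final row.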

Examples

set.seed(2000)

## Define parameters
beta <- c(-1, 0.3, 3)

## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)

## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)

## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)

## Calculate residuals
res <- residuals(mod, type = "response")

## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)

head(cure_df)

CURE Plot

Description

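Draws a CURE plot with ggplot2, either from a data frame produced with calculate_cure_dataframe or directly from a fitted count model.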

Usage

cure_plot(x, covariate = NULL, n_resamples = 0)

Arguments

x

Either a data frame produced with calculate_cure_dataframe, in which case the first column is used as the covariate for the CURE plot, or a regression model for count data (e.g., Poisson) fitted with glm or gam.

covariate

Name of the covariate to plot against, as a character string (e.g., "LNAADT"). Required when x is a model fit.

n_resamples

Number of resampled cumulative residual curves to overlay on the CURE plot. The default is 0.

Value

A CURE plot generated with ggplot2.
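
A CURE plot is usually read by checking whether the cumulative residual curve stays between the two confidence limits. The same check can be made numerically on a table with the cumres, lower, and upper columns used in the examples of this manual. The helper name share_outside_band is hypothetical and not part of the package; this is a sketch of one possible summary, not an established diagnostic threshold.

```r
## Hypothetical helper, NOT part of cureplots: share of points whose
## cumulative residual falls outside the confidence limits.
share_outside_band <- function(cure_df) {
  outside <- cure_df$cumres < cure_df$lower | cure_df$cumres > cure_df$upper
  mean(outside)
}

## Toy table with the assumed column names
toy <- data.frame(
  cumres = c(0, 2, -3, 1),
  lower = c(-1, -1, -1, -1),
  upper = c(1, 1, 1, 1)
)
share_outside_band(toy)
```

In the toy table, the second and third points lie outside the limits, while a point exactly on a limit counts as inside.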

Examples

## basic example code

set.seed(2000)

## Define parameters
beta <- c(-1, 0.3, 3)

## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)

## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)

## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)

## Calculate residuals
res <- residuals(mod, type = "response")

## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)

head(cure_df)

## Providing CURE data frame
cure_plot(cure_df)

## Providing glm object
cure_plot(mod, "LNAADT")

## Providing glm object and overlaying resampled cumulative residuals
cure_plot(mod, "LNAADT", n_resamples = 3)

Resample residuals

Description

Resamples the residuals to compute several cumulative residual curves. The function receives the covariate values, the residuals, and the number of resamples. For each resample it shuffles the residuals (i.e., samples a vector of the same size without replacement), and it returns the resulting curves as one stacked data frame.

Usage

resample_residuals(covariate_values, residuals, n_resamples)

Arguments

covariate_values

Covariate values.

residuals

Residuals.

n_resamples

Number of times to sample the residuals.

Value

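A data frame of stacked cumulative residual curves, one per resample. In the example below, the sample column identifies the resample each row belongs to.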
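
The shuffle-and-stack idea from the Description can be sketched in base R as follows. The function name resample_sketch and the column name covariate are assumptions for illustration (sample and cumres follow the example below), and the sketch assumes n_resamples is at least 1. It is not the package's implementation.

```r
## Hypothetical helper, NOT part of cureplots: shuffles the residuals
## n_resamples times and stacks the cumulative residual curves.
resample_sketch <- function(covariate_values, residuals, n_resamples) {
  ## Covariate values in increasing order, shared by every curve
  x <- sort(covariate_values)
  n <- length(residuals)

  curves <- lapply(seq_len(n_resamples), function(i) {
    ## Random permutation: sampling without replacement, same size
    shuffled <- residuals[sample.int(n)]
    data.frame(sample = i, covariate = x, cumres = cumsum(shuffled))
  })

  ## Stack the curves into one long data frame
  do.call(rbind, curves)
}
```

Because each resample is a permutation of the same residuals, every curve ends at the same total; only the path taken to reach it changes.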

Examples

library(cureplots)
library(ggplot2)
## basic example
set.seed(2000)
## Define parameters.
beta <- c(-1, 0.3, 3)
## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)
## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)
## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)
## Calculate residuals
res <- residuals(mod, type = "response")
## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)
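## Resample (shuffle) the residuals three times
resampled_residuals_tbl <- resample_residuals(AADT, res, n_resamples = 3)
## Build a customized CURE plot from the two tables
ggplot(data = cure_df) +
  aes(AADT, cumres) +
  ## Resampled cumulative residual curves, one line per resample
  geom_line(
    data = resampled_residuals_tbl,
    aes(group = sample),
    color = "grey"
  ) +
  ## Cumulative residuals of the fitted model
  geom_line(color = "darkgreen", linewidth = 0.8) +
  ## Lower confidence limit
  geom_line(
    aes(y = lower),
    color = "magenta",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  ## Upper confidence limit
  geom_line(
    aes(y = upper),
    color = "blue",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  theme_bw()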

Washington Road Crashes

Description

Crashes on Washington primary roads in 2016, 2017, and 2018. Data were acquired from the Washington Department of Transportation through the Highway Safety Information System (HSIS).

Usage

washington_roads

Format

The data frame washington_roads has 1,501 rows and 9 columns:

ID

Anonymized road ID. Factor.

Year

Year. Integer.

AADT

Annual Average Daily Traffic (AADT). Double.

Length

Segment length in miles. Double.

Total_crashes

Total crashes. Integer.

lnaadt

Natural logarithm of AADT. Double.

lnlength

Natural logarithm of length in miles. Double.

speed50

Indicator of whether the speed limit is 50 mph or greater. Binary.

ShouldWidth04

Indicator of whether the shoulder is 4 feet or wider. Binary.

Source

<https://highways.dot.gov/research/safety/hsis>
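
Examples

The dataset entry has no example of its own, so here is a minimal sketch of how it might be used, assuming the cureplots package is installed and the data contain no missing values. The helper name fit_crash_model and the model formula are illustrative assumptions, not a recommended specification. The calls to calculate_cure_dataframe and cure_plot mirror the examples earlier in this manual.

```r
## Hypothetical helper (not part of cureplots): Poisson model for total
## crashes using columns documented in the Format section.
fit_crash_model <- function(roads) {
  glm(
    Total_crashes ~ lnaadt + lnlength + speed50 + ShouldWidth04,
    family = poisson,
    data = roads
  )
}

## Run the package-dependent part only when cureplots is available
if (requireNamespace("cureplots", quietly = TRUE)) {
  roads <- cureplots::washington_roads
  mod <- fit_crash_model(roads)
  res <- residuals(mod, type = "response")

  ## Pass the covariate as a plain vector, as in the earlier examples
  lnaadt <- roads$lnaadt
  cure_df <- cureplots::calculate_cure_dataframe(lnaadt, res)
  print(cureplots::cure_plot(cure_df))
}
```

Keeping the model fit in its own function makes it easy to try the same specification on a subset of years or on another data frame with the same columns.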