Package 'discretefit'

Title: Simulated Goodness-of-Fit Tests for Discrete Distributions
Description: Implements fast Monte Carlo simulations for goodness-of-fit (GOF) tests for discrete distributions. This includes tests based on the Chi-squared statistic, the log-likelihood-ratio (G^2) statistic, the Freeman-Tukey (Hellinger-distance) statistic, the Kolmogorov-Smirnov statistic, the Cramer-von Mises statistic as described in Choulakian, Lockhart and Stephens (1994) <doi:10.2307/3315828>, and the root-mean-square statistic as described in Perkins, Tygert, and Ward (2011) <doi:10.1016/j.amc.2011.03.124>.
Authors: Josh McCormick [aut, cre]
Maintainer: Josh McCormick <[email protected]>
License: MIT + file LICENSE
Version: 0.1.2
Built: 2025-02-15 05:25:28 UTC
Source: https://github.com/josh-mc/discretefit

Simulated Chi-squared goodness-of-fit test

Description

The chisq_gof() function implements Monte Carlo simulations to calculate p-values based on the Chi-squared statistic for goodness-of-fit tests for discrete distributions.

Usage

chisq_gof(x, p, reps = 10000, tolerance = 64 * .Machine$double.eps)

Arguments

x

a numeric vector that contains observed counts for each bin/category.

p

a vector of probabilities of the same length as x. An error is given if any entry of p is negative or if the sum of p does not equal one.

reps

an integer specifying the number of Monte Carlo simulations. The default of 10,000 may be appropriate for exploratory analysis. A larger number of simulations should be selected for more precise results.

tolerance

sets an upper bound for rounding errors when evaluating whether a statistic for a simulation is greater than or equal to the statistic for the observed data. The default is identical to the tolerance set for simulations in the chisq.test function from the stats package in base R.
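
To make the trade-off behind reps concrete, the sketch below uses the usual binomial approximation for the Monte Carlo standard error of a simulated p-value, sqrt(p * (1 - p) / reps). The helper mc_se() is hypothetical and is not part of the package; the p-value of 0.05 is an arbitrary illustration.

```r
# Approximate Monte Carlo standard error of a simulated p-value.
# mc_se() is an illustrative helper, not a discretefit function.
mc_se <- function(p_hat, reps) {
  sqrt(p_hat * (1 - p_hat) / reps)
}

reps_grid <- c(1e4, 1e5, 1e6)
se <- mc_se(0.05, reps_grid)   # standard error near a p-value of 0.05

# The standard error shrinks with the square root of reps.
data.frame(reps = reps_grid, se = signif(se, 3))
```

Under this approximation, 100 times as many simulations are needed to gain one additional decimal digit of precision.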

Value

A list with class "htest" containing the following components:

statistic

the value of the Chi-squared test statistic

p.value

the simulated p-value for the test

method

a character string describing the test

data.name

a character string giving the name of the data
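
The components listed above are read with the $ operator, as for any "htest" object. The sketch below uses chisq.test() from the stats package, which also returns an object of class "htest", so that it runs without discretefit installed. It is assumed, but not verified here, that the result of chisq_gof() can be handled the same way.

```r
set.seed(1)  # make the simulated p-value reproducible

x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

# Base R's simulated chi-squared test, used here as a stand-in "htest" object.
res <- chisq.test(x, p = p, simulate.p.value = TRUE, B = 10000)

class(res)      # the object's class
res$statistic   # value of the test statistic
res$p.value     # simulated p-value
res$method      # character string describing the test
res$data.name   # character string giving the name of the data
```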

Examples

x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

chisq_gof(x, p)
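
For intuition, the simulation can be approximated in base R. This is a minimal sketch and not the package's implementation. The names obs_stat and sim_stats are illustrative, the rounding tolerance is omitted for brevity, and the (1 + count) / (reps + 1) convention is borrowed from chisq.test() in stats and only assumed to match chisq_gof().

```r
set.seed(42)  # reproducible simulations

x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)
reps <- 10000

n <- sum(x)
expected <- n * p

# Chi-squared statistic for the observed counts.
obs_stat <- sum((x - expected)^2 / expected)

# Draw reps tables from the null distribution: one column per simulated table.
sims <- rmultinom(reps, size = n, prob = p)

# Statistic for every simulated table (expected is recycled down each column).
sim_stats <- colSums((sims - expected)^2 / expected)

# Share of simulated statistics at least as large as the observed one.
p_value <- (1 + sum(sim_stats >= obs_stat)) / (reps + 1)
p_value
```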

Simulated Cramer-von Mises goodness-of-fit test

Description

The cvm_gof() function implements Monte Carlo simulations to calculate p-values based on the Cramer-von Mises statistic (W^2) for goodness-of-fit tests for discrete distributions.

Usage

cvm_gof(x, p, reps = 10000, tolerance = 64 * .Machine$double.eps)

Arguments

x

a numeric vector that contains observed counts for each bin/category.

p

a vector of probabilities of the same length as x. An error is given if any entry of p is negative or if the sum of p does not equal one.

reps

an integer specifying the number of Monte Carlo simulations. The default of 10,000 may be appropriate for exploratory analysis. A larger number of simulations should be selected for more precise results.

tolerance

sets an upper bound for rounding errors when evaluating whether a statistic for a simulation is greater than or equal to the statistic for the observed data. The default is identical to the tolerance set for simulations in the chisq.test function from the stats package in base R.
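
The following sketch shows why a tolerance is needed when comparing statistics. Two values that are mathematically equal can differ in their floating-point representation, so a plain >= comparison may miss a tie. The relative form used here mirrors the comparison in chisq.test(); whether the package applies the tolerance in relative or absolute form is an assumption not confirmed by this manual.

```r
tol <- 64 * .Machine$double.eps

obs_stat <- 0.1 + 0.2   # stored slightly above 0.3
sim_stat <- 0.3         # mathematically equal, stored slightly lower

# Plain comparison: the tie is missed because of rounding error.
naive <- sim_stat >= obs_stat

# Tolerant comparison: the observed statistic is shrunk by a tiny relative amount.
tolerant <- sim_stat >= obs_stat * (1 - tol)

c(naive = naive, tolerant = tolerant)
```

Ties of this kind are common for discrete data because many simulated tables produce exactly the same statistic as the observed table.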

Value

A list with class "htest" containing the following components:

statistic

the value of the Cramer-von Mises test statistic (W^2)

p.value

the simulated p-value for the test

method

a character string describing the test

data.name

a character string giving the name of the data

Examples

x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

cvm_gof(x, p)
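
As a rough guide to what is being simulated, the sketch below computes W^2 for the example data using the discrete form given by Choulakian, Lockhart and Stephens (1994): the squared differences between cumulative observed and cumulative expected counts, weighted by p and divided by n. The variable names are illustrative, and the package may scale the reported statistic differently.

```r
x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

n <- sum(x)

# Cumulative observed counts minus cumulative expected counts.
z <- cumsum(x) - cumsum(n * p)

# Discrete Cramer-von Mises statistic.
w2 <- sum(z^2 * p) / n
w2
```

Because the statistic is built from cumulative sums, it depends on the order of the bins, unlike the Chi-squared statistic.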

Simulated Freeman-Tukey (Hellinger-distance) goodness-of-fit test

Description

The ft_gof() function implements Monte Carlo simulations to calculate p-values based on the Freeman-Tukey statistic for goodness-of-fit tests for discrete distributions. This statistic is also referred to as the Hellinger distance. The Freeman-Tukey GOF test is asymptotically equivalent to the Chi-squared GOF test, but for smaller n, results may differ substantially.

Usage

ft_gof(x, p, reps = 10000, tolerance = 64 * .Machine$double.eps)

Arguments

x

a numeric vector that contains observed counts for each bin/category.

p

a vector of probabilities of the same length as x. An error is given if any entry of p is negative or if the sum of p does not equal one.

reps

an integer specifying the number of Monte Carlo simulations. The default of 10,000 may be appropriate for exploratory analysis. A larger number of simulations should be selected for more precise results.

tolerance

sets an upper bound for rounding errors when evaluating whether a statistic for a simulation is greater than or equal to the statistic for the observed data. The default is identical to the tolerance set for simulations in the chisq.test function from the stats package in base R.

Value

A list with class "htest" containing the following components:

statistic

the value of the Freeman-Tukey test statistic

p.value

the simulated p-value for the test

method

a character string describing the test

data.name

a character string giving the name of the data

Examples

x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

ft_gof(x, p)
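
The sketch below computes one common form of the Freeman-Tukey statistic, 4 * sum((sqrt(observed) - sqrt(expected))^2), alongside the Chi-squared statistic for the same data. The formula and the variable names are assumptions made for illustration; the package may use a different scaling, which would change the reported statistic but not the simulated p-value.

```r
x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

expected <- sum(x) * p

# Freeman-Tukey statistic (one common definition).
ft <- 4 * sum((sqrt(x) - sqrt(expected))^2)

# Chi-squared statistic, for comparison.
chisq <- sum((x - expected)^2 / expected)

c(freeman_tukey = ft, chi_squared = chisq)
```

For these data the two statistics are close, which is consistent with their asymptotic equivalence.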

Simulated log-likelihood-ratio (G^2) goodness-of-fit test

Description

The g_gof() function implements Monte Carlo simulations to calculate p-values based on the log-likelihood-ratio statistic for goodness-of-fit tests for discrete distributions. In this context, the log-likelihood-ratio statistic is often referred to as the G^2 statistic. The G^2 GOF test is asymptotically equivalent to the Chi-squared GOF test, but for smaller n, results may differ substantially.

Usage

g_gof(x, p, reps = 10000, tolerance = 64 * .Machine$double.eps)

Arguments

x

a numeric vector that contains observed counts for each bin/category.

p

a vector of probabilities of the same length as x. An error is given if any entry of p is negative or if the sum of p does not equal one.

reps

an integer specifying the number of Monte Carlo simulations. The default of 10,000 may be appropriate for exploratory analysis. A larger number of simulations should be selected for more precise results.

tolerance

sets an upper bound for rounding errors when evaluating whether a statistic for a simulation is greater than or equal to the statistic for the observed data. The default is identical to the tolerance set for simulations in the chisq.test function from the stats package in base R.

Value

A list with class "htest" containing the following components:

statistic

the value of the log-likelihood-ratio test statistic (G^2)

p.value

the simulated p-value for the test

method

a character string describing the test

data.name

a character string giving the name of the data

Examples

x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

g_gof(x, p)
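
The G^2 statistic for the example data can be computed directly as 2 * sum(observed * log(observed / expected)). The sketch below is illustrative only: the variable names are not from the package, and the handling of empty bins (treated as contributing zero) is an assumption about the usual convention rather than a statement about g_gof().

```r
x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

expected <- sum(x) * p

# Bins with zero observed count contribute 0 (the limit of x * log(x) as x -> 0).
terms <- ifelse(x > 0, x * log(x / expected), 0)

g2 <- 2 * sum(terms)
g2
```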

Simulated Kolmogorov-Smirnov goodness-of-fit test

Description

The ks_gof() function implements Monte Carlo simulations to calculate p-values based on the Kolmogorov-Smirnov statistic for goodness-of-fit tests for discrete distributions. The p-value expressed by ks_gof() is based on a two-sided alternative hypothesis.

Usage

ks_gof(x, p, reps = 10000, tolerance = 64 * .Machine$double.eps)

Arguments

x

a numeric vector that contains observed counts for each bin/category.

p

a vector of probabilities of the same length as x. An error is given if any entry of p is negative or if the sum of p does not equal one.

reps

an integer specifying the number of Monte Carlo simulations. The default of 10,000 may be appropriate for exploratory analysis. A larger number of simulations should be selected for more precise results.

tolerance

sets an upper bound for rounding errors when evaluating whether a statistic for a simulation is greater than or equal to the statistic for the observed data. The default is identical to the tolerance set for simulations in the chisq.test function from the stats package in base R.

Value

A list with class "htest" containing the following components:

statistic

the value of the Kolmogorov-Smirnov test statistic

p.value

the simulated p-value for the test

method

a character string describing the test

data.name

a character string giving the name of the data

Examples

x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

ks_gof(x, p)
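
A two-sided Kolmogorov-Smirnov statistic for binned data can be sketched as the largest absolute gap between the empirical and hypothesized cumulative distribution functions. The code below is an illustration under that assumption; the variable names are not from the package, and ks_gof() may report the statistic on a different scale.

```r
x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

n <- sum(x)

# Empirical and hypothesized cumulative distribution functions over the bins.
ecdf_obs <- cumsum(x) / n
cdf_null <- cumsum(p)

# Two-sided statistic: largest absolute difference between the two.
d <- max(abs(ecdf_obs - cdf_null))
d
```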

Simulated root-mean-square goodness-of-fit test

Description

The rms_gof() function implements Monte Carlo simulations to calculate p-values based on the root-mean-square statistic for goodness-of-fit tests for discrete distributions.

Usage

rms_gof(x, p, reps = 10000, tolerance = 64 * .Machine$double.eps)

Arguments

x

a numeric vector that contains observed counts for each bin/category.

p

a vector of probabilities of the same length as x. An error is given if any entry of p is negative or if the sum of p does not equal one.

reps

an integer specifying the number of Monte Carlo simulations. The default of 10,000 may be appropriate for exploratory analysis. A larger number of simulations should be selected for more precise results.

tolerance

sets an upper bound for rounding errors when evaluating whether a statistic for a simulation is greater than or equal to the statistic for the observed data. The default is identical to the tolerance set for simulations in the chisq.test function from the stats package in base R.

Value

A list with class "htest" containing the following components:

statistic

the value of the root-mean-square test statistic

p.value

the simulated p-value for the test

method

a character string describing the test

data.name

a character string giving the name of the data

Examples

x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

rms_gof(x, p)
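
The root-mean-square statistic measures the typical difference between observed and hypothesized proportions without dividing by the expected counts. The sketch below uses the plain form sqrt(mean((x / n - p)^2)); this form and the variable names are assumptions for illustration, and rms_gof() may apply an additional scaling factor such as sqrt(n).

```r
x <- c(15, 36, 17)
p <- c(0.25, 0.5, 0.25)

n <- sum(x)

# Difference between observed and hypothesized proportions in each bin.
diffs <- x / n - p

# Root-mean-square of the differences.
rms <- sqrt(mean(diffs^2))
rms
```

Because no bin is divided by its expected count, every bin is weighted equally regardless of its probability under the null distribution.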