Title: | Pipe-Friendly Framework for Basic Statistical Tests |
---|---|
Description: | Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances. |
Authors: | Alboukadel Kassambara [aut, cre] |
Maintainer: | Alboukadel Kassambara <[email protected]> |
License: | GPL-2 |
Version: | 0.7.2.999 |
Built: | 2024-11-22 03:37:11 UTC |
Source: | https://github.com/kassambara/rstatix |
Add p-value significance symbols into a data frame.
add_significance( data, p.col = NULL, output.col = NULL, cutpoints = c(0, 1e-04, 0.001, 0.01, 0.05, 1), symbols = c("****", "***", "**", "*", "ns") )
add_significance( data, p.col = NULL, output.col = NULL, cutpoints = c(0, 1e-04, 0.001, 0.01, 0.05, 1), symbols = c("****", "***", "**", "*", "ns") )
data |
a data frame containing a p-value column. |
p.col |
column name containing p-values. |
output.col |
the output column name to hold the adjusted p-values. |
cutpoints |
numeric vector used for intervals. |
symbols |
character vector, one shorter than cutpoints, used as significance symbols. |
a data frame
# Perform pairwise comparisons and adjust p-values ToothGrowth %>% t_test(len ~ dose) %>% adjust_pvalue() %>% add_significance("p.adj")
# Perform pairwise comparisons and adjust p-values ToothGrowth %>% t_test(len ~ dose) %>% adjust_pvalue() %>% add_significance("p.adj")
A pipe-friendly function to add an adjusted p-value column into a data frame. Supports grouped data.
adjust_pvalue(data, p.col = NULL, output.col = NULL, method = "holm")
adjust_pvalue(data, p.col = NULL, output.col = NULL, method = "holm")
data |
a data frame containing a p-value column |
p.col |
column name containing p-values |
output.col |
the output column name to hold the adjusted p-values |
method |
method for adjusting p values (see
|
a data frame
# Perform pairwise comparisons and adjust p-values ToothGrowth %>% t_test(len ~ dose) %>% adjust_pvalue()
# Perform pairwise comparisons and adjust p-values ToothGrowth %>% t_test(len ~ dose) %>% adjust_pvalue()
Create beautiful summary tables of ANOVA test results obtained
from either Anova()
or aov()
.
The results include ANOVA table, generalized effect size and some assumption checks.
anova_summary(object, effect.size = "ges", detailed = FALSE, observed = NULL)
anova_summary(object, effect.size = "ges", detailed = FALSE, observed = NULL)
object |
|
effect.size |
the effect size to compute and to show in the ANOVA results. Allowed values can be either "ges" (generalized eta squared) or "pes" (partial eta squared) or both. Default is "ges". |
detailed |
If TRUE, returns extra information (sums of squares columns, intercept row, etc.) in the ANOVA table. |
observed |
Variables that are observed (i.e, measured) as compared to experimentally manipulated. The default effect size reported (generalized eta-squared) requires correct specification of the observed variables. |
return an object of class anova_test
a data frame containing
the ANOVA table for independent measures ANOVA. However, for repeated/mixed
measures ANOVA, it is a list containing the following components are
returned:
ANOVA: a data frame containing ANOVA results
Mauchly's Test for Sphericity: If any within-Ss variables with more than 2 levels are present, a data frame containing the results of Mauchly's test for Sphericity. Only reported for effects that have more than 2 levels because sphericity necessarily holds for effects with only 2 levels.
Sphericity Corrections: If any within-Ss variables are present, a data frame containing the Greenhouse-Geisser and Huynh-Feldt epsilon values, and corresponding corrected p-values.
The returned object might have an attribute called args
if
you compute ANOVA using the function anova_test()
. The attribute args
is a
list holding the arguments used to fit the ANOVA model, including: data, dv,
within, between, type, model, etc.
The following abbreviations are used in the different results tables:
DFn Degrees of Freedom in the numerator (i.e. DF effect).
DFd Degrees of Freedom in the denominator (i.e., DF error).
SSn Sum of Squares in the numerator (i.e., SS effect).
SSd Sum of Squares in the denominator (i.e.,SS error).
F F-value.
p p-value (probability of the data given the null hypothesis).
p<.05 Highlights p-values less than the traditional alpha level of .05.
ges Generalized Eta-Squared measure of effect size.
GGe Greenhouse-Geisser epsilon.
p[GGe] p-value after correction using Greenhouse-Geisser epsilon.
p[GGe]<.05 Highlights p-values (after correction using Greenhouse-Geisser epsilon) less than the traditional alpha level of .05.
HFe Huynh-Feldt epsilon.
p[HFe] p-value after correction using Huynh-Feldt epsilon.
p[HFe]<.05 Highlights p-values (after correction using Huynh-Feldt epsilon) less than the traditional alpha level of .05.
W Mauchly's W statistic
Alboukadel Kassambara, [email protected]
anova_test()
, factorial_design()
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth df$dose <- as.factor(df$dose) # Independent measures ANOVA #::::::::::::::::::::::::::::::::::::::::: # Compute ANOVA and display the summary res.anova <- Anova(lm(len ~ dose*supp, data = df)) anova_summary(res.anova) # Display both SSn and SSd using detailed = TRUE # Show generalized eta squared using effect.size = "ges" anova_summary(res.anova, detailed = TRUE, effect.size = "ges") # Show partial eta squared using effect.size = "pes" anova_summary(res.anova, detailed = TRUE, effect.size = "pes") # Repeated measures designs using car::Anova() #::::::::::::::::::::::::::::::::::::::::: # Prepare the data df$id <- as.factor(rep(1:10, 6)) # Add individuals ids head(df) # Easily perform repeated measures ANOVA using the car package design <- factorial_design(df, dv = len, wid = id, within = c(supp, dose)) res.anova <- Anova(design$model, idata = design$idata, idesign = design$idesign, type = 3) anova_summary(res.anova) # Repeated measures designs using stats::Aov() #::::::::::::::::::::::::::::::::::::::::: res.anova <- aov(len ~ dose*supp + Error(id/(supp*dose)), data = df) anova_summary(res.anova)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth df$dose <- as.factor(df$dose) # Independent measures ANOVA #::::::::::::::::::::::::::::::::::::::::: # Compute ANOVA and display the summary res.anova <- Anova(lm(len ~ dose*supp, data = df)) anova_summary(res.anova) # Display both SSn and SSd using detailed = TRUE # Show generalized eta squared using effect.size = "ges" anova_summary(res.anova, detailed = TRUE, effect.size = "ges") # Show partial eta squared using effect.size = "pes" anova_summary(res.anova, detailed = TRUE, effect.size = "pes") # Repeated measures designs using car::Anova() #::::::::::::::::::::::::::::::::::::::::: # Prepare the data df$id <- as.factor(rep(1:10, 6)) # Add individuals ids head(df) # Easily perform repeated measures ANOVA using the car package design <- factorial_design(df, dv = len, wid = id, within = c(supp, dose)) res.anova <- Anova(design$model, idata = design$idata, idesign = design$idesign, type = 3) anova_summary(res.anova) # Repeated measures designs using stats::Aov() #::::::::::::::::::::::::::::::::::::::::: res.anova <- aov(len ~ dose*supp + Error(id/(supp*dose)), data = df) anova_summary(res.anova)
Provides a pipe-friendly framework to perform different types of ANOVA tests, including:
Independent measures ANOVA: between-Subjects designs,
Repeated measures ANOVA: within-Subjects designs
Mixed ANOVA: Mixed within within- and between-Subjects designs, also known as split-plot ANOVA and
The function is an easy to use wrapper around Anova()
and
aov()
. It makes ANOVA computation handy in R and It's
highly flexible: can support model and formula as input. Variables can be
also specified as character vector using the arguments dv, wid,
between, within, covariate
.
The results include ANOVA table, generalized effect size and some assumption checks.
anova_test( data, formula, dv, wid, between, within, covariate, type = NULL, effect.size = "ges", error = NULL, white.adjust = FALSE, observed = NULL, detailed = FALSE ) get_anova_table(x, correction = c("auto", "GG", "HF", "none")) ## S3 method for class 'anova_test' print(x, ...) ## S3 method for class 'anova_test' plot(x, ...)
anova_test( data, formula, dv, wid, between, within, covariate, type = NULL, effect.size = "ges", error = NULL, white.adjust = FALSE, observed = NULL, detailed = FALSE ) get_anova_table(x, correction = c("auto", "GG", "HF", "none")) ## S3 method for class 'anova_test' print(x, ...) ## S3 method for class 'anova_test' plot(x, ...)
data |
a data.frame or a model to be analyzed. |
formula |
a formula specifying the ANOVA model similar to
aov. Can be of the form Examples of supported formula include:
If the formula doesn't contain any within vars, a linear model is directly fitted and passed to the ANOVA function. For repeated designs, the ANOVA variables are parsed from the formula. |
dv |
(numeric) dependent variable name. |
wid |
(factor) column name containing individuals/subjects identifier. Should be unique per individual. |
between |
(optional) between-subject factor variables. |
within |
(optional) within-subjects factor variables |
covariate |
(optional) covariate names (for ANCOVA) |
type |
the type of sums of squares for ANOVA. Allowed values are either
1, 2 or 3. |
effect.size |
the effect size to compute and to show in the ANOVA results. Allowed values can be either "ges" (generalized eta squared) or "pes" (partial eta squared) or both. Default is "ges". |
error |
(optional) for a linear model, an lm model object from which the
overall error sum of squares and degrees of freedom are to be calculated.
Read more in |
white.adjust |
Default is FALSE. If TRUE, heteroscedasticity correction is applied to the coefficient of covariance matrix. Used only for independent measures ANOVA. |
observed |
Variables that are observed (i.e, measured) as compared to experimentally manipulated. The default effect size reported (generalized eta-squared) requires correct specification of the observed variables. |
detailed |
If TRUE, returns extra information (sums of squares columns, intercept row, etc.) in the ANOVA table. |
x |
an object of class |
correction |
character. Used only in repeated measures ANOVA test to specify which correction of the degrees of freedom should be reported for the within-subject factors. Possible values are:
|
... |
additional arguments |
The setting in anova_test()
is done in such a way that it
gives the same results as SPSS, one of the most used commercial software. By
default, R uses treatment contrasts, where each of the levels is compared to
the first level used as baseline. The default contrast can be checked using
options('contrasts')
. In the function anova_test()
, the
following setting is used
options(contrasts=c('contr.sum','contr.poly'))
, which gives
orthogonal contrasts where you compare every level to the overall mean. This
setting gives the same output as the most commonly used commercial
softwares, like SPSS. If you want to obtain the same result with the
function car::Anova()
as the one obtained with
rstatix::anova_test()
, then don't forget to set
options(contrasts=c('contr.sum','contr.poly'))
.
return an object of class anova_test
a data frame containing
the ANOVA table for independent measures ANOVA.
However, for repeated/mixed measures ANOVA, a list containing the following
components are returned: ANOVA table, Mauchly's Test for Sphericity,
Sphericity Corrections. These table are described more in the documentation
of the function anova_summary()
.
The returned object has an attribute called args
, which is a
list holding the arguments used to fit the ANOVA model, including: data, dv,
within, between, type, model, etc.
anova_test()
: perform anova test
get_anova_table()
: extract anova table from an object of class
anova_test
. When within-subject factors are present, either
sphericity corrected or uncorrected degrees of freedom can be reported.
Alboukadel Kassambara, [email protected]
anova_summary()
, factorial_design()
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: df %>% anova_test(len ~ dose) # Grouped One-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: df %>% group_by(supp) %>% anova_test(len ~ dose) # Two-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: df %>% anova_test(len ~ supp*dose) # Two-way repeated measures ANOVA #::::::::::::::::::::::::::::::::::::::::: df$id <- rep(1:10, 6) # Add individuals id # Use formula df %>% anova_test(len ~ supp*dose + Error(id/(supp*dose))) # or use character vector df %>% anova_test(dv = len, wid = id, within = c(supp, dose)) # Extract ANOVA table and apply correction #::::::::::::::::::::::::::::::::::::::::: res.aov <- df %>% anova_test(dv = len, wid = id, within = c(supp, dose)) get_anova_table(res.aov, correction = "GG") # Use model as arguments #::::::::::::::::::::::::::::::::::::::::: .my.model <- lm(yield ~ block + N*P*K, npk) anova_test(.my.model)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: df %>% anova_test(len ~ dose) # Grouped One-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: df %>% group_by(supp) %>% anova_test(len ~ dose) # Two-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: df %>% anova_test(len ~ supp*dose) # Two-way repeated measures ANOVA #::::::::::::::::::::::::::::::::::::::::: df$id <- rep(1:10, 6) # Add individuals id # Use formula df %>% anova_test(len ~ supp*dose + Error(id/(supp*dose))) # or use character vector df %>% anova_test(dv = len, wid = id, within = c(supp, dose)) # Extract ANOVA table and apply correction #::::::::::::::::::::::::::::::::::::::::: res.aov <- df %>% anova_test(dv = len, wid = id, within = c(supp, dose)) get_anova_table(res.aov, correction = "GG") # Use model as arguments #::::::::::::::::::::::::::::::::::::::::: .my.model <- lm(yield ~ block + N*P*K, npk) anova_test(.my.model)
Convert a correlation test data frame, returned by the
cor_test()
, into a correlation matrix format.
as_cor_mat(x)
as_cor_mat(x)
x |
an object of class |
Returns a data frame containing the matrix of the correlation coefficients. The output has an attribute named "pvalue", which contains the matrix of the correlation test p-values.
# Pairwise correlation tests between variables #::::::::::::::::::::::::::::::::::::::::::::::: res.cor.test <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_test() res.cor.test # Convert the correlation test into a correlation matrix #::::::::::::::::::::::::::::::::::::::::::::::: res.cor.test %>% as_cor_mat()
# Pairwise correlation tests between variables #::::::::::::::::::::::::::::::::::::::::::::::: res.cor.test <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_test() res.cor.test # Convert the correlation test into a correlation matrix #::::::::::::::::::::::::::::::::::::::::::::::: res.cor.test %>% as_cor_mat()
Performs exact binomial test and pairwise comparisons following a
significant exact multinomial test. Wrapper around the R base function
link[stats]{binom.test}()
that returns a data frame as a result.
binom_test( x, n, p = 0.5, alternative = "two.sided", conf.level = 0.95, detailed = FALSE ) pairwise_binom_test( x, p.adjust.method = "holm", alternative = "two.sided", conf.level = 0.95 ) pairwise_binom_test_against_p( x, p = rep(1/length(x), length(x)), p.adjust.method = "holm", alternative = "two.sided", conf.level = 0.95 )
binom_test( x, n, p = 0.5, alternative = "two.sided", conf.level = 0.95, detailed = FALSE ) pairwise_binom_test( x, p.adjust.method = "holm", alternative = "two.sided", conf.level = 0.95 ) pairwise_binom_test_against_p( x, p = rep(1/length(x), length(x)), p.adjust.method = "holm", alternative = "two.sided", conf.level = 0.95 )
x |
numeric vector containing the counts. |
n |
number of trials; ignored if |
p |
a vector of probabilities of success. The length of p must be the same as the number of groups specified by x, and its elements must be greater than 0 and less than 1. |
alternative |
indicates the alternative hypothesis and must be
one of |
conf.level |
confidence level for the returned confidence interval. |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
return a data frame containing the p-value and its significance. with some the following columns:
group, group1, group2
:
the categories or groups being compared.
statistic
: the number
of successes.
parameter
: the number of trials.
p
:
p-value of the test.
p.adj
: the adjusted p-value.
method
: the used statistical test.
p.signif,
p.adj.signif
: the significance level of p-values and adjusted p-values,
respectively.
estimate
: the estimated probability of success.
alternative
: a character string describing the alternative
hypothesis.
conf.low,conf.high
: Lower and upper bound on a
confidence interval for the probability of success.
The returned object has an attribute called args, which is a list holding the test arguments.
binom_test()
: performs exact binomial test. Wrapper around the R
base function binom.test
that returns a dataframe as a
result.
pairwise_binom_test()
: performs pairwise comparisons (binomial test)
following a significant exact multinomial test.
pairwise_binom_test_against_p()
: performs pairwise comparisons (binomial test)
following a significant exact multinomial test for given probabilities.
# Exact binomial test #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: 160 mice with cancer including 95 male and 65 female # Q1: Does cancer affect more males than females? binom_test(x = 95, n = 160) # => yes, there are a significant difference # Q2: compare the observed proportion of males # to an expected proportion (p = 3/5) binom_test(x = 95, n = 160, p = 3/5) # => there are no significant difference # Multinomial test #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data tulip <- c(red = 81, yellow = 50, white = 27) # Question 1: are the color equally common ? # this is a test of homogeneity res <- multinom_test(tulip) res attr(res, "descriptives") # Pairwise comparisons between groups pairwise_binom_test(tulip, p.adjust.method = "bonferroni") # Question 2: comparing observed to expected proportions # this is a goodness-of-fit test expected.p <- c(red = 0.5, yellow = 0.33, white = 0.17) res <- multinom_test(tulip, expected.p) res attr(res, "descriptives") # Pairwise comparisons against a given probabilities pairwise_binom_test_against_p(tulip, expected.p)
# Exact binomial test #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: 160 mice with cancer including 95 male and 65 female # Q1: Does cancer affect more males than females? binom_test(x = 95, n = 160) # => yes, there are a significant difference # Q2: compare the observed proportion of males # to an expected proportion (p = 3/5) binom_test(x = 95, n = 160, p = 3/5) # => there are no significant difference # Multinomial test #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data tulip <- c(red = 81, yellow = 50, white = 27) # Question 1: are the color equally common ? # this is a test of homogeneity res <- multinom_test(tulip) res attr(res, "descriptives") # Pairwise comparisons between groups pairwise_binom_test(tulip, p.adjust.method = "bonferroni") # Question 2: comparing observed to expected proportions # this is a goodness-of-fit test expected.p <- c(red = 0.5, yellow = 0.33, white = 0.17) res <- multinom_test(tulip, expected.p) res attr(res, "descriptives") # Pairwise comparisons against a given probabilities pairwise_binom_test_against_p(tulip, expected.p)
Performs the Box's M-test for homogeneity of covariance matrices obtained from multivariate normal data according to one grouping variable. The test is based on the chi-square approximation.
box_m(data, group)
box_m(data, group)
data |
a numeric data.frame or matrix containing n observations of p variables; it is expected that n > p. |
group |
a vector of length n containing the class of each observation; it is usually a factor. |
A data frame containing the following components:
statistic an approximated value of the chi-square distribution.
parameter the degrees of freedom related of the test statistic in this case that it follows a Chi-square distribution.
p.value the p-value of the test.
method the character string "Box's M-test for Homogeneity of Covariance Matrices".
data(iris) box_m(iris[, -5], iris[, 5])
data(iris) box_m(iris[, -5], iris[, 5])
Performs chi-squared tests, including goodness-of-fit, homogeneity and independence tests.
chisq_test( x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), rescale.p = FALSE, simulate.p.value = FALSE, B = 2000 ) pairwise_chisq_gof_test(x, p.adjust.method = "holm", ...) pairwise_chisq_test_against_p( x, p = rep(1/length(x), length(x)), p.adjust.method = "holm", ... ) chisq_descriptives(res.chisq) expected_freq(res.chisq) observed_freq(res.chisq) pearson_residuals(res.chisq) std_residuals(res.chisq)
chisq_test( x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), rescale.p = FALSE, simulate.p.value = FALSE, B = 2000 ) pairwise_chisq_gof_test(x, p.adjust.method = "holm", ...) pairwise_chisq_test_against_p( x, p = rep(1/length(x), length(x)), p.adjust.method = "holm", ... ) chisq_descriptives(res.chisq) expected_freq(res.chisq) observed_freq(res.chisq) pearson_residuals(res.chisq) std_residuals(res.chisq)
x |
a numeric vector or matrix. |
y |
a numeric vector; ignored if |
correct |
a logical indicating whether to apply continuity
correction when computing the test statistic for 2 by 2 tables: one
half is subtracted from all |
p |
a vector of probabilities of the same length of |
rescale.p |
a logical scalar; if TRUE then |
simulate.p.value |
a logical indicating whether to compute p-values by Monte Carlo simulation. |
B |
an integer specifying the number of replicates used in the Monte Carlo test. |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
... |
other arguments passed to the function |
res.chisq |
an object of class |
return a data frame with some the following columns:
n
: the number of participants.
group, group1, group2
:
the categories or groups being compared.
statistic
: the value
of Pearson's chi-squared test statistic.
df
: the degrees of
freedom of the approximate chi-squared distribution of the test statistic.
NA if the p-value is computed by Monte Carlo simulation.
p
:
p-value.
p.adj
: the adjusted p-value.
method
: the
used statistical test.
p.signif, p.adj.signif
: the significance
level of p-values and adjusted p-values, respectively.
observed
: observed counts.
expected
: the expected counts under the null hypothesis.
The returned object has an attribute called args, which is a list holding the test arguments.
chisq_test()
: performs chi-square tests including goodness-of-fit,
homogeneity and independence tests.
pairwise_chisq_gof_test()
: perform pairwise comparisons between groups following a global
chi-square goodness of fit test.
pairwise_chisq_test_against_p()
: perform pairwise comparisons after a global
chi-squared test for given probabilities. For each group, the observed and
the expected proportions are shown. Each group is compared to the sum of
all others.
chisq_descriptives()
: returns the descriptive statistics of the chi-square
test. These include, observed and expected frequencies, proportions,
residuals and standardized residuals.
expected_freq()
: returns the expected counts from the chi-square test result.
observed_freq()
: returns the observed counts from the chi-square test result.
pearson_residuals()
: returns the Pearson residuals, (observed - expected) / sqrt(expected)
.
std_residuals()
: returns the standardized residuals
# Chi-square goodness of fit test #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% tulip <- c(red = 81, yellow = 50, white = 27) # Q1: Are the colors equally common? chisq_test(tulip) pairwise_chisq_gof_test(tulip) # Q2: comparing observed to expected proportions chisq_test(tulip, p = c(1/2, 1/3, 1/6)) pairwise_chisq_test_against_p(tulip, p = c(0.5, 0.33, 0.17)) # Homogeneity of proportions between groups #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(203, 118, 178, 212), c(122, 167, 528, 673) )) dimnames(xtab) <- list( Survived = c("Yes", "No"), Class = c("1st", "2nd", "3rd", "Crew") ) xtab # Chi-square test chisq_test(xtab) # Compare the proportion of survived between groups pairwise_prop_test(xtab)
# Chi-square goodness of fit test #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% tulip <- c(red = 81, yellow = 50, white = 27) # Q1: Are the colors equally common? chisq_test(tulip) pairwise_chisq_gof_test(tulip) # Q2: comparing observed to expected proportions chisq_test(tulip, p = c(1/2, 1/3, 1/6)) pairwise_chisq_test_against_p(tulip, p = c(0.5, 0.33, 0.17)) # Homogeneity of proportions between groups #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(203, 118, 178, 212), c(122, 167, 528, 673) )) dimnames(xtab) <- list( Survived = c("Yes", "No"), Class = c("1st", "2nd", "3rd", "Crew") ) xtab # Chi-square test chisq_test(xtab) # Compare the proportion of survived between groups pairwise_prop_test(xtab)
Performs the Cochran's Q test for unreplicated randomized block
design experiments with a binary response variable and paired data. This
test is analogue to the friedman.test()
with 0,1 coded
response. It's an extension of the McNemar Chi-squared test for comparing
more than two paired proportions.
cochran_qtest(data, formula)
cochran_qtest(data, formula)
data |
a data frame containing the variables in the formula. |
formula |
a formula of the form |
# Generate a demo data mydata <- data.frame( outcome = c(0,1,1,0,0,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,1,0,0,1), treatment = gl(3,1,30,labels=LETTERS[1:3]), participant = gl(10,3,labels=letters[1:10]) ) mydata$outcome <- factor( mydata$outcome, levels = c(1, 0), labels = c("success", "failure") ) # Cross-tabulation xtabs(~outcome + treatment, mydata) # Compare the proportion of success between treatments cochran_qtest(mydata, outcome ~ treatment|participant) # pairwise comparisons between groups pairwise_mcnemar_test(mydata, outcome ~ treatment|participant)
# Generate a demo data mydata <- data.frame( outcome = c(0,1,1,0,0,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,1,0,0,1), treatment = gl(3,1,30,labels=LETTERS[1:3]), participant = gl(10,3,labels=letters[1:10]) ) mydata$outcome <- factor( mydata$outcome, levels = c(1, 0), labels = c("success", "failure") ) # Cross-tabulation xtabs(~outcome + treatment, mydata) # Compare the proportion of success between treatments cochran_qtest(mydata, outcome ~ treatment|participant) # pairwise comparisons between groups pairwise_mcnemar_test(mydata, outcome ~ treatment|participant)
Compute the effect size for t-test. T-test conventional effect sizes, proposed by Cohen, are: 0.2 (small effect), 0.5 (moderate effect) and 0.8 (large effect).
Cohen's d
is calculated as the difference between means or mean minus
mu
divided by the estimated standardized deviation.
For independent samples t-test, there are two possibilities implemented. If the t-test did not make a homogeneity of variance assumption, (the Welch test), the variance term will mirror the Welch test, otherwise a pooled estimate is used.
If a paired samples t-test was requested, then effect size desired is based on the standard deviation of the differences.
It can also returns confidence intervals by bootstap.
cohens_d( data, formula, comparisons = NULL, ref.group = NULL, paired = FALSE, mu = 0, var.equal = FALSE, hedges.correction = FALSE, ci = FALSE, conf.level = 0.95, ci.type = "perc", nboot = 1000 )
cohens_d( data, formula, comparisons = NULL, ref.group = NULL, paired = FALSE, mu = 0, var.equal = FALSE, hedges.correction = FALSE, ci = FALSE, conf.level = 0.95, ci.type = "perc", nboot = 1000 )
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
comparisons |
A list of length-2 vectors specifying the groups of
interest to be compared. For example to compare groups "A" vs "B" and "B" vs
"C", the argument is as follow: |
ref.group |
a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). If |
paired |
a logical indicating whether you want a paired test. |
mu |
theoretical mean, use for one-sample t-test. Default is 0. |
var.equal |
a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used. Used only for unpaired or independent samples test. |
hedges.correction |
logical indicating whether apply the Hedges
correction by multiplying the usual value of Cohen's d by
|
ci |
If TRUE, returns confidence intervals by bootstrap. May be slow. |
conf.level |
The level for the confidence interval. |
ci.type |
The type of confidence interval to use. Can be any of "norm",
"basic", "perc", or "bca". Passed to |
nboot |
The number of replications to use for bootstrap. |
Quantification of the effect size magnitude is performed using the
thresholds defined in Cohen (1992). The magnitude is assessed using the
thresholds provided in (Cohen 1992), i.e. |d| < 0.2
"negligible",
|d| < 0.5
"small", |d| < 0.8
"medium", otherwise "large".
return a data frame with some of the following columns:
.y.
: the y variable used in the test.
group1,group2
: the compared groups in the pairwise tests.
n,n1,n2
: Sample counts.
effsize
: estimate of the effect
size (d
value).
magnitude
: magnitude of effect size.
conf.low,conf.high
: lower and upper bound of the effect size
confidence interval.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York:Academic Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Hedges, Larry & Olkin, Ingram. (1985). Statistical Methods in Meta-Analysis. 10.2307/1164953.
Navarro, Daniel. 2015. Learning Statistics with R: A Tutorial for Psychology Students and Other Beginners (Version 0.5).
# One-sample t test effect size ToothGrowth %>% cohens_d(len ~ 1, mu = 0) # Two indepedent samples t-test effect size ToothGrowth %>% cohens_d(len ~ supp, var.equal = TRUE) # Paired samples effect size df <- data.frame( id = 1:5, pre = c(110, 122, 101, 120, 140), post = c(150, 160, 110, 140, 155) ) df <- df %>% gather(key = "treatment", value = "value", -id) head(df) df %>% cohens_d(value ~ treatment, paired = TRUE)
# One-sample t test effect size ToothGrowth %>% cohens_d(len ~ 1, mu = 0) # Two indepedent samples t-test effect size ToothGrowth %>% cohens_d(len ~ supp, var.equal = TRUE) # Paired samples effect size df <- data.frame( id = 1:5, pre = c(110, 122, 101, 120, 140), post = c(150, 160, 110, 140, 155) ) df <- df %>% gather(key = "treatment", value = "value", -id) head(df) df %>% cohens_d(value ~ treatment, paired = TRUE)
Provides pipe-friendly functions to convert simultaneously multiple variables into a factor variable.
Helper functions are also available to set the reference level and the levels order.
convert_as_factor(data, ..., vars = NULL, make.valid.levels = FALSE) set_ref_level(data, name, ref) reorder_levels(data, name, order)
convert_as_factor(data, ..., vars = NULL, make.valid.levels = FALSE) set_ref_level(data, name, ref) reorder_levels(data, name, order)
data |
a data frame |
... |
one unquoted expressions (or variable name) specifying the name of
the variables you want to convert into factor. Alternative to the argument
|
vars |
a character vector specifying the variables to convert into factor. |
make.valid.levels |
logical. Default is FALSE. If TRUE, converts the variable to factor and add a leading character (x) if starting with a digit. |
name |
a factor variable name. Can be unquoted. For example, use
|
ref |
the reference level. |
order |
a character vector specifying the order of the factor levels |
convert_as_factor()
: Convert one or multiple variables into factor.
set_ref_level()
: Change a factor reference level or group.
reorder_levels()
: Change the order of a factor levels
# Create a demo data df <- tibble( group = c("a", "a", "b", "b", "c", "c"), time = c("t1", "t2", "t1", "t2", "t1", "t2"), value = c(5, 6, 1, 3, 4, 5) ) df # Convert group and time into factor variable result <- df %>% convert_as_factor(group, time) result # Show group levels levels(result$group) # Set c as the reference level (the first one) result <- result %>% set_ref_level("group", ref = "c") levels(result$group) # Set the order of levels result <- result %>% reorder_levels("group", order = c("b", "c", "a")) levels(result$group)
# Create a demo data df <- tibble( group = c("a", "a", "b", "b", "c", "c"), time = c("t1", "t2", "t1", "t2", "t1", "t2"), value = c(5, 6, 1, 3, 4, 5) ) df # Convert group and time into factor variable result <- df %>% convert_as_factor(group, time) result # Show group levels levels(result$group) # Set c as the reference level (the first one) result <- result %>% set_ref_level("group", ref = "c") levels(result$group) # Set the order of levels result <- result %>% reorder_levels("group", order = c("b", "c", "a")) levels(result$group)
Take a correlation matrix and replace the correlation coefficients by symbols according to the level of the correlation.
cor_as_symbols( x, cutpoints = c(0, 0.25, 0.5, 0.75, 1), symbols = c(" ", ".", "+", "*") )
cor_as_symbols( x, cutpoints = c(0, 0.25, 0.5, 0.75, 1), symbols = c(" ", ".", "+", "*") )
x |
a correlation matrix. Particularly, an object of class |
cutpoints |
numeric vector used for intervals. Default values are
|
symbols |
character vector, one shorter than cutpoints, used as
correlation coefficient symbols. Default values are |
cor_mat()
# Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() # Replace correlation coefficient by symbols #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% cor_as_symbols() %>% pull_lower_triangle()
# Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() # Replace correlation coefficient by symbols #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% cor_as_symbols() %>% pull_lower_triangle()
Reshape correlation analysis results. Key functions:
cor_gather()
: takes a correlation matrix and collapses (i.e. melt) it into a paired list
(long format).
cor_spread()
: spread a long correlation data format across
multiple columns. Particularly, it takes the results of cor_test
and transforms it into a correlation matrix.
cor_gather(data, drop.na = TRUE) cor_spread(data, value = "cor")
cor_gather(data, drop.na = TRUE) cor_spread(data, value = "cor")
data |
a data frame or matrix. |
drop.na |
logical. If TRUE, drop rows containing missing values after gathering the data. |
value |
column name containing the value to spread. |
cor_gather()
: takes a correlation matrix and collapses (or melt) it into long
format data frame (paired list)
cor_spread()
: spread a long correlation data frame into wide
format. Expects the columns "var1", "var2" and "cor" in the data.
(correlation matrix).
cor_mat()
, cor_reorder()
# Data preparation #:::::::::::::::::::::::::::::::::::::::::: mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) head(mydata, 3) # Reshape a correlation matrix #:::::::::::::::::::::::::::::::::::::::::: # Compute a correlation matrix cor.mat <- mydata %>% cor_mat() cor.mat # Collapse the correlation matrix into long format # paired list data frame long.format <- cor.mat %>% cor_gather() long.format # Spread a correlation data format #:::::::::::::::::::::::::::::::::::::::::: # Spread the correlation coefficient value long.format %>% cor_spread(value = "cor") # Spread the p-value long.format %>% cor_spread(value = "p")
# Data preparation #:::::::::::::::::::::::::::::::::::::::::: mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) head(mydata, 3) # Reshape a correlation matrix #:::::::::::::::::::::::::::::::::::::::::: # Compute a correlation matrix cor.mat <- mydata %>% cor_mat() cor.mat # Collapse the correlation matrix into long format # paired list data frame long.format <- cor.mat %>% cor_gather() long.format # Spread a correlation data format #:::::::::::::::::::::::::::::::::::::::::: # Spread the correlation coefficient value long.format %>% cor_spread(value = "cor") # Spread the p-value long.format %>% cor_spread(value = "p")
Combines correlation coefficients and significance levels in a correlation matrix data.
cor_mark_significant( x, cutpoints = c(0, 1e-04, 0.001, 0.01, 0.05, 1), symbols = c("****", "***", "**", "*", "") )
cor_mark_significant( x, cutpoints = c(0, 1e-04, 0.001, 0.01, 0.05, 1), symbols = c("****", "***", "**", "*", "") )
x |
an object of class |
cutpoints |
numeric vector used for intervals. |
symbols |
character vector, one shorter than cutpoints, used as significance symbols. |
a data frame containing the lower triangular part of the correlation matrix marked by significance symbols.
mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() %>% cor_mark_significant()
mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() %>% cor_mark_significant()
Compute correlation matrix with p-values. Numeric columns in the data are detected and automatically selected for the analysis. You can also specify variables of interest to be used in the correlation analysis.
cor_mat( data, ..., vars = NULL, method = "pearson", alternative = "two.sided", conf.level = 0.95 ) cor_pmat( data, ..., vars = NULL, method = "pearson", alternative = "two.sided", conf.level = 0.95 ) cor_get_pval(x)
cor_mat( data, ..., vars = NULL, method = "pearson", alternative = "two.sided", conf.level = 0.95 ) cor_pmat( data, ..., vars = NULL, method = "pearson", alternative = "two.sided", conf.level = 0.95 ) cor_get_pval(x)
data |
a data.frame containing the variables. |
... |
One or more unquoted expressions (or variable names) separated by commas. Used to select a variable of interest. |
vars |
a character vector containing the variable names of interest. |
method |
a character string indicating which correlation
coefficient is to be used for the test. One of |
alternative |
indicates the alternative hypothesis and must be
one of |
conf.level |
confidence level for the returned confidence interval. Currently only used for the Pearson product moment correlation coefficient if there are at least 4 complete pairs of observations. |
x |
an object of class |
a data frame
cor_mat()
: compute correlation matrix with p-values. Returns a data
frame containing the matrix of the correlation coefficients. The output has
an attribute named "pvalue", which contains the matrix of the correlation
test p-values.
cor_pmat()
: compute the correlation matrix but returns only the p-values of the tests.
cor_get_pval()
: extract a correlation matrix p-values from an object of
class cor_mat()
. P-values are not adjusted.
cor_test()
, cor_reorder()
,
cor_gather()
, cor_select()
,
cor_as_symbols()
, pull_triangle()
,
replace_triangle()
# Data preparation #::::::::::::::::::::::::::::::::::::::::::: mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) head(mydata, 3) # Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: # Correlation matrix between all variables cor.mat <- mydata %>% cor_mat() cor.mat # Specify some variables of interest mydata %>% cor_mat(mpg, hp, wt) # Or remove some variables in the data # before the analysis mydata %>% cor_mat(-mpg, -hp) # Significance levels #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% cor_get_pval() # Visualize #:::::::::::::::::::::::::::::::::::::::::: # Insignificant correlations are marked by crosses cor.mat %>% cor_reorder() %>% pull_lower_triangle() %>% cor_plot(label = TRUE) # Gather/collapse correlation matrix into long format #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% cor_gather()
# Data preparation #::::::::::::::::::::::::::::::::::::::::::: mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) head(mydata, 3) # Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: # Correlation matrix between all variables cor.mat <- mydata %>% cor_mat() cor.mat # Specify some variables of interest mydata %>% cor_mat(mpg, hp, wt) # Or remove some variables in the data # before the analysis mydata %>% cor_mat(-mpg, -hp) # Significance levels #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% cor_get_pval() # Visualize #:::::::::::::::::::::::::::::::::::::::::: # Insignificant correlations are marked by crosses cor.mat %>% cor_reorder() %>% pull_lower_triangle() %>% cor_plot(label = TRUE) # Gather/collapse correlation matrix into long format #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% cor_gather()
Provide a tibble-friendly framework to visualize a correlation
matrix. Wrapper around the R base function
corrplot()
. Compared to
corrplot()
, it can handle directly the output of the
functions cor_mat() (in rstatix)
, rcorr() (in Hmisc)
,
correlate() (in corrr)
and cor() (in stats)
.
The p-values contained in the outputs of the functions
cor_mat()
and rcorr()
are automatically detected and
used in the visualization.
cor_plot( cor.mat, method = "circle", type = "full", palette = NULL, p.mat = NULL, significant.level = 0.05, insignificant = c("cross", "blank"), label = FALSE, font.label = list(), ... )
cor_plot( cor.mat, method = "circle", type = "full", palette = NULL, p.mat = NULL, significant.level = 0.05, insignificant = c("cross", "blank"), label = FALSE, font.label = list(), ... )
cor.mat |
the correlation matrix to visualize |
method |
Character, the visualization method of correlation matrix to be
used. Currently, it supports seven methods, named The areas of circles or squares show the absolute value of corresponding
correlation coefficients. Method |
type |
Character, |
palette |
character vector containing the color palette. |
p.mat |
matrix of p-value corresponding to the correlation matrix. |
significant.level |
significant level, if the p-value is bigger than
|
insignificant |
character, specialized insignificant correlation coefficients, "cross" (default), "blank". If "blank", wipe away the corresponding glyphs; if "cross", add crosses (X) on corresponding glyphs. |
label |
logical value. If TRUE, shows the correlation coefficient labels. |
font.label |
a list with one or more of the following elements: size
(e.g., 1), color (e.g., "black") and style (e.g., "bold"). Used to
customize the correlation coefficient labels. For example |
... |
additional options not listed (i.e. "tl.cex") here to pass to corrplot. |
# Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() # Visualize correlation matrix #:::::::::::::::::::::::::::::::::::::::::: # Full correlation matrix, # insignificant correlations are marked by crosses cor.mat %>% cor_plot() # Reorder by correlation coefficient # pull lower triangle and visualize cor.lower.tri <- cor.mat %>% cor_reorder() %>% pull_lower_triangle() cor.lower.tri %>% cor_plot() # Change visualization methods #:::::::::::::::::::::::::::::::::::::::::: cor.lower.tri %>% cor_plot(method = "pie") cor.lower.tri %>% cor_plot(method = "color") cor.lower.tri %>% cor_plot(method = "number") # Show the correlation coefficient: label = TRUE # Blank the insignificant correlation #:::::::::::::::::::::::::::::::::::::::::: cor.lower.tri %>% cor_plot( method = "color", label = TRUE, insignificant = "blank" ) # Change the color palettes #:::::::::::::::::::::::::::::::::::::::::: # Using custom color palette # Require ggpubr: install.packages("ggpubr") if(require("ggpubr")){ my.palette <- get_palette(c("red", "white", "blue"), 200) cor.lower.tri %>% cor_plot(palette = my.palette) } # Using RcolorBrewer color palette if(require("ggpubr")){ my.palette <- get_palette("PuOr", 200) cor.lower.tri %>% cor_plot(palette = my.palette) }
# Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() # Visualize correlation matrix #:::::::::::::::::::::::::::::::::::::::::: # Full correlation matrix, # insignificant correlations are marked by crosses cor.mat %>% cor_plot() # Reorder by correlation coefficient # pull lower triangle and visualize cor.lower.tri <- cor.mat %>% cor_reorder() %>% pull_lower_triangle() cor.lower.tri %>% cor_plot() # Change visualization methods #:::::::::::::::::::::::::::::::::::::::::: cor.lower.tri %>% cor_plot(method = "pie") cor.lower.tri %>% cor_plot(method = "color") cor.lower.tri %>% cor_plot(method = "number") # Show the correlation coefficient: label = TRUE # Blank the insignificant correlation #:::::::::::::::::::::::::::::::::::::::::: cor.lower.tri %>% cor_plot( method = "color", label = TRUE, insignificant = "blank" ) # Change the color palettes #:::::::::::::::::::::::::::::::::::::::::: # Using custom color palette # Require ggpubr: install.packages("ggpubr") if(require("ggpubr")){ my.palette <- get_palette(c("red", "white", "blue"), 200) cor.lower.tri %>% cor_plot(palette = my.palette) } # Using RcolorBrewer color palette if(require("ggpubr")){ my.palette <- get_palette("PuOr", 200) cor.lower.tri %>% cor_plot(palette = my.palette) }
reorder correlation matrix, according to the coefficients, using the hierarchical clustering method.
cor_reorder(x)
cor_reorder(x)
x |
a correlation matrix. Particularly, an object of class |
a data frame
cor_mat()
, cor_gather()
, cor_spread()
# Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() # Reorder by correlation and get p-values #:::::::::::::::::::::::::::::::::::::::::: # Reorder cor.mat %>% cor_reorder() # Get p-values of the reordered cor_mat cor.mat %>% cor_reorder() %>% cor_get_pval()
# Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() # Reorder by correlation and get p-values #:::::::::::::::::::::::::::::::::::::::::: # Reorder cor.mat %>% cor_reorder() # Get p-values of the reordered cor_mat cor.mat %>% cor_reorder() %>% cor_get_pval()
Subset Correlation Matrix
cor_select(x, ..., vars = NULL)
cor_select(x, ..., vars = NULL)
x |
a correlation matrix. Particularly, an object of class |
... |
One or more unquoted expressions (or variable names) separated by commas. Used to select variables of interest. |
vars |
a character vector containing the variable names of interest. |
a data frame
cor_mat()
, pull_triangle()
, replace_triangle()
# Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() # Subsetting correlation matrix #:::::::::::::::::::::::::::::::::::::::::: # Select some variables of interest cor.mat %>% cor_select(mpg, drat, wt) # Remove variables cor.mat %>% cor_select(-mpg, -wt)
# Compute correlation matrix #:::::::::::::::::::::::::::::::::::::::::: cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() # Subsetting correlation matrix #:::::::::::::::::::::::::::::::::::::::::: # Select some variables of interest cor.mat %>% cor_select(mpg, drat, wt) # Remove variables cor.mat %>% cor_select(-mpg, -wt)
Provides a pipe-friendly framework to perform correlation test
between paired samples, using Pearson, Kendall or Spearman method. Wrapper
around the function cor.test()
.
Can also performs multiple pairwise correlation analyses between more than two variables or between two different vectors of variables. Using this function, you can also compute, for example, the correlation between one variable vs many.
cor_test( data, ..., vars = NULL, vars2 = NULL, alternative = "two.sided", method = "pearson", conf.level = 0.95, use = "pairwise.complete.obs" )
cor_test( data, ..., vars = NULL, vars2 = NULL, alternative = "two.sided", method = "pearson", conf.level = 0.95, use = "pairwise.complete.obs" )
data |
a data.frame containing the variables. |
... |
One or more unquoted expressions (or variable names) separated by
commas. Used to select a variable of interest. Alternative to the argument
|
vars |
optional character vector containing variable names for correlation analysis. Ignored when dot vars are specified.
. Accept unquoted
variable names: |
vars2 |
optional character vector. If specified, each element in
|
alternative |
indicates the alternative hypothesis and must be
one of |
method |
a character string indicating which correlation
coefficient is to be used for the test. One of |
conf.level |
confidence level for the returned confidence interval. Currently only used for the Pearson product moment correlation coefficient if there are at least 4 complete pairs of observations. |
use |
an optional character string giving a
method for computing covariances in the presence
of missing values. This must be (an abbreviation of) one of the strings
|
return a data frame with the following columns:
var1, var2
: the variables used in the correlation test.
cor
: the correlation coefficient.
statistic
: Test
statistic used to compute the p-value.
p
: p-value.
conf.low,conf.high
: Lower and upper bounds on a confidence interval.
method
: the method used to compute the statistic.
cor_test()
: correlation test between two or more variables.
cor_mat()
, as_cor_mat()
# Correlation between the specified variable vs # the remaining numeric variables in the data #::::::::::::::::::::::::::::::::::::::::: mtcars %>% cor_test(mpg) # Correlation test between two variables #::::::::::::::::::::::::::::::::::::::::: mtcars %>% cor_test(wt, mpg) # Pairwise correlation between multiple variables #::::::::::::::::::::::::::::::::::::::::: mtcars %>% cor_test(wt, mpg, disp) # Grouped data #::::::::::::::::::::::::::::::::::::::::: iris %>% group_by(Species) %>% cor_test(Sepal.Width, Sepal.Length) # Multiple correlation test #::::::::::::::::::::::::::::::::::::::::: # Correlation between one variable vs many mtcars %>% cor_test( vars = "mpg", vars2 = c("disp", "hp", "drat") ) # Correlation between two vectors of variables # Each element in vars is tested against all elements in vars2 mtcars %>% cor_test( vars = c("mpg", "wt"), vars2 = c("disp", "hp", "drat") )
# Correlation between the specified variable vs # the remaining numeric variables in the data #::::::::::::::::::::::::::::::::::::::::: mtcars %>% cor_test(mpg) # Correlation test between two variables #::::::::::::::::::::::::::::::::::::::::: mtcars %>% cor_test(wt, mpg) # Pairwise correlation between multiple variables #::::::::::::::::::::::::::::::::::::::::: mtcars %>% cor_test(wt, mpg, disp) # Grouped data #::::::::::::::::::::::::::::::::::::::::: iris %>% group_by(Species) %>% cor_test(Sepal.Width, Sepal.Length) # Multiple correlation test #::::::::::::::::::::::::::::::::::::::::: # Correlation between one variable vs many mtcars %>% cor_test( vars = "mpg", vars2 = c("disp", "hp", "drat") ) # Correlation between two vectors of variables # Each element in vars is tested against all elements in vars2 mtcars %>% cor_test( vars = c("mpg", "wt"), vars2 = c("disp", "hp", "drat") )
converts a contingency table or a data frame of counts into a data frame of individual observations.
counts_to_cases(x, count.col = "Freq")
counts_to_cases(x, count.col = "Freq")
x |
a contingency table or a data frame |
count.col |
the name of the column containing the counts. Default is "Freq". |
a data frame of cases
# Create a cross-tabulation demo data #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% xtab <- as.table( rbind(c(20, 5), c(16,9)) ) dimnames(xtab) <- list( before = c("non.smoker", "smoker"), after = c("non.smoker", "smoker") ) xtab # Convert into a data frame of cases #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% df <- counts_to_cases(xtab) head(df)
# Create a cross-tabulation demo data #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% xtab <- as.table( rbind(c(20, 5), c(16,9)) ) dimnames(xtab) <- list( before = c("non.smoker", "smoker"), after = c("non.smoker", "smoker") ) xtab # Convert into a data frame of cases #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% df <- counts_to_cases(xtab) head(df)
Compute Cramer's V, which measures the strength of the association between categorical variables.
cramer_v(x, y = NULL, correct = TRUE, ...)
cramer_v(x, y = NULL, correct = TRUE, ...)
x |
a numeric vector or matrix. |
y |
a numeric vector; ignored if |
correct |
a logical indicating whether to apply continuity
correction when computing the test statistic for 2 by 2 tables: one
half is subtracted from all |
... |
other arguments passed to the function
|
# Data preparation df <- as.table(rbind(c(762, 327, 468), c(484, 239, 477))) dimnames(df) <- list( gender = c("F", "M"), party = c("Democrat","Independent", "Republican") ) df # Compute cramer's V cramer_v(df)
# Data preparation df <- as.table(rbind(c(762, 327, 468), c(484, 239, 477))) dimnames(df) <- list( gender = c("F", "M"), party = c("Democrat","Independent", "Republican") ) df # Compute cramer's V cramer_v(df)
Order the rows of a data frame by values of specified columns.
Wrapper arround the arrange()
function. Supports
standard and non standard evaluation.
df_arrange(data, ..., vars = NULL, .by_group = FALSE)
df_arrange(data, ..., vars = NULL, .by_group = FALSE)
data |
a data frame |
... |
One or more unquoted expressions (or variable names) separated by
commas. Used to select a variable of interest. Use
|
vars |
a character vector containing the variable names of interest. |
.by_group |
If TRUE, will sort first by grouping variable. Applies to grouped data frames only. |
a data frame
df <- head(ToothGrowth) df # Select column using standard evaluation df %>% df_arrange(vars = c("dose", "len")) # Select column using non-standard evaluation df %>% df_arrange(dose, desc(len))
df <- head(ToothGrowth) df # Select column using standard evaluation df %>% df_arrange(vars = c("dose", "len")) # Select column using non-standard evaluation df %>% df_arrange(dose, desc(len))
Returns user specified variable names. Supports standard and non standard evaluation.
df_get_var_names(data, ..., vars = NULL)
df_get_var_names(data, ..., vars = NULL)
data |
a data frame |
... |
One or more unquoted expressions (or variable names) separated by commas. Used to select a variable of interest. |
vars |
a character vector containing the variable names of interest. |
a character vector
# Non standard evaluation ToothGrowth %>% df_get_var_names(dose, len) # Standard evaluation ToothGrowth %>% df_get_var_names(vars = c("len", "dose"))
# Non standard evaluation ToothGrowth %>% df_get_var_names(dose, len) # Standard evaluation ToothGrowth %>% df_get_var_names(vars = c("len", "dose"))
Group a data frame by one or more variables. Supports standard and non standard evaluation.
df_group_by(data, ..., vars = NULL)
df_group_by(data, ..., vars = NULL)
data |
a data frame |
... |
One or more unquoted expressions (or variable names) separated by commas. Used to select a variable of interest. |
vars |
a character vector containing the variable names of interest. |
# Non standard evaluation by_dose <- head(ToothGrowth) %>% df_group_by(dose) by_dose # Standard evaluation head(ToothGrowth) %>% df_group_by(vars = c("dose", "supp"))
# Non standard evaluation by_dose <- head(ToothGrowth) %>% df_group_by(dose) by_dose # Standard evaluation head(ToothGrowth) %>% df_group_by(vars = c("dose", "supp"))
Functions to label data frame rows by one or multiple grouping variables.
df_label_both(data, ..., vars = NULL, label_col = "label", sep = c(", ", ":")) df_label_value(data, ..., vars = NULL, label_col = "label", sep = ", ")
df_label_both(data, ..., vars = NULL, label_col = "label", sep = c(", ", ":")) df_label_value(data, ..., vars = NULL, label_col = "label", sep = ", ")
data |
a data frame |
... |
One or more unquoted expressions (or variable names) separated by commas. Used as grouping variables. |
vars |
a character vector containing the grouping variables of interest. |
label_col |
column to hold the label of the data subsets. Default column name is "label". |
sep |
String separating labelling variables and values. Should be of
length 2 in the function |
a modified data frame with a column containing row labels.
df_label_both()
: Displays both the variable name and the factor value.
df_label_value()
: Displays only the value of a factor.
# Data preparation df <- head(ToothGrowth) # Labelling: Non standard evaluation df %>% df_label_both(dose, supp) # Standard evaluation df %>% df_label_both(dose, supp) # Nesting the data then label each subset by groups ToothGrowth %>% df_nest_by(dose, supp) %>% df_label_both(supp, dose)
# Data preparation df <- head(ToothGrowth) # Labelling: Non standard evaluation df %>% df_label_both(dose, supp) # Standard evaluation df %>% df_label_both(dose, supp) # Nesting the data then label each subset by groups ToothGrowth %>% df_nest_by(dose, supp) %>% df_label_both(supp, dose)
Nest a tibble data frame using grouping specification. Supports standard and non standard evaluation.
df_nest_by(data, ..., vars = NULL)
df_nest_by(data, ..., vars = NULL)
data |
a data frame |
... |
One or more unquoted expressions (or variable names) separated by commas. Used as grouping variables. |
vars |
a character vector containing the grouping variables of interest. |
A tbl with one row per unique combination of the grouping variables. The first columns are the grouping variables, followed by a list column of tibbles with matching rows of the remaining columns.
# Non standard evaluation ToothGrowth %>% df_nest_by(dose, supp) # Standard evaluation ToothGrowth %>% df_nest_by(vars = c("dose", "supp"))
# Non standard evaluation ToothGrowth %>% df_nest_by(dose, supp) # Standard evaluation ToothGrowth %>% df_nest_by(vars = c("dose", "supp"))
A wrapper around the select()
function for
selection data frame columns. Supports standard and non standard
evaluations. Usefull to easily program with dplyr
df_select(data, ..., vars = NULL)
df_select(data, ..., vars = NULL)
data |
a data frame |
... |
One or more unquoted expressions (or variable names) separated by commas. Used to select a variable of interest. |
vars |
a character vector containing the variable names of interest. |
a data frame
df <- head(ToothGrowth) df # Select column using standard evaluation df %>% df_select(vars = c("dose", "len")) # Select column using non-standard evaluation df %>% df_select(dose, len)
df <- head(ToothGrowth) df # Select column using standard evaluation df %>% df_select(vars = c("dose", "len")) # Select column using non-standard evaluation df %>% df_select(dose, len)
Split a data frame by groups into subsets or data panel. Very
similar to the function df_nest_by()
. The only difference is
that, it adds label to each data subset. Labels are the combination of the
grouping variable levels. The column holding labels are named "label".
df_split_by( data, ..., vars = NULL, label_col = "label", labeller = df_label_both, sep = c(", ", ":") )
df_split_by( data, ..., vars = NULL, label_col = "label", labeller = df_label_both, sep = c(", ", ":") )
data |
a data frame |
... |
One or more unquoted expressions (or variable names) separated by commas. Used as grouping variables. |
vars |
a character vector containing the grouping variables of interest. |
label_col |
column to hold the label of the data subsets. Default column name is "label". |
labeller |
A function that takes a data frame, the grouping variables,
label_col and label_sep arguments, and add labels into the data frame.
Example of possible values are: |
sep |
String separating labelling variables and values. Should be of
length 2 in the function |
A tbl with one row per unique combination of the grouping variables. The first columns are the grouping variables, followed by a list column of tibbles with matching rows of the remaining columns, and a column named label, containing labels.
# Split a data frame # ::::::::::::::::::::::::::::::::::::::::::::::::: # Create a grouped data res <- ToothGrowth %>% df_split_by(dose, supp) res # Show subsets res$data # Add panel/subset labels res <- ToothGrowth %>% df_split_by(dose, supp) res
# Split a data frame # ::::::::::::::::::::::::::::::::::::::::::::::::: # Create a grouped data res <- ToothGrowth %>% df_split_by(dose, supp) res # Show subsets res$data # Add panel/subset labels res <- ToothGrowth %>% df_split_by(dose, supp) res
Paste together multiple columns into one. Wrapper arround
unite()
that supports standard and non standard
evaluation.
df_unite(data, col, ..., vars = NULL, sep = "_", remove = TRUE, na.rm = FALSE) df_unite_factors( data, col, ..., vars = NULL, sep = "_", remove = TRUE, na.rm = FALSE )
df_unite(data, col, ..., vars = NULL, sep = "_", remove = TRUE, na.rm = FALSE) df_unite_factors( data, col, ..., vars = NULL, sep = "_", remove = TRUE, na.rm = FALSE )
data |
a data frame |
col |
the name of the new column as a string or a symbol. |
... |
a selection of columns. One or more unquoted expressions (or variable names) separated by commas. |
vars |
a character vector containing the column names of interest. |
sep |
Separator to use between values. |
remove |
If |
na.rm |
If |
df_unite()
: Unite multiple columns into one.
df_unite_factors()
: Unite factor columns. First, order factors levels then
merge them into one column. The output column is a factor.
# Non standard evaluation head(ToothGrowth) %>% df_unite(col = "dose_supp", dose, supp) # Standard evaluation head(ToothGrowth) %>% df_unite(col = "dose_supp", vars = c("dose", "supp"))
# Non standard evaluation head(ToothGrowth) %>% df_unite(col = "dose_supp", dose, supp) # Standard evaluation head(ToothGrowth) %>% df_unite(col = "dose_supp", vars = c("dose", "supp"))
Provides a flexible alternative to the dplyr:do()
function.
Technically it uses nest() + mutate() + map()
to apply arbitrary
computation to a grouped data frame.
The output is a data frame. If the applied function returns a data frame, then the output will be automatically unnested. Otherwise, the output includes the grouping variables and a column named ".results." (by default), which is a "list-columns" containing the results for group combinations.
doo(data, .f, ..., result = ".results.")
doo(data, .f, ..., result = ".results.")
data |
a (grouped) data frame |
.f |
A function, formula, or atomic vector. For example
|
... |
Additional arguments passed on to .f |
result |
the column name to hold the results. Default is ".results.". |
a data frame
# Custom function #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% stat_test <- function(data, formula){ t.test(formula, data) %>% tidy() } # Example 1: pipe-friendly stat_test(). # Two possibilities of usage are available #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Use this ToothGrowth %>% group_by(dose) %>% doo(~stat_test(data =., len ~ supp)) # Or this ToothGrowth %>% group_by(dose) %>% doo(stat_test, len ~ supp) # Example 2: R base function t.test() (not pipe friendly) # One possibility of usage #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% comparisons <- ToothGrowth %>% group_by(dose) %>% doo(~t.test(len ~ supp, data =.)) comparisons comparisons$.results. # Example 3: R base function combined with tidy() #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ToothGrowth %>% group_by(dose) %>% doo(~t.test(len ~ supp, data =.) %>% tidy())
# Custom function #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% stat_test <- function(data, formula){ t.test(formula, data) %>% tidy() } # Example 1: pipe-friendly stat_test(). # Two possibilities of usage are available #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Use this ToothGrowth %>% group_by(dose) %>% doo(~stat_test(data =., len ~ supp)) # Or this ToothGrowth %>% group_by(dose) %>% doo(stat_test, len ~ supp) # Example 2: R base function t.test() (not pipe friendly) # One possibility of usage #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% comparisons <- ToothGrowth %>% group_by(dose) %>% doo(~t.test(len ~ supp, data =.)) comparisons comparisons$.results. # Example 3: R base function combined with tidy() #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ToothGrowth %>% group_by(dose) %>% doo(~t.test(len ~ supp, data =.) %>% tidy())
Performs Dunn's test for pairwise multiple comparisons of the ranked data. The mean rank of the different groups is compared. Used for post-hoc test following Kruskal-Wallis test.
The default of the rstatix::dunn_test()
function is to perform a
two-sided Dunn test like the well known commercial softwares, such as SPSS
and GraphPad. This is not the case for some other R packages
(dunn.test
and jamovi
), where the default is to perform
one-sided test. This discrepancy is documented at
https://github.com/kassambara/rstatix/issues/50.
dunn_test(data, formula, p.adjust.method = "holm", detailed = FALSE)
dunn_test(data, formula, p.adjust.method = "holm", detailed = FALSE)
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
DunnTest performs the post hoc pairwise multiple comparisons procedure appropriate to follow up a Kruskal-Wallis test, which is a non-parametric analog of the one-way ANOVA. The Wilcoxon rank sum test, itself a non-parametric analog of the unpaired t-test, is possibly intuitive, but inappropriate as a post hoc pairwise test, because (1) it fails to retain the dependent ranking that produced the Kruskal-Wallis test statistic, and (2) it does not incorporate the pooled variance estimate implied by the null hypothesis of the Kruskal-Wallis test.
return a data frame with some of the following columns:
.y.
: the y (outcome) variable used in the test.
group1,group2
: the compared groups in the pairwise tests.
n1,n2
: Sample counts.
estimate
: mean ranks difference.
estimate1, estimate2
: show the mean rank values of the two
groups, respectively.
statistic
: Test statistic (z-value) used
to compute the p-value.
p
: p-value.
p.adj
: the
adjusted p-value.
method
: the statistical test used to compare
groups.
p.signif, p.adj.signif
: the significance level of
p-values and adjusted p-values, respectively.
The returned object has an attribute called args, which is a list holding the test arguments.
Dunn, O. J. (1964) Multiple comparisons using rank sums Technometrics, 6(3):241-252.
# Simple test ToothGrowth %>% dunn_test(len ~ dose) # Grouped data ToothGrowth %>% group_by(supp) %>% dunn_test(len ~ dose)
# Simple test ToothGrowth %>% dunn_test(len ~ dose) # Grouped data ToothGrowth %>% group_by(supp) %>% dunn_test(len ~ dose)
Performs pairwise comparisons between groups using the estimated
marginal means. Pipe-friendly wrapper arround the functions emmans() +
contrast()
from the emmeans
package, which need to be installed
before using this function. This function is useful for performing post-hoc
analyses following ANOVA/ANCOVA tests.
emmeans_test( data, formula, covariate = NULL, ref.group = NULL, comparisons = NULL, p.adjust.method = "bonferroni", conf.level = 0.95, model = NULL, detailed = FALSE ) get_emmeans(emmeans.test)
emmeans_test( data, formula, covariate = NULL, ref.group = NULL, comparisons = NULL, p.adjust.method = "bonferroni", conf.level = 0.95, model = NULL, detailed = FALSE ) get_emmeans(emmeans.test)
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
covariate |
(optional) covariate names (for ANCOVA) |
ref.group |
a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). If |
comparisons |
A list of length-2 vectors specifying the groups of
interest to be compared. For example to compare groups "A" vs "B" and "B" vs
"C", the argument is as follow: |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
conf.level |
confidence level of the interval. |
model |
a fitted-model objects such as the result of a call to
|
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
emmeans.test |
an object of class |
return a data frame with some the following columns:
.y.
: the y variable used in the test.
group1,group2
: the
compared groups in the pairwise tests.
statistic
: Test
statistic (t.ratio) used to compute the p-value.
df
: degrees of
freedom.
p
: p-value.
p.adj
: the adjusted p-value.
method
: the statistical test used to compare groups.
p.signif, p.adj.signif
: the significance level of p-values and
adjusted p-values, respectively.
estimate
: estimate of the
effect size, that is the difference between the two emmeans (estimated
marginal means).
conf.low,conf.high
: Lower and upper bound on a
confidence interval of the estimate.
The returned object has an attribute called args, which is a list holding the test arguments. It has also an attribute named "emmeans", a data frame containing the groups emmeans.
get_emmeans()
: returns the estimated marginal means from an object of class emmeans_test
# Data preparation df <- ToothGrowth df$dose <- as.factor(df$dose) # Pairwise comparisons res <- df %>% group_by(supp) %>% emmeans_test(len ~ dose, p.adjust.method = "bonferroni") res # Display estimated marginal means attr(res, "emmeans") # Show details df %>% group_by(supp) %>% emmeans_test(len ~ dose, p.adjust.method = "bonferroni", detailed = TRUE)
# Data preparation df <- ToothGrowth df$dose <- as.factor(df$dose) # Pairwise comparisons res <- df %>% group_by(supp) %>% emmeans_test(len ~ dose, p.adjust.method = "bonferroni") res # Display estimated marginal means attr(res, "emmeans") # Show details df %>% group_by(supp) %>% emmeans_test(len ~ dose, p.adjust.method = "bonferroni", detailed = TRUE)
Compute eta-squared and partial eta-squared for all terms in an ANOVA model.
eta_squared(model) partial_eta_squared(model)
eta_squared(model) partial_eta_squared(model)
model |
an object of class aov or anova. |
a numeric vector with the effect size statistics
eta_squared()
: compute eta squared
partial_eta_squared()
: compute partial eta squared.
# Data preparation df <- ToothGrowth df$dose <- as.factor(df$dose) # Compute ANOVA res.aov <- aov(len ~ supp*dose, data = df) summary(res.aov) # Effect size eta_squared(res.aov) partial_eta_squared(res.aov)
# Data preparation df <- ToothGrowth df$dose <- as.factor(df$dose) # Compute ANOVA res.aov <- aov(len ~ supp*dose, data = df) summary(res.aov) # Effect size eta_squared(res.aov) partial_eta_squared(res.aov)
Provides helper functions to build factorial design for easily
computing ANOVA using the Anova()
function. This might be
very useful for repeated measures ANOVA, which is hard to set up with the
car
package.
factorial_design(data, dv, wid, between, within, covariate)
factorial_design(data, dv, wid, between, within, covariate)
data |
a data frame containing the variables |
dv |
(numeric) dependent variable name. |
wid |
(factor) column name containing individuals/subjects identifier. Should be unique per individual. |
between |
(optional) between-subject factor variables. |
within |
(optional) within-subjects factor variables |
covariate |
(optional) covariate names (for ANCOVA) |
a list with the following components:
the
specified arguments: dv, wid, between, within
data: the original data (long format) or independent ANOVA. The wide format is returned for repeated measures ANOVA.
idata: an optional data frame giving the levels of factors defining the intra-subject model for multivariate repeated-measures data.
idesign: a one-sided model formula using the “data” in idata and specifying the intra-subject design.
repeated: logical. Value is TRUE when the data is a repeated design.
lm_formula: the formula used to build the
lm
model.
lm_data: the data used to build the lm
model. Can be either in a long format (i.e., the original data for
independent measures ANOVA) or in a wide format (case of repeated measures ANOVA).
model: the lm
model
Alboukadel Kassambara, [email protected]
anova_test()
, anova_summary()
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth head(df) # Repeated measures designs #::::::::::::::::::::::::::::::::::::::::: # Prepare the data df$id <- rep(1:10, 6) # Add individuals id head(df) # Build factorial designs design <- factorial_design(df, dv = len, wid = id, within = c(supp, dose)) design # Easily perform repeated measures ANOVA using the car package res.anova <- Anova(design$model, idata = design$idata, idesign = design$idesign, type = 3) summary(res.anova, multivariate = FALSE) # Independent measures designs #::::::::::::::::::::::::::::::::::::::::: # Build factorial designs df$id <- 1:nrow(df) design <- factorial_design(df, dv = len, wid = id, between = c(supp, dose)) design # Perform ANOVA Anova(design$model, type = 3)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth head(df) # Repeated measures designs #::::::::::::::::::::::::::::::::::::::::: # Prepare the data df$id <- rep(1:10, 6) # Add individuals id head(df) # Build factorial designs design <- factorial_design(df, dv = len, wid = id, within = c(supp, dose)) design # Easily perform repeated measures ANOVA using the car package res.anova <- Anova(design$model, idata = design$idata, idesign = design$idesign, type = 3) summary(res.anova, multivariate = FALSE) # Independent measures designs #::::::::::::::::::::::::::::::::::::::::: # Build factorial designs df$id <- 1:nrow(df) design <- factorial_design(df, dv = len, wid = id, between = c(supp, dose)) design # Perform ANOVA Anova(design$model, type = 3)
Performs Fisher's exact test for testing the null of independence of rows and columns in a contingency table.
Wrappers around the R base function fisher.test()
but
have the advantage of performing pairwise and row-wise fisher tests, the
post-hoc tests following a significant chi-square test of homogeneity for 2xc
and rx2 contingency tables.
fisher_test( xtab, workspace = 2e+05, alternative = "two.sided", conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE, B = 2000, detailed = FALSE, ... ) pairwise_fisher_test(xtab, p.adjust.method = "holm", detailed = FALSE, ...) row_wise_fisher_test(xtab, p.adjust.method = "holm", detailed = FALSE, ...)
fisher_test( xtab, workspace = 2e+05, alternative = "two.sided", conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE, B = 2000, detailed = FALSE, ... ) pairwise_fisher_test(xtab, p.adjust.method = "holm", detailed = FALSE, ...) row_wise_fisher_test(xtab, p.adjust.method = "holm", detailed = FALSE, ...)
xtab |
a contingency table in a matrix form. |
workspace |
an integer specifying the size of the workspace
used in the network algorithm. In units of 4 bytes. Only used for
non-simulated p-values larger than |
alternative |
indicates the alternative hypothesis and must be
one of |
conf.int |
logical indicating if a confidence interval for the
odds ratio in a |
conf.level |
confidence level for the returned confidence
interval. Only used in the |
simulate.p.value |
a logical indicating whether to compute
p-values by Monte Carlo simulation, in larger than |
B |
an integer specifying the number of replicates used in the Monte Carlo test. |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
... |
Other arguments passed to the function |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
return a data frame with some the following columns:
group
: the categories in the row-wise proportion tests.
p
: p-value.
p.adj
: the adjusted p-value.
method
: the used statistical test.
p.signif,
p.adj.signif
: the significance level of p-values and adjusted p-values,
respectively.
estimate
: an estimate of the odds ratio. Only
present in the 2 by 2 case.
alternative
: a character string
describing the alternative hypothesis.
conf.low,conf.high
: a
confidence interval for the odds ratio. Only present in the 2 by 2 case and
if argument conf.int = TRUE.
The returned object has an attribute called args, which is a list holding the test arguments.
fisher_test()
: performs Fisher's exact test for testing the null of
independence of rows and columns in a contingency table with fixed
marginals. Wrapper around the function fisher.test()
.
pairwise_fisher_test()
: pairwise comparisons between proportions, a post-hoc
tests following a significant Fisher's exact test of homogeneity for 2xc
design.
row_wise_fisher_test()
: performs row-wise Fisher's exact test of count data, a post-hoc tests following a significant chi-square test
of homogeneity for rx2 contingency table. The test is conducted for each category (row).
# Comparing two proportions #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: frequencies of smokers between two groups xtab <- as.table(rbind(c(490, 10), c(400, 100))) dimnames(xtab) <- list( group = c("grp1", "grp2"), smoker = c("yes", "no") ) xtab # compare the proportion of smokers fisher_test(xtab, detailed = TRUE) # Homogeneity of proportions between groups #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # H0: the proportion of smokers is similar in the four groups # Ha: this proportion is different in at least one of the populations. # # Data preparation grp.size <- c( 106, 113, 156, 102 ) smokers <- c( 50, 100, 139, 80 ) no.smokers <- grp.size - smokers xtab <- as.table(rbind( smokers, no.smokers )) dimnames(xtab) <- list( Smokers = c("Yes", "No"), Groups = c("grp1", "grp2", "grp3", "grp4") ) xtab # Compare the proportions of smokers between groups fisher_test(xtab, detailed = TRUE) # Pairwise comparison between groups pairwise_fisher_test(xtab) # Pairwise proportion tests #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(122, 167, 528, 673), c(203, 118, 178, 212) )) dimnames(xtab) <- list( Survived = c("No", "Yes"), Class = c("1st", "2nd", "3rd", "Crew") ) xtab # Compare the proportion of survived between groups pairwise_fisher_test(xtab) # Row-wise proportion tests #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(180, 145), c(179, 106), c(510, 196), c(862, 23) )) dimnames(xtab) <- list( Class = c("1st", "2nd", "3rd", "Crew"), Gender = c("Male", "Female") ) xtab # Compare the proportion of males and females in each category row_wise_fisher_test(xtab) # A r x c table Agresti (2002, p. 57) Job Satisfaction Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4, dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"), satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS"))) fisher_test(Job) fisher_test(Job, simulate.p.value = TRUE, B = 1e5)
# Comparing two proportions #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: frequencies of smokers between two groups xtab <- as.table(rbind(c(490, 10), c(400, 100))) dimnames(xtab) <- list( group = c("grp1", "grp2"), smoker = c("yes", "no") ) xtab # compare the proportion of smokers fisher_test(xtab, detailed = TRUE) # Homogeneity of proportions between groups #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # H0: the proportion of smokers is similar in the four groups # Ha: this proportion is different in at least one of the populations. # # Data preparation grp.size <- c( 106, 113, 156, 102 ) smokers <- c( 50, 100, 139, 80 ) no.smokers <- grp.size - smokers xtab <- as.table(rbind( smokers, no.smokers )) dimnames(xtab) <- list( Smokers = c("Yes", "No"), Groups = c("grp1", "grp2", "grp3", "grp4") ) xtab # Compare the proportions of smokers between groups fisher_test(xtab, detailed = TRUE) # Pairwise comparison between groups pairwise_fisher_test(xtab) # Pairwise proportion tests #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(122, 167, 528, 673), c(203, 118, 178, 212) )) dimnames(xtab) <- list( Survived = c("No", "Yes"), Class = c("1st", "2nd", "3rd", "Crew") ) xtab # Compare the proportion of survived between groups pairwise_fisher_test(xtab) # Row-wise proportion tests #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(180, 145), c(179, 106), c(510, 196), c(862, 23) )) dimnames(xtab) <- list( Class = c("1st", "2nd", "3rd", "Crew"), Gender = c("Male", "Female") ) xtab # Compare the proportion of males and females in each category row_wise_fisher_test(xtab) # A r x c table Agresti (2002, p. 57) Job Satisfaction Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4, dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"), satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS"))) fisher_test(Job) fisher_test(Job, simulate.p.value = TRUE, B = 1e5)
compute frequency table.
freq_table(data, ..., vars = NULL, na.rm = TRUE)
freq_table(data, ..., vars = NULL, na.rm = TRUE)
data |
a data frame |
... |
One or more unquoted expressions (or variable names) separated by commas. Used to specify variables of interest. |
vars |
optional character vector containing variable names. |
na.rm |
logical value. If TRUE (default), remove missing values in the variables used to create the frequency table. |
a data frame
data("ToothGrowth") ToothGrowth %>% freq_table(supp, dose)
data("ToothGrowth") ToothGrowth %>% freq_table(supp, dose)
Compute the effect size estimate (referred to as w
) for
Friedman test: W = X2/N(K-1)
; where W
is the Kendall's W
value; X2
is the Friedman test statistic value; N
is the sample
size. k
is the number of measurements per subject.
The Kendall’s W coefficient assumes the value from 0 (indicating no relationship) to 1 (indicating a perfect relationship).
Kendalls uses the Cohen’s interpretation guidelines of 0.1 - < 0.3
(small
effect), 0.3 - < 0.5
(moderate effect) and >= 0.5
(large
effect)
Confidence intervals are calculated by bootstap.
friedman_effsize( data, formula, ci = FALSE, conf.level = 0.95, ci.type = "perc", nboot = 1000, ... )
friedman_effsize( data, formula, ci = FALSE, conf.level = 0.95, ci.type = "perc", nboot = 1000, ... )
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
ci |
If TRUE, returns confidence intervals by bootstrap. May be slow. |
conf.level |
The level for the confidence interval. |
ci.type |
The type of confidence interval to use. Can be any of "norm",
"basic", "perc", or "bca". Passed to |
nboot |
The number of replications to use for bootstrap. |
... |
other arguments passed to the function |
return a data frame with some of the following columns:
.y.
: the y variable used in the test.
n
: Sample
counts.
effsize
: estimate of the effect size.
magnitude
: magnitude of effect size.
conf.low,conf.high
:
lower and upper bound of the effect size confidence interval.
Maciej Tomczak and Ewa Tomczak. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sport Sciences. 2014; 1(21):19-25.
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth %>% filter(supp == "VC") %>% mutate(id = rep(1:10, 3)) head(df) # Friedman test effect size #::::::::::::::::::::::::::::::::::::::::: df %>% friedman_effsize(len ~ dose | id)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth %>% filter(supp == "VC") %>% mutate(id = rep(1:10, 3)) head(df) # Friedman test effect size #::::::::::::::::::::::::::::::::::::::::: df %>% friedman_effsize(len ~ dose | id)
Provides a pipe-friendly framework to perform a Friedman rank sum
test, which is the non-parametric alternative to the one-way repeated
measures ANOVA test. Wrapper around the function
friedman.test()
. Read more:
Friedman
test in R.
friedman_test(data, formula, ...)
friedman_test(data, formula, ...)
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
... |
other arguments to be passed to the function
|
return a data frame with the following columns:
.y.
: the y (dependent) variable used in the test.
n
:
sample count.
statistic
: the value of Friedman's chi-squared
statistic, used to compute the p-value.
p
: p-value.
method
: the statistical test used to compare groups.
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth %>% filter(supp == "VC") %>% mutate(id = rep(1:10, 3)) head(df) # Friedman rank sum test #::::::::::::::::::::::::::::::::::::::::: df %>% friedman_test(len ~ dose | id)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth %>% filter(supp == "VC") %>% mutate(id = rep(1:10, 3)) head(df) # Friedman rank sum test #::::::::::::::::::::::::::::::::::::::::: df %>% friedman_test(len ~ dose | id)
Performs Games-Howell test, which is used to compare all possible combinations of group differences when the assumption of homogeneity of variances is violated. This post hoc test provides confidence intervals for the differences between group means and shows whether the differences are statistically significant.
The test is based on Welch’s degrees of freedom correction and uses Tukey’s studentized range distribution for computing the p-values. The test compares the difference between each pair of means with appropriate adjustment for the multiple testing. So there is no need to apply additional p-value corrections.
games_howell_test(data, formula, conf.level = 0.95, detailed = FALSE)
games_howell_test(data, formula, conf.level = 0.95, detailed = FALSE)
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
conf.level |
confidence level of the interval. |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
The Games-Howell method is an improved version of the Tukey-Kramer method and is applicable in cases where the equivalence of variance assumption is violated. It is a t-test using Welch’s degree of freedom. This method uses a strategy for controlling the type I error for the entire comparison and is known to maintain the preset significance level even when the size of the sample is different. However, the smaller the number of samples in each group, the it is more tolerant the type I error control. Thus, this method can be applied when the number of samples is six or more.
return a data frame with some of the following columns:
.y.
: the y (outcome) variable used in the test.
group1,group2
: the compared groups in the pairwise tests.
n1,n2
: Sample counts.
estimate, conf.low, conf.high
:
mean difference and its confidence intervals.
statistic
: Test
statistic (t-value) used to compute the p-value.
df
: degrees of
freedom calculated using Welch’s correction.
p.adj
: adjusted p-value using Tukey's method.
method
: the statistical test used to compare groups.
p.adj.signif
: the significance level of p-values.
The returned object has an attribute called args, which is a list holding the test arguments.
Aaron Schlege, https://rpubs.com/aaronsc32/games-howell-test.
Sangseok Lee, Dong Kyu Lee. What is the proper way to apply the multiple comparison test?. Korean J Anesthesiol. 2018;71(5):353-360.
# Simple test ToothGrowth %>% games_howell_test(len ~ dose) # Grouped data ToothGrowth %>% group_by(supp) %>% games_howell_test(len ~ dose)
# Simple test ToothGrowth %>% games_howell_test(len ~ dose) # Grouped data ToothGrowth %>% group_by(supp) %>% games_howell_test(len ~ dose)
Create a list of possible pairwise comparisons between groups. If a reference group is specified, only comparisons against reference will be kept.
get_comparisons(data, variable, ref.group = NULL)
get_comparisons(data, variable, ref.group = NULL)
data |
a data frame |
variable |
the grouping variable name. Can be unquoted. |
ref.group |
a character string specifying the reference group. Can be unquoted. If numeric, then it should be quoted. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). If |
a list of all possible pairwise comparisons.
# All possible pairwise comparisons ToothGrowth %>% get_comparisons("dose") # Comparisons against reference groups ToothGrowth %>% get_comparisons("dose", ref.group = "0.5") # Comparisons against all (basemean) ToothGrowth %>% get_comparisons("dose", ref.group = "all")
# All possible pairwise comparisons ToothGrowth %>% get_comparisons("dose") # Comparisons against reference groups ToothGrowth %>% get_comparisons("dose", ref.group = "0.5") # Comparisons against all (basemean) ToothGrowth %>% get_comparisons("dose", ref.group = "all")
Compute the mode in a given vector. Mode is the most frequent value.
get_mode(x)
get_mode(x)
x |
a vector. Can be numeric, factor or character vector. |
# Mode of numeric vector x <- c(1:5, 6, 6, 7:10) get_mode(x) # Bimodal x <- c(1:5, 6, 6, 7, 8, 9, 9, 10) get_mode(x) # No mode x <- c(1, 2, 3, 4, 5) get_mode(x) # Nominal vector fruits <- c(rep("orange", 10), rep("apple", 5), rep("lemon", 2)) get_mode(fruits)
# Mode of numeric vector x <- c(1:5, 6, 6, 7:10) get_mode(x) # Bimodal x <- c(1:5, 6, 6, 7, 8, 9, 9, 10) get_mode(x) # No mode x <- c(1, 2, 3, 4, 5) get_mode(x) # Nominal vector fruits <- c(rep("orange", 10), rep("apple", 5), rep("lemon", 2)) get_mode(fruits)
Extracts label information from statistical tests. Useful for labelling plots with test outputs.
get_pwc_label(stat.test, type = c("expression", "text")) get_test_label( stat.test, description = NULL, p.col = "p", type = c("expression", "text"), correction = c("auto", "GG", "HF", "none"), row = NULL, detailed = FALSE ) create_test_label( statistic.text, statistic, p, parameter = NA, description = NULL, n = NA, effect.size = NA, effect.size.text = NA, type = c("expression", "text"), detailed = FALSE ) get_n(stat.test) get_description(stat.test)
get_pwc_label(stat.test, type = c("expression", "text")) get_test_label( stat.test, description = NULL, p.col = "p", type = c("expression", "text"), correction = c("auto", "GG", "HF", "none"), row = NULL, detailed = FALSE ) create_test_label( statistic.text, statistic, p, parameter = NA, description = NULL, n = NA, effect.size = NA, effect.size.text = NA, type = c("expression", "text"), detailed = FALSE ) get_n(stat.test) get_description(stat.test)
stat.test |
statistical test results returned by |
type |
the label type. Can be one of "text" and "expression". Partial
match allowed. If you want to add the label onto a ggplot, it might be
useful to specify |
description |
the test description used as the prefix of the label.
Examples of description are "ANOVA", "Two Way ANOVA". To remove the default
description, specify |
p.col |
character specifying the column containing the p-value. Default
is |
correction |
character, considered only in the case of ANOVA test. Which sphericity
correction of the degrees of freedom should be reported for the
within-subject factors (repeated measures). The default is set to
|
row |
numeric, the row index to be considered. If NULL, the last row is automatically considered for ANOVA test. |
detailed |
logical value. If TRUE, returns detailed label. |
statistic.text |
character specifying the test statistic. For example
|
statistic |
the numeric value of a statistic. |
p |
the p-value of the test. |
parameter |
string containing the degree of freedom (if exists). Default
is |
n |
sample count, example: |
effect.size |
the effect size value |
effect.size.text |
a character specifying the relevant effect size. For
example, for |
a text label or an expression to pass to a plotting function.
get_pwc_label()
: Extract label from pairwise comparisons.
get_test_label()
: Extract labels for statistical tests.
create_test_label()
: Create labels from user specified test results.
get_n()
: Extracts sample counts (n) from an rstatix test outputs. Returns a numeric vector.
get_description()
: Extracts the description of an rstatix test outputs. Returns a character vector.
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: anov <- df %>% anova_test(len ~ dose) get_test_label(anov, detailed = TRUE, type = "text") # Two-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: anov <- df %>% anova_test(len ~ supp*dose) get_test_label(anov, detailed = TRUE, type = "text", description = "Two Way ANOVA") # Kruskal-Wallis test #::::::::::::::::::::::::::::::::::::::::: kruskal<- df %>% kruskal_test(len ~ dose) get_test_label(kruskal, detailed = TRUE, type = "text") # Wilcoxon test #::::::::::::::::::::::::::::::::::::::::: # Unpaired test wilcox <- df %>% wilcox_test(len ~ supp) get_test_label(wilcox, detailed = TRUE, type = "text") # Paired test wilcox <- df %>% wilcox_test(len ~ supp, paired = TRUE) get_test_label(wilcox, detailed = TRUE, type = "text") # T test #::::::::::::::::::::::::::::::::::::::::: ttest <- df %>% t_test(len ~ dose) get_test_label(ttest, detailed = TRUE, type = "text") # Pairwise comparisons labels #::::::::::::::::::::::::::::::::::::::::: get_pwc_label(ttest, type = "text") # Create test labels #::::::::::::::::::::::::::::::::::::::::: create_test_label( statistic.text = "F", statistic = 71.82, parameter = "4, 294", p = "<0.0001", description = "ANOVA", type = "text" ) # Extract infos #::::::::::::::::::::::::::::::::::::::::: stat.test <- df %>% t_test(len ~ dose) get_n(stat.test) get_description(stat.test)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: anov <- df %>% anova_test(len ~ dose) get_test_label(anov, detailed = TRUE, type = "text") # Two-way ANOVA test #::::::::::::::::::::::::::::::::::::::::: anov <- df %>% anova_test(len ~ supp*dose) get_test_label(anov, detailed = TRUE, type = "text", description = "Two Way ANOVA") # Kruskal-Wallis test #::::::::::::::::::::::::::::::::::::::::: kruskal<- df %>% kruskal_test(len ~ dose) get_test_label(kruskal, detailed = TRUE, type = "text") # Wilcoxon test #::::::::::::::::::::::::::::::::::::::::: # Unpaired test wilcox <- df %>% wilcox_test(len ~ supp) get_test_label(wilcox, detailed = TRUE, type = "text") # Paired test wilcox <- df %>% wilcox_test(len ~ supp, paired = TRUE) get_test_label(wilcox, detailed = TRUE, type = "text") # T test #::::::::::::::::::::::::::::::::::::::::: ttest <- df %>% t_test(len ~ dose) get_test_label(ttest, detailed = TRUE, type = "text") # Pairwise comparisons labels #::::::::::::::::::::::::::::::::::::::::: get_pwc_label(ttest, type = "text") # Create test labels #::::::::::::::::::::::::::::::::::::::::: create_test_label( statistic.text = "F", statistic = 71.82, parameter = "4, 294", p = "<0.0001", description = "ANOVA", type = "text" ) # Extract infos #::::::::::::::::::::::::::::::::::::::::: stat.test <- df %>% t_test(len ~ dose) get_n(stat.test) get_description(stat.test)
Compute summary statistics for one or multiple numeric variables.
get_summary_stats( data, ..., type = c("full", "common", "robust", "five_number", "mean_sd", "mean_se", "mean_ci", "median_iqr", "median_mad", "quantile", "mean", "median", "min", "max"), show = NULL, probs = seq(0, 1, 0.25) )
get_summary_stats( data, ..., type = c("full", "common", "robust", "five_number", "mean_sd", "mean_se", "mean_ci", "median_iqr", "median_mad", "quantile", "mean", "median", "min", "max"), show = NULL, probs = seq(0, 1, 0.25) )
data |
a data frame |
... |
(optional) One or more unquoted expressions (or variable names) separated by commas. Used to select a variable of interest. If no variable is specified, then the summary statistics of all numeric variables in the data frame is computed. |
type |
type of summary statistics. Possible values include: |
show |
a character vector specifying the summary statistics you want to
show. Example: |
probs |
numeric vector of probabilities with values in [0,1]. Used only when type = "quantile". |
A data frame containing descriptive statistics, such as:
n: the number of individuals
min: minimum
max: maximum
median: median
mean: mean
q1, q3: the first and the third quartile, respectively.
iqr: interquartile range
mad: median absolute deviation (see ?MAD)
sd: standard deviation of the mean
se: standard error of the mean
ci: 95 percent confidence interval of the mean
# Full summary statistics data("ToothGrowth") ToothGrowth %>% get_summary_stats(len) # Summary statistics of grouped data # Show only common summary ToothGrowth %>% group_by(dose, supp) %>% get_summary_stats(len, type = "common") # Robust summary statistics ToothGrowth %>% get_summary_stats(len, type = "robust") # Five number summary statistics ToothGrowth %>% get_summary_stats(len, type = "five_number") # Compute only mean and sd ToothGrowth %>% get_summary_stats(len, type = "mean_sd") # Compute full summary statistics but show only mean, sd, median, iqr ToothGrowth %>% get_summary_stats(len, show = c("mean", "sd", "median", "iqr"))
# Full summary statistics data("ToothGrowth") ToothGrowth %>% get_summary_stats(len) # Summary statistics of grouped data # Show only common summary ToothGrowth %>% group_by(dose, supp) %>% get_summary_stats(len, type = "common") # Robust summary statistics ToothGrowth %>% get_summary_stats(len, type = "robust") # Five number summary statistics ToothGrowth %>% get_summary_stats(len, type = "five_number") # Compute only mean and sd ToothGrowth %>% get_summary_stats(len, type = "mean_sd") # Compute full summary statistics but show only mean, sd, median, iqr ToothGrowth %>% get_summary_stats(len, show = c("mean", "sd", "median", "iqr"))
Compute p-value x and y positions for plotting significance levels. Many examples are provided at :
get_y_position( data, formula, fun = "max", ref.group = NULL, comparisons = NULL, step.increase = 0.12, y.trans = NULL, stack = FALSE, scales = c("fixed", "free", "free_y") ) add_y_position( test, fun = "max", step.increase = 0.12, data = NULL, formula = NULL, ref.group = NULL, comparisons = NULL, y.trans = NULL, stack = FALSE, scales = c("fixed", "free", "free_y") ) add_x_position(test, x = NULL, group = NULL, dodge = 0.8) add_xy_position( test, x = NULL, group = NULL, dodge = 0.8, stack = FALSE, fun = "max", step.increase = 0.12, scales = c("fixed", "free", "free_y"), ... )
get_y_position( data, formula, fun = "max", ref.group = NULL, comparisons = NULL, step.increase = 0.12, y.trans = NULL, stack = FALSE, scales = c("fixed", "free", "free_y") ) add_y_position( test, fun = "max", step.increase = 0.12, data = NULL, formula = NULL, ref.group = NULL, comparisons = NULL, y.trans = NULL, stack = FALSE, scales = c("fixed", "free", "free_y") ) add_x_position(test, x = NULL, group = NULL, dodge = 0.8) add_xy_position( test, x = NULL, group = NULL, dodge = 0.8, stack = FALSE, fun = "max", step.increase = 0.12, scales = c("fixed", "free", "free_y"), ... )
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
fun |
summary statistics functions used to compute automatically suitable
y positions of p-value labels and brackets. Possible values include:
For example, if
When the main plot is a boxplot, you need the option In some situations the main plot is a line plot or a barplot showing the
|
ref.group |
a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). |
comparisons |
A list of length-2 vectors specifying the groups of
interest to be compared. For example to compare groups "A" vs "B" and "B" vs
"C", the argument is as follow: |
step.increase |
numeric vector with the increase in fraction of total height for every additional comparison to minimize overlap. |
y.trans |
a function for transforming y axis scale. Value can be
|
stack |
logical. If TRUE, computes y position for a stacked plot. Useful when dealing with stacked bar plots. |
scales |
Should scales be fixed ( |
test |
an object of class |
x |
variable on x axis. |
group |
group variable (legend variable). |
dodge |
dodge width for grouped ggplot/test. Default is 0.8. Used only
when |
... |
other arguments to be passed to the function
|
get_y_position()
: compute the p-value y positions
add_y_position()
: add p-value y positions to an object of class rstatix_test
add_x_position()
: compute and add p-value x positions.
add_xy_position()
: compute and add both x and y positions.
# Data preparation #:::::::::::::::::::::::::::::::::::: df <- ToothGrowth df$dose <- as.factor(df$dose) df$group <- factor(rep(c(1, 2), 30)) head(df) # Stat tests #:::::::::::::::::::::::::::::::::::: stat.test <- df %>% t_test(len ~ dose) stat.test # Add the test into box plots #:::::::::::::::::::::::::::::::::::: stat.test <- stat.test %>% add_y_position() if(require("ggpubr")){ ggboxplot(df, x = "dose", y = "len") + stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01) }
# Data preparation #:::::::::::::::::::::::::::::::::::: df <- ToothGrowth df$dose <- as.factor(df$dose) df$group <- factor(rep(c(1, 2), 30)) head(df) # Stat tests #:::::::::::::::::::::::::::::::::::: stat.test <- df %>% t_test(len ~ dose) stat.test # Add the test into box plots #:::::::::::::::::::::::::::::::::::: stat.test <- stat.test %>% add_y_position() if(require("ggpubr")){ ggboxplot(df, x = "dose", y = "len") + stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01) }
Detect outliers using boxplot methods. Boxplots are a popular and an easy method for identifying outliers. There are two categories of outlier: (1) outliers and (2) extreme points.
Values above Q3 + 1.5xIQR
or below Q1 - 1.5xIQR
are considered
as outliers. Values above Q3 + 3xIQR
or below Q1 - 3xIQR
are
considered as extreme points (or extreme outliers).
Q1 and Q3 are the first and third quartile, respectively. IQR is the interquartile range (IQR = Q3 - Q1).
Generally speaking, data points that are labelled outliers in boxplots are
not considered as troublesome as those considered extreme points and might
even be ignored. Note that, any NA
and NaN
are automatically removed
before the quantiles are computed.
identify_outliers(data, ..., variable = NULL) is_outlier(x, coef = 1.5) is_extreme(x)
identify_outliers(data, ..., variable = NULL) is_outlier(x, coef = 1.5) is_extreme(x)
data |
a data frame |
... |
One unquoted expressions (or variable name). Used to select a
variable of interest. Alternative to the argument |
variable |
variable name for detecting outliers |
x |
a numeric vector |
coef |
coefficient specifying how far the outlier should be from the edge of their box. Possible values are 1.5 (for outlier) and 3 (for extreme points only). Default is 1.5 |
identify_outliers()
. Returns the input data
frame with two additional columns: "is.outlier" and "is.extreme", which hold
logical values.
is_outlier() and is_extreme()
. Returns logical
vectors.
identify_outliers()
: takes a data frame and extract rows suspected as outliers
according to a numeric column. The following columns are added "is.outlier"
and "is.extreme".
is_outlier()
: detect outliers in a numeric vector. Returns logical vector.
is_extreme()
: detect extreme points in a numeric vector. An alias of
is_outlier()
, where coef = 3. Returns logical vector.
# Generate a demo data set.seed(123) demo.data <- data.frame( sample = 1:20, score = c(rnorm(19, mean = 5, sd = 2), 50), gender = rep(c("Male", "Female"), each = 10) ) # Identify outliers according to the variable score demo.data %>% identify_outliers(score) # Identify outliers by groups demo.data %>% group_by(gender) %>% identify_outliers("score")
# Generate a demo data set.seed(123) demo.data <- data.frame( sample = 1:20, score = c(rnorm(19, mean = 5, sd = 2), 50), gender = rep(c("Male", "Female"), each = 10) ) # Identify outliers according to the variable score demo.data %>% identify_outliers(score) # Identify outliers by groups demo.data %>% group_by(gender) %>% identify_outliers("score")
Compute the effect size for Kruskal-Wallis test as the eta
squared based on the H-statistic: eta2[H] = (H - k + 1)/(n - k)
;
where H
is the value obtained in the Kruskal-Wallis test; k
is
the number of groups; n
is the total number of observations.
The eta-squared estimate assumes values from 0 to 1 and multiplied by 100
indicates the percentage of variance in the dependent variable explained by
the independent variable. The interpretation values commonly in published
litterature are: 0.01- < 0.06
(small effect), 0.06 - < 0.14
(moderate effect) and >= 0.14
(large effect).
Confidence intervals are calculated by bootstap.
kruskal_effsize( data, formula, ci = FALSE, conf.level = 0.95, ci.type = "perc", nboot = 1000 )
kruskal_effsize( data, formula, ci = FALSE, conf.level = 0.95, ci.type = "perc", nboot = 1000 )
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
ci |
If TRUE, returns confidence intervals by bootstrap. May be slow. |
conf.level |
The level for the confidence interval. |
ci.type |
The type of confidence interval to use. Can be any of "norm",
"basic", "perc", or "bca". Passed to |
nboot |
The number of replications to use for bootstrap. |
return a data frame with some of the following columns:
.y.
: the y variable used in the test.
n
: Sample
counts.
effsize
: estimate of the effect size.
magnitude
: magnitude of effect size.
conf.low,conf.high
:
lower and upper bound of the effect size confidence interval.
Maciej Tomczak and Ewa Tomczak. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sport Sciences. 2014; 1(21):19-25.
http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize
http://www.psy.gla.ac.uk/~steve/best/effect.html
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # Kruskal-wallis rank sum test #::::::::::::::::::::::::::::::::::::::::: df %>% kruskal_effsize(len ~ dose) # Grouped data df %>% group_by(supp) %>% kruskal_effsize(len ~ dose)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # Kruskal-wallis rank sum test #::::::::::::::::::::::::::::::::::::::::: df %>% kruskal_effsize(len ~ dose) # Grouped data df %>% group_by(supp) %>% kruskal_effsize(len ~ dose)
Provides a pipe-friendly framework to perform Kruskal-Wallis
rank sum test. Wrapper around the function
kruskal.test()
.
kruskal_test(data, formula, ...)
kruskal_test(data, formula, ...)
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
... |
other arguments to be passed to the function
|
return a data frame with the following columns:
.y.
: the y variable used in the test.
n
: sample count.
statistic
: the kruskal-wallis rank sum statistic used to
compute the p-value.
p
: p-value.
method
: the
statistical test used to compare groups.
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # Kruskal-wallis rank sum test #::::::::::::::::::::::::::::::::::::::::: df %>% kruskal_test(len ~ dose) # Grouped data df %>% group_by(supp) %>% kruskal_test(len ~ dose)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # Kruskal-wallis rank sum test #::::::::::::::::::::::::::::::::::::::::: df %>% kruskal_test(len ~ dose) # Grouped data df %>% group_by(supp) %>% kruskal_test(len ~ dose)
Provide a pipe-friendly framework to easily compute Levene's test for homogeneity of variance across groups.
Wrapper around the function leveneTest()
, which can
additionally handles a grouped data.
levene_test(data, formula, center = median)
levene_test(data, formula, center = median)
data |
a data frame for evaluating the formula or a model |
formula |
a formula |
center |
The name of a function to compute the center of each group; mean gives the original Levene's test; the default, median, provides a more robust test. |
a data frame with the following columns: df1, df2 (df.residual), statistic and p.
# Prepare the data data("ToothGrowth") df <- ToothGrowth df$dose <- as.factor(df$dose) # Compute Levene's Test df %>% levene_test(len ~ dose) # Grouped data df %>% group_by(supp) %>% levene_test(len ~ dose)
# Prepare the data data("ToothGrowth") df <- ToothGrowth df$dose <- as.factor(df$dose) # Compute Levene's Test df %>% levene_test(len ~ dose) # Grouped data df %>% group_by(supp) %>% levene_test(len ~ dose)
Pipe-friendly wrapper around to the function
mahalanobis()
, which returns the squared
Mahalanobis distance of all rows in x. Compared to the base function, it
automatically flags multivariate outliers.
Mahalanobis distance is a common metric used to identify multivariate outliers. The larger the value of Mahalanobis distance, the more unusual the data point (i.e., the more likely it is to be a multivariate outlier).
The distance tells us how far an observation is from the center of the cloud, taking into account the shape (covariance) of the cloud as well.
To detect outliers, the calculated Mahalanobis distance is compared against a chi-square (X^2) distribution with degrees of freedom equal to the number of dependent (outcome) variables and an alpha level of 0.001.
The threshold to declare a multivariate outlier is determined using the
function qchisq(0.999, df)
, where df is the degree of freedom (i.e.,
the number of dependent variable used in the computation).
mahalanobis_distance(data, ...)
mahalanobis_distance(data, ...)
data |
a data frame. Columns are variables. |
... |
One unquoted expressions (or variable name). Used to select a
variable of interest. Can be also used to ignore a variable that are not
needed for the computation. For example specify |
Returns the input data frame with two additional columns: 1) "mahal.dist": Mahalanobis distance values; and 2) "is.outlier": logical values specifying whether a given observation is a multivariate outlier
# Compute mahalonobis distance and flag outliers if any iris %>% doo(~mahalanobis_distance(.)) # Compute distance by groups and filter outliers iris %>% group_by(Species) %>% doo(~mahalanobis_distance(.)) %>% filter(is.outlier == TRUE)
# Compute mahalonobis distance and flag outliers if any iris %>% doo(~mahalanobis_distance(.)) # Compute distance by groups and filter outliers iris %>% group_by(Species) %>% doo(~mahalanobis_distance(.)) %>% filter(is.outlier == TRUE)
Pipe-friendly function to make syntactically valid names out of character vectors.
make_clean_names(data)
make_clean_names(data)
data |
a data frame or vector |
a data frame or a vector depending on the input data
# Vector make_clean_names(c("a and b", "a-and-b")) make_clean_names(1:10) # data frame df <- data.frame( `a and b` = 1:4, `c and d` = 5:8, check.names = FALSE ) df make_clean_names(df)
# Vector make_clean_names(c("a and b", "a-and-b")) make_clean_names(1:10) # data frame df <- data.frame( `a and b` = 1:4, `c and d` = 5:8, check.names = FALSE ) df make_clean_names(df)
Performs McNemar chi-squared test to compare paired proportions.
Wrappers around the R base function mcnemar.test()
, but
provide pairwise comparisons between multiple groups
mcnemar_test(x, y = NULL, correct = TRUE) pairwise_mcnemar_test( data, formula, type = c("mcnemar", "exact"), correct = TRUE, p.adjust.method = "bonferroni" )
mcnemar_test(x, y = NULL, correct = TRUE) pairwise_mcnemar_test( data, formula, type = c("mcnemar", "exact"), correct = TRUE, p.adjust.method = "bonferroni" )
x |
either a two-dimensional contingency table in matrix form, or a factor object. |
y |
a factor object; ignored if |
correct |
a logical indicating whether to apply continuity correction when computing the test statistic. |
data |
a data frame containing the variables in the formula. |
formula |
a formula of the form |
type |
type of statistical tests used for pairwise comparisons. Allowed
values are one of |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
return a data frame with the following columns:
n
: the number of participants.
statistic
: the value of McNemar's statistic.
df
the
degrees of freedom of the approximate chi-squared distribution of the test
statistic.
p
: p-value.
p.adj
: the adjusted
p-value.
method
: the used statistical test.
p.signif
: the significance level of p-values.
The returned object has an attribute called args, which is a list holding the test arguments.
mcnemar_test()
: performs McNemar's chi-squared test for comparing two
paired proportions
pairwise_mcnemar_test()
: performs pairwise McNemar's chi-squared test between
multiple groups. Could be used for post-hoc tests following a significant Cochran's Q test.
# Comparing two paired proportions #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: frequencies of smokers before and after interventions xtab <- as.table( rbind(c(25, 6), c(21,10)) ) dimnames(xtab) <- list( before = c("non.smoker", "smoker"), after = c("non.smoker", "smoker") ) xtab # Compare the proportion of smokers mcnemar_test(xtab) # Comparing multiple related proportions # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Generate a demo data mydata <- data.frame( outcome = c(0,1,1,0,0,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,1,0,0,1), treatment = gl(3,1,30,labels=LETTERS[1:3]), participant = gl(10,3,labels=letters[1:10]) ) mydata$outcome <- factor( mydata$outcome, levels = c(1, 0), labels = c("success", "failure") ) # Cross-tabulation xtabs(~outcome + treatment, mydata) # Compare the proportion of success between treatments cochran_qtest(mydata, outcome ~ treatment|participant) # pairwise comparisons between groups pairwise_mcnemar_test(mydata, outcome ~ treatment|participant)
# Comparing two paired proportions #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: frequencies of smokers before and after interventions xtab <- as.table( rbind(c(25, 6), c(21,10)) ) dimnames(xtab) <- list( before = c("non.smoker", "smoker"), after = c("non.smoker", "smoker") ) xtab # Compare the proportion of smokers mcnemar_test(xtab) # Comparing multiple related proportions # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Generate a demo data mydata <- data.frame( outcome = c(0,1,1,0,0,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,1,0,0,1), treatment = gl(3,1,30,labels=LETTERS[1:3]), participant = gl(10,3,labels=letters[1:10]) ) mydata$outcome <- factor( mydata$outcome, levels = c(1, 0), labels = c("success", "failure") ) # Cross-tabulation xtabs(~outcome + treatment, mydata) # Compare the proportion of success between treatments cochran_qtest(mydata, outcome ~ treatment|participant) # pairwise comparisons between groups pairwise_mcnemar_test(mydata, outcome ~ treatment|participant)
Performs an exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample size is small.
multinom_test(x, p = rep(1/length(x), length(x)), detailed = FALSE)
multinom_test(x, p = rep(1/length(x), length(x)), detailed = FALSE)
x |
numeric vector containing the counts. |
p |
a vector of probabilities of success. The length of p must be the same as the number of groups specified by x, and its elements must be greater than 0 and less than 1. |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
return a data frame containing the p-value and its significance.
The returned object has an attribute called args, which is a list holding the test arguments.
# Data tulip <- c(red = 81, yellow = 50, white = 27) # Question 1: are the color equally common ? #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # this is a test of homogeneity res <- multinom_test(tulip) res attr(res, "descriptives") # Pairwise comparisons between groups pairwise_binom_test(tulip, p.adjust.method = "bonferroni") # Question 2: comparing observed to expected proportions #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # this is a goodness-of-fit test expected.p <- c(red = 0.5, yellow = 0.33, white = 0.17) res <- multinom_test(tulip, expected.p) res attr(res, "descriptives") # Pairwise comparisons against a given probabilities pairwise_binom_test_against_p(tulip, expected.p)
# Data tulip <- c(red = 81, yellow = 50, white = 27) # Question 1: are the color equally common ? #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # this is a test of homogeneity res <- multinom_test(tulip) res attr(res, "descriptives") # Pairwise comparisons between groups pairwise_binom_test(tulip, p.adjust.method = "bonferroni") # Question 2: comparing observed to expected proportions #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # this is a goodness-of-fit test expected.p <- c(red = 0.5, yellow = 0.33, white = 0.17) res <- multinom_test(tulip, expected.p) res attr(res, "descriptives") # Pairwise comparisons against a given probabilities pairwise_binom_test_against_p(tulip, expected.p)
Round and format p-values. Can also mark significant p-values with stars.
p_round(x, ..., digits = 3) p_format( x, ..., new.col = FALSE, digits = 2, accuracy = 1e-04, decimal.mark = ".", leading.zero = TRUE, trailing.zero = FALSE, add.p = FALSE, space = FALSE ) p_mark_significant( x, ..., new.col = FALSE, cutpoints = c(0, 1e-04, 0.001, 0.01, 0.05, 1), symbols = c("****", "***", "**", "*", "") ) p_detect(data, type = c("all", "p", "p.adj")) p_names() p_adj_names()
p_round(x, ..., digits = 3) p_format( x, ..., new.col = FALSE, digits = 2, accuracy = 1e-04, decimal.mark = ".", leading.zero = TRUE, trailing.zero = FALSE, add.p = FALSE, space = FALSE ) p_mark_significant( x, ..., new.col = FALSE, cutpoints = c(0, 1e-04, 0.001, 0.01, 0.05, 1), symbols = c("****", "***", "**", "*", "") ) p_detect(data, type = c("all", "p", "p.adj")) p_names() p_adj_names()
x |
a numeric vector of p-values or a data frame containing a p value
column. If data frame, the p-value column(s) will be automatically detected.
Known p-value column names can be obtained using the functions
|
... |
column names to manipulate in the case where |
digits |
the number of significant digits to be used. |
new.col |
logical, used only when |
accuracy |
number to round to, that is the threshold value above wich the function will replace the pvalue by "<0.0xxx". |
decimal.mark |
the character to be used to indicate the numeric decimal point. |
leading.zero |
logical. If FALSE, remove the leading zero. |
trailing.zero |
logical. If FALSE (default), remove the training extra zero. |
add.p |
logical value. If TRUE, add "p=" before the value. |
space |
logical. If TRUE (default) use space as separator between different elements and symbols. |
cutpoints |
numeric vector used for intervals |
symbols |
character vector, one shorter than cutpoints, used as significance symbols. |
data |
a data frame |
type |
the type of p-value to detect. Can be one of |
a vector or a data frame containing the rounded/formatted p-values.
p_round()
: round p-values
p_format()
: format p-values. Add a symbol "<" for small p-values.
p_mark_significant()
: mark p-values with significance levels
p_detect()
: detects and returns p-value column names in a data frame.
p_names()
: returns known p-value column names
p_adj_names()
: returns known adjust p-value column names
# Round and format a vector of p-values # :::::::::::::::::::::::::::::::::::::::::::: # Format p <- c(0.5678, 0.127, 0.045, 0.011, 0.009, 0.00002, NA) p_format(p) # Specify the accuracy p_format(p, accuracy = 0.01) # Add p and remove the leading zero p_format(p, add.p = TRUE, leading.zero = FALSE) # Remove space before and after "=" or "<". p_format(p, add.p = TRUE, leading.zero = FALSE, space = FALSE) # Mark significant p-values # :::::::::::::::::::::::::::::::::::::::::::: p_mark_significant(p) # Round, the mark significant p %>% p_round(digits = 2) %>% p_mark_significant() # Format, then mark significant p %>% p_format(digits = 2) %>% p_mark_significant() # Perform stat test, format p and mark significant # :::::::::::::::::::::::::::::::::::::::::::: ToothGrowth %>% group_by(dose) %>% t_test(len ~ supp) %>% p_format(digits = 2, leading.zero = FALSE) %>% p_mark_significant()
# Round and format a vector of p-values # :::::::::::::::::::::::::::::::::::::::::::: # Format p <- c(0.5678, 0.127, 0.045, 0.011, 0.009, 0.00002, NA) p_format(p) # Specify the accuracy p_format(p, accuracy = 0.01) # Add p and remove the leading zero p_format(p, add.p = TRUE, leading.zero = FALSE) # Remove space before and after "=" or "<". p_format(p, add.p = TRUE, leading.zero = FALSE, space = FALSE) # Mark significant p-values # :::::::::::::::::::::::::::::::::::::::::::: p_mark_significant(p) # Round, the mark significant p %>% p_round(digits = 2) %>% p_mark_significant() # Format, then mark significant p %>% p_format(digits = 2) %>% p_mark_significant() # Perform stat test, format p and mark significant # :::::::::::::::::::::::::::::::::::::::::::: ToothGrowth %>% group_by(dose) %>% t_test(len ~ supp) %>% p_format(digits = 2, leading.zero = FALSE) %>% p_mark_significant()
Performs proportion tests to either evaluate the homogeneity of proportions (probabilities of success) in several groups or to test that the proportions are equal to certain given values.
Wrappers around the R base function prop.test()
but have
the advantage of performing pairwise and row-wise z-test of two proportions,
the post-hoc tests following a significant chi-square test of homogeneity
for 2xc and rx2 contingency tables.
prop_test( x, n, p = NULL, alternative = c("two.sided", "less", "greater"), correct = TRUE, conf.level = 0.95, detailed = FALSE ) pairwise_prop_test(xtab, p.adjust.method = "holm", ...) row_wise_prop_test(xtab, p.adjust.method = "holm", detailed = FALSE, ...)
prop_test( x, n, p = NULL, alternative = c("two.sided", "less", "greater"), correct = TRUE, conf.level = 0.95, detailed = FALSE ) pairwise_prop_test(xtab, p.adjust.method = "holm", ...) row_wise_prop_test(xtab, p.adjust.method = "holm", detailed = FALSE, ...)
x |
a vector of counts of successes, a one-dimensional table with two entries, or a two-dimensional table (or matrix) with 2 columns, giving the counts of successes and failures, respectively. |
n |
a vector of counts of trials; ignored if |
p |
a vector of probabilities of success. The length of
|
alternative |
a character string specifying the alternative
hypothesis, must be one of |
correct |
a logical indicating whether Yates' continuity correction should be applied where possible. |
conf.level |
confidence level of the returned confidence interval. Must be a single number between 0 and 1. Only used when testing the null that a single proportion equals a given value, or that two proportions are equal; ignored otherwise. |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
xtab |
a cross-tabulation (or contingency table) with two columns and multiple rows (rx2 design). The columns give the counts of successes and failures respectively. |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
... |
Other arguments passed to the function |
return a data frame with some the following columns:
n
: the number of participants.
group
: the categories in the row-wise proportion tests.
statistic
: the value of Pearson's chi-squared test statistic.
df
: the degrees of freedom of the approximate chi-squared
distribution of the test statistic.
p
: p-value.
p.adj
: the adjusted p-value.
method
: the used
statistical test.
p.signif, p.adj.signif
: the significance
level of p-values and adjusted p-values, respectively.
estimate
: a vector with the sample proportions x/n.
estimate1, estimate2
: the proportion in each of the two populations.
alternative
: a character string describing the alternative
hypothesis.
conf.low,conf.high
: Lower and upper bound on a
confidence interval. a confidence interval for the true proportion if there
is one group, or for the difference in proportions if there are 2 groups and
p is not given, or NULL otherwise. In the cases where it is not NULL, the
returned confidence interval has an asymptotic confidence level as specified
by conf.level, and is appropriate to the specified alternative hypothesis.
The returned object has an attribute called args, which is a list holding the test arguments.
prop_test()
: performs one-sample and two-samples z-test of
proportions. Wrapper around the function prop.test()
.
pairwise_prop_test()
: pairwise comparisons between proportions, a post-hoc
tests following a significant chi-square test of homogeneity for 2xc
design. Wrapper around pairwise.prop.test()
row_wise_prop_test()
: performs row-wise z-test of two proportions, a post-hoc tests following a significant chi-square test
of homogeneity for rx2 contingency table. The z-test of two proportions is calculated for each category (row).
# Comparing an observed proportion to an expected proportion #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% prop_test(x = 95, n = 160, p = 0.5, detailed = TRUE) # Comparing two proportions #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: frequencies of smokers between two groups xtab <- as.table(rbind(c(490, 10), c(400, 100))) dimnames(xtab) <- list( group = c("grp1", "grp2"), smoker = c("yes", "no") ) xtab # compare the proportion of smokers prop_test(xtab, detailed = TRUE) # Homogeneity of proportions between groups #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # H0: the proportion of smokers is similar in the four groups # Ha: this proportion is different in at least one of the populations. # # Data preparation grp.size <- c( 106, 113, 156, 102 ) smokers <- c( 50, 100, 139, 80 ) no.smokers <- grp.size - smokers xtab <- as.table(rbind( smokers, no.smokers )) dimnames(xtab) <- list( Smokers = c("Yes", "No"), Groups = c("grp1", "grp2", "grp3", "grp4") ) xtab # Compare the proportions of smokers between groups prop_test(xtab, detailed = TRUE) # Pairwise comparison between groups pairwise_prop_test(xtab) # Pairwise proportion tests #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(122, 167, 528, 673), c(203, 118, 178, 212) )) dimnames(xtab) <- list( Survived = c("No", "Yes"), Class = c("1st", "2nd", "3rd", "Crew") ) xtab # Compare the proportion of survived between groups pairwise_prop_test(xtab) # Row-wise proportion tests #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(180, 145), c(179, 106), c(510, 196), c(862, 23) )) dimnames(xtab) <- list( Class = c("1st", "2nd", "3rd", "Crew"), Gender = c("Male", "Female") ) xtab # Compare the proportion of males and females in each category row_wise_prop_test(xtab)
# Comparing an observed proportion to an expected proportion #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% prop_test(x = 95, n = 160, p = 0.5, detailed = TRUE) # Comparing two proportions #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: frequencies of smokers between two groups xtab <- as.table(rbind(c(490, 10), c(400, 100))) dimnames(xtab) <- list( group = c("grp1", "grp2"), smoker = c("yes", "no") ) xtab # compare the proportion of smokers prop_test(xtab, detailed = TRUE) # Homogeneity of proportions between groups #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # H0: the proportion of smokers is similar in the four groups # Ha: this proportion is different in at least one of the populations. # # Data preparation grp.size <- c( 106, 113, 156, 102 ) smokers <- c( 50, 100, 139, 80 ) no.smokers <- grp.size - smokers xtab <- as.table(rbind( smokers, no.smokers )) dimnames(xtab) <- list( Smokers = c("Yes", "No"), Groups = c("grp1", "grp2", "grp3", "grp4") ) xtab # Compare the proportions of smokers between groups prop_test(xtab, detailed = TRUE) # Pairwise comparison between groups pairwise_prop_test(xtab) # Pairwise proportion tests #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(122, 167, 528, 673), c(203, 118, 178, 212) )) dimnames(xtab) <- list( Survived = c("No", "Yes"), Class = c("1st", "2nd", "3rd", "Crew") ) xtab # Compare the proportion of survived between groups pairwise_prop_test(xtab) # Row-wise proportion tests #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data: Titanic xtab <- as.table(rbind( c(180, 145), c(179, 106), c(510, 196), c(862, 23) )) dimnames(xtab) <- list( Class = c("1st", "2nd", "3rd", "Crew"), Gender = c("Male", "Female") ) xtab # Compare the proportion of males and females in each category row_wise_prop_test(xtab)
Performs chi-squared test for trend in proportion. This test is also known as Cochran-Armitage trend test.
Wrappers around the R base function prop.trend.test()
but
returns a data frame for easy data visualization.
prop_trend_test(xtab, score = NULL)
prop_trend_test(xtab, score = NULL)
xtab |
a cross-tabulation (or contingency table) with two columns and multiple rows (rx2 design). The columns give the counts of successes and failures respectively. |
score |
group score. If |
return a data frame with some the following columns:
n
: the number of participants.
statistic
: the value of
Chi-squared trend test statistic.
df
: the degrees of
freedom.
p
: p-value.
method
: the used statistical test.
p.signif
: the significance level of p-values and adjusted p-values,
respectively.
The returned object has an attribute called args, which is a list holding the test arguments.
# Proportion of renal stone (calculi) across age #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data xtab <- as.table(rbind( c(384, 536, 335), c(951, 869, 438) )) dimnames(xtab) <- list( stone = c("yes", "no"), age = c("30-39", "40-49", "50-59") ) xtab # Compare the proportion of survived between groups prop_trend_test(xtab)
# Proportion of renal stone (calculi) across age #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Data xtab <- as.table(rbind( c(384, 536, 335), c(951, 869, 438) )) dimnames(xtab) <- list( stone = c("yes", "no"), age = c("30-39", "40-49", "50-59") ) xtab # Compare the proportion of survived between groups prop_trend_test(xtab)
Returns the lower or the upper triangular part of a (correlation) matrix.
pull_triangle(x, triangle = c("lower", "upper"), diagonal = FALSE) pull_upper_triangle(x, diagonal = FALSE) pull_lower_triangle(x, diagonal = FALSE)
pull_triangle(x, triangle = c("lower", "upper"), diagonal = FALSE) pull_upper_triangle(x, diagonal = FALSE) pull_lower_triangle(x, diagonal = FALSE)
x |
a (correlation) matrix |
triangle |
the triangle to pull. Allowed values are one of "upper" and "lower". |
diagonal |
logical. Default is FALSE. If TRUE, the matrix diagonal is included. |
an object of class cor_mat_tri
, which is a data frame
pull_triangle()
: returns either the lower or upper triangular part of a matrix.
pull_upper_triangle()
: returns an object of class upper_tri
, which
is a data frame containing the upper triangular part of a matrix.
pull_lower_triangle()
: returns an object of class lower_tri
, which
is a data frame containing the lower triangular part of a matrix.
# Data preparation #:::::::::::::::::::::::::::::::::::::::::: mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) head(mydata, 3) # Compute correlation matrix and pull triangles #:::::::::::::::::::::::::::::::::::::::::: # Correlation matrix cor.mat <- cor_mat(mydata) cor.mat # Pull lower triangular part cor.mat %>% pull_lower_triangle() # Pull upper triangular part cor.mat %>% pull_upper_triangle()
# Data preparation #:::::::::::::::::::::::::::::::::::::::::: mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) head(mydata, 3) # Compute correlation matrix and pull triangles #:::::::::::::::::::::::::::::::::::::::::: # Correlation matrix cor.mat <- cor_mat(mydata) cor.mat # Pull lower triangular part cor.mat %>% pull_lower_triangle() # Pull upper triangular part cor.mat %>% pull_upper_triangle()
Filter out non-significant (NS) p-values from a statistical test. Can detect automatically p-value columns
remove_ns(stat.test, col = NULL, signif.cutoff = 0.05)
remove_ns(stat.test, col = NULL, signif.cutoff = 0.05)
stat.test |
statistical test results returned by |
col |
(optional) character specifying the column containing the p-value
or the significance information, to be used for the filtering step.
Possible values include: |
signif.cutoff |
the significance cutoff; default is 0.05. Significance
is declared at |
a data frame
# Statistical test stat.test <- PlantGrowth %>% wilcox_test(weight ~ group) # Remove ns: automatic detection of p-value columns stat.test %>% remove_ns() # Remove ns by the column p stat.test %>% remove_ns(col ="p")
# Statistical test stat.test <- PlantGrowth %>% wilcox_test(weight ~ group) # Remove ns: automatic detection of p-value columns stat.test %>% remove_ns() # Remove ns by the column p stat.test %>% remove_ns(col ="p")
Replace the lower or the upper triangular part of a (correlation) matrix.
replace_triangle(x, triangle = c("lower", "upper"), by = "", diagonal = FALSE) replace_upper_triangle(x, by = "", diagonal = FALSE) replace_lower_triangle(x, by = "", diagonal = FALSE)
replace_triangle(x, triangle = c("lower", "upper"), by = "", diagonal = FALSE) replace_upper_triangle(x, by = "", diagonal = FALSE) replace_lower_triangle(x, by = "", diagonal = FALSE)
x |
a (correlation) matrix |
triangle |
the triangle to replace. Allowed values are one of "upper" and "lower". |
by |
a replacement argument. Appropriate values are either "" or NA. Used to replace the upper, lower or the diagonal part of the matrix. |
diagonal |
logical. Default is FALSE. If TRUE, the matrix diagonal is included. |
an object of class cor_mat_tri
, which is a data frame
replace_triangle()
: replaces the specified triangle by empty or NA.
replace_upper_triangle()
: replaces the upper triangular part of a matrix.
Returns an object of class lower_tri
.
replace_lower_triangle()
: replaces the lower triangular part of a matrix.
Returns an object of class lower_tri
# Compute correlation matrix and pull triangles #:::::::::::::::::::::::::::::::::::::::::: # Correlation matrix cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() cor.mat # Replace upper triangle by NA #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% replace_upper_triangle(by = NA) # Replace upper triangle by NA and reshape the # correlation matrix to have unique combinations of variables #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% replace_upper_triangle(by = NA) %>% cor_gather()
# Compute correlation matrix and pull triangles #:::::::::::::::::::::::::::::::::::::::::: # Correlation matrix cor.mat <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec) %>% cor_mat() cor.mat # Replace upper triangle by NA #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% replace_upper_triangle(by = NA) # Replace upper triangle by NA and reshape the # correlation matrix to have unique combinations of variables #:::::::::::::::::::::::::::::::::::::::::: cor.mat %>% replace_upper_triangle(by = NA) %>% cor_gather()
sample n rows by group from a table using the sample_n()
function.
sample_n_by(data, ..., size = 1, replace = FALSE)
sample_n_by(data, ..., size = 1, replace = FALSE)
data |
a data frame |
... |
Variables to group by |
size |
the number of rows to select |
replace |
with or without replacement? |
ToothGrowth %>% sample_n_by(dose, supp, size = 2)
ToothGrowth %>% sample_n_by(dose, supp, size = 2)
Provides a pipe-friendly framework to performs Shapiro-Wilk test
of normality. Support grouped data and multiple variables for multivariate
normality tests. Wrapper around the R base function
shapiro.test()
. Can handle grouped data. Read more:
Normality
Test in R.
shapiro_test(data, ..., vars = NULL) mshapiro_test(data)
shapiro_test(data, ..., vars = NULL) mshapiro_test(data)
data |
a data frame. Columns are variables. |
... |
One or more unquoted expressions (or variable names) separated by commas. Used to select a variable of interest. |
vars |
optional character vector containing variable names. Ignored when dot vars are specified. |
a data frame containing the value of the Shapiro-Wilk statistic and the corresponding p.value.
shapiro_test()
: univariate Shapiro-Wilk normality test
mshapiro_test()
: multivariate Shapiro-Wilk normality test. This is a
modified copy of the mshapiro.test()
function of the package
mvnormtest, for internal convenience.
# Shapiro Wilk normality test for one variable iris %>% shapiro_test(Sepal.Length) # Shapiro Wilk normality test for two variables iris %>% shapiro_test(Sepal.Length, Petal.Width) # Multivariate normality test mshapiro_test(iris[, 1:3])
# Shapiro Wilk normality test for one variable iris %>% shapiro_test(Sepal.Length) # Shapiro Wilk normality test for two variables iris %>% shapiro_test(Sepal.Length, Petal.Width) # Multivariate normality test mshapiro_test(iris[, 1:3])
Performs one-sample and two-sample sign tests. Read more: Sign Test in R.
sign_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", alternative = "two.sided", mu = 0, conf.level = 0.95, detailed = FALSE ) pairwise_sign_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", detailed = FALSE, ... )
sign_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", alternative = "two.sided", mu = 0, conf.level = 0.95, detailed = FALSE ) pairwise_sign_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", detailed = FALSE, ... )
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
comparisons |
A list of length-2 vectors specifying the groups of
interest to be compared. For example to compare groups "A" vs "B" and "B" vs
"C", the argument is as follow: |
ref.group |
a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
mu |
a single number representing the value of the population median specified by the null hypothesis. |
conf.level |
confidence level of the interval. |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
... |
other arguments passed to the function |
return a data frame with some the following columns:
.y.
: the y variable used in the test.
group1,group2
: the
compared groups in the pairwise tests.
n,n1,n2
: Sample counts.
statistic
: Test statistic used to compute the p-value. That is
the S-statistic (the number of positive differences between the data and the
hypothesized median), with names attribute "S"
.
df,
parameter
: degrees of freedom. Here, the total number of valid differences.
p
: p-value.
method
: the statistical test used to
compare groups.
p.signif, p.adj.signif
: the significance level
of p-values and adjusted p-values, respectively.
estimate
:
estimate of the effect size. It corresponds to the median of the
differences.
alternative
: a character string describing the
alternative hypothesis.
conf.low,conf.high
: Lower and upper
bound on a confidence interval of the estimate.
The returned object has an attribute called args, which is a list holding the test arguments.
sign_test()
: Sign test
pairwise_sign_test()
: performs pairwise two sample Wilcoxon test.
This function is a reimplementation of the function SignTest()
from the DescTools
package.
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-sample test #::::::::::::::::::::::::::::::::::::::::: df %>% sign_test(len ~ 1, mu = 0) # Two-samples paired test #::::::::::::::::::::::::::::::::::::::::: df %>% sign_test(len ~ supp) # Compare supp levels after grouping the data by "dose" #:::::::::::::::::::::::::::::::::::::::: df %>% group_by(dose) %>% sign_test(data =., len ~ supp) %>% adjust_pvalue(method = "bonferroni") %>% add_significance("p.adj") # pairwise comparisons #:::::::::::::::::::::::::::::::::::::::: # As dose contains more than two levels ==> # pairwise test is automatically performed. df %>% sign_test(len ~ dose) # Comparison against reference group #:::::::::::::::::::::::::::::::::::::::: # each level is compared to the ref group df %>% sign_test(len ~ dose, ref.group = "0.5")
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-sample test #::::::::::::::::::::::::::::::::::::::::: df %>% sign_test(len ~ 1, mu = 0) # Two-samples paired test #::::::::::::::::::::::::::::::::::::::::: df %>% sign_test(len ~ supp) # Compare supp levels after grouping the data by "dose" #:::::::::::::::::::::::::::::::::::::::: df %>% group_by(dose) %>% sign_test(data =., len ~ supp) %>% adjust_pvalue(method = "bonferroni") %>% add_significance("p.adj") # pairwise comparisons #:::::::::::::::::::::::::::::::::::::::: # As dose contains more than two levels ==> # pairwise test is automatically performed. df %>% sign_test(len ~ dose) # Comparison against reference group #:::::::::::::::::::::::::::::::::::::::: # each level is compared to the ref group df %>% sign_test(len ~ dose, ref.group = "0.5")
Provides a pipe-friendly framework to performs one and two sample t-tests. Read more: T-test in R.
t_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", paired = FALSE, var.equal = FALSE, alternative = "two.sided", mu = 0, conf.level = 0.95, detailed = FALSE ) pairwise_t_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", paired = FALSE, pool.sd = !paired, detailed = FALSE, ... )
t_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", paired = FALSE, var.equal = FALSE, alternative = "two.sided", mu = 0, conf.level = 0.95, detailed = FALSE ) pairwise_t_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", paired = FALSE, pool.sd = !paired, detailed = FALSE, ... )
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
comparisons |
A list of length-2 vectors specifying the groups of
interest to be compared. For example to compare groups "A" vs "B" and "B" vs
"C", the argument is as follow: |
ref.group |
a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). If |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
paired |
a logical indicating whether you want a paired test. |
var.equal |
a logical variable indicating whether to treat the
two variances as being equal. If |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
mu |
a number specifying an optional parameter used to form the null hypothesis. |
conf.level |
confidence level of the interval. |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
pool.sd |
logical value used in the function The If |
... |
other arguments to be passed to the function
|
- If a list of comparisons is specified, the result of the pairwise tests is filtered to keep only the comparisons of interest. The p-value is adjusted after filtering.
- For a grouped data, if pairwise test is performed, then the p-values are adjusted for each group level independently.
return a data frame with some the following columns:
.y.
: the y variable used in the test.
group1,group2
: the
compared groups in the pairwise tests.
n,n1,n2
: Sample counts.
statistic
: Test statistic used to compute the p-value.
df
: degrees of freedom.
p
: p-value.
p.adj
:
the adjusted p-value.
method
: the statistical test used to
compare groups.
p.signif, p.adj.signif
: the significance level
of p-values and adjusted p-values, respectively.
estimate
:
estimate of the effect size. It corresponds to the estimated mean or
difference in means depending on whether it was a one-sample test or a
two-sample test.
estimate1, estimate2
: show the mean values of
the two groups, respectively, for independent samples t-tests.
alternative
: a character string describing the alternative
hypothesis.
conf.low,conf.high
: Lower and upper bound on a
confidence interval.
The returned object has an attribute called args, which is a list holding the test arguments.
t_test()
: t test
pairwise_t_test()
: performs pairwise two sample t-test. Wrapper around the R
base function pairwise.t.test
.
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-sample test #::::::::::::::::::::::::::::::::::::::::: df %>% t_test(len ~ 1, mu = 0) # Two-samples unpaired test #::::::::::::::::::::::::::::::::::::::::: df %>% t_test(len ~ supp) # Two-samples paired test #::::::::::::::::::::::::::::::::::::::::: df %>% t_test (len ~ supp, paired = TRUE) # Compare supp levels after grouping the data by "dose" #:::::::::::::::::::::::::::::::::::::::: df %>% group_by(dose) %>% t_test(data =., len ~ supp) %>% adjust_pvalue(method = "bonferroni") %>% add_significance("p.adj") # pairwise comparisons #:::::::::::::::::::::::::::::::::::::::: # As dose contains more than two levels ==> # pairwise test is automatically performed. df %>% t_test(len ~ dose) # Comparison against reference group #:::::::::::::::::::::::::::::::::::::::: # each level is compared to the ref group df %>% t_test(len ~ dose, ref.group = "0.5") # Comparison against all #:::::::::::::::::::::::::::::::::::::::: df %>% t_test(len ~ dose, ref.group = "all")
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-sample test #::::::::::::::::::::::::::::::::::::::::: df %>% t_test(len ~ 1, mu = 0) # Two-samples unpaired test #::::::::::::::::::::::::::::::::::::::::: df %>% t_test(len ~ supp) # Two-samples paired test #::::::::::::::::::::::::::::::::::::::::: df %>% t_test (len ~ supp, paired = TRUE) # Compare supp levels after grouping the data by "dose" #:::::::::::::::::::::::::::::::::::::::: df %>% group_by(dose) %>% t_test(data =., len ~ supp) %>% adjust_pvalue(method = "bonferroni") %>% add_significance("p.adj") # pairwise comparisons #:::::::::::::::::::::::::::::::::::::::: # As dose contains more than two levels ==> # pairwise test is automatically performed. df %>% t_test(len ~ dose) # Comparison against reference group #:::::::::::::::::::::::::::::::::::::::: # each level is compared to the ref group df %>% t_test(len ~ dose, ref.group = "0.5") # Comparison against all #:::::::::::::::::::::::::::::::::::::::: df %>% t_test(len ~ dose, ref.group = "all")
Provides a pipe-friendly framework to performs Tukey post-hoc
tests. Wrapper around the function TukeyHSD()
. It is
essentially a t-test that corrects for multiple testing.
Can handle different inputs formats: aov, lm, formula.
tukey_hsd(x, ...) ## Default S3 method: tukey_hsd(x, ...) ## S3 method for class 'lm' tukey_hsd(x, ...) ## S3 method for class 'data.frame' tukey_hsd(x, formula, ...)
tukey_hsd(x, ...) ## Default S3 method: tukey_hsd(x, ...) ## S3 method for class 'lm' tukey_hsd(x, ...) ## S3 method for class 'data.frame' tukey_hsd(x, formula, ...)
x |
an object of class |
... |
other arguments passed to the function
|
formula |
a formula of the form |
data |
a data.frame containing the variables in the formula. |
a tibble data frame containing the results of the different comparisons.
tukey_hsd(default)
: performs tukey post-hoc test from aov()
results.
tukey_hsd(lm)
: performs tukey post-hoc test from lm()
model.
tukey_hsd(data.frame)
: performs tukey post-hoc tests using data and formula as
inputs. ANOVA will be automatically performed using the function
aov()
# Data preparation df <- ToothGrowth df$dose <- as.factor(df$dose) # Tukey HSD from ANOVA results aov(len ~ dose, data = df) %>% tukey_hsd() # two-way anova with interaction aov(len ~ dose*supp, data = df) %>% tukey_hsd() # Tukey HSD from lm() results lm(len ~ dose, data = df) %>% tukey_hsd() # Tukey HSD from data frame and formula tukey_hsd(df, len ~ dose) # Tukey HSD using grouped data df %>% group_by(supp) %>% tukey_hsd(len ~ dose)
# Data preparation df <- ToothGrowth df$dose <- as.factor(df$dose) # Tukey HSD from ANOVA results aov(len ~ dose, data = df) %>% tukey_hsd() # two-way anova with interaction aov(len ~ dose*supp, data = df) %>% tukey_hsd() # Tukey HSD from lm() results lm(len ~ dose, data = df) %>% tukey_hsd() # Tukey HSD from data frame and formula tukey_hsd(df, len ~ dose) # Tukey HSD using grouped data df %>% group_by(supp) %>% tukey_hsd(len ~ dose)
Tests for equal means in a one-way design (not assuming equal
variance). A wrapper around the base function
oneway.test()
. This is is an alternative to the
standard one-way ANOVA in the situation where the homogeneity of variance
assumption is violated.
welch_anova_test(data, formula)
welch_anova_test(data, formula)
data |
a data frame containing the variables in the formula. |
formula |
a formula specifying the ANOVA model similar to aov. Can be of the form y ~ group where y is a numeric variable giving the data values and group is a factor with one or multiple levels giving the corresponding groups. For example, formula = TP53 ~ cancer_group. |
return a data frame with the following columns:
.y.
: the y variable used in the test.
n
: sample count.
statistic
: the value of the test statistic.
p
:
p-value.
method
: the statistical test used to compare groups.
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth df$dose <- as.factor(df$dose) # Welch one-way ANOVA test (not assuming equal variance) #::::::::::::::::::::::::::::::::::::::::: df %>% welch_anova_test(len ~ dose) # Grouped data #::::::::::::::::::::::::::::::::::::::::: df %>% group_by(supp) %>% welch_anova_test(len ~ dose)
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth df$dose <- as.factor(df$dose) # Welch one-way ANOVA test (not assuming equal variance) #::::::::::::::::::::::::::::::::::::::::: df %>% welch_anova_test(len ~ dose) # Grouped data #::::::::::::::::::::::::::::::::::::::::: df %>% group_by(supp) %>% welch_anova_test(len ~ dose)
Compute Wilcoxon effect size (r
) for:
one-sample test (Wilcoxon one-sample signed-rank test);
paired two-samples test (Wilcoxon two-sample paired signed-rank test) and
independent two-samples test ( Mann-Whitney, two-sample rank-sum test).
It can also returns confidence intervals by bootstap.
The effect size r
is calculated as Z
statistic divided by
square root of the sample size (N) (). The
Z
value is
extracted from either coin::wilcoxsign_test()
(case of one- or
paired-samples test) or coin::wilcox_test()
(case of independent
two-samples test).
Note that N
corresponds to total sample size for independent samples
test and to total number of pairs for paired samples test.
The r
value varies from 0 to close to 1. The interpretation values
for r commonly in published litterature and on the internet are: 0.10
- < 0.3
(small effect), 0.30 - < 0.5
(moderate effect) and >=
0.5
(large effect).
wilcox_effsize( data, formula, comparisons = NULL, ref.group = NULL, paired = FALSE, alternative = "two.sided", mu = 0, ci = FALSE, conf.level = 0.95, ci.type = "perc", nboot = 1000, ... )
wilcox_effsize( data, formula, comparisons = NULL, ref.group = NULL, paired = FALSE, alternative = "two.sided", mu = 0, ci = FALSE, conf.level = 0.95, ci.type = "perc", nboot = 1000, ... )
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
comparisons |
A list of length-2 vectors specifying the groups of
interest to be compared. For example to compare groups "A" vs "B" and "B" vs
"C", the argument is as follow: |
ref.group |
a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). If |
paired |
a logical indicating whether you want a paired test. |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
mu |
a number specifying an optional parameter used to form the null hypothesis. |
ci |
If TRUE, returns confidence intervals by bootstrap. May be slow. |
conf.level |
The level for the confidence interval. |
ci.type |
The type of confidence interval to use. Can be any of "norm",
"basic", "perc", or "bca". Passed to |
nboot |
The number of replications to use for bootstrap. |
... |
Additional arguments passed to the functions
|
return a data frame with some of the following columns:
.y.
: the y variable used in the test.
group1,group2
: the compared groups in the pairwise tests.
n,n1,n2
: Sample counts.
effsize
: estimate of the effect
size (r
value).
magnitude
: magnitude of effect size.
conf.low,conf.high
: lower and upper bound of the effect size
confidence interval.
Maciej Tomczak and Ewa Tomczak. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sport Sciences. 2014; 1(21):19-25.
if(require("coin")){ # One-sample Wilcoxon test effect size ToothGrowth %>% wilcox_effsize(len ~ 1, mu = 0) # Independent two-samples wilcoxon effect size ToothGrowth %>% wilcox_effsize(len ~ supp) # Paired-samples wilcoxon effect size ToothGrowth %>% wilcox_effsize(len ~ supp, paired = TRUE) # Pairwise comparisons ToothGrowth %>% wilcox_effsize(len ~ dose) # Grouped data ToothGrowth %>% group_by(supp) %>% wilcox_effsize(len ~ dose) }
if(require("coin")){ # One-sample Wilcoxon test effect size ToothGrowth %>% wilcox_effsize(len ~ 1, mu = 0) # Independent two-samples wilcoxon effect size ToothGrowth %>% wilcox_effsize(len ~ supp) # Paired-samples wilcoxon effect size ToothGrowth %>% wilcox_effsize(len ~ supp, paired = TRUE) # Pairwise comparisons ToothGrowth %>% wilcox_effsize(len ~ dose) # Grouped data ToothGrowth %>% group_by(supp) %>% wilcox_effsize(len ~ dose) }
Provides a pipe-friendly framework to performs one and two sample Wilcoxon tests. Read more: Wilcoxon in R.
wilcox_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", paired = FALSE, exact = NULL, alternative = "two.sided", mu = 0, conf.level = 0.95, detailed = FALSE ) pairwise_wilcox_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", detailed = FALSE, ... )
wilcox_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", paired = FALSE, exact = NULL, alternative = "two.sided", mu = 0, conf.level = 0.95, detailed = FALSE ) pairwise_wilcox_test( data, formula, comparisons = NULL, ref.group = NULL, p.adjust.method = "holm", detailed = FALSE, ... )
data |
a data.frame containing the variables in the formula. |
formula |
a formula of the form |
comparisons |
A list of length-2 vectors specifying the groups of
interest to be compared. For example to compare groups "A" vs "B" and "B" vs
"C", the argument is as follow: |
ref.group |
a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). If |
p.adjust.method |
method to adjust p values for multiple comparisons. Used when pairwise comparisons are performed. Allowed values include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". If you don't want to adjust the p value (not recommended), use p.adjust.method = "none". |
paired |
a logical indicating whether you want a paired test. |
exact |
a logical indicating whether an exact p-value should be computed. |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
mu |
a number specifying an optional parameter used to form the null hypothesis. |
conf.level |
confidence level of the interval. |
detailed |
logical value. Default is FALSE. If TRUE, a detailed result is shown. |
... |
other arguments to be passed to the function
|
- pairwise_wilcox_test()
applies the standard two sample
Wilcoxon test to all possible pairs of groups. This method calls the
wilcox.test()
, so extra arguments are accepted.
- If a list of comparisons is specified, the result of the pairwise tests is filtered to keep only the comparisons of interest.The p-value is adjusted after filtering.
- For a grouped data, if pairwise test is performed, then the p-values are adjusted for each group level independently.
- a nonparametric confidence interval and an estimator for the pseudomedian
(one-sample case) or for the difference of the location parameters
x-y
is computed, where x and y are the compared samples or groups.
The column estimate
and the confidence intervals are displayed in the
test result when the option detailed = TRUE
is specified in the
wilcox_test()
and pairwise_wilcox_test()
functions. Read more
about the calculation of the estimate in the details section of the R base
function wilcox.test()
documentation by typing ?wilcox.test
in
the R console.
return a data frame with some of the following columns:
.y.
: the y variable used in the test.
group1,group2
: the compared groups in the pairwise tests.
n,n1,n2
: Sample counts.
statistic
: Test statistic used
to compute the p-value.
p
: p-value.
p.adj
: the
adjusted p-value.
method
: the statistical test used to compare
groups.
p.signif, p.adj.signif
: the significance level of
p-values and adjusted p-values, respectively.
estimate
: an
estimate of the location parameter (Only present if argument detailed
= TRUE
). This corresponds to the pseudomedian (for one-sample case) or to
the difference of the location parameter (for two-samples case).
The pseudomedian of a distribution F
is the median of the
distribution of (u+v)/2
, where u
and v are independent, each
with distribution F
. If F
is symmetric, then the pseudomedian
and median coincide.
Note that in the two-sample case the estimator for the difference in location parameters does not estimate the difference in medians (a common misconception) but rather the median of the difference between a sample from x and a sample from y.
conf.low,
conf.high
: a confidence interval for the location parameter. (Only present
if argument conf.int = TRUE.)
The returned object has an attribute called args, which is a list holding the test arguments.
wilcox_test()
: Wilcoxon test
pairwise_wilcox_test()
: performs pairwise two sample Wilcoxon test.
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-sample test #::::::::::::::::::::::::::::::::::::::::: df %>% wilcox_test(len ~ 1, mu = 0) # Two-samples unpaired test #::::::::::::::::::::::::::::::::::::::::: df %>% wilcox_test(len ~ supp) # Two-samples paired test #::::::::::::::::::::::::::::::::::::::::: df %>% wilcox_test (len ~ supp, paired = TRUE) # Compare supp levels after grouping the data by "dose" #:::::::::::::::::::::::::::::::::::::::: df %>% group_by(dose) %>% wilcox_test(data =., len ~ supp) %>% adjust_pvalue(method = "bonferroni") %>% add_significance("p.adj") # pairwise comparisons #:::::::::::::::::::::::::::::::::::::::: # As dose contains more than two levels ==> # pairwise test is automatically performed. df %>% wilcox_test(len ~ dose) # Comparison against reference group #:::::::::::::::::::::::::::::::::::::::: # each level is compared to the ref group df %>% wilcox_test(len ~ dose, ref.group = "0.5") # Comparison against all #:::::::::::::::::::::::::::::::::::::::: df %>% wilcox_test(len ~ dose, ref.group = "all")
# Load data #::::::::::::::::::::::::::::::::::::::: data("ToothGrowth") df <- ToothGrowth # One-sample test #::::::::::::::::::::::::::::::::::::::::: df %>% wilcox_test(len ~ 1, mu = 0) # Two-samples unpaired test #::::::::::::::::::::::::::::::::::::::::: df %>% wilcox_test(len ~ supp) # Two-samples paired test #::::::::::::::::::::::::::::::::::::::::: df %>% wilcox_test (len ~ supp, paired = TRUE) # Compare supp levels after grouping the data by "dose" #:::::::::::::::::::::::::::::::::::::::: df %>% group_by(dose) %>% wilcox_test(data =., len ~ supp) %>% adjust_pvalue(method = "bonferroni") %>% add_significance("p.adj") # pairwise comparisons #:::::::::::::::::::::::::::::::::::::::: # As dose contains more than two levels ==> # pairwise test is automatically performed. df %>% wilcox_test(len ~ dose) # Comparison against reference group #:::::::::::::::::::::::::::::::::::::::: # each level is compared to the ref group df %>% wilcox_test(len ~ dose, ref.group = "0.5") # Comparison against all #:::::::::::::::::::::::::::::::::::::::: df %>% wilcox_test(len ~ dose, ref.group = "all")