| Title: | Broadly Useful Convenient and Efficient R Functions |
|---|---|
| Description: | Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics. |
| Authors: | Han Wu Shuang Bao [aut, cre] (ORCID: <https://orcid.org/0000-0003-3043-710X>) |
| Maintainer: | Han Wu Shuang Bao <[email protected]> |
| License: | GPL-3 |
| Version: | 2026.1 |
| Built: | 2026-06-02 09:42:41 UTC |
| Source: | https://github.com/psychbruce/brucer |
Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.
bruceR
Maintainer: Han Wu Shuang Bao [email protected] (ORCID)
Useful links:
Easily compute multivariate sum, mean, and other scores. Reverse scoring can also be easily implemented without saving extra variables.
Alpha() function uses a similar method to deal with reverse scoring.
Three options to specify variables:
var + items: common and unique parts of variable names (suggested).
vars: a character vector of variable names (suggested).
varrange: starting and stopping positions of variables (NOT suggested).
COUNT(data, var = NULL, items = NULL, vars = NULL, varrange = NULL, value = NA) MODE(data, var = NULL, items = NULL, vars = NULL, varrange = NULL) SUM( data, var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) .sum( var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) MEAN( data, var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) .mean( var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) STD( data, var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) CONSEC( data, var = NULL, items = NULL, vars = NULL, varrange = NULL, values = 0:9 )COUNT(data, var = NULL, items = NULL, vars = NULL, varrange = NULL, value = NA) MODE(data, var = NULL, items = NULL, vars = NULL, varrange = NULL) SUM( data, var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) .sum( var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) MEAN( data, var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) .mean( var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) STD( data, var = NULL, items = NULL, vars = NULL, varrange = NULL, rev = NULL, range = likert, likert = NULL, na.rm = TRUE ) CONSEC( data, var = NULL, items = NULL, vars = NULL, varrange = NULL, values = 0:9 )
data |
Data frame. |
var |
[Option 1] Common part across variables: e.g., |
items |
[Option 1] Unique part across variables: e.g., |
vars |
[Option 2] Character vector specifying variables: e.g., |
varrange |
[Option 3] Character string specifying positions ( |
value |
[Only for |
rev |
[Optional] Variables that need to be reversed. It can be (1) a character vector specifying the reverse-scoring variables (recommended), or (2) a numeric vector specifying the item number of reverse-scoring variables (not recommended). |
range, likert
|
[Optional] Range of likert scale: e.g., |
na.rm |
Ignore missing values. Defaults to |
values |
[Only for |
A vector of computed values.
COUNT(): Count a certain value across variables.
MODE(): Compute mode across variables.
SUM(): Compute sum across variables.
.sum(): Tidy version of SUM(), can only be used in add()/added().
MEAN(): Compute mean across variables.
.mean(): Tidy version of MEAN(), can only be used in add()/added().
STD(): Compute standard deviation across variables.
CONSEC(): Compute consecutive identical digits across variables (especially useful in detecting careless responding).
d = data.table( x1 = 1:5, x4 = c(2,2,5,4,5), x3 = c(3,2,NA,NA,5), x2 = c(4,4,NA,2,5), x5 = c(5,4,1,4,5) ) d ## I deliberately set this order to show you ## the difference between "vars" and "varrange". ## ====== Usage 1: data.table `:=` ====== ## d[, `:=`( na = COUNT(d, "x", 1:5, value=NA), n.2 = COUNT(d, "x", 1:5, value=2), sum = SUM(d, "x", 1:5), m1 = MEAN(d, "x", 1:5), m2 = MEAN(d, vars=c("x1", "x4")), m3 = MEAN(d, varrange="x1:x2", rev="x2", range=1:5), cons1 = CONSEC(d, "x", 1:5), cons2 = CONSEC(d, varrange="x1:x5") )] d ## ====== Usage 2: `add()` & `added()` ====== ## data = as.data.table(psych::bfi) added(data, { gender = as.factor(gender) education = as.factor(education) E = .mean("E", 1:5, rev=c(1,2), range=1:6) A = .mean("A", 1:5, rev=1, range=1:6) C = .mean("C", 1:5, rev=c(4,5), range=1:6) N = .mean("N", 1:5, range=1:6) O = .mean("O", 1:5, rev=c(2,5), range=1:6) }, drop=TRUE) data ## ====== New Feature for `var` & `items` ====== ## d = data.table( XX.1.pre = 1:5, XX.2.pre = 6:10, XX.3.pre = 11:15 ) add(d, { XX.mean = .mean("XX.{i}.pre", 1:3) }) add(d, { XX.mean = .mean("XX.{items}.pre", 1:3) }) # the same add(d, { XX.mean = .mean("XX.{#$%^&}.pre", 1:3) }) # the samed = data.table( x1 = 1:5, x4 = c(2,2,5,4,5), x3 = c(3,2,NA,NA,5), x2 = c(4,4,NA,2,5), x5 = c(5,4,1,4,5) ) d ## I deliberately set this order to show you ## the difference between "vars" and "varrange". ## ====== Usage 1: data.table `:=` ====== ## d[, `:=`( na = COUNT(d, "x", 1:5, value=NA), n.2 = COUNT(d, "x", 1:5, value=2), sum = SUM(d, "x", 1:5), m1 = MEAN(d, "x", 1:5), m2 = MEAN(d, vars=c("x1", "x4")), m3 = MEAN(d, varrange="x1:x2", rev="x2", range=1:5), cons1 = CONSEC(d, "x", 1:5), cons2 = CONSEC(d, varrange="x1:x5") )] d ## ====== Usage 2: `add()` & `added()` ====== ## data = as.data.table(psych::bfi) added(data, { gender = as.factor(gender) education = as.factor(education) E = .mean("E", 1:5, rev=c(1,2), range=1:6) A = .mean("A", 1:5, rev=1, range=1:6) C = .mean("C", 1:5, rev=c(4,5), range=1:6) N = .mean("N", 1:5, range=1:6) O = .mean("O", 1:5, rev=c(2,5), range=1:6) }, drop=TRUE) data ## ====== New Feature for `var` & `items` ====== ## d = data.table( XX.1.pre = 1:5, XX.2.pre = 6:10, XX.3.pre = 11:15 ) add(d, { XX.mean = .mean("XX.{i}.pre", 1:3) }) add(d, { XX.mean = .mean("XX.{items}.pre", 1:3) }) # the same add(d, { XX.mean = .mean("XX.{#$%^&}.pre", 1:3) }) # the same
Paste strings together. A wrapper of paste0().
Why %^%? Because typing % and ^ is pretty easy by
pressing Shift + 5 + 6 + 5.
x %^% yx %^% y
x, y
|
Any objects, usually a numeric or character string or vector. |
A character string/vector of the pasted values.
"He" %^% "llo" "X" %^% 1:10 "Q" %^% 1:5 %^% letters[1:5]"He" %^% "llo" "X" %^% 1:10 "Q" %^% 1:5 %^% letters[1:5]
%in%.A simple extension of %in%.
x %allin% vectorx %allin% vector
x |
Numeric or character vector. |
vector |
Numeric or character vector. |
TRUE or FALSE.
1:2 %allin% 1:3 # TRUE 3:4 %allin% 1:3 # FALSE1:2 %allin% 1:3 # TRUE 3:4 %allin% 1:3 # FALSE
%in%.A simple extension of %in%.
x %anyin% vectorx %anyin% vector
x |
Numeric or character vector. |
vector |
Numeric or character vector. |
TRUE or FALSE.
3:4 %anyin% 1:3 # TRUE 4:5 %anyin% 1:3 # FALSE3:4 %anyin% 1:3 # TRUE 4:5 %anyin% 1:3 # FALSE
%in%.A simple extension of %in%.
x %nonein% vectorx %nonein% vector
x |
Numeric or character vector. |
vector |
Numeric or character vector. |
TRUE or FALSE.
3:4 %nonein% 1:3 # FALSE 4:5 %nonein% 1:3 # TRUE3:4 %nonein% 1:3 # FALSE 4:5 %nonein% 1:3 # TRUE
%in%.A simple extension of %in%.
pattern %partin% vectorpattern %partin% vector
pattern |
Character string containing regular expressions to be matched. |
vector |
Character vector. |
TRUE or FALSE.
"Bei" %partin% c("Beijing", "Shanghai") # TRUE "bei" %partin% c("Beijing", "Shanghai") # FALSE "[aeiou]ng" %partin% c("Beijing", "Shanghai") # TRUE"Bei" %partin% c("Beijing", "Shanghai") # TRUE "bei" %partin% c("Beijing", "Shanghai") # FALSE "[aeiou]ng" %partin% c("Beijing", "Shanghai") # TRUE
Enhanced functions to create, modify, and/or delete variables. The functions integrate the advantages of base::within(), dplyr::mutate(), dplyr::transmute(), and data.table::let().
add(data, expr, when, by, drop = FALSE) added(data, expr, when, by, drop = FALSE)add(data, expr, when, by, drop = FALSE) added(data, expr, when, by, drop = FALSE)
data |
A |
expr |
Passing to R expression(s) to compute variables. Execute each line of expression one by one, such that newly created variables are available immediately. This is an advantage of |
when |
[Optional] Passing to Compute for which rows or rows meeting what condition(s)? |
by |
[Optional] Passing to Compute by what group(s)? |
drop |
Drop existing variables and return only new variables? Defaults to |
add() returns a new data.table, with the raw data unchanged.
added() returns nothing and has already changed the raw data.
## ====== Usage 1: add() ====== ## d = as.data.table(within.1) d$XYZ = 1:8 d # add() does not change the raw data: add(d, {B = 1; C = 2}) d # new data should be assigned to an object: d = d %>% add({ ID = str_extract(ID, "\\d") # modify a variable XYZ = NULL # delete a variable A = .mean("A", 1:4) # create a new variable B = A * 4 # new variable is immediately available C = 1 # never need ,/; at the end of any line }) d ## ====== Usage 2: added() ====== ## d = as.data.table(within.1) d$XYZ = 1:8 d # added() has already changed the raw data: added(d, {B = 1; C = 2}) d # raw data has already become the new data: added(d, { ID = str_extract(ID, "\\d") XYZ = NULL A = .mean("A", 1:4) B = A * 4 C = 1 }) d ## ====== Using `when` and `by` ====== ## d = as.data.table(between.2) d added(d, {SCORE2 = SCORE - mean(SCORE)}, A == 1 & B %in% 1:2, # `when`: for what conditions by=B) # `by`: by what groups d na.omit(d) ## ====== Return Only New Variables ====== ## newvars = add(within.1, { ID = str_extract(ID, "\\d") A = .mean("A", 1:4) }, drop=TRUE) newvars ## ====== Better Than `base::within()` ====== ## d = as.data.table(within.1) # wrong order: C B A within(d, { A = 4 B = A + 1 C = 6 }) # correct order: A B C add(d, { A = 4 B = A + 1 C = 6 })## ====== Usage 1: add() ====== ## d = as.data.table(within.1) d$XYZ = 1:8 d # add() does not change the raw data: add(d, {B = 1; C = 2}) d # new data should be assigned to an object: d = d %>% add({ ID = str_extract(ID, "\\d") # modify a variable XYZ = NULL # delete a variable A = .mean("A", 1:4) # create a new variable B = A * 4 # new variable is immediately available C = 1 # never need ,/; at the end of any line }) d ## ====== Usage 2: added() ====== ## d = as.data.table(within.1) d$XYZ = 1:8 d # added() has already changed the raw data: added(d, {B = 1; C = 2}) d # raw data has already become the new data: added(d, { ID = str_extract(ID, "\\d") XYZ = NULL A = .mean("A", 1:4) B = A * 4 C = 1 }) d ## ====== Using `when` and `by` ====== ## d = as.data.table(between.2) d added(d, {SCORE2 = SCORE - mean(SCORE)}, A == 1 & B %in% 1:2, # `when`: for what conditions by=B) # `by`: by what groups d na.omit(d) ## ====== Return Only New Variables ====== ## newvars = add(within.1, { ID = str_extract(ID, "\\d") A = .mean("A", 1:4) }, drop=TRUE) newvars ## ====== Better Than `base::within()` ====== ## d = as.data.table(within.1) # wrong order: C B A within(d, { A = 4 B = A + 1 C = 6 }) # correct order: A B C add(d, { A = 4 B = A + 1 C = 6 })
and McDonald's ).An extension of psych::alpha() and psych::omega(), reporting (1) scale statistics (Cronbach's and McDonald's ) and (2) item statistics (item-rest correlation [i.e., corrected item-total correlation] and Cronbach's if item deleted).
Three options to specify variables:
var + items: common and unique parts of variable names (suggested).
vars: a character vector of variable names (suggested).
varrange: starting and stopping positions of variables (NOT suggested).
Alpha(data, var, items, vars = NULL, varrange = NULL, rev = NULL, digits = 3)Alpha(data, var, items, vars = NULL, varrange = NULL, rev = NULL, digits = 3)
data |
Data frame. |
var |
[Option 1] Common part across variables: e.g., |
items |
[Option 1] Unique part across variables: e.g., |
vars |
[Option 2] Character vector specifying variables: e.g., |
varrange |
[Option 3] Character string specifying positions ( |
rev |
[Optional] Variables that need to be reversed. It can be (1) a character vector specifying the reverse-scoring variables (recommended), or (2) a numeric vector specifying the item number of reverse-scoring variables (not recommended). |
digits |
Number of decimal places of output. Defaults to |
A list of results obtained from psych::alpha() and psych::omega().
# ?psych::bfi data = psych::bfi Alpha(data, "E", 1:5) # "E1" & "E2" should be reversed Alpha(data, "E", 1:5, rev=1:2) # correct Alpha(data, "E", 1:5, rev=cc("E1, E2")) # also correct Alpha(data, vars=cc("E1, E2, E3, E4, E5"), rev=cc("E1, E2")) Alpha(data, varrange="E1:E5", rev=cc("E1, E2")) # using dplyr::select() data %>% select(E1, E2, E3, E4, E5) %>% Alpha(vars=names(.), rev=cc("E1, E2"))# ?psych::bfi data = psych::bfi Alpha(data, "E", 1:5) # "E1" & "E2" should be reversed Alpha(data, "E", 1:5, rev=1:2) # correct Alpha(data, "E", 1:5, rev=cc("E1, E2")) # also correct Alpha(data, vars=cc("E1, E2, E3, E4, E5"), rev=cc("E1, E2")) Alpha(data, varrange="E1:E5", rev=cc("E1, E2")) # using dplyr::select() data %>% select(E1, E2, E3, E4, E5) %>% Alpha(vars=names(.), rev=cc("E1, E2"))
Split up a string (with separators) into a character vector (whitespace around separator is trimmed).
cc(..., sep = "auto", trim = TRUE)cc(..., sep = "auto", trim = TRUE)
... |
Character string(s). |
sep |
Pattern for separation.
Defaults to |
trim |
Remove whitespace from start and end of string(s)?
Defaults to |
Character vector.
cc("a,b,c,d,e") cc(" a , b , c , d , e ") cc(" a , b , c , d , e ", trim=FALSE) cc("1, 2, 3, 4, 5") cc("A 1 , B 2 ; C 3 | D 4 \t E 5") cc("A, B, C", " D | E ", c("F", "G")) cc(" American British Chinese ")cc("a,b,c,d,e") cc(" a , b , c , d , e ") cc(" a , b , c , d , e ", trim=FALSE) cc("1, 2, 3, 4, 5") cc("A 1 , B 2 ; C 3 | D 4 \t E 5") cc("A, B, C", " D | E ", c("F", "G")) cc(" American British Chinese ")
Plot the results of cross-correlation analysis using ggplot2.
ccf_plot( formula, data, lag.max = 30, sig.level = 0.05, xbreaks = seq(-100, 100, 10), ybreaks = seq(-1, 1, 0.2), ylim = NULL, alpha.ns = 1, pos.color = "black", neg.color = "black", ci.color = "blue", title = NULL, subtitle = NULL, xlab = "Lag", ylab = "Cross-Correlation" )ccf_plot( formula, data, lag.max = 30, sig.level = 0.05, xbreaks = seq(-100, 100, 10), ybreaks = seq(-1, 1, 0.2), ylim = NULL, alpha.ns = 1, pos.color = "black", neg.color = "black", ci.color = "blue", title = NULL, subtitle = NULL, xlab = "Lag", ylab = "Cross-Correlation" )
formula |
Model formula like |
data |
Data frame. |
lag.max |
Maximum time lag. Defaults to |
sig.level |
Significance level. Defaults to |
xbreaks |
X-axis breaks. |
ybreaks |
Y-axis breaks. |
ylim |
Y-axis limits. Defaults to |
alpha.ns |
Color transparency (opacity: 0~1) for non-significant values. Defaults to |
pos.color |
Color for positive values. Defaults to |
neg.color |
Color for negative values. Defaults to |
ci.color |
Color for upper and lower bounds of significant values. Defaults to |
title |
Plot title. Defaults to an illustration of the formula. |
subtitle |
Plot subtitle. |
xlab |
X-axis title. Defaults to |
ylab |
Y-axis title. Defaults to |
Significant correlations with negative time lags suggest shifts in a predictor precede shifts in an outcome.
A ggplot object, which can be further modified using ggplot2 syntax and saved using ggsave().
# resemble the default plot output by `ccf()` p1 = ccf_plot(chicken ~ egg, data=lmtest::ChickEgg) p1 # a more colorful plot p2 = ccf_plot(chicken ~ egg, data=lmtest::ChickEgg, alpha.ns=0.3, pos.color="#CD201F", neg.color="#21759B", ci.color="black") p2# resemble the default plot output by `ccf()` p1 = ccf_plot(chicken ~ egg, data=lmtest::ChickEgg) p1 # a more colorful plot p2 = ccf_plot(chicken ~ egg, data=lmtest::ChickEgg, alpha.ns=0.3, pos.color="#CD201F", neg.color="#21759B", ci.color="black") p2
An extension of lavaan::cfa().
CFA( data, model = "A =~ a[1:5]; B =~ b[c(1,3,5)]; C =~ c1 + c2 + c3", estimator = "ML", highorder = "", orthogonal = FALSE, missing = "listwise", digits = 3, file = NULL )CFA( data, model = "A =~ a[1:5]; B =~ b[c(1,3,5)]; C =~ c1 + c2 + c3", estimator = "ML", highorder = "", orthogonal = FALSE, missing = "listwise", digits = 3, file = NULL )
data |
Data frame. |
model |
Model formula. See examples. |
estimator |
The estimator to be used (for details, see lavaan options). Defaults to
|
highorder |
High-order factor. Defaults to |
orthogonal |
Defaults to |
missing |
Defaults to |
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
A list of results returned by lavaan::cfa().
data.cfa=lavaan::HolzingerSwineford1939 CFA(data.cfa, "Visual =~ x[1:3]; Textual =~ x[c(4,5,6)]; Speed =~ x7 + x8 + x9") CFA(data.cfa, model=" Visual =~ x[1:3] Textual =~ x[c(4,5,6)] Speed =~ x7 + x8 + x9 ", highorder="Ability") data.bfi = na.omit(psych::bfi) CFA(data.bfi, "E =~ E[1:5]; A =~ A[1:5]; C =~ C[1:5]; N =~ N[1:5]; O =~ O[1:5]")data.cfa=lavaan::HolzingerSwineford1939 CFA(data.cfa, "Visual =~ x[1:3]; Textual =~ x[c(4,5,6)]; Speed =~ x7 + x8 + x9") CFA(data.cfa, model=" Visual =~ x[1:3] Textual =~ x[c(4,5,6)] Speed =~ x7 + x8 + x9 ", highorder="Ability") data.bfi = na.omit(psych::bfi) CFA(data.bfi, "E =~ E[1:5]; A =~ A[1:5]; C =~ C[1:5]; N =~ N[1:5]; O =~ O[1:5]")
Test the difference between two correlations.
cor_diff(r1, n1, r2, n2, n = NULL, rcov = NULL)cor_diff(r1, n1, r2, n2, n = NULL, rcov = NULL)
r1, r2
|
Correlation coefficients (Pearson's |
n, n1, n2
|
Sample sizes. |
rcov |
[Optional] Only for nonindependent
then, as Y and Z are also correlated, we should also consider |
Invisibly return the p value.
# two independent rs (X~Y vs. Z~W) cor_diff(r1=0.20, n1=100, r2=0.45, n2=100) # two nonindependent rs (X~Y vs. X~Z, with Y and Z also correlated [rcov]) cor_diff(r1=0.20, r2=0.45, n=100, rcov=0.80)# two independent rs (X~Y vs. Z~W) cor_diff(r1=0.20, n1=100, r2=0.45, n2=100) # two nonindependent rs (X~Y vs. X~Z, with Y and Z also correlated [rcov]) cor_diff(r1=0.20, r2=0.45, n=100, rcov=0.80)
Multilevel correlations (within-level and between-level).
For details, see description in HLM_ICC_rWG().
cor_multilevel(data, group, digits = 3)cor_multilevel(data, group, digits = 3)
data |
Data frame. |
group |
Grouping variable. |
digits |
Number of decimal places of output. Defaults to |
Invisibly return a list of results.
# see https://psychbruce.github.io/supp/CEM# see https://psychbruce.github.io/supp/CEM
Correlation analysis.
Corr( data, method = "pearson", p.adjust = "none", all.as.numeric = TRUE, digits = 2, file = NULL, plot = TRUE, plot.r.size = 4, plot.colors = NULL, plot.file = NULL, plot.width = 8, plot.height = 6, plot.dpi = 500 )Corr( data, method = "pearson", p.adjust = "none", all.as.numeric = TRUE, digits = 2, file = NULL, plot = TRUE, plot.r.size = 4, plot.colors = NULL, plot.file = NULL, plot.width = 8, plot.height = 6, plot.dpi = 500 )
data |
Data frame. |
method |
|
p.adjust |
Adjustment of p values for multiple tests:
|
all.as.numeric |
|
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
plot |
|
plot.r.size |
Font size of correlation text label. Defaults to |
plot.colors |
Plot colors (character vector). Defaults to |
plot.file |
|
plot.width |
Width (in "inch") of the saved plot. Defaults to |
plot.height |
Height (in "inch") of the saved plot. Defaults to |
plot.dpi |
DPI (dots per inch) of the saved plot. Defaults to |
Invisibly return a list with
(1) correlation results from psych::corr.test() and
(2) a ggplot object if plot=TRUE.
Corr(airquality) Corr(airquality, p.adjust="bonferroni", plot.colors=c("#b2182b", "white", "#2166ac")) d = as.data.table(psych::bfi) added(d, { gender = as.factor(gender) education = as.factor(education) E = .mean("E", 1:5, rev=c(1,2), range=1:6) A = .mean("A", 1:5, rev=1, range=1:6) C = .mean("C", 1:5, rev=c(4,5), range=1:6) N = .mean("N", 1:5, range=1:6) O = .mean("O", 1:5, rev=c(2,5), range=1:6) }) Corr(d[, .(age, gender, education, E, A, C, N, O)])Corr(airquality) Corr(airquality, p.adjust="bonferroni", plot.colors=c("#b2182b", "white", "#2166ac")) d = as.data.table(psych::bfi) added(d, { gender = as.factor(gender) education = as.factor(education) E = .mean("E", 1:5, rev=c(1,2), range=1:6) A = .mean("A", 1:5, rev=1, range=1:6) C = .mean("C", 1:5, rev=c(4,5), range=1:6) N = .mean("N", 1:5, range=1:6) O = .mean("O", 1:5, rev=c(2,5), range=1:6) }) Corr(d[, .(age, gender, education, E, A, C, N, O)])
Descriptive statistics.
Describe( data, all.as.numeric = TRUE, digits = 2, file = NULL, plot = FALSE, upper.triangle = FALSE, upper.smooth = "none", plot.file = NULL, plot.width = 8, plot.height = 6, plot.dpi = 500 )Describe( data, all.as.numeric = TRUE, digits = 2, file = NULL, plot = FALSE, upper.triangle = FALSE, upper.smooth = "none", plot.file = NULL, plot.width = 8, plot.height = 6, plot.dpi = 500 )
data |
Data frame or numeric vector. |
all.as.numeric |
|
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
plot |
|
upper.triangle |
|
upper.smooth |
|
plot.file |
|
plot.width |
Width (in "inch") of the saved plot. Defaults to |
plot.height |
Height (in "inch") of the saved plot. Defaults to |
plot.dpi |
DPI (dots per inch) of the saved plot. Defaults to |
Invisibly return a list with
(1) a data frame of descriptive statistics and
(2) a ggplot object if plot=TRUE.
set.seed(1) Describe(rnorm(1000000), plot=TRUE) Describe(airquality) Describe(airquality, plot=TRUE, upper.triangle=TRUE, upper.smooth="lm") # ?psych::bfi Describe(psych::bfi[c("age", "gender", "education")]) d = as.data.table(psych::bfi) added(d, { gender = as.factor(gender) education = as.factor(education) E = .mean("E", 1:5, rev=c(1,2), range=1:6) A = .mean("A", 1:5, rev=1, range=1:6) C = .mean("C", 1:5, rev=c(4,5), range=1:6) N = .mean("N", 1:5, range=1:6) O = .mean("O", 1:5, rev=c(2,5), range=1:6) }) Describe(d[, .(age, gender, education)], plot=TRUE, all.as.numeric=FALSE) Describe(d[, .(age, gender, education, E, A, C, N, O)], plot=TRUE)set.seed(1) Describe(rnorm(1000000), plot=TRUE) Describe(airquality) Describe(airquality, plot=TRUE, upper.triangle=TRUE, upper.smooth="lm") # ?psych::bfi Describe(psych::bfi[c("age", "gender", "education")]) d = as.data.table(psych::bfi) added(d, { gender = as.factor(gender) education = as.factor(education) E = .mean("E", 1:5, rev=c(1,2), range=1:6) A = .mean("A", 1:5, rev=1, range=1:6) C = .mean("C", 1:5, rev=c(4,5), range=1:6) N = .mean("N", 1:5, range=1:6) O = .mean("O", 1:5, rev=c(2,5), range=1:6) }) Describe(d[, .(age, gender, education)], plot=TRUE, all.as.numeric=FALSE) Describe(d[, .(age, gender, education, E, A, C, N, O)], plot=TRUE)
Timer (compute time difference).
dtime(t0, unit = "secs", digits = 0)dtime(t0, unit = "secs", digits = 0)
t0 |
Time at the beginning. |
unit |
Options: |
digits |
Number of decimal places of output. Defaults to |
A character string of time difference.
## Not run: t0 = Sys.time() dtime(t0) ## End(Not run)## Not run: t0 = Sys.time() dtime(t0) ## End(Not run)
An extension of psych::principal() and psych::fa(), performing either Principal Component Analysis (PCA) or Exploratory Factor Analysis (EFA).
Three options to specify variables:
var + items: common and unique parts of variable names (suggested).
vars: a character vector of variable names (suggested).
varrange: starting and stopping positions of variables (NOT suggested).
EFA( data, var, items, vars = NULL, varrange = NULL, rev = NULL, method = c("pca", "pa", "ml", "minres", "uls", "ols", "wls", "gls", "alpha"), rotation = c("none", "varimax", "oblimin", "promax", "quartimax", "equamax"), nfactors = c("eigen", "parallel", "(any number >= 1)"), sort.loadings = TRUE, hide.loadings = 0, plot.scree = TRUE, kaiser = TRUE, max.iter = 25, min.eigen = 1, digits = 3, file = NULL ) PCA(..., method = "pca")EFA( data, var, items, vars = NULL, varrange = NULL, rev = NULL, method = c("pca", "pa", "ml", "minres", "uls", "ols", "wls", "gls", "alpha"), rotation = c("none", "varimax", "oblimin", "promax", "quartimax", "equamax"), nfactors = c("eigen", "parallel", "(any number >= 1)"), sort.loadings = TRUE, hide.loadings = 0, plot.scree = TRUE, kaiser = TRUE, max.iter = 25, min.eigen = 1, digits = 3, file = NULL ) PCA(..., method = "pca")
data |
Data frame. |
var |
[Option 1] Common part across variables: e.g., |
items |
[Option 1] Unique part across variables: e.g., |
vars |
[Option 2] Character vector specifying variables: e.g., |
varrange |
[Option 3] Character string specifying positions ( |
rev |
[Optional] Variables that need to be reversed. It can be (1) a character vector specifying the reverse-scoring variables (recommended), or (2) a numeric vector specifying the item number of reverse-scoring variables (not recommended). |
method |
Extraction method.
|
rotation |
Rotation method.
|
nfactors |
How to determine the number of factors/components?
|
sort.loadings |
Sort factor/component loadings by size? Defaults to |
hide.loadings |
A number (0~1) for hiding absolute factor/component loadings below this value. Defaults to |
plot.scree |
Display the scree plot? Defaults to |
kaiser |
Do the Kaiser normalization (as in SPSS)? Defaults to |
max.iter |
Maximum number of iterations for convergence. Defaults to |
min.eigen |
Minimum eigenvalue (used if |
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
... |
A list of results:
resultThe R object returned from psych::principal() or psych::fa()
result.kaiserThe R object returned from psych::kaiser() (if any)
extraction.methodExtraction method
rotation.methodRotation method
eigenvaluesA data.frame of eigenvalues and sum of squared (SS) loadings
loadingsA data.frame of factor/component loadings and communalities
scree.plotA ggplot object of the scree plot
EFA(): Exploratory Factor Analysis
PCA(): Principal Component Analysis - a wrapper of EFA(..., method="pca")
Results based on the varimax rotation method are identical to SPSS. The other rotation methods may produce results slightly different from SPSS.
data = psych::bfi EFA(data, "E", 1:5) # var + items EFA(data, "E", 1:5, nfactors=2) # var + items EFA(data, varrange="A1:O5", nfactors="parallel", hide.loadings=0.45) # the same as above: # using dplyr::select() and dplyr::matches() # to select variables whose names end with numbers # (regexp: \d matches all numbers, $ matches the end of a string) data %>% select(matches("\\d$")) %>% EFA(vars=names(.), # all selected variables method="pca", # default rotation="varimax", # default nfactors="parallel", # parallel analysis hide.loadings=0.45) # hide loadings < 0.45data = psych::bfi EFA(data, "E", 1:5) # var + items EFA(data, "E", 1:5, nfactors=2) # var + items EFA(data, varrange="A1:O5", nfactors="parallel", hide.loadings=0.45) # the same as above: # using dplyr::select() and dplyr::matches() # to select variables whose names end with numbers # (regexp: \d matches all numbers, $ matches the end of a string) data %>% select(matches("\\d$")) %>% EFA(vars=names(.), # all selected variables method="pca", # default rotation="varimax", # default nfactors="parallel", # parallel analysis hide.loadings=0.45) # hide loadings < 0.45
Perform (1) simple-effect (and simple-simple-effect) analyses, including both simple main effects and simple interaction effects, and (2) post-hoc multiple comparisons (e.g., pairwise, sequential, polynomial), with p values adjusted for factors with >= 3 levels. This function is based on and extends emmeans::joint_tests(), emmeans::emmeans(), and emmeans::contrast(). You only need to specify the model object, to-be-tested effect(s), and moderator(s). Almost all results you need will be displayed together, including effect sizes (partial and Cohen's d) and their confidence intervals (CIs). 90% CIs for partial and 95% CIs for Cohen's d are reported.
EMMEANS( model, effect = NULL, by = NULL, contrast = "pairwise", reverse = TRUE, p.adjust = "bonferroni", sd.pooled = NULL, model.type = "multivariate", digits = 3, file = NULL )EMMEANS( model, effect = NULL, by = NULL, contrast = "pairwise", reverse = TRUE, p.adjust = "bonferroni", sd.pooled = NULL, model.type = "multivariate", digits = 3, file = NULL )
model |
The model object returned by |
effect |
Effect(s) you want to test. If set to a character string (e.g., |
by |
Moderator variable(s). Defaults to |
contrast |
Contrast method for multiple comparisons. Defaults to Options: |
reverse |
The order of levels to be contrasted. Defaults to |
p.adjust |
Adjustment method of p values for multiple comparisons. Defaults to Options: |
sd.pooled |
By default, it uses |
model.type |
|
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
The same model object as returned by MANOVA() (for recursive use), with a list of tables: sim (simple effects), emm (estimated marginal means), con (contrasts). Each EMMEANS(...) appends one list to the returned object.
By default, the root mean square error (RMSE) is used to compute the pooled SD for Cohen's d. Specifically, it uses:
the square root of mean square error (MSE) for between-subjects designs;
the square root of mean variance of all paired differences of the residuals of repeated measures for within-subjects and mixed designs.
Disclaimer: There is substantial disagreement on the appropriate pooled SD to use in computing the effect size. For alternative methods, see emmeans::eff_size() and effectsize::t_to_d(). Please do not take the default output as the only right results and users are completely responsible for specifying sd.pooled.
You can save the returned object and use the emmeans::emmip() function to create an interaction plot (based on the fitted model and a formula). See examples below for the usage. emmeans::emmip() returns a ggplot object, which can be modified and saved with ggplot2 syntax.
Some may confuse the statistical terms "simple effects", "post-hoc tests", and "multiple comparisons". Such a confusion is not uncommon. Here I explain what these terms actually refer to.
When we speak of "simple effect", we are referring to ...
simple main effect
simple interaction effect (only for designs with 3 or more factors)
simple simple effect (only for designs with 3 or more factors)
When the interaction effect in ANOVA is significant, we should then perform a "simple-effect analysis". In regression, we call this "simple-slope analysis". They are identical in statistical principles.
In a two-factors design, we only test "simple main effect". That is, at different levels of a factor "B", the main effects of "A" would be different. However, in a three-factors (or more) design, we may also test "simple interaction effect" and "simple simple effect". That is, at different combinations of levels of factors "B" and "C", the main effects of "A" would be different.
To note, simple effects per se never require p-value adjustment, because what we test in simple-effect analyses are still "omnibus F-tests".
The term "post-hoc" means that the tests are performed after ANOVA. Given this, some may (wrongly) regard simple-effect analyses also as a kind of post-hoc tests. However, these two terms should be distinguished. In many situations, "post-hoc tests" only refer to "post-hoc comparisons" using t-tests and some p-value adjustment techniques. We need post-hoc comparisons only when there are factors with 3 or more levels.
Post-hoc tests are totally independent of whether there is a significant interaction effect. It only deals with factors with multiple levels. In most cases, we use pairwise comparisons to do post-hoc tests. See the next part for details.
As mentioned above, multiple comparisons are indeed post-hoc tests but have no relationship with simple-effect analyses. Post-hoc multiple comparisons are independent of interaction effects and simple effects. Furthermore, if a simple main effect contains 3 or more levels, we also need to do multiple comparisons within the simple-effect analysis. In this situation, we also need p-value adjustment with methods such as Bonferroni, Tukey's HSD (honest significant difference), FDR (false discovery rate), and so forth.
Options for multiple comparison:
"pairwise": Pairwise comparisons (defaults to "higher level - lower level")
"seq" or "consec": Consecutive (sequential) comparisons
"poly": Polynomial contrasts (linear, quadratic, cubic, quartic, ...)
"eff": Effect contrasts (vs. the grand mean)
#### Between-Subjects Design #### between.1 MANOVA(between.1, dv="SCORE", between="A") %>% EMMEANS("A") MANOVA(between.1, dv="SCORE", between="A") %>% EMMEANS("A", p.adjust="tukey") MANOVA(between.1, dv="SCORE", between="A") %>% EMMEANS("A", contrast="seq") MANOVA(between.1, dv="SCORE", between="A") %>% EMMEANS("A", contrast="poly") between.2 MANOVA(between.2, dv="SCORE", between=c("A", "B")) %>% EMMEANS("A", by="B") %>% EMMEANS("B", by="A") ## How to create an interaction plot using `emmeans::emmip()`? ## See help page: ?emmeans::emmip() m = MANOVA(between.2, dv="SCORE", between=c("A", "B")) emmip(m, ~ A | B, CIs=TRUE) emmip(m, ~ B | A, CIs=TRUE) emmip(m, B ~ A, CIs=TRUE) emmip(m, A ~ B, CIs=TRUE) between.3 MANOVA(between.3, dv="SCORE", between=c("A", "B", "C")) %>% EMMEANS("A", by="B") %>% EMMEANS(c("A", "B"), by="C") %>% EMMEANS("A", by=c("B", "C")) ## Just to name a few... ## You may test other combinations... #### Within-Subjects Design #### within.1 MANOVA(within.1, dvs="A1:A4", dvs.pattern="A(.)", within="A") %>% EMMEANS("A") within.2 MANOVA(within.2, dvs="A1B1:A2B3", dvs.pattern="A(.)B(.)", within=c("A", "B")) %>% EMMEANS("A", by="B") %>% EMMEANS("B", by="A") # singular error matrix # ::::::::::::::::::::::::::::::::::::::: # This would produce a WARNING because of # the linear dependence of A2B2 and A2B3. # See: Corr(within.2[c("A2B2", "A2B3")]) within.3 MANOVA(within.3, dvs="A1B1C1:A2B2C2", dvs.pattern="A(.)B(.)C(.)", within=c("A", "B", "C")) %>% EMMEANS("A", by="B") %>% EMMEANS(c("A", "B"), by="C") %>% EMMEANS("A", by=c("B", "C")) ## Just to name a few... ## You may test other combinations... #### Mixed Design #### mixed.2_1b1w MANOVA(mixed.2_1b1w, dvs="B1:B3", dvs.pattern="B(.)", between="A", within="B", sph.correction="GG") %>% EMMEANS("A", by="B") %>% EMMEANS("B", by="A") mixed.3_1b2w MANOVA(mixed.3_1b2w, dvs="B1C1:B2C2", dvs.pattern="B(.)C(.)", between="A", within=c("B", "C")) %>% EMMEANS("A", by="B") %>% EMMEANS(c("A", "B"), by="C") %>% EMMEANS("A", by=c("B", "C")) ## Just to name a few... ## You may test other combinations... mixed.3_2b1w MANOVA(mixed.3_2b1w, dvs="B1:B2", dvs.pattern="B(.)", between=c("A", "C"), within="B") %>% EMMEANS("A", by="B") %>% EMMEANS("A", by="C") %>% EMMEANS(c("A", "B"), by="C") %>% EMMEANS("B", by=c("A", "C")) ## Just to name a few... ## You may test other combinations... #### Other Examples #### air = airquality air$Day.1or2 = ifelse(air$Day %% 2 == 1, 1, 2) %>% factor(levels=1:2, labels=c("odd", "even")) MANOVA(air, dv="Temp", between=c("Month", "Day.1or2"), covariate=c("Solar.R", "Wind")) %>% EMMEANS("Month", contrast="seq") %>% EMMEANS("Month", by="Day.1or2", contrast="poly")#### Between-Subjects Design #### between.1 MANOVA(between.1, dv="SCORE", between="A") %>% EMMEANS("A") MANOVA(between.1, dv="SCORE", between="A") %>% EMMEANS("A", p.adjust="tukey") MANOVA(between.1, dv="SCORE", between="A") %>% EMMEANS("A", contrast="seq") MANOVA(between.1, dv="SCORE", between="A") %>% EMMEANS("A", contrast="poly") between.2 MANOVA(between.2, dv="SCORE", between=c("A", "B")) %>% EMMEANS("A", by="B") %>% EMMEANS("B", by="A") ## How to create an interaction plot using `emmeans::emmip()`? ## See help page: ?emmeans::emmip() m = MANOVA(between.2, dv="SCORE", between=c("A", "B")) emmip(m, ~ A | B, CIs=TRUE) emmip(m, ~ B | A, CIs=TRUE) emmip(m, B ~ A, CIs=TRUE) emmip(m, A ~ B, CIs=TRUE) between.3 MANOVA(between.3, dv="SCORE", between=c("A", "B", "C")) %>% EMMEANS("A", by="B") %>% EMMEANS(c("A", "B"), by="C") %>% EMMEANS("A", by=c("B", "C")) ## Just to name a few... ## You may test other combinations... #### Within-Subjects Design #### within.1 MANOVA(within.1, dvs="A1:A4", dvs.pattern="A(.)", within="A") %>% EMMEANS("A") within.2 MANOVA(within.2, dvs="A1B1:A2B3", dvs.pattern="A(.)B(.)", within=c("A", "B")) %>% EMMEANS("A", by="B") %>% EMMEANS("B", by="A") # singular error matrix # ::::::::::::::::::::::::::::::::::::::: # This would produce a WARNING because of # the linear dependence of A2B2 and A2B3. # See: Corr(within.2[c("A2B2", "A2B3")]) within.3 MANOVA(within.3, dvs="A1B1C1:A2B2C2", dvs.pattern="A(.)B(.)C(.)", within=c("A", "B", "C")) %>% EMMEANS("A", by="B") %>% EMMEANS(c("A", "B"), by="C") %>% EMMEANS("A", by=c("B", "C")) ## Just to name a few... ## You may test other combinations... #### Mixed Design #### mixed.2_1b1w MANOVA(mixed.2_1b1w, dvs="B1:B3", dvs.pattern="B(.)", between="A", within="B", sph.correction="GG") %>% EMMEANS("A", by="B") %>% EMMEANS("B", by="A") mixed.3_1b2w MANOVA(mixed.3_1b2w, dvs="B1C1:B2C2", dvs.pattern="B(.)C(.)", between="A", within=c("B", "C")) %>% EMMEANS("A", by="B") %>% EMMEANS(c("A", "B"), by="C") %>% EMMEANS("A", by=c("B", "C")) ## Just to name a few... ## You may test other combinations... mixed.3_2b1w MANOVA(mixed.3_2b1w, dvs="B1:B2", dvs.pattern="B(.)", between=c("A", "C"), within="B") %>% EMMEANS("A", by="B") %>% EMMEANS("A", by="C") %>% EMMEANS(c("A", "B"), by="C") %>% EMMEANS("B", by=c("A", "C")) ## Just to name a few... ## You may test other combinations... #### Other Examples #### air = airquality air$Day.1or2 = ifelse(air$Day %% 2 == 1, 1, 2) %>% factor(levels=1:2, labels=c("odd", "even")) MANOVA(air, dv="Temp", between=c("Month", "Day.1or2"), covariate=c("Solar.R", "Wind")) %>% EMMEANS("Month", contrast="seq") %>% EMMEANS("Month", by="Day.1or2", contrast="poly")
Export data to a file, with format automatically judged from file extension. This function is inspired by rio::export() and has several modifications. Its purpose is to avoid using lots of write_xxx() functions in your code and to provide one tidy function for data export.
It supports many file formats and uses corresponding R functions:
Plain text (.txt, .csv, .csv2, .tsv, .psv), using data.table::fwrite(); if the encoding argument is specified, using utils::write.table() instead
Excel (.xls, .xlsx), using openxlsx::write.xlsx()
SPSS (.sav), using haven::write_sav()
Stata (.dta), using haven::write_dta()
R objects (.rda, .rdata, .RData), using save()
R serialized objects (.rds), using saveRDS()
Clipboard (on Windows and Mac OS), using clipr::write_clip()
Other formats, using rio::export()
export( x, file, encoding = NULL, header = "auto", sheet = NULL, overwrite = TRUE, verbose = FALSE )export( x, file, encoding = NULL, header = "auto", sheet = NULL, overwrite = TRUE, verbose = FALSE )
x |
Any R object, usually a data frame ( |
file |
File name (with extension). If unspecified, data will be exported to clipboard. |
encoding |
File encoding. Defaults to Options: If you find messy code for Chinese text in the exported data (often in CSV when opened with Excel), it is usually useful to set |
header |
Does the first row contain column names ( |
sheet |
[Only for Excel] Excel sheet name(s). Defaults to "Sheet1", "Sheet2", ... You may specify multiple sheet names in a character vector |
overwrite |
Overwrite the existing file (if any)? Defaults to |
verbose |
Print output information? Defaults to |
No return value.
## Not run: export(airquality) # paste to clipboard export(airquality, file="mydata.csv") export(airquality, file="mydata.sav") export(list(airquality, npk), file="mydata.xlsx") # Sheet1, Sheet2 export(list(air=airquality, npk=npk), file="mydata.xlsx") # a named list export(list(airquality, npk), sheet=c("air", "npk"), file="mydata.xlsx") export(list(a=1, b=npk, c="character"), file="abc.Rdata") # .rda, .rdata d = import("abc.Rdata") # load only the first object and rename it to `d` load("abc.Rdata") # load all objects with original names to environment export(lm(yield ~ N*P*K, data=npk), file="lm_npk.Rdata") model = import("lm_npk.Rdata") load("lm_npk.Rdata") # because x is unnamed, the object has a name "List1" export(list(m1=lm(yield ~ N*P*K, data=npk)), file="lm_npk.Rdata") model = import("lm_npk.Rdata") load("lm_npk.Rdata") # because x is named, the object has a name "m1" ## End(Not run)## Not run: export(airquality) # paste to clipboard export(airquality, file="mydata.csv") export(airquality, file="mydata.sav") export(list(airquality, npk), file="mydata.xlsx") # Sheet1, Sheet2 export(list(air=airquality, npk=npk), file="mydata.xlsx") # a named list export(list(airquality, npk), sheet=c("air", "npk"), file="mydata.xlsx") export(list(a=1, b=npk, c="character"), file="abc.Rdata") # .rda, .rdata d = import("abc.Rdata") # load only the first object and rename it to `d` load("abc.Rdata") # load all objects with original names to environment export(lm(yield ~ N*P*K, data=npk), file="lm_npk.Rdata") model = import("lm_npk.Rdata") load("lm_npk.Rdata") # because x is unnamed, the object has a name "List1" export(list(m1=lm(yield ~ N*P*K, data=npk)), file="lm_npk.Rdata") model = import("lm_npk.Rdata") load("lm_npk.Rdata") # because x is named, the object has a name "m1" ## End(Not run)
Format numeric values.
formatF(x, digits = 3)formatF(x, digits = 3)
x |
A number or numeric vector. |
digits |
Number of decimal places of output. Defaults to |
Formatted character string.
formatF(pi, 20)formatF(pi, 20)
Format "1234" to "1,234".
formatN(x, mark = ",")formatN(x, mark = ",")
x |
A number or numeric vector. |
mark |
Usually |
Formatted character string.
formatN(1234)formatN(1234)
Expand all interaction terms in a formula.
formula_expand(formula, as.char = FALSE)formula_expand(formula, as.char = FALSE)
formula |
R formula or a character string indicating the formula. |
as.char |
Return character? Defaults to |
A formula/character object including all expanded terms.
formula_expand(y ~ a*b*c) formula_expand("y ~ a*b*c")formula_expand(y ~ a*b*c) formula_expand("y ~ a*b*c")
Paste a formula into a string.
formula_paste(formula)formula_paste(formula)
formula |
R formula. |
A character string indicating the formula.
formula_paste(y ~ x) formula_paste(y ~ x + (1 | g))formula_paste(y ~ x) formula_paste(y ~ x + (1 | g))
Frequency statistics.
Freq(x, varname, labels, sort = "", digits = 1, file = NULL)Freq(x, varname, labels, sort = "", digits = 1, file = NULL)
x |
A vector of values (or a data frame). |
varname |
[Optional] Variable name, if |
labels |
[Optional] A vector re-defining the labels of values. |
sort |
|
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
A data frame of frequency statistics.
data = psych::bfi ## Input `data$variable` Freq(data$education) Freq(data$gender, labels=c("Male", "Female")) Freq(data$age) ## Input one data frame and one variable name Freq(data, "education") Freq(data, "gender", labels=c("Male", "Female")) Freq(data, "age")data = psych::bfi ## Input `data$variable` Freq(data$education) Freq(data$gender, labels=c("Male", "Female")) Freq(data$age) ## Input one data frame and one variable name Freq(data, "education") Freq(data, "gender", labels=c("Male", "Female")) Freq(data, "age")
lm and glm models).NOTE: model_summary() is preferred.
GLM_summary(model, robust = FALSE, cluster = NULL, digits = 3, ...)GLM_summary(model, robust = FALSE, cluster = NULL, digits = 3, ...)
model |
A model fitted with |
robust |
[Only for Options: Note: |
cluster |
[Only for |
digits |
Number of decimal places of output. Defaults to |
... |
Other arguments. You may re-define |
No return value.
print_table() (print simple table)
model_summary() (strongly suggested)
## Example 1: OLS regression lm = lm(Temp ~ Month + Day + Wind + Solar.R, data=airquality) GLM_summary(lm) GLM_summary(lm, robust="HC1") # Stata's default is "HC1" # R package <sandwich>'s default is "HC3" ## Example 2: Logistic regression glm = glm(case ~ age + parity + education + spontaneous + induced, data=infert, family=binomial) GLM_summary(glm) GLM_summary(glm, robust="HC1", cluster="stratum")## Example 1: OLS regression lm = lm(Temp ~ Month + Day + Wind + Solar.R, data=airquality) GLM_summary(lm) GLM_summary(lm, robust="HC1") # Stata's default is "HC1" # R package <sandwich>'s default is "HC3" ## Example 2: Logistic regression glm = glm(case ~ age + parity + education + spontaneous + induced, data=infert, family=binomial) GLM_summary(glm) GLM_summary(glm, robust="HC1", cluster="stratum")
Compute grand-mean centered variables. Usually used for GLM interaction-term predictors and HLM level-2 predictors.
grand_mean_center(data, vars = names(data), std = FALSE, add.suffix = "")grand_mean_center(data, vars = names(data), std = FALSE, add.suffix = "")
data |
Data object. |
vars |
Variable(s) to be centered. |
std |
Standardized or not. Defaults to |
add.suffix |
The suffix of the centered variable(s). Defaults to |
A new data object containing the centered variable(s).
d = data.table(a=1:5, b=6:10) d.c = grand_mean_center(d, "a") d.c d.c = grand_mean_center(d, c("a", "b"), add.suffix="_center") d.cd = data.table(a=1:5, b=6:10) d.c = grand_mean_center(d, "a") d.c d.c = grand_mean_center(d, c("a", "b"), add.suffix="_center") d.c
Granger test of predictive causality (between multivariate time series) based on vector autoregression model using vars::VAR(). Its output resembles the output of the vargranger command in Stata (but here using an test).
granger_causality( varmodel, var.y = NULL, var.x = NULL, test = c("F", "Chisq"), file = NULL, check.dropped = FALSE )granger_causality( varmodel, var.y = NULL, var.x = NULL, test = c("F", "Chisq"), file = NULL, check.dropped = FALSE )
varmodel |
VAR model fitted using |
var.y, var.x
|
[Optional] Defaults to |
test |
|
file |
File name of MS Word ( |
check.dropped |
Check dropped variables. Defaults to |
Granger causality test (based on VAR model) examines whether the lagged values of a predictor (or predictors) help to predict an outcome when controlling for the lagged values of the outcome itself. Granger causality does not represent a true causal effect.
A data frame of results.
# R package "vars" should be installed library(vars) data(Canada) VARselect(Canada) vm = VAR(Canada, p=3) model_summary(vm) granger_causality(vm)# R package "vars" should be installed library(vars) data(Canada) VARselect(Canada) vm = VAR(Canada, p=3) model_summary(vm) granger_causality(vm)
Granger test of predictive causality (between two time series) using lmtest::grangertest().
granger_test(formula, data, lags = 1:5, test.reverse = TRUE, file = NULL, ...)granger_test(formula, data, lags = 1:5, test.reverse = TRUE, file = NULL, ...)
formula |
Model formula like |
data |
Data frame. |
lags |
Time lags. Defaults to |
test.reverse |
Whether to test reverse causality. Defaults to |
file |
File name of MS Word ( |
... |
Arguments passed on to |
Granger causality test examines whether the lagged values of a predictor have an incremental role in predicting (i.e., help to predict) an outcome when controlling for the lagged values of the outcome. Granger causality does not represent a true causal effect.
A data frame of results.
granger_test(chicken ~ egg, data=lmtest::ChickEgg) granger_test(chicken ~ egg, data=lmtest::ChickEgg, lags=1:10, file="Granger.doc") unlink("Granger.doc") # delete file for code checkgranger_test(chicken ~ egg, data=lmtest::ChickEgg) granger_test(chicken ~ egg, data=lmtest::ChickEgg, lags=1:10, file="Granger.doc") unlink("Granger.doc") # delete file for code check
Compute group-mean centered variables. Usually used for HLM level-1 predictors.
group_mean_center( data, vars = setdiff(names(data), by), by, std = FALSE, add.suffix = "", add.group.mean = "_mean" )group_mean_center( data, vars = setdiff(names(data), by), by, std = FALSE, add.suffix = "", add.group.mean = "_mean" )
data |
Data object. |
vars |
Variable(s) to be centered. |
by |
Grouping variable. |
std |
Standardized or not. Defaults to |
add.suffix |
The suffix of the centered variable(s). Defaults to |
add.group.mean |
The suffix of the variable name(s) of group means. Defaults to |
A new data object containing the centered variable(s).
d = data.table(x=1:9, g=rep(1:3, each=3)) d.c = group_mean_center(d, "x", by="g") d.c d.c = group_mean_center(d, "x", by="g", add.suffix="_c") d.cd = data.table(x=1:9, g=rep(1:3, each=3)) d.c = group_mean_center(d, "x", by="g") d.c d.c = group_mean_center(d, "x", by="g", add.suffix="_c") d.c
Compute ICC(1) (non-independence of data), ICC(2) (reliability of group means), and (within-group agreement for single-item/multi-item measures) in multilevel analysis (HLM).
HLM_ICC_rWG( data, group, icc.var, rwg.vars = icc.var, rwg.levels = 0, digits = 3 )HLM_ICC_rWG( data, group, icc.var, rwg.vars = icc.var, rwg.levels = 0, digits = 3 )
data |
Data frame. |
group |
Grouping variable. |
icc.var |
Key variable for analysis (usually the dependent variable). |
rwg.vars |
Defaults to
|
rwg.levels |
As
|
digits |
Number of decimal places of output. Defaults to |
Invisibly return a list of results.
ICC(1) = var.u0 / (var.u0 + var.e) =
ICC(1) is the ICC we often compute and report in multilevel analysis (usually in the Null Model, where only the random intercept of group is included). It can be interpreted as either "the proportion of variance explained by groups" (i.e., heterogeneity between groups) or "the expectation of correlation coefficient between any two observations within any group" (i.e., homogeneity within groups).
ICC(2) = mean(var.u0 / (var.u0 + var.e / n.k)) =
ICC(2) is a measure of "the representativeness of group-level aggregated means for within-group individual values" or "the degree to which an individual score can be considered a reliable assessment of a group-level construct".
/ (within-group agreement for single-item/multi-item measures)
is a measure of within-group agreement or consensus. Each group has an .
: between-group variance (i.e., tau00)
: within-group variance (i.e., residual variance)
: group size of the k-th group
: number of groups
: actual group variance of the k-th group
: mean value of actual group variance of the k-th group across all J items
: expected random variance (i.e., the variance of uniform distribution)
: number of items
Bliese, P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and Analysis. In K. J. Klein & S. W. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 349–381). San Francisco, CA: Jossey-Bass, Inc.
James, L.R., Demaree, R.G., & Wolf, G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology, 69, 85–98.
data = lme4::sleepstudy # continuous variable HLM_ICC_rWG(data, group="Subject", icc.var="Reaction") data = lmerTest::carrots # 7-point scale HLM_ICC_rWG(data, group="Consumer", icc.var="Preference", rwg.vars="Preference", rwg.levels=7) HLM_ICC_rWG(data, group="Consumer", icc.var="Preference", rwg.vars=c("Sweetness", "Bitter", "Crisp"), rwg.levels=7)data = lme4::sleepstudy # continuous variable HLM_ICC_rWG(data, group="Subject", icc.var="Reaction") data = lmerTest::carrots # 7-point scale HLM_ICC_rWG(data, group="Consumer", icc.var="Preference", rwg.vars="Preference", rwg.levels=7) HLM_ICC_rWG(data, group="Consumer", icc.var="Preference", rwg.vars=c("Sweetness", "Bitter", "Crisp"), rwg.levels=7)
lmer and glmer models).NOTE: model_summary() is preferred.
HLM_summary(model = NULL, test.rand = FALSE, digits = 3, ...)HLM_summary(model = NULL, test.rand = FALSE, digits = 3, ...)
model |
A model fitted with |
test.rand |
[Only for |
digits |
Number of decimal places of output. Defaults to |
... |
Other arguments. You may re-define |
No return value.
Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York, NY: Routledge.
Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R^2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4, 133–142.
Xu, R. (2003). Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22, 3527–3541.
print_table() (print simple table)
model_summary() (strongly suggested)
library(lmerTest) ## Example 1: data from lme4::sleepstudy # (1) 'Subject' is a grouping/clustering variable # (2) 'Days' is a level-1 predictor nested within 'Subject' # (3) No level-2 predictors m1 = lmer(Reaction ~ (1 | Subject), data=sleepstudy) m2 = lmer(Reaction ~ Days + (1 | Subject), data=sleepstudy) m3 = lmer(Reaction ~ Days + (Days | Subject), data=sleepstudy) HLM_summary(m1) HLM_summary(m2) HLM_summary(m3) ## Example 2: data from lmerTest::carrots # (1) 'Consumer' is a grouping/clustering variable # (2) 'Sweetness' is a level-1 predictor # (3) 'Age' and 'Frequency' are level-2 predictors hlm.1 = lmer(Preference ~ Sweetness + Age + Frequency + (1 | Consumer), data=carrots) hlm.2 = lmer(Preference ~ Sweetness + Age + Frequency + (Sweetness | Consumer) + (1 | Product), data=carrots) HLM_summary(hlm.1) HLM_summary(hlm.2)library(lmerTest) ## Example 1: data from lme4::sleepstudy # (1) 'Subject' is a grouping/clustering variable # (2) 'Days' is a level-1 predictor nested within 'Subject' # (3) No level-2 predictors m1 = lmer(Reaction ~ (1 | Subject), data=sleepstudy) m2 = lmer(Reaction ~ Days + (1 | Subject), data=sleepstudy) m3 = lmer(Reaction ~ Days + (Days | Subject), data=sleepstudy) HLM_summary(m1) HLM_summary(m2) HLM_summary(m3) ## Example 2: data from lmerTest::carrots # (1) 'Consumer' is a grouping/clustering variable # (2) 'Sweetness' is a level-1 predictor # (3) 'Age' and 'Frequency' are level-2 predictors hlm.1 = lmer(Preference ~ Sweetness + Age + Frequency + (1 | Consumer), data=carrots) hlm.2 = lmer(Preference ~ Sweetness + Age + Frequency + (Sweetness | Consumer) + (1 | Product), data=carrots) HLM_summary(hlm.1) HLM_summary(hlm.2)
Import data from a file, with format automatically judged from file extension. This function is inspired by rio::import() and has several modifications. Its purpose is to avoid using lots of read_xxx() functions in your code and to provide one tidy function for data import. It supports many file formats (local or URL) and uses the corresponding R functions:
Plain text (.txt, .csv, .csv2, .tsv, .psv), using data.table::fread()
Excel (.xls, .xlsx), using readxl::read_excel()
SPSS (.sav), using haven::read_sav() or foreign::read.spss()
Stata (.dta), using haven::read_dta() or foreign::read.dta()
R objects (.rda, .rdata, .RData), using load()
R serialized objects (.rds), using readRDS()
Clipboard (on Windows and Mac OS), using clipr::read_clip_tbl()
Other formats, using rio::import()
import( file, encoding = NULL, header = "auto", sheet = NULL, range = NULL, pkg = c("haven", "foreign"), value.labels = FALSE, as = "data.frame", verbose = FALSE )import( file, encoding = NULL, header = "auto", sheet = NULL, range = NULL, pkg = c("haven", "foreign"), value.labels = FALSE, as = "data.frame", verbose = FALSE )
file |
File name (with extension). If unspecified, then data will be imported from clipboard. |
encoding |
File encoding. Defaults to Options: If you find messy code for Chinese text in the imported data, it is usually effective to set |
header |
Does the first row contain column names ( |
sheet |
[Only for Excel] Excel sheet name (or sheet number). Defaults to the first sheet. Ignored if the sheet is specified via |
range |
[Only for Excel] Excel cell range. Defaults to all cells in a sheet. You may specify it as |
pkg |
[Only for SPSS & Stata] Use which R package to read
SPSS (.sav) or Stata (.dta) data file? Defaults to Notably, |
value.labels |
[Only for SPSS & Stata] Convert variables with value labels into R factors with those levels? Defaults to |
as |
Class of the imported data. Defaults to Options:
|
verbose |
Print data information? Defaults to |
A data object (default class is data.frame).
## Not run: # Import data from system clipboard data = import() # read from clipboard (on Windows and Mac OS) # If you have an Excel file named "mydata.xlsx" export(airquality, file="mydata.xlsx") # Import data from a file data = import("mydata.xlsx") # default: data.frame data = import("mydata.xlsx", as="data.table") ## End(Not run)## Not run: # Import data from system clipboard data = import() # read from clipboard (on Windows and Mac OS) # If you have an Excel file named "mydata.xlsx" export(airquality, file="mydata.xlsx") # Import data from a file data = import("mydata.xlsx") # default: data.frame data = import("mydata.xlsx", as="data.table") ## End(Not run)
Tidy report of lavaan model.
lavaan_summary( lavaan, ci = c("raw", "boot", "bc.boot", "bca.boot"), nsim = 100, seed = NULL, digits = 3, print = TRUE, covariance = FALSE, file = NULL )lavaan_summary( lavaan, ci = c("raw", "boot", "bc.boot", "bca.boot"), nsim = 100, seed = NULL, digits = 3, print = TRUE, covariance = FALSE, file = NULL )
lavaan |
Model object fitted by |
ci |
Method for estimating standard error (SE) and 95% confidence interval (CI). Defaults to
|
nsim |
Number of simulation samples (bootstrap resampling)
for estimating SE and 95% CI.
In formal analyses, |
seed |
Random seed for obtaining reproducible results. Defaults to |
digits |
Number of decimal places of output. Defaults to |
print |
Print results. Defaults to |
covariance |
Print (co)variances. Defaults to |
file |
File name of MS Word ( |
Invisibly return a list of results:
fitModel fit indices.
measureLatent variable measures.
regressionRegression paths.
covarianceVariances and/or covariances.
effectDefined effect estimates.
## Simple Mediation: ## Solar.R (X) => Ozone (M) => Temp (Y) # PROCESS(airquality, y="Temp", x="Solar.R", # meds="Ozone", ci="boot", nsim=1000, seed=1) model = " Ozone ~ a*Solar.R Temp ~ c.*Solar.R + b*Ozone Indirect := a*b Direct := c. Total := c. + a*b " lv = lavaan::sem(model=model, data=airquality) lavaan::summary(lv, fit.measure=TRUE, ci=TRUE, nd=3) # raw output lavaan_summary(lv) # lavaan_summary(lv, ci="boot", nsim=1000, seed=1) ## Serial Multiple Mediation: ## Solar.R (X) => Ozone (M1) => Wind(M2) => Temp (Y) # PROCESS(airquality, y="Temp", x="Solar.R", # meds=c("Ozone", "Wind"), # med.type="serial", ci="boot", nsim=1000, seed=1) model0 = " Ozone ~ a1*Solar.R Wind ~ a2*Solar.R + d12*Ozone Temp ~ c.*Solar.R + b1*Ozone + b2*Wind Indirect_All := a1*b1 + a2*b2 + a1*d12*b2 Ind_X_M1_Y := a1*b1 Ind_X_M2_Y := a2*b2 Ind_X_M1_M2_Y := a1*d12*b2 Direct := c. Total := c. + a1*b1 + a2*b2 + a1*d12*b2 " lv0 = lavaan::sem(model=model0, data=airquality) lavaan::summary(lv0, fit.measure=TRUE, ci=TRUE, nd=3) # raw output lavaan_summary(lv0) # lavaan_summary(lv0, ci="boot", nsim=1000, seed=1) model1 = " Ozone ~ a1*Solar.R Wind ~ d12*Ozone Temp ~ c.*Solar.R + b1*Ozone + b2*Wind Indirect_All := a1*b1 + a1*d12*b2 Ind_X_M1_Y := a1*b1 Ind_X_M1_M2_Y := a1*d12*b2 Direct := c. Total := c. + a1*b1 + a1*d12*b2 " lv1 = lavaan::sem(model=model1, data=airquality) lavaan::summary(lv1, fit.measure=TRUE, ci=TRUE, nd=3) # raw output lavaan_summary(lv1) # lavaan_summary(lv1, ci="boot", nsim=1000, seed=1)## Simple Mediation: ## Solar.R (X) => Ozone (M) => Temp (Y) # PROCESS(airquality, y="Temp", x="Solar.R", # meds="Ozone", ci="boot", nsim=1000, seed=1) model = " Ozone ~ a*Solar.R Temp ~ c.*Solar.R + b*Ozone Indirect := a*b Direct := c. Total := c. + a*b " lv = lavaan::sem(model=model, data=airquality) lavaan::summary(lv, fit.measure=TRUE, ci=TRUE, nd=3) # raw output lavaan_summary(lv) # lavaan_summary(lv, ci="boot", nsim=1000, seed=1) ## Serial Multiple Mediation: ## Solar.R (X) => Ozone (M1) => Wind(M2) => Temp (Y) # PROCESS(airquality, y="Temp", x="Solar.R", # meds=c("Ozone", "Wind"), # med.type="serial", ci="boot", nsim=1000, seed=1) model0 = " Ozone ~ a1*Solar.R Wind ~ a2*Solar.R + d12*Ozone Temp ~ c.*Solar.R + b1*Ozone + b2*Wind Indirect_All := a1*b1 + a2*b2 + a1*d12*b2 Ind_X_M1_Y := a1*b1 Ind_X_M2_Y := a2*b2 Ind_X_M1_M2_Y := a1*d12*b2 Direct := c. Total := c. + a1*b1 + a2*b2 + a1*d12*b2 " lv0 = lavaan::sem(model=model0, data=airquality) lavaan::summary(lv0, fit.measure=TRUE, ci=TRUE, nd=3) # raw output lavaan_summary(lv0) # lavaan_summary(lv0, ci="boot", nsim=1000, seed=1) model1 = " Ozone ~ a1*Solar.R Wind ~ d12*Ozone Temp ~ c.*Solar.R + b1*Ozone + b2*Wind Indirect_All := a1*b1 + a1*d12*b2 Ind_X_M1_Y := a1*b1 Ind_X_M1_M2_Y := a1*d12*b2 Direct := c. Total := c. + a1*b1 + a1*d12*b2 " lv1 = lavaan::sem(model=model1, data=airquality) lavaan::summary(lv1, fit.measure=TRUE, ci=TRUE, nd=3) # raw output lavaan_summary(lv1) # lavaan_summary(lv1, ci="boot", nsim=1000, seed=1)
INDEX + MATCH).In Excel, we can use VLOOKUP, HLOOKUP, XLOOKUP (a new function released in 2019), or the combination of INDEX and MATCH to search, match, and look up values. Here I provide a similar function. If multiple values were simultaneously matched, a warning message would be printed.
LOOKUP( data, vars, data.ref, vars.ref, vars.lookup, return = c("new.data", "new.var", "new.value") )LOOKUP( data, vars, data.ref, vars.ref, vars.lookup, return = c("new.data", "new.var", "new.value") )
data |
Main data. |
vars |
Character (vector), specifying the variable(s) to be searched in |
data.ref |
Reference data containing both the reference variable(s) and the lookup variable(s). |
vars.ref |
Character (vector), with the same length and order as |
vars.lookup |
Character (vector), specifying the variable(s) to be looked up and returned from |
return |
What to return. Default ( |
New data object, new variable, or new value (see the argument return).
ref = data.table(City=rep(c("A", "B", "C"), each=5), Year=rep(2013:2017, times=3), GDP=sample(1000:2000, 15), PM2.5=sample(10:300, 15)) ref data = data.table(sub=1:5, city=c("A", "A", "B", "C", "C"), year=c(2013, 2014, 2015, 2016, 2017)) data LOOKUP(data, c("city", "year"), ref, c("City", "Year"), "GDP") LOOKUP(data, c("city", "year"), ref, c("City", "Year"), c("GDP", "PM2.5"))ref = data.table(City=rep(c("A", "B", "C"), each=5), Year=rep(2013:2017, times=3), GDP=sample(1000:2000, 15), PM2.5=sample(10:300, 15)) ref data = data.table(sub=1:5, city=c("A", "A", "B", "C", "C"), year=c(2013, 2014, 2015, 2016, 2017)) data LOOKUP(data, c("city", "year"), ref, c("City", "Year"), "GDP") LOOKUP(data, c("city", "year"), ref, c("City", "Year"), c("GDP", "PM2.5"))
Multi-factor ANOVA (between-subjects, within-subjects, and mixed designs), with and without covariates (ANCOVA). This function is based on and extends afex::aov_ez(). You only need to specify the data, dependent variable(s), and factors (between-subjects and/or within-subjects). Almost all results you need will be displayed together, including effect sizes (partial ) and their confidence intervals (CIs). 90% CIs for partial (two-sided) are reported, following Steiger (2004). In addition, it reports generalized , following Olejnik & Algina (2003).
MANOVA( data, subID = NULL, dv = NULL, dvs = NULL, dvs.pattern = NULL, between = NULL, within = NULL, covariate = NULL, ss.type = "III", sph.correction = "none", aov.include = FALSE, digits = 3, file = NULL )MANOVA( data, subID = NULL, dv = NULL, dvs = NULL, dvs.pattern = NULL, between = NULL, within = NULL, covariate = NULL, ss.type = "III", sph.correction = "none", aov.include = FALSE, digits = 3, file = NULL )
data |
Data frame. Both wide-format and long-format are supported. |
subID |
Subject ID (the column name). Only necessary for long-format data. |
dv |
Dependent variable.
|
dvs |
Repeated measures. Only for wide-format data (within-subjects or mixed designs). Can be:
|
dvs.pattern |
If you use Examples:
Tips on regular expression:
|
between |
Between-subjects factor(s). Multiple variables should be included in a character vector |
within |
Within-subjects factor(s). Multiple variables should be included in a character vector |
covariate |
Covariates. Multiple variables should be included in a character vector |
ss.type |
Type of sums of squares (SS) for ANOVA. Defaults to Options: |
sph.correction |
[Only for repeated measures with >= 3 levels] Sphericity correction method for adjusting the degrees of freedom (df) when the sphericity assumption is violated. Defaults to |
aov.include |
Include the |
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
A result object (list) returned by afex::aov_ez() with several other elements: between, within, data.wide, data.long.
How to prepare your data and specify the arguments of MANOVA()?
Wide-format data (one person in one row, and repeated measures in multiple columns):
MANOVA(data=, dv=, between=, ...)
MANOVA(data=, dvs=, dvs.pattern=, within=, ...)
MANOVA(data=, dvs=, dvs.pattern=, between=, within=, ...)
Long-format data (one person in multiple rows, and repeated measures in one column):
(not applicable)
MANOVA(data=, subID=, dv=, within=, ...)
MANOVA(data=, subID=, dv=, between=, within=, ...)
If observations are not uniquely identified in user-defined long-format data, the function takes averages across those multiple observations for each case. In technical details, it specifies fun_aggregate=mean in afex::aov_ez() and values_fn=mean in tidyr::pivot_wider().
You can save the returned object and use emmeans::emmip() to create an interaction plot (based on the fitted model and a formula specification). It returns a ggplot object, which can be easily modified and saved using ggplot2 syntax.
Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychological Methods, 8(4), 434–447.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9(2), 164–182.
#### Between-Subjects Design #### between.1 MANOVA(between.1, dv="SCORE", between="A") between.2 MANOVA(between.2, dv="SCORE", between=c("A", "B")) between.3 MANOVA(between.3, dv="SCORE", between=c("A", "B", "C")) ## How to create an interaction plot using `emmeans::emmip()`? ## See help page for its usage: ?emmeans::emmip() m = MANOVA(between.2, dv="SCORE", between=c("A", "B")) emmip(m, ~ A | B, CIs=TRUE) emmip(m, ~ B | A, CIs=TRUE) emmip(m, B ~ A, CIs=TRUE) emmip(m, A ~ B, CIs=TRUE) #### Within-Subjects Design #### within.1 MANOVA(within.1, dvs="A1:A4", dvs.pattern="A(.)", within="A") ## the same: MANOVA(within.1, dvs=c("A1", "A2", "A3", "A4"), dvs.pattern="A(.)", within="MyFactor") # renamed the within-subjects factor within.2 MANOVA(within.2, dvs="A1B1:A2B3", dvs.pattern="A(.)B(.)", within=c("A", "B")) within.3 MANOVA(within.3, dvs="A1B1C1:A2B2C2", dvs.pattern="A(.)B(.)C(.)", within=c("A", "B", "C")) #### Mixed Design #### mixed.2_1b1w MANOVA(mixed.2_1b1w, dvs="B1:B3", dvs.pattern="B(.)", between="A", within="B") MANOVA(mixed.2_1b1w, dvs="B1:B3", dvs.pattern="B(.)", between="A", within="B", sph.correction="GG") mixed.3_1b2w MANOVA(mixed.3_1b2w, dvs="B1C1:B2C2", dvs.pattern="B(.)C(.)", between="A", within=c("B", "C")) mixed.3_2b1w MANOVA(mixed.3_2b1w, dvs="B1:B2", dvs.pattern="B(.)", between=c("A", "C"), within="B") #### Other Examples #### data.new = mixed.3_1b2w names(data.new) = c("Group", "Cond_01", "Cond_02", "Cond_03", "Cond_04") MANOVA(data.new, dvs="Cond_01:Cond_04", dvs.pattern="Cond_(..)", between="Group", within="Condition") # rename the factor # ?afex::obk.long MANOVA(afex::obk.long, subID="id", dv="value", between=c("treatment", "gender"), within=c("phase", "hour"), cov="age", sph.correction="GG")#### Between-Subjects Design #### between.1 MANOVA(between.1, dv="SCORE", between="A") between.2 MANOVA(between.2, dv="SCORE", between=c("A", "B")) between.3 MANOVA(between.3, dv="SCORE", between=c("A", "B", "C")) ## How to create an interaction plot using `emmeans::emmip()`? ## See help page for its usage: ?emmeans::emmip() m = MANOVA(between.2, dv="SCORE", between=c("A", "B")) emmip(m, ~ A | B, CIs=TRUE) emmip(m, ~ B | A, CIs=TRUE) emmip(m, B ~ A, CIs=TRUE) emmip(m, A ~ B, CIs=TRUE) #### Within-Subjects Design #### within.1 MANOVA(within.1, dvs="A1:A4", dvs.pattern="A(.)", within="A") ## the same: MANOVA(within.1, dvs=c("A1", "A2", "A3", "A4"), dvs.pattern="A(.)", within="MyFactor") # renamed the within-subjects factor within.2 MANOVA(within.2, dvs="A1B1:A2B3", dvs.pattern="A(.)B(.)", within=c("A", "B")) within.3 MANOVA(within.3, dvs="A1B1C1:A2B2C2", dvs.pattern="A(.)B(.)C(.)", within=c("A", "B", "C")) #### Mixed Design #### mixed.2_1b1w MANOVA(mixed.2_1b1w, dvs="B1:B3", dvs.pattern="B(.)", between="A", within="B") MANOVA(mixed.2_1b1w, dvs="B1:B3", dvs.pattern="B(.)", between="A", within="B", sph.correction="GG") mixed.3_1b2w MANOVA(mixed.3_1b2w, dvs="B1C1:B2C2", dvs.pattern="B(.)C(.)", between="A", within=c("B", "C")) mixed.3_2b1w MANOVA(mixed.3_2b1w, dvs="B1:B2", dvs.pattern="B(.)", between=c("A", "C"), within="B") #### Other Examples #### data.new = mixed.3_1b2w names(data.new) = c("Group", "Cond_01", "Cond_02", "Cond_03", "Cond_04") MANOVA(data.new, dvs="Cond_01:Cond_04", dvs.pattern="Cond_(..)", between="Group", within="Condition") # rename the factor # ?afex::obk.long MANOVA(afex::obk.long, subID="id", dv="value", between=c("treatment", "gender"), within=c("phase", "hour"), cov="age", sph.correction="GG")
Tidy report of mediation analysis, which is performed using mediation::mediate().
med_summary(model, digits = 3, file = NULL)med_summary(model, digits = 3, file = NULL)
model |
Mediation model built with |
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
Invisibly return a data frame containing the results.
## Not run: library(mediation) # ?mediation::mediate ## Example 1: OLS Regression ## Bias-corrected and accelerated (BCa) bootstrap confidence intervals ## Hypothesis: Solar radiation -> Ozone -> Daily temperature lm.m = lm(Ozone ~ Solar.R + Month + Wind, data=airquality) lm.y = lm(Temp ~ Ozone + Solar.R + Month + Wind, data=airquality) set.seed(123) # set a random seed for reproduction med = mediate(lm.m, lm.y, treat="Solar.R", mediator="Ozone", sims=1000, boot=TRUE, boot.ci.type="bca") med_summary(med) ## Example 2: Multilevel Linear Model (Linear Mixed Model) ## (models must be fit using "lme4::lmer" rather than "lmerTest::lmer") ## Monte Carlo simulation (quasi-Bayesian approximation) ## (bootstrap method is not applicable to "lmer" models) ## Hypothesis: Crips -> Sweetness -> Preference (for carrots) data = lmerTest::carrots # long-format data data = na.omit(data) # omit missing values lmm.m = lme4::lmer(Sweetness ~ Crisp + Gender + Age + (1 | Consumer), data=data) lmm.y = lme4::lmer(Preference ~ Sweetness + Crisp + Gender + Age + (1 | Consumer), data=data) set.seed(123) # set a random seed for reproduction med.lmm = mediate(lmm.m, lmm.y, treat="Crisp", mediator="Sweetness", sims=1000) med_summary(med.lmm) ## End(Not run)## Not run: library(mediation) # ?mediation::mediate ## Example 1: OLS Regression ## Bias-corrected and accelerated (BCa) bootstrap confidence intervals ## Hypothesis: Solar radiation -> Ozone -> Daily temperature lm.m = lm(Ozone ~ Solar.R + Month + Wind, data=airquality) lm.y = lm(Temp ~ Ozone + Solar.R + Month + Wind, data=airquality) set.seed(123) # set a random seed for reproduction med = mediate(lm.m, lm.y, treat="Solar.R", mediator="Ozone", sims=1000, boot=TRUE, boot.ci.type="bca") med_summary(med) ## Example 2: Multilevel Linear Model (Linear Mixed Model) ## (models must be fit using "lme4::lmer" rather than "lmerTest::lmer") ## Monte Carlo simulation (quasi-Bayesian approximation) ## (bootstrap method is not applicable to "lmer" models) ## Hypothesis: Crips -> Sweetness -> Preference (for carrots) data = lmerTest::carrots # long-format data data = na.omit(data) # omit missing values lmm.m = lme4::lmer(Sweetness ~ Crisp + Gender + Age + (1 | Consumer), data=data) lmm.y = lme4::lmer(Preference ~ Sweetness + Crisp + Gender + Age + (1 | Consumer), data=data) set.seed(123) # set a random seed for reproduction med.lmm = mediate(lmm.m, lmm.y, treat="Crisp", mediator="Sweetness", sims=1000) med_summary(med.lmm) ## End(Not run)
Tidy report of regression models (most model types are supported). This function uses:
model_summary( model.list, std = FALSE, digits = 3, file = NULL, check = TRUE, zero = ifelse(std, FALSE, TRUE), modify.se = NULL, modify.head = NULL, line = TRUE, bold = 0, ... )model_summary( model.list, std = FALSE, digits = 3, file = NULL, check = TRUE, zero = ifelse(std, FALSE, TRUE), modify.se = NULL, modify.head = NULL, line = TRUE, bold = 0, ... )
model.list |
A single model or a list of (various types of) models. Most types of regression models are supported! |
std |
Standardized coefficients? Defaults to |
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
check |
If there is only one model in |
zero |
Display "0" before "."? Defaults to |
modify.se |
Modify standard errors. Useful if you need to change raw SEs to robust SEs. New SEs should be provided as a list of numeric vectors. See usage in |
modify.head |
Modify model names. |
line |
Lines look like true line ( |
bold |
The p-value threshold below which the coefficients will be formatted in bold. |
... |
Arguments passed on to |
Invisibly return the output (character string).
print_table() (print simple table)
#### Example 1: Linear Model #### lm1 = lm(Temp ~ Month + Day, data=airquality) lm2 = lm(Temp ~ Month + Day + Wind + Solar.R, data=airquality) model_summary(lm1) model_summary(lm2) model_summary(list(lm1, lm2)) model_summary(list(lm1, lm2), std=TRUE, digits=2) model_summary(list(lm1, lm2), file="OLS Models.doc") unlink("OLS Models.doc") # delete file for code check #### Example 2: Generalized Linear Model #### glm1 = glm(case ~ age + parity, data=infert, family=binomial) glm2 = glm(case ~ age + parity + education + spontaneous + induced, data=infert, family=binomial) model_summary(list(glm1, glm2)) # "std" is not applicable to glm model_summary(list(glm1, glm2), file="GLM Models.doc") unlink("GLM Models.doc") # delete file for code check #### Example 3: Linear Mixed Model #### library(lmerTest) hlm1 = lmer(Reaction ~ (1 | Subject), data=sleepstudy) hlm2 = lmer(Reaction ~ Days + (1 | Subject), data=sleepstudy) hlm3 = lmer(Reaction ~ Days + (Days | Subject), data=sleepstudy) model_summary(list(hlm1, hlm2, hlm3)) model_summary(list(hlm1, hlm2, hlm3), std=TRUE) model_summary(list(hlm1, hlm2, hlm3), file="HLM Models.doc") unlink("HLM Models.doc") # delete file for code check #### Example 4: Generalized Linear Mixed Model #### library(lmerTest) data.glmm = MASS::bacteria glmm1 = glmer(y ~ trt + week + (1 | ID), data=data.glmm, family=binomial) glmm2 = glmer(y ~ trt + week + hilo + (1 | ID), data=data.glmm, family=binomial) model_summary(list(glmm1, glmm2)) # "std" is not applicable to glmm model_summary(list(glmm1, glmm2), file="GLMM Models.doc") unlink("GLMM Models.doc") # delete file for code check #### Example 5: Multinomial Logistic Model #### library(nnet) d = airquality d$Month = as.factor(d$Month) # Factor levels: 5, 6, 7, 8, 9 mn1 = multinom(Month ~ Temp, data=d, Hess=TRUE) mn2 = multinom(Month ~ Temp + Wind + Ozone, data=d, Hess=TRUE) model_summary(mn1) model_summary(mn2) model_summary(mn2, file="Multinomial Logistic Model.doc") unlink("Multinomial Logistic Model.doc") # delete file for code check#### Example 1: Linear Model #### lm1 = lm(Temp ~ Month + Day, data=airquality) lm2 = lm(Temp ~ Month + Day + Wind + Solar.R, data=airquality) model_summary(lm1) model_summary(lm2) model_summary(list(lm1, lm2)) model_summary(list(lm1, lm2), std=TRUE, digits=2) model_summary(list(lm1, lm2), file="OLS Models.doc") unlink("OLS Models.doc") # delete file for code check #### Example 2: Generalized Linear Model #### glm1 = glm(case ~ age + parity, data=infert, family=binomial) glm2 = glm(case ~ age + parity + education + spontaneous + induced, data=infert, family=binomial) model_summary(list(glm1, glm2)) # "std" is not applicable to glm model_summary(list(glm1, glm2), file="GLM Models.doc") unlink("GLM Models.doc") # delete file for code check #### Example 3: Linear Mixed Model #### library(lmerTest) hlm1 = lmer(Reaction ~ (1 | Subject), data=sleepstudy) hlm2 = lmer(Reaction ~ Days + (1 | Subject), data=sleepstudy) hlm3 = lmer(Reaction ~ Days + (Days | Subject), data=sleepstudy) model_summary(list(hlm1, hlm2, hlm3)) model_summary(list(hlm1, hlm2, hlm3), std=TRUE) model_summary(list(hlm1, hlm2, hlm3), file="HLM Models.doc") unlink("HLM Models.doc") # delete file for code check #### Example 4: Generalized Linear Mixed Model #### library(lmerTest) data.glmm = MASS::bacteria glmm1 = glmer(y ~ trt + week + (1 | ID), data=data.glmm, family=binomial) glmm2 = glmer(y ~ trt + week + hilo + (1 | ID), data=data.glmm, family=binomial) model_summary(list(glmm1, glmm2)) # "std" is not applicable to glmm model_summary(list(glmm1, glmm2), file="GLMM Models.doc") unlink("GLMM Models.doc") # delete file for code check #### Example 5: Multinomial Logistic Model #### library(nnet) d = airquality d$Month = as.factor(d$Month) # Factor levels: 5, 6, 7, 8, 9 mn1 = multinom(Month ~ Temp, data=d, Hess=TRUE) mn2 = multinom(Month ~ Temp + Wind + Ozone, data=d, Hess=TRUE) model_summary(mn1) model_summary(mn2) model_summary(mn2, file="Multinomial Logistic Model.doc") unlink("Multinomial Logistic Model.doc") # delete file for code check
Compute p value.
p( z = NULL, t = NULL, f = NULL, r = NULL, chi2 = NULL, n = NULL, df = NULL, df1 = NULL, df2 = NULL, digits = 2 ) p.z(z) p.t(t, df) p.f(f, df1, df2) p.r(r, n) p.chi2(chi2, df)p( z = NULL, t = NULL, f = NULL, r = NULL, chi2 = NULL, n = NULL, df = NULL, df1 = NULL, df2 = NULL, digits = 2 ) p.z(z) p.t(t, df) p.f(f, df1, df2) p.r(r, n) p.chi2(chi2, df)
z, t, f, r, chi2
|
|
n, df, df1, df2
|
Sample size or degree of freedom. |
digits |
Number of decimal places of output. Defaults to |
p value statistics.
p.z(): Two-tailed p value of .
p.t(): Two-tailed p value of .
p.f(): One-tailed p value of . (Note: test is one-tailed only.)
p.r(): Two-tailed p value of .
p.chi2(): One-tailed p value of . (Note: test is one-tailed only.)
p.z(1.96) p.t(2, 100) p.f(4, 1, 100) p.r(0.2, 100) p.chi2(3.84, 1) p(z=1.96) p(t=2, df=100) p(f=4, df1=1, df2=100) p(r=0.2, n=100) p(chi2=3.84, df=1)p.z(1.96) p.t(2, 100) p.f(4, 1, 100) p.r(0.2, 100) p.chi2(3.84, 1) p(z=1.96) p(t=2, df=100) p(f=4, df1=1, df2=100) p(r=0.2, n=100) p(chi2=3.84, df=1)
Check dependencies of R packages.
pkg_depend(pkgs, excludes = NULL)pkg_depend(pkgs, excludes = NULL)
pkgs |
Package(s). |
excludes |
[Optional] Package(s) and their dependencies excluded from the dependencies of |
A character vector of package names.
Frustrated with print() and cat()? Try this!
Run examples to see what it can do.
Print(...) Glue(...)Print(...) Glue(...)
... |
Character strings enclosed by Character strings enclosed by Long strings are broken by line and concatenated together. Leading whitespace and blank lines from the first and last lines are automatically trimmed. |
Possible formats/colors that can be used in "<< >>" include:
(1) bold, italic, underline, reset, blurred, inverse, hidden, strikethrough;
(2) black, white, silver, red, green, blue, yellow, cyan, magenta;
(3) bgBlack, bgWhite, bgRed, bgGreen, bgBlue, bgYellow, bgCyan, bgMagenta.
See more details in glue::glue() and glue::glue_col().
Formatted text.
Print(): Paste and print strings.
Glue(): Paste strings.
name = "Bruce" Print("My name is <<underline <<bold {name}>>>>. <<bold <<blue Pi = {pi:.15}.>>>> <<italic <<green 1 + 1 = {1 + 1}.>>>> sqrt({x}) = <<red {sqrt(x):.3}>>", x=10)name = "Bruce" Print("My name is <<underline <<bold {name}>>>>. <<bold <<blue Pi = {pi:.15}.>>>> <<italic <<green 1 + 1 = {1 + 1}.>>>> sqrt({x}) = <<red {sqrt(x):.3}>>", x=10)
This basic function prints any data frame as a three-line table to either R Console or Microsoft Word (.doc). It has been used in many other functions of bruceR (see below).
print_table( x, digits = 3, nspaces = 1, row.names = TRUE, col.names = TRUE, title = "", note = "", append = "", line = TRUE, file = NULL, file.align.head = "auto", file.align.text = "auto" )print_table( x, digits = 3, nspaces = 1, row.names = TRUE, col.names = TRUE, title = "", note = "", append = "", line = TRUE, file = NULL, file.align.head = "auto", file.align.text = "auto" )
x |
Matrix, data.frame (or data.table), or any model object (e.g., |
digits |
Numeric vector specifying the number of decimal places of output. Defaults to |
nspaces |
Number of whitespaces between columns. Defaults to |
row.names, col.names
|
Print row/column names. Defaults to |
title |
Title text, which will be inserted in |
note |
Note text, which will be inserted in |
append |
Other contents, which will be appended in the end (HTML code). |
line |
Lines looks like true line ( |
file |
File name of MS Word ( |
file.align.head, file.align.text
|
Alignment of table head or table text: |
Invisibly return a list of data frame and HTML code.
print_table(data.frame(x=1)) print_table(airquality, file="airquality.doc") unlink("airquality.doc") # delete file for code check model = lm(Temp ~ Month + Day + Wind + Solar.R, data=airquality) print_table(model) print_table(model, file="model.doc") unlink("model.doc") # delete file for code checkprint_table(data.frame(x=1)) print_table(airquality, file="airquality.doc") unlink("airquality.doc") # delete file for code check model = lm(Temp ~ Month + Day + Wind + Solar.R, data=airquality) print_table(model) print_table(model, file="model.doc") unlink("model.doc") # delete file for code check
Model-based mediation and moderation analyses (i.e., using raw regression model objects with distinct R packages, BUT NOT with the SPSS PROCESS Macro, to estimate effects in mediation/moderation models).
NOTE: PROCESS() DOES NOT use or transform any code or macro from the original SPSS PROCESS macro developed by Hayes, though its output would link model settings to a PROCESS Model ID in Hayes's numbering system.
To use PROCESS() in publications, please cite not only bruceR but also the following R packages:
interactions::sim_slopes() is used to estimate simple slopes (and conditional direct effects) in moderation, moderated moderation, and moderated mediation models (for PROCESS Model IDs 1, 2, 3, 5, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 58, 59, 72, 73, 75, 76).
mediation::mediate() is used to estimate (conditional) indirect effects in (moderated) mediation models (for PROCESS Model IDs 4, 5, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 58, 59, 72, 73, 75, 76).
lavaan::sem() is used to perform serial multiple mediation analysis (for PROCESS Model ID 6).
PROCESS( data, y = "", x = "", meds = c(), mods = c(), covs = c(), clusters = c(), hlm.re.m = "", hlm.re.y = "", hlm.type = c("1-1-1", "2-1-1", "2-2-1"), med.type = c("parallel", "serial"), mod.type = c("2-way", "3-way"), mod.path = c("x-y", "x-m", "m-y", "all"), cov.path = c("y", "m", "both"), mod1.val = NULL, mod2.val = NULL, ci = c("boot", "bc.boot", "bca.boot", "mcmc"), nsim = 100, seed = NULL, center = TRUE, std = FALSE, digits = 3, file = NULL )PROCESS( data, y = "", x = "", meds = c(), mods = c(), covs = c(), clusters = c(), hlm.re.m = "", hlm.re.y = "", hlm.type = c("1-1-1", "2-1-1", "2-2-1"), med.type = c("parallel", "serial"), mod.type = c("2-way", "3-way"), mod.path = c("x-y", "x-m", "m-y", "all"), cov.path = c("y", "m", "both"), mod1.val = NULL, mod2.val = NULL, ci = c("boot", "bc.boot", "bca.boot", "mcmc"), nsim = 100, seed = NULL, center = TRUE, std = FALSE, digits = 3, file = NULL )
data |
Data frame. |
y, x
|
Variable name of outcome (Y) and predictor (X).
|
meds |
Variable name(s) of mediator(s) (M). Use
|
mods |
Variable name(s) of 0~2 moderator(s) (W). Use
|
covs |
Variable name(s) of covariate(s) (i.e., control variables). Use
|
clusters |
HLM (multilevel) cluster(s): e.g., |
hlm.re.m, hlm.re.y
|
HLM (multilevel) random effect term of M model and Y model. By default, it converts You may specify these arguments to include more complex terms: e.g., random slopes |
hlm.type |
HLM (multilevel) mediation type (levels of "X-M-Y"): |
med.type |
Type of mediator: |
mod.type |
Type of moderator: |
mod.path |
Which path(s) do the moderator(s) influence? |
cov.path |
Which path(s) do the control variable(s) influence? |
mod1.val, mod2.val
|
By default ( |
ci |
Method for estimating the standard error (SE) and 95% confidence interval (CI) of indirect effect(s). Defaults to
Note that these methods never apply to the estimates of simple slopes. You should not report the 95% CIs of simple slopes as Bootstrap or Monte Carlo CIs, because they are just standard CIs without any resampling method. |
nsim |
Number of simulation samples (bootstrap resampling or Monte Carlo simulation) for estimating SE and 95% CI. Defaults to |
seed |
Random seed for reproducible results. Defaults to |
center |
Centering numeric (continuous) predictors? Defaults to |
std |
Standardizing variables to get standardized coefficients? Defaults to |
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
Invisibly return a list of results:
process.idPROCESS Model ID (in Hayes's numbering system).
process.typePROCESS model type.
model.mMediator (M) model(s) (a list of multiple models).
model.yOutcome (Y) model.
resultsEffect estimates and other results (unnamed list object).
Two parts of results are printed:
PART 1. Regression model summary
PART 2. Mediation/moderation effect estimates
PROCESS() DOES NOT use or transform any code or macro from the original SPSS PROCESS macro developed by Hayes, though its output would link model settings to a PROCESS Model ID in Hayes's numbering system.
DO NOT state that "the bruceR package runs the PROCESS Model Code developed by Hayes (2018)" — it was not the truth. The bruceR package only links results to Hayes's numbering system but never uses his code.
To perform mediation, moderation, and conditional process (moderated mediation) analyses, people may use Mplus, SPSS "PROCESS" macro, or SPSS "MLmed" macro. Some R packages and functions can also perform such analyses, in a somewhat complex way, including mediation::mediate(), interactions::sim_slopes(), and lavaan::sem().
Furthermore, some other R packages or scripts/modules have been developed, including jamovi module jAMM (by Marcello Gallucci, based on the lavaan package), R package processR (by Keon-Woong Moon, not official, also based on the lavaan package), and R script file "process.R" (the official PROCESS R code by Andrew F. Hayes, but it is not yet an R package).
Distinct from these existing tools, PROCESS() provides an integrative way for performing mediation/moderation analyses in R. This function supports 24 kinds of SPSS PROCESS models numbered by Hayes (2018) (but does not use or transform his code), and also supports multilevel mediation/moderation analyses. Overall, it supports the most frequently used types of mediation, moderation, moderated moderation (3-way interaction), and moderated mediation (conditional indirect effect) analyses for (generalized) linear or linear mixed models.
Specifically, PROCESS() fits regression models based on the data, variable names, and a few other arguments that users input (with no need to specify the PROCESS Model ID or manually mean-center the variables). The function can automatically link model settings to Hayes's numbering system.
PROCESS() automatically conducts grand-mean centering, using grand_mean_center(), before model building, though it can be turned off by setting center=FALSE.
The grand-mean centering is important because it:
makes the results of main effects accurate for interpretation (see my commentary on this issue: Bao et al., 2022);
does not change any model fit indices (it only affects the interpretation of main effects);
is only conducted in "PART 1" (for an accurate estimate of main effects) but not in "PART 2" because it is more intuitive and interpretable to use the raw values of variables for the simple-slope tests in "PART 2";
is not conflicted with group-mean centering because after group-mean centering the grand mean of a variable will also be 0, such that the automatic grand-mean centering (with mean = 0) will not change any values of the variable.
Conduct group-mean centering, if necessary, with group_mean_center() before using PROCESS(). Remember that the automatic grand-mean centering never affects the values of a group-mean centered variable, which already has a grand mean of 0.
Hayes, A. F. (2018). Introduction to mediation, moderation, and conditional process analysis (second edition): A regression-based approach. Guilford Press.
Yzerbyt, V., Muller, D., Batailler, C., & Judd, C. M. (2018). New recommendations for testing indirect effects in mediational models: The need to report and test component paths. Journal of Personality and Social Psychology, 115(6), 929–943.
For more details and illustrations, see PROCESS-bruceR-SPSS (PDF and Markdown files).
#### NOTE #### ## In the following examples, I set nsim=100 to save time. ## In formal analyses, nsim=1000 (or larger) is suggested! #### Demo Data #### # ?mediation::student data = mediation::student %>% dplyr::select(SCH_ID, free, smorale, pared, income, gender, work, attachment, fight, late, score) names(data)[2:3] = c("SCH_free", "SCH_morale") names(data)[4:7] = c("parent_edu", "family_inc", "gender", "partjob") data$gender01 = 1 - data$gender # 0 = female, 1 = male # dichotomous X: as.factor() data$gender = factor(data$gender01, levels=0:1, labels=c("Female", "Male")) # dichotomous Y: as.factor() data$pass = as.factor(ifelse(data$score>=50, 1, 0)) #### Descriptive Statistics and Correlation Analyses #### Freq(data$gender) Freq(data$pass) Describe(data) # file="xxx.doc" Corr(data[,4:11]) # file="xxx.doc" #### PROCESS Analyses #### ## Model 1 ## PROCESS(data, y="score", x="late", mods="gender") # continuous Y PROCESS(data, y="pass", x="late", mods="gender") # dichotomous Y # (multilevel moderation) PROCESS(data, y="score", x="late", mods="gender", # continuous Y (LMM) clusters="SCH_ID") PROCESS(data, y="pass", x="late", mods="gender", # dichotomous Y (GLMM) clusters="SCH_ID") # (Johnson-Neyman (J-N) interval and plot) PROCESS(data, y="score", x="gender", mods="late") -> P P$results[[1]]$jn[[1]] # Johnson-Neyman interval P$results[[1]]$jn[[1]]$plot # Johnson-Neyman plot (ggplot object) GLM_summary(P$model.y) # detailed results of regression # (allows multicategorical moderator) d = airquality d$Month = as.factor(d$Month) # moderator: factor with levels "5"~"9" PROCESS(d, y="Temp", x="Solar.R", mods="Month") ## Model 2 ## PROCESS(data, y="score", x="late", mods=c("gender", "family_inc"), mod.type="2-way") # or omit "mod.type", default is "2-way" ## Model 3 ## PROCESS(data, y="score", x="late", mods=c("gender", "family_inc"), mod.type="3-way") PROCESS(data, y="pass", x="gender", mods=c("late", "family_inc"), mod1.val=c(1, 3, 5), # moderator 1: late mod2.val=seq(1, 15, 2), # moderator 2: family_inc mod.type="3-way") ## Model 4 ## PROCESS(data, y="score", x="parent_edu", meds="family_inc", covs="gender", ci="boot", nsim=100, seed=1) # (allows an infinite number of multiple mediators in parallel) PROCESS(data, y="score", x="parent_edu", meds=c("family_inc", "late"), covs=c("gender", "partjob"), ci="boot", nsim=100, seed=1) # (multilevel mediation) PROCESS(data, y="score", x="SCH_free", meds="late", clusters="SCH_ID", ci="mcmc", nsim=100, seed=1) ## Model 6 ## PROCESS(data, y="score", x="parent_edu", meds=c("family_inc", "late"), covs=c("gender", "partjob"), med.type="serial", ci="boot", nsim=100, seed=1) ## Model 8 ## PROCESS(data, y="score", x="fight", meds="late", mods="gender", mod.path=c("x-m", "x-y"), ci="boot", nsim=100, seed=1) ## For more examples and details, see: ## https://github.com/psychbruce/bruceR/tree/main/note#### NOTE #### ## In the following examples, I set nsim=100 to save time. ## In formal analyses, nsim=1000 (or larger) is suggested! #### Demo Data #### # ?mediation::student data = mediation::student %>% dplyr::select(SCH_ID, free, smorale, pared, income, gender, work, attachment, fight, late, score) names(data)[2:3] = c("SCH_free", "SCH_morale") names(data)[4:7] = c("parent_edu", "family_inc", "gender", "partjob") data$gender01 = 1 - data$gender # 0 = female, 1 = male # dichotomous X: as.factor() data$gender = factor(data$gender01, levels=0:1, labels=c("Female", "Male")) # dichotomous Y: as.factor() data$pass = as.factor(ifelse(data$score>=50, 1, 0)) #### Descriptive Statistics and Correlation Analyses #### Freq(data$gender) Freq(data$pass) Describe(data) # file="xxx.doc" Corr(data[,4:11]) # file="xxx.doc" #### PROCESS Analyses #### ## Model 1 ## PROCESS(data, y="score", x="late", mods="gender") # continuous Y PROCESS(data, y="pass", x="late", mods="gender") # dichotomous Y # (multilevel moderation) PROCESS(data, y="score", x="late", mods="gender", # continuous Y (LMM) clusters="SCH_ID") PROCESS(data, y="pass", x="late", mods="gender", # dichotomous Y (GLMM) clusters="SCH_ID") # (Johnson-Neyman (J-N) interval and plot) PROCESS(data, y="score", x="gender", mods="late") -> P P$results[[1]]$jn[[1]] # Johnson-Neyman interval P$results[[1]]$jn[[1]]$plot # Johnson-Neyman plot (ggplot object) GLM_summary(P$model.y) # detailed results of regression # (allows multicategorical moderator) d = airquality d$Month = as.factor(d$Month) # moderator: factor with levels "5"~"9" PROCESS(d, y="Temp", x="Solar.R", mods="Month") ## Model 2 ## PROCESS(data, y="score", x="late", mods=c("gender", "family_inc"), mod.type="2-way") # or omit "mod.type", default is "2-way" ## Model 3 ## PROCESS(data, y="score", x="late", mods=c("gender", "family_inc"), mod.type="3-way") PROCESS(data, y="pass", x="gender", mods=c("late", "family_inc"), mod1.val=c(1, 3, 5), # moderator 1: late mod2.val=seq(1, 15, 2), # moderator 2: family_inc mod.type="3-way") ## Model 4 ## PROCESS(data, y="score", x="parent_edu", meds="family_inc", covs="gender", ci="boot", nsim=100, seed=1) # (allows an infinite number of multiple mediators in parallel) PROCESS(data, y="score", x="parent_edu", meds=c("family_inc", "late"), covs=c("gender", "partjob"), ci="boot", nsim=100, seed=1) # (multilevel mediation) PROCESS(data, y="score", x="SCH_free", meds="late", clusters="SCH_ID", ci="mcmc", nsim=100, seed=1) ## Model 6 ## PROCESS(data, y="score", x="parent_edu", meds=c("family_inc", "late"), covs=c("gender", "partjob"), med.type="serial", ci="boot", nsim=100, seed=1) ## Model 8 ## PROCESS(data, y="score", x="fight", meds="late", mods="gender", mod.path=c("x-m", "x-y"), ci="boot", nsim=100, seed=1) ## For more examples and details, see: ## https://github.com/psychbruce/bruceR/tree/main/note
A wrapper of car::recode().
RECODE(var, recodes)RECODE(var, recodes)
var |
Variable (numeric, character, or factor). |
recodes |
A character string definine the rule of recoding. e.g., |
A vector of recoded variable.
d = data.table(var=c(NA, 0, 1, 2, 3, 4, 5, 6)) added(d, { var.new = RECODE(var, "lo:1=0; c(2,3)=1; 4=2; 5:hi=3; else=999") }) dd = data.table(var=c(NA, 0, 1, 2, 3, 4, 5, 6)) added(d, { var.new = RECODE(var, "lo:1=0; c(2,3)=1; 4=2; 5:hi=3; else=999") }) d
NOTE: model_summary() is preferred.
regress( formula, data, family = NULL, digits = 3, robust = FALSE, cluster = NULL, test.rand = FALSE )regress( formula, data, family = NULL, digits = 3, robust = FALSE, cluster = NULL, test.rand = FALSE )
formula |
Model formula. |
data |
Data frame. |
family |
[Optional] The same as in |
digits |
Number of decimal places of output. Defaults to |
robust |
[Only for Options: Note: |
cluster |
[Only for |
test.rand |
[Only for |
No return value.
print_table() (print simple table)
model_summary() (strongly suggested)
## Not run: ## lm regress(Temp ~ Month + Day + Wind + Solar.R, data=airquality, robust=TRUE) ## glm regress(case ~ age + parity + education + spontaneous + induced, data=infert, family=binomial, robust="HC1", cluster="stratum") ## lmer library(lmerTest) regress(Reaction ~ Days + (Days | Subject), data=sleepstudy) regress(Preference ~ Sweetness + Gender + Age + Frequency + (1 | Consumer), data=carrots) ## glmer library(lmerTest) data.glmm = MASS::bacteria regress(y ~ trt + week + (1 | ID), data=data.glmm, family=binomial) regress(y ~ trt + week + hilo + (1 | ID), data=data.glmm, family=binomial) ## End(Not run)## Not run: ## lm regress(Temp ~ Month + Day + Wind + Solar.R, data=airquality, robust=TRUE) ## glm regress(case ~ age + parity + education + spontaneous + induced, data=infert, family=binomial, robust="HC1", cluster="stratum") ## lmer library(lmerTest) regress(Reaction ~ Days + (Days | Subject), data=sleepstudy) regress(Preference ~ Sweetness + Gender + Age + Frequency + (1 | Consumer), data=carrots) ## glmer library(lmerTest) data.glmm = MASS::bacteria regress(y ~ trt + week + (1 | ID), data=data.glmm, family=binomial) regress(y ~ trt + week + hilo + (1 | ID), data=data.glmm, family=binomial) ## End(Not run)
Repeat a character string for many times and paste them up.
rep_char(char, rep.times)rep_char(char, rep.times)
char |
Character string. |
rep.times |
Times for repeat. |
Character string.
rep_char("a", 5)rep_char("a", 5)
Rescale a variable (e.g., from 5-point to 7-point).
RESCALE(var, from = range(var, na.rm = T), to)RESCALE(var, from = range(var, na.rm = T), to)
var |
Variable (numeric). |
from |
Numeric vector, the range of old scale (e.g., |
to |
Numeric vector, the range of new scale (e.g., |
A vector of rescaled variable.
d = data.table(var=rep(1:5, 2)) added(d, { var1 = RESCALE(var, to=1:7) var2 = RESCALE(var, from=1:5, to=1:7) }) d # var1 is equal to var2d = data.table(var=rep(1:5, 2)) added(d, { var1 = RESCALE(var, to=1:7) var2 = RESCALE(var, from=1:5, to=1:7) }) d # var1 is equal to var2
rgb().A simple extension of rgb().
RGB(r, g, b, alpha)RGB(r, g, b, alpha)
r, g, b
|
Red, Green, Blue: 0~255. |
alpha |
Color transparency (opacity): 0~1. If not specified, an opaque color will be generated. |
"#rrggbb" or "#rrggbbaa".
RGB(255, 0, 0) # red: "#FF0000" RGB(255, 0, 0, 0.8) # red with 80% opacity: "#FF0000CC"RGB(255, 0, 0) # red: "#FF0000" RGB(255, 0, 0, 0.8) # red with 80% opacity: "#FF0000CC"
Run code parsed from text.
Run(..., silent = FALSE)Run(..., silent = FALSE)
... |
Character string(s) to run. You can use |
silent |
Suppress error/warning messages. Defaults to |
Invisibly return the running expression(s).
Run("a=1", "b=2") Run("print({a+b})")Run("a=1", "b=2") Run("print({a+b})")
This function resembles RESCALE()
and it is just equivalent to RESCALE(var, to=0:1).
scaler(v, min = 0, max = 1)scaler(v, min = 0, max = 1)
v |
Variable (numeric vector). |
min |
Minimum value (defaults to 0). |
max |
Maximum value (defaults to 1). |
A vector of rescaled variable.
scaler(1:5) # the same: RESCALE(1:5, to=0:1)scaler(1:5) # the same: RESCALE(1:5, to=0:1)
Set working directory to the path of currently opened file (usually an R script). You may use this function in both .R/.Rmd files and R Console. RStudio (version >= 1.2) is required for running this function.
set.wd(path = NULL, ask = FALSE) set_wd(path = NULL, ask = FALSE)set.wd(path = NULL, ask = FALSE) set_wd(path = NULL, ask = FALSE)
path |
|
ask |
|
Invisibly return the path.
set.wd(): Main function
set_wd(): The alias of set.wd (the same)
## Not run: # RStudio (version >= 1.2) is required for running this function. set.wd() # set working directory to the path of the currently opened file set.wd("~/") # set working directory to the home path set.wd("../") # set working directory to the parent path set.wd(ask=TRUE) # select a folder with the prompt of a dialog ## End(Not run)## Not run: # RStudio (version >= 1.2) is required for running this function. set.wd() # set working directory to the path of the currently opened file set.wd("~/") # set working directory to the home path set.wd("../") # set working directory to the parent path set.wd(ask=TRUE) # select a folder with the prompt of a dialog ## End(Not run)
Show colors.
show_colors(colors)show_colors(colors)
colors |
Color names. Examples:
|
A ggplot object.
show_colors("blue") show_colors("#0000FF") # blue (hex name) show_colors(RGB(0, 0, 255)) # blue (RGB) show_colors(see::social_colors()) show_colors(see::pizza_colors())show_colors("blue") show_colors("#0000FF") # blue (hex name) show_colors(RGB(0, 0, 255)) # blue (RGB) show_colors(see::social_colors()) show_colors(see::pizza_colors())
ggplot2 theme that enables Markdown/HTML rich text.A nice ggplot2 theme for scientific publication. It also uses ggtext::element_markdown() to render Markdown/HTML formatted rich text. You can use a combination of Markdown and/or HTML syntax (e.g., "*y* = *x*<sup>2</sup>") in plot text or title, and this function draws text elements with rich text format.
For more usage, see:
theme_bruce( markdown = FALSE, base.size = 12, line.width = 0.5, border = "black", bg = "white", panel.bg = "white", tag = "bold", plot.title = "bold", axis.title = "plain", title.pos = 0.5, subtitle.pos = 0.5, caption.pos = 1, font = NULL, grid.x = "", grid.y = "", line.x = TRUE, line.y = TRUE, tick.x = TRUE, tick.y = TRUE )theme_bruce( markdown = FALSE, base.size = 12, line.width = 0.5, border = "black", bg = "white", panel.bg = "white", tag = "bold", plot.title = "bold", axis.title = "plain", title.pos = 0.5, subtitle.pos = 0.5, caption.pos = 1, font = NULL, grid.x = "", grid.y = "", line.x = TRUE, line.y = TRUE, tick.x = TRUE, tick.y = TRUE )
markdown |
Use |
base.size |
Basic font size. Defaults to |
line.width |
Line width. Defaults to |
border |
|
bg |
Background color of whole plot. Defaults to |
panel.bg |
Background color of panel. Defaults to |
tag |
Font face of tag. Choose from |
plot.title |
Font face of title. Choose from |
axis.title |
Font face of axis text. Choose from |
title.pos |
Title position (0~1). |
subtitle.pos |
Subtitle position (0~1). |
caption.pos |
Caption position (0~1). |
font |
Text font. Only applicable to Windows system. |
grid.x |
|
grid.y |
|
line.x |
Draw the x-axis line. Defaults to |
line.y |
Draw the y-axis line. Defaults to |
tick.x |
Draw the x-axis ticks. Defaults to |
tick.y |
Draw the y-axis ticks. Defaults to |
A theme object that should be used for ggplot2.
## Example 1 (bivariate correlation) d = as.data.table(psych::bfi) added(d, { E = .mean("E", 1:5, rev=c(1,2), range=1:6) O = .mean("O", 1:5, rev=c(2,5), range=1:6) }) ggplot(data=d, aes(x=E, y=O)) + geom_point(alpha=0.1) + geom_smooth(method="loess") + labs(x="Extraversion<sub>Big 5</sub>", y="Openness<sub>Big 5</sub>") + theme_bruce(markdown=TRUE) ## Example 2 (2x2 ANOVA) d = data.frame(X1 = factor(rep(1:3, each=2)), X2 = factor(rep(1:2, 3)), Y.mean = c(5, 3, 2, 7, 3, 6), Y.se = rep(c(0.1, 0.2, 0.1), each=2)) ggplot(data=d, aes(x=X1, y=Y.mean, fill=X2)) + geom_bar(position="dodge", stat="identity", width=0.6, show.legend=FALSE) + geom_errorbar(aes(x=X1, ymin=Y.mean-Y.se, ymax=Y.mean+Y.se), width=0.1, color="black", position=position_dodge(0.6)) + scale_y_continuous(expand=expansion(add=0), limits=c(0,8), breaks=0:8) + scale_fill_brewer(palette="Set1") + labs(x="Independent Variable (*X*)", # italic X y="Dependent Variable (*Y*)", # italic Y title="Demo Plot<sup>bruceR</sup>") + theme_bruce(markdown=TRUE, border="")## Example 1 (bivariate correlation) d = as.data.table(psych::bfi) added(d, { E = .mean("E", 1:5, rev=c(1,2), range=1:6) O = .mean("O", 1:5, rev=c(2,5), range=1:6) }) ggplot(data=d, aes(x=E, y=O)) + geom_point(alpha=0.1) + geom_smooth(method="loess") + labs(x="Extraversion<sub>Big 5</sub>", y="Openness<sub>Big 5</sub>") + theme_bruce(markdown=TRUE) ## Example 2 (2x2 ANOVA) d = data.frame(X1 = factor(rep(1:3, each=2)), X2 = factor(rep(1:2, 3)), Y.mean = c(5, 3, 2, 7, 3, 6), Y.se = rep(c(0.1, 0.2, 0.1), each=2)) ggplot(data=d, aes(x=X1, y=Y.mean, fill=X2)) + geom_bar(position="dodge", stat="identity", width=0.6, show.legend=FALSE) + geom_errorbar(aes(x=X1, ymin=Y.mean-Y.se, ymax=Y.mean+Y.se), width=0.1, color="black", position=position_dodge(0.6)) + scale_y_continuous(expand=expansion(add=0), limits=c(0,8), breaks=0:8) + scale_fill_brewer(palette="Set1") + labs(x="Independent Variable (*X*)", # italic X y="Dependent Variable (*Y*)", # italic Y title="Demo Plot<sup>bruceR</sup>") + theme_bruce(markdown=TRUE, border="")
One-sample, independent-samples, and paired-samples t-test, with both Frequentist and Bayesian approaches. The output includes descriptives, t statistics, mean difference with 95% CI, Cohen's d with 95% CI, and Bayes factor (; BayesFactor package needs to be installed). It also tests the assumption of homogeneity of variance and allows users to determine whether variances are equal or not.
Users can simultaneously test multiple dependent and/or independent variables. The results of one pair of Y-X would be summarized in one row in the output. Key results can be saved in APA format to MS Word.
TTEST( data, y, x = NULL, paired = FALSE, paired.d.type = "dz", var.equal = TRUE, mean.diff = TRUE, test.value = 0, test.sided = c("=", "<", ">"), factor.rev = TRUE, bf10 = FALSE, bayes.prior = "medium", digits = 2, file = NULL )TTEST( data, y, x = NULL, paired = FALSE, paired.d.type = "dz", var.equal = TRUE, mean.diff = TRUE, test.value = 0, test.sided = c("=", "<", ">"), factor.rev = TRUE, bf10 = FALSE, bayes.prior = "medium", digits = 2, file = NULL )
data |
Data frame (wide-format only, i.e., one case in one row). |
y |
Dependent variable(s). Multiple variables should be included in a character vector For paired-samples t-test, the number of variables should be 2, 4, 6, etc. |
x |
Independent variable(s). Multiple variables should be included in a character vector Only necessary for independent-samples t-test. |
paired |
For paired-samples t-test, set it as |
paired.d.type |
Type of Cohen's d for paired-samples t-test (see Lakens, 2013). Defaults to Options:
|
var.equal |
If Levene's test indicates a violation of the homogeneity of variance,
then you should better set this argument as |
mean.diff |
Whether to display results of mean difference and its 95% CI. Defaults to |
test.value |
The true value of the mean (or difference in means for a two-samples test). Defaults to |
test.sided |
Any of |
factor.rev |
Whether to reverse the levels of factor (X)
such that the test compares higher vs. lower level. Defaults to |
bf10 |
Show BF10 (Bayes Factor) in results? Defaults to |
bayes.prior |
Prior scale in Bayesian t-test. Defaults to 0.707.
See details in |
digits |
Number of decimal places of output. Defaults to |
file |
File name of MS Word ( |
Note that the point estimate of Cohen's d is computed using the common method "Cohen's d = mean difference / (pooled) standard deviation", which is consistent with results from other R packages (e.g., effectsize) and software (e.g., jamovi). The 95% CI of Cohen's d is estimated based on the 95% CI of mean difference (i.e., also divided by the pooled standard deviation).
However, different packages and software diverge greatly on the estimate of the 95% CI of Cohen's d. R packages such as psych and effectsize, R software jamovi, and several online statistical tools for estimating effect sizes indeed produce surprisingly inconsistent results on the 95% CI of Cohen's d.
See an illustration of this issue in the section "Examples".
Invisibly return the results.
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, Article 863.
## Demo data ## d1 = between.3 d1$Y1 = d1$SCORE # shorter name for convenience d1$Y2 = rnorm(32) # random variable d1$B = factor(d1$B, levels=1:2, labels=c("Low", "High")) d1$C = factor(d1$C, levels=1:2, labels=c("M", "F")) d2 = within.1 ## One-sample t-test ## TTEST(d1, "SCORE") TTEST(d1, "SCORE", test.value=5) ## Independent-samples t-test ## TTEST(d1, "SCORE", x="A") TTEST(d1, "SCORE", x="A", var.equal=FALSE) TTEST(d1, y="Y1", x=c("A", "B", "C")) TTEST(d1, y=c("Y1", "Y2"), x=c("A", "B", "C"), mean.diff=FALSE, # remove to save space file="t-result.doc") unlink("t-result.doc") # delete file for code check ## Paired-samples t-test ## TTEST(d2, y=c("A1", "A2"), paired=TRUE) TTEST(d2, y=c("A1", "A2", "A3", "A4"), paired=TRUE) ## Not run: ## Illustration for the issue stated in "Details" # Inconsistency in the 95% CI of Cohen's d between R packages: # In this example, the true point estimate of Cohen's d = 3.00 # and its 95% CI should be equal to 95% CI of mean difference. data = data.frame(X=rep(1:2, each=3), Y=1:6) data # simple demo data TTEST(data, y="Y", x="X") # d = 3.00 [0.73, 5.27] (estimated based on 95% CI of mean difference) MANOVA(data, dv="Y", between="X") %>% EMMEANS("X") # d = 3.00 [0.73, 5.27] (the same as TTEST) psych::cohen.d(x=data, group="X") # d = 3.67 [0.04, 7.35] (strange) psych::d.ci(d=3.00, n1=3, n2=3) # d = 3.00 [-0.15, 6.12] (significance inconsistent with t-test) # jamovi uses psych::d.ci() to compute 95% CI # so its results are also: 3.00 [-0.15, 6.12] effectsize::cohens_d(Y ~ rev(X), data=data) # d = 3.00 [0.38, 5.50] (using the noncentrality parameter method) effectsize::t_to_d(t=t.test(Y ~ rev(X), data=data, var.equal=TRUE)$statistic, df_error=4) # d = 3.67 [0.47, 6.74] (merely an approximate estimate, often overestimated) # see ?effectsize::t_to_d # https://www.psychometrica.de/effect_size.html # d = 3.00 [0.67, 5.33] (slightly different from TTEST) # https://www.campbellcollaboration.org/escalc/ # d = 3.00 [0.67, 5.33] (slightly different from TTEST) # Conclusion: # TTEST() provides a reasonable estimate of Cohen's d and its 95% CI, # and effectsize::cohens_d() offers another method to compute the CI. ## End(Not run)## Demo data ## d1 = between.3 d1$Y1 = d1$SCORE # shorter name for convenience d1$Y2 = rnorm(32) # random variable d1$B = factor(d1$B, levels=1:2, labels=c("Low", "High")) d1$C = factor(d1$C, levels=1:2, labels=c("M", "F")) d2 = within.1 ## One-sample t-test ## TTEST(d1, "SCORE") TTEST(d1, "SCORE", test.value=5) ## Independent-samples t-test ## TTEST(d1, "SCORE", x="A") TTEST(d1, "SCORE", x="A", var.equal=FALSE) TTEST(d1, y="Y1", x=c("A", "B", "C")) TTEST(d1, y=c("Y1", "Y2"), x=c("A", "B", "C"), mean.diff=FALSE, # remove to save space file="t-result.doc") unlink("t-result.doc") # delete file for code check ## Paired-samples t-test ## TTEST(d2, y=c("A1", "A2"), paired=TRUE) TTEST(d2, y=c("A1", "A2", "A3", "A4"), paired=TRUE) ## Not run: ## Illustration for the issue stated in "Details" # Inconsistency in the 95% CI of Cohen's d between R packages: # In this example, the true point estimate of Cohen's d = 3.00 # and its 95% CI should be equal to 95% CI of mean difference. data = data.frame(X=rep(1:2, each=3), Y=1:6) data # simple demo data TTEST(data, y="Y", x="X") # d = 3.00 [0.73, 5.27] (estimated based on 95% CI of mean difference) MANOVA(data, dv="Y", between="X") %>% EMMEANS("X") # d = 3.00 [0.73, 5.27] (the same as TTEST) psych::cohen.d(x=data, group="X") # d = 3.67 [0.04, 7.35] (strange) psych::d.ci(d=3.00, n1=3, n2=3) # d = 3.00 [-0.15, 6.12] (significance inconsistent with t-test) # jamovi uses psych::d.ci() to compute 95% CI # so its results are also: 3.00 [-0.15, 6.12] effectsize::cohens_d(Y ~ rev(X), data=data) # d = 3.00 [0.38, 5.50] (using the noncentrality parameter method) effectsize::t_to_d(t=t.test(Y ~ rev(X), data=data, var.equal=TRUE)$statistic, df_error=4) # d = 3.67 [0.47, 6.74] (merely an approximate estimate, often overestimated) # see ?effectsize::t_to_d # https://www.psychometrica.de/effect_size.html # d = 3.00 [0.67, 5.33] (slightly different from TTEST) # https://www.campbellcollaboration.org/escalc/ # d = 3.00 [0.67, 5.33] (slightly different from TTEST) # Conclusion: # TTEST() provides a reasonable estimate of Cohen's d and its 95% CI, # and effectsize::cohens_d() offers another method to compute the CI. ## End(Not run)