Title: | Companion Datasets and Functions for Research Design in the Social Sciences |
---|---|
Description: | Helper functions to accompany Blair, Coppock, and Humphreys (2022), "Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign" <https://book.declaredesign.org>. 'rdss' includes datasets, helper functions, and plotting components to enable use and replication of the book. |
Authors: | Graeme Blair [aut, cre], Alexander Coppock [aut], Macartan Humphreys [aut] |
Maintainer: | Graeme Blair <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.12 |
Built: | 2024-10-25 08:20:49 UTC |
Source: | https://github.com/cranhaven/cranhaven.r-universe.dev |
Add parentheses around standard error estimates
add_parens(x, digits = 3)
x | Numeric vector |
digits | Number of digits to retain |
A character vector with enclosing parentheses
std.error <- c(0.12, 0.001, 1.2)
add_parens(std.error)
Best predictor function from causal_forest
best_predictor(data, covariate_names, cuts = 20)
data | A data.frame of covariates |
covariate_names | A character vector of covariates to assess |
cuts | Either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which each covariate is to be cut. |
a data.frame of the best predictors
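The description of `cuts` matches the `breaks` argument of base R's `cut()`. The following is a minimal sketch of the two accepted forms, assuming (this page does not confirm it) that covariates are binned the way `cut()` would bin them. The covariate `x` is made up for illustration.

```r
# Illustrative only: shows the two accepted forms of `cuts` using
# base R's cut(); this is not the internals of best_predictor().
set.seed(1)
x <- runif(200)  # hypothetical covariate on (0, 1)

# Form 1: a single number >= 2 gives the number of intervals
bins_by_count <- cut(x, breaks = 20)

# Form 2: a numeric vector of unique cut points
bins_by_points <- cut(x, breaks = c(0, 0.25, 0.5, 0.75, 1), include.lowest = TRUE)

nlevels(bins_by_count)   # number of intervals created by form 1
nlevels(bins_by_points)  # number of intervals created by form 2
table(bins_by_points)
```

A single number is convenient when the covariate's scale is unknown, while explicit cut points give control over where a split can occur.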
Replication data for Bonilla and Tillery (2020), American Political Science Review (obtained from Dataverse 10.7910/DVN/IUZDQI)
bonilla_tillery
A data.frame
Runs the causal_forest estimation function from the grf package and returns tidy data frame output
causal_forest_handler(data, covariate_names, share_train = 0.5, ...)
data | A data.frame |
covariate_names | Names of covariates |
share_train | Share of units to be used for training |
... | Options to causal_forest |
https://draft.declaredesign.org/complex-designs.html#discovery-using-causal-forests
See ?causal_forest for further details
a data.frame of estimates
library(DeclareDesign)
library(ggplot2)

dat <- fabricate(
  N = 1000,
  A = rnorm(N),
  B = rnorm(N),
  Z = complete_rs(N),
  Y = A*Z + rnorm(N))

# note: remove num.threads = 1 to use more processors
estimates <- causal_forest_handler(data = dat, covariate_names = c("A", "B"), num.threads = 1)

ggplot(data = estimates, aes(A, pred)) + geom_point()
Replication data for David Clingingsmith, Asim Ijaz Khwaja, and Michael Kremer (2009): Estimating the Impact of The Hajj: Religion and Tolerance in Islam's Global Gathering. The Quarterly Journal of Economics, Volume 124, Issue 3, August 2009, Pages 1133-1170
clingingsmith_etal
A data.frame
See https://book.declaredesign.org/experimental-descriptive.html#conjoint-experiments
conjoint_assignment(data, levels_list)
data | A data.frame |
levels_list | List of conjoint levels to assign |
a data.frame with random assignment added
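To make the role of `levels_list` concrete, here is a minimal base R sketch that randomizes each attribute independently and uniformly for every row. The attribute names, the levels, and the helper `assign_levels_sketch()` are hypothetical, and uniform independent randomization is an assumption rather than a description of how `conjoint_assignment()` works internally.

```r
# Hypothetical sketch: independently and uniformly randomize each
# conjoint attribute for every profile (row). This illustrates the idea
# of assigning conjoint levels, not the package's implementation.
set.seed(42)

levels_list <- list(
  gender = c("Man", "Woman"),
  party  = c("Left", "Center", "Right")
)

profiles <- data.frame(profile_id = 1:6)

assign_levels_sketch <- function(data, levels_list) {
  for (attribute in names(levels_list)) {
    # draw one level per row, with replacement
    data[[attribute]] <- sample(levels_list[[attribute]],
                                size = nrow(data), replace = TRUE)
  }
  data
}

assigned <- assign_levels_sketch(profiles, levels_list)
assigned
```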
See https://book.declaredesign.org/experimental-descriptive.html#conjoint-experiments
conjoint_inquiries(data, levels_list, utility_fn)
data | A data.frame |
levels_list | List of conjoint levels |
utility_fn | a function that takes data and returns an additional column called U, which represents the utility of the choice |
a data.frame of estimand values
See https://book.declaredesign.org/experimental-descriptive.html#conjoint-experiments
conjoint_measurement(data, utility_fn)
data | A data.frame |
utility_fn | a function that takes data and returns an additional column called U, which represents the utility of the choice |
a data.frame
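The `utility_fn` argument is described only by its contract: it takes data and returns it with an added column `U`. A minimal sketch of a function satisfying that contract follows. The column names `gender` and `party`, the coefficients, and the name `utility_fn_sketch()` are all made up for illustration.

```r
# Hypothetical utility function: takes a data.frame of profiles and
# returns it with an added numeric column U (the utility of each choice).
utility_fn_sketch <- function(data) {
  data$U <- 0.5 * (data$gender == "Woman") +
    1.0 * (data$party == "Left") +
    rnorm(nrow(data), sd = 0.1)  # idiosyncratic noise
  data
}

set.seed(7)
profiles <- data.frame(
  gender = c("Man", "Woman", "Woman"),
  party  = c("Left", "Right", "Left")
)

with_utility <- utility_fn_sketch(profiles)
with_utility
```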
Based on Karthik Ram's wesanderson package (https://github.com/karthik/wesanderson)
dd_palette(name, n)
name | Color palette name (character) |
n | Number of colors |
Available color palettes:
color_palette = c("#72B4F3", "#F38672", "#C6227F")
grey_palette = c("#72B4F3", "#F38672", "#C6227F", gray(0.8))
dd_dark_blue = "#3564ED"
dd_light_blue = "#72B4F3"
dd_orange = "#F38672"
dd_purple = "#7E43B6"
dd_gray = gray(0.2)
dd_pink = "#C6227F"
dd_light_gray = gray(0.8)
dd_dark_blue_alpha = "#3564EDA0"
dd_light_blue_alpha = "#72B4F3A0"
character vector of colors
Runs did_multiplegt estimation function and returns tidy data frame output
did_multiplegt_tidy(data, ...)
data | a data.frame |
... | options passed to did_multiplegt |
See https://book.declaredesign.org/observational-causal.html#difference-in-differences
a data.frame of estimates
Runs estimates estimation function from interference package and returns tidy data frame output
estimator_AS_tidy(data, permutatation_matrix, adj_matrix)
data | a data.frame |
permutatation_matrix | a permutation matrix of random assignments |
adj_matrix | an adjacency matrix |
The estimator_AS_tidy function requires the 'interference' package, which is not yet available on CRAN.
To use this function:
install the developer version of interference via remotes::install_github('szonszein/interference') and
install the developer version of rdss via remotes::install_github('DeclareDesign/rdss@remotes')
See https://book.declaredesign.org/experimental-causal.html#experiments-over-networks
a data.frame of estimates
An sf object containing the boundaries of voting precincts for Fairfax County, Virginia as well as precinct ID, name, district, polling place name, address, city, zip code, area, length, and geometry (polygons)
fairfax
An sf object with 236 rows and 10 variables:
Replication data for Foos, John, Muller, and Cunningham (2021), Journal of Politics (derived from Dataverse 10.7910/DVN/NDPXND)
foos_etal
A data.frame
Round and pad a number to a specific decimal place
format_num(x, digits = 3)
x | Numeric vector |
digits | Number of digits to retain |
a character vector of formatted numbers
std.error <- c(0.12, 0.001, 1.2)
format_num(std.error)
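For readers unfamiliar with "round and pad", the following base R sketch contrasts plain rounding, which drops trailing zeros, with fixed-width formatting via `sprintf()`. It illustrates the idea only and is not the package's own code.

```r
# Base R sketch of "round and pad": sprintf() rounds to a fixed number
# of decimals and keeps trailing zeros, unlike round().
std.error <- c(0.12, 0.001, 1.2)

rounded_only <- as.character(round(std.error, 3))  # trailing zeros dropped
padded <- sprintf("%.3f", std.error)               # always three decimals

rounded_only
padded
```

Padding matters in regression tables, where entries line up only if every cell has the same number of decimals.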
See https://book.declaredesign.org/experimental-causal.html#experiments-over-networks
get_exposure_AS(obs_exposure)
obs_exposure | A numeric vector |
a data.frame of observed exposure to a treatment created using the interference package
See https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HYVPO5 for further details and the code used to create these files.
get_rdss_file(name, verbose = TRUE)
name | quoted name of the file on the dataverse archive |
verbose | print declaration code if requesting a declaration |
The available names include:
Design declaration objects:
declaration_9.5
declaration_2.1
declaration_2.2
declaration_4.1
declaration_5.1
declaration_7.1
declaration_9.1
declaration_9.2
declaration_9.3
declaration_9.4
declaration_9.6
declaration_9.7
declaration_10.1
declaration_10.2
declaration_10.3
declaration_10.4
declaration_10a
declaration_11.1
declaration_11.2
declaration_11.3
declaration_11.4
declaration_11.5
declaration_12.1a
declaration_12.1b
declaration_12.1c
declaration_12.1d
declaration_13.1
declaration_13.2
declaration_15.1
declaration_15.2
declaration_15.3a
declaration_15.3b
declaration_15.3c
declaration_15.4
declaration_15.5
declaration_15.6
declaration_16.1a
declaration_16.1b
declaration_16.2
declaration_16.3
declaration_16.4
declaration_16.5
declaration_16.6
declaration_17.1
declaration_17.2
declaration_17.3
declaration_17.4
declaration_17.5
declaration_17.6_a
declaration_17.6_b
declaration_18.1
declaration_18.2
declaration_18.3
declaration_18.4
declaration_18.5
declaration_18.6
declaration_18.7
declaration_18.8
declaration_18.9a
declaration_18.9b
declaration_18.9c
declaration_18.10
declaration_18.11
declaration_18.12
declaration_18.13
declaration_19.1
declaration_19.2
declaration_19.3
declaration_19.4
declaration_23.1a
declaration_23.1b
declaration_23.1c
declaration_23.1d
Diagnosis objects:
diagnosis_2.1
diagnosis_4.1
diagnosis_9.1
diagnosis_9.2
diagnosis_9.3
diagnosis_9.4
diagnosis_9.5
diagnosis_9.6
diagnosis_9.7
simulation_10.1
diagnosis_10.1
diagnosis_10.2
diagnosis_10.3
diagnosis_10.4
diagnosis_10.5
diagnosis_10a
diagnosis_11.1
diagnosis_11.2
diagnosis_11.3
diagnosis_11.4
diagnosis_11.5
diagnosis_12.1
diagnosis_12.2
diagnosis_13.1
diagnosis_15.1
diagnosis_15.2
diagnosis_15.3
diagnosis_15.4
diagnosis_15.5
diagnosis_16.1
diagnosis_16.2
diagnosis_16.3
diagnosis_16.4
diagnosis_16.5
diagnosis_17.1
diagnosis_17.2
diagnosis_17.3
diagnosis_17.4
diagnosis_17.5
diagnosis_18.1
diagnosis_18.10_encouragment
diagnosis_18.10_placebo
diagnosis_18.11
diagnosis_18.12
diagnosis_18.13
diagnosis_18.2
diagnosis_18.3
diagnosis_18.4
diagnosis_18.5
diagnosis_18.6
diagnosis_18.7
diagnosis_18.8
diagnosis_18.9
diagnosis_19.1
diagnosis_19.2
diagnosis_19.3
diagnosis_19.4
diagnosis_19a
diagnosis_21a
diagnosis_21b
diagnosis_23.1
diagnosis_23a
an R object
# Requires internet access
if(curl::has_internet()) {
  diagnosis_2.1 <- get_rdss_file("diagnosis_2.1")
  diagnosis_2.1
}
Add alpha transparency to a color defined in hexadecimal
hex_add_alpha(col, alpha)
col | Original color code in hex |
alpha | Level of alpha transparency to add |
color codes with alpha added
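Adding alpha to a hex color amounts to appending a two-digit hexadecimal alpha channel to `#RRGGBB`. The sketch below assumes `alpha` is given on a 0 to 1 scale, which this page does not state, and `hex_add_alpha_sketch()` is a hypothetical stand-in for the package function.

```r
# Hypothetical helper: append a two-digit hex alpha channel to #RRGGBB.
# `alpha` is ASSUMED to be on a 0-1 scale; the package's scale may differ.
hex_add_alpha_sketch <- function(col, alpha) {
  alpha_hex <- sprintf("%02X", as.integer(round(alpha * 255)))
  paste0(col, alpha_hex)
}

# An alpha of 160/255 corresponds to the hex suffix "A0"
hex_add_alpha_sketch("#3564ED", 160 / 255)

# Fully opaque and fully transparent versions of the same color
hex_add_alpha_sketch("#72B4F3", c(1, 0))
```

The first call reproduces the `A0` suffix seen in the `dd_dark_blue_alpha` value listed among the package's colors.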
A dataset containing the party registration, age, census tract number, and voter turnout in 2012 for 1,000 randomly-sampled registered voters in Los Angeles County, California.
la_voter_file
A data frame with 1000 rows and 4 variables:
political party registration
age of voter in years
US Census tract number
voter turnout in 2012 election
California Secretary of State.
See https://book.declaredesign.org/observational-causal.html#difference-in-differences
lag_by_group(x, groups, n = 1, order_by, default = NA)
x | Vector of values |
groups | Grouping variable |
n | Positive integer of length 1, giving the number of positions to lead or lag by |
order_by | Ordering variable within group (e.g., time) |
default | Value used for non-existent rows. Defaults to NA. |
vector of lagged values
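A grouped lag shifts values within each group after sorting by the ordering variable, so that the first observation of every unit has no predecessor. The base R sketch below follows the argument names in the usage line, but `lag_by_group_sketch()` and the toy panel are hypothetical and this is not the package's implementation.

```r
# Hypothetical base R sketch of a grouped lag.
lag_by_group_sketch <- function(x, groups, n = 1, order_by, default = NA) {
  out <- rep(default, length(x))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    idx <- idx[order(order_by[idx])]            # sort rows within the group
    k <- length(idx)
    if (k > n) {
      out[idx[(n + 1):k]] <- x[idx[1:(k - n)]]  # value from n positions earlier
    }
  }
  out
}

# Toy panel with rows deliberately out of time order for unit "a"
panel <- data.frame(
  unit = c("a", "a", "a", "b", "b", "b"),
  time = c(2, 1, 3, 1, 2, 3),
  y    = c(20, 10, 30, 100, 200, 300)
)

panel$y_lag <- lag_by_group_sketch(panel$y, panel$unit, n = 1,
                                   order_by = panel$time)
panel
```

Rows keep their original positions; only the lookup is done in time order, so the result can be attached directly as a new column.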
These data were resampled with replacement from LAPOP data (to 10,000 rows) for a subset of variables. These data cannot be used for scientific inferences, and are only useful for teaching purposes. ID numbers were scrambled so that individuals and municipalities cannot easily be identified.
lapop_brazil
A data.frame
Download the original data from https://www.vanderbilt.edu/lapop/raw-data.php
See https://www.vanderbilt.edu/lapop/core-surveys.php for survey questionnaire
Format confidence intervals for nice printing
make_interval_entry(conf.low, conf.high, digits = 2)
conf.low | a numeric vector of lower bounds |
conf.high | a numeric vector of upper bounds |
digits | number of digits to retain |
a character vector of intervals
conf.low <- c(-0.1652, 0.00304, -6.352)
conf.high <- c(0.3052, 0.00696, -1.648)
make_interval_entry(conf.low, conf.high)
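As a rough picture of what a formatted interval entry can look like, the base R sketch below rounds both bounds to two decimals and joins them in brackets. The bracket-and-comma layout is an assumption made for illustration; the exact string returned by `make_interval_entry()` may differ.

```r
# Hypothetical sketch: format bounds as "[low, high]" with two decimals.
# The layout is an ASSUMPTION, not taken from the package.
conf.low  <- c(-0.1652, 0.00304, -6.352)
conf.high <- c(0.3052, 0.00696, -1.648)

interval_sketch <- sprintf("[%.2f, %.2f]", conf.low, conf.high)
interval_sketch
```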
Format estimates and standard errors for nice printing
make_se_entry(estimate, std.error, digits = 2)
estimate | a numeric vector of parameter estimates |
std.error | a numeric vector of standard error estimates |
digits | number of digits to retain |
a character vector of formatted estimates and standard errors
estimate <- c(0.07, 0.005, -4)
std.error <- c(0.12, 0.001, 1.2)
make_se_entry(estimate, std.error)
Calculates predicted values from a multilevel regression and the post-stratified state-level estimates
post_stratification_helper(model_fit, data, group, weights)
model_fit | a model fit object from, e.g., glmer or lm_robust |
data | a data.frame |
group | unquoted name of the group variable to construct estimates for |
weights | unquoted name of post-stratification weights variable |
Please see https://book.declaredesign.org/observational-descriptive.html#multi-level-regression-and-poststratification
data.frame of post-stratified group-level estimates
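The post-stratification step itself is a weighted average of predicted values within each group. The base R sketch below shows that arithmetic on made-up cells: `pred` stands in for model predictions, and `state` and `n_pop` are hypothetical names for the group and weights variables. It does not call `post_stratification_helper()`.

```r
# Hypothetical post-stratification sketch in base R.
cells <- data.frame(
  state = c("A", "A", "B", "B"),
  pred  = c(0.40, 0.60, 0.30, 0.50),  # stand-in for model predictions
  n_pop = c(100, 300, 200, 200)       # stand-in for population weights
)

# Weighted mean of predictions within each group
post_strat <- do.call(rbind, lapply(split(cells, cells$state), function(d) {
  data.frame(state = d$state[1],
             estimate = weighted.mean(d$pred, d$n_pop))
}))

post_strat
```

For state A the estimate is (0.40 × 100 + 0.60 × 300) / 400 = 0.55, which shows how larger cells pull the group estimate toward their prediction.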
Draw conclusions from a model given a query, data, and process tracing strategies
process_tracing_estimator(causal_model, query, data, strategies)
causal_model | a model generated by |
query | a causal query of interest |
data | a single row dataset with data on nodes in the model |
strategies | a vector describing sets of nodes to be examined e.g. c("X", "X-Y") |
See https://book.declaredesign.org/observational-causal.html#process-tracing
a data.frame of estimates
# Simple example showing ambiguity in attribution
process_tracing_estimator(
  causal_model = CausalQueries::make_model("X -> Y"),
  query = "Y[X=1] > Y[X=0]",
  data = data.frame(X=1, Y = 1),
  strategies = "X-Y")

# Example where M=1 acts as a hoop test
process_tracing_estimator(
  causal_model = CausalQueries::make_model("X -> M -> Y") |>
    CausalQueries::set_restrictions("Y[M=1] < Y[M=0]") |>
    CausalQueries::set_restrictions("M[X=1] < M[X=0]"),
  query = "Y[X=1] > Y[X=0]",
  data = data.frame(X=1, Y = 1, M = 0),
  strategies = c("Y", "X-Y", "X-M-Y"))
Helper function for using rdrobust as a model in declare_estimator
rdrobust_helper(data, y, x, subset = NULL, ...)
data | a data.frame |
y | unquoted name of the outcome variable |
x | unquoted name of the running variable |
subset | an optional vector specifying a subset of observations to be used in the fitting process |
... | Other arguments to |
rdrobust model fit object
Companion datasets and functions for the book "Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign" (book.declaredesign.org)
Maintainer: Graeme Blair [email protected] (ORCID)
Authors:
Alexander Coppock [email protected] (ORCID)
Macartan Humphreys [email protected] (ORCID)
See https://book.declaredesign.org/complex-designs.html#meta-analysis
rma_helper(data, yi, sei, method = "REML", ...)
data | a data.frame |
yi | unquoted variable name of estimates used in meta-analysis |
sei | unquoted variable name of standard errors used in meta-analysis |
method | character string to specify whether a fixed- or a random/mixed-effects model should be fitted. A fixed-effects model (with or without moderators) is fitted when using method = "FE". Random/mixed-effects models are fitted by setting method equal to one of the following: "DL", "HE", "SJ", "ML", "REML", "EB", "HS", "HSk", or "GENQ". Default is "REML". |
... | Further options to be passed to rma |
See ?rma for further details
a data.frame of estimates
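To show how `yi` and `sei` enter a meta-analysis, the base R sketch below computes the standard inverse-variance pooled estimate, which is the calculation behind a fixed-effects model without moderators (`method = "FE"`). The study data are made up, and the sketch does not call `rma_helper()` or metafor. The random-effects methods additionally estimate between-study variance and are not shown.

```r
# Base R sketch of a fixed-effect (inverse-variance) meta-analysis.
# Data are made up for illustration.
studies <- data.frame(
  yi  = c(0.10, 0.30, 0.20),  # study estimates
  sei = c(0.10, 0.20, 0.10)   # study standard errors
)

w      <- 1 / studies$sei^2                 # inverse-variance weights
mu_hat <- sum(w * studies$yi) / sum(w)      # pooled estimate
se_mu  <- sqrt(1 / sum(w))                  # standard error of pooled estimate

c(estimate = mu_hat, std.error = se_mu)
```

More precise studies receive more weight: here the two studies with a standard error of 0.10 each count four times as much as the study with 0.20.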
See https://book.declaredesign.org/complex-designs.html#meta-analysis
rma_mu_tau(fit)
fit | Fit object from the rma function in the metafor package |
a data.frame of estimates
ggplot Theme used in the book "Research Design: Declare, Diagnose, Redesign" (Blair, Coppock, Humphreys)
theme_dd()
ggplot theme
Note no standard errors or other summary statistics are provided
tidy_stan(x, conf.int = FALSE, conf.level = 0.95, exponentiate = FALSE, ...)
x | A stanreg fit from stan_glm |
conf.int | Logical indicating whether or not to include a confidence interval in the tidied output. Defaults to FALSE. |
conf.level | The confidence level to use for the confidence interval if conf.int = TRUE. Must be strictly greater than 0 and less than 1. Defaults to 0.95, which corresponds to a 95 percent confidence interval. |
exponentiate | Logical indicating whether or not to exponentiate the coefficient estimates. Defaults to FALSE. Note that standard errors are not included when |
... | Other arguments to broom.mixed::tidy |
See https://book.declaredesign.org/choosing-an-answer-strategy.html#bayesian-formalizations
data.frame of results
Runs amce estimation function and returns tidy data frame output
## S3 method for class 'amce'
tidy(x, alpha = 0.05, ...)
x | an amce fit object from cjoint::amce |
alpha | Significance level for confidence intervals; the default of 0.05 corresponds to 95 percent intervals |
... | Extra arguments to pass to tidy |
See https://book.declaredesign.org/experimental-descriptive.html#conjoint-experiments
a data.frame of estimates
library(cjoint)

data(immigrationconjoint)
data(immigrationdesign)

# Run AMCE estimator using all attributes in the design
results <- amce(Chosen_Immigrant ~ Gender + Education + `Language Skills` +
                `Country of Origin` + Job + `Job Experience` + `Job Plans` +
                `Reason for Application` + `Prior Entry`, data = immigrationconjoint,
                cluster = TRUE, respondent.id = "CaseID", design = immigrationdesign)

# Print summary
# tidy(results)
Runs rdrobust estimation function and returns tidy data frame output
## S3 method for class 'rdrobust'
tidy(x, ...)
x | Model fit object from rdrobust |
... | Other arguments (not used) |
See https://book.declaredesign.org/observational-causal.html#regression-discontinuity-designs
a data.frame of estimates