| Title: | Packages and Functions for 'CourseKata' Courses |
|---|---|
| Description: | Easily install and load all packages and functions used in 'CourseKata' courses. Aid teaching with helper functions and augment generic functions to provide cohesion between the network of packages. Learn more about 'CourseKata' at <https://www.coursekata.org>. |
| Authors: | Adam Blake [cre, aut] (ORCID: <https://orcid.org/0000-0001-7881-8652>), Ji Son [aut] (ORCID: <https://orcid.org/0000-0002-4258-4791>), Jim Stigler [aut] (ORCID: <https://orcid.org/0000-0001-6107-7827>) |
| Maintainer: | Adam Blake <[email protected]> |
| License: | AGPL (>= 3) |
| Version: | 0.19.2 |
| Built: | 2026-05-11 10:40:17 UTC |
| Source: | https://github.com/cranhaven/cranhaven.r-universe.dev |
Data describing all residential home sales in Ames, Iowa from the years 2006–2010 as reported by the Ames City Assessor's Office and compiled by De Cock (2011). Ames is located about 30 miles north of Des Moines (the state capital) and is home to Iowa State University (the largest university in the state). Each row represents the latest sale of a home (one row per home in the dataset). Columns represent home features and sale prices (outcome). The original dataset includes a uniquely detailed (81 features per home) and comprehensive look at the housing market. The data included here are only a subset used for examples in CourseKata course material. See the references and data source for the full dataset.
To simplify the dataset for instructional purposes, the data were filtered to include only single family homes, residential zoning, 1-2 story homes, homes with brick, cinder block, or concrete foundations, and average to excellent kitchen qualities. Further, the descriptive variables were reduced to the subset described in the format section.
Ames
A data frame with 2930 observations on the following 80 variables:
YearBuilt: Year home was built (YYYY).
YearSold: Year of home sale (YYYY). Note: all home sales in this dataset occurred between 2006 and 2010. If a home was sold more than once in that period, only its latest sale is included in the dataset.
Neighborhood: One of two neighborhoods in Ames:
  College Creek (CollegeCreek): a neighborhood located adjacent to Iowa State University (the largest university in the state).
  Old Town (OldTown): a nationally designated historic district in Ames. The neighborhood is located just north of the central business district.
HomeSizeR: Raw above-ground area of home, measured in square feet.
HomeSizeK: Above-ground area of home, measured in thousands of square feet.
LotSizeR: Raw total property lot size, measured in square feet.
LotSizeK: Total property lot size, in thousands of square feet.
Floors: Number of above-ground floors (1 story or 2 story).
BuildQuality: Assessor's rating of overall material and finish of the house.
  10: Very Excellent
  9: Excellent
  8: Very Good
  7: Good
  6: Above Average
  5: Average
  4: Below Average
  3: Fair
  2: Poor
  1: Very Poor
Foundation: Type of foundation (ground material underneath the house).
  Brick&Tile: Brick and Tile
  CinderBlock: Cinder Blocks
  PouredConcrete: Poured Concrete
HasCentralAir: Indicator if home contains central air conditioning (0 = No, 1 = Yes).
Bathrooms: Number of full above-ground bathrooms.
Bedrooms: Number of full above-ground bedrooms.
TotalRooms: Number of above-ground rooms in home, excluding bathrooms.
KitchenQuality: Assessor's rating of kitchen material quality.
  Excellent
  Good
  Average
HasFireplace: Indicator if home contains at least one fireplace (0 = No, 1 = Yes).
GarageType: Type of garage.
  Attached: includes attached, built-in, basement, and dual-type garages
  Detached: includes detached and carport garages
  None: home does not have a garage or carport
GarageCars: Number of cars that can fit in garage.
PriceR: Sale price of home, in raw USD ($).
PriceK: Sale price of home, in thousands of USD ($).
TinySet: (Ignore) Whether or not this row is in ames_tiny.csv.
https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
De Cock, Dean, (2011). Ames, Iowa: Alternative to the Boston Housing Data as an end of semester regression project, Journal of Statistics Education, 19(3). doi:10.1080/10691898.2011.11889627
These data were generated as outcomes for "students" for three different "instructors" named A, B, and C. The outcomes have means such that C > B > A, but the difference is only clearly significant for C > A, and borderline for the others.
class_data
An object of class tbl_df (inherits from tbl, data.frame) with 105 rows and 2 columns.
outcome: A hypothetical, numerical outcome of an intervention.
teacher: Either "A", "B", or "C", associating the outcome to a teacher.
Attach the CourseKata course packages
coursekata_attach(do_not_ask = FALSE, quietly = FALSE)

| Argument | Description |
|---|---|
| do_not_ask | Prevent asking the user to install missing packages (they are skipped). |
| quietly | Whether to suppress messages. |

A named logical vector indicating which packages were attached.
coursekata_attach()
Install or update all CourseKata packages.
coursekata_install(...)
coursekata_update(...)

| Argument | Description |
|---|---|
| ... | Arguments passed on to |

The state of all the packages after any updates have been performed.
This function is called at package start-up and should rarely be needed by the user. The
exception is when the user has called coursekata_unload_theme() and wants to go back to the
CourseKata look and feel. When run, this function sets the CourseKata color palettes
coursekata_palette(), sets the default theme to theme_coursekata(), and tweaks some
default settings for specific plots. To restore the original ggplot2 settings, run
coursekata_unload_theme().
coursekata_load_theme()
No return value, called to adjust the global state of ggplot2.
coursekata_palette, theme_coursekata, scale_discrete_coursekata, coursekata_unload_theme
List all CourseKata course packages
coursekata_packages(check_remote_version = FALSE)

| Argument | Description |
|---|---|
| check_remote_version | Should the remote version number be checked? Requires internet, and will take longer. |

A data frame with three variables: the name of the package (package), the version, and whether it is currently attached.
coursekata_packages()
The color palettes used in our theme system
coursekata_palette(indices = integer(0))

| Argument | Description |
|---|---|
| indices | The indices of the colors to pull (or all colors if no indices are given). |

A named list of the requested colors in the palette.
Create a function that provides a colorblind palette.
coursekata_palette_provider()
A function that accepts one argument n, which is the number of colors you want to use in the plot. This function is used by scales like scale_color_discrete to provide colorblind-safe palettes. Where possible, the function will use the hand-picked colors from coursekata_palette(), and when more colors are needed than are available, it will use the viridisLite::viridis() palette.
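The provider pattern described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the color values and the name `make_palette_provider` are hypothetical, and `grDevices::hcl.colors()` stands in for `viridisLite::viridis()` so the sketch has no dependencies.

```r
# Hypothetical hand-picked colors; the package's actual palette is not shown here.
hand_picked <- c("#0072B2", "#D55E00", "#009E73", "#CC79A7")

# Build a provider: a function of n that returns n colors.
make_palette_provider <- function(colors) {
  function(n) {
    if (n <= length(colors)) {
      # Enough hand-picked colors are available: take the first n.
      colors[seq_len(n)]
    } else {
      # More colors requested than available: fall back to a
      # viridis-style ramp (base R stand-in for viridisLite::viridis()).
      grDevices::hcl.colors(n, palette = "viridis")
    }
  }
}

provider <- make_palette_provider(hand_picked)
few <- provider(3)   # first three hand-picked colors
many <- provider(7)  # seven colors from the fallback ramp
```

Returning a closure keeps the color list private to the provider, which is the shape a discrete scale expects for its palette argument.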
scale_discrete_coursekata
Ensures a default CRAN is set if one is not already set, and adds the repository for fivethirtyeightdata.
coursekata_repos(repos = getOption("repos"))

| Argument | Description |
|---|---|
| repos | Optionally set a repository character vector to augment. |

A set of repositories that can be used to install or update the CourseKata packages.
coursekata_repos()
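A rough base R sketch of what "ensuring a default CRAN and adding a repository" could look like is shown below. The function name `augment_repos`, the choice of mirror, and the fivethirtyeightdata URL are assumptions made for illustration; the package may do this differently.

```r
# Hypothetical helper: returns a repository vector with a usable CRAN entry
# and an extra entry for fivethirtyeightdata.
augment_repos <- function(repos = getOption("repos")) {
  # R stores the placeholder "@CRAN@" when no mirror has been chosen.
  cran <- if (is.null(repos)) NA_character_ else unname(repos["CRAN"])
  if (is.na(cran) || cran == "@CRAN@") {
    repos["CRAN"] <- "https://cloud.r-project.org"  # assumed default mirror
  }
  # Assumed location of the fivethirtyeightdata drat repository.
  repos["fivethirtyeightdata"] <- "https://fivethirtyeightdata.github.io/drat/"
  repos
}

unset <- augment_repos(c(CRAN = "@CRAN@"))
custom <- augment_repos(c(CRAN = "https://example.org"))
```

An existing CRAN choice is left untouched, so a user's configured mirror is respected.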
ggplot2 default settings
This function will restore all of the tweaks to themes and plotting to the original ggplot2 defaults. If you want to go back to the CourseKata look and feel, run coursekata_load_theme().
coursekata_unload_theme()
No return value, called to restore the global state of ggplot2.
coursekata_load_theme
Data from: Controlled clinical trial of canine therapy versus usual care to reduce patient anxiety in the emergency department.
Test if therapy dogs can reduce anxiety in emergency department (ED) patients.
In this controlled clinical trial (NCT03471429), medically stable, adult patients were approached if the physician believed that the patient had “moderate or greater anxiety.” Patients were allocated on a 1:1 ratio to either 15 min exposure to a certified therapy dog and handler (dog), or usual care (control). Patient reported anxiety, pain and depression were assessed using a 0-10 scale (10=worst). Primary outcome was change in anxiety from baseline (T0) to 30 min and 90 min after exposure to dog or control (T1 and T2 respectively); secondary outcomes were pain, depression and frequency of pain medication.
Among 98 patients willing to participate in research, 7 had aversions to dogs, leaving 91 (93%) who were willing to see a dog; 40 patients were allocated to each group (dog or control). No data were normally distributed. Median baseline anxiety, pain and depression were similar between groups. With dog exposure, median anxiety decreased significantly from 6 (IQR 4-9.75) at T0 to 2 (0-6) at T1, compared with 6 (4-8) to 6 (2.5-8) in controls (P<0.001, for T1, Mann-Whitney U). Dog exposure was associated with significantly lower anxiety at T2 and a significant overall treatment effect on two-way repeated measures ANOVA for anxiety, pain and depression. After exposure, 1/40 in the dog group needed pain medication, versus 7/40 in controls (P=0.056, Fisher’s).
Exposure to therapy dogs plus handlers significantly reduced anxiety in ED patients.
er
A data frame with 84 observations on the following 53 variables:
id: Subject ID
condition: Whether the subject saw a Dog or was in the Control group
age: Subject's age in years
gender: Subject's self-identified gender
race: Subject's self-identified race
veteran: Is the subject a veteran?
disabled: Is the subject disabled?
dog_name: The name of the therapy dog
base_pain: Subject's self reported pain before the intervention (T0)
base_depression: Subject's self reported depression before the intervention (T0)
base_anxiety: Subject's self reported anxiety before the intervention (T0)
base_total: The sum of the subject's base_* scores
later_pain: Subject's self reported pain after the intervention (T1)
later_depression: Subject's self reported depression after the intervention (T1)
later_anxiety: Subject's self reported anxiety after the intervention (T1)
later_total: The sum of the subject's later_* scores
last_pain: Subject's self reported pain after the intervention (T2)
last_depression: Subject's self reported depression after the intervention (T2)
last_anxiety: Subject's self reported anxiety after the intervention (T2)
last_total: The sum of the subject's last_* scores
change_pain: The change in subject's pain from before the intervention to after
change_depression: The change in subject's depression from before the intervention to after
change_anxiety: The change in subject's anxiety from before the intervention to after
change_total: The sum of the subject's change_* scores
provider_male: Was the health care provider male?
provider: The health care provider's status: either an Advanced Practitioner, Resident physician, or Attending physician
heart_rate: The subject's heart rate at baseline (T0)
resp_rate: The subject's respiratory rate at baseline (T0)
sp_o2: The subject's SpO2 at baseline (T0)
bp_syst: The subject's systolic blood pressure at baseline (T0)
bp_diast: The subject's diastolic blood pressure at baseline (T0)
med_given: Was the subject given medication prior to the study? (T0)
mh_none: None of the other medical history items were indicated
mh_asthma: Medical history: asthma
mh_smoker: Medical history: smoker
mh_cad: Medical history: coronary artery disease
mh_diabetes: Medical history: diabetes mellitus
mh_hypertension: Medical history: hypertension
mh_stroke: Medical history: prior stroke
mh_chronic_kidney: Medical history: chronic kidney disease
mh_copd: Medical history: chronic obstructive pulmonary disease
mh_hyperlipidemia: Medical history: hyperlipidemia
mh_hiv: Medical history: HIV
mh_other: Medical history: other (write-in)
ph_adhd: Psychiatric history: attention-deficit/hyperactivity disorder
ph_anxiety: Psychiatric history: anxiety
ph_bipolar: Psychiatric history: bipolar
ph_borderline: Psychiatric history: borderline personality disorder
ph_depression: Psychiatric history: depression
ph_schizophrenia: Psychiatric history: schizophrenia
ph_ptsd: Psychiatric history: PTSD
ph_none: None of the other psychiatric history items were indicated
ph_other: Psychiatric history: other (write-in)
Kline, J. A., Fisher, M. A., Pettit, K. L., Linville, C. T., & Beck, A. M. (2019). Controlled clinical trial of canine therapy versus usual care to reduce patient anxiety in the emergency department. PloS One, 14(1), e0209232. doi:10.1371/journal.pone.0209232
This collection of functions is useful for extracting estimates and statistics from a fitted
model. They are particularly useful when estimating many models, like when bootstrapping
confidence intervals. Each function can be used with an already fitted model as an lm object,
or a formula and associated data can be passed to it. All of these assume the comparison is the
empty model.
b0(object, data = NULL)
b1(object, data = NULL)
b(object, data = NULL, all = FALSE, predictor = character())
f(object, data = NULL, all = FALSE, predictor = character(), type = 3)
pre(object, data = NULL, all = FALSE, predictor = character(), type = 3)
p(object, data = NULL, all = FALSE, predictor = character(), type = 3)
fVal(object, data = NULL, all = FALSE, predictor = character(), type = 3)
PRE(object, data = NULL, all = FALSE, predictor = character(), type = 3)

| Argument | Description |
|---|---|
| object | |
| data | If |
| all | If |
| predictor | Filter the output down to just the statistics for these terms (e.g. "hp" to just get the statistics for that term in the model). This argument is flexible: you can pass a character vector of terms ( |
| type | The type of sums of squares to calculate (see |

b0: The intercept from the full model.
b1: The slope b1 from the full model.
b: The coefficients from the full model.
f: The F value from the full model.
pre: The Proportional Reduction in Error for the full model.
p: The p-value from the full model.
sse: The SS Error (SS Residual) from the model.
ssm: The SS Model (SS Regression) for the full model.
ssr: Alias for SSM.
The value of the estimate as a single number.
Judd, C. M., McClelland, G. H., & Ryan, C. S. (2017). Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond (3rd ed.). New York: Routledge. ISBN: 978-1138819832
supernova(lm(mpg ~ disp, data = mtcars))
change_p_decimals <- supernova(lm(mpg ~ disp, data = mtcars))
print(change_p_decimals, pcut = 8)
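To make the "comparison to the empty model" concrete, the sketch below computes the same kinds of quantities by hand in base R for a single-predictor model. It is not the package's code; the variable names are chosen for illustration, and `mtcars` is used only because it ships with R.

```r
# Full model and the empty (intercept-only) model it is compared against.
full  <- lm(mpg ~ hp, data = mtcars)
empty <- lm(mpg ~ NULL, data = mtcars)

# Parameter estimates from the full model.
b0_hat <- unname(coef(full)[1])  # intercept
b1_hat <- unname(coef(full)[2])  # slope for hp

# Sums of squared error for each model.
sse_full  <- sum(resid(full)^2)
sse_empty <- sum(resid(empty)^2)  # equals SS Total

# Proportional Reduction in Error relative to the empty model.
pre_hat <- (sse_empty - sse_full) / sse_empty

# F ratio and its p-value for the same comparison.
df_model <- length(coef(full)) - 1
df_error <- df.residual(full)
f_hat <- ((sse_empty - sse_full) / df_model) / (sse_full / df_error)
p_hat <- pf(f_hat, df_model, df_error, lower.tail = FALSE)
```

Because each result is a single number, a computation like this can be repeated across many resampled data sets, which is the use case the description mentions.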
Data from: Fundamentals of Biostatistics Notes from: Kahn, M.
Sample of 654 youths, aged 3 to 19, in the area of East Boston during middle to late 1970's. Interest concerns the relationship between smoking and FEV. Since the study is necessarily observational, statistical adjustment via regression models clarifies the relationship.
This is a versatile dataset that can be used throughout an introductory statistics course as well as an introductory modeling course. It includes many issues from statistical adjustment in observational studies, to subgroup analysis, quadratic regression and analysis of covariance.
fevdata
A data frame with 654 observations on the following 5 variables:
AGE: Age, in years
FEV: Forced expiratory volume, in liters
HEIGHT: Height, in inches
SEX: 0 = Female, 1 = Male
SMOKE: 0 = Non-smoker, 1 = Smoker
Kahn, M. (2003). Data Sleuth, STATS, 37, 24. https://jse.amstat.org/datasets/fev.txt
Rosner, B. (1999). Fundamentals of Biostatistics, Pacific Grove, CA: Duxbury
Students at a university taking an introductory statistics course were asked to complete this survey as part of their homework.
Fingers
A data frame with 157 observations on the following 16 variables:
Gender: Gender of participant.
RaceEthnic: Racial or ethnic background.
FamilyMembers: Members of immediate family (excluding self).
SSLast: Last digit of social security number (NA if no SSN).
Year: Year in school: 1=First, 2=Second, 3=Third, 4=Fourth, 5=Other
Job: Current employment status: 1=Not Working, 2=Part-time Job, 3=Full-time Job
MathAnxious: Agreement with the statement "In general I tend to feel very anxious about mathematics": 1=Strongly Disagree, 2=Disagree, 3=Neither Agree nor Disagree, 4=Agree, 5=Strongly Agree
Interest: Interest in statistics and the course: 1=No Interest, 2=Somewhat Interested, 3=Very Interested
GradePredict: Numeric prediction for final grade in the course. The value is converted from the student's letter grade prediction. 4.0=A, 3.7=A-, 3.3=B+, 3.0=B, 2.7=B-, 2.3=C+, 2.0=C, 1.7=C-, 1.3=Below C-
Thumb: Length in mm from tip of thumb to the crease between the thumb and palm.
Index: Length in mm from tip of index finger to the crease between the index finger and palm.
Middle: Length in mm from tip of middle finger to the crease between the middle finger and palm.
Ring: Length in mm from tip of ring finger to the crease between the ring finger and palm.
Pinkie: Length in mm from tip of pinkie finger to the crease between the pinkie finger and palm.
Height: Height in inches.
Weight: Weight in pounds.
Sex: Sex of participant.
This is the Fingers dataset before it was cleaned. In the cleaning process, we converted the values from numbers to appropriate types (where applicable), removed outliers that suggested data was input incorrectly, and we removed incomplete cases. The description for the dataset is: Students at a university taking an introductory statistics course were asked to complete this survey as part of their homework. (This is the same data set as the Fingers data)
FingersMessy
A data frame with 157 observations on the following 16 variables:
Gender: Gender of participant.
RaceEthnic: Racial or ethnic background.
FamilyMembers: Members of immediate family (excluding self).
SSLast: Last digit of social security number (NA if no SSN).
Year: Year in school: 1=First, 2=Second, 3=Third, 4=Fourth, 5=Other
Job: Current employment status: 1=Not Working, 2=Part-time Job, 3=Full-time Job
MathAnxious: Agreement with the statement "In general I tend to feel very anxious about mathematics": 1=Strongly Disagree, 2=Disagree, 3=Neither Agree nor Disagree, 4=Agree, 5=Strongly Agree
Interest: Interest in statistics and the course: 1=No Interest, 2=Somewhat Interested, 3=Very Interested
GradePredict: Numeric prediction for final grade in the course. The value is converted from the student's letter grade prediction. 4.0=A, 3.7=A-, 3.3=B+, 3.0=B, 2.7=B-, 2.3=C+, 2.0=C, 1.7=C-, 1.3=Below C-
Thumb: Length in mm from tip of thumb to the crease between the thumb and palm.
Index: Length in mm from tip of index finger to the crease between the index finger and palm.
Middle: Length in mm from tip of middle finger to the crease between the middle finger and palm.
Ring: Length in mm from tip of ring finger to the crease between the ring finger and palm.
Pinkie: Length in mm from tip of pinkie finger to the crease between the pinkie finger and palm.
Height: Height in inches.
Weight: Weight in pounds.
Sex: Sex of participant.
Test the fit of a model on a train and test set.
fit_stats(model, df_train, df_test)
fitstats(model, df_train, df_test)

| Argument | Description |
|---|---|
| model | An |
| df_train | A data frame with the training data. |
| df_test | A data frame with the test data. |

A data frame with the fit statistics.
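The idea of comparing fit on training and test data can be sketched in base R as follows. The statistics chosen here (RMSE and PRE), the helper name `fit_summary`, and the simulated data are assumptions for illustration; the source does not list which statistics `fit_stats()` reports.

```r
# Simulated data (illustrative only): y depends linearly on x plus noise.
set.seed(42)
n <- 100
sim <- data.frame(x = rnorm(n))
sim$y <- 1 + 2 * sim$x + rnorm(n)

# Hold out 30 rows for testing.
train_rows <- sample(n, size = 70)
df_train <- sim[train_rows, ]
df_test  <- sim[-train_rows, ]

# Fit on the training data only.
model <- lm(y ~ x, data = df_train)

# Hypothetical helper: error statistics for one data set.
fit_summary <- function(model, data, outcome) {
  observed  <- data[[outcome]]
  predicted <- predict(model, newdata = data)
  sse <- sum((observed - predicted)^2)
  sst <- sum((observed - mean(observed))^2)
  data.frame(
    rmse = sqrt(mean((observed - predicted)^2)),  # root mean squared error
    pre  = 1 - sse / sst                          # proportional reduction in error
  )
}

stats <- rbind(
  train = fit_summary(model, df_train, "y"),
  test  = fit_summary(model, df_test, "y")
)
```

A test-set fit that is much worse than the training-set fit is the usual sign of overfitting.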
The simulated results of a small study comparing the effectiveness of three different computer-based math games in a sample of 105 fifth-grade students. All three games focused on the same topic and had identical learning goals, and none of the students had any prior knowledge of the topic.
game_data
A data frame with 105 observations on the following 2 variables:
game: The game the student was randomly assigned to, coded as "A", "B", or "C".
outcome: Each student's score on the outcome test.
When teaching about regression it can be useful to visualize the data as a point plot with the
outcome on the y-axis and the explanatory variable on the x-axis. For regression models, this is
most easily achieved by calling ggformula::gf_lm(), with empty models
ggformula::gf_hline() using the mean, and a more complicated call to
ggformula::gf_segment() for group models. This function simplifies this
by making a guess about what kind of model you are plotting (empty/null, regression, group) and
then making the appropriate plot layer for it.
gf_model(object, model, ...)

| Argument | Description |
|---|---|
| object | A plot created with the |
| model | |
| ... | Additional arguments. Typically these are (a) ggplot2 aesthetics to be set with |

This function only works with models that have a continuous outcome measure.
A gg object (a plot layer) that can be added to a plot.
This function adds vertical lines representing residuals from a linear model to a ggformula plot. The residuals are drawn from the observed data points to the predicted values from the model.
gf_resid(plot, model, linewidth = 0.2, ...)

| Argument | Description |
|---|---|
| plot | A ggformula plot object, typically created with |
| model | A fitted linear model object created using |
| linewidth | A numeric value specifying the width of the residual lines. Default is 0.2. |
| ... | Additional aesthetics passed to |

A ggplot object with residual lines added.
Height_model <- lm(Thumb ~ Height, data = Fingers)
gf_point(Thumb ~ Height, data = Fingers) %>%
  gf_model(Height_model) %>%
  gf_resid(Height_model, color = "red", alpha = 0.5)
gf_resid_fun(plot, fun, linewidth = 0.2, ...)

| Argument | Description |
|---|---|
| plot | A ggformula/ggplot object, typically created with |
| fun | A function that takes a numeric vector x and returns predicted y. |
| linewidth | Numeric width of the residual lines. Default 0.2. |
| ... | Additional aesthetics passed to |

Draws vertical residual lines from observed points to predicted values
computed by a user-supplied function of x (e.g., the function plotted with
gf_function()).
A ggplot object with residual segments added.
set.seed(1)
df <- data.frame(X = 1:10, Y = 2 + 3 * (1:10) + rnorm(10))
my_fun <- function(x) 2 + 3 * x
gf_point(Y ~ X, data = df) %>%
  gf_function(my_fun) %>%
  gf_resid_fun(my_fun, color = "red", alpha = 0.5)
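For readers who want to see what a residual line is numerically, the base R sketch below computes the endpoints of each vertical segment for the same kind of data and function. The data frame layout and column names are illustrative assumptions, not the function's internals.

```r
# Same style of data and prediction function as in the example above.
set.seed(1)
df <- data.frame(X = 1:10, Y = 2 + 3 * (1:10) + rnorm(10))
my_fun <- function(x) 2 + 3 * x

# Each residual line runs vertically from the observed point (x, y)
# to the predicted point (x, fun(x)).
segments_df <- data.frame(
  x    = df$X,
  y    = df$Y,
  xend = df$X,
  yend = my_fun(df$X)
)

# The signed length of each segment is the residual.
segments_df$residual <- segments_df$y - segments_df$yend

# Sum of squared residuals for the supplied function.
ss_resid <- sum(segments_df$residual^2)
```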
gf_sd_ruler(
  p,
  y = NULL,
  data = NULL,
  x = NULL,
  where = c("middle", "mean", "median"),
  color = "red",
  size = 0.8,
  ...
)

| Argument | Description |
|---|---|
| p | A ggplot object (typically from |
| y | The y-variable (bare name or string). Defaults to the plot's mapped y aesthetic if omitted. |
| data | Dataset. Defaults to |
| x | The x-variable for placement. Defaults to the plot's mapped x. |
| where | Where on the x-axis to place the ruler: "middle", "mean", or "median". |
| color | Segment color. Default "red". |
| size | Segment |
| ... | Additional arguments passed to |

Adds a vertical segment showing one standard deviation of a variable, placed at a specified x position. Works for both numeric x (scatter) and categorical x (jitter) plots.
A ggplot object with the SD ruler segment added.
gf_jitter(Thumb ~ Height, data = Fingers) %>%
  gf_model(lm(Thumb ~ NULL, data = Fingers)) %>%
  gf_sd_ruler()
gf_squaresid() was renamed to gf_square_resid() for naming consistency
and is now deprecated.
gf_square_resid(plot, model, aspect = 4/6, alpha = 0.1, ...)
gf_squaresid(plot, model, aspect = 4/6, alpha = 0.1, ...)

| Argument | Description |
|---|---|
| plot | A ggformula plot object, typically created with |
| model | A fitted linear model object created using |
| aspect | A numeric value controlling the square's aspect ratio. Default is 4/6. |
| alpha | A numeric value specifying the transparency of the square's fill. Default is 0.1. |
| ... | Additional aesthetics passed to |

This function adds squared residual representations to a ggformula plot, illustrating squared error as a polygon. The function dynamically adjusts the aspect ratio to ensure proper scaling of squares.
A ggplot object with squared residuals added.
Height_model <- lm(Thumb ~ Height, data = Fingers)
gf_point(Thumb ~ Height, data = Fingers) %>%
  gf_model(Height_model) %>%
  gf_square_resid(Height_model, color = "blue", alpha = 0.5)
gf_square_resid_fun(plot, fun, aspect = 4/6, alpha = 0.1, ...)

| Argument | Description |
|---|---|
| plot | A ggformula/ggplot object, typically created with |
| fun | A function that takes a numeric vector x and returns predicted y. |
| aspect | A numeric value controlling the square's aspect ratio. Default is 4/6. |
| alpha | Transparency of the filled squares. Default 0.1. |
| ... | Additional aesthetics passed to |

Draws squared residual polygons between observed points and predicted values computed by a user-supplied function of x.
A ggplot object with squared residual polygons added.
set.seed(1)
df <- data.frame(X = 1:10, Y = 2 + 3 * (1:10) + rnorm(10))
my_fun <- function(x) 2 + 3 * x
gf_point(Y ~ X, data = df) %>%
  gf_function(my_fun) %>%
  gf_square_resid_fun(my_fun, color = "red", alpha = 0.3)
gf_squareplot(
  x,
  data = NULL,
  binwidth = NULL,
  origin = NULL,
  boundary = NULL,
  fill = "#7fcecc",
  color = "black",
  alpha = 1,
  na.rm = TRUE,
  mincount = NULL,
  bars = c("none", "outline", "solid"),
  xbreaks = NULL,
  xrange = NULL,
  show_dgp = FALSE,
  show_mean = FALSE,
  auto_subdivide = FALSE
)

| Argument | Description |
|---|---|
| x | Formula ( |
| data | Data frame (required if |
| binwidth | Width of histogram bins. Auto-calculated if |
| origin | Starting position for bins. |
| boundary | Alias for |
| fill | Rectangle fill color. Default "#7fcecc". |
| color | Rectangle border color. Default "black". |
| alpha | Transparency. Default 1. |
| na.rm | Remove |
| mincount | Minimum y-axis height for consistent scaling. |
| bars | Display style: "none", "outline", or "solid". |
| xbreaks | Number of x-axis breaks or vector of specific positions. |
| xrange | X-axis limits as |
| show_dgp | Show DGP annotation overlay. Default FALSE. |
| show_mean | Show dashed mean line. Default FALSE. |
| auto_subdivide | Split bins with >75 observations into sub-columns. Default FALSE. |

Creates histograms where individual data points are visible as stacked unit rectangles, making counts easy to visualize. Designed for teaching statistical concepts, particularly sampling distributions.
A ggplot object with S3 class c("gf_squareplot", "gg", "ggplot").
gf_squareplot(~Thumb, data = Fingers)
gf_squareplot(~Thumb, data = Fingers, bars = "outline")
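The "stacked unit rectangles" idea can be illustrated with a small base R computation: each observation gets one rectangle, placed in its bin and stacked on top of earlier observations in the same bin. The values, the bin width, and the column names are made up for illustration and are not taken from the function's internals.

```r
# Illustrative values and bin width.
x <- c(1.2, 1.7, 2.1, 2.4, 2.9, 3.3, 2.2)
binwidth <- 1

# Left edge of the bin that each observation falls into.
bin_left <- floor(x / binwidth) * binwidth

# Position of each observation within its bin's stack (1 = bottom).
stack_pos <- ave(seq_along(x), bin_left, FUN = seq_along)

# One unit-height rectangle per observation.
rects <- data.frame(
  xmin = bin_left,
  xmax = bin_left + binwidth,
  ymin = stack_pos - 1,
  ymax = stack_pos
)

# The top of each stack equals the bin count, as in a histogram.
stack_heights <- tapply(rects$ymax, rects$xmin, max)
```

Drawing these rectangles (for example with a rectangle layer) gives a histogram in which every observation remains individually countable.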
Given a distribution, find which values lie in the upper, lower, or middle proportion of the distribution. Useful when you want to do something like shade in the middle 95% of a plot. This is a greedy operation, meaning that if the cutoff point is between two whole numbers the specified region will take up the extra space. For example, requesting the upper 30% of [1 2 3 4] will return [FALSE FALSE TRUE TRUE] because the 30% was greedy.
outer() marks values in both outer tails of a distribution. It is the
complement of middle(): outer(x, prop) is equivalent to
tails(x, 1 - prop).
middle(x, prop = 0.95, greedy = TRUE)
tails(x, prop = 0.95, greedy = TRUE)
outer(x, prop)
lower(x, prop = 0.025, greedy = TRUE)
upper(x, prop = 0.025, greedy = TRUE)

| Argument | Description |
|---|---|
| x | The distribution of values to check. |
| prop | The total proportion in both tails combined, must be in (0, 1). |
| greedy | Whether the function should be greedy, as per the description above. |

Note that NA values are ignored, i.e. they will always return FALSE.
A logical vector indicating which values are in the specified region.
upper(1:10, .1)
lower(1:10, .2)
middle(1:10, .5)
tails(1:10, .5)
sampling_distribution <- do(1000) * mean(rnorm(100, 5, 10))
sampling_distribution %>%
  gf_histogram(~mean, data = sampling_distribution, fill = ~ middle(mean, .68)) %>%
  gf_refine(scale_fill_manual(values = c("blue", "coral")))
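The greedy behavior described above can be mimicked with a rank-based sketch in base R. This is not the package's implementation: `upper_sketch` is a hypothetical name, and treating the non-greedy case as rounding the count down is an assumption made here for contrast.

```r
# Hypothetical rank-based version of an "upper proportion" check.
upper_sketch <- function(x, prop, greedy = TRUE) {
  n <- sum(!is.na(x))
  # Greedy rounds the number of selected values up; otherwise round down.
  k <- if (greedy) ceiling(n * prop) else floor(n * prop)
  if (k < 1) {
    return(rep(FALSE, length(x)))
  }
  # The k-th largest value is the cutoff (sort() drops NA values).
  cutoff <- sort(x, decreasing = TRUE)[k]
  # NA values are never in the region.
  !is.na(x) & x >= cutoff
}

greedy_result <- upper_sketch(1:4, 0.3)
# FALSE FALSE TRUE TRUE, matching the example in the description

strict_result <- upper_sketch(1:4, 0.3, greedy = FALSE)
with_missing  <- upper_sketch(c(1, NA, 3), 0.5)
```

With four values, 30% corresponds to 1.2 values, so the greedy version selects two of them while the rounded-down version selects only one.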
palmerpenguins::penguins data set.
The modifications are to select only a subset of the variables and convert some of the units.
penguins
A data frame with 333 observations on the following 7 variables:
species: The species of penguin, coded as "Adelie", "Chinstrap", or "Gentoo".
gentoo: Whether the penguin is a Gentoo penguin (1) or not (0).
body_mass_kg: The mass of the penguin's body, in kilograms.
flipper_length_m: The length of the penguin's flipper, in m.
bill_length_cm: The length of the penguin's bill, in cm.
female: Whether the penguin is female (1) or not (0).
island: The island where the penguin was observed, coded as "Biscoe", "Dream", or "Torgersen".
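The unit conversions implied by the variable names can be sketched in base R. The two input rows below are illustrative stand-ins that use the column names of palmerpenguins::penguins; the exact steps used to build this dataset are an assumption.

```r
# Illustrative rows with palmerpenguins-style columns (not the full data).
raw <- data.frame(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  bill_length_mm = c(39.1, 46.1),
  flipper_length_mm = c(181, 211),
  body_mass_g = c(3750, 4500),
  sex = c("male", "female")
)

# Select a subset of variables and convert units.
converted <- data.frame(
  species = raw$species,
  gentoo = as.integer(raw$species == "Gentoo"),   # 1 if Gentoo, else 0
  body_mass_kg = raw$body_mass_g / 1000,           # grams to kilograms
  flipper_length_m = raw$flipper_length_mm / 1000, # millimeters to meters
  bill_length_cm = raw$bill_length_mm / 10,        # millimeters to centimeters
  female = as.integer(raw$sex == "female"),        # 1 if female, else 0
  island = raw$island
)
```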
See coursekata_palette() for more information.
scale_discrete_coursekata(...)

| Argument | Description |
|---|---|
| ... | Additional parameters passed on to the scale type. |

A discrete color scale.
coursekata_palette
show_cutoffs(plot, color = "#1e3a8a", size = 4, labels = FALSE)

| Argument | Description |
|---|---|
| plot | A ggplot histogram with |
| color | Marker/line color. Default "#1e3a8a". |
| size | Marker size. Default 4. |
| labels | Whether to add text annotations explaining the cutoffs. Default FALSE. |

Adds downward-pointing triangle markers at the empirical quantile cutoffs on
a histogram that uses a distribution part function (middle(), tails(),
upper(), lower(), or outer()) in its fill aesthetic.
A ggplot object with cutoff markers and optional labels.
gf_histogram(~Thumb, data = Fingers, fill = ~middle(Thumb, .95)) %>%
  show_cutoffs(labels = TRUE)
These data are simulated to be similar to the Ames housing data, but with far fewer variables and much smaller effect sizes.
Smallville
A data frame with 32 observations on the following 4 variables:
PriceK: Price the home sold for (in thousands of dollars)
Neighborhood: The neighborhood the home is in (Eastside, Downtown)
HomeSizeK: The size of the home (in thousands of square feet)
HasFireplace: Whether the home has a fireplace (0 = no, 1 = yes)
Split data into train and test sets.
split_data(data, prop = 0.7)

| Argument | Description |
|---|---|
| data | A data frame. |
| prop | The proportion of rows to assign to the training set. |

A list with two data frames, train and test.
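A random train/test split of this kind can be sketched in a few lines of base R. The function name `split_sketch` and the choice to round the training size down are assumptions; the package may round or sample differently.

```r
# Hypothetical base R version of a random train/test split.
split_sketch <- function(data, prop = 0.7) {
  n <- nrow(data)
  n_train <- floor(n * prop)  # assumed rounding rule
  train_rows <- sample(n, size = n_train)
  test_rows <- setdiff(seq_len(n), train_rows)
  list(
    train = data[train_rows, , drop = FALSE],
    test  = data[test_rows, , drop = FALSE]
  )
}

set.seed(1)  # make the random split reproducible
parts <- split_sketch(mtcars, prop = 0.7)
```

Setting a seed before splitting makes the same rows land in each set on every run, which helps when comparing models.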
Students at a university taking an introductory statistics course were asked to complete this survey as part of their homework.
Survey
A data frame with 211 observations on the following 1 variable:
Any1_20: The random number between 1 and 20 that a student thought of.
Data about tips collected from an experiment with 44 tables at a restaurant.
Tables
A data frame with 44 observations on the following 2 variables.
TableID: A number assigned to each table.
Tip: How much the tip was.
ggplot2::theme_bw
The coursekata package automatically loads this theme when the package is loaded. This is in
addition to a number of other plot tweaks and option settings. To just restore the theme to the
default, you can run set_theme(theme_grey). If you want to restore all plot related settings
and/or prevent them when loading the package, see coursekata_unload_theme.
theme_coursekata()
A gg theme object.
gf_boxplot(Thumb ~ RaceEthnic, data = Fingers, fill = ~RaceEthnic)
These are simulated data that are similar to the TipExperiment data. Hypothetical tables
were randomly assigned to receive checks that either included or did not include a drawing
of a smiley face, either from a male or a female server.
tip_exp
A data frame with 44 observations on the following 3 variables.
gender: Whether the server was female or male
condition: Whether the check had a smiley face or not (control)
tip_percent: The size of the tip as a percentage of the price of the meal
Tables were randomly assigned to receive checks that either included or did not include a drawing of a smiley face. Data was collected from 44 tables in an effort to examine whether the added smiley face would cause more generous tipping.
TipExperiment
A data frame with 44 observations on the following 3 variables.
TableID: A number assigned to each table.
Tip: How much the tip was.
Condition: Which experimental condition the table was randomly assigned to.
Check: (Simulated) The amount of money the table paid for their meal.
FoodQuality: (Simulated) The perceived quality of the food.
These data have been updated with some historical height data (from Our World in Data), drinking data (collected by the World Health Organization featured in fivethirtyeight), population and land characteristics, and vaccination data (from March 2023).
World
A data frame with 130 observations on the following 14 variables:
Country: Name of country
Region: One of 5 UN defined regions: Africa, Americas, Asia, Europe, Oceania
Code: Three-letter country codes defined by the International Organization for Standardization (ISO) to represent countries in a way that avoids errors since a country’s name changes depending on the language being used.
LifeExpectancy: Average life expectancy (in years)
GirlsH1900: The average height of 18-year-old girls in 1900 (in cm)
GirlsH1980: The average height of 18-year-old girls in 1980 (in cm)
Happiness: Score on a 0-10 scale for average level of happiness (10 being happiest)
GDPperCapita: Gross Domestic Product (per capita)
FertRate: The average number of children that will be born to a woman over her lifetime
PeopleVacc: Total number of people vaccinated in the country
PeopleVacc_per100: Total number of people vaccinated in the country (in percent)
Population2010: Population (in millions) in 2010
Population2020: Population (in millions) in 2020
WineServ: Average wine consumption per capita for those age 15 and over per week (collected by WHO)