---
title: "split and combine_smsm"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{split and combine_smsm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: split_ref.bib
link-citations: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, fig.width=7, fig.height=5)
```

```{=html}
```

## Introduction

The Introduction to `synthACS` briefly mentions the `split` and `combine_smsm` functionality in Sections 3.2 and 3.4, respectively. There, we note that deriving the sample synthetic micro data is a memory-intensive process and advise using `synthACS` on a high-performance machine. Such a machine is not always available, and that is when `split` and `combine_smsm` are needed.

This vignette gives a brief illustration of these two functions. It uses the same example data as the introductory vignette:

```{r, echo= TRUE, eval= FALSE}
library(data.table)
library(acs)
library(synthACS)
library(retry)

# All counties in California
ca_geo <- geo.make(state = "CA", county = "*")
# Pull the 5-year ACS data ending in 2014 for that geography
ca_dat_SMSM <- pull_synth_data(2014, 5, ca_geo)
```

## Overview of `split()` and `combine_smsm()`

The `split` and `combine_smsm` functions are used, respectively, to break a large spatial microsimulation task into a set of smaller tasks with lower computational requirements and to recombine the results. They enable the well-known "split-apply-combine" strategy for data analysis [@plyr]. In this case, the "apply" step is intentionally performed sequentially and **not** inside another function, in order to minimize RAM usage and to allow a garbage-collection step between intensive in-memory function calls.

The syntax for both is straightforward:

- `split(