---
title: "split and combine_smsm"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{split and combine_smsm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: split_ref.bib
link-citations: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE,
fig.width=7, fig.height=5)
```
## Introduction
The Introduction to `synthACS` briefly mentions the `split` and `combine_smsm` functionality
in Sections 3.2 and 3.4, respectively. There, we note that deriving the sample synthetic micro data is a memory-intensive process and advise using `synthACS` on a high-performance machine. Of course, such
a machine is not always available, and this is where `split` and `combine_smsm` are needed.
This vignette provides a brief illustration of these two functions. It uses the same example data
as the introductory vignette:
```{r, echo= TRUE, eval= FALSE}
library(data.table)
library(acs)
library(synthACS)
library(retry)
ca_geo <- geo.make(state = "CA", county = "*")
ca_dat_SMSM <- pull_synth_data(2014, 5, ca_geo)
```
## Overview of `split()` and `combine_smsm()`
The `split` and `combine_smsm` functions are used, respectively, to break a large spatial
microsimulation task into a set of smaller, less computationally demanding tasks and to recombine
the results. They enable the well-known "split-apply-combine" strategy for data analysis [@plyr].
In this case, the "apply" step is intentionally performed sequentially and **not** inside another
function, in order to minimize RAM usage and allow a garbage-collection step between
memory-intensive function calls.
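
To make the pattern concrete, here is a minimal base-R sketch of split-apply-combine with a sequential "apply" step and garbage collection between calls. It does not use the `synthACS` API: the toy data frame `toy` and the helper `expensive_step()` are hypothetical stand-ins, and `base::split()` and `rbind()` play the roles that `split` and `combine_smsm` play for `synthACS` objects.

```{r, echo= TRUE, eval= FALSE}
# Toy data standing in for a large dataset (hypothetical, for illustration only).
toy <- data.frame(id = 1:12, value = (1:12) * 10)

# Split: assign rows to n_chunks roughly equal pieces.
n_chunks <- 3
chunk_id <- rep(seq_len(n_chunks), length.out = nrow(toy))
pieces <- base::split(toy, chunk_id)

# Hypothetical stand-in for a memory-intensive computation.
expensive_step <- function(df) {
  df$value_sq <- df$value^2
  df
}

# Apply: process each piece one at a time at the top level (not nested
# inside another function) and trigger garbage collection between calls.
results <- vector("list", length(pieces))
for (i in seq_along(pieces)) {
  results[[i]] <- expensive_step(pieces[[i]])
  invisible(gc())
}

# Combine: stack the processed pieces and restore the original row order.
combined <- do.call(rbind, results)
combined <- combined[order(combined$id), ]
rownames(combined) <- NULL
```

Running the loop at the top level means each intermediate result can be released before the next piece is processed. This is the motivation described above for keeping the "apply" step sequential.
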
The syntax for both is straightforward:
- `split(