Title: | Downloading Supplementary Data from Published Manuscripts |
---|---|
Description: | Downloads supplementary data from published manuscripts, using papers' DOIs as references. Facilitates open, reproducible research workflows: scientists re-analyzing published datasets can work with them as easily as if they were stored on their own computer, and others can track their analysis workflow painlessly. The main function suppdata() returns a (temporary) location on the user's computer where the file is stored, making it simple to use suppdata() with standard functions like read.csv(). |
Authors: | William D. Pearse [aut, cre], Scott Chamberlain [aut], Daniel Nuest [aut], Ross Mounce [rev] (Ross Mounce reviewed the package for rOpenSci, see https://github.com/ropensci/onboarding/issues/195), Sarah Supp [rev] (Sarah Supp reviewed the package for rOpenSci, see https://github.com/ropensci/onboarding/issues/195) |
Maintainer: | William D. Pearse |
License: | MIT + file LICENSE |
Version: | 1.1-9 |
Built: | 2025-01-13 12:39:57 UTC |
Source: | https://github.com/cranhaven/cranhaven.r-universe.dev |
Put a call to this function wherever you would put a file path. Everything is cached by default, so you don't have to worry about repeated downloads in the same session.
suppdata(
  x,
  si,
  from = c("auto", "plos", "wiley", "science", "proceedings", "figshare",
    "esa_data_archives", "esa_archives", "biorxiv", "epmc", "peerj",
    "copernicus", "jstatsoft"),
  save.name = NA,
  dir = NA,
  cache = TRUE,
  vol = NA,
  issue = NA,
  list = FALSE,
  timeout = 10,
  zip = FALSE
)

## S3 method for class 'character'
suppdata(
  x,
  si,
  from = c("auto", "plos", "wiley", "science", "proceedings", "figshare",
    "esa_data_archives", "esa_archives", "biorxiv", "epmc", "peerj",
    "copernicus", "jstatsoft"),
  save.name = NA,
  dir = NA,
  cache = TRUE,
  vol = NA,
  issue = NA,
  list = FALSE,
  timeout = 10,
  zip = FALSE
)
x | One of: vector of DOI(s) of article(s) (a … |
si | number of the supplementary information (SI) to be downloaded (1, 2, 3, etc.), or (for ESA, Science, and Copernicus journals) the name of the supplement (e.g., "S1_data.csv"). Can be a … |
from | Publisher of article (… |
save.name | a name for the file to download (… |
dir | directory to save file to (… |
cache | if … |
vol | Article volume (Proceedings journals only; … |
issue | Article issue (Proceedings journals only; … |
list | if … |
timeout | how long to wait for successful download (default 10 seconds) |
zip | if … |
The examples probably give the best indication of how to use this function. In general, just specify the DOI of the article you want to download data from and the number of the supplement you want to download (1, 5, etc.). Proceedings and Science journals need you to give the filename of the supplement to download. The file extensions (suffixes) of files are returned as suffix attributes (see first example), which may be useful if you don't know the format of the file you're downloading.
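As an illustration of how the suffix attribute might be used, the sketch below picks a reader based on it. The helper read_by_suffix() is hypothetical and not part of the package, and the sketch assumes the attribute holds a bare extension such as "csv" (a leading dot is tolerated in case it does not). A temporary file stands in for a downloaded supplement so that the example runs offline.

```r
# Hypothetical helper (not part of suppdata): choose a reader from the
# "suffix" attribute attached to the returned file path.
read_by_suffix <- function(path) {
  suffix <- attr(path, "suffix")
  if (is.null(suffix) || is.na(suffix)) {
    stop("No usable 'suffix' attribute; inspect the file manually")
  }
  suffix <- tolower(sub("^\\.", "", suffix))  # tolerate a leading dot
  file <- as.character(path)                  # plain path without attributes
  switch(suffix,
    csv = utils::read.csv(file),
    tsv = utils::read.delim(file),
    txt = readLines(file),
    stop("No reader registered for suffix '", suffix, "'")
  )
}

# Offline stand-in for a downloaded supplement: a temporary CSV file whose
# path carries a "suffix" attribute, mimicking what the text describes.
tmp <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(id = 1:3, mass = c(2.5, 3.1, 4.0)), tmp,
                 row.names = FALSE)
fake_path <- structure(tmp, suffix = "csv")

dat <- read_by_suffix(fake_path)
```

With a real download, the same helper would presumably be called as read_by_suffix(suppdata(doi, si)); unknown suffixes stop with an error rather than guessing a format.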
For any DOIs not recognised (and if asked), the Europe PubMed Central (EPMC) API is used to look up articles. What this database calls a supplementary file varies by publisher; often they will simply be figures within articles, but we (obviously) have no way to check this at run-time. I strongly recommend you run any EPMC calls with list=TRUE the first time, to see the filenames that EPMC gives supplements, as these often differ from the names the authors gave them. This may actually be a 'feature', not a 'bug', if you're trying to automate some sort of meta-analysis.
Below is a list of all the publishers this supports, with examples of journals from each.
auto | Default. Use a Crossref search (cr_works) on the DOI to determine the publisher. |
plos | Public Library of Science journals (e.g., PLoS One). |
wiley | Wiley journals (e.g., Ecology Letters). |
science | Science magazine (e.g., Science Advances). |
proceedings | Royal Society of London journals (e.g., Proceedings of the Royal Society of London B). Requires vol and issue of the article. |
figshare | Figshare. |
biorxiv | Load from bioRxiv. |
epmc | Look up an article on Europe PubMed Central, and then download the file using their supplementary materials API. See the comments above about EPMC. |
peerj | PeerJ journals (e.g., PeerJ Preprints). |
copernicus | Copernicus Publications journals (e.g., Biogeosciences). Only one supplement is supported, which can be a zip archive or a PDF file. A numeric si must be 1, which downloads either the PDF or the whole archive; the archive is saved using the Copernicus naming scheme (<journalname>-<volume>-<firstpage>-<year>-supplement.zip) and save.name is ignored. If si matches the name of the supplemental archive (i.e., uses the Copernicus naming scheme), the archive is not unzipped. si may also be the name of a file in that archive, in which case only that file is extracted and saved to save.name. |
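To make the Copernicus rules above concrete, here is a small sketch that builds an archive name from the quoted naming scheme and sorts an si value into the three cases described. Both helpers are hypothetical illustrations and do not reflect the package's internal code. The values "bg", 14, 1739 and 2017 are read off the DOI used in the examples (10.5194/bg-14-1739-2017); that the DOI parts map onto the scheme in this way is an assumption.

```r
# Build an archive name following the scheme quoted in the text:
# <journalname>-<volume>-<firstpage>-<year>-supplement.zip
copernicus_archive_name <- function(journal, volume, first_page, year) {
  sprintf("%s-%s-%s-%s-supplement.zip", journal, volume, first_page, year)
}

# Hypothetical helper: classify an `si` value into the three cases the
# documentation describes. It only labels the case; it downloads nothing.
classify_copernicus_si <- function(si, archive_name) {
  if (is.numeric(si)) {
    if (si != 1) stop("A numeric si must be 1 for Copernicus journals")
    return("whole supplement (zip archive or PDF)")
  }
  if (identical(si, archive_name)) {
    return("archive kept zipped")
  }
  "single file extracted from the archive"
}

# Assumed values, taken from the DOI 10.5194/bg-14-1739-2017
archive <- copernicus_archive_name("bg", 14, 1739, 2017)

classify_copernicus_si(1, archive)
classify_copernicus_si(archive, archive)
classify_copernicus_si("Table S1 v2 UFK FOR_PUBLICATION.csv", archive)
```

Only the last case makes use of save.name, which matches the copernicus example at the end of this page.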
Make sure that the article from which you're attempting to download supplementary materials *has* supplementary materials. If it does not, you may get 404 or 'file not found' errors.
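When looping over many DOIs, one way to cope with such errors is to catch them and carry on. The wrapper below is a minimal sketch, not part of the package: try_suppdata() and its fetch argument are hypothetical names, and fetch defaults to suppdata::suppdata only when the wrapper is actually called without a replacement. The DOI "10.1234/example" and the failing stand-in fetcher are made up so that the example runs offline.

```r
# Hypothetical wrapper: return NA (with a warning) instead of stopping when
# a supplement cannot be downloaded.
try_suppdata <- function(doi, si, ..., fetch = suppdata::suppdata) {
  tryCatch(
    fetch(doi, si, ...),
    error = function(e) {
      warning("No supplement retrieved for ", doi, ": ",
              conditionMessage(e), call. = FALSE)
      NA_character_
    }
  )
}

# Offline demonstration with a stand-in fetcher that always fails,
# imitating an article without supplementary materials.
failing_fetch <- function(doi, si, ...) stop("HTTP 404: file not found")

result <- suppressWarnings(
  try_suppdata("10.1234/example", 1, fetch = failing_fetch)
)
```

Entries that come back as NA can then be filtered out or checked by hand before any analysis.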
Will Pearse and Scott Chamberlain
# NOTE: The examples below are flagged as 'dontrun' to avoid
# running downloads repeatedly on CRAN servers
## Not run:
#Put the function wherever you would put a file path
crabs <- read.csv(suppdata("10.6084/m9.figshare.979288", 2))

epmc.fig <- suppdata("10.1371/journal.pone.0126524",
                     "pone.0126524.s002.jpg", "epmc")
#...note this 'SI' is not actually an SI, but rather an image from the paper.

#View the suffix (file extension) of downloaded files
# - note that not all files are uploaded/stored with useful file extensions!
attr(epmc.fig, "suffix")

copernicus.csv <- suppdata("10.5194/bg-14-1739-2017",
                           "Table S1 v2 UFK FOR_PUBLICATION.csv",
                           save.name = "data.csv")
#...note this 'SI' is not an SI but the name of a file in the supplementary information archive.
## End(Not run)
# (examples not run on CRAN to avoid downloading files repeatedly)