quanteda.textmodels - Scaling Models and Classifiers for Textual Data
Scaling models and classifiers for sparse matrix objects representing textual data in the form of a document-feature matrix. Includes original implementations of the Laver, Benoit, and Garry (2003) <doi:10.1017/S0003055403000698> 'Wordscores' model, the Perry and Benoit (2017) <doi:10.48550/arXiv.1710.08963> class affinity scaling model, and the Slapin and Proksch (2008) <doi:10.1111/j.1540-5907.2008.00338.x> 'wordfish' model, as well as methods for correspondence analysis, latent semantic analysis, and fast Naive Bayes and linear SVMs specially designed for sparse textual data.
Last updated 6 days ago
archivedpackages r-universe openblas cpp
6.16 score 5 stars 448 scripts 1.3k downloads

tidymv - Tidy Model Visualisation for Generalised Additive Models
Provides functions for visualising generalised additive models and getting predicted values using tidy tools from the 'tidyverse' packages.
Last updated 11 days ago
archivedpackages r-universe
5.70 score 5 stars 1 dependent 336 scripts 565 downloads

readxlsb - Read 'Excel' Binary (.xlsb) Workbooks
Import data from 'Excel' binary (.xlsb) workbooks into R.
Last updated 11 days ago
archivedpackages r-universe cpp
5.40 score 5 stars 1 dependent 27 scripts 12k downloads

fedmatch - Fast, Flexible, and User-Friendly Record Linkage Methods
Provides a flexible set of tools for matching two un-linked data sets. 'fedmatch' allows for three ways to match data: exact matches, fuzzy matches, and multi-variable matches. It also allows an easy combination of these three matches via the tier matching function.
Last updated 11 days ago
archivedpackages r-universe cpp openmp
5.00 score 5 stars 80 scripts 601 downloads

FeatureHashing - Creates a Model Matrix via Feature Hashing with a Formula Interface
Feature hashing, also called the hashing trick, is a method for transforming the features of an instance into a vector. Thus, it is a method for transforming a real dataset into a matrix. Without looking up the indices in an associative array, it applies a hash function to the features and uses their hash values as indices directly. The method of feature hashing in this package was proposed in Weinberger et al. (2009) <arXiv:0902.2206>. The hashing algorithm is the murmurhash3 from the 'digest' package. Please see the README in <https://github.com/wush978/FeatureHashing> for more information.
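As a rough illustration of the hashing trick described above, the following Python sketch maps feature names straight to column indices through a hash function, with no lookup dictionary. It is not the package's implementation: MD5 from the standard library stands in for murmurhash3, and the function name `hash_features`, the signed update, and the default of 16 columns are assumptions made for this example.

```python
import hashlib

def hash_features(features, n_buckets=16):
    """Map a {feature_name: value} dict to a fixed-length list (hypothetical helper)."""
    vec = [0.0] * n_buckets
    for name, value in features.items():
        # Hash the feature name; MD5 is only a stand-in for murmurhash3 here.
        digest = hashlib.md5(name.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h % n_buckets                      # hash value used directly as the column index
        sign = 1.0 if (h >> 63) & 1 == 0 else -1.0 # signed update so collisions tend to cancel
        vec[index] += sign * value
    return vec

row = hash_features({"color=red": 1.0, "size": 3.5})
single = hash_features({"size": 2.0})
```

Because the index comes from the hash alone, two different features can land in the same column; a larger `n_buckets` makes such collisions rarer at the cost of a wider matrix.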
Last updated 2 days ago
archivedpackages r-universe cpp
4.85 score 5 stars 141 scripts 294 downloads

dimRed - A Framework for Dimensionality Reduction
A collection of dimensionality reduction techniques from R packages and a common interface for calling the methods.
Last updated 11 days ago
archivedpackages r-universe
4.36 score 5 stars 154 scripts 922 downloads

SurvMetrics - Predictive Evaluation Metrics in Survival Analysis
An implementation of popular evaluation metrics that are commonly used in survival prediction, including Concordance Index, Brier Score, Integrated Brier Score, Integrated Square Error, Integrated Absolute Error and Mean Absolute Error. For detailed information on the different evaluation metrics, see Ishwaran H, Kogalur UB, Blackstone EH and Lauer MS (2008) <doi:10.1214/08-AOAS169> and Moradian H, Larocque D and Bellavance F (2017) <doi:10.1007/s10985-016-9372-1>.
Last updated 16 days ago
archivedpackages r-universe
4.21 score 5 stars 65 scripts 677 downloads

EmbedSOM - Fast Embedding Guided by Self-Organizing Map
Provides a smooth mapping of multidimensional points into low-dimensional space defined by a self-organizing map. Designed to work with 'FlowSOM' and flow-cytometry use-cases. See Kratochvil et al. (2019) <doi:10.12688/f1000research.21642.1>.
Last updated 11 days ago
archivedpackages r-universe cpp
4.00 score 5 stars 8 scripts 250 downloads

mregions2 - Access Data from Marineregions.org: Gazetteer & Data Products
Explore and retrieve marine geospatial data from the Marine Regions Gazetteer <https://marineregions.org/gazetteer.php?p=webservices> and the Marine Regions Data Products <https://marineregions.org/webservices.php>.
Last updated 8 days ago
archivedpackages r-universe
3.94 score 5 stars 35 scripts 181 downloads

glycanr - Tools for Analysing N-Glycan Data
Useful utilities in N-glycan data analysis. This package tries to fill the gap in N-glycan data analysis by providing easy to use functions for basic operations on data (see <https://en.wikipedia.org/wiki/Glycomics> for more details on Glycomics). At the moment 'glycanr' is mostly oriented to data obtained by UPLC (Ultra Performance Liquid Chromatography) and LCMS (Liquid chromatography–mass spectrometry) analysis of Plasma and IgG glycome.
Last updated 8 days ago
archivedpackages r-universe
3.85 score 5 stars 14 scripts 185 downloads

timeplyr - Fast Tidy Tools for Date and Date-Time Manipulation
A set of fast tidy functions for wrangling, completing and summarising date and date-time data. It combines 'tidyverse' syntax with the efficiency of 'data.table' and speed of 'collapse'.
Last updated 5 days ago
archivedpackages r-universe cpp
3.83 score 5 stars 268 scripts 440 downloads

aPEAR - Advanced Pathway Enrichment Analysis Representation
Simplifies pathway enrichment analysis results by detecting clusters of similar pathways and visualizing them as an enrichment network, where nodes and edges describe the pathways and the similarity between them, respectively. This reduces the redundancy of the overlapping pathways and helps to notice the most important biological themes in the data (Kerseviciute and Gordevicius (2023) <doi:10.1101/2023.03.28.534514>).
Last updated 11 days ago
archivedpackages r-universe
3.74 score 5 stars 22 scripts 412 downloads

fluxfinder - Parsing, Computation, and Diagnostics for Greenhouse Gas Measurements
Parse static-chamber greenhouse gas measurement files generated by a variety of instruments; compute flux rates using multi-observation metadata; and generate diagnostic metrics and plots. Designed to be easy to integrate into reproducible scientific workflows.
Last updated 11 days ago
archivedpackages r-universe
3.70 score 5 stars 7 scripts 480 downloads

rticulate - Ultrasound Tongue Imaging
A tool for processing Articulate Assistant Advanced™ (AAA) export files and plotting tongue contour data from any system.
Last updated 11 days ago
archivedpackages r-universe
3.45 score 5 stars 14 scripts 269 downloads

readwritesqlite - Enhanced Reading and Writing for 'SQLite' Databases
Reads and writes data frames to 'SQLite' databases while preserving time zones (for POSIXct columns), projections (for 'sfc' columns), units (for 'units' columns), levels (for factors and ordered factors) and classes for logical, Date and 'hms' columns. It also logs changes to tables and provides more informative error messages.
Last updated 4 days ago
archivedpackages r-universe
3.44 score 5 stars 11 scripts 271 downloads

CovRegRF - Covariance Regression with Random Forests
Covariance Regression with Random Forests (CovRegRF) is a random forest method for estimating the covariance matrix of a multivariate response given a set of covariates. Random forest trees are built with a new splitting rule which is designed to maximize the distance between the sample covariance matrix estimates of the child nodes. The method is described in Alakus et al. (2023) <doi:10.1186/s12859-023-05377-y>. 'CovRegRF' uses the 'randomForestSRC' package (Ishwaran and Kogalur, 2022) <https://cran.r-project.org/package=randomForestSRC>, frozen at version 3.1.0. The custom splitting rule feature is used to apply the proposed splitting rule. The 'randomForestSRC' package implements 'OpenMP' by default, contingent upon the support provided by the target architecture and operating system. In this package, the 'LAPACK' and 'BLAS' libraries are used for matrix decompositions.
Last updated 3 days ago
archivedpackages r-universe openblas openmp
3.40 score 5 stars 3 scripts 253 downloads

AssetAllocation - Backtesting Simple Asset Allocation Strategies
Easy and quick testing of customizable asset allocation strategies. Users can rely on their own data, or have the package automatically download data from Yahoo Finance (<https://finance.yahoo.com/>). Several pre-loaded portfolios with data are available, including some which are discussed in Faber (2015, ISBN:9780988679924).
Last updated 11 days ago
archivedpackages r-universe
3.40 score 5 stars 9 scripts 298 downloads

binaryMM - Flexible Marginalized Models for Binary Correlated Outcomes
Estimates marginalized mean and dependence model parameters for correlated binary response data. Dependence model may include transition and/or latent variable terms. Methods are described in: Schildcrout and Heagerty (2007) <doi:10.1111/j.1541-0420.2006.00680.x>, Heagerty (1999) <doi:10.1111/j.0006-341x.1999.00688.x>, Heagerty (2002) <doi:10.1111/j.0006-341x.2002.00342.x>.
Last updated 16 days ago
archivedpackages r-universe
3.40 score 5 stars 3 scripts 220 downloads

OptimalGoldstandardDesigns - Design Parameter Optimization for Gold-Standard Non-Inferiority Trials
Methods to calculate optimal design parameters for one- and two-stage three-arm group-sequential gold-standard non-inferiority trial designs with or without binding or nonbinding futility boundaries, as described in Meis et al. (2023) <doi:10.1002/sim.9630>.
Last updated 16 days ago
archivedpackages r-universe
3.40 score 5 stars 5 scripts 190 downloads

IP - Classes and Methods for 'IP' Addresses
Provides S4 classes for Internet Protocol (IP) version 4 and 6 addresses and efficient methods for 'IP' address comparison, arithmetic, bit manipulation and lookup. Arbitrary 'IPv4' and 'IPv6' ranges are also supported, as well as internationalized domain name (IDN) lookup and 'whois' queries.
Last updated 11 days ago
archivedpackages r-universe
3.11 score 5 stars 13 scripts 236 downloads

depower - Power Analysis for Differential Expression Studies
Provides a convenient framework to simulate, test, power, and visualize data for differential expression studies with lognormal or negative binomial outcomes. Supported designs are two-sample comparisons of independent or dependent outcomes. Power may be summarized in the context of controlling the per-family error rate or family-wise error rate. Negative binomial methods are described in Yu, Fernandez, and Brock (2017) <doi:10.1186/s12859-017-1648-2> and Yu, Fernandez, and Brock (2020) <doi:10.1186/s12859-020-3541-7>.
Last updated 11 days ago
archivedpackages r-universe
2.97 score 5 stars 37 scripts 221 downloads

ecdata - Loads Data from the Executive Communications Dataset
A minimal package for downloading data from 'GitHub' repositories of the Executive Communications Database.
Last updated 9 days ago
archivedpackages r-universe
2.95 score 5 stars 18 scripts 593 downloads

CoFRA - Complete Functional Regulation Analysis
Calculates complete functional regulation analysis and visualizes the results in a single heatmap. The provided example data is biological, but the methodology can be used on large data sets to compare quantitative entities that can be grouped. For example, a store might divide entities into cloth, food, car products, etc., and want to see how sales change in the groups after some event. The theoretical background for the calculations is provided in "New insights into functional regulation in MS-based drug profiling", Ana Sofia Carvalho, Henrik Molina & Rune Matthiesen, Scientific Reports <doi:10.1038/srep18826>.
Last updated 11 days ago
archivedpackages r-universe
2.88 score 5 stars 15 scripts 520 downloads

msBP - Multiscale Bernstein Polynomials for Densities
Performs Bayesian nonparametric multiscale density estimation and multiscale testing of group differences with multiscale Bernstein polynomials (msBP) mixtures as in Canale and Dunson (2016).
Last updated 11 days ago
archivedpackages r-universe cpp
2.74 score 5 stars 11 scripts 247 downloads

suppdata - Downloading Supplementary Data from Published Manuscripts
Downloads supplementary data from manuscripts, using papers' DOIs as references. Facilitates open, reproducible research workflows: scientists re-analyzing published datasets can work with them as easily as if they were stored on their own computer, and others can track their analysis workflow painlessly. The main function suppdata() returns a (temporary) location on the user's computer where the file is stored, making it simple to use suppdata() with standard functions like read.csv().
Last updated 8 days ago
archivedpackages r-universe
2.70 score 5 stars 9 scripts 241 downloads

DBpower - Finite Sample Power Calculations for Detection Boundary Tests
Calculates a lower bound on power, an upper bound on power, and exact power (small sets only) for detection boundary tests (e.g. Berk-Jones, Generalized Berk-Jones, innovated Berk-Jones) used in set-based inference studies. These detection boundary tests are described in Sun et al. (2019) <doi:10.1080/01621459.2019.1660170>.
Last updated 9 days ago
archivedpackages r-universe
2.70 score 5 stars 2 scripts 164 downloads

streetscape - Collect And Investigate Street Views For Urban Science
A collection of functions to search and download street view imagery ('Mapillary' <https://www.mapillary.com/developer/api-documentation>) and to extract, quantify, and visualize visual features. In addition, functions are provided to generate a Qualtrics survey in TXT format from the collection of street views for various research purposes.
Last updated 11 days ago
archivedpackages r-universe
2.70 score 5 stars 1 script 291 downloads

DBCVindex - Calculates the Density-Based Clustering Validation Index (DBCV) Index
A metric called the 'Density-Based Clustering Validation' (DBCV) index to evaluate clustering results, following the <https://github.com/FelSiq/DBCV> 'Python' implementation by Felipe Alves Siqueira. Original 'DBCV' index article: Moulavi, D., Jaskowiak, P. A., Campello, R. J., Zimek, A., & Sander, J. (2014, April). "Density-based clustering validation", Proceedings of SDM 2014 -- the 2014 SIAM International Conference on Data Mining (pp. 839-847), <doi:10.1137/1.9781611973440.96>.
Last updated 1 month ago
archivedpackages r-universe
2.70 score 5 stars 121 downloads

htmldf - Simple Scraping and Tidy Webpage Summaries
Simple tools for scraping webpages, extracting common html tags and parsing contents to a tidy, tabular format. Tools help with extraction of page titles, links, images, rss feeds, social media handles and page metadata.
Last updated 11 days ago
archivedpackages r-universe
2.44 score 5 stars 11 scripts 232 downloads

klexdatr - Kootenay Lake Exploitation Study Data
Six relational 'tibbles' from the Kootenay Lake Large Trout Exploitation study. The study, which ran from 2008 to 2014, caught, tagged and released large Rainbow Trout and Bull Trout in Kootenay Lake by boat angling. The fish were tagged with internal acoustic tags and/or high reward external tags and subsequently detected by an acoustic receiver array as well as reported by anglers. The data are analysed by Thorley and Andrusak (2017) <doi:10.7717/peerj.2874> to estimate the natural and fishing mortality of both species.
Last updated 4 days ago
archivedpackages r-universe
2.40 score 5 stars 7 scripts 150 downloads

unstruwwel - Detect and Parse Historic Dates
Automatically converts language-specific verbal information, e.g., "1st half of the 19th century," to its standardized numerical counterparts, e.g., "1801-01-01/1850-12-31." It follows the recommendations of the 'MIDAS' ('Marburger Informations-, Dokumentations- und Administrations-System'), see <doi:10.11588/artdok.00003770>.
Last updated 4 days ago
archivedpackages r-universe
2.40 score 5 stars 2 scripts 167 downloads

zerotradeflow - An Implementation for the Gravitational Models of Trade
A system for creating zero-valued bilateral trade flows between country pairs. You provide the data, tell get_zerotradeflow() which variables are of interest, and it expands the dataset by creating the zero bilateral trade flows. Databases on trade flows between countries report only positive trade (greater than zero); however, some analyses of gravity models also require data on zero flows. Some examples of gravity models: Figueiredo and Loures (2016) <doi:10.5935/0034-7140.20160015> and Yotov, Piermartini, Monteiro and Larch <https://vi.unctad.org/tpa/web/docs/vol2/book.pdf>.
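The expansion step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the package's get_zerotradeflow() interface: the function name `expand_zero_flows`, the dictionary layout keyed by (origin, destination, year), and the trade values are all invented for the example.

```python
from itertools import permutations

def expand_zero_flows(flows, countries, years):
    """Return every ordered (origin, dest, year) triple, with 0.0 where no trade was reported."""
    full = {}
    for year in years:
        # permutations(..., 2) yields each ordered pair of distinct countries once.
        for origin, dest in permutations(countries, 2):
            full[(origin, dest, year)] = flows.get((origin, dest, year), 0.0)
    return full

# Made-up observed data: only positive flows are recorded.
observed = {("BRA", "ARG", 2015): 120.0, ("ARG", "BRA", 2015): 95.5}
panel = expand_zero_flows(observed, ["ARG", "BRA", "CHL"], [2015])
```

With three countries and one year, the expanded panel holds six ordered pairs, four of which are zero flows that the observed data did not contain.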
Last updated 11 days ago
archivedpackages r-universe
2.40 score 5 stars 1 script 150 downloads

DNH4 - Crawling for Daum News Text
Provides utilities for getting Korean text samples from news articles on Daum, a popular news portal service in Korea.
Last updated 11 days ago
archivedpackages r-universe
2.40 score 5 stars 6 scripts 210 downloads

KaradaColor - Color Palettes Inspired by Japanese Landscape and Culture
The palettes include motifs from Japanese landscape and culture. The package also provides commands for color manipulation and 'ggplot2' color scales.
Last updated 11 days ago
archivedpackages r-universe
2.40 score 5 stars 1 script 624 downloads

RFpredInterval - Prediction Intervals with Random Forests and Boosted Forests
Implements various prediction interval methods with random forests and boosted forests. The package has two main functions: pibf() produces prediction intervals with boosted forests (PIBF) as described in Alakus et al. (2022) <doi:10.32614/RJ-2022-012> and rfpi() builds 15 distinct variations of prediction intervals with random forests (RFPI) proposed by Roy and Larocque (2020) <doi:10.1177/0962280219829885>.
Last updated 11 days ago
archivedpackages r-universe openmp
2.40 score 5 stars 3 scripts 257 downloads

optpart - Optimal Partitioning of Similarity Relations
Contains a set of algorithms for creating partitions and coverings of objects largely based on operations on (dis)similarity relations (or matrices). There are several iterative re-assignment algorithms optimizing different goodness-of-clustering criteria. In addition, there are covering algorithms 'clique' which derives maximal cliques, and 'maxpact' which creates a covering of maximally compact sets. Graphical analyses and conversion routines are also included.
Last updated 11 days ago
archivedpackages r-universe fortran
2.40 score 5 stars 50 scripts 248 downloads

rmi - Mutual Information Estimators
Provides mutual information estimators based on k-nearest neighbor estimators by A. Kraskov, et al. (2004) <doi:10.1103/PhysRevE.69.066138>, S. Gao, et al. (2015) <http://proceedings.mlr.press/v38/gao15.pdf> and local density estimators by W. Gao, et al. (2017) <doi:10.1109/ISIT.2017.8006749>.
Last updated 14 days ago
archivedpackages r-universe openblas cpp
2.40 score 5 stars 7 scripts 155 downloads

vines - Multivariate Dependence Modeling with Vines
Implementation of the vine graphical model for building high-dimensional probability distributions as a factorization of bivariate copulas and marginal density functions. This package provides S4 classes for vines (C-vines and D-vines) and methods for inference, goodness-of-fit tests, density/distribution function evaluation, and simulation.
Last updated 11 days ago
archivedpackages r-universe
2.18 score 5 stars 1 dependent 10 scripts 373 downloads

GiNA - High Throughput Phenotyping
Performs image segmentation on pictures of fruit or seeds in order to measure physical features in a high-throughput manner for genome-wide association (GWAS) and genomic selection programs.
Last updated 16 days ago
archivedpackages r-universe
2.18 score 5 stars 8 scripts 131 downloads

copulaedas - Estimation of Distribution Algorithms Based on Copulas
Provides a platform where EDAs (estimation of distribution algorithms) based on copulas can be implemented and studied. The package offers complete implementations of various EDAs based on copulas and vines, a group of well-known optimization problems, and utility functions to study the performance of the algorithms. Newly developed EDAs can be easily integrated into the package by extending an S4 class with generic functions for their main components.
Last updated 11 days ago
archivedpackages r-universe
1.95 score 5 stars 18 scripts 314 downloads

smoothAPC - Smoothing of Two-Dimensional Demographic Data, Optionally Taking into Account Period and Cohort Effects
The implemented method smooths the data using bivariate thin plate splines and bivariate lasso-type regularization, and allows for both period and cohort effects. Thus the mortality rates are modelled as the sum of four components: a smooth bivariate function of age and time, smooth one-dimensional cohort effects, smooth one-dimensional period effects and random errors.
Last updated 11 days ago
archivedpackages r-universe
1.81 score 5 stars 13 scripts 177 downloads

LTRCforests - Ensemble Methods for Survival Data with Time-Varying Covariates
Implements the conditional inference forest and relative risk forest algorithms for modeling left-truncated right-censored data with time-invariant covariates, and (left-truncated) right-censored survival data with time-varying covariates. It also provides functions to tune the parameters and evaluate the model fit. See Yao et al. (2022) <doi:10.1177/09622802221111549>.
Last updated 11 days ago
archivedpackages r-universe
1.81 score 5 stars 13 scripts 213 downloads

boostmtree - Boosted Multivariate Trees for Longitudinal Data
Implements Friedman's gradient descent boosting algorithm for modeling longitudinal response using multivariate tree base learners. The longitudinal response can be continuous, binary, nominal or ordinal. A time-covariate interaction effect is modeled using penalized B-splines (P-splines) with an estimated adaptive smoothing parameter. Although the package is designed for longitudinal data, it can handle cross-sectional data as well. Implementation details are provided in Pande et al. (2017), Mach Learn <DOI:10.1007/s10994-016-5597-1>.
Last updated 2 days ago
archivedpackages r-universe
1.70 score 5 stars 9 scripts 319 downloads

multilaterals - Transitive Index Numbers for Cross-Sections and Panel Data
Computing transitive (and non-transitive) index numbers (Coelli et al., 2005 <doi:10.1007/b136381>) for cross-sections and panel data. For the calculation of transitive indexes, the EKS (Coelli et al., 2005 <doi:10.1007/b136381>; Rao et al., 2002 <doi:10.1007/978-1-4615-0851-9_4>) and Minimum spanning tree (Hill, 2004 <doi:10.1257/0002828043052178>) methods are implemented. Traditional fixed-base and chained indexes, and their growth rates, can also be derived using the Paasche, Laspeyres, Fisher and Tornqvist formulas.
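For readers unfamiliar with the four formulas named above, here is a minimal Python sketch of their standard bilateral price-index forms for a base period 0 and a comparison period 1. It does not reproduce the package's interface or its transitive (EKS, minimum spanning tree) methods; the function names and the two-good price and quantity data are assumptions made for illustration.

```python
import math

def laspeyres(p0, p1, q0):
    """Cost of the base-period basket at new prices relative to old prices."""
    return sum(a * q for a, q in zip(p1, q0)) / sum(a * q for a, q in zip(p0, q0))

def paasche(p0, p1, q1):
    """Cost of the comparison-period basket at new prices relative to old prices."""
    return sum(a * q for a, q in zip(p1, q1)) / sum(a * q for a, q in zip(p0, q1))

def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

def tornqvist(p0, p1, q0, q1):
    """Geometric mean of price relatives weighted by average expenditure shares."""
    e0 = [p * q for p, q in zip(p0, q0)]
    e1 = [p * q for p, q in zip(p1, q1)]
    s0 = [e / sum(e0) for e in e0]
    s1 = [e / sum(e1) for e in e1]
    return math.exp(sum(0.5 * (a + b) * math.log(y / x)
                        for a, b, x, y in zip(s0, s1, p0, p1)))

# Invented data for two goods.
p0, p1 = [1.0, 2.0], [2.0, 2.0]
q0, q1 = [10.0, 5.0], [5.0, 10.0]
L = laspeyres(p0, p1, q0)
P = paasche(p0, p1, q1)
F = fisher(p0, p1, q0, q1)
T = tornqvist(p0, p1, q0, q1)
```

In this toy data set the Fisher and Tornqvist values both fall between the Paasche and Laspeyres values, which is the usual pattern when buyers shift toward the good whose relative price fell.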
Last updated 5 days ago
archivedpackages r-universe
1.70 score 5 stars 8 scripts 142 downloads

hasseDiagram - Drawing Hasse Diagram
Draws a Hasse diagram, a visualization of the transitive reduction of a finite partially ordered set.
Last updated 6 days ago
archivedpackages r-universe
1.70 score 5 stars 3 scripts 568 downloads

grec - Gradient-Based Recognition of Spatial Patterns in Environmental Data
Provides algorithms for detection of spatial patterns from oceanographic data using image processing methods based on Gradient Recognition.
Last updated 8 days ago
archivedpackages r-universe
1.70 score 5 stars 6 scripts 310 downloads

covid19brazil - COVID-19 Dataset for Brazil
Dataset with strategic information about COVID-19 in Brazil. Data for municipalities, states, regions and Brazil as a whole. Data source: Sistema Unico de Saude - SUS.
Last updated 11 days ago
archivedpackages r-universe
1.70 score 5 stars 1 script 171 downloads

CauchyCP - Powerful Test for Survival Data under Non-Proportional Hazards
An omnibus test of change-point Cox regression models to improve the statistical power of detecting signals of non-proportional hazards patterns. The technical details can be found in Hong Zhang, Qing Li, Devan Mehrotra and Judong Shen (2021) <arXiv:2101.00059>. Extensive simulation studies demonstrate that, compared to existing tests under non-proportional hazards, the proposed CauchyCP test 1) controls the type I error better at small alpha levels; 2) increases the power of detecting time-varying effects; and 3) is more computationally efficient.
Last updated 11 days ago
archivedpackages r-universe
1.70 score 5 stars 254 downloads

rego - Automatic Time Series Forecasting and Missing Value Imputation
Machine learning algorithm for predicting and imputing time series. It can automatically set all the parameters needed, thus in the minimal configuration it only requires the target variable and the dependent variables if present. It can address large problems with hundreds or thousands of dependent variables and problems in which the number of dependent variables is greater than the number of observations. Moreover it can be used not only for time series but also for any other real valued target variable. The algorithm implemented includes a Bayesian stochastic search methodology for model selection and a robust estimation based on bootstrapping. 'rego' is fast because all the code is C++.
Last updated 11 days ago
archivedpackages r-universe cpp
1.70 score 5 stars 9 scripts 240 downloads

qVarSel - Select Variables for Optimal Clustering
Finding hidden clusters in structured data can be hindered by the presence of masking variables. If not detected, masking variables are used to calculate the overall similarities between units, and the cluster attribution is therefore less precise. The q-vars algorithm implements an optimization method to find the variables that best separate units between clusters. In this way, masking variables can be discarded from the data frame and the clustering is more accurate. Tests can be found in Benati et al. (2017) <doi:10.1080/01605682.2017.1398206>.
Last updated 25 days ago
archivedpackages r-universe cpp
1.70 score 5 stars 4 scripts 101 downloads

ivdesc - Profiling Compliers and Non-Compliers for Instrumental Variable Analysis
Estimating the mean and variance of a covariate for the complier, never-taker and always-taker subpopulation in the context of instrumental variable estimation. This package implements the method described in Marbach and Hangartner (2020) <doi:10.1017/pan.2019.48> and Hangartner, Marbach, Henckel, Maathuis, Kelz and Keele (2021) <arXiv:2103.06328>.
Last updated 1 month ago
archivedpackages r-universe
1.70 score 5 stars 6 scripts 162 downloads