Package: tok 0.2.2

Tomasz Kalinowski

tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It's extremely fast for both training new vocabularies and tokenizing texts.

Authors: Tomasz Kalinowski [ctb, cre], Daniel Falbel [aut], Regouby Christophe [ctb], Posit [cph]

tok_0.2.2.tar.gz
tok_0.2.2.zip (r-4.7), tok_0.2.2.zip (r-4.6), tok_0.2.2.zip (r-4.5)
tok_0.2.2.tgz (r-4.6-x86_64), tok_0.2.2.tgz (r-4.6-arm64), tok_0.2.2.tgz (r-4.5-x86_64), tok_0.2.2.tgz (r-4.5-arm64)
tok_0.2.2.tar.gz (r-4.7-arm64), tok_0.2.2.tar.gz (r-4.7-x86_64), tok_0.2.2.tar.gz (r-4.6-arm64), tok_0.2.2.tar.gz (r-4.6-x86_64)
manual.pdf | manual.html
card.svg | card.png
tok/json (API)
NEWS

# Install 'tok' in R:
install.packages('tok', repos = c('https://cranhaven.r-universe.dev', 'https://cloud.r-project.org'))
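
Once installed, tokenizing a string might look like the sketch below. It is illustrative only: it assumes 'tok' is installed and that the Hugging Face Hub is reachable, and the "gpt2" model identifier is an assumption chosen for the example rather than something taken from this page.

```r
# Sketch only: assumes 'tok' is installed and network access is available.
library(tok)

# Load a pretrained tokenizer from the Hugging Face Hub.
# "gpt2" is an assumed example identifier, not taken from this page.
tk <- tokenizer$from_pretrained("gpt2")

# Encode a string; the returned encoding object exposes the token ids.
enc <- tk$encode("hello world")
ids <- enc$ids

# Decode the ids back into text.
decoded <- tk$decode(ids)

print(ids)
print(decoded)
```

Any other tokenizer identifier on the Hub should work in place of "gpt2", provided it ships a tokenizer definition the library can read.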

Bug tracker: https://github.com/mlverse/tok/issues

archived packages r-universe rust cargo

4.17 score, 5 stars, 1 packages, 15 scripts, 13k downloads, 20 exports, 2 dependencies

Last updated from: b4a2c37fdb (on package/tok). Checks: 12 OK, 1 FAIL. Indexed: no.

Target                  Result  Time
linux-devel-arm64       OK      233
linux-devel-x86_64      OK      220
source / vignettes      OK      273
linux-release-arm64     OK      229
linux-release-x86_64    OK      219
macos-release-arm64     OK      180
macos-release-x86_64    OK      381
macos-oldrel-arm64      OK      210
macos-oldrel-x86_64     OK      374
windows-devel           OK      363
windows-release         OK      382
windows-oldrel          OK      312
wasm-release            FAIL    134

Exports: decoder_byte_level, encoding, model_bpe, model_unigram, model_wordpiece, normalizer_nfc, normalizer_nfkc, pre_tokenizer, pre_tokenizer_byte_level, pre_tokenizer_whitespace, processor_byte_level, tok_decoder, tok_model, tok_normalizer, tok_processor, tok_trainer, tokenizer, trainer_bpe, trainer_unigram, trainer_wordpiece

Dependencies: cli, R6