Package: PsychWordVec 2023.9

PsychWordVec: Word Embedding Research Framework for Psychological Science

An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arxiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arxiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').

Authors: Han-Wu-Shuang Bao [aut, cre]

PsychWordVec_2023.9.tar.gz
PsychWordVec_2023.9.zip (r-4.5), PsychWordVec_2023.9.zip (r-4.4), PsychWordVec_2023.9.zip (r-4.3)
PsychWordVec_2023.9.tgz (r-4.4-any), PsychWordVec_2023.9.tgz (r-4.3-any)
PsychWordVec_2023.9.tar.gz (r-4.5-noble), PsychWordVec_2023.9.tar.gz (r-4.4-noble)
PsychWordVec_2023.9.tgz (r-4.4-emscripten), PsychWordVec_2023.9.tgz (r-4.3-emscripten)
PsychWordVec.pdf | PsychWordVec.html
PsychWordVec/json (API)
NEWS

# Install 'PsychWordVec' in R:
install.packages('PsychWordVec', repos = c('https://psychbruce.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/psychbruce/psychwordvec/issues

Datasets:
  • demodata - Demo data (pre-trained using word2vec on Google News; 8000 vocab, 300 dims).

On CRAN:

bert, cosine-similarity, fasttext, glove, gpt, language-model, natural-language-processing, nlp, pretrained-models, psychology, semantic-analysis, text-analysis, text-mining, tsne, word-embeddings, word-vectors, word2vec

4.04 score | 22 stars | 8 scripts | 379 downloads | 34 exports | 229 dependencies

Last updated 1 year ago from: 0f1b53e2fb. Checks: OK: 7. Indexed: yes.

Target           | Result | Date
Doc / Vignettes  | OK     | Oct 25 2024
R-4.5-win        | OK     | Oct 25 2024
R-4.5-linux      | OK     | Oct 25 2024
R-4.4-win        | OK     | Oct 25 2024
R-4.4-mac        | OK     | Oct 25 2024
R-4.3-win        | OK     | Oct 25 2024
R-4.3-mac        | OK     | Oct 25 2024

Exports: as_embed, as_wordvec, cc, cos_dist, cos_sim, cosine_similarity, data_transform, data_wordvec_load, data_wordvec_subset, dict_expand, dict_reliability, get_wordvec, load_embed, load_wordvec, most_similar, normalize, orth_procrustes, pair_similarity, pattern, plot_network, plot_similarity, plot_wordvec, plot_wordvec_tSNE, sum_wordvec, tab_similarity, test_RND, test_WEAT, text_init, text_model_download, text_model_remove, text_to_vec, text_unmask, tokenize, train_wordvec

Dependencies: abind, afex, askpass, backports, base64enc, bayestestR, bit, bit64, boot, brio, broom, broom.mixed, bruceR, bslib, cachem, callr, car, carData, cellranger, checkmate, class, cli, clipr, clock, cluster, coda, codetools, colorspace, corpcor, corrplot, cowplot, cpp11, crayon, curl, data.table, datawizard, Deriv, desc, diagram, dials, DiceDesign, diffobj, digest, doBy, doFuture, dplyr, effectsize, emmeans, estimability, evaluate, fansi, farver, fastmap, fastTextR, fdrtool, float, fontawesome, forcats, foreach, foreign, Formula, fs, furrr, future, future.apply, generics, ggplot2, ggrepel, glasso, globals, glue, gower, GPArotation, GPfit, gridExtra, gtable, gtools, hardhat, haven, here, highr, Hmisc, hms, htmlTable, htmltools, htmlwidgets, httr, igraph, insight, interactions, ipred, isoband, iterators, jpeg, jquerylib, jsonlite, jtools, KernSmooth, knitr, labeling, lattice, lava, lavaan, lgr, lhs, lifecycle, listenv, lme4, lmerTest, lpSolve, lubridate, magrittr, MASS, Matrix, MatrixExtra, MatrixModels, mediation, memoise, mgcv, microbenchmark, mime, minqa, mlapi, mnormt, modelenv, modelr, munsell, mvtnorm, nlme, nloptr, nnet, numDeriv, openssl, overlapping, pander, parallelly, parameters, parsnip, pbapply, pbivnorm, pbkrtest, performance, pillar, pkgbuild, pkgconfig, pkgload, plyr, png, praise, prettyunits, processx, prodlim, progress, progressr, ps, psych, purrr, qgraph, quadprog, quantreg, R.methodsS3, R.oo, R.utils, R6, rappdirs, RColorBrewer, Rcpp, RcppArmadillo, RcppEigen, RcppProgress, RcppTOML, readr, readxl, recipes, rematch, rematch2, reshape2, reticulate, rgl, RhpcBLASctl, rio, rlang, rmarkdown, rpart, rprojroot, rsample, rsparse, rstudioapi, Rtsne, sandwich, sass, scales, sfd, shape, slam, slider, SparseM, SQUAREM, stringi, stringr, survival, sys, testthat, texreg, text, text2vec, tibble, tidyr, tidyselect, timechange, timeDate, tinytex, tune, tzdb, utf8, vctrs, viridis, viridisLite, vroom, waldo, warp, withr, word2vec, workflows, writexl, xfun, yaml, yardstick, zoo

Readme and manuals

Help Manual

Help page | Topics
Word vectors data class: 'wordvec' and 'embed'. | as_embed as_wordvec pattern [.embed
Cosine similarity/distance between two vectors. | cosine_similarity cos_dist cos_sim
Transform plain text of word vectors into 'wordvec' (data.table) or 'embed' (matrix), saved in a compressed ".RData" file. | data_transform
Load word vectors data ('wordvec' or 'embed') from ".RData" file. | data_wordvec_load load_embed load_wordvec
Extract a subset of word vectors data (with S3 methods). | data_wordvec_subset subset.embed subset.wordvec
Demo data (pre-trained using word2vec on Google News; 8000 vocab, 300 dims). | demodata
Expand a dictionary from the most similar words. | dict_expand
Reliability analysis and PCA of a dictionary. | dict_reliability
Extract word vector(s). | get_wordvec
Find the Top-N most similar words. | most_similar
Normalize all word vectors to the unit length 1. | normalize
Orthogonal Procrustes rotation for matrix alignment. | orth_procrustes
Compute a matrix of cosine similarity/distance of word pairs. | pair_similarity
Visualize a (partial correlation) network graph of words. | plot_network
Visualize cosine similarity of word pairs. | plot_similarity
Visualize word vectors. | plot_wordvec
Visualize word vectors with dimensionality reduced using t-SNE. | plot_wordvec_tSNE
Calculate the sum vector of multiple words. | sum_wordvec
Tabulate cosine similarity/distance of word pairs. | tab_similarity
Relative Norm Distance (RND) analysis. | test_RND
Word Embedding Association Test (WEAT) and Single-Category WEAT. | test_WEAT
Install required Python modules in a new conda environment and initialize the environment, necessary for all 'text_*' functions designed for contextualized word embeddings. | text_init
Download pre-trained language models from HuggingFace. | text_model_download
Remove downloaded models from the local .cache folder. | text_model_remove
Extract contextualized word embeddings from transformers (pre-trained language models). | text_to_vec
<Deprecated> Fill in the blank mask(s) in a query (sentence). | text_unmask
Tokenize raw text for training word embeddings. | tokenize
Train static word embeddings using the Word2Vec, GloVe, or FastText algorithm. | train_wordvec
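
Several help topics above (cosine similarity, normalization to unit length, and the WEAT) rest on a few lines of vector arithmetic. The following is a minimal base-R sketch of those ideas, not the package's own implementation: the toy matrix `emb` and the helpers `cos_sim_base`, `normalize_rows`, and `assoc` are hypothetical names invented for illustration, and the effect size uses R's sample standard deviation, which may differ from the options offered by test_WEAT.

```r
# Hypothetical toy embeddings (3 dimensions), invented for illustration only.
emb <- rbind(
  rose   = c(0.9, 0.1, 0.0),
  tulip  = c(0.8, 0.2, 0.1),
  wasp   = c(0.1, 0.9, 0.0),
  hornet = c(0.2, 0.8, 0.1),
  lovely = c(1.0, 0.0, 0.1),
  nasty  = c(0.0, 1.0, 0.1)
)

# Cosine similarity: dot product divided by the product of vector lengths.
cos_sim_base <- function(v1, v2) {
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

# Rescale every row (one word vector per row) to unit length.
normalize_rows <- function(m) {
  m / sqrt(rowSums(m^2))
}

# After normalization, pairwise cosine similarity is a plain dot product.
unit <- normalize_rows(emb)
sim  <- tcrossprod(unit)  # word-by-word cosine similarity matrix

# WEAT-style association of a target word w with attribute sets A versus B:
# mean similarity to A minus mean similarity to B.
assoc <- function(w, A, B, sim) {
  mean(sim[w, A]) - mean(sim[w, B])
}

T1 <- c("rose", "tulip")    # target set 1 (assumed example words)
T2 <- c("wasp", "hornet")   # target set 2
A1 <- "lovely"              # attribute set 1
A2 <- "nasty"               # attribute set 2

s1 <- sapply(T1, assoc, A = A1, B = A2, sim = sim)
s2 <- sapply(T2, assoc, A = A1, B = A2, sim = sim)

# Standardized effect size: difference in mean association between the two
# target sets, divided by the standard deviation over all target words.
weat_d <- (mean(s1) - mean(s2)) / sd(c(s1, s2))
```

Here `sim["rose", "lovely"]` equals `cos_sim_base(emb["rose", ], emb["lovely", ])`, and a positive `weat_d` indicates that the first target set lies closer to the first attribute set than the second target set does. A permutation test, as mentioned in the package description, would recompute this statistic over reshuffled target sets.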