Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add r-doc2vec package from CRAN #16027

Merged
merged 2 commits into from
Aug 30, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions recipes/r-doc2vec/bld.bat
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
"%R%" CMD INSTALL --build . %R_ARGS%
IF %ERRORLEVEL% NEQ 0 exit /B 1
3 changes: 3 additions & 0 deletions recipes/r-doc2vec/build.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
#!/bin/bash
export DISABLE_AUTOBREW=1
${R} CMD INSTALL --build . ${R_ARGS}
74 changes: 74 additions & 0 deletions recipes/r-doc2vec/meta.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
{% set version = '0.2.0' %}
{% set posix = 'm2-' if win else '' %}
{% set native = 'm2w64-' if win else '' %}

package:
name: r-doc2vec
version: {{ version|replace("-", "_") }}

source:
url:
- {{ cran_mirror }}/src/contrib/doc2vec_{{ version }}.tar.gz
- {{ cran_mirror }}/src/contrib/Archive/doc2vec/doc2vec_{{ version }}.tar.gz
sha256: db3853685072554402434ea699d703e01ac7818044cf47a2ee7d0e1040858908

build:
merge_build_host: True # [win]
number: 0
rpaths:
- lib/R/lib/
- lib/

requirements:
build:
- {{ compiler('c') }} # [not win]
- {{ compiler('m2w64_c') }} # [win]
- {{ compiler('cxx') }} # [not win]
- {{ compiler('m2w64_cxx') }} # [win]
- {{ posix }}filesystem # [win]
- {{ posix }}make
- {{ posix }}sed # [win]
- {{ posix }}coreutils # [win]
- {{ posix }}zip # [win]
- cross-r-base {{ r_base }} # [build_platform != target_platform]
host:
- r-base
- r-rcpp >=0.11.5
run:
- r-base
- {{ native }}gcc-libs # [win]
- r-rcpp >=0.11.5

test:
commands:
- $R -e "library('doc2vec')" # [not win]
- "\"%R%\" -e \"library('doc2vec')\"" # [win]

about:
home: https://github.com/bnosac/doc2vec
license: MIT
summary: 'Learn vector representations of sentences, paragraphs or documents by using the ''Paragraph
Vector'' algorithms, namely the distributed bag of words (''PV-DBOW'') and the distributed
memory (''PV-DM'') model. The techniques in the package are detailed in the paper
"Distributed Representations of Sentences and Documents" by Mikolov et al. (2014),
available at <arXiv:1405.4053>. The package also provides an implementation to cluster
documents based on these embedding using a technique called top2vec. Top2vec finds
clusters in text documents by combining techniques to embed documents and words
and density-based clustering. It does this by embedding documents in the semantic
space as defined by the ''doc2vec'' algorithm. Next it maps these document embeddings
to a lower-dimensional space using the ''Uniform Manifold Approximation and Projection''
(UMAP) clustering algorithm and finds dense areas in that space using a ''Hierarchical
Density-Based Clustering'' technique (HDBSCAN). These dense areas are the topic
clusters which can be represented by the corresponding topic vector which is an
aggregate of the document embeddings of the documents which are part of that topic
cluster. In the same semantic space similar words can be found which are representative
of the topic. More details can be found in the paper ''Top2Vec: Distributed Representations
of Topics'' by D. Angelov available at <arXiv:2008.09470>.'
license_family: MIT
license_file:
- '{{ environ["PREFIX"] }}/lib/R/share/licenses/MIT'
- LICENSE

extra:
recipe-maintainers:
- conda-forge/r