Releases: PolMine/RcppCWB
Miniscule Step
- The configure script now covers PowerPC machines. Platform files for the PowerPC scenario have been added to src/cwb/config/platform; darwin-64 has been renamed to darwin-x86_64 as a matter of consistency #79.
- Warning "variable 'nr_targets' set but not used" for files newly reported by Apple clang version 14.0.3 (clang-1403.0.22.14.1) is addressed #83.
- Misleading indentation warning issued by clang-15 addressed #85.
- `cwb_encode()`, `cwb_makeall()`, `cwb_huffcode()` and `cwb_compress_rdx()` perform tilde expansion on the filename provided by argument `registry`, avoiding a crash #84.
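
The effect of tilde expansion can be illustrated with base R alone. This is a minimal sketch of the idea, not the package's actual code: the helper `resolve_registry()` and both example paths are hypothetical, and the default taken from the environment variable 'CORPUS_REGISTRY' is an assumption for illustration.

```r
# Hypothetical helper (not part of RcppCWB): expand a leading "~" before a
# registry path is handed to lower-level code that does not understand it.
resolve_registry <- function(registry = Sys.getenv("CORPUS_REGISTRY")) {
  path.expand(registry)
}

# Illustrative paths only.
expanded <- resolve_registry("~/cwb/registry")     # "~" replaced by the home directory
absolute <- resolve_registry("/opt/cwb/registry")  # returned unchanged
```

Paths without a leading tilde pass through untouched, so applying the expansion unconditionally should be harmless.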
Red Feather
- New function `region_to_strucs()` to get the minimum and maximum struc of an s-attribute within the region provided. Also works for nested s-attributes.
- New function `region_matrix_to_struc_matrix()`.
- Functions `cl_cpos2lbound()` and `cl_cpos2rbound()` return NA if the corpus position is outside a struc for the given s-attribute. #78.
- Functions `cl_cpos2lbound()` and `cl_cpos2rbound()` are exposed directly from C++ without R wrappers, improving performance. Falling back to the environment variable 'CORPUS_REGISTRY' for argument `registry` is handled implicitly now.
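
The NA behavior for corpus positions outside any struc can be sketched in base R. The code below is an illustration under stated assumptions, not the C++ implementation: the region matrix is made up, and `cpos_to_bounds()` is a hypothetical helper that mimics what a boundary lookup might return.

```r
# Hypothetical region matrix of an s-attribute: one row per struc, columns
# hold the left and right corpus positions (both inclusive).
regions <- matrix(
  c(0L, 4L,
    5L, 9L,
    15L, 19L),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("cpos_left", "cpos_right"))
)

# Hypothetical helper: find the last region starting at or before each cpos,
# then keep it only if cpos does not exceed that region's right boundary.
cpos_to_bounds <- function(cpos, regions) {
  idx <- findInterval(cpos, regions[, "cpos_left"])
  idx[idx == 0L] <- NA_integer_
  inside <- !is.na(idx) & cpos <= regions[idx, "cpos_right"]
  idx[!inside] <- NA_integer_
  cbind(lbound = regions[idx, "cpos_left"], rbound = regions[idx, "cpos_right"])
}

# Corpus position 12 falls into the gap between the second and third region.
bounds <- cpos_to_bounds(c(2L, 7L, 12L, 19L), regions)
```

The helper is vectorized, so a whole vector of corpus positions is resolved in one call and positions in a gap yield NA in both columns.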
Houseboat
Seven Sisters
- The example for `corpus_data_dir()` did not work as intended without explicitly setting the `registry` argument. Fixed.
- New functions `corpus_info_file()`, `corpus_full_name()`, `corpus_p_attributes()`, `corpus_s_attributes()`, `corpus_properties()` and `corpus_property()` to retrieve registry file data.
- New function `corpus_registry_dir()`.
- The path to the info file in the registry file of the REUTERS corpus was broken. Fixed.
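
To make "registry file data" more concrete, here is a base R sketch that parses a simplified registry file. The file content, the paths and the helper `field()` are assumptions made for illustration; the package functions read this information through the CWB C code rather than with regular expressions.

```r
# Simplified, made-up registry file content (paths are placeholders).
registry_lines <- c(
  'NAME "Reuters Sample Corpus"',
  'ID   reuters',
  'HOME /path/to/indexed_corpora/reuters',
  'INFO /path/to/indexed_corpora/reuters/info.md',
  '##:: charset = "latin1"',
  '##:: language = "en"',
  'ATTRIBUTE word',
  'STRUCTURE id',
  'STRUCTURE places'
)

# Hypothetical helper: values of all lines starting with a given keyword,
# with surrounding double quotes removed.
field <- function(lines, key) {
  pattern <- paste0("^", key, "\\s+")
  values <- sub(pattern, "", grep(pattern, lines, value = TRUE))
  gsub('^"|"$', "", values)
}

full_name    <- field(registry_lines, "NAME")
info_file    <- field(registry_lines, "INFO")
p_attributes <- field(registry_lines, "ATTRIBUTE")
s_attributes <- field(registry_lines, "STRUCTURE")

# Corpus properties are declared on lines starting with "##::".
prop_lines <- grep("^##::", registry_lines, value = TRUE)
properties <- sub('^##::\\s*[A-Za-z_]+\\s*=\\s*"(.*)"\\s*$', "\\1", prop_lines)
names(properties) <- sub("^##::\\s*([A-Za-z_]+)\\s*=.*$", "\\1", prop_lines)
```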
Wolpertinger
New Features
- The CWB code is updated to v3.4.33 / r1690 (#29). The automated patches developed for this update are a safeguard that aligning RcppCWB with upstream CWB development will be painless in the future.
- The C code in the files `cwb-huffcode.c`, `cwb-compress-rdx.c` and `cwb-makeall.c` was not in line with the CWB version of the rest of the code (v3.4.14 / SVN revision 1069) but rather at v2.2.b99 or v3.0.0. All code changes up to v3.4.14 were reconstructed and implemented (#35). Note that `cwb-encode.c` was at CWB v3.4.14, as the encoding functionality was exposed at a later stage.
- A new function `cwb_version()` will report the version of the CWB source code.
- The `cwb_encode()` function now has a previously missing argument `encoding` to state the encoding of the corpus to be indexed.
- Reduced number of example *.vrt-files to one to keep package size below 5 MB.
Minor Improvements
- Encoding a corpus using `cwb_encode()` now implicitly assumes that input files are XML files and removes blank lines as well as leading and trailing whitespace. This is equivalent to the option "-xsB" of the command line utility `cwb-encode`.
- The C++ code of `cwb_encode()` is now a patch of the `main()` function of `cwb-encode.c`, so that code in the *.cpp file can be limited to a slim wrapper, limiting the risk that the code in RcppCWB loses touch with CWB upstream development.
- Header files `_eval.h`, `_globalvars.h` and `_cl.h` in the `./src` directory are autogenerated files now, not to be edited by hand.
- The C++ code of the `cqp_drop_subcorpus()` function is temporarily disabled to ensure that the package can be built (#34).
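
The whitespace handling implied for input files can be approximated in base R. This is a rough sketch of the described effect (dropping blank lines, stripping leading and trailing whitespace), not of how `cwb-encode` works internally. The sample lines and the helper `clean_vrt()` are hypothetical, and treating whitespace-only lines as blank is an assumption.

```r
# Made-up lines of a *.vrt file with stray whitespace and blank lines.
vrt_lines <- c(
  '<text id="1">',
  "  Hello\tUH  ",
  "",
  "world\tNN",
  "   ",
  "</text>"
)

# Hypothetical helper: strip leading/trailing whitespace, then drop lines
# that are empty afterwards. Tabs between columns are preserved.
clean_vrt <- function(lines) {
  lines <- trimws(lines, which = "both")
  lines[nzchar(lines)]
}

cleaned <- clean_vrt(vrt_lines)
```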
Jaberwocky
- Fixed a mishandling of paths on Windows in `check_corpus()` that would trigger resetting the registry unintentionally and potentially falsely.
- To avoid a compiler warning (unused variable) issued by Rcpp and resolved in Rcpp v1.0.7, this version of Rcpp is now required (#22).
- In `use_tmp_dir()`, `normalizePath()` is applied on the `tempdir()` result to avoid confusion with symbolic links on macOS.
- New unit test for `cwb_encode()` (not yet run on Windows).
- A C-level inconsistency in `cqp_get_registry()` that would sometimes result in a wrong return value (i.e. registry path) has been fixed (#14).
- To avoid an unintended behavior of `cwb_makeall()`, an internal check is performed whether the corpus has been loaded already and whether the home directory of the loaded corpus and the one defined in the registry file are identical (#31).
- The link to the TXM project has been removed from the documentation to avoid the error 'SSL certificate problem: unable to get local issuer certificate' (#32).
- The `cl_delete_corpus()` function crashed when trying to delete a corpus that has not been loaded (#33). The function now aborts gracefully, returning 0, when trying to delete a corpus that has not been loaded.
- A new function `corpus_is_loaded()` can be used to check whether a corpus is loaded.
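
The symbolic link issue addressed with `normalizePath()` can be reproduced with base R. The snippet is a small sketch using only base functions; the variable names are illustrative.

```r
# tempdir() may return a path that goes through a symbolic link. On macOS,
# for instance, a path under /var/ typically resolves to /private/var/.
raw_tmp <- tempdir()
resolved_tmp <- normalizePath(raw_tmp, winslash = "/", mustWork = TRUE)

# Normalizing an already resolved path should leave it unchanged, so resolved
# paths can be compared as plain strings.
stable <- identical(normalizePath(resolved_tmp, winslash = "/"), resolved_tmp)
```

Comparing a raw path with a resolved one as strings can report a mismatch although both point to the same directory, which is presumably the confusion the change avoids.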
Mole Paw
New Features
- Encode XML (vrt file format) with new function `cwb_encode()` that exposes functionality of the cwb-encode CWB utility.
- Functions `cl_cpos2lbound()` and `cl_cpos2rbound()` will now accept an integer vector with length > 1 as argument `cpos` and return a vector with the same length. Useful to speed up iterated queries for left and right boundaries of regions (#19).
- A new function `cl_struc_values()` exposes the corresponding C function of the Corpus Library (CL). The previous implicit assumption that all structural attributes have values can thus be tested. Intended to work with annotations of sentences and paragraphs, i.e. common structural attributes that usually do not have values.
- A new function `corpus_data_dir()` will derive the data directory from the internal C representation of a corpus.
- New function `s_attr_regions()` will derive regions defined by a structural attribute from the *.rng file. Fastest option for large corpora.
- New functions `s_attr_is_sibling()` and `s_attr_is_descendent()` test the sibling/descendent relationship of structural attributes.
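
Reading regions straight from a binary file is fast because no per-position lookup is involved. The base R sketch below assumes that a *.rng file is a flat sequence of 32-bit big-endian integers forming pairs of left and right corpus positions. This layout is an assumption stated for illustration, and `read_rng()` is a hypothetical helper, not the package's implementation.

```r
# Write a made-up region file: three regions as (left, right) pairs.
rng_file <- tempfile(fileext = ".rng")
regions_in <- matrix(c(0L, 4L, 5L, 9L, 15L, 19L), ncol = 2, byrow = TRUE)
writeBin(as.integer(t(regions_in)), rng_file, size = 4L, endian = "big")

# Hypothetical reader: load all integers at once and reshape into two columns.
read_rng <- function(path) {
  n <- file.size(path) / 4L
  values <- readBin(path, what = integer(), n = n, size = 4L, endian = "big")
  matrix(
    values, ncol = 2, byrow = TRUE,
    dimnames = list(NULL, c("cpos_left", "cpos_right"))
  )
}

regions_out <- read_rng(rng_file)
```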
Minor Improvements
- Function `check_corpus()` now includes checks whether the registry provided (argument `registry`) is identical with the registry defined internally by CQP. The registry is reset if directories are not identical.
- Minor adjustments of configure script for aarch64, adding -fPIC to CFLAGS so that this flag will be used when Linux default configuration is used as fallback.
- The implementation of the `s_attribute_decode()` method was incomplete for method "Rcpp". This alternative to the "pure R" approach is now implemented (#2).
- The unused file 'setpaths.R' has been removed from the tools directory (#10).
- The argument `method` previously setting "wininet" in ./tools/winlibs.R is omitted to avoid the warning "the 'wininet' method is deprecated for http:// and https:// URLs" on Windows.
- The configure script will print the libdirs derived using pcre-config and link against libintl on macOS by default.
Dune Ride
- If RcppCWB is compiled on macOS, the package configure script checks the architecture of the machine and ensures that (if glib-2.0 is not yet present) a version of glib-2.0 compiled for Apple Silicon/the M1 chip is loaded in case an arm64 architecture is detected.
- The package configure script now uses `pcre-config` to locate header files of PCRE.
- The configure script checks whether pcre has been compiled with Unicode properties support. If not, a warning is issued that also explains the recommended solution to use '--enable-unicode-properties' when calling configure.
Sunrise
- To avoid warnings when running R CMD check, the URL http://pcre.org is used rather than https://pcre.org in the DESCRIPTION and the README file.
- Adding the 'fcommon' flag to the CFLAGS in the configure script, a somewhat dirty solution for multiple symbol definitions, has been removed. The C code has been modified such that multiple symbol definitions are avoided.
- The macOS image used for tests on Travis CI is now 'xcode9.4'.
- On Solaris, the configure script would define the flag "-Wl,--allow-multiple-definition" to be passed to the linker flags. The rework of the CWB includes and the inclusion of the header file 'env.h' makes it possible to drop this flag. It was defined at a confusing place anyway.
- Using the compiler desired by the user (in Makeconf, Makevars file) is now supported on all OSes.
- If pkg-config is not present on macOS, a warning is issued; the user gets the advice to use the brew package manager to install pkg-config.
- There is an explicit check in the configure script whether the dependencies ncurses, pcre and glib-2.0 are present. If not, an informative error with installation instructions is displayed.
- When unloading the package, the dynamic library RcppCWB.so is unloaded.
- When loading the package, CQP is initialized by default (call `cqp_initialize()`).
v0.2.7
RcppCWB 0.2.7
- If glib-2.0 is not present on macOS, binaries of the static library and header files are downloaded from a GitHub repo. This prepares RcppCWB to pass macOS checks on CRAN machines.
- A slight modification of the C code will now prevent previous crashes resulting from a faulty CQP syntax. The solution will not yet be effective for Windows systems until we have recompiled the libcqp static library that is downloaded during the installation process.
- A new C++-level function 'check_corpus' checks whether a given corpus is available and is used by the `check_corpus()` function. Problems with the previous implementation that relied on files in the registry directory to ensure the presence of a corpus hopefully do not occur any more.
- Calling the 'find_readline.perl' utility script is omitted on macOS, so previous warning messages when running the makefile do not show up any more.