chore(deps): update dependency pyarrow to v19 #216

Open
wants to merge 1 commit into master

Conversation

renovate[bot]

@renovate renovate bot commented Jan 16, 2025

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| pyarrow | ==3.0.0 -> ==19.0.0 | age | adoption | passing | confidence |
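
Since the pin moves from 3.0.0 straight to 19.0.0, it may help to quantify the jump before merging. The following is a minimal sketch, not part of Renovate or pyarrow: the helper names `parse_pin` and `major_jump` are hypothetical, and it assumes the change string uses exact `==` pins separated by `->`, as in the table above.

```python
import re


def parse_pin(pin):
    """Parse an exact pin such as '==19.0.0' into a tuple of ints (hypothetical helper)."""
    match = re.fullmatch(r"==(\d+(?:\.\d+)*)", pin.strip())
    if match is None:
        raise ValueError(f"not an exact pin: {pin!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def major_jump(change):
    """Return how many major versions a 'old -> new' change string spans (hypothetical helper)."""
    old, new = (parse_pin(side) for side in change.split("->"))
    return new[0] - old[0]


if __name__ == "__main__":
    # The change string is copied from the table in this PR.
    print(major_jump("==3.0.0 -> ==19.0.0"))
```

A jump this large crosses many major releases, so reading the release notes for each intermediate version is likely worthwhile before merging manually.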

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

apache/arrow (pyarrow)

v19.0.0: Apache Arrow 19.0.0

Release Notes URL: https://arrow.apache.org/release/19.0.0.html

v18.1.0: Apache Arrow 18.1.0

Release Notes URL: https://arrow.apache.org/release/18.1.0.html

v18.0.0: Apache Arrow 18.0.0

Release Notes URL: https://arrow.apache.org/release/18.0.0.html

v17.0.0: Apache Arrow 17.0.0

Release Notes URL: https://arrow.apache.org/release/17.0.0.html

v16.1.0

v16.0.0

v15.0.2

v15.0.1

v15.0.0

v14.0.2

v6.0.1

Bug Fixes

  • ARROW-14437 - [Python] Make CSV cancellation test more robust
  • ARROW-14492 - [JS] Fix export for browser bundles
  • ARROW-14513 - [Release][Go] Add /v6 suffix to release-6.0.0
  • ARROW-14519 - [C++] joins segfault when data contains list column
  • ARROW-14523 - [C++] Fix potential data loss in S3 multipart upload
  • ARROW-14538 - [R] Work around empty tr call on Solaris
  • ARROW-14550 - [Doc] Remove the JSON license; a non-free one.
  • ARROW-14583 - [R][C++] Crash when summarizing after filtering to no rows on partitioned data
  • ARROW-14584 - [Python][CI] Python sdist installation fails with latest setuptools 58.5
  • ARROW-14620 - [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior
  • ARROW-14630 - [C++] DCHECK in GroupByNode when error encountered
  • ARROW-14739 - [JS][Docs] Point to wrong source
  • ARROW-15071 - [C#] Fixed a bug in Column.cs ValidateArrayDataTypes method
  • ARROW-15072 - [R] Error: This build of the arrow package does not support Datasets

New Features and Improvements

  • ARROW-13156 - [R] bindings for str_count
  • ARROW-14181 - [C++][Compute] Hash Join support for dictionary
  • ARROW-14189 - [Docs] Add version dropdown to the sphinx docs
  • ARROW-14310 - [R] Make expect_dplyr_equal() more intuitive
  • ARROW-14365 - [R] Update README example to reflect new capabilities
  • ARROW-14390 - [Packaging][Ubuntu] Add support for Ubuntu 21.10
  • ARROW-14433 - [Release][APT] Skip arm64 Ubuntu 21.04 verification
  • ARROW-14450 - [R] Old macos build error
  • ARROW-14459 - [Doc] Update the pinned sphinx version to 4.2
  • ARROW-14480 - [R] Expose arrow::dataset::ExistingDataBehavior to R
  • ARROW-14486 - [Packaging][deb] Add missing libthrift-dev dependency
  • ARROW-14490 - [Doc] Regenerate CHANGELOG.md to include all versions
  • ARROW-14496 - [Docs] Create relative links for R / JS / C/Glib references in the sphinx toctree using stub pages
  • ARROW-14499 - [Docs] Version dropdown side-by-side with search box
  • ARROW-14514 - [C++][R] UBSAN error on round kernel
  • ARROW-14580 - [Python] update trove classifiers to include Python 3.10
  • ARROW-14623 - [Packaging][Java] Upload not only .jar but also .pom
  • ARROW-14628 - [Release][Python] Use python -m pytest
  • ARROW-15058 - [Java] Remove log4j2 dependency in performance module

v6.0.0

Bug Fixes

  • ARROW-6946 - [Go] Run tests with assert build tag enabled to ensure safety
  • ARROW-8452 - [Go] support proper nested nullable flags
  • ARROW-8453 - [Go][Integration] Support and enable recursive nested type integration tests
  • ARROW-8999 - [Python][C++] Non-deterministic segfault in "AMD64 MacOS 10.15 Python 3.7" build
  • ARROW-9948 - [C++] Fix scale handling in Decimal{128, 256}::FromString
  • ARROW-10213 - [C++] Temporal cast from timestamp to date rounds instead of extracting date component
  • ARROW-10373 - [C++] Validate null_count in Array::ValidateFull()
  • ARROW-10773 - [R] parallel as.data.frame.Table hangs indefinitely on Windows
  • ARROW-11518 - [C++][Parquet] Fix buffer allocation when reading/skipping boolean columns
  • ARROW-11579 - [R] read_feather hanging on Windows
  • ARROW-11634 - [C++][Parquet] Parquet statistics (min/max) for dictionary columns are incorrect
  • ARROW-11729 - [R] Add examples to datasets documentation
  • ARROW-12011 - [C++] Fix crashes and incorrect results when printing extreme date values
  • ARROW-12072 - [Go] Fix panics in ipc writer for sliced records
  • ARROW-12087 - [C++] Allow sorting durations, timestamps with timezones
  • ARROW-12321 - [R][C++] Arrow opens too many files at once when writing a dataset
  • ARROW-12513 - [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls
  • ARROW-12540 - [C++] Implementing casting support from date32/date64 to utf8/large_utf8
  • ARROW-12636 - [JS] ESM Tree-Shaking produces broken code
  • ARROW-12700 - [R] Read/Write_feather stuck forever after bad write, R, Win32
  • ARROW-12837 - [C++] Do not crash when printing invalid arrays
  • ARROW-13134 - [C++][CI] Unpin conda package for aws-sdk-cpp
  • ARROW-13151 - [C++][Parquet] Propagate schema changes from selection all the way up the stack
  • ARROW-13198 - [C++][Dataset] Async scanner occasionally segfaulting in CI
  • ARROW-13293 - [R] open_dataset followed by collect hangs (while compute works)
  • ARROW-13304 - [C++] Unable to install nightly on Ubuntu 21.04 due to day of week options
  • ARROW-13336 - [Doc] Make clean in docs should clean generated docs
  • ARROW-13422 - [R] Clarify README about S3 support on Windows
  • ARROW-13424 - [C++] Remove needless workaround for conda and benchmark
  • ARROW-13425 - [Archery] Avoid importing PyArrow indirectly
  • ARROW-13429 - [C++][Gandiva] Fix Gandiva codegen for if-else expression with binary type
  • ARROW-13430 - [Go] fix handling of zero value for FromBigInt
  • ARROW-13436 - [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
  • ARROW-13437 - [C++] Relax FixedSizeList validation to allow excess child values
  • ARROW-13441 - [C++][CSV] Skip empty batches in column decoder
  • ARROW-13443 - [C++] : Fix the incorrect mapping from flatbuf::MetadataVersion to arrow::ipc::MetadataVersion
  • ARROW-13445 - [Java][Packaging] Fix artifact patterns for the Java jars
  • ARROW-13446 - [Release] Fix verification on amazon linux
  • ARROW-13447 - [Release] Verification script for arm64 and universal2 macOS wheels
  • ARROW-13450 - [Python][Packaging] Set deployment target to 10.13 for universal2 wheels
  • ARROW-13469 - [C++] Suppress -Wmissing-field-initializers in DayMilliseconds arrow/type.h
  • ARROW-13474 - [Python] Fix crash in take/filter of empty ExtensionArray
  • ARROW-13477 - [Release] Pass ARTIFACTORY_API_KEY to the upload script
  • ARROW-13484 - [Release] Add support for uploading Amazon Linux 2 packages
  • ARROW-13490 - [R][CI] Need to gate duckdb examples on duckdb version
  • ARROW-13492 - [R][CI] Move r tools 35 build back to per-commit/pre-PR
  • ARROW-13493 - [C++] Anonymous structs in an anonymous union are a GNU extension
  • ARROW-13495 - [C++][Compute] Fixing unaligned memory access in GrouperFastImpl
  • ARROW-13496 - [CI][R] Repair r-sanitizer job
  • ARROW-13497 - [C++][R] FunctionOptions not used by aggregation nodes
  • ARROW-13499 - [R] Aggregation on expression doesn't NSE correctly
  • ARROW-13500 - [C++] Fix using '-Wno-unknown-warning-option' with GCC
  • ARROW-13504 - [Python] Move marks from fixtures to individual tests/params
  • ARROW-13507 - [R] LTO job on CRAN fails
  • ARROW-13509 - [C++] Take kernel with empty inputs
  • ARROW-13522 - [C++] Fix regression in UTF8 trim functions
  • ARROW-13523 - [C++] Normalize test executable name
  • ARROW-13524 - [C++] Fix description for ApplicationVersion::VersionEq
  • ARROW-13529 - [Go] Fixing too many releases in IPC writer
  • ARROW-13538 - [R][CI] Don't test DuckDB in the minimal build
  • ARROW-13543 - [R] Handle summarize() with 0 arguments or no aggregate functions
  • ARROW-13556 - [C++] Add protobuf to linking for flight
  • ARROW-13559 - [CI][C++] Move the test-conda-cpp-valgrind nightly build to azure
  • ARROW-13560 - [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys
  • ARROW-13580 - [C++] quoted_strings_can_be_null only applied to string columns
  • ARROW-13597 - [C++][Compute] Remove AddOnLoad helper
  • ARROW-13600 - [C++] Fix maybe uninitialized warnings
  • ARROW-13602 - [C++] Fix strict aliasing warning in bit util test
  • ARROW-13603 - [GLib] Fix typos in GARROW_VERSION_CHECK()
  • ARROW-13605 - [C++] Capture node with shared_ptr to avoid TSan warning
  • ARROW-13608 - [R] vendor cpp11 to fix segfault under LTO
  • ARROW-13611 - [C++] Scanning datasets does not enforce back pressure
  • ARROW-13624 - [R] readr short type mapping has T and t backwards
  • ARROW-13628 - [Format][C++][Java] Add MONTH_DAY_NANO interval type
  • ARROW-13630 - [CI][C++][s390x] Reduce parallelism to build Arrow library
  • ARROW-13632 - [C++] Fix filtering of sliced FixedSizeList array
  • ARROW-13638 - [C++] Hold owned copy of function options in GroupByNode
  • ARROW-13639 - [C++] Fix out-of-bounds access in Concatenate with null slots and empty dictionary
  • ARROW-13654 - [C++][Parquet] Avoid infinite loop when appending a FileMetaData to itself
  • ARROW-13655 - [C++][Parquet] Disable Thrift message size protections
  • ARROW-13662 - [CI] Fix failing strftime test with older pandas
  • ARROW-13662 - [CI] Failing test test_extract_datetime_components with pandas 0.24
  • ARROW-13669 - [C++] Fix variant emplace methods (add brackets)
  • ARROW-13671 - [Dev] Fix conda recipe on Arm 64k page system
  • ARROW-13676 - [C++][Parquet] Avoid potential invalid access.
  • ARROW-13681 - [C++] Fix list_parent_indices behaviour on chunked array
  • ARROW-13685 - [C++] Cannot write dataset to S3FileSystem if bucket already exists
  • ARROW-13689 - [C#][Integration] Initial commit of C# Integration tests
  • ARROW-13694 - [R] Arrow filter crashes (R aborted session)
  • ARROW-13743 - [CI] OSX job fails due to incompatible git and libcurl
  • ARROW-13744 - [CI] c++14 and 17 nightly job fails
  • ARROW-13747 - [Python][CI] Requiring s3fs >= 2021.8
  • ARROW-13755 - [Python] Allow writing datasets using a partitioning that only specifies field_names
  • ARROW-13761 - [R] arrow::filter() crashes (aborts R session)
  • ARROW-13784 - [Python] Table.from_arrays should raise an error when array is empty but names is not
  • ARROW-13786 - [R][CI] Don't fail the RCHK build if arrow doesn't build
  • ARROW-13788 - [C++] Temporal component extraction functions don't support date32/64
  • ARROW-13792 - [Java] : The toString representation is incorrect for unsigned integer vectors
  • ARROW-13799 - [R] case_when error handling is capturing strings
  • ARROW-13800 - [R] Use divide instead of divide_checked
  • ARROW-13812 - [C++] Fix Valgrind error in Grouper.BooleanKey test
  • ARROW-13814 - [CI] Fix Spark master integration tests
  • ARROW-13819 - [C++] Initialize subseconds in value_parsing.h
  • ARROW-13846 - [C++] Fix crashes on invalid IPC file
  • ARROW-13850 - [C++] Fix crashes on invalid Parquet data
  • ARROW-13860 - [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
  • ARROW-13865 - [C++][R] Writing moderate-size parquet files of nested dataframes from R slows down/process hangs
  • ARROW-13872 - [Java] ExtensionTypeVector does not work with RangeEqualsVisitor
  • ARROW-13876 - [C++] Add trivial null kernels to arithmetic, sort functions
  • ARROW-13877 - [C++] Support FixedSizeList in generic list kernels
  • ARROW-13878 - [C++] Implement fixed-size-binary support for several kernels
  • ARROW-13880 - [C++] Compute function sort_indices does not support timestamps with time zones
  • ARROW-13881 - [C++][FlightRPC][Packaging] Ensure Flight is packaged with advanced TLS options on Windows
  • ARROW-13882 - [C++] Improve min_max/hash_min_max type support
  • ARROW-13884 - [JS] Move source files into a separate directory
  • ARROW-13912 - [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies
  • ARROW-13913 - [C++] Don't segfault if IndexOptions omitted
  • ARROW-13915 - [R][CI] R UCRT C++ bundles are incomplete
  • ARROW-13916 - [C++] Implement strftime on date32/64 types
  • ARROW-13921 - [Python][Packaging] Pin minimum setuptools version for the macos wheels
  • ARROW-13940 - [R] Turn on multithreading with Arrow engine queries
  • ARROW-13961 - [C++] Fix use of non-const references, declaration without initialization
  • ARROW-13976 - [C++] Add path to libjvm.so in ARM CPU
  • ARROW-13978 - [C++] Bump gtest to 1.11 to unbreak builds with recent clang
  • ARROW-13981 - [Java] VectorSchemaRootAppender doesn't work for BitVector
  • ARROW-13982 - [C++] Don't stall in async scanner if a fragment generates no batches
  • ARROW-13983 - [C++] Avoid raising error if fadvise() isn't supported
  • ARROW-13996 - [Go][Parquet] Fix file offsets in go impl
  • ARROW-13997 - [C++] restore exec node based query performance
  • ARROW-14001 - [Go] Fixing AppendBoolean function in BitmapWriter
  • ARROW-14004 - [Python][Doc] Document nullable dtypes handling and usage of types_mapper in to_pandas conversion
  • ARROW-14014 - [Java] Fix Flight parseTrailers for :status keys
  • ARROW-14017 - [C++] NULLPTR is not included in type_fwd.h
  • ARROW-14020 - [R] Writing dataframes with list columns is slow and scales poorly with nesting level
  • ARROW-14024 - [C++] Test that batch size is respected for IPC/CSV
  • ARROW-14026 - [C++] Enable batch parallelism in Parquet scanner
  • ARROW-14027 - [C++] Handle scalars in Grouper
  • ARROW-14040 - [C++] Fix result order dependence in scanner test
  • ARROW-14053 - [C++][CSV] Use atomic counter for async tests
  • ARROW-14057 - [C++] Bump aws-c-common version
  • ARROW-14063 - [R] open_dataset() does not work on CSVs without header rows
  • ARROW-14076 - Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)
  • ARROW-14090 - [C++][Parquet] rows_written_ should be int64_t instead of int
  • ARROW-14103 - [R] [C++] Allow min/max in grouped aggregation
  • ARROW-14109 - [C++] Fix segfault when parsing JSON with duplicate keys.
  • ARROW-14124 - [R] Timezone support in R <= 3.4
  • ARROW-14129 - [C++][Python] Fix unique/value_counts on empty dictionary arrays
  • ARROW-14139 - [IR][C++] Table flatbuffer object fails to compile on older GCCs
  • ARROW-14141 - [IR][C++] Join missing from RelationImpl
  • ARROW-14156 - [C++] Properly synthesize validity buffer in StructArray::Flatten
  • ARROW-14162 - [R] Simple arrange %>% head does not respect ordering
  • ARROW-14173 - [IR] Allow typed null literals to be represented
  • ARROW-14179 - [C++][C] Do not export/import null bitmap for union and null types
  • ARROW-14184 - [C++] allow joins where the keys include new columns on the left
  • ARROW-14192 - [C++][Dataset] Backpressure broken on ordered scans
  • ARROW-14195 - [R] Fix ExecPlan binding annotations
  • ARROW-14197 - [C++][Compute] Fixing wrong buffer size in GrouperFastImpl
  • ARROW-14200 - [R] strftime on a date should not use or be confused by timezones
  • ARROW-14203 - [C++] Fix description of ExecBatch.length for Scalars in aggregate kernels
  • ARROW-14204 - [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
  • ARROW-14206 - [Go][Parquet] Clean up s390x and arm build code
  • ARROW-14206 - [Go][CI] Fix build on s390x and ARM
  • ARROW-14208 - [C++] Fix compilation on Windows
  • ARROW-14210 - [C++] Add AR and RANLIB flags to bzip2
  • ARROW-14211 - [C++][Compute] Fixing thread sanitizer problems in hash join node
  • ARROW-14214 - [Python][CI] Fix tests using OrcFileFormat for Python 3.6 + orc not built
  • ARROW-14216 - [R] Disable auto-cleaning of duckdb tables
  • ARROW-14219 - [R][CI] DuckDB valgrind failure
  • ARROW-14220 - [C++] Missing ending quote in thirdpartyversions
  • ARROW-14221 - [R][CI] DuckDB tests fail on R < 4.0
  • ARROW-14223 - [C++] add missing third-party dependency
  • ARROW-14224 - [C++] Try to reduce build time/memory usage
  • ARROW-14226 - [R] Handle n_distinct() (and others) with args != 1
  • ARROW-14237 - [R][CI] Disable altrep in R <= 3.5
  • ARROW-14240 - [C++] Fix wrong nlohmann-json header path
  • ARROW-14246 - [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()
  • ARROW-14247 - [C++] Fix Valgrind errors in parquet-arrow-test
  • ARROW-14249 - [R] Slow down in dataframe-to-table benchmark
  • ARROW-14252 - [R] Partial matching of arguments warning
  • ARROW-14255 - [Python] Fix FlightClient.do_action
  • ARROW-14257 - [Python][Docs] Fix usage of sync scanner in dataset writing docs
  • ARROW-14260 - [C++] GTest linker error with vcpkg and Visual Studio 2019
  • ARROW-14283 - [CI][C++] Use LLVM 12 on macOS GHA builds
  • ARROW-14285 - [C++] Fix crashes when pretty-printing data from valid IPC file
  • ARROW-14299 - [Dev][CI] Avoid downloading MinIO multiple times
  • ARROW-14300 - [C++][R][CI] Work around missing include in xsimd
  • ARROW-14301 - [C++] use consistent CMAKE_CXX_STANDARD definition
  • ARROW-14302 - [C++] Valgrind errors
  • ARROW-14305 - [C++][Compute] Fixing Valgrind errors in hash join node tests
  • ARROW-14307 - [R] crashes when reading empty feather with POSIXct column
  • ARROW-14313 - [Doc] Make Archery installation docs more accurate
  • ARROW-14321 - [R] segfault converting dictionary ChunkedArray with 0 chunks
  • ARROW-14340 - [C++] Bump xsimd to fix build error on Apple M1
  • ARROW-14370 - [C++] Fix memory leak in SeqMergedGeneratorTestFixture.ErrorItem
  • ARROW-14373 - [Packaging][Java] Missing LLVM dependency in the macOS java-jars build
  • ARROW-14377 - [Packaging][Python] Python 3.9 installation fails in macOS wheel build
  • ARROW-14381 - [CI][Python] Fix Spark integration failures
  • ARROW-14382 - [C++][Compute] Remove duplicated ThreadIndexer definition
  • ARROW-14392 - [C++] Bundled gRPC misses bundled Abseil include path
  • ARROW-14393 - [C++] GTest linking errors during the source release verification
  • ARROW-14397 - [C++] Fix valgrind error in test utility
  • ARROW-14406 - [CI] Skip failing test on dask-master nightly build
  • ARROW-14411 - [Release][Integration] Go integration tests fail for 6.0.0-RC1
  • ARROW-14417 - [R] Joins ignore projection on left dataset
  • ARROW-14423 - [Python] Fix version constraints in pyproject.toml
  • ARROW-14424 - [Packaging][Python] Disable windows wheel testing for python 3.6
  • ARROW-14434 - R crashes when making an empty selection for Datasets with DateTime
  • ARROW-14439 - [Python][C++] Segfault with read_json when a field is missing
  • PARQUET-2067 - [C++][Parquet] Fix Parquet null count stats for enclosing null lists
  • PARQUET-2089 - [C++] Align RowGroup file_offset with specification

New Features and Improvements

  • ARROW-1565 - [C++] Implement TopK/BottomK
  • ARROW-1568 - [C++] Implement Drop Null Kernel for Arrays
  • ARROW-4333 - [C++] Sketch out design for kernels and "query" execution in compute layer
  • ARROW-4700 - [C++] Added support for decimal128 and decimal256 json converted
  • ARROW-5002 - [C++] Implement Hash Aggregation query execution node
  • ARROW-5244 - [C++] Remove experimental marker from some APIs
  • ARROW-6072 - [C++] Implement casting List <-> LargeList
  • ARROW-6607 - [Python] Support for set/list columns when converting from Pandas
  • ARROW-6626 - [Python] Support converting nested sets when converting to arrow
  • ARROW-6870 - [C#] Add Support for Dictionary Arrays and Dictionary Encoding
  • ARROW-7102 - [Python] Make filesystems compatible with fsspec
  • ARROW-7179 - [C++][Python][R] Consolidate coalesce/fill_null
  • ARROW-7901 - [Go][Integration] enable integration tests for null case
  • ARROW-8022 - [C++] Add static and small vector implementations
  • ARROW-8147 - [C++] add GCS library to ThirdpartyToolchain
  • ARROW-8379 - [R] Investigate/fix thread safety issues (esp. Windows)
  • ARROW-8621 - [Release] Add post release step to add tags for Go versioning
  • ARROW-8780 - [Python][Doc] Document the fsspec wrapper for pyarrow.fs filesystems
  • ARROW-8928 - [C++] Add microbenchmarks to help measure ExecBatchIterator overhead
  • ARROW-9226 - [Python] Support core-site.xml default filesystem.
  • ARROW-9434 - [C++] Store type code in UnionScalar
  • ARROW-9719 - [Python] Improve HadoopFileSystem docstring
  • ARROW-10094 - [Python][Doc] Document missing pandas to arrow conversions
  • ARROW-10415 - [R] Support for dplyr::distinct()
  • ARROW-10898 - [C++] Improve table sort performance
  • ARROW-11238 - [Python] Make SubTreeFileSystem print method more informative
  • ARROW-11243 - [C++] Recognize time types in CSV files
  • ARROW-11460 - [R] Use system libraries if present on Linux
  • ARROW-11691 - [Developer][CI] Provide a consolidated .env file for benchmark-relevant environment variables
  • ARROW-11748 - [C++] Ensure Decimal fields are in native endian order
  • ARROW-11828 - [C++] Expose CSVWriter object in api
  • ARROW-11885 - [R] Turn off some capabilities when LIBARROW_MINIMAL=true
  • ARROW-11981 - [C++] Implement Union ExecNode
  • ARROW-12063 - [C++] Add null placement option to sort functions
  • ARROW-12181 - [C++][R] The "CSV dataset" in test-dataset.R is failing on RTools 3.5
  • ARROW-12216 - [R] Proactively disable multithreading on RTools3.5 (32bit?)
  • ARROW-12359 - [C++] Deprecate FileSystem::OpenAppendStream
  • ARROW-12388 - [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva
  • ARROW-12410 - [C++][Gandiva] Implement regexp_replace function on Gandiva
  • ARROW-12479 - [C++][Gandiva] Implement castBigInt, castInt, castIntervalDay and castIntervalYear extra functions
  • ARROW-12563 - [C++][Gandiva] Add add_months and datediff functions for string
  • ARROW-12615 - [C++] Add options for handling NAs to stddev and variance
  • ARROW-12650 - [Doc][Python] Improve documentation regarding dealing with memory mapped files
  • ARROW-12657 - [C++] Adding String hex to numeric conversion
  • ARROW-12669 - [C++][Python] Implement a new scalar function: list_element
  • ARROW-12673 - [C++] Add callback to handle incorrect column counts
  • ARROW-12688 - [R] Use DuckDB to query an Arrow Dataset
  • ARROW-12714 - [C++] String title case kernel
  • ARROW-12725 - [C++][Compute] Column at a time hash and comparison in group by
  • ARROW-12728 - [C++] Implement count_distinct/distinct hash aggregate kernels
  • ARROW-12744 - [C++][Compute] Add rounding kernel
  • ARROW-12759 - [C++][Compute] Add ExecNode for group by
  • ARROW-12763 - [R] Optimize dplyr queries that use head/tail after arrange
  • ARROW-12846 - [Release] Reduce download/upload bandwidth for APT/Yum repositories
  • ARROW-12866 - [C++][Gandiva] Implement STRPOS function on Gandiva
  • ARROW-12871 - [R] upgrade to testthat 3e
  • ARROW-12876 - [R] Fix build flags on Raspberry Pi
  • ARROW-12944 - [C++] String capitalize kernel
  • ARROW-12946 - [C++] String swap case kernel
  • ARROW-12953 - [C++][Compute] Refactor CheckScalar* to take Datum arguments
  • ARROW-12959 - [C++][R] Option for is_null(NaN) to evaluate to true
  • ARROW-12965 - [Java] C Data Interface implementation
  • ARROW-12980 - [C++] Kernels to extract datetime components should be timezone aware
  • ARROW-12981 - [R] Install source package from CRAN alone
  • ARROW-13033 - [C++] Kernel to localize naive timestamps to a timezone (preserving clock-time)
  • ARROW-13056 - [MATLAB] Add a matlab label for dev Pull Requests
  • ARROW-13067 - [C++][Compute] Implement integer to decimal cast
  • ARROW-13089 - [Python] Allow creating RecordBatch from Python dict
  • ARROW-13112 - [R] altrep vectors for strings and other types
  • ARROW-13132 - [C++] Add Scalar validation
  • ARROW-13138 - [C++][R] Implement extract temporal components (year, month, day, etc) from date32/64 types
  • ARROW-13141 - [Python] Update HadoopFileSystem docs to clarify setting CLASSPATH env variable is required
  • ARROW-13163 - [C++][Gandiva] Implement REPEAT function on Gandiva
  • ARROW-13164 - [R] altrep vectors from Array with nulls
  • ARROW-13172 - [Java] Make TYPE_WIDTH publicly accessible
  • ARROW-13174 - [C++][Compute] Add strftime kernel
  • ARROW-13202 - [MATLAB] Enable GitHub Actions CI for MATLAB Interface on Linux
  • ARROW-13218 - [Format] Clarify interpretation of timestamp values
  • ARROW-13220 - [C++] Implement 'choose' function
  • ARROW-13222 - [C++] Improve type support for case_when
  • ARROW-13227 - [Documentation][Compute] Document ExecNode
  • ARROW-13257 - [Java][Dataset] Allow passing empty columns for projection
  • ARROW-13268 - [C++][Compute] Add ExecNode for semi and anti-semi join
  • ARROW-13279 - [R] Use C++ DayOfWeekOptions in wday implementation instead of manually calculating via Expression
  • ARROW-13287 - [C++] [Dataset] FileSystemDataset::Write should use an async scan
  • ARROW-13295 - [C++] add hash_mean, hash_variance, hash_stddev kernels
  • ARROW-13298 - [C++] Implement any/all hash aggregate kernels
  • ARROW-13307 - [C++] Remove reflection-based enums
  • ARROW-13311 - [C++][Documentation] Document hash aggregate kernels
  • ARROW-13317 - [Python] Improve documentation on what 'use_threads' does in 'read_feather'
  • ARROW-13326 - [R][Archery] Add linting to dev CI
  • ARROW-13327 - [C++][Python] Improve consistency of explicit C++ types in PyArrow files
  • ARROW-13330 - [Go][Parquet] Add the rest of the Encoding package
  • ARROW-13344 - [R] Initial bindings for ExecPlan/ExecNode
  • ARROW-13345 - [C++] Add basic implementation for log to base b
  • ARROW-13358 - [C++] Improve type support in if_else
  • ARROW-13379 - [Dev][Docs] Improvements to archery docs
  • ARROW-13390 - [C++] Implement coalesce for remaining types
  • ARROW-13397 - [R] Update arrow.Rmd vignette
  • ARROW-13399 - [R] Update dataset.Rmd vignette
  • ARROW-13402 - [R] Update flight.Rmd vignette
  • ARROW-13403 - [R] Update developing.Rmd vignette
  • ARROW-13404 - [Doc][Python] Improve PyArrow documentation for new users
  • ARROW-13405 - [Doc] Guide users to the documentation for their own platform
  • ARROW-13416 - [C++] Implement mod compute function
  • ARROW-13420 - [JS] Update dependencies
  • ARROW-13421 - [C++][Python] Add CSV convert option to change decimal point
  • ARROW-13433 - [R] Remove CLI hack from Valgrind test
  • ARROW-13434 - [R] group_by() with an unnamed expression
  • ARROW-13435 - [R] Add function arrow_table() as alias for Table$create()
  • ARROW-13444 - [C++] Remove usage of deprecated std::result_of
  • ARROW-13448 - [R] Bindings for strftime
  • ARROW-13453 - [R] DuckDB has not yet released 0.2.8
  • ARROW-13455 - [C++][Docs] Typo in RecordBatch::SetColumn
  • ARROW-13458 - [C++][Docs] Typo in RecordBatch::schema
  • ARROW-13459 - [C++][Docs] Missing param docs for RecordBatch::SetColumn
  • ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
  • ARROW-13463 - [Release][Python] Verify python 3.8 macOS arm64 wheel
  • ARROW-13465 - [R] to_arrow() from duckdb
  • ARROW-13466 - [R] make installation fail if Arrow C++ dependencies cannot be installed
  • ARROW-13468 - [Release] Fix binary download/upload failures
  • ARROW-13472 - [R] Remove .engine = "duckdb" argument
  • ARROW-13475 - [Release] Don't consider rust tarballs when cleaning up old releases
  • ARROW-13476 - [Doc][Python] Switch ipc/io doc to use context managers
  • ARROW-13478 - [Release] Unnecessary rc-number argument for the version bumping post-release script
  • ARROW-13480 - [C++] Fix possible deadlock when dataset produces an error
  • ARROW-13482 - [C++][Compute] Refactoring away from hard coded ExecNode factories to a registry
  • ARROW-13485 - [Release] Replace ${PREVIOUS_RELEASE}.9000 in r/NEWS.md by post-12-bump-versions.sh
  • ARROW-13488 - [Website] Update Linux packages install information for 5.0.0
  • ARROW-13489 - [R] Bump CI jobs after 5.0.0
  • ARROW-13501 - [R] Bindings for count aggregation
  • ARROW-13502 - [R] Bindings for min/max aggregation
  • ARROW-13503 - [GLib][Ruby][Flight] Add support for DoGet
  • ARROW-13506 - [C++][Java] Upgrade ORC to 1.6.9
  • ARROW-13508 - [C++] Support custom retry strategies in S3Options
  • ARROW-13510 - [CI][R][C++] Add -Wall to fedora-clang-devel as-cran checks
  • ARROW-13511 - [CI][R] Fail in the docker build step if R deps don't install
  • ARROW-13516 - [C++] Detect --version-script flag availability
  • ARROW-13519 - [R] Make doc examples less noisy
  • ARROW-13520 - [C++] Implement hash_aggregate tdigest kernel
  • ARROW-13521 - [C++][Docs] Add note about tdigest in compute functions docs
  • ARROW-13525 - [Python] Mention alternative deprecation message for ParquetDataset.partitions
  • ARROW-13528 - [R] Bindings for mean, var, sd aggregation
  • ARROW-13532 - [C++][Compute] - adding set membership type filtering to hash table interface
  • ARROW-13534 - [C++] Improve csv chunker
  • ARROW-13540 - [C++] Add order by sink node
  • ARROW-13541 - [C++][Python] Implement ExtensionScalar
  • ARROW-13542 - [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk
  • ARROW-13544 - [Java] : Remove APIs that have been deprecated for long (Changes to ArrowBuf)
  • ARROW-13544 - [Java] : Remove APIs that have been deprecated for long (Changes to JDBC)
  • ARROW-13544 - [Java] : Remove APIs that have been deprecated for long (Changes to Vectors)
  • ARROW-13548 - [C++] Implement temporal difference kernels
  • ARROW-13549 - [C++] Add casts from timestamp to date/time
  • ARROW-13550 - [R] Support .groups argument to dplyr::summarize()
  • ARROW-13552 - [C++] Remove deprecated APIs
  • ARROW-13557 - [Packaging][Python] Skip test_cancellation test case on M1
  • ARROW-13561 - [C++] Implement week kernel that accepts WeekOptions
  • ARROW-13562 - [R] Styler followups
  • ARROW-13565 - [Packaging][Ubuntu] Drop support for 20.10
  • ARROW-13572 - [C++][Datasets] Add ORC support to Datasets API
  • ARROW-13573 - [C++] Support dictionaries natively in case_when
  • ARROW-13574 - [C++] Add 'count all' option to count kernels
  • ARROW-13575 - [C++] Add hash_product kernel
  • ARROW-13576 - [C++] Replace ExecNode::InputReceived with ::MakeTask
  • ARROW-13577 - [Python][FlightRPC] pyarrow client do_put close method after write_table did not throw flight error
  • ARROW-13585 - [GLib] Add support for C ABI interface
  • ARROW-13587 - [R] Handle --use-LTO override
  • ARROW-13595 - [C++] Add debug mode check for compute kernel output type
  • ARROW-13604 - [Java] : Remove deprecation annotations for APIs representing unsupported operations
  • ARROW-13606 - [R] Actually disable LTO
  • ARROW-13613 - [C++] Add decimal support to (hash) sum/mean/product
  • ARROW-13614 - [C++] Add decimal support to min_max/hash_min_max
  • ARROW-13618 - [R] Use Arrow engine for summarize() by default
  • ARROW-13620 - [R] Binding for n_distinct()
  • ARROW-13626 - [R] Bindings for log base b
  • ARROW-13627 - [C++] Fully support ScalarAggregateOptions in (hash) any/all/sum/product/mean
  • ARROW-13629 - [Ruby] Add support for building/converting map
  • ARROW-13633 - [Packaging][Debian] Add support for bookworm
  • ARROW-13634 - [R] Update distro() in nixlibs.R to map from "bookworm" to 12
  • ARROW-13635 - [Packaging][Python] Define --with-lg-page for jemalloc in the arm manylinux builds
  • ARROW-13637 - [Python] Fix docstrings
  • ARROW-13642 - [C++][Compute] Hash join node supporting all semi, anti, inner, outer join types
  • ARROW-13645 - [Java] : Allow NullVectors to have distinct field names
  • ARROW-13646 - [Go][Parquet] adding the parquet metadata package
  • ARROW-13648 - [Dev] Use #!/usr/bin/env instead of #!/bin where possible
  • ARROW-13650 - [C++] Create dataset writer to encapsulate dataset writer logic
  • ARROW-13651 - [Ruby][Symbol] to Arrow array
  • ARROW-13652 - [Python] Expose copy_files in pyarrow.fs
  • ARROW-13660 - [C++] Remove seq_num from ExecNode::InputReceived
  • ARROW-13670 - [C++] add virtual destructors
  • ARROW-13674 - [CI] PR checks should check for JIRA components
  • ARROW-13675 - [Doc][Python] Add a recipe on how to save partitioned datasets to the Cookbook
  • ARROW-13679 - [GLib][Ruby] Add support for group aggregation
  • ARROW-13680 - [C++] Create an asynchronous nursery to simplify capture logic
  • ARROW-13682 - [C++] Add TDigest API to merge one TDigest
  • ARROW-13684 - [C++][Compute] Strftime kernel follow-up
  • ARROW-13686 - [Python] Update deprecated pytest yield_fixture functions
  • ARROW-13687 - [Ruby] Add support for loading table by Arrow Dataset
  • ARROW-13691 - [C++] Support skip_nulls/min_count in VarianceOptions
  • ARROW-13693 - [Website] arrow-site should pin down a specific Ruby version and leverage toolings like rbenv
  • ARROW-13696 - [Python] Support for MapType with Fields
  • ARROW-13699 - [Python][Docs] Improve filesystem documentation
  • ARROW-13700 - [Docs][C++] Clarify DayOfWeekOptions args
  • ARROW-13702 - [Python] Add dataset mark to test_parquet_dataset_deprecated_properties
  • [ARROW-13704](https://

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Never, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.
