Initial import #1
Merged
For now this only adds a successful compilation for Windows; tests don't run yet. Author: Uwe L. Korn <[email protected]> Closes #213 from xhochy/ARROW-202 and squashes the following commits: d5088a6 [Uwe L. Korn] Correctly reference Kudu in LICENSE and NOTICE 72a583b [Uwe L. Korn] Differentiate Boost libraries based on build type 6c75699 [Uwe L. Korn] Add license header e33b08c [Uwe L. Korn] Pick up shared Boost libraries correctly 5da5f5d [Uwe L. Korn] ARROW-202: Integrate with appveyor ci for windows
… not implicitly skip

I have

```
$ py.test pyarrow/tests/test_hdfs.py
================================== test session starts ==================================
platform linux2 -- Python 2.7.11, pytest-2.9.0, py-1.4.31, pluggy-0.3.1
rootdir: /home/wesm/code/arrow/python, inifile:
collected 15 items

pyarrow/tests/test_hdfs.py sssssssssssssss
```

But

```
$ py.test pyarrow/tests/test_hdfs.py --hdfs -v
================================== test session starts ==================================
platform linux2 -- Python 2.7.11, pytest-2.9.0, py-1.4.31, pluggy-0.3.1 -- /home/wesm/anaconda3/envs/py27/bin/python
cachedir: .cache
rootdir: /home/wesm/code/arrow/python, inifile:
collected 15 items

pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_close PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_download_upload PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_file_context_manager PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_ls PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_mkdir PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_orphaned_file PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_read_multiple_parquet_files SKIPPED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_read_whole_file PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_close PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_download_upload PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_file_context_manager PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_ls PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_mkdir PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_read_multiple_parquet_files SKIPPED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_read_whole_file PASSED
```

The `py.test pyarrow --only-hdfs` option will run only the HDFS tests.
Author: Wes McKinney <[email protected]> Closes #353 from wesm/ARROW-557 and squashes the following commits: 52e03db [Wes McKinney] Add conftest.py file, hdfs group to opt in to HDFS tests with --hdfs
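The opt-in behavior described above can be wired up in a `conftest.py` with pytest's standard hooks. A minimal sketch, assuming the `--hdfs` flag from this PR; everything else (the marker name, helper structure) is illustrative rather than the actual Arrow conftest:

```python
# conftest.py-style sketch: tests marked "hdfs" are skipped unless
# the --hdfs flag is passed. Only the flag name comes from the PR;
# the rest follows the standard pytest opt-in pattern.
import pytest

def pytest_addoption(parser):
    # Register the opt-in flag; defaults to off.
    parser.addoption("--hdfs", action="store_true", default=False,
                     help="run HDFS tests")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--hdfs"):
        return  # flag given: run everything
    skip_hdfs = pytest.mark.skip(reason="need --hdfs option to run")
    for item in items:
        if "hdfs" in item.keywords:
            item.add_marker(skip_hdfs)
```

With this in place, a plain `py.test pyarrow` collects the HDFS tests but reports them as skipped (the `s` markers in the first run above), while `py.test pyarrow --hdfs` runs them.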
This supersedes apache/arrow#467 This is ready for review. Next steps are - Integration with the arrow CI - Write docs on how to use the object store There is one remaining compilation error (it doesn't find Python.h for one of the Travis configurations, if anybody has an idea on what is going on, let me know). Author: Philipp Moritz <[email protected]> Author: Robert Nishihara <[email protected]> Closes #742 from pcmoritz/plasma-store-2 and squashes the following commits: c100a453 [Philipp Moritz] fixes d67160c5 [Philipp Moritz] build dlmalloc with -O3 16d1f716 [Philipp Moritz] fix test hanging 0f321e16 [Philipp Moritz] try to fix tests 80f9df40 [Philipp Moritz] make format 4c474d71 [Philipp Moritz] run plasma_store from the right directory 85aa1710 [Philipp Moritz] fix mac tests 61d421b5 [Philipp Moritz] fix formatting 4497e337 [Philipp Moritz] fix tests 00f17f24 [Philipp Moritz] fix licenses 81437920 [Philipp Moritz] fix linting 5370ae06 [Philipp Moritz] fix plasma protocol a137e783 [Philipp Moritz] more fixes b36c6aaa [Philipp Moritz] fix fling.cc 214c426c [Philipp Moritz] fix eviction policy e7badc48 [Philipp Moritz] fix python extension 6432d3fa [Philipp Moritz] fix formatting b21f0814 [Philipp Moritz] fix remaining comments about client 27f9c9e8 [Philipp Moritz] fix formatting 7b08fd2a [Philipp Moritz] replace ObjectID pass by value with pass by const reference and fix const correctness ca80e9a6 [Philipp Moritz] remove plain pointer in plasma client, part II 627b7c75 [Philipp Moritz] fix python extension name 30bd68b7 [Philipp Moritz] remove plain pointer in plasma client, part I 77d98227 [Philipp Moritz] put all the object code into a common library 0fdd4cd5 [Philipp Moritz] link libarrow.a and remove hardcoded optimization flags 8daea699 [Philipp Moritz] fix includes according to google styleguide 65ac7433 [Philipp Moritz] remove offending c++ flag from c flags 7003a4a4 [Philipp Moritz] fix valgrind test by setting working directory 217ff3d8 [Philipp 
Moritz] add valgrind heuristic 9c703c20 [Philipp Moritz] integrate client tests 9e5ae0e1 [Philipp Moritz] port serialization tests to gtest 0b8593db [Robert Nishihara] Port change from Ray. Change listen backlog size from 5 to 128. b9a5a06e [Philipp Moritz] fix includes ed680f97 [Philipp Moritz] reformat the code f40f85bd [Philipp Moritz] add clang-format exceptions d6e60d26 [Philipp Moritz] do not compile plasma on windows f936adb7 [Philipp Moritz] build plasma python client only if python is available e11b0e86 [Philipp Moritz] fix pthread 74ecb199 [Philipp Moritz] don't link against Python libraries b1e0335a [Philipp Moritz] fix linting 7f7e7e78 [Philipp Moritz] more linting 79ea0ca7 [Philipp Moritz] fix clang-tidy 99420e8f [Philipp Moritz] add rat exceptions 6cee1e25 [Philipp Moritz] fix c93034fb [Philipp Moritz] add Apache 2.0 headers 63729130 [Philipp Moritz] fix malloc? 99537c94 [Philipp Moritz] fix compiler warnings cb3f3a38 [Philipp Moritz] compile C files with CMAKE_C_FLAGS e649c2af [Philipp Moritz] fix compilation 04c2edb3 [Philipp Moritz] add missing file 51ab9630 [Philipp Moritz] fix compiler warnings 9ef7f412 [Philipp Moritz] make the plasma store compile e9f9bb4a [Philipp Moritz] Initial commit of the plasma store. Contributors: Philipp Moritz, Robert Nishihara, Richard Shin, Stephanie Wang, Alexey Tumanov, Ion Stoica @ RISElab, UC Berkeley (2017) [from ray-project/ray@b94b4a3]
Also added some missing status checks to builder-benchmark Author: Wes McKinney <[email protected]> Closes #782 from wesm/ARROW-1151 and squashes the following commits: 9b488a0e [Wes McKinney] Try to fix snappy warning 06276119 [Wes McKinney] Restore check macros used in libplasma 83b3f36d [Wes McKinney] Add branch prediction to RETURN_NOT_OK
…m parquet-cpp I will make a corresponding PR to parquet-cpp to ensure that this code migration is complete enough. Author: Wes McKinney <[email protected]> Closes #785 from wesm/ARROW-1154 and squashes the following commits: 08b54c98 [Wes McKinney] Fix variety of compiler warnings ddc7354b [Wes McKinney] Fixes to get PARQUET-1045 working f5cd0259 [Wes McKinney] Import miscellaneous computational utility code from parquet-cpp
…and Clang warning fixes This was tedious, but overdue. The Status class in Arrow was originally imported from Apache Kudu, which had modified it from standard use in Google projects. I simplified the implementation to bring it more in line with the Status implementation used in TensorFlow. This also addresses ARROW-111 by providing an attribute that makes Clang warn if a Status is ignored. Author: Wes McKinney <[email protected]> Closes #814 from wesm/status-cleaning and squashes the following commits: 7b7e6517 [Wes McKinney] Bring Status implementation somewhat more in line with TensorFlow and other Google codebases, remove unused posix code. Add warn_unused_result attribute and fix clang warnings
An additional pair of eyes would be helpful, somewhat strangely the tests are passing for some datetime objects and not for others. Author: Philipp Moritz <[email protected]> Closes #1153 from pcmoritz/serialize-datetime and squashes the following commits: f3696ae4 [Philipp Moritz] add numpy to LICENSE.txt a94bca7d [Philipp Moritz] put PyDateTime_IMPORT higher up 0ae645e9 [Philipp Moritz] windows fixes cbd1b222 [Philipp Moritz] get rid of gmtime_r f3ea6699 [Philipp Moritz] use numpy datetime code to implement time conversions e644f4f5 [Philipp Moritz] linting f38cbd46 [Philipp Moritz] fixes 6e549c47 [Philipp Moritz] serialize datetime
… be a stateful kernel Only intended to implement selective categorical conversion in `to_pandas()` but it seems that there is a lot missing to do this in a clean fashion. Author: Wes McKinney <[email protected]> Closes #1266 from xhochy/ARROW-1559 and squashes the following commits: 50249652 [Wes McKinney] Fix MSVC linker issue b6cb1ece [Wes McKinney] Export CastOptions 4ea3ce61 [Wes McKinney] Return NONE Datum in else branch of functions 4f969c6b [Wes McKinney] Move deprecation suppression after flag munging 7f557cc0 [Wes McKinney] Code review comments, disable C4996 warning (equivalent to -Wno-deprecated) in MSVC builds 84717461 [Wes McKinney] Do not compute hash table threshold on each iteration ae8f2339 [Wes McKinney] Fix double to int64_t conversion warning c1444a26 [Wes McKinney] Fix doxygen warnings 2de85961 [Wes McKinney] Add test cases for unique, dictionary_encode 383b46fd [Wes McKinney] Add Array methods for Unique, DictionaryEncode 0962f06b [Wes McKinney] Add cast method for Column, chunked_array and column factory functions 62c3cefd [Wes McKinney] Datum stubs 27151c47 [Wes McKinney] Implement Cast for chunked arrays, fix kernel implementation. Change kernel API to write to a single Datum 1bf2e2f4 [Wes McKinney] Fix bug with column using wrong type eaadc3e5 [Wes McKinney] Use macros to reduce code duplication in DoubleTableSize 6b4f8f3c [Wes McKinney] Fix datetime64->date32 casting error raised by refactor 2c77a19e [Wes McKinney] Some Decimal->Decimal128 renaming. Add DecimalType base class c07f91b3 [Wes McKinney] ARROW-1559: Add unique kernel
…integration tests This PR adds a workaround for reading the metadata layout for C++ dictionary-encoded vectors. I added tests that validate against the C++/Java integration suite. In order to make the new tests pass, I had to update the generated flatbuffers format and add a few types the JS version didn't have yet (Bool, Date32, and Timestamp). It also uses the new `isDelta` flag on DictionaryBatches to determine whether the DictionaryBatch vector should replace or append to the existing dictionary. I also added a script for generating test arrow files from the C++ and Java implementations, so we don't break the tests updating the format in the future. I saved the generated Arrow files alongside the tests because I didn't see a way to pipe the JSON test data through the C++/Java json-to-arrow commands without writing to a file. If I missed something and we can do it all in-memory, I'd be happy to make that change! This PR is marked WIP because I added an [integration test](apache/arrow@6e98874#diff-18c6be12406c482092d4b1f7bd70a8e1R22) that validates the JS reader reads C++ and Java files the same way, but unfortunately it doesn't. While debugging, I noticed a number of other differences in the buffer layout metadata between the C++ and Java versions. If we go ahead with @jacques-n's [comment in ARROW-1693](https://issues.apache.org/jira/browse/ARROW-1693?focusedCommentId=16244812&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16244812) and remove/ignore the metadata, this test should pass too. 
cc @TheNeuralBit Author: Paul Taylor <[email protected]> Author: Wes McKinney <[email protected]> Closes #1294 from trxcllnt/generate-js-test-files and squashes the following commits: f907d5a7 [Paul Taylor] fix aggressive closure-compiler mangling in the ES5 UMD bundle 57c7df45 [Paul Taylor] remove arrow files from perf tests 5972349c [Paul Taylor] update performance tests to use generated test data 14be77f4 [Paul Taylor] fix Date64Vector TypedArray, enable datetime integration tests 5660eb34 [Wes McKinney] Use openjdk8 for integration tests, jdk7 for main Java CI job 019e8e24 [Paul Taylor] update closure compiler with full support for ESModules, and remove closure-compiler-scripts 48111290 [Paul Taylor] Add support for reading Arrow buffers < MetadataVersion 4 c72134a5 [Paul Taylor] compile JS source in integration tests c83a700d [Wes McKinney] Hack until ARROW-1837 resolved. Constrain unsigned integers max to signed max for bit width fd3ed475 [Wes McKinney] Uppercase hex values 224e041c [Wes McKinney] Remove hard-coded file name to prevent primitive JSON file from being clobbered 0882d8e9 [Paul Taylor] separate JS unit tests from integration tests in CI 1f6a81b4 [Paul Taylor] add missing mkdirp for test json data 19136fbf [Paul Taylor] remove test data files in favor of auto-generating them in CI 9f195682 [Paul Taylor] Generate test files when the test run if they don't exist 0cdb74e0 [Paul Taylor] Add a cli arg to integration_test.py generate test JSON files for JS cc744564 [Paul Taylor] resolve LICENSE.txt conflict 33916230 [Paul Taylor] move js license to top-level license.txt d0b61f49 [Paul Taylor] add validate package script back in, make npm-release.sh suitable for ASF release process 7e3be574 [Paul Taylor] Copy license.txt and notice.txt into target dirs from arrow root. 
c8125d2d [Paul Taylor] Update readme to reflect new Table.from signature 49ac3398 [Paul Taylor] allow unrecognized cli args in gulpfile 3c52587e [Paul Taylor] re-enable node_js job in travis cb142f11 [Paul Taylor] add npm release script, remove unused package scripts d51793dd [Paul Taylor] run tests on src folder for accurate jest coverage statistics c087f482 [Paul Taylor] generate test data in build scripts 1d814d00 [Paul Taylor] excise test data csvs 14d48964 [Paul Taylor] stringify Struct Array cells 1f004968 [Paul Taylor] rename FixedWidthListVector to FixedWidthNumericVector be73c918 [Paul Taylor] add BinaryVector, change ListVector to always return an Array 02fb3006 [Paul Taylor] compare iterator results in integration tests e67a66a1 [Paul Taylor] remove/ignore test snapshots (getting too big) de7d96a3 [Paul Taylor] regenerate test arrows from master a6d3c83e [Paul Taylor] enable integration tests 44889fbe [Paul Taylor] report errors generating test arrows fd68d510 [Paul Taylor] always increment validity buffer index while reading 562eba7d [Paul Taylor] update test snapshots d4399a8a [Paul Taylor] update integration tests, add custom jest vector matcher 8d44dcd7 [Paul Taylor] update tests 6d2c03d4 [Paul Taylor] clean arrows folders before regenerating test data 4166a9ff [Paul Taylor] hard-code reader to Arrow spec and ignore field layout metadata c60305d6 [Paul Taylor] refactor: flatten vector folder, add more types ba984c61 [Paul Taylor] update dependencies 5eee3eaa [Paul Taylor] add integration tests to compare how JS reads cpp vs. 
java arrows d4ff57aa [Paul Taylor] update test snapshots 407b9f5b [Paul Taylor] update reader/table tests for new generated arrows 85497069 [Paul Taylor] update cli args to execute partial test runs for debugging eefc256d [Paul Taylor] remove old test arrows, add new generated test arrows 0cd31ab9 [Paul Taylor] add generate-arrows script to tests 3ff71384 [Paul Taylor] Add bool, date, time, timestamp, and ARROW-1693 workaround in reader 4a34247c [Paul Taylor] export Row type 141194e7 [Paul Taylor] use fieldNode.length as vector length c45718e7 [Paul Taylor] support new DictionaryBatch isDelta flag 9d8fef97 [Paul Taylor] split DateVector into Date32 and Date64 types 8592ff3c [Paul Taylor] update generated format flatbuffers
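The `isDelta` semantics mentioned above (replace vs. append to an existing dictionary) can be illustrated with a small sketch. This is a hypothetical stand-in using plain Python lists, not the actual Arrow JS vector types:

```python
# Sketch of DictionaryBatch handling: a delta batch appends to the
# existing dictionary, a non-delta batch replaces it. The dict and
# list here are hypothetical stand-ins for Arrow dictionary vectors.
def apply_dictionary_batch(dictionaries, dict_id, values, is_delta):
    if is_delta and dict_id in dictionaries:
        dictionaries[dict_id] = dictionaries[dict_id] + list(values)
    else:
        dictionaries[dict_id] = list(values)

dicts = {}
apply_dictionary_batch(dicts, 0, ["a", "b"], is_delta=False)
apply_dictionary_batch(dicts, 0, ["c"], is_delta=True)   # append
assert dicts[0] == ["a", "b", "c"]
apply_dictionary_batch(dicts, 0, ["x"], is_delta=False)  # replace
assert dicts[0] == ["x"]
```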
Author: Uwe L. Korn <[email protected]> Closes #1334 from xhochy/ARROW-1703 and squashes the following commits: 7282583f [Uwe L. Korn] ARROW-1703: [C++] Vendor exact version of jemalloc we depend on
… UniqueID bytes Currently, the hashing of UniqueID in Plasma is too simple, which causes a problem. In some cases (for example in github/ray, where a UniqueID is composed of a task ID and an index), the IDs may look like "ffffffffffffffffffff00", "ffffffffffffffffff01", "fffffffffffffffffff02", and so on. The current hashing method just copies the first few bytes of a UniqueID, so most of the hashed IDs come out the same. When those IDs are put into the Plasma store, lookups become very slow: the store keeps them in an unordered_map, and when many keys hash identically the map degenerates into a list. The same PR has already been merged into Ray, see ray-project/ray#2174. I have also compared the performance of the new hashing method against the original one by putting lots of objects continuously; the new method does not appear to cost more time. Author: songqing <[email protected]> Closes #2220 from songqing/oid-hashing and squashes the following commits: 5c803aa0 <songqing> modify murmurhash LICENSE 8b8aa3e1 <songqing> add murmurhash LICENSE d8d5f93f <songqing> lint fix 426cd1e2 <songqing> lint fix 4767751d <songqing> Use hashing function that takes into account all UniqueID bytes
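The failure mode is easy to reproduce: when the "hash" is just a copy of the leading bytes, IDs that share a long common prefix all collide. A minimal sketch, using `hashlib` as a stand-in for the MurmurHash implementation the PR vendors:

```python
# IDs sharing a long prefix: copying the first bytes collides,
# hashing all bytes does not. (Illustrative stand-in; the PR
# actually vendors MurmurHash, not blake2b.)
import hashlib

def prefix_hash(uid: bytes, n: int = 8) -> bytes:
    return uid[:n]  # old scheme: copy the leading bytes only

def full_hash(uid: bytes) -> bytes:
    # new scheme: every byte of the ID contributes to the hash
    return hashlib.blake2b(uid, digest_size=8).digest()

a = b"f" * 18 + b"00"
b = b"f" * 18 + b"01"
assert prefix_hash(a) == prefix_hash(b)  # collision -> unordered_map degrades
assert full_hash(a) != full_hash(b)
```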
Author: Wes McKinney <[email protected]> Closes #2221 from wesm/ARROW-2634 and squashes the following commits: c65a8193 <Wes McKinney> Add Go license details to LICENSE.txt
cloudera/hs2client. Add Thrift to thirdparty toolchain This patch incorporates patches developed at cloudera/hs2client (Apache 2.0) by the following authors: * 12 Wes McKinney <[email protected]>, <[email protected]> * 2 Thomas Tauber-Marshall <[email protected]> * 2 陈晓发 <[email protected]> * 2 Matthew Jacobs <[email protected]>, <[email protected]> * 1 Miki Tebeka <[email protected]> * 1 Tim Armstrong <[email protected]> * 1 henryr <[email protected]> Closes #2444 Change-Id: I88aed528a9f4d2069a4908f6a09230ade2fbe50a
…ibrary This is very minimal in functionality, it just gives a simple R package that calls a function from the arrow C++ library. Author: Romain Francois <[email protected]> Author: Wes McKinney <[email protected]> Closes #2489 from romainfrancois/r-bootstrap and squashes the following commits: 89f14b4ba <Wes McKinney> Add license addendums 9e3ffb4d2 <Romain Francois> skip using rpath linker option 79c50011d <Romain Francois> follow up from @wesm comments on #2489 a1a5e7c33 <Romain Francois> + installation instructions fb412ca1d <Romain Francois> not checking for headers on these files 2848fd168 <Romain Francois> initial R 📦 with travis setup and testthat suite, that links to arrow c++ library and calls arrow::int32()
1. `glog` provides richer information.
2. `glog` can print a good call stack when crashing, which is very helpful for debugging.
3. Logging is made pluggable between `glog` and the original logger using a macro. Users can enable/disable `glog` via the CMake option `ARROW_USE_GLOG`.

Author: Yuhong Guo <[email protected]> Author: Wes McKinney <[email protected]> Closes #2522 from guoyuhong/glog and squashes the following commits: b359640d4 <Yuhong Guo> Revert some useless changes. 38560c06e <Yuhong Guo> Change back the test code to fix logging-test e3203a598 <Wes McKinney> Some fixes, run logging-test 4a9d1728b <Wes McKinney> Fix Flatbuffers download url f36430836 <Yuhong Guo> Add test code to only include glog lib and init it without other use. c8269fd88 <Yuhong Guo> Change ARROW_JEMALLOC_LINK_LIBS setting to ARROW_LINK_LIBS 34e6841f8 <Yuhong Guo> Add pthread 48afa3484 <Yuhong Guo> Address comment 12f9ba7e9 <Yuhong Guo> Disable glog from ARROW_BUILD_TOOLCHAIN 62f20002d <Yuhong Guo> Add -pthread to glog 673dbebe5 <Yuhong Guo> Try to fix ci FAILURE 69c1e7979 <Yuhong Guo> Add pthread for glog fbe9cc932 <Yuhong Guo> Change Thirdpart to use EP_CXX_FLAGS 6f4d1b8fc <Yuhong Guo> Add lib64 to lib path suffix. 84532e338 <Yuhong Guo> Add glog to Dockerfile ccc03cb12 <Yuhong Guo> Fix a bug 7bacd53ef <Yuhong Guo> Add LICENSE information. 9a3834caa <Yuhong Guo> Enable glog and fix building error 2b1f7e00e <Yuhong Guo> Turn glog off. 7d92091a6 <Yuhong Guo> Hide glog symbols from libarrow.so a6ff67110 <Yuhong Guo> Support offline build of glog 14865ee93 <Yuhong Guo> Try to fix MSVC building failure 53cecebef <Yuhong Guo> Change log level to enum and refine code 09c6af7b9 <Yuhong Guo> Enable glog in plasma
…es to apache license. Fix clang-format, cpplint warnings, -Wconversion warnings and other warnings with -DBUILD_WARNING_LEVEL=CHECKIN. Fix some build toolchain issues, Arrow target dependencies. Remove some unused CMake code
The baseline UTF8 decoder is adapted from Bjoern Hoehrmann's DFA-based implementation. The common case of runs of ASCII chars benefits from a fast path handling 8 bytes at a time. Benchmark results (on a Ryzen 7 machine with gcc 7.3):

```
-----------------------------------------------------------------------------
Benchmark                                  Time        CPU  Iterations
-----------------------------------------------------------------------------
BM_ValidateTinyAscii/repeats:1             3 ns       3 ns   245245630  3.26202GB/s
BM_ValidateTinyNonAscii/repeats:1          7 ns       7 ns   104679950  1.54295GB/s
BM_ValidateSmallAscii/repeats:1           10 ns      10 ns    66365983  13.0928GB/s
BM_ValidateSmallAlmostAscii/repeats:1     37 ns      37 ns    18755439  3.69415GB/s
BM_ValidateSmallNonAscii/repeats:1        68 ns      68 ns    10267387  1.82934GB/s
BM_ValidateLargeAscii/repeats:1         4140 ns    4140 ns      171331  22.5003GB/s
BM_ValidateLargeAlmostAscii/repeats:1  24472 ns   24468 ns       28565  3.80816GB/s
BM_ValidateLargeNonAscii/repeats:1     50420 ns   50411 ns       13830  1.84927GB/s
```

The case of tiny strings is probably the most important for the use case of CSV type inference.
PS: benchmarks on the same machine with clang 6.0:

```
-----------------------------------------------------------------------------
Benchmark                                  Time        CPU  Iterations
-----------------------------------------------------------------------------
BM_ValidateTinyAscii/repeats:1             3 ns       3 ns   213945214  2.84658GB/s
BM_ValidateTinyNonAscii/repeats:1          8 ns       8 ns    90916423  1.33072GB/s
BM_ValidateSmallAscii/repeats:1            7 ns       7 ns    91498265  17.4425GB/s
BM_ValidateSmallAlmostAscii/repeats:1     34 ns      34 ns    20750233  4.08138GB/s
BM_ValidateSmallNonAscii/repeats:1        58 ns      58 ns    12063206  2.14002GB/s
BM_ValidateLargeAscii/repeats:1         3999 ns    3999 ns      175099  23.2937GB/s
BM_ValidateLargeAlmostAscii/repeats:1  21783 ns   21779 ns       31738  4.27822GB/s
BM_ValidateLargeNonAscii/repeats:1     55162 ns   55153 ns       12526  1.69028GB/s
```

Author: Antoine Pitrou <[email protected]> Closes #2916 from pitrou/ARROW-3536-utf8-validation and squashes the following commits: 9c9713b78 <Antoine Pitrou> Improve benchmarks e6f23963a <Antoine Pitrou> Use a larger state table allowing for single lookups 29d6e347c <Antoine Pitrou> Help clang code gen e621b220f <Antoine Pitrou> Use memcpy for safe aligned reads, and improve speed of non-ASCII runs 89f6843d9 <Antoine Pitrou> ARROW-3536: Add UTF8 validation functions
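The ASCII fast path works because eight bytes contain no value >= 0x80 exactly when the 64-bit word ANDed with 0x8080808080808080 is zero. A Python sketch of that idea (the real implementation is C++ and handles the non-ASCII tail with Hoehrmann's DFA; here Python's strict UTF-8 decoder stands in for the DFA):

```python
# Sketch of the ASCII fast path: skip 8 bytes at a time while no
# high bit is set, then validate the remainder. Python's strict
# decoder stands in for the DFA used in the C++ code.
HIGH_BITS = 0x8080808080808080

def validate_utf8(data: bytes) -> bool:
    i = 0
    while i + 8 <= len(data):
        word = int.from_bytes(data[i:i + 8], "little")
        if word & HIGH_BITS:
            break  # a non-ASCII byte appears in this 8-byte run
        i += 8
    # Everything before i is pure ASCII, so i is a character
    # boundary; validate the rest byte-accurately.
    try:
        data[i:].decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False
```

Note the loop only advances past fully-ASCII words, so the slow path always starts on a character boundary.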
Vendor the `std::string_view` backport from https://github.com/martinmoene/string-view-lite Author: Antoine Pitrou <[email protected]> Closes #2974 from pitrou/ARROW-3800-string-view-backport and squashes the following commits: 4353414b6 <Antoine Pitrou> ARROW-3800: Vendor a string_view backport
Second granularity is allowed (we might want to add support for fractions of seconds, e.g. in the "YYYY-MM-DD[T ]hh:mm:ss.ssssss" format). Timestamp conversion also participates in CSV type inference, since it's unlikely to produce false positives (e.g. a semantically "string" column that would be entirely made of valid timestamp strings). Author: Antoine Pitrou <[email protected]> Closes #2952 from pitrou/ARROW-3738-csv-timestamps and squashes the following commits: 005a6e3f7 <Antoine Pitrou> ARROW-3738: Parse ISO8601-like timestamps in CSV columns
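How a timestamp parser can participate in type inference without producing false positives can be sketched as follows. This is a stdlib Python stand-in, assuming an illustrative format list; the actual Arrow parser is C++ and its accepted formats are defined there:

```python
# Sketch of timestamp-aware CSV type inference: try ISO8601-like
# formats with 'T' or ' ' separators at second granularity, and
# call a column "timestamp" only if every value parses. The
# format list and type names here are illustrative.
from datetime import datetime

FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"]

def parses_as_timestamp(value: str) -> bool:
    for fmt in FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

def infer_column_type(values):
    # A semantically "string" column is very unlikely to be made
    # entirely of valid timestamps, so all() keeps inference safe.
    if values and all(parses_as_timestamp(v) for v in values):
        return "timestamp[s]"
    return "string"
```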
1. Get rid of all macros and sprinkled out hash table handling code 2. Improve performance by more careful selection of hash functions (and better collision resolution strategy) Integer hashing benefits from a very fast specialization. Small string hashing benefits from a fast specialization with less branches and less computation. Generic string hashing falls back on hardware CRC32 or Murmur2-64, which has probably sufficient performance given the typical distribution of string key length. 3. Add some tests and benchmarks Author: Antoine Pitrou <[email protected]> Closes #3005 from pitrou/ARROW-2653 and squashes the following commits: 0c2dcc3de <Antoine Pitrou> ARROW-2653: Refactor hash table support
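The key-type specialization idea above can be sketched like this. A hypothetical Python stand-in: the real kernels are C++ template specializations, and the particular hash functions chosen here (a multiplicative integer hash, a short byte loop, CRC32 as the generic fallback) are illustrative of the strategy, not the exact Arrow code:

```python
# Sketch of per-key-type hash specialization: integers get a very
# cheap multiplicative hash, short strings a branch-light loop,
# and long strings a generic fallback (hardware CRC32 in the C++
# implementation; zlib.crc32 stands in here).
import zlib

MASK64 = 0xFFFFFFFFFFFFFFFF

def hash_int(x: int) -> int:
    # Fibonacci-style multiplicative hash for integer keys
    return (x * 0x9E3779B97F4A7C15) & MASK64

def hash_small_string(s: bytes) -> int:
    h = 0
    for b in s:
        h = (h * 31 + b) & MASK64
    return h

def hash_key(key):
    # dispatch on key type, as the specialized kernels do
    if isinstance(key, int):
        return hash_int(key)
    if len(key) <= 16:
        return hash_small_string(key)
    return zlib.crc32(key)
```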
Also update mapbox::variant to v1.1.5 (I'm not sure which version was previously vendored). Author: Antoine Pitrou <[email protected]> Closes #3184 from pitrou/ARROW-4017-vendored-libraries and squashes the following commits: fe69566d7 <Antoine Pitrou> ARROW-4017: Move vendored libraries in dedicated directory
…edstock after compiler migration Crossbow builds: - [kszucs/crossbow/build-403](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-403) - [kszucs/crossbow/build-404](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-404) - [kszucs/crossbow/build-405](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-405) - [kszucs/crossbow/build-406](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-406) - [kszucs/crossbow/build-407](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-407) Author: Krisztián Szűcs <[email protected]> Closes #3368 from kszucs/conda_forge_migration and squashes the following commits: e0a5a6422 <Krisztián Szűcs> use --croot 3749a2ff9 <Krisztián Szűcs> git on osx; set FEEDSTOSK_ROOT ca7217d7f <Krisztián Szűcs> support channel sources from variant files 33cba7118 <Krisztián Szűcs> fix conda path on linux 2505828b7 <Krisztián Szűcs> fix task names 0c4a10bc3 <Krisztián Szűcs> conda recipes for python 3.7; compiler migration
Added Howard Hinnant's date project as a third-party library. Used the system timezone database for timezone information. Author: Antoine Pitrou <[email protected]> Author: shyam <[email protected]> Closes #3352 from shyambits2004/timestamp and squashes the following commits: 882a5cf6 <Antoine Pitrou> Tweak wording of vendored date library README 7f524805 <Antoine Pitrou> Small tweaks to license wording for the date library 9ee8eff4 <shyam> ARROW-4198 : Added support to cast timestamp
- Ported parquet-cpp external license references - Removed spurious duplicates (boost, mapbox) Author: François Saint-Jacques <[email protected]> Closes #3692 from fsaintjacques/ARROW-4546-parquet-license and squashes the following commits: a5aa81e48 <François Saint-Jacques> ARROW-4546: Update LICENSE with parquet-cpp licenses
This includes a Dockerfile that can be used to create wheels based on ubuntu 14.04 which are compatible with TensorFlow. TODO before this can be merged: - [x] write documentation how to build this - [x] do more testing Author: Philipp Moritz <[email protected]> Closes #3766 from pcmoritz/ubuntu-wheels and squashes the following commits: f708c29b <Philipp Moritz> remove tensorflow import check 599ce2e7 <Philipp Moritz> fix manylinux1 build instructions f1fbedf8 <Philipp Moritz> remove tensorflow hacks bf47f579 <Philipp Moritz> improve wording 4fb1d38b <Philipp Moritz> add documentation 078be98b <Philipp Moritz> add licenses 0ab0bccb <Philipp Moritz> cleanup c7ab1395 <Philipp Moritz> fix eae775d5 <Philipp Moritz> update 2820363e <Philipp Moritz> update ed683309 <Philipp Moritz> update e8c96ecf <Philipp Moritz> update 8a3b19e8 <Philipp Moritz> update 0fcc3730 <Philipp Moritz> update fd387797 <Philipp Moritz> update 78dcf42d <Philipp Moritz> update 7726bb6a <Philipp Moritz> update 82ae4828 <Philipp Moritz> update f44082ea <Philipp Moritz> update deb30bfd <Philipp Moritz> update 50e40320 <Philipp Moritz> update 58f6c121 <Philipp Moritz> update 5e8ca589 <Philipp Moritz> update 5fa73dd5 <Philipp Moritz> update 595d0fe1 <Philipp Moritz> update 79006722 <Philipp Moritz> add libffi-dev 9ff5236d <Philipp Moritz> update ca972ad0 <Philipp Moritz> update 60805e22 <Philipp Moritz> update 7a66ba35 <Philipp Moritz> update 1b56d1f1 <Philipp Moritz> zlib eedef794 <Philipp Moritz> update 3ae2b5ab <Philipp Moritz> update df297e1c <Philipp Moritz> add python build script 358e4f85 <Philipp Moritz> update 65afcebe <Philipp Moritz> update 11ccfc7e <Philipp Moritz> update f1784245 <Philipp Moritz> update b3039c8b <Philipp Moritz> update 9064c3ca <Philipp Moritz> update c39f92a9 <Philipp Moritz> install tensorflow ec4e2210 <Philipp Moritz> unicode 773ca2b6 <Philipp Moritz> link python b690d64a <Philipp Moritz> update 5ce7f0d6 <Philipp Moritz> update a9302fce <Philipp Moritz> install 
python-dev f12e0cfe <Philipp Moritz> multibuild python 2.7 9342006b <Philipp Moritz> add git ab2ef8e7 <Philipp Moritz> fix cmake install cef997b5 <Philipp Moritz> install cmake and ninja 5d560faf <Philipp Moritz> add build-essential adf2f705 <Philipp Moritz> add curl f8d66963 <Philipp Moritz> remove xz e439356e <Philipp Moritz> apt update 79fe557e <Philipp Moritz> add docker image for ubuntu wheel
This change refactors much of our CMake logic to make use of built-in CMake paths and remove custom logic. It also switches to more modern dependency management via CMake targets instead of plain text variables. This includes the following fixes: - Use CMake's standard find features, e.g. respecting the `*_ROOT` variables: https://issues.apache.org/jira/browse/ARROW-4383 - Add a Dockerfile for Fedora: https://issues.apache.org/jira/browse/ARROW-4730 - Add a Dockerfile for Ubuntu Xenial: https://issues.apache.org/jira/browse/ARROW-4731 - Add a Dockerfile for Ubuntu Bionic: https://issues.apache.org/jira/browse/ARROW-4849 - Add a Dockerfile for Debian Testing: https://issues.apache.org/jira/browse/ARROW-4732 - Change the clang-7 entry to use system packages without any dependency on conda(-forge): https://issues.apache.org/jira/browse/ARROW-4733 - Support `double-conversion<3.1`: https://issues.apache.org/jira/browse/ARROW-4617 - Use google benchmark from toolchain: https://issues.apache.org/jira/browse/ARROW-4609 - Use the `compilers` metapackage to install the correct binutils when using conda, otherwise system binutils to fix https://issues.apache.org/jira/browse/ARROW-4485 - RapidJSON throws compiler errors with GCC 8+ https://issues.apache.org/jira/browse/ARROW-4750 - Handle `EXPECT_OK` collision: https://issues.apache.org/jira/browse/ARROW-4760 - Activate flight build in ci/docker_build_cpp.sh: https://issues.apache.org/jira/browse/ARROW-4614 - Build Gandiva in the docker containers: https://issues.apache.org/jira/browse/ARROW-4644 Author: Uwe L. Korn <[email protected]> Closes #3688 from xhochy/build-on-fedora and squashes the following commits: 88e11fcfb <Uwe L. Korn> ARROW-4611: Rework CMake logic
Author: Jeroen Ooms <[email protected]> Closes #3923 from jeroen/cpuidex and squashes the following commits: 59429f02 <Jeroen Ooms> Mention mingw-w64 polyfill in LICENSE.txt 28619330 <Jeroen Ooms> run clang-format 9e780465 <Jeroen Ooms> polyfill for __cpuidex on mingw-w64
Replace mapbox::variant with Michael Park's variant implementation. Author: Antoine Pitrou <[email protected]> Closes #4259 from pitrou/ARROW-5252-variant-backport and squashes the following commits: 03dbc0e14 <Antoine Pitrou> ARROW-5252: Use standard-compliant std::variant backport
Some antiquated C++ build chains miss the standard <codecvt> header. Use a small vendored UTF8 implementation instead. Author: Antoine Pitrou <[email protected]> Closes #4616 from pitrou/ARROW-5648-simple-utf8 and squashes the following commits: 54b1b2f68 <Antoine Pitrou> ARROW-5648: Avoid using codecvt
lidavidm pushed a commit that referenced this pull request on Oct 5, 2023:
… correctly (#1168)

Fixes #1100

Test before fix:

```
Expected equality of these values:
  AdbcGetObjectsDataGetTableByName(&mock_data, "mock_catalog", "mock_schema", "table_suffix")
    Which is: 0x16d014ee8
  &mock_table_suffix
    Which is: 0x16d014ea8
arrow-adbc/c/driver/common/utils_test.cc:220: Failure
Expected equality of these values:
  AdbcGetObjectsDataGetColumnByName(&mock_data, "mock_catalog", "mock_schema", "table", "column_suffix")
    Which is: 0x16d014df8
  &mock_column_suffix
    Which is: 0x16d014d48
arrow-adbc/c/driver/common/utils_test.cc:224: Failure
Expected equality of these values:
  AdbcGetObjectsDataGetConstraintByName(&mock_data, "mock_catalog", "mock_schema", "table", "constraint_suffix")
    Which is: 0x16d014d08
  &mock_constraint_suffix
    Which is: 0x16d014cc8
[  FAILED  ] AdbcGetObjectsData.GetObjectsByName (0 ms)
```

Test after fix:

```
$ ctest
Test project arrow-adbc/build
    Start 1: adbc-driver-common-test
1/2 Test #1: adbc-driver-common-test ..........   Passed    0.25 sec
    Start 2: adbc-driver-sqlite-test
2/2 Test #2: adbc-driver-sqlite-test ..........   Passed    0.19 sec

100% tests passed, 0 tests failed out of 2

Label Time Summary:
driver-common    =   0.25 sec*proc (1 test)
driver-sqlite    =   0.19 sec*proc (1 test)
unittest         =   0.43 sec*proc (2 tests)

Total Test time (real) =   0.44 sec
```
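The failing test above is the classic prefix-match bug: a lookup that compares only the stored name's length treats `"table"` as a match for `"table_suffix"`. A sketch of the bug and the fix, using hypothetical Python stand-ins for the C lookup helpers (the real ones operate on ADBC's GetObjects data structures):

```python
# Hypothetical stand-ins for name lookup helpers. The buggy
# version matches when the query merely starts with the stored
# name (strncmp over the stored name's length); the fix compares
# the full names for equality.
def get_by_name_buggy(objects, name):
    for obj in objects:
        if name.startswith(obj["name"]):  # prefix match: wrong
            return obj
    return None

def get_by_name_fixed(objects, name):
    for obj in objects:
        if obj["name"] == name:  # exact match
            return obj
    return None

tables = [{"name": "table"}, {"name": "table_suffix"}]
# The buggy lookup returns "table" when asked for "table_suffix".
assert get_by_name_buggy(tables, "table_suffix")["name"] == "table"
assert get_by_name_fixed(tables, "table_suffix")["name"] == "table_suffix"
```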
birschick-bq referenced this pull request in birschick-bq/arrow-adbc on May 6, 2024.