-
Notifications
You must be signed in to change notification settings - Fork 932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Groupby pct_change #10444
Implement Groupby pct_change #10444
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10444 +/- ##
================================================
- Coverage 86.33% 86.30% -0.04%
================================================
Files 144 140 -4
Lines 22681 22309 -372
================================================
- Hits 19581 19253 -328
+ Misses 3100 3056 -44
Continue to review full report at Codecov.
|
…groupby.pct_change
…groupby.pct_change
This PR has been labeled |
…groupby.pct_change
…groupby.pct_change
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
…groupby.pct_change
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
[gpuCI] Forward-merge branch-22.06 to branch-22.08 [skip gpuci]
…apidsai#11108) String data that contained \r and \n characters would print badly mangled. Example column containing: ``` .,e,Infinity,+Infinity,-Infinity,+nAn,-naN,Nan,5f,1.2f,\riNf,NULL ``` Doing a `cudf::test::print()` on this would yield mostly garbage, but importantly, plausible-seeming but puzzling garbage: ``` iNf,NULLnity,+Infinity,-Infinity,+nAn,-naN,Nan,5f,1.2f, ``` The fix is to do a little postprocessing and replace the individual chars `'\r'` and `'\n'` with the escaped strings `"\r"` and `"\n"`. Note that this only applies to the output ultimately sent to the print - not to the raw data retrieved from the device itself using `::to_host` Authors: - https://github.com/nvdbaranec Approvers: - David Wendt (https://github.com/davidwendt) - Mike Wilson (https://github.com/hyperbolic2346) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#11108
Spent some time investigating performance on column to row and row to column conversion code. Lots of wins here. Split PR up, so this is column to row work only. Row to column to follow. Also, planning to add tests and benchmarks into `spark-rapids-jni` once this is all updated. ``` Comparing 22.08/ROW_CONVERSION_BENCH to mwilson/column_to_row_optimization/ROW_CONVERSION_BENCH Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------------------------------------------------------- RowConversion/old_to_row_conversion/1048576/manual_time +0.0012 +0.0026 5 5 5 5 RowConversion/new_to_row_conversion/1048576/manual_time -0.5661 -0.5652 16 7 16 7 RowConversion/new_to_row_extended_conversion/1048576/manual_time -0.5680 -0.5655 16 7 16 7 RowConversion/string_to_row_extended_conversion/1048576/manual_time -0.5768 -0.5764 27 12 27 12 RowConversion/old_from_row_conversion/1048576/manual_time -0.0017 -0.0018 4 4 4 4 RowConversion/new_from_row_conversion/1048576/manual_time +1.1629 +1.1619 31 68 32 68 RowConversion/new_from_row_extended_conversion/1048576/manual_time +1.1540 +1.1549 31 68 32 68 RowConversion/string_from_row_extended_conversion/1048576/manual_time +0.7185 +0.6831 39 67 40 68 Authors: - Mike Wilson (https://github.com/hyperbolic2346) Approvers: - Nghia Truong (https://github.com/ttnghia) - https://github.com/nvdbaranec - MithunR (https://github.com/mythrocks) URL: rapidsai#11063
This PR removes the outdated files `.dockerignore` and `MANIFEST.in`. The Dockerfile was removed in rapidsai#10069 so we don't need to keep the ignore file around anymore. The `MANIFEST.in` is very old (last updated in 2018, cad899b) and no longer correct for the current package structure. Finally, I moved `.clang-tidy` into the `cpp/` directory. @codereport said that would be fine to clean up from the root and have no negative effects since clang-tidy isn't being run yet. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Karthikeyan (https://github.com/karthikeyann) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#11124
…fault_stream_value (rapidsai#11082) Closes rapidsai#10864. Also closes rapidsai#9614. This PR is a follow-up to rapidsai#10877. It replaces all of the remaining instances of `rmm::cuda_stream_default` with `cudf::default_stream_value`. There are a lot of replacements and addition of includes, along with some reformatting due to clang-format, but like rapidsai#10877, there should be no noticeable functional change here. Authors: - Jim Brennan (https://github.com/jbrennan333) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - https://github.com/nvdbaranec - Nghia Truong (https://github.com/ttnghia) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#11082
…sai#11132) The developer guide references a `scalar_view` class that does not exist. This PR removes that reference. See rapidsai#6558 for the rationale of why no such class exists. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Nghia Truong (https://github.com/ttnghia) - https://github.com/nvdbaranec URL: rapidsai#11132
Needed for Sphinx 5 compatibility. Should fix the following warning occurring in doc builds. ``` Invalid configuration value found: 'language = None'. Update your configuration to a valid langauge code. Falling back to 'en' (English). ``` xref: sphinx-doc/sphinx#10481 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) - https://github.com/jakirkham - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#11133
This adds `duplicate_keep_option` to `cudf::distinct`, allowing to specify a `keep` option for selecting which of the duplicate elements to keep. It paves the way for many drop duplicate applications to achieve `O(n)` performance. A `KEEP_ANY` option is also added to `duplicate_keep_option`, which was an attempt in rapidsai#9417 but didn't get in eventually. Partially addresses rapidsai#11050 and rapidsai#11053. ---- Main implementation: https://github.com/rapidsai/cudf/pull/11052/files#diff-4c2d4268b3c50000ae845ba15a890bb743709c30e5cab4847af7ad633c5a2823R47 Follow up work: * rapidsai#11053 * rapidsai#11089 * rapidsai#11092 Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Yunsong Wang (https://github.com/PointKernel) - Bradley Dice (https://github.com/bdice) URL: rapidsai#11052
This PR resolves failures related to `__cuda_array_interface__` which were not being run for a very long time. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: rapidsai#11113
This adds `nan_equality` parameter to `cudf::distinct`, allowing to specify the desired behavior when dealing with floating-point data: `NaN` should be compared equally to other `NaN` or not. Depends on rapidsai#11052 (built on top of it). Closes rapidsai#11092. This is a blocker for set-like operations (rapidsai#11043) and also the last blocker for rapidsai#11053. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: rapidsai#11118
This is just a not-very-simple refactor to `lists::contains`: * Remove some wrong lines in doxygen of `lists/contains.hpp`, and rewrite doxygen there a little bit. * Add more comments to the code. * Reduce the number of code paths of the template struct functors. * Rename some/many variables, and reorganize code to make it cleaner. No new feature is added in this PR, just modifying the existing functions and moving things around. This PR is extracted from the bigger PR for easier review. The original PR is rapidsai#10548 for supporting nested type in `lists::contains`. As such, this blocks it. Authors: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#11019
Closes rapidsai#10862. The `PER_THREAD_DEFAULT_STREAM` build option was deprecated in branch-22.06 via rapidsai#10877, and replaced with the new build option `CUDF_USE_PER_THREAD_DEFAULT_STREAM`. This PR removes `PER_THREAD_DEFAULT_STREAM`. I am putting this up as a draft because I am not certain if we want to do this in 22.08 or wait for a later release? Authors: - Jim Brennan (https://github.com/jbrennan333) Approvers: - Jason Lowe (https://github.com/jlowe) - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#11134
This ensures that a consistent cuco version is used across all of RAPIDS. See rapidsai/rapids-cmake#201. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Robert Maynard (https://github.com/robertmaynard) URL: rapidsai#11139
…11138) We can use `rapids_cpm_libcudacxx` directly without this extra function. The only value it adds is setting the `LIBCUDACXX_INCLUDE_DIR` variable, which we don't use. See also rapidsai/cuspatial#538 (comment) Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Mark Harris (https://github.com/harrism) - Robert Maynard (https://github.com/robertmaynard) URL: rapidsai#11138
…rapidsai#11128) This PR adds a Java method to find out if a Map column contains a list of keys Authors: - Raza Jafri (https://github.com/razajafri) Approvers: - Nghia Truong (https://github.com/ttnghia) - Jason Lowe (https://github.com/jlowe) URL: rapidsai#11128
This PR introduces a small new utility function `_from_data_like_self` to provide an additional hook for `Frame` and `IndexedFrame` to customize `_from_data`. Specifically in instances where the outputs need to share the index of the input for `IndexedFrame` methods, this hook provides the necessary entrypoint. This method is analogous to the already existing `_from_columns_like_self` method. In the past I made efforts to consolidate our factory methods, and I'm not super happy with proliferating more of them again. However, I think in this case it is worth the tradeoff since this gets us very close to our goal of removing the `_index` attribute from the `Frame` class. Moreover, in the intermediate term I expect both this method and `_from_data` to be subsumed by `_from_columns` when we eventually transition all of our internals (especially the `ColumnAccessor`) to store a list rather than a dict of columns. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: rapidsai#11140
This PR is a follow-up to rapidsai#11073. Its sole purpose is to remove the deprecated public `replace` API for Index objects. A stripped down, minimal version of this logic has been inlined in `DataFrame.rename`, which is the only place that needs that functionality. The scope of this PR is intentionally limited so that further changes can happen in parallel with changes associated with the removal of `Frame._index`, which should no longer be necessary after this PR is merged (aside from its potential usage in internal Cython code that can be freely worked around). Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: rapidsai#11131
…groupby.pct_change
Closing PR due to a bad merge. Replaced by #11144 |
Subsequent to #9805, this PR adds support for Groupby.pct_change() Fixes #9606 Replaces #10444 Authors: - Sheilah Kirui (https://github.com/skirui-source) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: #11144
Subsequent to #9805, this PR adds support for
Groupby.pct_change()
Fixes #9606