This repository has been archived by the owner on Feb 22, 2023. It is now read-only.
We want to support strings (UTF-8 encoded) in pandas as fast as possible. This requires implementing several things. The work will be split across many issues that are hard to track through issue search alone, so we list them all here.
We add the functionality in three stages:

1. Implement the functionality using plain Python operations. This runs at the same speed as `pandas.StringDtype` but already exposes the API to fletcher users. Faster implementations can then be added bit by bit while the library remains fully usable.
   a) Also set up benchmarks that compare the pandas/object implementation to ours.
2. If the algorithm isn't too complicated, write an efficient implementation with numba. This gives us a fast algorithm with less implementation overhead than adding it to Apache Arrow.
3. For all methods, add an efficient implementation to Apache Arrow if there is none yet.
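Stage 1 can be pictured with a small sketch. It assumes a plain list of optional Python strings stands in for an Arrow-backed array (in practice the values could be obtained with pyarrow's `Array.to_pylist()`). The helper `apply_str_method` is hypothetical, not fletcher's API; it only shows the pattern of calling the built-in `str` operation per element while passing missing values through.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def apply_str_method(
    values: List[Optional[str]], func: Callable[[str], T]
) -> List[Optional[T]]:
    """Apply a str operation element-wise, passing missing values (None) through.

    Hypothetical stage-1 helper: one Python-level call per element, so it runs
    at roughly the speed of the object-based pandas implementation, but it
    already exposes the desired API.
    """
    return [None if v is None else func(v) for v in values]


data = ["abc", None, "Straße", "xab"]
upper = apply_str_method(data, str.upper)                      # ["ABC", None, "STRASSE", "XAB"]
lengths = apply_str_method(data, len)                          # [3, None, 6, 3]
starts = apply_str_method(data, lambda s: s.startswith("ab"))  # [True, None, False, False]
```

Because every element goes through a Python-level call, this path is no faster than the object-dtype one; it is the baseline that the stage 1a benchmarks would measure the faster stages against.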
This project has been archived; development ceased around 2021.
With the support of Apache Arrow-backed extension arrays in pandas, the major goal of this project has been fulfilled.
Each method below is tracked for both a `numba` implementation and a `pyarrow` implementation (✅ = done):

- `capitalize`
- `casefold`
- `cat`
- `center`
- `contains` (exact match) ✅
- `contains` (other)
- `count`
- `decode`
- `encode`
- `endswith`
- `extract`
- `extractall`
- `find`
- `findall`
- `get`
- `index`
- `join`
- `len`
- `ljust`
- `lower`
- `lstrip`
- `match`
- `normalize`
- `pad`
- `partition`
- `repeat`
- `replace`
- `rfind`
- `rindex`
- `rjust`
- `rpartition`
- `rstrip`
- `slice`
- `slice_replace`
- `split`
- `rsplit`
- `startswith`
- `strip`
- `swapcase`
- `title`
- `translate`
- `upper`
- `wrap`
- `zfill`
- `isalnum` ✅
- `isalpha` ✅
- `isdigit` ✅
- `isspace` ✅
- `islower` ✅
- `isupper` ✅
- `istitle` ✅
- `isnumeric` ✅
- `isdecimal` ✅
- `get_dummies`
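The faster stages replace per-element Python calls with loops over Arrow's raw string layout: an offsets buffer (32-bit for the `utf8`/`string` type) that delimits each element inside one contiguous UTF-8 data buffer, plus a validity bitmap. The sketch below is a hypothetical illustration, not fletcher's or pyarrow's kernels, of two methods from the list, `len` and exact-match `contains`, operating on that layout. It is written in plain Python so it runs without numba; under the stage 2 plan, loops of this shape would be compiled with `numba.njit` over NumPy views of the buffers. As a simplification, validity is a list of booleans instead of Arrow's packed bitmap.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers illustrating byte-level kernels over an Arrow-style
# string layout; not fletcher's or pyarrow's actual implementation.


def to_arrow_layout(
    values: List[Optional[str]],
) -> Tuple[List[int], bytes, List[bool]]:
    """Build (offsets, data, validity) the way Arrow lays out a string array.

    Simplification: validity is a list of bools, not a packed bitmap.
    """
    offsets = [0]
    chunks = []
    validity = []
    for v in values:
        encoded = b"" if v is None else v.encode("utf-8")
        chunks.append(encoded)
        offsets.append(offsets[-1] + len(encoded))
        validity.append(v is not None)
    return offsets, b"".join(chunks), validity


def utf8_len(
    offsets: List[int], data: bytes, validity: List[bool]
) -> List[Optional[int]]:
    """Code-point length per element, computed without decoding.

    Every code point has exactly one byte that is not a continuation
    byte (0b10xxxxxx), so counting those bytes counts code points.
    """
    out: List[Optional[int]] = []
    for i in range(len(validity)):
        if not validity[i]:
            out.append(None)  # missing values stay missing
            continue
        n = 0
        for j in range(offsets[i], offsets[i + 1]):
            if (data[j] & 0xC0) != 0x80:
                n += 1
        out.append(n)
    return out


def contains_exact(
    offsets: List[int], data: bytes, validity: List[bool], needle: str
) -> List[Optional[bool]]:
    """Exact substring match on raw UTF-8 bytes (naive scan)."""
    pat = needle.encode("utf-8")
    m = len(pat)
    out: List[Optional[bool]] = []
    for i in range(len(validity)):
        if not validity[i]:
            out.append(None)
            continue
        start, end = offsets[i], offsets[i + 1]
        found = False
        # Try every byte position where the pattern still fits in the element.
        for j in range(start, end - m + 1):
            if data[j:j + m] == pat:
                found = True
                break
        out.append(found)
    return out


sample = ["abc", None, "Straße", "naïve café"]
offsets, buf, valid = to_arrow_layout(sample)
lengths = utf8_len(offsets, buf, valid)                 # [3, None, 6, 10]
has_e_acute = contains_exact(offsets, buf, valid, "é")  # [False, None, False, True]
```

Counting non-continuation bytes yields the code-point length without decoding, and byte-wise matching is safe for exact `contains` because UTF-8 is self-synchronizing: a validly encoded needle can only match at code-point boundaries.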