-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Interpolate at new values #9340
Comments
See the (long) discussion at #4915 Basically, we wanted to keep the API of interpolate simple, but it's probably too clever since you have to be familiar with reindexing first. I'd actually favor changing the API of interploate to have a new parameter Also it could take some kind of parameter for whether to return just the interpolated values, or all the values. |
Probably I didn't address well. Actually, I see that to reindex is what I really meant, since I would say that Anyhow, to have a simple syntax (on |
Any news? @TomAugspurger |
@rubennj I agree, we should have some sort of interpolation method that works like It is a bit awkward from an internals perspective to put this on |
What about creating two interpolate methods: I worry that hiding this functionality in |
OK, I see that I think that one method, |
We could add a new optional parameter My hesitation with combining the functionality into one method is that there are at least two steps required to transform from one mode to the other, e.g., Another option would be move the |
I think a nice soln here is to create a cookbook recipe for this pattern and a link from the docs |
At least in my experience and anecdotally from my colleagues, the current API of interpolate (filling missing values) is unexpected and not what we were looking for. For example, it's pretty different from what scipy and numpy's interpolate functions do. So I think some sort of API change to make this more intuitive is warranted :). Interpolation at new values is also a very common pattern in my experience. The cookbook recipe will need to be even more complex than @rubennj's example if it is to handle propagating NA correctly, e.g., to ensure that the result of |
I also thought that I don't understand why overloading To transfer the current function to |
I'm in favor of adding a new method, since the behavior is different enough from the current df.interpolate_at(new_values, method='linear') @denfromufa are you interested in submitting a pull request for this? Otherwise I'll add it to my list. |
@TomAugspurger, sure I can submit, please confirm the logic:
Also should axis be an input or transposing is easy enough? |
My thoughts:
On Fri, Aug 14, 2015 at 7:17 PM, denfromufa [email protected]
|
copying the signature from interpolate() looks feasible. for lower level to avoid copies do you mean something like:
|
My thought was that it's probably worth looking at the implementation of df.interpolate (which calls scipy.interpolate) and using that logic at a lower level. On Sun, Aug 16, 2015 at 10:40 PM, denfromufa [email protected]
|
Keeping interpolate_at() at such low level (scipy) would force me to Please correct me if I do not understand something? The only reason why I would go that low level if we want to implement On Mon, Aug 17, 2015, 12:51 AM Stephan Hoyer [email protected]
|
OK, fair enough -- just thought it would be worth taking a look. I agree that we don't want to duplicate that logic, but it may be possible to refactor it pretty straightforwardly to make it work for both cases. |
two incompatible options: limit : int, default None. |
@denfromufa agreed, you can skip those. As far as overall API goes, I would still advocate for renaming the existing
|
for consistency let's call this |
IMO, interpolate_na and interpolate_at are better choices, like suggested On Tue, Aug 18, 2015, 4:38 PM Jeff Reback [email protected] wrote:
|
I don't like
@denfromufa one more thought on implementation. I'm pretty sure that def interpolate_na(series, inplace=False):
na_locs = series.isnull()
target = series.index[na_locs].values
new_values = series.values[~na_locs].interpolate_at(target)
if not inplace:
series = series.copy()
series.iloc[na_locs] = new_values
return series |
I'm looking at this code now to add interpolate_at() and have a hard time with this code:
can anyone explain it? |
@shoyer @TomAugspurger ok, firstIndex probably means first index in Series before reaching non-null value. All values below this (assuming Series is sorted) cannot be interpolated. Then why similarly lastIndex is not defined? |
@denfromufa It looks like the lack of this behavior at the end is a bug: #8000 |
ok, I started making changes in https://github.com/denfromufa/pandas, I have some TODO items in the code |
Today I found one corner-case, hopefully this can be fixed at lower-level.
If after
|
It is even deeper :( Duplicates need to be removed even before Hopefully others do not step on the same rake while I'm finishing my interpolation pull request.
|
latest version, previous one had bugs and sort() is deprecated:
Note that this accepts both Series and DataFrames. |
generally don't use inplace (it doesn't offer any benefit and makes code much harder to read) use Index operations rather than numpy functions |
Oh, I see! How is unique in pandas much faster than numpy?! |
Pandas uses a hash table, whereas numpy just sorts. On Mon, Nov 7, 2016 at 4:06 PM, denfromufa [email protected] wrote:
|
Here is cleaned up version using only pandas machinery, also fixed one more bug: def interpolate_at(df, new_idxs):
new_idxs = pd.Index(new_idxs)
df = df.drop_duplicates().dropna()
df = df.reindex(df.index.append(new_idxs).unique())
df = df.sort_index()
df = df.interpolate()
return df.ix[new_idxs] |
This is an interesting related question: |
To be clear, is the cookbook solution by denfromufa the current "best_for_many_cases" way to do this? |
@shoyer or @TomAugspurger - any more feedback for @denfromufa? |
For one thing, I believe But I also would be happy to see this feature in pandas. I thought it would fit well as an additonal method in |
|
First time I used the .interpolate() method I thought that it receives a new index and then interpolates on it, similar to scipy.interpolate.interp1d
From scipy web:
Later I saw the .reindex() method, so I understood that this role is done by .reindex(). However .reindex() is not really doing a powerful interpolation, just extending the current values using the method keyword.
The current way to achieve it (joining previous and new index and then using .reindex()), in version 0.15.0,
A simpler syntax could be accepting the methods from .interpolate() into the 'method' keyword of .reindex():
The text was updated successfully, but these errors were encountered: