Issue on page /modules/generated/chainladder.IncrementalAdditive.html #377
Comments
Thanks for the review! This is a lesser-used development estimator, so thanks for finding the poor coverage. Some of the secondary diagnostic properties have not been passed through. A general ambition for the package is to be able to mix and match development estimators with tail estimators with IBNR methods. In order to do this, we standardize the development estimator outputs:

```python
import chainladder as cl

tri = cl.load_sample("ia_sample")
ia_dev = cl.IncrementalAdditive().fit(
    X=tri['loss'],
    sample_weight=tri['exposure'].latest_diagonal)

# These match
assert (
    cl.Chainladder().fit(ia_dev.transform(tri['loss'])).full_triangle_.loc[..., :72].round(0) ==
    ia_dev.incremental_.incr_to_cum().round(0))
```

Standardizing the outputs also means tail support comes from the Pipeline rather than from the development estimator itself:

```python
import chainladder as cl

tri = cl.load_sample("ia_sample")
pipe = cl.Pipeline(steps=[
    ('dev', cl.IncrementalAdditive()),
    ('tail', cl.TailCurve()),  # Tail support provided via pipeline
    ('model', cl.BornhuetterFerguson(apriori=.6)),
])
pipe.fit(X=tri['loss'], sample_weight=tri['exposure'].latest_diagonal)
```

There are several "methods" in actuarial literature where development and tail extrapolation are rolled into one method. For these, we do not add tail options to the development estimator. Instead, we deliberately break them up to support compatibility with other user-selected options. The Clark growth curves are a good example of this. To replicate the paper, one would need to build a pipeline that includes both ClarkLDF and TailClark:

```python
import chainladder as cl

genins = cl.load_sample('genins')
pipe = cl.Pipeline(steps=[
    ('dev', cl.ClarkLDF()),
    ('tail', cl.TailClark()),
    ('model', cl.Chainladder()),
])
pipe.fit(genins)
```

However, if a user so chooses, they can swap out any of these components.
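For example, swapping the Clark-specific tail for a fitted curve should just be a matter of changing one step (a quick sketch, assuming TailCurve is happy extrapolating the ldf_ pattern that ClarkLDF produces):

```python
import chainladder as cl

genins = cl.load_sample('genins')
# Same development and IBNR method as above, but with TailCurve swapped in
# for TailClark - the pipeline interface stays the same.
pipe = cl.Pipeline(steps=[
    ('dev', cl.ClarkLDF()),
    ('tail', cl.TailCurve()),
    ('model', cl.Chainladder()),
])
pipe.fit(genins)
```
|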
Thanks for the walkthrough! I think I understand a bit better. With that new understanding, ldf_ and cdf_ make sense to me the way they are specified. I think what I was trying to ask for is a cumulative version of zeta_, since it looks like zeta_ thinks it's cumulative:

```python
import chainladder as cl

genins = cl.load_sample("genins")
genins_dev = cl.Development().fit_transform(genins)
genins_cl = cl.Chainladder().fit(genins_dev)
genins_incr = cl.IncrementalAdditive().fit(genins, sample_weight=genins_cl.ultimate_ * 5)

print(genins_incr.zeta_)
print(genins_incr.zeta_.incr_to_cum())
print(genins_incr.zeta_.cum_to_incr())
```

I wonder if I may be able to persuade you to take a more relaxed stance on the relationship between the Development class and the IBNR Methods class, kinda like how incr_to_cum() currently uses is_pattern to generalize between addition for triangles and multiplication for ldfs (quick illustration at the end of this comment). What if there were additive Development classes and multiplicative Development classes? The way I think about the Incremental method is that it's a special case of the BF method, where the IBNR% is specified directly rather than through 1/(1 - IBNR%), aka development factors. If there were a boolean or subclasses built around additive/multiplicative, then the IBNR methods could be flexible in how they choose to interface with the Development class.

I'm thinking about a 2-phase refresh effort. Please lemme know if it makes sense.

Phase 1 - QoL improvements
Phase 2 - Enhanced integration with rest of chainladder package
Here's the kind of phase 2 usage I'm picturing:

```python
import chainladder as cl

tri = cl.load_sample("ia_sample")
pipe = cl.Pipeline(steps=[
    ('dev', cl.IncrementalAdditive()),
    ('model', cl.BornhuetterFerguson(apriori=.6)),
])
test = pipe.fit(X=tri['loss'], sample_weight=tri['exposure'].latest_diagonal)

print(test.named_steps.model.full_triangle_.cum_to_incr())
print(test.named_steps.dev.incremental_)
```
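To make the is_pattern point concrete, here's roughly the behavior I'm leaning on (quick sketch with genins, relying on ldf_ carrying is_pattern=True):

```python
import chainladder as cl

genins = cl.load_sample('genins')
dev = cl.Development().fit(genins)

# On an ordinary triangle, incr_to_cum() adds across development periods...
print(genins.cum_to_incr().incr_to_cum())  # recovers the cumulative triangle

# ...while on a pattern triangle (is_pattern=True) it multiplies,
# so cumulating ldf_ gives the cdf_-style factors.
print(dev.ldf_.is_pattern)
print(dev.ldf_.incr_to_cum())
```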
|
That makes sense. My design style has generally put the burden of conformity on the estimator outputs (e.g., impute a multiplicative ...). Your phase 1 suggestions are great; I fully support them. Your phase 2 example looks the same with the new ... I support the direction you want to go in phase 2 as well. It's compatible with the end-user API and would improve the user experience. My biggest fear is that the software design skills needed to pull it off are pretty high. The way these classes interact today is novice-to-intermediate skill. To do it properly, I think we need some advanced software engineering, or at the very least careful planning and scoping of estimator compatibility.
|
Created #381 for phase 1 fixes. Ideally I would have liked to create cum_zeta_ and fit_zeta_ as properties, but I think that would only work if I added them directly to Triangle?
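For reference, the shape of what I was picturing is just a property on the estimator rather than anything on Triangle (hypothetical sketch; the right cumulation rule for an additive pattern is exactly the open question):

```python
import chainladder as cl

# Hypothetical sketch - not in the package. A subclass property avoids
# touching Triangle; how best to cumulate zeta_ is the part still TBD.
class IncrementalAdditiveExperiment(cl.IncrementalAdditive):

    @property
    def cum_zeta_(self):
        # zeta_ is stored incrementally; this naive cumulation is only a
        # placeholder for whatever #381 settles on
        return self.zeta_.incr_to_cum()
```
|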
In further testing, #181 is popping up again.

```python
import chainladder as cl

raa = cl.load_sample('raa')
raa.values[0, 0, 0, 5:] = 18608
ult = cl.Chainladder().fit(raa)
incr = cl.IncrementalAdditive().fit(raa, sample_weight=ult.ultimate_)

print(incr.tri_zeta)
print(incr.zeta_)
```

Take the 108 column as an example: the 0 incremental from AY 1981 should really have contributed towards a lower average, but it's being ignored as NaN. I would take the position that 0 incrementals occur sufficiently frequently in real data. Will think through a solution and propose here.
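A toy version of the bias, outside of chainladder entirely, just to spell out what I mean by the zero being ignored:

```python
import numpy as np

# Three observed increments, one of which is a genuine zero
increments = np.array([5.0, 3.0, 0.0])
print(increments.mean())  # 2.67 - the zero pulls the average down, as it should

# If the zero gets coerced to nan, the average is silently overstated
coerced = np.array([5.0, 3.0, np.nan])
print(np.nanmean(coerced))  # 4.0
```
|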
```python
import numpy as np
import chainladder as cl


def nan_to_zero(tri, years=None):
    xp = tri.get_array_module()
    tric = tri.copy()
    if years is None:
        # Convert every nan inside the triangle to 0, then restore the
        # structural nans outside the triangle via nan_triangle
        tric.values = xp.nan_to_num(tric.values) * tric.nan_triangle
    else:
        # Only convert nans for the requested origin years
        years_list = years if isinstance(years, list) else [years]
        years_np = np.array(years_list)
        origin_ind = np.where(np.array([tri.origin.astype("string")]) == years_np.reshape(-1, 1))[1]
        helper = tric.nan_triangle * 0
        helper[origin_ind, :] = helper[origin_ind, :] * 0 + 1
        tric.values = np.where(np.logical_and(np.isnan(tric.values), helper[None, None] == 1), 0.0, tric.values)
    return tric


raa = cl.load_sample('raa')
raa.values[:, :, 0:4, 5:] = 18608
print(nan_to_zero(raa.cum_to_incr()))
print(nan_to_zero(raa.cum_to_incr(), '1981'))
print(nan_to_zero(raa.cum_to_incr(), ['1981', '1983']))
```

Here is the function. The intent is to put it under Triangle as an internal method (tri turns into self). Lemme know what you think.
|
@jbogaardt thoughts? |
Sorry for the late response; I didn't see the original proposal. This seems great. There are a lot of places in the package where 0s are coerced to nans, so we should be deliberate when using your function. As the package matures we do need to reimplement some backend functionality to recognize that nan and 0 are not the same, and your function is a step in that direction.

In my own usage of Triangles, I build claim-level triangles and rely heavily on the sparse array (fill_value=nan) for high throughput of data. For my use case, I store my triangles in incremental format and need the zeros to be nan so I don't blow out my machine's RAM. That is to say, there are some cases where we might want zeros represented by nan and other cases (like this issue) where we do not. The package currently turns zeros into nans always, and that's probably not the right answer. Similarly, turning all nans into zeros is also not effective, especially for large sparse triangles. All this to say that when coercing nans to zeros we should consider the consequences on very large triangles.
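To put a rough number on the RAM concern, here's a toy example with the sparse package directly (outside of chainladder, just to illustrate the fill_value=nan point):

```python
import numpy as np
import sparse

# A mostly-empty incremental "triangle": only 25 cells are actually observed
dense = np.full((2000, 2000), np.nan)
dense[:5, :5] = 100.0

kept_as_nan = sparse.COO.from_numpy(dense, fill_value=np.nan)
print(kept_as_nan.nnz)  # 25 stored values

# Coercing every nan to zero while the fill value stays nan forces every
# cell to be stored explicitly - this is what blows up on large triangles
coerced_to_zero = sparse.COO.from_numpy(np.nan_to_num(dense), fill_value=np.nan)
print(coerced_to_zero.nnz)  # 4,000,000 stored values
```
|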
Lemme add a sparse incremental test and tinker around. What's a good sparse dataset I can use?
|
The "prism" dataset is a large sparse incremental triangle. import chainladder as cl
prism = cl.load_sample('prism')['Paid']
model = cl.Chainladder().fit(prism.sum())
claim_level_ult = model.predict(prism).ultimate_
claim_level_ult.groupby('Limit').sum() |
Based on my primitive understanding of sparse, would it be correct to conclude that these couple of lines in the class are essentially turning off sparse support? I haven't made full sense of the Benktander implementation yet; my understanding is that much of it is there to enable sparse support (probably also why apriori doesn't support a vector?). Meaning, it might actually make sense to deprecate incremental_ and just focus on adding additive support in BF?
|
Yes, sorta. It's not well documented, but I went down the path of having any "development patterns" be dense. I couldn't think of a reason why we would ever have sparse development patterns, but I could think of plenty of reasons why we would want to multiply those development patterns by a sparse Triangle of individual claims. As such, arithmetic between a sparse triangle and a dense triangle (such as multiplying the latest diagonal by CDFs) is designed to yield a sparse result. Indeed, much of the logic in methods/base.py, and even the methods themselves, has wonky-looking arithmetic to pare extraneous entries down to nans so that the sparse computations are more efficient. So, it is perhaps okay to have the estimators themselves create "coefficients" that are stored as dense, but ...
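Something like this (untested, off the top of my head) should show the split between the two backends:

```python
import chainladder as cl

prism = cl.load_sample('prism')['Paid']
dev = cl.Development().fit(prism.sum())
model = cl.Chainladder().fit(prism.sum())

print(prism.array_backend)     # 'sparse' - the individual-claim triangle
print(dev.ldf_.array_backend)  # 'numpy'  - development patterns stay dense

# Mixing the two, e.g. predicting claim-level ultimates, should come back sparse
print(model.predict(prism).ultimate_.array_backend)
```
|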
I do see how in most cases dense coefficients work just fine, but I don't think they currently work for IncrementalAdditive. The conversion from zeta_ to ldf_ is done densely and depends on the dense latest diagonal. This certainly makes the CL estimator give odd results, and I suspect BF as well. Once BF can take dense zeta_ directly, we should be able to get rid of the incremental_ attribute (nothing against incremental_, it just seems like extra work to enable sparse support) and rely solely on BF for sparse-supported estimation.
|
I have been mucking around in the Incremental Additive method for some payment pattern work. Found some bugs and noticed a couple of potential QoL improvements.

Bugs:

Other enhancements: