Issue on page /modules/generated/chainladder.IncrementalAdditive.html #377
Comments
Thanks for the review! This is a lesser-used development estimator, so thanks for finding the poor coverage. Some of the secondary diagnostic properties have not been passed through. A general ambition for the package is to be able to mix and match development estimators with tail estimators with IBNR methods. In order to do this, we standardize the development estimator outputs:

```python
import chainladder as cl

tri = cl.load_sample("ia_sample")
ia_dev = cl.IncrementalAdditive().fit(
    X=tri['loss'],
    sample_weight=tri['exposure'].latest_diagonal)

# These match
assert (
    cl.Chainladder().fit(ia_dev.transform(tri['loss'])).full_triangle_.loc[..., :72].round(0) ==
    ia_dev.incremental_.incr_to_cum().round(0))
```

Standardizing the outputs also means tail support comes from the Pipeline rather than from the development estimator itself:

```python
import chainladder as cl

tri = cl.load_sample("ia_sample")
pipe = cl.Pipeline(steps=[
    ('dev', cl.IncrementalAdditive()),
    ('tail', cl.TailCurve()),  # Tail support provided via pipeline
    ('model', cl.BornhuetterFerguson(apriori=.6)),
])
pipe.fit(X=tri['loss'], sample_weight=tri['exposure'].latest_diagonal)
```

There are several "methods" in actuarial literature where development and tail extrapolation are rolled into one method. For these, we do not add tail options to the development estimator. Instead, we deliberately break them up to support compatibility with other user-selected options. The Clark growth curves are a good example of this. To replicate the paper, one would need to build a pipeline that includes both ClarkLDF and TailClark:

```python
import chainladder as cl

genins = cl.load_sample('genins')
pipe = cl.Pipeline(steps=[
    ('dev', cl.ClarkLDF()),
    ('tail', cl.TailClark()),
    ('model', cl.Chainladder()),
])
pipe.fit(genins)
```

However, if a user so chooses, they can swap out any of these components.
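For example, swapping the Clark-specific tail for a fitted curve should just be a matter of changing one step (a quick sketch, assuming TailCurve is happy extrapolating the ldf_ pattern that ClarkLDF produces):

```python
import chainladder as cl

genins = cl.load_sample('genins')
# Same development and IBNR method as above, but with TailCurve swapped in
# for TailClark - the pipeline interface stays the same.
pipe = cl.Pipeline(steps=[
    ('dev', cl.ClarkLDF()),
    ('tail', cl.TailCurve()),
    ('model', cl.Chainladder()),
])
pipe.fit(genins)
```
|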
Thanks for the walkthrough! I think I understand a bit better. With that new understanding, ldf_ and cdf_ make sense to me the way they are specified. I think what I was trying to ask for is a cumulative version of zeta_, since it looks like zeta_ thinks it's cumulative:

```python
import chainladder as cl

genins = cl.load_sample("genins")
genins_dev = cl.Development().fit_transform(genins)
genins_cl = cl.Chainladder().fit(genins_dev)
genins_incr = cl.IncrementalAdditive().fit(genins, sample_weight=genins_cl.ultimate_ * 5)

print(genins_incr.zeta_)
print(genins_incr.zeta_.incr_to_cum())
print(genins_incr.zeta_.cum_to_incr())
```

I wonder if I may be able to persuade you to take a more relaxed stance on the relationship between the Development class and the IBNR Methods class, kinda like how incr_to_cum() currently uses is_pattern to generalize between addition for triangles and multiplication for ldfs (quick illustration at the end of this comment). What if there were additive Development classes and multiplicative Development classes? The way I think about the Incremental method is that it's a special case of the BF method, where the IBNR% is specified directly rather than through 1/(1 - IBNR%), aka development factors. If there were a boolean or subclasses built around additive/multiplicative, then the IBNR methods could be flexible in how they choose to interface with the Development class.

I'm thinking about a 2-phase refresh effort. Please lemme know if it makes sense.

Phase 1 - QoL improvements
Phase 2 - Enhanced integration with rest of chainladder package
Here's the kind of phase 2 usage I'm picturing:

```python
import chainladder as cl

tri = cl.load_sample("ia_sample")
pipe = cl.Pipeline(steps=[
    ('dev', cl.IncrementalAdditive()),
    ('model', cl.BornhuetterFerguson(apriori=.6)),
])
test = pipe.fit(X=tri['loss'], sample_weight=tri['exposure'].latest_diagonal)

print(test.named_steps.model.full_triangle_.cum_to_incr())
print(test.named_steps.dev.incremental_)
```
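To make the is_pattern point concrete, here's roughly the behavior I'm leaning on (quick sketch with genins, relying on ldf_ carrying is_pattern=True):

```python
import chainladder as cl

genins = cl.load_sample('genins')
dev = cl.Development().fit(genins)

# On an ordinary triangle, incr_to_cum() adds across development periods...
print(genins.cum_to_incr().incr_to_cum())  # recovers the cumulative triangle

# ...while on a pattern triangle (is_pattern=True) it multiplies,
# so cumulating ldf_ gives the cdf_-style factors.
print(dev.ldf_.is_pattern)
print(dev.ldf_.incr_to_cum())
```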
|
That makes sense. My design style has generally put the burden of conformity on the estimator outputs (e.g., impute a multiplicative ...). Your phase 1 suggestions are great; I fully support them. Your phase 2 example looks the same with the new ... I support the direction you want to go in phase 2 as well. It's compatible with the end-user API and would improve the user experience. My biggest fear is that the software design skills needed to pull it off are pretty high. The way these classes interact today is novice-to-intermediate skill. To do it properly, I think we need some advanced software engineering, or at the very least careful planning and scoping of estimator compatibility.
|
Created #381 for phase 1 fixes. Ideally I would have liked to create cum_zeta_ and fit_zeta_ as properties, but I think that would only work if I added them directly to Triangle?
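For reference, the shape of what I was picturing is just a property on the estimator rather than anything on Triangle (hypothetical sketch; the right cumulation rule for an additive pattern is exactly the open question):

```python
import chainladder as cl

# Hypothetical sketch - not in the package. A subclass property avoids
# touching Triangle; how best to cumulate zeta_ is the part still TBD.
class IncrementalAdditiveExperiment(cl.IncrementalAdditive):

    @property
    def cum_zeta_(self):
        # zeta_ is stored incrementally; this naive cumulation is only a
        # placeholder for whatever #381 settles on
        return self.zeta_.incr_to_cum()
```
|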
In further testing, #181 is popping up again.

```python
import chainladder as cl

raa = cl.load_sample('raa')
raa.values[0, 0, 0, 5:] = 18608
ult = cl.Chainladder().fit(raa)
incr = cl.IncrementalAdditive().fit(raa, sample_weight=ult.ultimate_)

print(incr.tri_zeta)
print(incr.zeta_)
```

Take the 108 column as an example: the 0 incremental from AY 1981 should really have contributed towards a lower average, but it's being ignored as NaN. I would take the position that 0 incrementals occur sufficiently frequently in real data. Will think through a solution and propose here.
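A toy version of the bias, outside of chainladder entirely, just to spell out what I mean by the zero being ignored:

```python
import numpy as np

# Three observed increments, one of which is a genuine zero
increments = np.array([5.0, 3.0, 0.0])
print(increments.mean())  # 2.67 - the zero pulls the average down, as it should

# If the zero gets coerced to nan, the average is silently overstated
coerced = np.array([5.0, 3.0, np.nan])
print(np.nanmean(coerced))  # 4.0
```
|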
```python
import numpy as np
import chainladder as cl


def nan_to_zero(tri, years=None):
    xp = tri.get_array_module()
    tric = tri.copy()
    if years is None:
        # Convert every nan inside the triangle to 0, then restore the
        # structural nans outside the triangle via nan_triangle
        tric.values = xp.nan_to_num(tric.values) * tric.nan_triangle
    else:
        # Only convert nans for the requested origin years
        years_list = years if isinstance(years, list) else [years]
        years_np = np.array(years_list)
        origin_ind = np.where(np.array([tri.origin.astype("string")]) == years_np.reshape(-1, 1))[1]
        helper = tric.nan_triangle * 0
        helper[origin_ind, :] = helper[origin_ind, :] * 0 + 1
        tric.values = np.where(np.logical_and(np.isnan(tric.values), helper[None, None] == 1), 0.0, tric.values)
    return tric


raa = cl.load_sample('raa')
raa.values[:, :, 0:4, 5:] = 18608
print(nan_to_zero(raa.cum_to_incr()))
print(nan_to_zero(raa.cum_to_incr(), '1981'))
print(nan_to_zero(raa.cum_to_incr(), ['1981', '1983']))
```

Here is the function. The intent is to put it under Triangle as an internal method (tri turns into self). Lemme know what you think.
|
@jbogaardt thoughts? |
Sorry for the late response; I didn't see the original proposal. This seems great. There are a lot of places in the package where 0s are coerced to nans, so we should be deliberate when using your function. As the package matures we do need to reimplement some backend functionality to recognize that nan and 0 are not the same, and your function is a step in that direction.

In my own usage of Triangles, I build claim-level triangles and rely heavily on the sparse array (fill_value=nan) for high throughput of data. For my use case, I store my triangles in incremental format and need the zeros to be nan so I don't blow out my machine's RAM. That is to say, there are some cases where we might want zeros represented by nan and other cases (like this issue) where we do not. The package currently turns zeros into nans always, and that's probably not the right answer. Similarly, turning all nans into zeros is also not effective, especially for large sparse triangles. All this to say that when coercing nans to zeros we should consider the consequences on very large triangles.
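To put a rough number on the RAM concern, here's a toy example with the sparse package directly (outside of chainladder, just to illustrate the fill_value=nan point):

```python
import numpy as np
import sparse

# A mostly-empty incremental "triangle": only 25 cells are actually observed
dense = np.full((2000, 2000), np.nan)
dense[:5, :5] = 100.0

kept_as_nan = sparse.COO.from_numpy(dense, fill_value=np.nan)
print(kept_as_nan.nnz)  # 25 stored values

# Coercing every nan to zero while the fill value stays nan forces every
# cell to be stored explicitly - this is what blows up on large triangles
coerced_to_zero = sparse.COO.from_numpy(np.nan_to_num(dense), fill_value=np.nan)
print(coerced_to_zero.nnz)  # 4,000,000 stored values
```
|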
Lemme add a sparse incremental test and tinker around. What's a good sparse dataset I can use?
|
The "prism" dataset is a large sparse incremental triangle. import chainladder as cl
prism = cl.load_sample('prism')['Paid']
model = cl.Chainladder().fit(prism.sum())
claim_level_ult = model.predict(prism).ultimate_
claim_level_ult.groupby('Limit').sum() |
Based on my primitive understanding of sparse, would it be correct to conclude that these couple of lines in the class are essentially turning off sparse support? I haven't made full sense of the Benktander implementation yet; my understanding is that much of it is there to enable sparse support (probably also why apriori doesn't support a vector?). Meaning, it might actually make sense to deprecate incremental_ and just focus on adding additive support in BF?
|
Yes, sorta. It's not well documented, but I went down the path of having any "development patterns" be dense. I couldn't think of a reason why we would ever have sparse development patterns, but I could think of plenty of reasons why we would want to multiply those development patterns by a sparse Triangle of individual claims. As such, arithmetic between a sparse triangle and a dense triangle (such as multiplying the latest diagonal by CDFs) is designed to yield a sparse result. Indeed, much of the logic in methods/base.py, and even the methods themselves, has wonky-looking arithmetic to pare extraneous entries down to nans so that the sparse computations are more efficient. So, it is perhaps okay to have the estimators themselves create "coefficients" that are stored as dense, but ...
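Something like this (untested, off the top of my head) should show the split between the two backends:

```python
import chainladder as cl

prism = cl.load_sample('prism')['Paid']
dev = cl.Development().fit(prism.sum())
model = cl.Chainladder().fit(prism.sum())

print(prism.array_backend)     # 'sparse' - the individual-claim triangle
print(dev.ldf_.array_backend)  # 'numpy'  - development patterns stay dense

# Mixing the two, e.g. predicting claim-level ultimates, should come back sparse
print(model.predict(prism).ultimate_.array_backend)
```
|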
I do see how in most cases dense coefficients work just fine, but I don't think they currently work for IncrementalAdditive. The conversion from zeta_ to ldf_ is done densely and depends on the dense latest diagonal. This certainly makes the CL estimator give odd results, and I suspect BF as well. Once BF can take dense zeta_ directly, we should be able to get rid of the incremental_ attribute (nothing against incremental_, it just seems like extra work to enable sparse support) and rely solely on BF for sparse-supported estimation.
|
I have been mucking around in the Incremental Additive method for some payment pattern work. Found some bugs and noticed a couple of potential QoL improvements.

Bugs:

Other enhancements: