
Suggestion for Adjusting Difficulty Score to Use an Asymptote at 10 #697

Closed
richard-lacasse opened this issue Oct 16, 2024 · 50 comments · Fixed by open-spaced-repetition/fsrs-optimizer#143 or open-spaced-repetition/fsrs-rs#239
Labels: enhancement (New feature or request)

Comments

@richard-lacasse

I've noticed that with my cards, they very quickly reach the maximum Difficulty score of 10. This might be somewhat due to my specific parameters, but I suspect others might experience a similar pattern, perhaps to a lesser extent.

When I mark a card as Again or Hard, it increases the Difficulty score significantly. However, when I mark a card as Good, it barely changes. While this is likely intentional, after getting a card wrong about half a dozen times, its Difficulty score becomes stuck at 10. Even if I subsequently get the card right 100 times, getting it wrong just once pushes it back to the maximum Difficulty.

The issue with this is that sorting cards by Difficulty Ascending becomes less useful because many cards are tied at $D = 10$. As a result, I lose the granularity that this sorting method could provide. I think this could be addressed by using an asymptote instead of a hard cap at 10, while keeping the original equation for Difficulty ($D$) the same.

Currently, when $D'$ would exceed 10, I believe the algorithm applies something like:

$$D' = \min(D', 10)$$

This means any information beyond 10 is lost each time it's capped. Instead, we could use an asymptote with a damping factor. For example:

1. Linear Damping Factor

Let:

$$\Delta D = D' - D$$

Then update $D'$ using:

$$D' = D + \Delta D \cdot \left(1 - \frac{D}{10}\right)$$

In this equation, as $D$ approaches 10, the factor $1 - \frac{D}{10}$ approaches 0, shrinking the applied increment and causing $D'$ to approach 10 asymptotically.

2. Exponential Damping Factor

Alternatively, you could parameterize the damping factor similar to how it's done in the Stability equation:

$$D' = D + \Delta D \cdot \left(1 - e^{-w (10 - D)}\right)$$

Here, $w$ would be a new weight, a positive constant that controls the rate of damping. As $D$ approaches 10, the factor $1 - e^{-w (10 - D)}$ approaches 0, causing $D'$ to approach 10 asymptotically.
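
As a quick sanity check (plain Python, not FSRS code; the raw increment of 2.0 and $w = 0.2$ are made-up values), here is how the hard cap compares with the two damped updates over a few consecutive Again presses:

    import math

    def capped(d, delta_d):
        return min(d + delta_d, 10)

    def linear_damped(d, delta_d):
        return d + delta_d * (1 - d / 10)

    def exp_damped(d, delta_d, w=0.2):
        return d + delta_d * (1 - math.exp(-w * (10 - d)))

    d_cap = d_lin = d_exp = 8.0
    for _ in range(5):  # five consecutive "Again" reviews with a made-up raw increment of 2.0
        d_cap = capped(d_cap, 2.0)
        d_lin = linear_damped(d_lin, 2.0)
        d_exp = exp_damped(d_exp, 2.0)

    print(d_cap, round(d_lin, 3), round(d_exp, 3))  # the capped value hits 10; the damped values stay below it

The capped value saturates at exactly 10 after the first press, while both damped values keep approaching 10 without ever reaching it, so the ordering between cards is preserved.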

Implementing an asymptote at 10 could preserve the usefulness of sorting by Difficulty and prevent the loss of information due to capping. It might be worth exploring this adjustment in the next version of FSRS to see if it improves the algorithm.

@richard-lacasse added the enhancement label on Oct 16, 2024
@Expertium
Collaborator

Expertium commented Oct 16, 2024

I'll benchmark it

@user1823
Collaborator

This change will be useful for sorting.

However, I don't think that it would improve the algorithm because there won't be any significant change in the value of 11 - D.

I would expect a greater improvement if we simultaneously updated the formula that uses D to calculate S. (But I don't know what changes should be made to that formula.)

@Expertium
Collaborator

Expertium commented Oct 17, 2024

I interpreted his suggestion as "Add this to the main formula".
And surprisingly, it (kind of) worked!
[image]

I am shocked. I have tried so many changes to the D formula, so the fact that this helps even by 1% is surprising.
@L-M-Sherlock we can improve RMSE by 1% (relatively) by doing this:

    def next_d(self, state: Tensor, rating: Tensor) -> Tensor:
        # state[:, 1] is the current difficulty D
        delta_d = -self.w[6] * (rating - 3)
        # linear damping: the increment shrinks as D approaches 10
        new_d = state[:, 1] + delta_d * (1 - (state[:, 1] / 10))
        new_d = self.mean_reversion(self.init_d(4), new_d)
        return new_d

It's not much, but it's "free" - no new parameters are needed

@L-M-Sherlock
Member

OK. I will benchmark it in 20k collections tomorrow. If the result is stable, I will add it to FSRS-5.

@L-M-Sherlock
Member

One problem: when D = 10, 1 - D/10 = 0, so rating Easy will not decrease D.

@Expertium
Collaborator

Dang. I'll modify it and benchmark again

@L-M-Sherlock
Member

    def linear_damping_one_way(self, delta_d: Tensor, old_d: Tensor) -> Tensor:
        # damp only increases in difficulty; zero or negative changes pass through unchanged
        return torch.where(delta_d <= 0, delta_d, delta_d * (10 - old_d) / 9)

    def next_d(self, state: Tensor, rating: Tensor) -> Tensor:
        delta_d = -self.w[6] * (rating - 3)
        new_d = state[:, 1] + self.linear_damping_one_way(delta_d, state[:, 1])
        new_d = self.mean_reversion(self.init_d(4), new_d)
        return new_d

I will benchmark this one.

@Expertium
Collaborator

Expertium commented Oct 18, 2024

OK, looks good. Why are you dividing by 9, though?
Oh, OK, I get it. Since D is at least 1, 10 - D is at most 9, so dividing by 9 makes the damping factor (10 - D)/9 equal to 1 at D = 1 and 0 at D = 10.

@L-M-Sherlock
Member

Weird. The variant is worse than FSRS-5 in my initial test with the top 352 collections:

Model: FSRS-5-dev
Total number of users: 352
Total number of reviews: 10744330
Weighted average by reviews:
FSRS-5-dev-1 LogLoss (mean±std): 0.2939±0.1591
FSRS-5-dev-1 RMSE(bins) (mean±std): 0.0495±0.0335
FSRS-5-dev-1 AUC (mean±std): 0.7001±0.0771

Weighted average by log(reviews):
FSRS-5-dev LogLoss (mean±std): 0.3348±0.1636
FSRS-5-dev RMSE(bins) (mean±std): 0.0681±0.0416
FSRS-5-dev AUC (mean±std): 0.6950±0.0892

Weighted average by users:
FSRS-5-dev LogLoss (mean±std): 0.3380±0.1646
FSRS-5-dev RMSE(bins) (mean±std): 0.0706±0.0426
FSRS-5-dev AUC (mean±std): 0.6942±0.0914

Model: FSRS-5
Total number of users: 352
Total number of reviews: 10744330
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.2938±0.1587
FSRS-5 RMSE(bins) (mean±std): 0.0493±0.0332
FSRS-5 AUC (mean±std): 0.6989±0.0813

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3344±0.1629
FSRS-5 RMSE(bins) (mean±std): 0.0677±0.0411
FSRS-5 AUC (mean±std): 0.6949±0.0905

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3375±0.1639
FSRS-5 RMSE(bins) (mean±std): 0.0701±0.0421
FSRS-5 AUC (mean±std): 0.6941±0.0927

@L-M-Sherlock
Member

I'm benchmarking the two-way method. And it performs better:

Model: FSRS-5-dev
Total number of users: 118
Total number of reviews: 3151255
Weighted average by reviews:
FSRS-5-dev LogLoss (mean±std): 0.3465±0.1428
FSRS-5-dev RMSE(bins) (mean±std): 0.0557±0.0322
FSRS-5-dev AUC (mean±std): 0.6978±0.0639

Weighted average by log(reviews):
FSRS-5-dev LogLoss (mean±std): 0.3624±0.1551
FSRS-5-dev RMSE(bins) (mean±std): 0.0726±0.0395
FSRS-5-dev AUC (mean±std): 0.6995±0.0924

Weighted average by users:
FSRS-5-dev LogLoss (mean±std): 0.3631±0.1570
FSRS-5-dev RMSE(bins) (mean±std): 0.0747±0.0398
FSRS-5-dev AUC (mean±std): 0.6987±0.0955

Model: FSRS-5
Total number of users: 118
Total number of reviews: 3151255
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3474±0.1433
FSRS-5 RMSE(bins) (mean±std): 0.0563±0.0322
FSRS-5 AUC (mean±std): 0.6952±0.0648

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3629±0.1555
FSRS-5 RMSE(bins) (mean±std): 0.0728±0.0394
FSRS-5 AUC (mean±std): 0.6977±0.0942

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3637±0.1573
FSRS-5 RMSE(bins) (mean±std): 0.0749±0.0397
FSRS-5 AUC (mean±std): 0.6969±0.0975

So... should we implement it?

@Expertium
Collaborator

Expertium commented Oct 18, 2024

The two-way is the one where Easy (or any other grade) doesn't change difficulty if it's already at 10? No, let's not. Users will definitely complain.

@L-M-Sherlock
Member

Yeah. But it improves RMSE by 1% (relatively) across 8493 collections. It seems to suggest that ease hell is a myth, similar to #695.

@Expertium
Collaborator

I still think it would increase the number of complaints from users. @user1823 thoughts?

@user1823
Collaborator

Not decreasing D when the user presses Easy seems wrong. But the RMSE is decreasing. There are two scenarios that could explain this:

  • The better prediction of R at other reviews is offsetting the slight inaccuracy introduced by not decreasing the D.
  • The D shouldn't be decreased because the user is overestimating their learning.

In either case, I think that implementing this would be worth it.

However, it will be great if we can think of some modification that prevents this problem while still producing a similar decrease in RMSE.

@Expertium
Collaborator

I'm afraid that "I pressed Easy and D is still at 100%" will outweigh any algorithmic benefits

@user1823
Collaborator

user1823 commented Oct 18, 2024

If @richard-lacasse has an idea to solve this problem, then it would be great.

Otherwise, I think that we should implement this for the reason I mentioned here.

I'm afraid that "I pressed Easy and D is still at 100%" will outweigh any algorithmic benefits

To those users, say "just trust the algorithm".

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult.

Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

@richard-lacasse
Author

It's an asymptote; the Difficulty should never reach exactly 10, and that's the point. You'd have to hit Again so many times that the gap from 10 becomes smaller than machine epsilon, and I'm not sure that's even possible.

@Expertium
Collaborator

Still, suppose that D=9.9. In that case difficulty will barely change if the user presses Easy.

@richard-lacasse
Author

But it will change a lot relative to the other cards in that range. If you want the changes to be visible to users, display Difficulty on a log scale or something. I already have 99% of my cards showing up in the 90%-100% range, so the graph is already pretty useless.

@richard-lacasse
Author

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult.

Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

Maybe I'm just weird and this isn't a problem for other people.

[Screen Shot 2024-10-18 at 8 17 23 AM]

This is what my D distribution looks like. When I do a search, 2650/8888 cards fall into prop:d=1, which is maxed out.

@richard-lacasse
Author

Still, suppose that D=9.9. In that case difficulty will barely change if the user presses Easy.

One issue I can see is that even though it will be fine for sorting purposes, D still affects S, so Stability won't be affected as much by the Easy button once D gets close to 10. But if it's performing better in the simulations anyway...

But, it will take much longer to get near 9.9 because of the asymptote. Every change that's closer to 10 is smaller, so there will have to be a lot of Again button presses to even get near that.

@Expertium
Collaborator

Alright, fine. @L-M-Sherlock it's up to you. Implement the two-way version if you want

@richard-lacasse
Author

Are people ever hitting Easy on high difficulty cards anyway? I know I never do. Those are the cards that have given me the most trouble.

@user1823
Collaborator

But, it will take much longer to get near 9.9 because of the asymptote. Every change that's closer to 10 is smaller, so there will have to be a lot of Again button presses to even get near that.

I did an experiment to confirm that.

With my parameters and rating_sequence 3, 1, 1, 1, 1, 1, 1, 1, I got the following successive values of D.

Current formula:
5.3624, 9.0620, 10, 10, 10, 10, 10, 10

Proposed formula:
5.3624, 7.0779, 8.1582, 8.8385, 9.2669, 9.5366, 9.7065, 9.8134

So, even though the current formula takes D to 10 after just two Again presses, the proposed formula takes D only to about 9.8 even after 7 Again presses.
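
If anyone wants to play with sequences like this, here is a rough sketch of the iteration in plain Python. The parameter values (w6, w7, D0(4), and the starting D) are placeholders, not my actual parameters, and the mean reversion is written out as w7 * D0(4) + (1 - w7) * D:

    def next_d(d, rating, w6=1.0, w7=0.02, d0_easy=2.0):
        delta_d = -w6 * (rating - 3)
        # one-way linear damping: only increases in D are damped
        damped = delta_d * (10 - d) / 9 if delta_d > 0 else delta_d
        d = d + damped
        return w7 * d0_easy + (1 - w7) * d  # mean reversion toward D0(4)

    d = next_d(5.0, rating=3)      # first review rated Good, from a placeholder initial D
    for _ in range(7):
        d = next_d(d, rating=1)    # seven consecutive Again presses
        print(round(d, 4))

Increasing the loop count shows the same behaviour: D keeps creeping toward 10 but never reaches it.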

@L-M-Sherlock
Member

Weighted by number of reviews

| Model | Parameters | LogLoss | RMSE (bins) | AUC |
| --- | --- | --- | --- | --- |
| FSRS-5-Linear Dampening | 19 | 0.319±0.0053 | 0.049±0.0010 | 0.702±0.0028 |
| FSRS-5 | 19 | 0.320±0.0052 | 0.050±0.0010 | 0.700±0.0028 |

The new median parameters: [0.40255, 1.18385, 3.173, 15.69105, 7.1949, 0.5345, 1.4604, 0.0046, 1.54575, 0.1192, 1.01925, 1.9395, 0.11, 0.29605, 2.2698, 0.2315, 2.9898, 0.51655, 0.6621]

The old median parameters: [0.4043, 1.18225, 3.13915, 15.6764, 7.2285, 0.4904, 1.0644, 0.0239, 1.6253, 0.1423, 1.0983, 1.95, 0.1014, 0.2998, 2.2834, 0.2407, 2.967, 0.5045, 0.66445]

@brishtibheja

Maybe I'm just weird and this isn't a problem for other people.

@richard-lacasse I know other users with this issue, so it's not a problem specific to you. I like your idea, and I don't think practically anyone would reach a high D value, press Easy, and then fret about how the DSR values changed.

@L-M-Sherlock
Member

It is worth noting that w[7] is reduced from 0.0239 to 0.0046. And w[9] is reduced from 0.1423 to 0.1192.

@DerIshmaelite

DerIshmaelite commented Oct 19, 2024

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult.
Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

Maybe I'm just weird and this isn't a problem for other people.

[Screen Shot 2024-10-18 at 8 17 23 AM]

This is what my D distribution looks like. When I do a search, 2650/8888 cards fall into prop:d=1, which is maxed out.

@richard-lacasse Nope, you are pretty normal. With the current version, my new decks end up at pretty ungodly Difficulty values.
[image]

@richard-lacasse
Author

the proposed formula takes D to 9.8 even after 7 Again button presses.

Can you run it to see how long it takes to reach exactly 10? Mathematically that's impossible, but the gap will drop below machine epsilon at some point and probably round up to 10.

@user1823
Collaborator

Assuming that I wrote the code correctly, D didn't reach 9.995 even after 2000 consecutive Again presses.

@L-M-Sherlock
Member

It is worth noting that w[7] is reduced from 0.0239 to 0.0046.

Will it cause more complaints?

@user1823
Collaborator

It is worth noting that w[7] is reduced from 0.0239 to 0.0046.

Will it cause more complaints?

I don't think so. If D increases slowly on pressing Again, then it should also decrease slowly on pressing Good.

@L-M-Sherlock
Member

OK, I will implement it in FSRS-rs and benchmark it.

@Expertium
Collaborator

You need to implement it both in the Rust version and in the Python version, otherwise the comparison won't be fair.

@L-M-Sherlock
Member

You need to implement it both in the Rust version and in the Python version, otherwise the comparison won't be fair.

I just haven't pushed the updated Python version to the repo. The benchmark result above is based on the Python version.

@L-M-Sherlock
Member

L-M-Sherlock commented Oct 20, 2024

[image]

With the Linear Dampening, I have more cards whose difficulty is in the 10th bin...

My previous parameters: [1.0962, 1.543, 7.8692, 12.0038, 8.1849, 0.5031, 0.6852, 0.001, 1.3281, 0.0796, 0.8666, 2.5417, 0.0128, 0.2952, 0.7547, 0.0001, 3.2912, 0.1097, 0.6747]

My current parameters: [1.114, 1.601, 8.0182, 12.2391, 8.0261, 0.5454, 3.0807, 0.0026, 1.255, 0.0208, 0.7996, 2.5398, 0.0122, 0.3219, 0.3941, 0.0001, 3.6492, 0.102, 0.6778]

The w[6] increases significantly from 0.68 to 3.08.

[image]

The difficulty increases quickly, too.

However, it improves the metric:

[image]

By the way, the model version will be updated to FSRS-5.5 if we introduce this change because I have released FSRS-5 in some packages.

@L-M-Sherlock
Member

L-M-Sherlock commented Oct 20, 2024

Bad news: the change makes FSRS-rs worse. I don't know why. I need to check the consistency between the Python version and the Rust version again.

Update: OK, I found a serious problem and fixed it:

@Expertium
Collaborator

Expertium commented Oct 20, 2024

By the way, the model version will be updated to FSRS-5.5 if we introduce this change because I have released FSRS-5 in some packages.

Does this mean that the benchmark will have FSRS-5 and FSRS-5.5?
Also, maybe call it FSRS-5.1 and not FSRS-5.5? It's a pretty small change, after all.

@L-M-Sherlock
Member

Also, maybe call it FSRS-5.1 and not FSRS-5.5? It's a pretty small change, after all.

It's not a small change because some parameters have been changed significantly. Actually, I guess we need to do something like:

@user1823
Collaborator

some parameters have been changed significantly.

I agree. For example,

My parameters with current main:
1.0588, 4.7011, 31.5471, 79.2369, 7.561, 0.9454, 3.0717, 0.0, 1.8531, 0.3046, 1.3861, 1.6149, 0.0357, 0.4327, 1.8443, 0.0003, 6.0, 0.9915, 0.6734

My parameters with linear damping:
1.0652, 4.7012, 30.4195, 79.3164, 7.5425, 1.0393, 3.8364, 0.0, 1.8143, 0.292, 1.3124, 1.5111, 0.0276, 0.4369, 1.7987, 0.0003, 6.0, 1.013, 0.6778

So, there is a 25% increase in w6 and a 23% decrease in w12. The other parameters have changed slightly.

RMSE with old parameters evaluated using old optimizer: 0.0140
RMSE with new parameters evaluated using new optimizer: 0.0138
RMSE with old parameters evaluated using new optimizer: 0.0156

This means that if the user doesn't optimize their parameters after updating, their RMSE will increase. (This is true for almost any change to the algorithm, though.)

Actually, I guess we need to do something like:

Unfortunately, finding a mathematical relation between the old and new parameters will be very difficult in this case.

@Expertium
Collaborator

It's not a small change because some parameters have been changed significantly.

But the impact on metrics is small, around 1%. I think FSRS-5.1 makes more sense than FSRS-5.5 in this case

@L-M-Sherlock
Member

L-M-Sherlock commented Oct 21, 2024

Another problem: what if the initial difficulty is 10? In this case, the difficulty is immutable (if w[7] is 0).

Fine. Now we have a reason to set a non-zero lower limit for w[7].
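
Something along these lines in the parameter clamping step, with the exact bound still to be decided (this is just a sketch, not the actual optimizer code):

    # sketch only: clamp w[7] to a small positive value instead of allowing 0
    w[7] = max(w[7], 0.0001)  # the 0.0001 bound is a placeholder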

@Gilfaro

Gilfaro commented Oct 21, 2024

Just as discussed in the other topic about w7 (#695 (comment)), this change has shifted the most optimal value of w7 further into the negative. This means the optimizer wants to push all cards toward D = 10, and with the clipped value being the mode, most people's cards will tend to sit at max D all of the time.

@Expertium
Collaborator

Then I guess we shouldn't implement damping after all.

@Gilfaro

Gilfaro commented Oct 21, 2024

The problem is not with the damping itself, but that the optimizer prefers D = 10 with the current setup, so damping D means the optimizer compensates by decreasing w7 even further until it clips to whatever value you set.
The best solution would be to investigate why, and change the formulas so that the optimizer doesn't go down that optimization path.

@Expertium
Collaborator

I've experimented with dozens of changes to the D formula and nothing worked. At this point I'm just waiting for someone smarter than me to completely re-define D.

@L-M-Sherlock
Member

[image]

A conservative conversion is new w[6] = old w[6] + 0.5.
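
In other words, something like this when migrating an existing parameter set, purely as an illustration of the suggested conversion:

    new_w = list(old_w)
    new_w[6] = old_w[6] + 0.5  # conservative conversion for the damped next_d formula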

@richard-lacasse
Author

I've experimented with dozens of changes to the D formula and nothing worked. At this point I'm just waiting for someone smarter than me to completely re-define D.

@Expertium I know you've tried R as a parameter in the Difficulty equation, but did you also take R out of the parameter list in Stability? D is a parameter in Stability, so if R is a parameter in both, that might explain why adding R to Difficulty didn't help anything.

@Expertium
Collaborator

R affecting the increase in stability is one of the most important principles of the DSR model. Without it, we can't model the spacing effect since R depends on interval length, so removing it from the formula of S would mean that S doesn't depend on the interval length.
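
For reference, the post-recall stability update in FSRS is roughly the following (as I understand it, omitting the Hard penalty and Easy bonus multipliers), which is why R cannot simply be dropped from it:

$$S' = S \cdot \left(1 + e^{w_8} \cdot (11 - D) \cdot S^{-w_9} \cdot \left(e^{w_{10} \cdot (1 - R)} - 1\right)\right)$$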

@richard-lacasse
Author

S' depends on S, which carries the interval-length information, and if R is in the formula for Difficulty and D is in the formula for Stability, then R still indirectly affects the increase in Stability.

I'm spitballing here a little bit, but this is the first thing I would toy with.
