Revise explanation of simulation duration #145
Looks good, except for one slightly confusing point about the SSA data, which I noted above.
@@ -356,6 +363,7 @@ contractors or self-employed individuals, while a W-2 form is used for employees
pseudopeople can generate a simulated version of the data collected by W-2 and 1099 forms.
This is a yearly dataset, where the user-specified year is the **tax year** of the data.
That is, the data for 2022 will be the result of tax forms filed in early 2023.
Tax data can be generated for tax years 2019 through 2040 (inclusive).
I'm definitely going to be confused about tax year vs. filing year. But I have confirmed that `psp.generate_taxes_w2_and_1099(year=2019)` returns some simulated data.
Yes, all the datasets (except SSA) start with 2019. But (unlike some other datasets) there is no tax data returned by `psp.generate_taxes_w2_and_1099(year=2041)`, because that would have to be filed in 2042, which we didn't simulate.
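To keep tax year and filing year straight, here is a minimal sketch of the relationship described in this thread. It does not call pseudopeople; the constant and function names are hypothetical, and the bounds are simply the ones stated in the documentation line under review (tax years 2019 through 2040, filed the following year).

```python
# Hypothetical helpers for illustration only; not part of pseudopeople.
# The bounds are taken from the documentation line discussed above.
FIRST_TAX_YEAR = 2019
LAST_TAX_YEAR = 2040


def filing_year(tax_year: int) -> int:
    """Forms for a given tax year are filed early in the following year."""
    return tax_year + 1


def is_tax_year_available(tax_year: int) -> bool:
    """Return True if tax_year lies in the documented range (inclusive)."""
    return FIRST_TAX_YEAR <= tax_year <= LAST_TAX_YEAR


# Show the tax year, its filing year, and whether data is documented as available.
for year in (2019, 2040, 2041):
    print(year, filing_year(year), is_tax_year_available(year))
```

Under these assumptions, tax year 2041 is reported as unavailable because its forms would be filed in 2042, which matches the explanation in the comment above.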
Description
@NathanielBlairStahn pointed out that my earlier, oversimplified explanation was incorrect. The timespans available in different datasets are not exactly the same, and this needs to be noted in more places.
Also, thanks to @rmudambi's comment on Slack, I have added some explanation here of the data generated in the final, partial year of the simulation.