Question about within transformation in section 4.3 #1

Open
fengxiaoruo opened this issue Jun 15, 2023 · 0 comments

Hi, I am Sharooo, a master's student majoring in Economics.

My current research requires running econometric analysis remotely on the Daas platform via pyspark, so your work is really helpful to me. I have some questions about the code for section 4.3.

In the notebook with the code for section 4.3, the within-group data transformation is calculated differently from the one shown in the paper. In addition to subtracting the mean within each individual from the original data, the code also applies the overall sample mean to every variable. Why is this second step with the sample mean needed (as shown in the code below)?

df_train_within = spark.sql("""
    SELECT id, time,
           target + (select avg(target) from df) as target,
           x1 + (select avg(x1) from df) as x1,
           x2 + (select avg(x2) from df) as x2,
           x3 + (select avg(x3) from df) as x3,
           x4 + (select avg(x4) from df) as x4,
           x5 + (select avg(x5) from df) as x5,
           x6 + (select avg(x6) from df) as x6,
           x7 + (select avg(x7) from df) as x7
    FROM df
""")
df_train_within.createOrReplaceTempView("df_train_within")
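
To make the question concrete, here is a minimal plain-Python sketch of the two variants as I understand them: the textbook within transformation (subtract each individual's mean) and a version that also shifts by the overall sample mean. The toy data and the names `rows`, `group_means`, and `within` are hypothetical and do not come from the notebook, and I am assuming the shift is an addition because the SQL above uses `+`.

```python
from collections import defaultdict

# Hypothetical toy panel of (id, time, target); not data from the notebook.
rows = [
    (1, 1, 2.0), (1, 2, 4.0),
    (2, 1, 10.0), (2, 2, 14.0),
]

def group_means(rows):
    """Mean of the target within each individual id."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, _, y in rows:
        sums[i] += y
        counts[i] += 1
    return {i: sums[i] / counts[i] for i in sums}

def within(rows, add_grand_mean=False):
    """Subtract the individual mean; optionally add the overall sample mean."""
    means = group_means(rows)
    grand = sum(y for _, _, y in rows) / len(rows)
    shift = grand if add_grand_mean else 0.0  # assumption: mirrors the '+' in the SQL
    return [(i, t, y - means[i] + shift) for i, t, y in rows]

plain = within(rows)                          # textbook within transformation
shifted = within(rows, add_grand_mean=True)   # same values moved by a constant

print([y for _, _, y in plain])    # [-1.0, 1.0, -2.0, 2.0]
print([y for _, _, y in shifted])  # [6.5, 8.5, 5.5, 9.5]
```

The two results differ only by a constant, so in a regression that includes an intercept the slope estimates should be identical; the shifted version just keeps the variables centered on the sample mean instead of on zero. Whether that is the reason the notebook does it is my guess, not something stated in the paper.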
