
Is there a problem in my implementation? #9

Open
traderforce opened this issue Apr 12, 2020 · 3 comments


@traderforce

I want to detect interactions, and I use a real dataset from the paper "Detecting Statistical Interactions with Additive Groves of Trees", but I cannot get the same results.

[screenshot: parameter settings from the paper]

I use the real Kinematics dataset, with parameters like the above.
I split the data into five parts: four parts are the train file, and the remaining one is the validation file. The attribute file is like this:

[screenshot: attribute file]

Then the train file:

[screenshot: train file]
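
For reference, a minimal sketch of the sequential 4:1 split described above might look like the following. This is not from the original thread; the file names (`kin8nm.data`, `kin8nm.train`, `kin8nm.valid`) are hypothetical, and the data is assumed to be one data point per line.

```python
# Hypothetical file names; the data is assumed to be one point per line.
with open("kin8nm.data") as f:
    lines = f.readlines()

cut = len(lines) * 4 // 5  # first four fifths for training
with open("kin8nm.train", "w") as f:
    f.writelines(lines[:cut])
with open("kin8nm.valid", "w") as f:
    f.writelines(lines[cut:])  # remaining fifth for validation
```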

As illustrated in the website material, I first run ag_train with parameters like below:
[screenshot: ag_train command line]

log.txt looks like this:
[screenshots: log.txt output]

The parameters are different from those illustrated in the paper. Then I do feature selection with the parameters the trained model gives; it reports that only four features are used, so I wonder what is wrong in my steps.
[screenshot: feature selection output]

Thank you.

@dariasor
Owner

Kinematics has multiple data sets, did you specifically use kin8nm?
The package and the algorithms have evolved since the paper was published, but it is strange that a much smaller model is chosen and no expansion by bagging is suggested. Can you send me the output file called performance.txt, as well as your split of the data set? I'll take a look. My e-mail is [email protected].

@traderforce
Author

Many thanks for your help. I have sent the files to your Gmail; my email account is [email protected]. Details are in the email.

@dariasor
Owner

Thanks! It seems that with this train/validation data split we don't get a good large model needed for fine-tuned interaction detection. These public repositories are not huge, and it is quite possible that there is a lot of variance between different splits. Did you randomize the data before splitting it? I definitely remember that we did for the paper. If the order of the data points is non-random, you end up with an added difference between the train and validation data sets, and a smaller model as a result.
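
As a hedged sketch of the fix suggested here, the shuffle would happen once, before the split, so both files sample the same distribution (same hypothetical file names as in the earlier sketch):

```python
import random

# Shuffle the data points before the 4:1 split so the train and validation
# files come from the same distribution. File names are hypothetical; the
# fixed seed only keeps the split reproducible.
with open("kin8nm.data") as f:
    lines = f.readlines()

random.seed(42)        # reproducible split
random.shuffle(lines)  # remove any ordering present in the original file

cut = len(lines) * 4 // 5
with open("kin8nm.train", "w") as f:
    f.writelines(lines[:cut])
with open("kin8nm.valid", "w") as f:
    f.writelines(lines[cut:])
```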
