Nested cross-validation #163
Comments
To implement CV in MATLAB, what you need to do is:
- randomly permute the data with randperm()
- use a for loop to get each validation fold:
num_per_fold = ceil(num_data/num_fold);
for i = 1 : num_fold
range = (i-1)*num_per_fold + 1 : min(num_data, i*num_per_fold);
- then use this "range" to extract the validation fold. The training fold can be obtained in a similar way
- then do training/prediction, and aggregate the results to get the CV accuracy
- for nested CV I think you mean 2-level CV. You can use a 2-level for loop for that
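The steps above can be sketched as follows (in Python rather than MATLAB, purely to make the index arithmetic concrete; `kfold_ranges` is a hypothetical helper name, not part of libsvm):

```python
import random

def kfold_ranges(num_data, num_fold, seed=0):
    """Split shuffled indices into num_fold validation folds,
    mirroring the ceil(num_data/num_fold) chunking above."""
    rng = random.Random(seed)
    perm = list(range(num_data))
    rng.shuffle(perm)                        # like randperm()
    num_per_fold = -(-num_data // num_fold)  # ceil division
    folds = []
    for i in range(num_fold):
        lo = i * num_per_fold
        hi = min(num_data, (i + 1) * num_per_fold)
        val_idx = perm[lo:hi]                # validation fold
        train_idx = perm[:lo] + perm[hi:]    # everything else trains
        folds.append((train_idx, val_idx))
    return folds
```

Note that the last fold may be smaller than the others when num_data is not divisible by num_fold, which is exactly what the min(...) bound handles.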
On 2020-02-20 23:56, ziqianwang9 wrote:
Dear Lin,
thanks for providing this useful toolbox. I'm trying to use it in a paper I want to publish, and I have run into a problem raised by a reviewer, who suggested that I use nested cross-validation.
Here is the script I used for my study:
clear all;
load median20190923.mat
% leave-one-out cross-validation
w = zeros(size(data_all)); % weight
acc = zeros(size(data_all,1),1);        % per-sample accuracy
deci_value = zeros(size(data_all,1),1); % decision values
h = waitbar(0,'please wait...');
for i = 1:size(data_all,1)
waitbar(i/size(data_all,1),h,[num2str(i),'/',num2str(size(data_all,1))])
new_DATA = data_all;
new_label = label;
test_data = data_all(i,:); new_DATA(i,:) = []; train_data = new_DATA;
test_label = label(i,:); new_label(i,:) = []; train_label = new_label;
% data normalization
[train_data,PS] = mapminmax(train_data',0,1);
test_data = mapminmax('apply',test_data',PS);
train_data = train_data';
test_data = test_data';
% RFE feature selection
step = 1;
ftRank = SVMRFE(train_label,train_data,step,'-t 0');
IX = ftRank(1:ceil(length(ftRank)*0.4));
[bestacc,bestc] = SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1);
cmd = ['-t 0 ',' -c ',num2str(bestc),' -w1 2 -w-1 1'];
model = svmtrain(train_label,train_data(:,IX),cmd);
w(i,IX) = model.SVs'*model.sv_coef;
[predicted_label, accuracy, deci] = svmpredict(test_label,test_data(:,IX),model);
acc(i,1) = accuracy(1);
deci_value(i,1) = deci;
% clear test_data train_data test_label train_label model IX k
end
w_msk = double(sum(w~=0,1)==size(w,1));
w = mean(w,1).*w_msk;
acc_final = mean(acc);
disp(['accuracy - ',num2str(acc_final)]);
% ROC
[X,Y,T,AUC] = perfcurve(label,deci_value,1);
figure; plot(X,Y); hold on; plot(X,X,'-');
xlabel('False positive rate'); ylabel('True positive rate');
Cut_off = zeros(length(X),1);
for i = 1:length(X)
Cut_off(i,1) = (1-X(i))*Y(i);
end
[~,maxind] = max(Cut_off);
Specificity = 1-X(maxind);
Sensitivity = Y(maxind);
disp(['Specificity= ', num2str(Specificity)]);
disp(['Sensitivity= ', num2str(Sensitivity)]);
fprintf('Permutation test ......\n');
Nsloop = 5000;
auc_rand = zeros(Nsloop,1);
for i = 1:Nsloop
label_rand = randperm(length(label));
deci_value_rand = deci_value(label_rand);
[~,~,~,auc_rand(i)] = perfcurve(label,deci_value_rand,1);
clear label_rand
end
p_auc = (length(find(auc_rand > AUC))+1)/(Nsloop+1);
disp(['Pvalue= ', num2str(p_auc)]);
Here, what I used is leave-one-out cross-validation, but the reviewer suggested that I use nested cross-validation (e.g. Varoquaux et al., Neuroimage, 2017) and K-fold.
Since I am not familiar with nested cross-validation: is it possible to perform it based on your libsvm? If it is, could you please give me some clue about how to achieve this?
Best,
Ziqian
Thank you for your reply.
To my knowledge, nested CV is not the same as a plain 2-level CV. In short:
The nested CV has an inner loop CV nested in an outer CV. The inner loop is responsible for model selection/hyperparameter tuning (similar to validation set), while the outer loop is for error estimation (test set).
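For illustration, here is a minimal sketch of that inner/outer structure (in Python rather than MATLAB, with a toy threshold "model" standing in for svmtrain/svmpredict; the function names and the toy model are my own, not part of libsvm):

```python
def train(X, y, c):
    # Toy stand-in for svmtrain: a 1-D threshold classifier whose
    # threshold is shifted by the "hyperparameter" c.
    return {"thr": sum(X) / len(X) - c}

def score(model, X, y):
    # Toy stand-in for svmpredict accuracy.
    preds = [1 if x > model["thr"] else -1 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def cross_val_score(c, X, y, k):
    # Plain k-fold CV on (X, y) for one fixed hyperparameter value c.
    n = len(X)
    per = -(-n // k)  # ceil(n / k)
    scores = []
    for i in range(k):
        lo, hi = i * per, min(n, (i + 1) * per)
        Xtr, ytr = X[:lo] + X[hi:], y[:lo] + y[hi:]
        model = train(Xtr, ytr, c)
        scores.append(score(model, X[lo:hi], y[lo:hi]))
    return sum(scores) / len(scores)

def nested_cv(X, y, c_grid, outer_k=5, inner_k=3):
    # Outer loop: error estimation on a held-out test fold.
    # Inner loop: grid search for c using ONLY the outer-training data,
    # so no information from the test fold leaks into model selection.
    n = len(X)
    per = -(-n // outer_k)
    outer_scores, chosen_cs = [], []
    for i in range(outer_k):
        lo, hi = i * per, min(n, (i + 1) * per)
        Xtr, ytr = X[:lo] + X[hi:], y[:lo] + y[hi:]
        best_c = max(c_grid, key=lambda c: cross_val_score(c, Xtr, ytr, inner_k))
        model = train(Xtr, ytr, best_c)
        outer_scores.append(score(model, X[lo:hi], y[lo:hi]))
        chosen_cs.append(best_c)
    return sum(outer_scores) / outer_k, chosen_cs
```

The point is that each outer fold may pick a different best_c; the outer scores estimate generalization error, not the performance of any single tuned model.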
My question is: how does '[bestacc,bestc] = SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1)' work for hyperparameter tuning? Does it use a similar method?
If not, can we combine it with SVMcgForClass_NoDisplay_linear?
Any response will be helpful.
Best,
Ziqian
Dear Lin,
I found that this nested CV adds a grid search in every iteration of the inner loop. If it is 5-fold, it computes 5 values of best-c and then takes their arithmetic mean, geometric mean, or power mean.
Here is also a description (translated from Chinese):
The idea involves two loops: (1) the outer loop is ordinary cross-validation; (2) the inner loop is a sub-optimization problem that uses grid search to find the optimal parameters for the model in the current sub-problem. Grid search simply enumerates a finite set of points in parameter space (each point corresponding to one set of parameters); each set of parameters yields one model performance, and the best-performing model is selected.
Cross-validation with k folds therefore ends up with k sets of model parameters; if your model is stable, these sets of parameters should be similar.
I don’t know if this is the state of the art, but it should be a good way to address the problem of information 'leaking'.
Could we manage to implement this with your wonderful libsvm toolbox?
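As a small illustration of the aggregation step mentioned above, here is a sketch (in Python; `aggregate_c` is a hypothetical helper name; the geometric mean is often a natural choice because C is typically searched on a log2 grid, and the power mean generalizes both):

```python
import math

def aggregate_c(best_cs, how="geometric"):
    """Combine the per-outer-fold best C values into one final C."""
    if how == "arithmetic":
        return sum(best_cs) / len(best_cs)
    if how == "geometric":
        # Mean in log space, i.e. the midpoint of the log2 search grid.
        return math.exp(sum(math.log(c) for c in best_cs) / len(best_cs))
    raise ValueError("unknown aggregation: " + how)
```

Note, however, that in strict nested CV the aggregated C is only reported for interpretation; the outer-loop error estimate already accounts for the tuning, so one should not re-estimate error with the aggregated C on the same data.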
Best,
Ziqian