Hi,
The current code compares data frames by trying all permutations of the columns, without using the column names. I guess this is OK for a very small number of columns, but it doesn't scale to some other datasets. For example, in the KaggleDBQA dataset, some queries return 26 columns, and 26! = 403,291,461,126,605,635,584,000,000 ~= pretty much infinity :) The evaluation just hangs.
For my local experiments with KaggleDBQA, I changed the matching code to build Pandas DataFrames with named columns, then sort the columns by name before comparing the DataFrame values. The column names don't have to match exactly between ground truth and prediction, but their order needs to match.
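To make the idea concrete, here is a minimal sketch of the name-sorted comparison. It uses plain lists and tuples instead of Pandas so it stays dependency-free, and the function names `sort_columns_by_name` and `results_match` are hypothetical. Ignoring row order is an assumption on my side, not necessarily what the repository does.

```python
from collections import Counter


def sort_columns_by_name(columns, rows):
    """Reorder every row so its values follow the alphabetical order of the column names."""
    order = sorted(range(len(columns)), key=lambda i: columns[i])
    return [tuple(row[i] for i in order) for row in rows]


def results_match(gold_cols, gold_rows, pred_cols, pred_rows):
    """Compare two result sets after sorting each one's columns by name.

    Only the sorted position of a column matters, so the names themselves
    do not have to be identical between ground truth and prediction.
    """
    if len(gold_cols) != len(pred_cols):
        return False
    gold = sort_columns_by_name(gold_cols, gold_rows)
    pred = sort_columns_by_name(pred_cols, pred_rows)
    # Assumption: rows are compared as multisets, so row order is ignored.
    return Counter(gold) == Counter(pred)


gold_cols = ["name", "age"]
gold_rows = [("Ann", 30), ("Bob", 25)]
pred_cols = ["age", "name"]
pred_rows = [(25, "Bob"), (30, "Ann")]
same = results_match(gold_cols, gold_rows, pred_cols, pred_rows)
```

The cost is one sort over the column names per result set, instead of one comparison per column permutation.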
I wanted to ask here if you'd be willing to accept such a change in your repository. If yes, I can clean up the code and open a PR.
If you're worried that it may change the results on the existing datasets, perhaps we could still do the column permutation thing if the number of columns is low (maybe up to 5 or so, or whatever the maximum number of returned columns is in the datasets you officially support). What do you think?
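A rough sketch of that hybrid, again in plain Python rather than Pandas: keep the permutation search up to a column threshold and switch to the name-sorted comparison above it. The name `results_match_hybrid`, the default threshold of 5, and the choice to ignore row order are all assumptions for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical threshold; the right value depends on the supported datasets.
MAX_PERMUTATION_COLUMNS = 5


def _reorder(rows, order):
    """Return the rows as a multiset, with each row's values rearranged by `order`."""
    return Counter(tuple(row[i] for i in order) for row in rows)


def results_match_hybrid(gold_cols, gold_rows, pred_cols, pred_rows,
                         max_perm_cols=MAX_PERMUTATION_COLUMNS):
    n = len(gold_cols)
    if n != len(pred_cols):
        return False
    if n <= max_perm_cols:
        # Few columns: try every column permutation and ignore the names.
        gold = Counter(tuple(row) for row in gold_rows)
        return any(_reorder(pred_rows, p) == gold for p in permutations(range(n)))
    # Many columns: align both sides by sorted column name instead.
    gold_order = sorted(range(n), key=lambda i: gold_cols[i])
    pred_order = sorted(range(n), key=lambda i: pred_cols[i])
    return _reorder(gold_rows, gold_order) == _reorder(pred_rows, pred_order)


# 26 columns: the name-sorted branch returns immediately.
wide_cols = [chr(ord("a") + i) for i in range(26)]
wide_gold = [tuple(range(26))]
wide_pred = [tuple(reversed(range(26)))]
wide_same = results_match_hybrid(wide_cols, wide_gold, wide_cols[::-1], wide_pred)
```

With this split, results on datasets that stay under the threshold should be unchanged, because they still go through the permutation branch.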