Priority substrings #7

Open
asayers opened this issue Aug 31, 2015 · 3 comments
Comments

asayers commented Aug 31, 2015

cpsm does a good job of guessing the right file based on very little information, but the user probably has some additional information which could help it along; for instance:

  • I'm more likely to want files with a ".c" extension than ".pcap"
  • I'm more likely to want files in "src/" than in "test-data/"
  • etc.

I'm not sure what the best way to exploit this information is. One option would be to read a variable containing a list of exact substrings which, if found in a candidate, would increase that match's score. Thoughts?
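
To make the substring-list idea concrete, here is a rough sketch of what it might look like. Everything in it is an assumption for illustration: `PRIORITY_SUBSTRINGS`, `BOOST`, and `boosted_score` are hypothetical names, the base scores are made up, and none of this reflects how cpsm actually scores matches.

```python
# Hypothetical sketch: none of these names or numbers come from cpsm.
PRIORITY_SUBSTRINGS = ["src/", ".c"]  # assumed user-supplied list
BOOST = 1.0  # arbitrary weight, chosen only for illustration


def boosted_score(path, base_score, substrings=PRIORITY_SUBSTRINGS, boost=BOOST):
    """Add a fixed bonus for every priority substring found in the path."""
    hits = sum(1 for s in substrings if s in path)
    return base_score + boost * hits


# Made-up base scores standing in for whatever the matcher produced.
candidates = {"test-data/dump.pcap": 2.5, "src/main.c": 2.0}
ranked = sorted(
    candidates,
    key=lambda p: boosted_score(p, candidates[p]),
    reverse=True,
)
```

With these made-up numbers the source file overtakes the capture file, but only because of the weight picked for `BOOST`. Note also that a plain substring such as ".c" would match ".cpp" or ".css" too.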

Owner

nixprime commented Sep 2, 2015

UI is the trickiest part of a feature like this. The problem is basically that it's not obvious how strongly a priority substring match should be weighted; in fact, I suspect the answer depends on the user and substring. Nor do I want to expose details about cpsm's scoring algorithm in the UI, since they're subject to change.

Author

asayers commented Sep 2, 2015

You're right - it's tricky. One relatively simple approach would be to create a partition: all prioritised entries score higher than any non-prioritised entry. My problem, as the examples above may have suggested, is that I sometimes blindly hit enter and get a screen full of binary test data. In my case it would certainly make sense to return non-source files only when all source files have been ruled out.

(I realise I could simply exclude those files from the file list - in my case it would be reasonable - but in general it seems like a fairly drastic option.)
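
The partition idea above could be sketched roughly as follows. This is only an illustration under assumptions: `partitioned_rank` is a hypothetical helper, the scores are invented, and cpsm's real scoring is not involved. The point is that no weight needs to be chosen, since the priority flag is compared before the score.

```python
def partitioned_rank(scored, priority_substrings):
    """Rank (path, base_score) pairs so prioritised paths always come first.

    Hypothetical helper, not part of cpsm. Within each partition the
    original score still decides the order.
    """
    def key(item):
        path, score = item
        prioritised = any(s in path for s in priority_substrings)
        # Tuples compare element by element: partition first, then score.
        return (prioritised, score)

    return [path for path, _ in sorted(scored, key=key, reverse=True)]


# Invented scores: the binary file scores highest but is not prioritised.
scored = [("test-data/a.pcap", 9.0), ("src/a.c", 1.0), ("src/b.c", 3.0)]
result = partitioned_rank(scored, ["src/"])
```

Here the non-source file is returned last even though its score is the highest, so it would only be picked once every source file has been ruled out.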

@LemonBoy
Contributor

It probably makes sense to take the suffixes option into account during the scoring process.
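
A rough sketch of that suggestion, assuming it refers to Vim's 'suffixes' option (a comma-separated list of file suffixes that get lower priority). `demote_by_suffix` is a hypothetical helper, the option value is passed in as a plain string rather than read from Vim, and escaped commas in the option are ignored for simplicity.

```python
def demote_by_suffix(scored, suffixes_option):
    """Rank (path, base_score) pairs, pushing paths with a listed suffix last.

    Hypothetical helper, not part of cpsm. `suffixes_option` is assumed to
    be a comma-separated string such as ".bak,~,.o".
    """
    suffixes = tuple(s for s in suffixes_option.split(",") if s)

    def key(item):
        path, score = item
        demoted = path.endswith(suffixes)
        # Non-demoted paths sort first; score breaks ties within a group.
        return (not demoted, score)

    return [path for path, _ in sorted(scored, key=key, reverse=True)]


# Invented scores: the object file scores higher but has a demoted suffix.
demo = demote_by_suffix([("main.o", 5.0), ("main.c", 2.0)], ".bak,~,.o")
```

This is the same partition trick applied in the other direction: instead of promoting chosen substrings, it demotes suffixes the user has already told the editor to care less about.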
