Allow dashless input #3

Open
audreyt opened this issue Jun 18, 2013 · 2 comments

Comments


audreyt commented Jun 18, 2013

$ curl http://su-lip.magistry.fr/_ws_/ -d '{"query":"bokbing"}'
{"fuzzy":[]}

The system should have sufficient information to tokenize the incoming dashless strings into possible segmentations.

For bokbing there are two possible segmentations: bok-bi-ng 莫美秧 (which should match nothing), and bok-bing, which matches the usual 莫名*.
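
As a rough illustration of what "possible segmentations" means here, the sketch below enumerates every way to split a dashless string into known syllables. The function name `segmentations` and the tiny `SYLLABLES` inventory are hypothetical and chosen only to cover this example; the service's real syllable inventory is not shown in this thread.

```python
def segmentations(text, syllables):
    """Return every way to split `text` into items of `syllables`."""
    if not text:
        # The empty string has exactly one segmentation: no syllables.
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in syllables:
            # Keep this prefix and segment the remainder recursively.
            for tail in segmentations(text[end:], syllables):
                results.append([head] + tail)
    return results


# Hypothetical inventory: just enough syllables for the example above.
SYLLABLES = {"bok", "bi", "ng", "bing"}

candidates = ["-".join(seg) for seg in segmentations("bokbing", SYLLABLES)]
print(candidates)  # ['bok-bi-ng', 'bok-bing']
```

Each candidate could then be looked up separately, so that bok-bi-ng simply returns no match while bok-bing finds the expected entry.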

Owner

a-tsioh commented Jun 18, 2013

There is sufficient information, but for now I'm using a PEG parser for the TRS, and it can't deal cleanly with ambiguity.
It may be able to deal with simple dashless cases (finding bok-bing but not bok-bi-ng).
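
To illustrate why a PEG-style matcher yields a single reading, here is a minimal sketch that imitates ordered choice with a greedy, longest-first match and no backtracking. It is not the project's actual grammar: `greedy_segment`, the longest-first ordering, and the `SYLLABLES` inventory are all assumptions made for the example.

```python
def greedy_segment(text, syllables):
    """Segment `text` by always committing to the longest matching syllable.

    Returns one segmentation, or None if the greedy choices hit a dead end.
    """
    # Assumed ordering: longer alternatives are tried first, as one might
    # order the alternatives of a PEG choice.
    ordered = sorted(syllables, key=len, reverse=True)
    result = []
    pos = 0
    while pos < len(text):
        for syl in ordered:
            if text.startswith(syl, pos):
                # Commit to the first alternative that matches; never revisit.
                result.append(syl)
                pos += len(syl)
                break
        else:
            return None  # no alternative matched at this position
    return result


# Hypothetical inventory: just enough syllables for the example above.
SYLLABLES = {"bok", "bi", "ng", "bing"}

print(greedy_segment("bokbing", SYLLABLES))  # ['bok', 'bing']
```

Under these assumptions the matcher finds bok-bing and never considers bok-bi-ng, because it commits to "bing" as soon as that alternative succeeds.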

Another solution could be to replace the PEG with something like CKY, but this will need more coding.
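
For comparison, a chart-based approach in the spirit of CKY could look roughly like the sketch below. It fills a table of spans bottom-up and keeps every analysis of each span, so ambiguous inputs yield all readings. This is a simplification: CKY proper operates on a grammar in Chomsky normal form, whereas here the grammar is assumed to be just "a word is one or more syllables". The name `chart_segmentations` and the `SYLLABLES` inventory are hypothetical.

```python
def chart_segmentations(text, syllables):
    """Return all segmentations of `text`, built bottom-up over spans."""
    n = len(text)
    if n == 0:
        return []
    chart = {}  # (start, end) -> set of syllable tuples covering text[start:end]
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            end = start + length
            cell = set()
            span = text[start:end]
            if span in syllables:
                cell.add((span,))
            # Combine analyses of two shorter adjacent spans, as CKY does.
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        cell.add(left + right)  # the set removes duplicates
            chart[(start, end)] = cell
    return sorted(chart[(0, n)])


# Hypothetical inventory: just enough syllables for the example above.
SYLLABLES = {"bok", "bi", "ng", "bing"}

for seg in chart_segmentations("bokbing", SYLLABLES):
    print("-".join(seg))
```

Storing tuples in a set keeps a segmentation from being counted once per bracketing, at the cost of extra memory on long inputs.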

Author

audreyt commented Jun 18, 2013

I see, so it's because the PEG matcher can't be coaxed into giving ambiguous parses?

I think bokbing => bok-bing is good for now, certainly better than nothing at all. :-)
