Default fuzzy searching speed #848

jojojames · 2017-01-07T22:08:15Z

The default fuzzy speed seems to be a little slow when I compare it with evaling this snippet from

(with-eval-after-load 'ivy
  (require 'flx)
  (defvar my/ivy-cache (flx-make-string-cache))
  (defvar my/ivy-flx-limit 500)

  (defun nadvice/ivy--filter (name candidates)
    (if (or (= (length name) 0)
            (string= name "^"))
        candidates
      (let* (;; an optimized regex for fuzzy matching
             ;; "abc" → "\\`[^a]*a[^b]*b[^c]*c"
             (fuzzy-regex (if (= (elt name 0) ?^)
                              (concat "^"
                                      (regexp-quote (substring name 1 2))
                                      (mapconcat
                                       (lambda (x)
                                         (setq x (string x))
                                         (concat "[^" x "]*" (regexp-quote x)))
                                       (substring name 2)
                                       ""))
                            (concat "^"
                                    (mapconcat
                                     (lambda (x)
                                       (setq x (string x))
                                       (concat "[^" x "]*" (regexp-quote x)))
                                     name
                                     ""))))

             (flx-name (if (= (elt name 0) ?^)
                           (substring name 1)
                         name))

             ;; disable side-effects of string-match
             (inhibit-changing-match-data t)
             (cands-left)
             (cands-to-sort))

        ;; filter out non-matching candidates
        (dolist (cand candidates)
          (when (string-match fuzzy-regex cand)
            (push cand cands-left)))

        ;; pre-sort the candidates by length before partitioning
        (setq cands-left (sort cands-left
                               (lambda (c1 c2)
                                 (< (length c1)
                                    (length c2)))))

        ;; partition the candidates into sorted and unsorted groups
        (dotimes (_n (min (length cands-left) my/ivy-flx-limit))
          (push (pop cands-left) cands-to-sort))

        (append
         ;; compute all of the flx scores in one pass and sort
         (mapcar #'car
                 (sort (mapcar
                        (lambda (cand)
                          (cons cand
                                (car (flx-score cand
                                                flx-name
                                                my/ivy-cache))))
                        cands-to-sort)
                       (lambda (c1 c2)
                         (> (cdr c1)
                            (cdr c2)))))

         ;; add the unsorted candidates
         cands-left))))

  (advice-add 'ivy--filter :override #'nadvice/ivy--filter))

I included a video here.

https://dl.dropboxusercontent.com/u/11400324/Untitled.m4v

When I type 'adengine' with the default ivy fuzzy, it has a noticeable 1-2 second pause before going through with the search. With the eval'd code, I can get through the entire adengine text with only minor pauses.

Using ivy-20170105.711

My own ivy config

(use-package ivy
  :ensure t)

(use-package counsel
  :ensure t
  :config
  ;; Disable for now while trying grizzl.
  (global-set-key (kbd "M-x") 'counsel-M-x)
  (global-set-key (kbd "s-x") 'counsel-M-x))

(use-package swiper
  :ensure t
  :diminish ivy-mode
  :config
  (if (jojo/imac-p)
      (setq ivy-flx-limit 100)
    (setq ivy-flx-limit 500))
  ;; (ivy-mode)
  (setq ivy-re-builders-alist
        '((t . ivy--regex-fuzzy)))

  (setq ivy-initial-inputs-alist nil)

  ;; swapping behavior
  (define-key ivy-minibuffer-map (kbd "RET") 'ivy-alt-done)
  (define-key ivy-minibuffer-map (kbd "C-j") 'ivy-done)

  ;; default: "ag --nocolor --nogroup %s -- ."
  (setq counsel-ag-base-command "ag -U --nocolor --nogroup %s -- .")
  (setq ivy-count-format "")
  (setq ivy-height 15))

abo-abo · 2017-01-09T14:23:20Z

Can you provide an example collection that you're using ivy's completion on. Example: a big file and a list of terms to search for. I can't really reproduce your Projectile project.

jojojames · 2017-01-10T04:47:28Z

Will do, as time allows.

jojojames · 2017-01-14T04:57:48Z

This will work.

Searching for 'adengine' will reproduce the behavior.

https://github.com/jojojames/ivy-fuzzy-sample/tree/master/MyProfile

abo-abo · 2017-01-18T14:53:00Z

It seems that your project is stressing the limits of the regex engine. Entering adengine into projectile-find-file on your project results in the following data being fed into ivy--re-filter:

re
;; => "\\(a\\).*?\\(d\\).*?\\(e\\).*?\\(n\\).*?\\(g\\).*?\\(i\\).*?\\(n\\).*?\\(e\\)"

(length candidates)
;; => 8519

Running the function results in the following time - 0.4 seconds:

(benchmark-run 1
  (ivy--re-filter re candidates))
;; =>
;; (0.370236854 0 0.0)

So all we do here is test the above regex on 8519 strings and it takes that much time. Even simplifying the regex to (setq re "a.*d.*e.*n.*g.*i.*n.*e") does not improve the computation time.

The only thing that I can think of that can improve the computation time is performing the regex matching incrementally, i.e. assume all matches for adengine input must be a subset of all matches for adengin input. But that approach is quite flimsy with respect to complex regex, I'd rather avoid it.

@PythonNut, thoughts on this?

PythonNut · 2017-01-19T01:47:19Z

@abo-abo I'm not sure yet. I have several ideas, but I might not have time to check them for a while.

It might be faster to manually loop instead of using cl-remove-if(-not)?.
You might be better off with the regex of the form a[^d]*d instead of a.*d as this forces non-greedy matching.

Also, @jojojames the code in the advice I wrote is essentially the same as the code that's currently deployed by ivy, as of #843. If you're seeing large speedups, I can only guess that either something very strange is going on, I messed up some performance critical detail, or the code isn't being run at all. Can you confirm that the order of the candidates in ivy is the same for both versions in all circumstances?

jojojames · 2017-01-19T02:16:45Z

@PythonNut The results are different (at least the first few) for keyword 'adengine'

PythonNut · 2017-01-24T06:38:40Z

@jojojames that seems to imply to me that the fuzzy sorting is being bypassed for some reason, which is why it's so fast.

jojojames · 2017-01-25T19:18:24Z

That's strange. I wonder what kind of matching I'm getting then.

In another project,

Typing 'abc',

I'll get results like this,

src/Something/[A]pp.ZZZ/[B]aseActivity.[c]s
src/Something/[A]pp.ZZZ/[B]aseFragment.[c]s
M[ab]rZZZZ[C]Rules.ruleset
[A]rgument/T[b]App/libs/[c]ommon.jar

and so on.

PythonNut · 2017-02-03T02:57:24Z

@jojojames I can't really tell, but that doesn't look right. M[ab]rZZZZ[C]Rules.ruleset (score 44) should be below [A]rgument/T[b]App/libs/[c]ommon.jar (score 139) normally, I think.

jojojames · 2017-02-09T22:09:59Z

@PythonNut I can try and provide more concrete results given a specific search to narrow down the problem.

As an aside, it'd be great if we could somehow integrate (either through Emacs25 modules or through a background process) something like this junegunn/fzf#764.

Vim has always had really fast fuzzy finders.

andyleejordan · 2018-10-10T06:09:10Z

I was running into this too, and the idea that popped into my head was (for files anyway): apply the flx regex to each path portion, instead of across the whole path. So, break alpha/beta/gamma into three sub-patterns: alpha, beta, and gamma, apply flx to each of these individually, then sum the scores together for the score of the whole path.

Maybe someone could help me prototype this?

With https://github.com/Microsoft/openenclave, M-x counsel-git forreleasing is matching first the path:

3rdparty/libcxx/libcxx/test/libcxx/debug/containers/db_string.pass.cpp
and then a bunch of others like this, and finally at the way bottom of the list (the one I want):
docs/Releasing.md

(Note that docsreleasing doesn't improve the results.)

I figure 99% of what someone wants when using flx on a file path is to match against the file and directory name components.

Even better would be if docs releasing applied docs and releasing as two flx patterns to the components (this makes more sense for docs govmod to open docs/GovernanceModel.md). I know, I know, at this point it's getting kind of similar to ivy--regex-plus but I really like and am addicted to flx matching.

dsedivec · 2018-11-26T00:57:47Z

@abo-abo I'm not sure yet. I have several ideas, but I might not have time to check them for a while.

It might be faster to manually loop instead of using cl-remove-if(-not)?.

You might be better off with the regex of the form a[^d]*d instead of a.*d as this forces non-greedy matching.

In my profiling, most of the time from jojojames's example is spent in string-match-p, so I'm guessing the cl-remove-if is not the problem, at least in this specific case. I decided to give the second idea a spin:

(defun my:ivy--regex-fuzzy (str)
  "Build a regex sequence from STR.
Insert the moral equivalent of .* between each char."
  (if (string-match "\\`\\(\\^?\\)\\(.*?\\)\\(\\$?\\)\\'" str)
      (prog1
          (concat (match-string 1 str)
                  (cl-loop
                     for ch across (match-string 2 str)
                     for sep = "" then (format "[^%c]*" ch)
                     concat sep
                     concat (format "\\(%s\\)"
                                    (regexp-quote (char-to-string ch))))
                  (match-string 3 str))
        (setq ivy--subexps (length (match-string 2 str))))
    str))

;; Build candidates as with `projectile-find-file' in the ivy-fuzzy-sample repo
;; (current buffer is a dired on the root of that repo)
ELISP> (current-buffer)
#<buffer ivy-fuzzy-sample>
ELISP> (and (setq candidates
		  (funcall (let* ((pr (project-current t))
				  (dirs (project-roots pr)))
			     (project-file-completion-table pr dirs)) "" nil t))
	    (length candidates))
7947 (#o17413, #x1f0b)

;; Current ivy--regex-fuzzy
ELISP> (ivy--regex-fuzzy "adengine")
"\\(a\\).*?\\(d\\).*?\\(e\\).*?\\(n\\).*?\\(g\\).*?\\(i\\).*?\\(n\\).*?\\(e\\)"
ELISP> (benchmark-run 1 (ivy--re-filter (ivy--regex-fuzzy "adengine") candidates))
(0.482889 0 0.0)

;; New my:ivy--regex-fuzzy
ELISP> (my:ivy--regex-fuzzy "adengine")
"\\(a\\)[^d]*\\(d\\)[^e]*\\(e\\)[^n]*\\(n\\)[^g]*\\(g\\)[^i]*\\(i\\)[^n]*\\(n\\)[^e]*\\(e\\)"
ELISP> (benchmark-run 1 (ivy--re-filter (my:ivy--regex-fuzzy "adengine") candidates))
(0.047701 0 0.0)

;; They return the same results
ELISP> (equal (ivy--re-filter (ivy--regex-fuzzy "adengine") candidates)
	      (ivy--re-filter (my:ivy--regex-fuzzy "adengine") candidates))
t

I ran this several times and those times are fairly stable.

(Next paragraph edited from original post.)

One thing I have noticed is that I think the matches I get are sorted differently, and I don't know why. The set of candidates ends up being the same in this one case, but I am concerned that the using [^x]* instead of .*? may not always yield the same results. However, I am currently unable to think of a single example in which they would differ. The match-data for every candidate in this test set matches exactly between the new and old regexps.

The only other idea I had for speeding up the existing fuzzy matching is to notice when I type faster than the candidates can be filtered: Right now I can type "adengine" and Ivy will search for "a" then "ad" then "ade", and so on. However, by the time it's searching "aden", for example, I believe I have already completed typing my whole search string, "adengine". Could Ivy try to consume more than just the single next character of input when updating the candidates? In other words, if filtering on "aden" takes 400 ms, and by the time that search is complete I've added the remaining "gine" (of "adengine") to the input buffer, could Ivy skip searching "adeng", "adengi", "adengin" and just skip straight to searching "adengine"?

Still seems unfortunate that there may be that long pause between "aden" and "adengine" but it's better than multiple long pauses, I think.

Alexander-Shukaev · 2018-11-26T01:18:28Z

There is some relation with #1812, where cl-remove-if-not is the killer.

dsedivec · 2018-11-26T04:32:18Z

Answering my own question:

One thing I have noticed is that I think the matches I get are sorted differently, and I don't know why.

Mystery solved, I think: ivy--sort gives flx sorting only when ivy--regex-fuzzy is in use. With a little advice to fool ivy--sort by setting ivy--regex-function, it looks like I get the same sorting from projectile-find-file whether I use ivy--regex-fuzzy or my:ivy--regex-fuzzy, which is good.

The advice:

;; Must get `my:ivy--regex-fuzzy' to sort as with `ivy--regex-fuzzy'.
(define-advice ivy--sort
    (:around (orig-fun &rest args) my:sort-my-fuzzy-with-flx)
  (let ((ivy--regex-function (if (eq ivy--regex-function
                                     'my:ivy--regex-fuzzy)
                                 'ivy--regex-fuzzy
                               ivy--regex-function)))
    (apply orig-fun args)))

abo-abo · 2018-11-26T15:43:25Z

@dsedivec Thanks for the detailed benchmark. I like your idea of switching from a non-greedy regex to a greedy one. That should speed up things.

@PythonNut

* ivy-test.el (ivy--regex-fuzzy): Update tests. Thanks to @PythonNut and @dsedivec for the idea. Re #848

abo-abo · 2018-11-26T16:47:04Z

I've prototyped a version of counsel-describe-function that uses fzf. It's a little ugly. The ugliness of Elisp can be fixed with a couple functions. But since fzf offers no deamon option, it will re-read the saved candidates file on each key press. This will remain ugly, but is likely still performant enough. Have a look:

(defun counsel-describe-function-fzf-function (str)
  (setq ivy--old-re (ivy--regex-fuzzy str))
  (or (ivy-more-chars)
      (progn
        (counsel--async-command
         (format "cat /tmp/counsel-describe-function-fzf | fzf -f '%s'" str))
        nil)))

(defun counsel-describe-function-fzf ()
  "Forward to `describe-function'.

Interactive functions (i.e., commands) are highlighted according
to `ivy-highlight-face'."
  (interactive)
  (with-temp-buffer
    (insert (mapconcat #'identity
                       (all-completions "" obarray
                                        (lambda (sym)
                                          (or (fboundp sym)
                                              (get sym 'function-documentation))))
                       "\n"))
    (write-file "/tmp/counsel-describe-function-fzf"))
  (let ((enable-recursive-minibuffers t))
    (ivy-read "Describe function: " #'counsel-describe-function-fzf-function
              :require-match t
              :history 'counsel-describe-symbol-history
              :keymap counsel-describe-map
              :preselect (funcall counsel-describe-function-preselect)
              :action (lambda (x)
                        (funcall counsel-describe-function-function (intern x)))
              :dynamic-collection t
              :caller 'counsel-describe-function-fzf)))

A second improvement would be to use call-process-region in place of a temporary file, but counsel--async-command doesn't yet have an API for that.

basil-conto · 2018-11-26T17:51:36Z

Here's an alternative way of generating the temporary file which should also enable recursive completion sessions:

diff --git a/counsel.el b/counsel.el
index f6ff507..2b87e31 100644
--- a/counsel.el
+++ b/counsel.el
@@ -1,10 +1,12 @@
+(defvar counsel--describe-function-fzf-file null-device
+  "Input file for `counsel-describe-function-fzf-function'.")
+
 (defun counsel-describe-function-fzf-function (str)
   (setq ivy--old-re (ivy--regex-fuzzy str))
   (or (ivy-more-chars)
-      (progn
-        (counsel--async-command
-         (format "cat /tmp/counsel-describe-function-fzf | fzf -f '%s'" str))
-        nil)))
+      (ignore (counsel--async-command
+               (format "fzf -f '%s' < '%s'"
+                       str counsel--describe-function-fzf-file)))))
 
 (defun counsel-describe-function-fzf ()
   "Forward to `describe-function'.
@@ -12,15 +14,13 @@ counsel-describe-function-fzf
 Interactive functions (i.e., commands) are highlighted according
 to `ivy-highlight-face'."
   (interactive)
-  (with-temp-buffer
-    (insert (mapconcat #'identity
-                       (all-completions "" obarray
-                                        (lambda (sym)
-                                          (or (fboundp sym)
-                                              (get sym 'function-documentation))))
-                       "\n"))
-    (write-file "/tmp/counsel-describe-function-fzf"))
-  (let ((enable-recursive-minibuffers t))
+  (let* ((enable-recursive-minibuffers t)
+         (fns (all-completions "" obarray
+                               (lambda (sym)
+                                 (or (fboundp sym)
+                                     (get sym 'function-documentation)))))
+         (counsel--describe-function-fzf-file
+          (make-temp-file "counsel-" nil nil (mapconcat #'identity fns "\n"))))
     (ivy-read "Describe function: " #'counsel-describe-function-fzf-function
               :require-match t
               :history 'counsel-describe-symbol-history

Only problem is, the temporary file should be deleted when it's no longer needed.

(Note that with-temp-buffer+write-file can be written using with-temp-file instead.)

jojojames · 2022-05-31T18:34:00Z

I implemented a completion-style supporting various fuzzy matching backends. It includes support for the rust based/native module/c backends that (from quick benchmarking) are 10x faster than the elisp flx option.

https://github.com/jojojames/fussy

For ivy to be able to leverage this though, looks like we need to implement completion-styles support (#2321 here?).

jojojames · 2022-06-01T05:22:16Z

Took a quick look and found I can just stub in the flx-score call here with fussy's call.

(defun ivy--flx-sort (name cands)
  "Sort according to closeness to string NAME the string list CANDS."
  (condition-case nil
      (let* ((bolp (= (string-to-char name) ?^))
             ;; An optimized regex for fuzzy matching
             ;; "abc" → "^[^a]*a[^b]*b[^c]*c"
             (fuzzy-regex (concat "\\`"
                                  (and bolp (regexp-quote (substring name 1 2)))
                                  (mapconcat
                                   (lambda (x)
                                     (setq x (char-to-string x))
                                     (concat "[^" x "]*" (regexp-quote x)))
                                   (if bolp (substring name 2) name)
                                   "")))
             ;; Strip off the leading "^" for flx matching
             (flx-name (if bolp (substring name 1) name))
             cands-left
             cands-to-sort)

        ;; Filter out non-matching candidates
        (dolist (cand cands)
          (when (string-match-p fuzzy-regex cand)
            (push cand cands-left)))

        ;; pre-sort the candidates by length before partitioning
        (setq cands-left (cl-sort cands-left #'< :key #'length))

        ;; partition the candidates into sorted and unsorted groups
        (dotimes (_ (min (length cands-left) ivy-flx-limit))
          (push (pop cands-left) cands-to-sort))

        (nconc
         ;; Compute all of the flx scores in one pass and sort
         (mapcar #'car
                 (sort (mapcar
                        (lambda (cand)
                          (cons cand
                                (car
                                 ;; (flx-score cand flx-name
                                 ;;            ivy--flx-cache
                                 ;;            )


                                 ;; SWAP THIS IN!
                                 (funcall
                                  fussy-score-fn
                                  cand flx-name
                                  ivy--flx-cache
                                  ))))
                        cands-to-sort)
                       (lambda (c1 c2)
                         ;; Break ties by length
                         (if (/= (cdr c1) (cdr c2))
                             (> (cdr c1)
                                (cdr c2))
                           (< (length (car c1))
                              (length (car c2)))))))
         ;; Add the unsorted candidates
         cands-left))
    (error cands)))

The wins don't seem that great in terms of performance (slight win?) but at least you'd be able to select the scoring backend to use instead of just flx. Not sure, I didn't benchmark perfectly. I think this area could use some improvement as it seems quite slow compared to the selectrum/vertico/icomplete etc.

On the other hand, the equivalent call (M-x describe-symbol) with 30,000 candidates can be filtered in about 30-40 ms with vertico+fussy. (See https://github.com/jojojames/fussy#fast for a handwavy benchmark) whereas the calls with flx+ido using this ivy-sort here goes past 100ms. (Not sure if if it was much more than that but definitely slower than vertico+fussy)

Think it may be because we're doing 2-3 phases of filtering (very slow) before we finally get to scoring. e.g. I'm guessing we filter initially for the collection, then here, we're filtering again! using the fuzzy regex and then finally we're scoring.

You can see in the implementation here (for completion-styles support), we just filter once (through the quick C all-completions passing in the flex regex to it, and then scoring over the filtered candidates.

abo-abo added a commit that referenced this issue Nov 26, 2018

ivy.el (ivy--regex-fuzzy): Optimize regex "(a).*?(b)" -> "(a)[^b]*(b)"

7b6765b

* ivy-test.el (ivy--regex-fuzzy): Update tests. Thanks to @PythonNut and @dsedivec for the idea. Re #848

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default fuzzy searching speed #848

Default fuzzy searching speed #848

jojojames commented Jan 7, 2017

abo-abo commented Jan 9, 2017

jojojames commented Jan 10, 2017

jojojames commented Jan 14, 2017

abo-abo commented Jan 18, 2017

PythonNut commented Jan 19, 2017

jojojames commented Jan 19, 2017

PythonNut commented Jan 24, 2017

jojojames commented Jan 25, 2017

PythonNut commented Feb 3, 2017

jojojames commented Feb 9, 2017

andyleejordan commented Oct 10, 2018

dsedivec commented Nov 26, 2018 •

edited

Loading

Alexander-Shukaev commented Nov 26, 2018

dsedivec commented Nov 26, 2018

abo-abo commented Nov 26, 2018

abo-abo commented Nov 26, 2018

basil-conto commented Nov 26, 2018 •

edited

Loading

jojojames commented May 31, 2022

jojojames commented Jun 1, 2022

Default fuzzy searching speed #848

Default fuzzy searching speed #848

Comments

jojojames commented Jan 7, 2017

abo-abo commented Jan 9, 2017

jojojames commented Jan 10, 2017

jojojames commented Jan 14, 2017

abo-abo commented Jan 18, 2017

PythonNut commented Jan 19, 2017

jojojames commented Jan 19, 2017

PythonNut commented Jan 24, 2017

jojojames commented Jan 25, 2017

PythonNut commented Feb 3, 2017

jojojames commented Feb 9, 2017

andyleejordan commented Oct 10, 2018

dsedivec commented Nov 26, 2018 • edited Loading

Alexander-Shukaev commented Nov 26, 2018

dsedivec commented Nov 26, 2018

abo-abo commented Nov 26, 2018

abo-abo commented Nov 26, 2018

basil-conto commented Nov 26, 2018 • edited Loading

jojojames commented May 31, 2022

jojojames commented Jun 1, 2022

dsedivec commented Nov 26, 2018 •

edited

Loading

basil-conto commented Nov 26, 2018 •

edited

Loading