Micro-optimize list index range checks #9784

rhettinger · 2018-10-10T03:59:42Z

Old code

_list_item:
    testq   %rsi, %rsi 
    js  L282
    cmpq    %rsi, 16(%rdi)
    jg  L283
    ...
L283:
    movq    24(%rdi), %rax
    movq    (%rax,%rsi,8), %rax
    addq    $1, (%rax)
    ret

New code

_list_item:
    cmpq    16(%rdi), %rsi
    jb  L282
    ...
L282:
    movq    24(%rdi), %rax
    movq    (%rax,%rsi,8), %rax
    addq    $1, (%rax)
    ret
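
For readers who prefer C to assembly, here is a minimal sketch of the two checks behind the listings above. The one-comparison form follows the (size_t) i < (size_t) limit expression from this PR, and the two-comparison form is the i < 0 || i >= Py_SIZE(...) pattern it replaces. The function names, and the use of ptrdiff_t in place of Py_ssize_t, are assumptions made so the snippet compiles without the Python headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: ptrdiff_t stands in for Py_ssize_t in this sketch. */
typedef ptrdiff_t ssize_sketch_t;

/* Two-comparison form: a sign test plus an upper-bound test. */
static int
in_range_two_compares(ssize_sketch_t i, ssize_sketch_t size)
{
    return !(i < 0 || i >= size);
}

/* One-comparison form: a negative i converts to a very large unsigned
   value, so a single unsigned "below" test rejects it.
   Only equivalent to the form above when size >= 0. */
static int
in_range_one_compare(ssize_sketch_t i, ssize_sketch_t size)
{
    return (size_t) i < (size_t) size;
}

/* Returns 1 if both forms agree for every i in [-bound, bound]
   and every size in [0, bound], 0 otherwise. */
static int
forms_agree(ssize_sketch_t bound)
{
    assert(bound >= 0);
    for (ssize_sketch_t size = 0; size <= bound; size++) {
        for (ssize_sketch_t i = -bound; i <= bound; i++) {
            if (in_range_two_compares(i, size) != in_range_one_compare(i, size)) {
                return 0;
            }
        }
    }
    return 1;
}
```

The two forms can only disagree when size is negative, which is why the shortcut relies on the list size never being negative.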

serhiy-storchaka · 2018-10-10T04:31:38Z

Isn't this like bpo-28397?

pablogsal · 2018-10-10T10:51:18Z

It would be interesting to run Linux perf on some representative examples to understand how the function call is affecting cache misses, references and branch predictions. (See #6493 as an example).

vstinner · 2018-10-10T10:57:57Z

Objects/listobject.c

+       optimization manual found at:
+       https://www.agner.org/optimize/optimizing_cpp.pdf
+    */
+    return (size_t) i < (size_t) limit;


I'm not sure that the behaviour is well defined in C. I fear that it's Undefined Behaviour. @benjaminp @gpshead: What do you think?

If it's well defined, why should we hack in such a micro-optimization? Why wouldn't compilers implement the optimization themselves?

sir-sigurd · 2018-10-10

I think it's because they don't know that Py_SIZE(op) is non-negative.

serhiy-storchaka · 2018-10-10

It is well defined. It is used, for example, in STL implementations.

However, the previous discussion in bpo-28397 found no difference in microbenchmark results on 64-bit platforms.
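
To make the two points above concrete, here is a small sketch. The C standard defines signed-to-unsigned conversion as reduction modulo one more than the unsigned type's maximum value, so the cast itself has no undefined behaviour. The helper names are hypothetical, and ptrdiff_t is assumed as a stand-in for Py_ssize_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed-to-unsigned conversion wraps modulo SIZE_MAX + 1;
   this is defined behaviour, not undefined behaviour. */
static size_t
as_unsigned(ptrdiff_t i)
{
    return (size_t) i;
}

/* Hypothetical helper: the single unsigned comparison.
   It is only a valid range check when limit is non-negative;
   a negative limit wraps to a huge value and accepts almost any i. */
static int
unsigned_below(ptrdiff_t i, ptrdiff_t limit)
{
    return as_unsigned(i) < as_unsigned(limit);
}
```

For example, as_unsigned(-1) equals SIZE_MAX, so unsigned_below(-1, 5) is 0, while unsigned_below(0, -1) is 1. The second case is the one a compiler cannot rule out by itself, because it does not know the size is never negative.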

gpshead

Regardless of whether this change is measurable, I like the way the code looks afterwards, getting rid of the repeated, verbose i < 0 || i >= Py_SIZE(spam) everywhere. So +1 from me.

gpshead · 2018-10-10T18:44:06Z

Objects/listobject.c

@@ -208,6 +208,19 @@ PyList_Size(PyObject *op)
        return Py_SIZE(op);
 }

+static inline int
+valid_index(Py_ssize_t i, Py_ssize_t limit)


Why not just define this as taking two size_t parameters instead of doing the casting below? The casts would then happen implicitly at all call sites.
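
A sketch of what that suggestion might look like, for comparison. The name valid_index_sz and the caller are hypothetical, and ptrdiff_t again stands in for Py_ssize_t. This is not the form that appears in the diff above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant: the parameters are size_t, so signed arguments
   are converted implicitly at each call site and no cast is needed here. */
static inline int
valid_index_sz(size_t i, size_t limit)
{
    return i < limit;
}

/* Hypothetical caller passing signed values. */
static int
check_signed(ptrdiff_t i, ptrdiff_t size)
{
    /* Both arguments undergo implicit signed-to-unsigned conversion. */
    return valid_index_sz(i, size);
}
```

One trade-off of this design is that the conversions become invisible in the source, and builds using GCC's or Clang's -Wsign-conversion option would report each such call site. Explicit casts inside the helper keep the conversion in one documented place.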

rhettinger added 2 commits October 9, 2018 20:31

Optimize list_item

d152678

Convert the other range checks as well

65b589b

rhettinger added the performance, skip issue, and skip news labels Oct 10, 2018

rhettinger assigned serhiy-storchaka Oct 10, 2018

the-knights-who-say-ni added the CLA signed label Oct 10, 2018

bedevere-bot added the awaiting merge label Oct 10, 2018

serhiy-storchaka requested review from vstinner, serhiy-storchaka and skrah October 10, 2018 04:41

serhiy-storchaka removed their assignment Oct 10, 2018

vstinner reviewed Oct 10, 2018


gpshead approved these changes Oct 10, 2018


rhettinger merged commit f1aa8ae into python:master Oct 11, 2018

bedevere-bot removed the awaiting merge label Oct 11, 2018

lgeiger mentioned this pull request Sep 23, 2020

Add optimised 'Indirect BGEMM' binary convolution kernels. larq/compute-engine#516

Merged

This was referenced Jun 6, 2024

Use shared helper function for faster index range checks #120176

Closed

gh-120176: Reduce number of compares in index range checks #120181

Closed
