Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change subtraction overflow condition in sub_with_carry() #189

Merged
merged 1 commit into from
Feb 18, 2021

Conversation

chfast
Copy link
Owner

@chfast chfast commented Feb 18, 2021

The Clang compiler loves this pattern and will generate optimal subtraction procedure. And the code is also simpler although not symmetric to addition any more.

binop<uint256, uint256, sub>                                        -0.4896         -0.4896             3             2             3             2
binop<uint256, uint256, inline_sub>                                 -0.3636         -0.3635             2             1             2             1
binop<uint512, uint512, sub>                                        -0.4397         -0.4397             7             4             7             4
binop<uint512, uint512, inline_sub>                                 -0.5787         -0.5787             6             3             6             3

Before:

0000000000412390 <sub(intx::uint<256u> const&, intx::uint<256u> const&)>:                                                                                                                                                         
  412390:       53                      push   %rbx                                                                                                                                                                               
  412391:       4c 8b 06                mov    (%rsi),%r8                                                                                                                                                                         
  412394:       48 8b 4e 08             mov    0x8(%rsi),%rcx                                                                                                                                                                     
  412398:       48 2b 4a 08             sub    0x8(%rdx),%rcx                                                                                                                                                                     
  41239c:       41 0f 92 c1             setb   %r9b                                                                                                                                                                               
  4123a0:       4c 2b 02                sub    (%rdx),%r8                                                                                                                                                                         
  4123a3:       49 89 ca                mov    %rcx,%r10                                                                                                                                                                          
  4123a6:       49 83 da 00             sbb    $0x0,%r10                                                                                                                                                                          
  4123aa:       48 89 f8                mov    %rdi,%rax                                                                                                                                                                          
  4123ad:       49 39 ca                cmp    %rcx,%r10                                                                                                                                                                          
  4123b0:       0f 97 c1                seta   %cl                                                                                                                                                                                
  4123b3:       44 08 c9                or     %r9b,%cl                                                                                                                                                                           
  4123b6:       48 8b 7e 10             mov    0x10(%rsi),%rdi                                                                                                                                                                    
  4123ba:       48 2b 7a 10             sub    0x10(%rdx),%rdi                                                                                                                                                                    
  4123be:       41 0f 92 c1             setb   %r9b                                                                                                                                                                               
  4123c2:       44 0f b6 d9             movzbl %cl,%r11d                                                                                                                                                                          
  4123c6:       48 89 fb                mov    %rdi,%rbx                                                                                                                                                                          
  4123c9:       4c 29 db                sub    %r11,%rbx                                                                                                                                                                          
  4123cc:       48 39 fb                cmp    %rdi,%rbx                                                                                                                                                                          
  4123cf:       0f 97 c1                seta   %cl                                                                                                                                                                                
  4123d2:       44 08 c9                or     %r9b,%cl                                                                                                                                                                           
  4123d5:       48 8b 76 18             mov    0x18(%rsi),%rsi                                                                                                                                                                    
  4123d9:       48 2b 72 18             sub    0x18(%rdx),%rsi                                                                                                                                                                    
  4123dd:       0f b6 c9                movzbl %cl,%ecx                                                                                                                                                                           
  4123e0:       48 29 ce                sub    %rcx,%rsi                                                                                                                                                                          
  4123e3:       4c 89 00                mov    %r8,(%rax)
  4123e6:       4c 89 50 08             mov    %r10,0x8(%rax)
  4123ea:       48 89 58 10             mov    %rbx,0x10(%rax)
  4123ee:       48 89 70 18             mov    %rsi,0x18(%rax)
  4123f2:       5b                      pop    %rbx
  4123f3:       c3                      retq

After:

00000000004122d0 <sub(intx::uint<256u> const&, intx::uint<256u> const&)>:
  4122d0:       48 89 f8                mov    %rdi,%rax
  4122d3:       4c 8b 06                mov    (%rsi),%r8
  4122d6:       4c 2b 02                sub    (%rdx),%r8
  4122d9:       48 8b 7e 08             mov    0x8(%rsi),%rdi
  4122dd:       48 1b 7a 08             sbb    0x8(%rdx),%rdi
  4122e1:       48 8b 4e 10             mov    0x10(%rsi),%rcx
  4122e5:       48 1b 4a 10             sbb    0x10(%rdx),%rcx
  4122e9:       48 8b 76 18             mov    0x18(%rsi),%rsi
  4122ed:       48 1b 72 18             sbb    0x18(%rdx),%rsi
  4122f1:       4c 89 00                mov    %r8,(%rax)
  4122f4:       48 89 78 08             mov    %rdi,0x8(%rax)
  4122f8:       48 89 48 10             mov    %rcx,0x10(%rax)
  4122fc:       48 89 70 18             mov    %rsi,0x18(%rax)
  412300:       c3                      retq

@chfast chfast requested review from axic and gumb0 February 18, 2021 18:48
Copy link
Collaborator

@axic axic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

const auto e = d - carry;
const auto carry2 = e > d;
const auto carry2 = d < uint64_t{carry};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidenote: There's such an implicit conversion for carry in intx.hpp:sub_with_carry too. Clion complained about it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where.

Copy link
Collaborator

@axic axic Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chfast chfast merged commit af587c5 into master Feb 18, 2021
@chfast chfast deleted the optimize_sub branch February 18, 2021 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants