Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add algorithm.merge #16182

Merged
merged 18 commits into from
Mar 23, 2021
Merged

add algorithm.merge #16182

merged 18 commits into from
Mar 23, 2021

Conversation

ringabout
Copy link
Member

@ringabout ringabout commented Nov 29, 2020

Merge two sorted array:

C++ (merge algorithm):
http://www.cplusplus.com/reference/algorithm/merge

D:
https://dlang.org/library/std/algorithm/sorting/merge.html

Python(based on heap)
https://docs.python.org/3/library/heapq.html#heapq.merge

followup:

see #17472

@ringabout ringabout changed the title Algorithm merge add algorithm.merge Nov 29, 2020
Copy link
Contributor

@ee7 ee7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an opinion on whether this proc is worth adding, but I'd suggest that the implementation should avoid newSeqOfCap. I made a quick code suggestion that uses newSeq instead.

That gives a speedup of roughly 5x (with -d:danger --passC:-flto) when merging two sorted arrays each containing 1000 random integers in 0..10_000:

Implementation Elapsed time
Using newSeqOfCap 11.59 ± 0.37 μs
Using newSeq 2.45 ± 0.21 μs

(And roughly 3x for two sorted arrays each containing 100 random integers in 0..1000). See this gist.

And can we avoid snake_case for identifiers? Currently this PR has:

size_x
size_y
index_x
index_y
item_x
item_y

lib/pure/algorithm.nim Outdated Show resolved Hide resolved
lib/pure/algorithm.nim Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
@ringabout
Copy link
Member Author

ringabout commented Nov 30, 2020

@ee7
Thanks for your help!

lib/pure/algorithm.nim Outdated Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
Copy link
Contributor

@ee7 ee7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issorted -> isSorted

tests/stdlib/talgorithm.nim Outdated Show resolved Hide resolved
tests/stdlib/talgorithm.nim Outdated Show resolved Hide resolved
tests/stdlib/talgorithm.nim Outdated Show resolved Hide resolved
tests/stdlib/talgorithm.nim Outdated Show resolved Hide resolved
tests/stdlib/talgorithm.nim Outdated Show resolved Hide resolved
tests/stdlib/talgorithm.nim Outdated Show resolved Hide resolved
tests/stdlib/talgorithm.nim Outdated Show resolved Hide resolved
tests/stdlib/talgorithm.nim Outdated Show resolved Hide resolved
@ringabout
Copy link
Member Author

@ee7
Thanks! (I have disabled nimsuggest because of CPU consumption :)

@ee7
Copy link
Contributor

ee7 commented Nov 30, 2020

Given that this proc returns a new seq, is merged a better name?

The style guide wording suggests that we should call it merged if merge is already "available":

When both in-place mutation and 'returns transformed copy' are available the latter is a past participle of the former:

  • reverse and reversed in algorithm
  • sort and sorted
  • rotate and rotated

algorithm.nim already has a merge proc for two var openArray, but it isn't exported, so I don't know if it's considered "available":

Nim/lib/pure/algorithm.nim

Lines 334 to 339 in 3589631

proc merge[T](a, b: var openArray[T], lo, m, hi: int,
cmp: proc (x, y: T): int {.closure.}, order: SortOrder) =
# optimization: If max(left) <= min(right) there is nothing to do!
# 1 2 3 4 ## 5 6 7 8
# -> O(n) for sorted arrays.
# On random data this safes up to 40% of merge calls

Also, the Nim 1.2.0 release blog post contained:

dup is a macro which turns an in-place function into one that returns a result without modifying its input. The idea is that future versions of Nim will offer only in-place versions of the functions (i.e. no more sort and sorted, rotateLeft and rotatedLeft, and so on), which could be transformed to functions returning a copy via dup.

Is that relevant here?

@ringabout
Copy link
Member Author

ringabout commented Nov 30, 2020

It seems that inplace version is much slower(N * M) than the version with extra space(N + M) or complex(best performance needs quick sort, (N+M) * log(N+M)).
https://morioh.com/p/51eb8dbf58aa

@timotheecour
Copy link
Member

the implementation should avoid newSeqOfCap

indeed. also noticed in #13448. It's so bad that newSeqOfCap should come with a performance warning in its doc comment explaining the newSeqOfCap(n) + seq.add(elem) anti-pattern. This isn't specific to nim.

@ringabout
Copy link
Member Author

Should I implement inplace version(though it looks ugly to me) instead of outplace version?

@ringabout ringabout closed this Nov 30, 2020
@ringabout ringabout reopened this Nov 30, 2020
@timotheecour
Copy link
Member

Should I implement inplace version(though it looks ugly to me) instead of outplace version?

sugar.dup (or if I revive it, #13309 which has better syntax) obsoletes the need for duplicate pairs of inplace+outplace versions. This gives us N API's instead of 2N APIs.

@ringabout
Copy link
Member Author

ringabout commented Nov 30, 2020

All right, I will implement the inplace version soon.

@timotheecour timotheecour mentioned this pull request Nov 30, 2020
1 task
@ringabout ringabout added the TODO: followup needed remove tag once fixed or tracked elsewhere label Feb 26, 2021
lib/pure/algorithm.nim Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
lib/pure/algorithm.nim Outdated Show resolved Hide resolved
@timotheecour
Copy link
Member

we need to decide on #16182 (comment) before merging

Copy link
Member

@timotheecour timotheecour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@timotheecour
Copy link
Member

assert i == 0 or cmp(a, result[i-1]) <= 0 # or similar
result[i] = a

(can use a template since used in a few locations)

@ringabout ringabout requested review from Araq and Clyybber March 12, 2021 14:23
changelog.md Outdated Show resolved Hide resolved
@timotheecour
Copy link
Member

@Araq friendly ping on this

@Araq Araq merged commit c719d79 into nim-lang:devel Mar 23, 2021
@ringabout ringabout removed the TODO: followup needed remove tag once fixed or tracked elsewhere label Mar 23, 2021
ringabout added a commit to ringabout/Nim that referenced this pull request Mar 23, 2021
Araq pushed a commit that referenced this pull request Mar 23, 2021
* fix nim js cmp fails at CT

* followup #16182
ardek66 pushed a commit to ardek66/Nim that referenced this pull request Mar 26, 2021
* add merge to algorithm
* Apply suggestions from code review
* Update lib/pure/algorithm.nim
* Apply suggestions from code review
* Update changelog.md

Co-authored-by: ee7 <[email protected]>
ardek66 pushed a commit to ardek66/Nim that referenced this pull request Mar 26, 2021
* fix nim js cmp fails at CT

* followup nim-lang#16182
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants