[RFC] Add support for concurrent specs #8944
Conversation
```crystal
@block : (->) | Nil)
  initialize_tags(tags)
end

# :nodoc:
def run
  Spec.root_context.check_nesting_spec(file, line) do
```
Is it fine to remove this check?
Removing it is actually required. The check ensures that no two `it` blocks are running at the same time, but that's literally the point of this PR. 🙂 The check for whether someone defined an `it` block inside another `it` block was changed in this PR to use a `running` state, raising if you try to add another example while in that state.
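A minimal sketch of that `running` guard (class and method names here are hypothetical, not taken from the PR):

```crystal
# Hypothetical sketch: a context records that it is running and raises if
# an example is registered while in that state.
class SpecContext
  @running = false
  @examples = [] of Proc(Nil)

  def add_example(&block : ->)
    raise "can't define an `it` block inside a running example" if @running
    @examples << block
  end

  def run
    @running = true
    @examples.each(&.call)
  ensure
    @running = false
  end
end
```

The `ensure` clause resets the flag even if an example raises, so a failing spec doesn't poison subsequent runs.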
I find the approach a little weird. I'd instead spawn a configurable number of concurrent runners (parallel with MT enabled) that would take specs to run from a global pool, instead of spawning as many fibers as there are specs.
The same way it works today. If you have a better name in mind than `async`, I'm open to suggestions.
Do you have an end goal in mind that this supports?
I've renamed `async` to `concurrent`.
@ysbaddaden I think your suggestion for a static level of concurrency has merit here. I've been running this against apps using Neo4j as their DB, where connections are pretty lightweight, so I didn't see any problems with an unbounded connection count. But I was just putting together an example that uses Postgres, and it didn't go very well: Postgres hard-limits you to 100 open connections in its default configuration, because each new connection gets its own backend process.

On that note, does anyone have any suggestions on how to make the concurrency level configurable? I was thinking a command-line flag.
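Independent of this PR, crystal-db's built-in connection pool can already bound how many connections a concurrent suite opens, via URI query parameters (the DSN here is illustrative):

```crystal
require "db"
require "pg"

# Cap the pool at 25 connections. Once the pool is exhausted, a fiber that
# needs a connection blocks (up to checkout_timeout seconds) instead of
# opening another one, keeping us well under Postgres's 100-connection default.
db = DB.open("postgres:///?max_pool_size=25&checkout_timeout=10")
```

That bounds DB connections but not spec concurrency itself, so it complements rather than replaces a configurable runner count.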
I still think that all examples will need to be direct children of the `RootContext`.

```crystal
channel = Channel(Example).new(capacity: examples.size)
done_channel = Channel(Nil).new

# Spread the work across `job_count` fibers
job_count.times { spawn consume(channel, done_channel) }

# Put the work into the queue
examples.each { |example| channel.send example }

# Wait for each one to finish
examples.each { done_channel.receive }
```
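For completeness, a self-contained sketch of what a `consume` worker for that snippet could look like, using `Proc(Nil)` as a stand-in for the PR's `Example` type (the actual implementation may differ):

```crystal
# Hypothetical worker: pulls jobs off the channel, runs each one, and
# signals completion on done_channel so the main fiber can count them.
def consume(channel : Channel(Proc(Nil)), done_channel : Channel(Nil))
  loop do
    job = channel.receive
    job.call
    done_channel.send nil
  end
end
```

Each worker loops forever; since the main fiber awaits exactly `examples.size` completions, it exits once all work is done and the idle workers die with the process.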
Still needs cleanup and the specs are currently broken, but I was able to get this working against Postgres. Given the following spec, which spans from inserting a trivial amount of data into the DB to inserting a lot of data, just to give it something to wait on:

```crystal
require "./spec_helper"
require "db"
require "pg"

db = DB.open("postgres:///")
db.exec %{CREATE EXTENSION IF NOT EXISTS "uuid-ossp"}
db.exec "DROP TABLE IF EXISTS example_table"
db.exec <<-SQL
  CREATE TABLE example_table (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
  )
SQL

describe "something we can do concurrently" do
  (1..1000).to_a.shuffle.each do |i|
    it "does a thing that takes a long time (#{i})" do
      i.times do |n|
        db.exec "INSERT INTO example_table (name) VALUES ('Item ' || $1)", n
      end
    end
  end
end
```

**TL;DR**: compared to running with no concurrency, I was able to drop the runtime from 2:32 to 0:27 (~1/5 the runtime).
It looks like a lot has changed in `Spec` since I started this.
This is awesome! The terminology here is worth discussing, though.
Agreed, there are quite a few ways this could go and I honestly don't know which is preferable. Regarding tests sharing common state, do you mean a case where both A and B operate on the same record in the DB and are expected to run in order?
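For illustration, a hypothetical pair of order-dependent examples (the `db` handle and `users` table are assumed, not from this PR):

```crystal
describe "a user record" do
  it "A: creates the user" do
    db.exec "INSERT INTO users (name) VALUES ('Alice')"
  end

  it "B: sees the user created by A" do
    # Only passes if A already ran: B depends on state that A left behind
    db.query_one("SELECT count(*) FROM users", as: Int64).should eq 1
  end
end
```

Run concurrently, B can observe the table before, during, or after A's insert, so the expectation becomes nondeterministic.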
It's been a year, but having just come across this again, I was hoping it could be dusted off and looked at once more. 😅 What are your thoughts, @beta-ziliani @straight-shoota?
I think this looks good 👍
It needs an update, though. Please merge master.
```crystal
def run
  run
  yield
end
```
What is this for?
I'm not entirely sure anymore. I wrote this over a year ago. 🙂
I remember I needed something to send a notification of completion to the main fiber. I don't remember why I implemented it this way specifically, though.
Is this actually used? I don't see where.
I've had to update this PR several times since it was opened, due to implementation changes in `Spec`, so I'm not 100% sure which changes were made when and why, but I definitely did use it at one point. Looks like I used it in the beginning, when there was a distinction between sync vs. async specs (before I added the `-j` option).
```diff
@@ -14,8 +14,8 @@ module Spec::Methods
   # ```
   #
   # If `focus` is `true`, only this `describe`, and others marked with `focus: true`, will run.
-  def describe(description, file = __FILE__, line = __LINE__, end_line = __END_LINE__, focus : Bool = false, tags : String | Enumerable(String) | Nil = nil, &block)
-    Spec.root_context.describe(description.to_s, file, line, end_line, focus, tags, &block)
+  def describe(description, file = __FILE__, line = __LINE__, end_line = __END_LINE__, focus : Bool = false, concurrent : Bool = false, tags : String | Enumerable(String) | Nil = nil, &block)
```
The new parameter must not be added before any existing positional parameter, for backwards compatibility. Please move `concurrent` to the end. And I would prefer to make it a named parameter; there's not much sense in writing method calls with seven positional arguments.

(I also suggest making all but `description` named parameters, but that's a story for another time.)
> And I would prefer to make it a named parameter.

What do you mean? I thought the caller determined whether something was a named parameter.

To be clear, I agree on backwards compatibility (especially since 1.0 is out), but I also wouldn't expect anyone to be writing these positionally, and I certainly wouldn't want to work on code that used them that way. 😂

Regardless, IIRC with the `-j` option the `concurrent` parameter is no longer necessary anyway.
> What do you mean? I thought the caller determined whether something was a named parameter.

You can make it so an argument is required to be provided as a named argument. When a splat parameter has no name, it means no more positional arguments can be passed, and any following parameters must be passed as named arguments. For example:

```crystal
# Only one positional argument allowed; y must be passed as a named argument
def foo(x, *, y)
end

foo 1         # Error: missing argument: y
foo 1, 2      # Error: wrong number of arguments (given 2, expected 1)
foo 1, y: 10  # OK
```
Huh, TIL. I'd always wondered why the nameless splat existed in some stdlib methods.
Before we get too deep in the weeds here, there are still open questions about what it means to run concurrent specs. There have been multiple implementations in this PR: Juan made some observations almost a year ago, and I followed up a few hours later trying to get some more insight. Maybe we could start there. Specifically, what is the granularity we're looking at for concurrent specs? IIRC, the current implementation does it at the level of individual `it` blocks.

Other questions that could help clarify the larger vision of concurrent specs in Crystal:
Many spec suites probably contain specs that can't run concurrently because they access shared resources. That could be a database, an environment variable, a specific file path, a non-concurrent API, etc. For a concrete example, think of all the stdlib specs that modify process-global state.

I assume that such tests would typically be collected in a single example group (if there are other specs in the group, the non-concurrent specs could be filtered into a sub-group). In that scenario, it would be sufficient to configure the example group as running its examples in sequence to avoid concurrency issues.

However, I believe there could be a relatively trivial solution if we operate on very small sections: files execute sequentially, and every example group can be considered a concurrency context that defines whether its children execute sequentially or concurrently. This is closely related to the concept of structured concurrency in #6468. Let's take this example:

```crystal
# bar_spec.cr
describe "bar" do
  it "A" {}
  it "B" {}
  describe "C" do
    it "1" {}
    it "2" {}
  end
end

# foo_spec.cr
describe "foo" do
  it "A" {}
end

describe "baz" do
  it "A" {}
end
```

The execution order would be: all of `bar`'s children (A, B, and group C) run concurrently, with C's examples 1 and 2 concurrent within C; only once `bar_spec.cr` has finished does `foo_spec.cr` start, where `foo` and `baz` are again sibling concurrency contexts.

For simplicity, I just assumed that all example groups would be concurrent by default. That does not need to be the case. Either way, it would be possible to configure every example group individually as sequential or concurrent.

Considering that a group usually contains more examples than the concurrency limit, and that these examples have roughly similar execution lengths, this is probably a reasonable approach.
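Under that model, per-group configuration could look like the following sketch (the `concurrent` parameter name comes from this PR's `describe` signature; the groups and defaults shown are assumptions):

```crystal
# These examples mutate the same rows, so force sequential execution
describe "account balance", concurrent: false do
  it "debits the account" {}
  it "credits the account" {}
end

# Independent examples are free to run concurrently
describe "string parsing", concurrent: true do
  it "parses integers" {}
  it "parses floats" {}
end
```

Since each group is its own concurrency context, a sequential group nested inside a concurrent parent would still run its own children one at a time.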
The move to allow `focus: true` and `before_each` in specs opened the door to asynchronous specs, since `it` no longer runs the spec eagerly. This is something I've been wanting for a while, because the majority of the time spent running the test suite for a typical web app is spent in I/O with the DB rather than in CPU time. If all of your specs are asynchronous, your suite is only as slow as your slowest spec.

With this PR, this is the result:

A few things I had to do to make this happen:

- `Example`s are now direct children of the `RootContext`; I wasn't sure the async checks could handle nesting, and this makes `ExampleGroup#run` unnecessary here
- an async `ExampleGroup` makes all its descendants async
- a focused `ExampleGroup` runs all its descendants even if they're not focused, letting those descendants "inherit" the focus status

**Important notes**

A spec suite that contains async specs cannot depend on any global mutable state. For example, you may not be able to run `User.count.should eq 1` unless all of your specs run within their own isolated transaction which is rolled back at the end of each example. And even then! :-)
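One common way to get that per-example isolation, sketched against crystal-db's transaction API (the helper name is hypothetical, not part of this PR):

```crystal
require "db"

# Run the block inside a transaction that is always rolled back, so each
# example's writes are invisible to concurrently running examples and are
# discarded when the example finishes.
def with_rollback(db : DB::Database)
  db.transaction do |tx|
    yield tx.connection
    tx.rollback
  end
end
```

An example would then do all of its DB work through the yielded connection; note this only isolates examples from each other, not from state committed before the suite started.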