Add Docker Swarm Autoscale support #1497
Conversation
@nathanielc I am getting an error.
I see in the code that by default it expects a TLS-enabled Docker cluster. Is it possible to get a client for a Docker cluster without TLS?
@adityacs It's complaining that it can't find the cluster with the ID of the empty string "", so the cluster ID needs to be set. TLS is controlled by the protocol specified in the list of servers, whether it's http or https.
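A minimal sketch (not the actual Kapacitor code) of how the TLS decision can follow from the server URL's scheme; the host names and ports below are made up for illustration:

```go
package main

import (
	"fmt"
	"net/url"
)

// useTLS reports whether a configured server address implies TLS,
// based purely on its URL scheme (https vs http).
func useTLS(server string) (bool, error) {
	u, err := url.Parse(server)
	if err != nil {
		return false, err
	}
	return u.Scheme == "https", nil
}

func main() {
	for _, s := range []string{"http://swarm-manager:2375", "https://swarm-manager:2376"} {
		tls, err := useTLS(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> TLS: %v\n", s, tls)
	}
}
```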
@nathanielc getting the error below
It's failing to read the Docker response and keeps retrying to apply the replica value. However, scaling is working in the Docker cluster. If I scale down manually in the Docker cluster, Kapacitor immediately retries the scale. I believe it should not try to apply the scale event if it has already scaled within the last 5 minutes or so.
@adityacs What version of Docker are you running? It seems to be returning a different HTTP response code than expected. The code is written to consume the v1.30 API version.
This is possible via the …
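For context, the Docker Engine HTTP API is versioned through the URL path, so a client pinned to v1.30 prefixes its requests accordingly. The host, port, and endpoint below are placeholders for illustration, not values from this PR:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// A v1.30 client prefixes every request path with /v1.30/.
	resp, err := http.Get("http://swarm-manager:2375/v1.30/services")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.StatusCode)
}
```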
@nathanielc I am using version 17.06. The API version is also v1.30. Since the response is not correct, irrespective of the .increaseCooldown() and .decreaseCooldown() settings it still tries to apply the event repeatedly for every …
@adityacs I have pushed up an update that should fix the error.
@nathanielc The error is fixed now. However, I am not seeing the expected behavior for scaling.
Kapacitor should try to scale the containers, right? What is the expected behavior here without increase/decrease cooldown?
@adityacs Hmm, what is the reason for manually scaling the service? For k8s, Kapacitor assumes that it has full control over the scaling factor, so it caches it instead of making an API call to k8s for every evaluation of the node. As a result, if the autoscale node doesn't make any changes to the scale factor, no new API calls will be made to the system to set a new scaling value. Docker Swarm has inherited that assumption. Also, in a live system the autoscaling is probably set up as a feedback loop: if you manually change the scale factor, the feedback loop will likely adjust accordingly. We could add some kind of timeout for the cached values and periodically check, but that seems like it would create a confusing user experience. Thoughts?
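A minimal sketch of the caching behavior described above, not the actual Kapacitor implementation: the node remembers the last replica count it set for each resource and skips the cluster API call when its own decision has not changed. Type and method names here are invented for illustration.

```go
package autoscale

import "sync"

// scaleCache is a hypothetical illustration of the caching assumption:
// only Kapacitor's own scale decisions are tracked, so replica changes made
// manually on the cluster are never observed here.
type scaleCache struct {
	mu       sync.Mutex
	last     map[string]int                      // resource ID -> last replica count set by the node
	setScale func(id string, replicas int) error // e.g. a Swarm or k8s API call
}

func (c *scaleCache) apply(id string, replicas int) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if prev, ok := c.last[id]; ok && prev == replicas {
		return nil // decision unchanged: skip the API call entirely
	}
	if err := c.setScale(id, replicas); err != nil {
		return err
	}
	c.last[id] = replicas
	return nil
}
```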
@nathanielc Makes sense. It's good we give full control to Kapacitor. However, if we include a cache timeout, we can make it a node property where the user can set …
@nathanielc It's working as expected. I tried it even with … Thanks :)
Sounds reasonable to me. Let's do it in a separate PR. Do you want to take a crack at it? I'll clean up this PR for the rebase etc. and get it merged.
Need some time. Will work on this. Thanks for this PR.
@nathanielc One more thing. Is there any plan for supporting EC2 instance scaling or GCE instance scaling with autoscaling? Or do you think it's beyond the scope of Kapacitor?
@adityacs There are no current plans, but if someone wanted to add them it's much easier now that the autoscale node is generalized.
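To illustrate what "generalized" means here, a hypothetical sketch of the kind of small interface a new backend would need to satisfy; the interface and method names below are invented for illustration and are not the exact types in this PR.

```go
package autoscale

// Autoscaler is a hypothetical sketch: any backend (k8s, Docker Swarm, and
// potentially EC2 or GCE instance groups) only needs a way to read and set
// the scale of a named resource for the autoscale node to drive it.
type Autoscaler interface {
	Replicas(id string) (int, error)    // current scale of the resource
	SetReplicas(id string, n int) error // apply a new scale
}
```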
@desa Could you give this a quick review?
@nathanielc Cool. Would like to work on this. Will let you know my progress.
Force-pushed from 7edc0a3 to 40170d7
@nathanielc I'm wondering if there should just be a generic autoscale node. This would be something like this:
…
or
…
The way that it is currently implemented is a bit different from what I have seen elsewhere in the Kapacitor code base (this could just be me still not being 100% up to speed with how all of the nodes are implemented). It'd also open up the ability to add more autoscaling "handlers" in a pattern similar to the alert handlers. I think it would also be possible to make this change while still keeping backwards compatibility with the … Thoughts?
@desa I dislike the way the alert handlers work, because everything in TICKscript is at most two deep, node -> property, except for the alert handlers, which are three deep: node -> handler -> property. I dislike that handlers are different from everything else and don't want to add more like them. Also, the difference here is that it makes sense to have multiple handlers for a single alert, but it doesn't make as much sense to have multiple autoscale handlers, since you are likely only using one. So, in short, I want it to remain the way it is.
Makes sense. |
Just a couple questions.
	New int
}

type AutoscaleNode struct {
Is there a reason why ResourceID, Autoscaler, AutoscaleNode are exported?
Nope, other than that's how other nodes work.
autoscale.go
Outdated
	currentField string
}

// Create a new AutoscaleNode which can trigger autoscale event for a Kubernetes cluster.
Comment needs updating, it's not just Kubernetes any more :)
	ID      string   `toml:"id" override:"id"`
	Servers []string `toml:"servers" override:"servers"`
	// Path to CA file
	SSLCA string `toml:"ssl-ca" override:"ssl-ca"`
Are all of these okay to expose via the API? I see that the other services do this as well, but it feels like there might be some security issue here? Am I wrong?
These are just paths to files on disk. That should not be sensitive information.
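To make the distinction concrete, a rough sketch under the assumption that the config fields above hold file paths: only the paths are exposed, while the actual certificate and key material is read from disk when the TLS client config is built. The function below is illustrative, not code from this PR.

```go
package swarmtls

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"
)

// buildTLSConfig shows how path-only config stays non-sensitive: the PEM
// material is loaded from disk here, at client-construction time.
func buildTLSConfig(caPath, certPath, keyPath string) (*tls.Config, error) {
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("could not parse CA certificate")
	}
	cert, err := tls.LoadX509KeyPair(certPath, keyPath)
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{cert},
	}, nil
}
```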
services/swarm/cluster.go
Outdated
}

func (s *Cluster) Client() (client.Client, error) {
	config := s.configValue.Load().(Config)
nit
I like the pattern of having a method that does the casting for you e.g.
func (s *Service) config() Config {
return s.configValue.Load().(Config)
}
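For reference, a small self-contained sketch of that accessor pattern, assuming the service keeps its Config in an atomic.Value as the diff suggests; the trimmed-down Config fields and the Update method name are illustrative, not taken from this PR.

```go
package swarm

import "sync/atomic"

// Config fields are trimmed down for illustration.
type Config struct {
	ID      string
	Servers []string
}

type Service struct {
	configValue atomic.Value // always holds a Config
}

// Update atomically swaps in a new configuration.
func (s *Service) Update(c Config) {
	s.configValue.Store(c)
}

// config centralizes the type assertion so callers never repeat the cast.
func (s *Service) config() Config {
	return s.configValue.Load().(Config)
}
```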
services/swarm/cluster.go
Outdated
func NewCluster(c Config, l *log.Logger) (*Cluster, error) {
	clientConfig, err := c.ClientConfig()
	if err != nil {
		return nil, errors.Wrap(err, "failed to create k8s client config")
this should be "failed to create swarm client config"?
services/swarm/cluster.go
Outdated
	}
	cli, err := client.New(clientConfig)
	if err != nil {
		return nil, errors.Wrap(err, "failed to create k8s client")
this should be "failed to create swarm client config"?
Force-pushed from 40170d7 to 163c3d4
Force-pushed from 163c3d4 to 5dd2183
I use Kapacitor version 1.51, and swarmAutoscale() is not working. Whether I enable or disable the swarm service in the configuration file, I always get the message unknown swarm cluster "". My configuration file: [[swarm]] …
This PR builds off #1292 and #1425. Since significant changes have been made to the way nodes are written, the changes merited a new PR.
@adityacs Can you take a look at the PR and possibly test it out in your environment? The PR is basically the same as what you already had, but updated to reflect some major changes being made in Kapacitor.
I have tested this with my local Docker swarm cluster.