Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --no-listen-gossip flag to prevent gossip layer from starting up #6176

Closed
wants to merge 6 commits into from

Conversation

mpeck
Copy link
Contributor

@mpeck mpeck commented Feb 19, 2019

Signed-off-by: Matthew Peck [email protected]

@thesentinels
Copy link
Contributor

Thanks for the pull request! Here is what will happen next:

  1. Your PR will be reviewed by the maintainers
  2. If everything looks good, one of them will approve it, and your PR will be merged.

Thank you for contributing!

@mpeck mpeck changed the title Add --no-gossip-listen flag to prevent gossip layer from starting up Add --no-listen-gossip flag to prevent gossip layer from starting up Feb 19, 2019
@mpeck mpeck force-pushed the peck/no-listen-gossip branch from 97f88fb to 4d8dc55 Compare February 20, 2019 14:35
Copy link
Contributor

@baumanj baumanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not all the way through this yet, but I thought it would be useful to give some initial feedback.

components/sup/src/cli.rs Outdated Show resolved Hide resolved
components/hab/src/cli.rs Show resolved Hide resolved
components/sup/src/cli.rs Show resolved Hide resolved
components/hab/src/cli.rs Outdated Show resolved Hide resolved
components/sup/src/main.rs Outdated Show resolved Hide resolved
@@ -262,7 +262,7 @@ pub struct GatewayState {

pub struct Manager {
pub state: Arc<ManagerState>,
butterfly: butterfly::Server,
butterfly: Option<butterfly::Server>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of an optional server here, which requires inserting conditionals everywhere it's referenced, did you consider making this type a trait object instead? That way the manager logic stays the same apart from initializing butterfly to either a regular butterfly::Server or an implementation that has no-ops for everything.

I'm not sure whether this would be more confusing or not in the end, but it's worth considering since option types tend to infect the logic of otherwise linear code with an abundance of conditionals.

None,
Some(&fs_cfg.data_path),
Box::new(SuitabilityLookup(services.clone())),
)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this isn't strictly required as part of your change, I think any time you're modifying code in a particular area, it's a good opportunity to clean up stuff that could use it. In this case, it's the fact that butterfly::Server::new is very awkward, and now that the logic around calling it is becoming more complicated by virtue of the if let we need to put it inside, I think this change will benefit from the cleanup.

The problem I see with butterfly::Server::new is twofold:

  1. It takes separate args for swim_addr and gossip_addr, but in practice, callers always provide the same values; we shouldn't use separate args until we need the flexibility.

  2. In general, new should return Self. While we agreed that Result<Self> sometimes an acceptable variant, in this case the only reason new could fail is because of a failure to resolve the address provided to swim_addr/gossip_addr, and it makes the code really messy:

        match (maybe_swim_socket_addr, maybe_gossip_socket_addr) {
            (Ok(Some(swim_socket_addr)), Ok(Some(gossip_socket_addr))) => {}
            (Err(e), _) | (_, Err(e)) => Err(Error::CannotBind(e)),
            (Ok(None), _) | (_, Ok(None)) => Err(Error::CannotBind(io::Error::new(
                io::ErrorKind::AddrNotAvailable,
                "No address discovered.",
            ))),
        }

    That code is kind of mind-bending and especially unnecessary given that the two addr args are always the same.

I'd recommend reducing butterfly::Server::new to accepting a single address arg, and changing the type from the generic T: ToSocketAddrs to the concrete SocketAddr. Add a separate function to do the address resolution, which can fail and then butterfly::Server::new can be infallible and return Self.

This may allow the code here, which is structured like:

        let butterfly = if let Some(gossip_listen) = sys.gossip_listen() {Some()
        } else {
            None
        };

to be changed to the more functional approach:

        let butterfly = sys.gossip_listen().map(|gossip_listen| {};

But that may also be more trouble than it's worth because of the ownership changes that come along with using the closure map requires; use your judgement about whether this is worth it.

if service.topology == Topology::Leader {
self.butterfly.start_election(&service.service_group, 0);
if let Some(ref butterfly) = self.butterfly {
self.gossip_latest_service_rumor(&service);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an if let Some(…) = self.butterfly conditional both here and inside gossip_latest_service_rumor. For simplicity and testability, I'm a fan of having code be either:

  1. Logic with loops and branches that encode behavior
  2. Lists of mostly unconditional commands that plumb together different chunks of (1)

Try to minimize mixing (1) and (2). The (1)-type code is where the testing should be focused, so making that code as purely functional as possible, with a minimum of side effects or superfluous state carried around will make the testing easiest. The (2)-type code won't really have any behavior to test and with a language like rust, errors in plumbing will nearly all be caught by the compiler.

Despite it having a conditional in the form of if let, I would call this more plumbing (2) than behavioral logic (1), so keeping it at the higher level and making gossip_latest_service_rumor more pure by passing it the specific data it needs to operate on rather than all of self should eliminate the redundancy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the general pattern I think we should try to avoid:

        if let Some(ref butterfly) = self.butterfly {
            self.some_function_that_needs_butterfly();

This is how we end up exploding the number of redundant if let conditionals with self.butterfly. Inside any of those conditionals, if we're calling functions that need butterfly, we should provide it as a parameter. This gives is a purer, more testable function and avoids the need for redundant checks.

create_butterfly_client(mgr)?
.send_service_file(service_group, filename, version, &content, is_encrypted)
.map(|_| req.reply_complete(net::ok()))
.map_err(|e| net::err(ErrCode::Internal, e.to_string()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way you refactored this to reduce the code duplication! I think you could go even further, since 3 of these 4 lines are repeated 3 times. In order to put the emphasis on what's different, you could further factor out the rest of the error handling. I poked at it a bit and was able to get this to compile:

    with_butterfly_client(mgr, req, |mut client| {
        client.send_service_file(
            service_group.clone(),
            filename.clone(),
            version,
            &content,
            is_encrypted,
        )
    })

where with_butterfly_client is like:

fn with_butterfly_client(
    mgr: &ManagerState,
    req: &mut CtlRequest,
    f: impl Fn(butterfly::client::Client) -> butterfly::error::Result<()>,
) -> NetResult<()> {// basically what you had already inside

With a bit more work, I imagine you could get rid of the clones as well, though for such small pieces of data attached to user-initiated commands, it may not be worthwhile.

However, there's an alternative to consider here: is this the right level to put the conditional handling of mgr.cfg.gossip_listen? Conceptually, a single check for whether we have a gossip server running at the point commands are dispatched seems to reduce the complexity, though if you looked at https://github.com/habitat-sh/habitat/blob/master/components/sup/src/ctl_gateway/server.rs#L303, I can certainly understand why you didn't go that route: it's far from obvious to see how to get at ManagerState at that level. Given the state of that code (which @christophermaier is in the midst of refactoring), I think your current approach is pretty reasonable, but I'd like to see an issue filed to revisit this once the control gateway code has been simplified because I think that's the more appropriate level to handle this.

Copy link
Contributor

@baumanj baumanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're going to explore a different approach here, but since I already made these comments I'll share them for general edification and if we return to this approach in the future.

@@ -707,7 +707,7 @@ pub fn sup_commands() -> App<'static, 'static> {
// set custom usage string, otherwise the binary
// is displayed as the clap_app name, which may or may not be different.
// see: https://github.com/kbknapp/clap-rs/blob/2724ec5399c500b12a1a24d356f4090f4816f5e2/src/app/mod.rs#L373-L394
(usage: "hab sup <SUBCOMMAND>")
(bin_name: "hab sup")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good in terms of the help output, but I think the comment needs to be updated. Also, why not fix up the other instances of this same issue while we're at it?

@@ -868,8 +883,10 @@ impl Manager {
fn persist_state(&self) {
debug!("Updating census state");
self.persist_census_state();
debug!("Updating butterfly state");
self.persist_butterfly_state();
if let Some(ref butterfly) = self.butterfly {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're not doing gossip, is there any census state to persist? We should look into skipping census operations such as persist_census_state above as well.

@@ -884,8 +901,8 @@ impl Manager {
.census_data = json;
}

fn persist_butterfly_state(&self) {
let bs = ServerProxy::new(&self.butterfly);
fn persist_butterfly_state(&self, butterfly: &butterfly::Server) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A method which takes self as well as a parameter that's contained within self seems like an anti-pattern to me. I understand the reasoning here: to avoid doing the if let that the caller just did. I think that's good, but take it one step further and change this from a method that carries around unnecessary state to a function that receives only the arguments it specifically needs.

We could do something like this:

    fn persist_butterfly_state(gateway_state: &mut GatewayState, butterfly: &butterfly::Server) {
        let bs = ServerProxy::new(butterfly);
        gateway_state.butterfly_data = serde_json::to_string(&bs).unwrap();
    }

or even better:

    fn persist_butterfly_state(butterfly_data: &mut String, butterfly: &butterfly::Server) {
        let bs = ServerProxy::new(butterfly);
        *butterfly_data = serde_json::to_string(&bs).unwrap();
    }

But I think the best option to to make it a pure function: only the required arguments and no side effects:

    fn persist_butterfly_state(butterfly: &butterfly::Server) -> String {
        let bs = ServerProxy::new(butterfly);
        serde_json::to_string(&bs).unwrap()
    }

Though the name should probably change since this isn't really persisting so much as serializing. This provides a nice, simple test surface, though.

And while we're at it, we should probably change persist_census_state similarly. This has the benefit of increased testability, but it also eliminates the need for both persist_census_state and persist_butterfly_state to take the same write lock on gateway_state right after one another. It makes more sense for the caller (persist_state) to take that lock once.

fn persist_butterfly_state(&self) {
let bs = ServerProxy::new(&self.butterfly);
fn persist_butterfly_state(&self, butterfly: &butterfly::Server) {
let bs = ServerProxy::new(&butterfly);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let bs = ServerProxy::new(&butterfly);
let bs = ServerProxy::new(butterfly);

butterfly is already a reference, so the & is unnecessary.

gossip_ip: Cow::Owned(IpAddr::V4(Ipv4Addr::new(0, 0, 0, 0))),
gossip_port: Cow::Owned(1234),
gossip_ip: None,
gossip_port: None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't seem like they should be none based on the comment; I'd keep the old values (inside Somes, that is)

.gossip_ip
.map(|ip| ip.to_string())
.unwrap_or(String::new());
sys_info.gossip_port = u32::from(self.gossip_port.unwrap_or(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like we're taking a situation where we have potentially no gossip address (because we're not running a gossip server) and we're making up invalid values to pass to SysInfo. This sounds like a subtle bug waiting to happen.

Insofar as Sys may be used for local consumption, we need to introduce Options to account for the fact that we may not have a gossip ip/port. For things like SysInfo, which appear to only be used in the context of a rumor (I haven't thoroughly read the code, so I could be mistaken) and so should never encounter the situations where we have no gossip ip/port. If we find ourselves building bogus values like this, we're handling things at the wrong level. If we're running without gossip, we should never be creating rumors at all, right?

@mpeck
Copy link
Contributor Author

mpeck commented Feb 25, 2019

Closing for now in favor of #6209

@mpeck mpeck closed this Feb 25, 2019
@mpeck mpeck deleted the peck/no-listen-gossip branch March 7, 2019 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants