Notification Backbone Architecture Design #1386

eeyun · 2020-04-10T15:29:52Z

Builder Notifications

A meaningful notifications system is a frequent and long-standing request from builder users, and one of the earliest requests from our internal customers. After reading through the many feature requests related to a notification system, we've concluded there are 2 specific user personas (or at least 2 distinct use cases). I'll call the first the Audit persona and the second the Remediate persona.

Audit

The audit use case is pretty straightforward. This user type needs a simple (if not configurable) mechanism for auditing the actions taken in an origin. Whether those actions are completed builds, promoted packages, or new users makes no difference. The desire is quite simply to have the data available in a consumable way. This persona is less interested in automated remediation than in potentially performing some human process based on what the auditable information shows. A solid example for this persona might be an operator who has been on vacation for a week. They return to work and would like to have an idea of what changes have been made to their packages and environments.

Remediate

The remediate persona has overlapping but slightly different requirements for the notification system. Their focus is much more on automation. Ideally, these users would like to subscribe to various bits of information and have that information sent to an endpoint that can consume the data and act on it. To be clear, the information desired by both personas is nearly identical; the difference is largely in what the remediate persona intends to do with the data. A solid example for this persona might be a user who works on a release engineering or SRE team. On promotion of a build group to stable, the user might like their CI system to fire off a functional test pipeline for their dependent packages. Or perhaps they'd like to make that information actionable to a third party by sending them an email or Slack notification.

Motivation

As a user of both SaaS and On-prem builder, I want the ability to subscribe to actionable data related to origin and package level operations in the builder cluster, whether for auditability or remediation. I need the data in an easily consumed or configurable format.

Specification:

For our first vertical slice we're going to focus (again) on the needs of the internal customer who has a pressing and immediate requirement. As can be expected, any architectural decision outlined here can be altered; this is simply an example of the way we're envisioning the work will go. For our purposes, most of the data we need is already generated by the system. It lives in our collection of audit tables: audit_jobs, audit_origin, and audit_package. Currently, this data is persisted indefinitely.

What we don't have in our database is any primitive for users subscribing to this data, or any mechanism for exposing it. As such, we feel fairly confident that a new table will need to be created to hold user notification subscription data. The shape of that table is not yet decided, but it's probably safe to assume that users can subscribe at both the origin and package level, and thus we might have a table with multiple entries for a single user, much in the same way we handle origin_member information.
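To make the "multiple entries per user" idea concrete, here is a minimal sketch of what one subscription row and the origin/package matching might look like. The shape of the table is undecided, so every name here (Subscription, account_id, webhook_url, matching) is hypothetical and only illustrates the idea that a row with no package applies to the whole origin.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Subscription:
    """One row of a hypothetical notification subscription table."""
    account_id: int
    origin: str
    package: Optional[str]  # None means "every package in the origin"
    webhook_url: str        # hypothetical delivery destination


def matching(subs: List[Subscription], origin: str, package: str) -> List[Subscription]:
    """Return the subscriptions that should fire for an event on origin/package."""
    return [
        s for s in subs
        if s.origin == origin and (s.package is None or s.package == package)
    ]


# A single user (account 1) holds two rows: one origin-level, one package-level.
subs = [
    Subscription(1, "core", None, "https://example.com/hooks/origin"),
    Subscription(1, "core", "glibc", "https://example.com/hooks/glibc"),
    Subscription(2, "acme", "web", "https://example.com/hooks/web"),
]
```

With these rows, an event on core/glibc matches both of account 1's subscriptions, while an event on core/gcc matches only the origin-level one.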

In our discussion we've largely come to the conclusion that a webhook-based consumption model will provide the most flexibility to solve user requirements. That does not preclude us from adding other endpoints in the future that might just be a dump of all auditable data for an origin or a package; however, for the sake of the first slice we believe those endpoints to be of significantly lower value. Another discussion point in the architecture has been whether the new notification backbone should exist as a piece of the API. There are benefits to designing the system this way, but ultimately we've all come to the conclusion that doing so would overload the purpose of the API. As such, we imagine the new notifier as a separate service or, at a minimum, a sidecar to the API.

Some expected or potential requirements are: queueing and serial/stream processing of outbound notifications; tracking receipt of a notification; the ability to manually re-trigger recent notifications; automated retry of failed webhook delivery; subscription to desired notification information (though this might not exist in the initial slice of work); notification (perhaps in the UI) when a notification fails to fire; and a configurable webhook destination.
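A few of these requirements (queueing, automated retry, receipt tracking, manual re-trigger) can be sketched together in a small in-memory form. This is only an illustration under assumptions, not a proposed implementation: the Notifier and Delivery names, the max_attempts limit, and the injected send callable are all hypothetical, and a real service would persist the queue and add backoff between attempts.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Delivery:
    """Hypothetical record tracking one outbound notification."""
    url: str
    payload: dict
    attempts: int = 0
    status: str = "pending"  # pending -> delivered | failed


class Notifier:
    def __init__(self, send, max_attempts=3):
        self.send = send              # callable(url, payload) -> bool, injected
        self.max_attempts = max_attempts
        self.queue = deque()          # deliveries waiting to be sent, in order
        self.history = []             # every delivery ever enqueued

    def enqueue(self, url, payload):
        delivery = Delivery(url, payload)
        self.queue.append(delivery)
        self.history.append(delivery)
        return delivery

    def drain(self):
        """Process the queue serially until it is empty."""
        while self.queue:
            delivery = self.queue.popleft()
            delivery.attempts += 1
            try:
                ok = self.send(delivery.url, delivery.payload)
            except Exception:
                ok = False            # treat any transport error as a failed attempt
            if ok:
                delivery.status = "delivered"
            elif delivery.attempts < self.max_attempts:
                self.queue.append(delivery)   # automated retry
            else:
                delivery.status = "failed"    # candidate for a UI warning

    def redeliver(self, delivery):
        """Manually re-trigger a previous delivery."""
        delivery.attempts = 0
        delivery.status = "pending"
        self.queue.append(delivery)
```

Injecting send keeps the sketch free of any particular HTTP client, and the history list is where "failed" deliveries would be surfaced to the user.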

Ultimately, we want every operation in an origin to be available for subscription, but our initial slice will likely focus on core-plans promotion to stable at the origin level. We believe GitHub's webhook system to be most closely aligned with what we ultimately would like to create, and thus we will be basing many of our implementation assumptions on their architecture.
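One piece of GitHub's design that might be worth borrowing is payload signing: GitHub sends an X-Hub-Signature-256 header containing "sha256=" followed by the hex HMAC-SHA256 of the request body, keyed with a secret shared with the receiver. The sketch below shows that scheme applied to a promotion event. Whether the notifier would adopt it is an assumption, and the payload fields (event, origin, channel) are hypothetical.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Compute a GitHub-style signature header value for a request body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, header: str) -> bool:
    """Receiver side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(secret, body), header)


# Hypothetical payload for the initial slice (promotion to stable).
secret = b"shared-webhook-secret"
body = json.dumps(
    {"event": "package.promoted", "origin": "core", "channel": "stable"},
    sort_keys=True,
).encode("utf-8")
signature = sign(secret, body)
```

A receiver such as an on-prem sync job would call verify with the raw request body before acting on the notification, so a tampered or unsigned payload is rejected.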

The consumption model of the user seems like it may have a material effect on the architecture of the work. For our initial target of "package (or group, or universe) promotion" we can likely assume the user's immediate remediation activity will be a package sync from SaaS Builder to an On-Prem depot.

As we are imagining the future of the notification system, it seems plausible that build workers themselves could subscribe to notifications for the sake of triggering their own build events. While this isn't an immediate target, it is in line with a potentially desirable architectural future state.

Either way, we imagine that whatever service comes out of this effort will be something we run both on SaaS builder and on-prem. Doing so will likely give our users more flexibility and control over how they configure their hab-based workflows.

Related Issues

All related issues have been collated into the following "epic". We aren't treating this epic as a true epic as much as a place to collect the multitude of notification related issues that have been submitted over the years.

#839

stale · 2022-08-12T01:32:13Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.

stale · 2023-09-17T00:38:04Z

This issue has been automatically closed after being stale for 400 days. We still value your input and contribution. Please re-open the issue if desired and leave a comment with details.

eeyun added the Type:Bug label Apr 10, 2020

jeremymv2 added Type:Feature E-less-easy and removed Type:Bug labels Dec 1, 2020

rahulgoel1 removed the E-less-easy label Jul 23, 2021

stale bot added the Stale label Aug 12, 2022

stale bot closed this as completed Sep 17, 2023
