Excluding Events in Funnels #5074

Closed
neilkakkar opened this issue Jul 12, 2021 · 11 comments · Fixed by #5150
Labels
enhancement New feature or request


@neilkakkar
Collaborator

Is your feature request related to a problem?

Say you have a funnel: Sign Up -> Discover Learning. And you're interested in a specific kind of user: one that, say, didn't invite teammates to the project.

It's hard to represent this using segmentation / breakdowns, since the users you're interested in are defined by what events they didn't trigger.

So, we'd like to support excluding events (and, in turn, disqualifying the users who did those events).

More example use cases:

  1. (1) Search -> (2) Watch Movie. And in this case, I want to exclude extra search events happening inside the funnel. (So, focusing on users who watched a movie after just one search, instead of multiple searches).
  2. Looking at behaviour funnel of users who did NOT click the call to action at any point.
  3. ???

Describe the solution you'd like

The first and third examples (users who didn't invite teammates, users who didn't click the call to action) are qualitatively different from the second example (search->movie).

Let's tackle the former first: In this case, given a funnel date range, I want to exclude all users who did the event within the date range. This implies an exclusion filter over the date range, i.e. exclude all people who did this specific event between date_from and date_to.

This doesn't work for the latter case, because the funnel step and the exclusion event are the same: "search". For this case, we introduce the concept of "exclusion within conversion time".

What is Conversion Window?

Given a funnel A->B and a date range from, say, -7d to now, the conversion window is the maximum allowable time within which a user must do events A and B. (Our code controls this using the funnel_window_days parameter.)


Exclusion within conversion time means that if the user does events A->A->B within the conversion window, this user is disqualified from the funnel.

However, if this user did A->B->A within 20 minutes (which is within the funnel date range, and within conversion window) - this user is not disqualified, because they completed the funnel before the second A event.

This also implies that when dealing with exclusion within conversion time with duplicates, we choose the first event that we see as the start of the conversion window.

Do we count a user where we see two As, but no B? If the second A happened before the conversion window ends, then they're disqualified. Otherwise, no.
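The rules above can be sketched for a single user and the funnel A->B. This is only an illustration of the semantics described here, not the actual query: `classify_user` and the result labels are hypothetical names, and events are assumed to arrive as `(timestamp, name)` pairs.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # (timestamp, event name)


def classify_user(events: List[Event], window: timedelta) -> str:
    """Classify one user for the funnel A -> B, excluding duplicate A events."""
    events = sorted(events)
    # The first A we see starts the conversion window.
    first_a = next((i for i, (_, name) in enumerate(events) if name == "A"), None)
    if first_a is None:
        return "not_in_funnel"
    deadline = events[first_a][0] + window
    for ts, name in events[first_a + 1:]:
        if ts > deadline:
            break  # anything after the window no longer matters
        if name == "B":
            return "converted"  # funnel completed before any duplicate A
        if name == "A":
            return "disqualified"  # duplicate A inside the conversion window
    return "dropped_off"


t0 = datetime(2021, 7, 12, 12, 0)
minutes = lambda n: t0 + timedelta(minutes=n)
print(classify_user([(t0, "A"), (minutes(5), "A"), (minutes(10), "B")], timedelta(days=1)))
print(classify_user([(t0, "A"), (minutes(10), "B"), (minutes(20), "A")], timedelta(days=1)))
```

The first call covers the A->A->B case and the second covers A->B->A, where the later A is ignored because the funnel was already completed.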

Additional context

What we're deciding not to do(?)

An alternative way to deal with (search->movie, excluding multiple search) case is to allow exclusion between steps. Such that, say you have a funnel, X->A->B, you can have exclusion events within every two steps of the funnel (X->A, A->B).

This introduces complications, both in the UI and in what the exclusions mean. For example: consider X->A->B and exclusion of A between A and B: Is the 'first' A the A corresponding to the first X, or the first A corresponding to all X->A half-funnels?

Is there a way to work around this using what we have?

Yes! Given an exclusion funnel result, we could create a static cohort out of all the people entering the funnel, and now you can create all sorts of funnels out of this cohort.

Alternative solutions

It's worth thinking about other ways of solving this, especially the "excluding an event which is also a funnel step" case. I haven't done this yet, opting to preserve the context so far for now.

Thank you for your feature request – we love each and every one!

@macobo
Contributor

macobo commented Jul 12, 2021

Note that this is not a funnel-specific abstraction/search thing but rather a general analytics filtering capability, often associated with a time window.

He@p has this functionality:

[image]

Related issue: #2594

@neilkakkar
Collaborator Author

neilkakkar commented Jul 12, 2021

Hmm, now getting into problem solving mode, and it sounds like maybe separating out the two use cases into two different abstractions is better: make no-duplicates a new type of ordering, such that event exclusion definition becomes cleaner: just means excluding that event from the funnel range.

Thinking from a user's point of view, this really helps resolve the confusion between: select "search" while excluding "search" (huh?) vs "select search, disqualifying duplicates" (aha!).

@neilkakkar
Collaborator Author

Also, another challenge to the problem: When would you want to exclude events & disqualify persons in the funnel date range, but not globally?

Or more specifically, when would you want to disqualify persons outside the funnel conversion time, but inside the funnel date range?

That is, do we need the date range exclusion at all, given we have a "within conversion time" exclusion, and a global exclusion?

@kpthatsme
Contributor

Hey @neilkakkar great discussion this AM – summarizing a few points here:

  • No separate concept of "exclusions within the conversion window". Setting a conversion window is a top-level requirement to create a funnel. Every funnel must have a conversion window set, and event exclusions have no impact on the conversion window.

  • Event ordering semantics (i.e. specifying a strict funnel) are a separate concept from exclusions and should not impact the options we provide in terms of exclusions.

  • We want support for two kinds of exclusions:

    • Across all steps
    • Between steps (the user must specify the steps the event could not occur between)

  • General way this should work on a high level:

    • Make a funnel: define a chart date range, the funnel events, and a conversion window
    • For each day (x-axis value on the chart), pull users that perform the first event of the funnel
      • For each user, look one conversion window ahead from the funnel entry time to see if they performed the events needed to count as a conversion.
        • If not all steps were completed, or an exclusion case is detected, mark as drop-off.

Example

Let's say we have an example funnel:

  1. View ad
  2. Play song
  3. Download song
  4. Purchase tickets

If we excluded the event "View email" (not present in the funnel at all):

  • Across all steps: For someone to count as a conversion, they cannot perform "View email" between the time they perform the first event and before they perform the 4th event.
  • Between steps: Let's say we chose steps 2 and 3. For a conversion to count, this person could not have performed a "View email" event between those steps. Performing the event before or after has no bearing on the result.

If we excluded the "View ad" event:

  • Across all steps: For someone to count as a conversion, they cannot perform a second "View ad" event anytime between the sequence of above 4 events. We do not exclude this event completely because this funnel includes it.
  • Between steps: Let's say we choose to exclude between Steps 3 and 4. Conversions here could perform "View ad" as many times as they want, as long as it hasn't occurred between steps 3 and 4.
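To make the two exclusion kinds concrete, here is a rough per-user sketch of the evaluation described above. It is a simplification under stated assumptions, not the real funnel query: `Exclusion`, `evaluate_funnel`, and the result labels are hypothetical names, steps are numbered from 1 as in the example, and steps are matched greedily from the user's first funnel entry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple

Event = Tuple[datetime, str]  # (timestamp, event name)


@dataclass
class Exclusion:
    event: str
    from_step: int = 1             # 1-based; default covers the whole funnel
    to_step: Optional[int] = None  # None means "up to the last step"


def evaluate_funnel(events: List[Event], steps: Sequence[str], window: timedelta,
                    exclusions: Sequence[Exclusion] = ()) -> Optional[str]:
    """Return "converted", "dropped_off", or None if the user never entered."""
    events = sorted(events)
    entry = next((i for i, (_, name) in enumerate(events) if name == steps[0]), None)
    if entry is None:
        return None
    total = len(steps)
    reached = 1  # number of funnel steps completed so far
    if reached == total:
        return "converted"
    deadline = events[entry][0] + window
    for ts, name in events[entry + 1:]:
        if ts > deadline:
            break
        if name == steps[reached]:  # the next expected step advances the funnel
            reached += 1
            if reached == total:
                return "converted"
            continue
        for ex in exclusions:
            end = total if ex.to_step is None else ex.to_step
            # The exclusion applies only while the user sits between its two steps.
            if name == ex.event and ex.from_step <= reached < end:
                return "dropped_off"
    return "dropped_off"


steps = ["View ad", "Play song", "Download song", "Purchase tickets"]
t0 = datetime(2021, 7, 12)
names = ["View ad", "Play song", "View ad", "Download song", "Purchase tickets"]
journey = [(t0 + timedelta(minutes=i), n) for i, n in enumerate(names)]
print(evaluate_funnel(journey, steps, timedelta(days=7), [Exclusion("View ad")]))
print(evaluate_funnel(journey, steps, timedelta(days=7), [Exclusion("View ad", 3, 4)]))
```

In this journey the second "View ad" happens between steps 2 and 3, so the across-all-steps exclusion marks the user as a drop-off, while the exclusion limited to steps 3 and 4 still counts them as a conversion.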

@neilkakkar
Collaborator Author

Cheers, thanks! It came out during the call that I don't really have a good reason against exclusion between specific steps, except that I can't imagine how it would be useful. (That's a failure of imagination, not of utility.)

I'm still not 100% convinced it will be useful, but happy to defer to people who're actually going to use this (Kunal) + user studies done by competitors. (We had a look, and competitors do indeed support this.)

So, I'm moving forward with implementing this.


Further, the date range exclusion doesn't make sense with this, so we're discarding it. And global exclusions happen using cohorts (users who never did event X), so we're good there as well.

Thanks!

@neilkakkar neilkakkar self-assigned this Jul 13, 2021
@neilkakkar
Collaborator Author

Further, to refine exclusions over orderings: exclusions don't make sense for strict funnels, and exclusions between specific steps don't make sense for unordered funnels.

Between any 2 steps makes sense for ordered funnels.

@neilkakkar
Collaborator Author

Some great feedback from @macobo : Looks like I got too hung up on the specifics / edge cases - I jumped too quickly, without thinking from first principles.

Concretely, I started with thinking about why users would / wouldn't want to exclude events in a part of the funnel, without asking the deeper question: What question are users trying to answer when they exclude events in a part of the funnel?

And then, I could follow from there to whether funnel exclusion steps are the way to go, or a more general approach, like the one he@p has, as mentioned by Karl above^.

This is probably also why I wrote: I'm still not 100% convinced it will be useful, but happy to defer to people who're actually going to use this.


While the final result may not change, it's important to me that I take the right path to the decision.

Challenging the decision

#1: Comparing different flows, one where users didn't do X in their journey vs one where users did X in their journey, is more general than just funnels. (This is what exclusions help us do.)

I agree with this. However, our focus right now is funnels, so I'd restrict scope to funnels. Plus, trends/retention/whatever would have to implement their own filtering logic, since the queries are different. Perhaps it's worth abstracting common bits out into ClickhouseEventQuery, but that only happens after we've seen the pattern.

#2 So, what question exactly are users answering with excluding events in a part of the funnel? More importantly, is that the right question for users to be asking, and the right way for us to be presenting that information?

The first one is above: Comparing across flows - Users who do an event X in between their conversion, vs, all users who convert. You might have a multivariate A/B test, where you want more granular level analysis on the A side: Users on a certain flow may or may not trigger event X, and you want to see their conversion to determine whether the A side is good or not.

Err, but, can't you do this using two funnels as well: event 1 -> event 2 -> X -> event 3 and event 1 -> event 2 -> event 3. ? This gives you the same answer for funnels I think, where instead of excluding those users who did X, you compare across two funnels. This is essentially the reverse comparison. However, visualising this is harder, since you need two funnels for the same information you would've got with exclusions, which is a good enough reason to implement exclusions, imo.

I think this use case is enough to warrant having exclusions, but for the sake of completeness, what other questions are users trying to answer? I don't know. I need your help, @kpthatsme , @paolodamico to help answer this! :)

Also, welcoming other challenges. This was a bit messy, since it's happening after I've even implemented exclusions, sorry about all this questioning!

@paolodamico
Contributor

Hey @neilkakkar, let me give this some further thought and get back to you! I'm sure there's other questions we're missing. Also tagging @marcushyett-ph here for input who probably has some useful insights too.

@marcushyett-ph
Contributor

@neilkakkar thanks for this great discussion - it's a little complex and hard to follow this entire thread so please correct me if I've misunderstood anything crucial.

I agree with @macobo that this is probably a standard filtering capability we'll want across the board at some point, but I'll focus my feedback specifically on the funnels use case.

I want to probe at whether there is a higher level problem people are trying to solve and whether or not we can solve that problem directly rather than something lower level.

I'll take this example funnel:

  1. View ad
  2. Play song
  3. Download song
  4. Purchase tickets

Let's say there are 100+ other events a user can trigger in our product. We're likely to have 3 jobs we want to do using this information (linked to exclusions):

  1. Understand which other events might have "caused" a user to be successful or unsuccessful (since there's 100+ I could do this by creating 100 funnels - but if we could automate this - that'd be ideal): https://github.com/PostHog/product-internal/issues/92# <--- This issue talks about this use case in more detail
  2. I already have a "hunch" that a specific event might be "causing" a user to be unsuccessful, so I want to backtest this hypothesis by excluding it from the funnel and seeing if conversion is improved
  3. Understand short-cuts in the funnel (and potentially how we can encourage them), e.g. understand who managed to go straight from view ad to purchase tickets (and what other events they did along the way)

I think it would be great to see if we have any user feedback around this. I've heard some around 1 and 2, but I don't think I've heard customers talk much about 3 so far? @paolodamico

@macobo
Contributor

macobo commented Jul 15, 2021

Thanks @marcushyett-ph

For use cases 1-2, the natural way for me to think about it would be breakdowns. Being able to break down (or filter) by a virtual property like "User has done $action in past X days" or "User has done X at any time" would help answer similar questions I've had in the past, minus the automation part.
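As a rough illustration of the virtual-property idea, the sketch below splits funnel conversion by "User has done X at any time". The function name and the input shape (a dict of per-user event sets plus a conversion flag) are assumptions made for the example, not an existing API.

```python
from collections import defaultdict
from typing import Dict


def conversion_by_virtual_property(users: Dict[str, dict], event: str) -> Dict[bool, float]:
    """Break funnel conversion down by whether the user ever did `event`.

    `users` maps a user id to {"events": set of event names, "converted": bool}.
    """
    totals = defaultdict(lambda: [0, 0])  # has_done -> [converted, entered]
    for data in users.values():
        has_done = event in data["events"]  # the virtual property
        totals[has_done][0] += int(data["converted"])
        totals[has_done][1] += 1
    return {has_done: converted / entered
            for has_done, (converted, entered) in totals.items()}


users = {
    "u1": {"events": {"View ad", "Create playlist"}, "converted": False},
    "u2": {"events": {"View ad", "Create playlist"}, "converted": True},
    "u3": {"events": {"View ad"}, "converted": True},
    "u4": {"events": {"View ad"}, "converted": True},
}
print(conversion_by_virtual_property(users, "Create playlist"))
```

A time-bounded variant ("in past X days") would need timestamps on the events rather than a plain set.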

@paolodamico
Contributor

First of all thanks @neilkakkar & @macobo for challenging this and making us think from first principles.

  • Unfortunately, I checked my notes and can't recall any further questions users want to answer that would be particularly relevant in this context. I can bring it up in upcoming calls.
  • Re @macobo, not sure a breakdown is the right way to solve @marcushyett-ph's 1-2. If my hypothesis is, for example, that users who create a playlist are less likely to purchase tickets, a breakdown on the suggested virtual property would basically show me the binary outcome of either doing the event or not, which seems more confusing than treating it like a filter. This is, for instance, the same behavior we have for cohort breakdowns, which we know has been a source of great confusion.
  • Another use case that comes to mind is wanting to exclude users for business/product reasons. For instance, I may want to exclude users who upgraded their account during the process of purchasing a ticket. Upgraded accounts may have different paths or upselling tactics that make sense to analyze separately. A user property might not do the trick because I may particularly care whether they upgraded during the process.

Thoughts?


5 participants