Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address ambiguity for priority/score field names and additional priority score features #27

Closed
lucasoares opened this issue Jun 27, 2023 · 9 comments · Fixed by #29
Closed
Labels
enhancement New feature or request

Comments

@lucasoares
Copy link
Member

lucasoares commented Jun 27, 2023

Deckard currently assigns the default priority to the current timestamp when the message is added to the queue. While this works for most cases, some users may prefer to use their own priority algorithm.

To avoid breaking changes with protocol buffer default value, we should not allow users to set a priority = 0 and should keep the current timestamp implementation for this case.

Users will be able to set a negative priority if they want to.

Additionally, I will create new fields called priority_increase and priority_decrease for ACK/NACK requests, as the current score_subtract field is ambiguous. Increasing the priority means lowering its score (that's why score_subtract have this name).

UPDATE: these fields will not be created (#27 (comment))

We will also have two different fields for priority filtering on PULL requests. Currently, we have a priority_filter field and it is also ambiguous (currently filter messages to have a max score of now-score_subtract) I suggest the creation of two new fields: max_score and min_score:

The max_score and min_score fields will be introduced as part of the priority filtering mechanism in the system. Here's an explanation of what they will mean:

  • max_score: sets the upper threshold for the priority score of a message to be returned in the pull request. Only messages with a priority score equal to or lower than the max_score value will be returned.

  • min_score: sets the lower threshold for the priority score required for a message to be returned. Only messages with a priority score equal to or higher than the min_score value will be returned.

I'm not entirely sure if these are the best names for the fields because a lower priority score actually indicates a higher priority. Suggestions?

I'm keeping both changes in the same issue as they will change the exact same code, will be easier to implement it together

@lucasoares lucasoares added the enhancement New feature or request label Jun 27, 2023
@lucasoares lucasoares changed the title Allow set priority on message insertion Priority changes to allow more customizations Jun 27, 2023
@lucasoares lucasoares changed the title Priority changes to allow more customizations Adddress ambiguity for priority/score field names and allow assign the priority while adding a message Jun 27, 2023
@andreminelli
Copy link

Well, if I have understood right, score and priority both refer to same concept in Deckard, so I suggest to name it uniquely throughout the codebase, to avoid confusion...

And sadly both score and priority have the same interpretation problem: it is not clear if lower values should be processed first or last. IMO the word which best express the whole concept would be rank - lower values always come first on any ranking classification (at least I never saw anything different of that😅)

@lucasoares
Copy link
Member Author

lucasoares commented Jun 28, 2023

Not exactly... A score is the numerical value that represents the priority of a message. In our case, Deckard is a priority queue where each element has its own score, indicating its priority.

However, rank would refer to the position of each message in the priority list, following a sequential order. Although I like the term "rank," I believe it may lead to confusion, so changing it might not significantly improve clarity.

But you're correct, I think keeping only one in our API interface is better. I think everywhere we used score so far. Maybe I will change to with score_increase and score_decrease instead, where score_increase will add a value to the current score and then lower the priority of the message.

If we revisit the concept, every priority queue has its own score (sometimes referred to as a priority value). I haven't searched much now to get the reference we used, but I remember seeing the description used in the book The Art of Multiprocessor Programming, Revised Reprint:

A priority queue is a multiset of items, where each item has an associated priority, a score that indicates its importance (by convention, smaller scores are more important, indicating a higher priority). A priority queue typically provides an add() method to add an item to the set, and a removeMin() method to remove and return the item of minimal score (highest priority). Priority queues appear everywhere from high-level applications to low-level operating system kernels.

We initially built this project using Redis as the foundation, and their sorted sets always order elements based on the lowest value. This approach is common in other projects I've researched as well at the time. For instance, Netflix's launched Timestone after some months we created Deckard and they also assigns the highest priority to the element with the lowest score.

What do you think?

@cezar-tech
Copy link
Contributor

cezar-tech commented Jun 28, 2023

The idea of setting a priority during a ack/nack request seems fine to me.

Regarding the inverse rationality between score and priority, this can be understandable, I think we should focus more on how we handle the score's boundaries, since this is what actually results in behavioral changes in the app. It can be done in 2 ways the way I see it and each one has its own caveat:

Using 0 as a boundary:

  • the outcome of using a directly proportional (score=priority) interpretation => for every item you put in the queue, there can always be another one to top it off in prioirity up to "LONG_MAX" sort of threshold based on whatever number registers Redis has.
  • The inverse is achievable by using the reversely proportional logic, you can "infinetly" sort lower priority items, but the top (0) will have them all on a draw since that would be the max value

Unbounded Score for positive and negative directions:

I would not advise going into negative numbers, since it can lead to people mathematically manipulating them with subtractions in a wrong way, like adding a neg number resulting in a lower one. Which can lead to hard debugging or unexpected behaviors for the a given deckard use case but totally correct in mathematic terms.

Even then, it could be done with all cares taken and it would allow one to virtually handle priorities draws infinitely on both directions, but I find it hard to be a necessary use case, so I would't back this up.

@cezar-tech
Copy link
Contributor

Adressing the specific issue of the inverse logic between score X filter, I have always used the inverse interpretation, seems to be more academically accurate based on cientific definitions of a priority queue, as quoted by @lucasoares .

@cezar-tech
Copy link
Contributor

Regrading the window filter request, seems ok and tottaly fine to me.

@lucasoares
Copy link
Member Author

lucasoares commented Jun 28, 2023

Agreed. I think it's best to set the highest priority score as 0 and the lowest as MAX_NUMBER, which we can define ourselves or use Redis' definition of the maximum allowed score for a sorted set.

As stated in the Redis documentation, a sorted set score is:

The score values should be the string representation of a double precision floating point number. +inf and -inf values are valid values as well.

And in more details:

Redis sorted sets use a double 64-bit floating point number to represent the score. In all the architectures we support, this is represented as an IEEE 754 floating point number, that is able to represent precisely integer numbers between -(2^53) and +(2^53) included. In more practical terms, all the integers between -9007199254740992 and 9007199254740992 are perfectly representable. Larger integers, or fractions, are internally represented in exponential form, so it is possible that you get only an approximation of the decimal number, or of the very big integer, that you set as score.

We currently have the scores represented in Deckard as float64 but we only used integer values so far, we never used decimals for any score.

What do you think?

Using 9007199254740992 as our max score wil make sure Deckard keeps working for more 285,899.5994 years in our default score algorithm, that's nice :D

@lucasoares lucasoares changed the title Adddress ambiguity for priority/score field names and allow assign the priority while adding a message Address ambiguity for priority/score field names and allow assign the priority while adding a message Jun 28, 2023
@andreminelli
Copy link

andreminelli commented Jun 29, 2023

Not exactly... A score is the numerical value that represents the priority of a message. In our case, Deckard is a priority queue where each element has its own score, indicating its priority.

Got it.

However, rank would refer to the position of each message in the priority list, following a sequential order. Although I like the term "rank," I believe it may lead to confusion, so changing it might not significantly improve clarity.

Well remembered: the strict sequential ordering in a rank, without 'holes' between each element, would 'break expectations' in a large number of scenarios, indeed.

@cezar-tech , I agree that most of the time lower number means higher priorities, but it's not universal (at least Scala and C++ use the higher-ordering - I have found this by looking at this article).
And I also agree that the current definition is understandable - you just have to look into documentation. My tentative here was to find a name able to state clearly the 'priority direction' without the need to check docs - which I have failed completely 😜

But you're correct, I think keeping only one in our API interface is better. I think everywhere we used score so far. Maybe I will change to with score_increase and score_decrease instead, where score_increase will add a value to the current score and then lower the priority of the message.

Any solution with consistent naming and behavior would do fine in my opinion: either using priority (so parameters should be named priority_increase, max_priority, etc) and then priority_increase would change the score accordingly (decreasing the score value in this case); or using 'score' (so parameters should be named score_increase, max_score, etc) and score_increase could just make a sum.

If we revisit the concept, every priority queue has its own score (sometimes referred to as a priority value). I haven't searched much now to get the reference we used, but I remember seeing the description used in the book The Art of Multiprocessor Programming, Revised Reprint:

A priority queue is a multiset of items, where each item has an associated priority, a score that indicates its importance (by convention, smaller scores are more important, indicating a higher priority). A priority queue typically provides an add() method to add an item to the set, and a removeMin() method to remove and return the item of minimal score (highest priority). Priority queues appear everywhere from high-level applications to low-level operating system kernels.

We initially built this project using Redis as the foundation, and their sorted sets always order elements based on the lowest value. This approach is common in other projects I've researched as well at the time. For instance, Netflix's launched Timestone after some months we created Deckard and they also assigns the highest priority to the element with the lowest score.

I think the 'priority ordering' should be clearly stated in documentation (I didn't make a 'deep dive' in docs but I haven't seeing anything about this on wiki pages yet), to avoid any doubt for both users and contributors (for instance any replacement for Redis - as discussion #26 has started - should keep the same ordering convention).

@lucasoares
Copy link
Member Author

Fine by me then! Let's stick with score. We will only use the term 'priority' in our documentation to indicate 'higher priority' or 'lower priority,' but always with the clear understanding that a lower score means a higher priority.

Another question... If we let users set the score when doing an ACK, are fields score_increase and score_decrease (and old score_subtract really necessary? With this approach, the application can set the score by itself. Applications can use a custom algorithm or the default (timestamp) and then increase/decrease their scores on the application side.

And we would only implement:

  • max_score and min_score filtering for Pull
  • score for AddMessage
  • score for Ack

@cezar-tech
Copy link
Contributor

cezar-tech commented Jun 29, 2023

@lucasoares maybe just use one field for setting the absolute score value, no delta-like operations of add and subtracting, just let them set the score they want the message do have may yield more certain, less error-prone behavior and usability.

I mean: just use a score field for the ack operation and let that be the new score. No adding or Subtracting

@lucasoares lucasoares changed the title Address ambiguity for priority/score field names and allow assign the priority while adding a message Address ambiguity for priority/score field names and allow assign the priority while adding/acking a message Jun 29, 2023
@lucasoares lucasoares changed the title Address ambiguity for priority/score field names and allow assign the priority while adding/acking a message Address ambiguity for priority/score field names and additional priority score features Jun 29, 2023
lucasoares added a commit that referenced this issue Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants