What are the criteria for embedded content vs. links #36

Closed
petersilva opened this issue May 24, 2021 · 10 comments

Comments

@petersilva
Contributor

petersilva commented May 24, 2021

The original encodings that motivated this work always used links and never embedded content in the messages; that approach has been used for many years in Canada. Embedding was considered an option for future implementation, but had so far been deemed unnecessary. Discussion in the ET-CTS in 2019 led to a "Content" field being added to the message format, so that the actual message data can be included literally in the MQP message. This was originally intended to help with long-latency, low-bandwidth satellite links, a fairly special case where it does make a lot of sense.

Some feel that embedding the content within the message is a universal accelerator. Generally speaking, we strive to have transfer protocols that encourage as much parallelization as possible, as that maximizes the opportunities for speedups. It needs to be appreciated that a queue is a serialization of a kind: it slows down the posting of files, because each one needs to be read from disk to be included in the message, and the content is then carried in the message flow. With pub/sub, a subscriber performs client-side filtering to exclude messages they are not interested in. Larger messages make the receipt of each message slower and increase queueing, compared to a message stream with no embedding. It is also well understood that:

  • MQP brokers are optimized to transfer large numbers of small messages. They do not achieve great overall transfer rates; as a data transfer protocol, they are not a good choice for large volumes.
  • Large messages cause memory management issues and can slow down brokers as a whole, and are not recommended for any MQP.
  • If large data is to be embedded, we will likely need to implement a message segmentation method, which we have so far avoided. Past WMO segmentation was complicated, so there is little appetite for it.
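The client-side filtering and queueing costs above can be put in rough numbers. A minimal sketch in Python, where the message count, sizes, and subscription ratio are all illustrative assumptions, not measurements:

```python
# Rough model: bytes a subscriber must receive when payloads are embedded
# versus linked. All numbers below are illustrative assumptions.

def stream_bytes(total_msgs, header_bytes, payload_bytes, embedded):
    """Bytes received over the message channel before filtering can happen."""
    per_msg = header_bytes + (payload_bytes if embedded else 0)
    return total_msgs * per_msg

total = 10_000               # messages published (assumed)
wanted = 0.10                # subscriber cares about 10% of them (assumed)
hdr, payload = 512, 100_000  # 512 B notification, 100 kB payload (assumed)

# Links: receive every small notification, then fetch only the wanted payloads.
links_only = stream_bytes(total, hdr, payload, embedded=False) \
             + int(total * wanted) * payload
# Embedding: every payload travels through the broker, wanted or not.
embedded_stream = stream_bytes(total, hdr, payload, embedded=True)

print(links_only, embedded_stream)  # embedding moves ~9.5x more data here
```

With 90% of messages filtered out client-side, the embedded stream pushes roughly 9.5 times more data through the broker in this toy scenario; the gap grows with the payload size and the filtering ratio.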

If we want to avoid segmentation, then we need to establish a maximum message size for embedded data, and that size has to be supported efficiently by the protocols.

So we have one case: if every message in a publication stream is of interest to the subscriber, and the messages are all "small", then a completely embedded stream will perform better than third-party downloads, assuming the entire stream bandwidth can be supported. Another option to reduce client-side filtering is to have separate channels for warnings. These are guaranteed not to be busy, and we could simply elect to embed everything sent on such channels.

On the other hand, it is well understood that if an advertised stream includes numerical weather prediction outputs, satellite and RADAR imagery, as well as observations, pushing the large data through MQP channels will be awful. Such channels will universally experience queueing, and the warnings may be stuck behind the large products.

Using standard protocols such as HTTPS and SFTP allows us to achieve much higher total transfer bandwidth, without taxing the MQP brokers with something they are not intended for (large-scale data transfers).

Another benefit of out-of-band, non-MQP transfers is that they use protocols in very wide use, with many opportunities to employ content distribution networks, web acceleration appliances, etc. The analogue for MQP would be to have third parties implementing brokers, a far more cumbersome prospect.

Summary. Embedding:

  • increases the size of messages; MQP works best on small messages.
  • slows down the posting of messages, as the content needs to be serialized into each one.
  • slows down receipt of messages, delaying the start of content transfers... reducing parallelism.
  • slows down receipt of messages much more when there is a lot of client-side filtering.
  • makes the throughput of brokers much more important.
  • is more in line with GTS and aviation messaging traditions.
  • reduces the ability to leverage standard transfer protocol accelerations, particularly those available for HTTP.
  • may force us to consider segmentation.
@petersilva
Contributor Author

petersilva commented May 24, 2021

A thought experiment, picking ugly numbers for illustrative purposes:

If we have a maximum message size, then experience with WMO indicates the average message size will converge to about half of the maximum (when the maximum was 14K, we observed an average message size of 7K on our links... but there was segmentation in the story... who knows how it will change in the new methods). If we pick 8MB as the new maximum (8MB is the max I am hearing in aviation circles), then imagine 4MB becomes the average. Assume a message without payload is about 512 bytes. At any given message transfer rate, the size difference is then roughly 8192:1, so if we can support 100 messages/second over the message protocol without payloads, we should expect on the order of 0.01 messages/second with embedded content. Or you need the message protocol to run some 8000 times faster, or some combination.
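A quick check of this arithmetic (taking 4 MB as 4 MiB and a 512-byte bare message, per the figures above):

```python
# Back-of-envelope check of the ratio in the thought experiment above.
max_size = 8 * 1024 * 1024   # 8 MB proposed maximum
avg_payload = max_size // 2  # ~4 MB average, per the WMO half-of-max observation
bare_msg = 512               # bytes for a message without payload (assumed)

ratio = avg_payload / bare_msg
print(ratio)                 # 8192.0 -- the "roughly 8000:1" in the text

# At 100 bare messages/second, the same byte budget carries:
embedded_rate = 100 / ratio
print(round(embedded_rate, 3))  # ~0.012 embedded messages/second
```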

It becomes a question of how good brokers are at actual data transfer (as opposed to switching).
The traditional answer is: not very good. I fully get that people intuitively think it should be faster, and in many cases it should be, but in a lot of other common cases I expect it to turn out to be counter-productive.

@golfvert

My understanding is that brokers are intended to work best with "small" messages. Everyone will agree that a 1GB file is not small. At the other end of the spectrum, 1kB is small. So, a range of 1 to 1,000,000 between the two. For brokers with the anticipated workload, can we consider 100kB, or 1MB, to be "small"? I agree we need to run some experiments.

@petersilva
Contributor Author

So far, in the committee, we had proposed 4KB as the definition of "small enough to be embedded", and the strategy was simply to embed everything smaller than that. I wanted 512 bytes... but in the interest of consensus, we have been using 4KB.

I expect that CAP messages are typically in the 100KB to 500KB range, so about 50x-100x larger. I think the content-agnostic approach (embedding all messages smaller than an embedding threshold) is no longer reasonable at that size, as it would slow things down unacceptably for a large number of cases.

We could get more sophisticated and embed bigger messages as long as they are sufficiently rare and important, but that means we need to understand what we are sending, as opposed to remaining content-agnostic, as we have so far succeeded in being.
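The content-agnostic rule discussed here (embed everything under a fixed threshold, link to the rest) can be sketched as follows. The dict field names and URLs are illustrative assumptions, not the actual WIS2 notification schema:

```python
import base64

# Committee working hypothesis (assumed here): embed anything at or under 4 kB.
EMBED_THRESHOLD = 4 * 1024

def build_notification(url: str, payload: bytes) -> dict:
    """Content-agnostic rule: embed small payloads, link to everything else.
    Field names below are illustrative, not the real notification schema."""
    msg = {"href": url, "size": len(payload)}
    if len(payload) <= EMBED_THRESHOLD:
        # small enough: the literal content travels in the message itself
        msg["content"] = {"encoding": "base64",
                          "value": base64.b64encode(payload).decode("ascii")}
    return msg

small = build_notification("https://example.org/obs.bufr", b"x" * 1000)
cap = build_notification("https://example.org/warning.cap", b"x" * 300_000)
print("content" in small, "content" in cap)  # True False
```

Under this rule a 1 kB observation rides inside the message, while a 300 kB CAP alert (typical of the ECCC sizes mentioned in this thread) stays a link; embedding CAP would require either raising the threshold or abandoning content agnosticism.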

@eliot-christian

In my experience, CAP messages are typically quite small. My guess is that they average about 1,000 bytes (1K).

To my mind, sudden onset emergencies provide the primary Use Case for embedding CAP alerts in a message queue. In this case, seconds can be the difference between a life-saving alert and an alert that arrives too late. Examples of sudden-onset emergencies include earthquake-early-warning (alerts trying to outrace the earthquake wave propagation) and its analogues in tsunami, flash floods, landslides, volcanic eruption, space weather, et al. Also qualifying as sudden-onset emergencies are public safety matters such as 'active shooter' situations.

We should bear in mind that CAP alerts are often handled without human mediation, as in the triggering of sirens, traffic signals, bridge and tunnel gates, etc.

@petersilva
Contributor Author

petersilva commented May 27, 2021

You can see Canadian CAP here for the last few weeks: https://dd.weather.gc.ca/alerts/cap/

Looking at today, for one of our seven storm prediction centres, we have issued about a dozen warnings; most of them are in the 200KB to 400KB range, and about 1/3 are around 18KB. None are smaller than that. Those are from ECCC (Environment and Climate Change Canada, the current name for the met service's parent organization). To see all Canadian ones, I had a look here:

https://alertsarchive.pelmorex.com/en.php

ECCC produces the vast majority of CAP, but I found some others, and none of the alerts from other organizations were smaller than 5KB.

I just looked a bit more closely at a CAP message from ECCC, and the digital signature alone is 3KB.

@petersilva
Contributor Author

On the other hand, if people are OK with messages under 4KB being embedded (that is the committee's working hypothesis anyway), and folks agree that the convention covers CAP (i.e. we do not expect CAP messages bigger than 4KB to be embedded), then there is already complete agreement.

@eliot-christian

eliot-christian commented May 27, 2021 via email

@petersilva
Contributor Author

Further discussion here: https://github.com/wmo-im/wis2-notification-message

@amilan17
Member

decision in: wmo-im/wis2-notification-message#6

@eliot-christian

eliot-christian commented Sep 12, 2023 via email
