What are the criteria for embedded content vs. links #36

Closed
petersilva opened this issue May 24, 2021 · 10 comments

Comments

@petersilva
Contributor

petersilva commented May 24, 2021

The original encodings that motivated this work always used links and never embedded content in the messages; that approach has been used for many years in Canada. Embedding was considered an option for future implementation, but had so far been deemed unnecessary. Discussion in the ET-CTS in 2019 led to a "Content" field being added to the message format, so that the actual message data can be included literally in the MQP message. This was originally intended to help with long-latency, low-bandwidth satellite links, a fairly special case where it does make a lot of sense.

Some feel that embedding the content within the message is a universal accelerator. Generally speaking, we strive to have transfer protocols that encourage as much parallelization as possible, as that maximizes the opportunities for speedups. It needs to be appreciated that a queue is a serialization of a kind: it slows down the posting of files, because each one needs to be read from disk to be included in the message, and the content is then carried in the message flow. With pub/sub, a subscriber performs client-side filtering to exclude messages they are not interested in. Larger messages make the receipt of each message slower and increase queueing, compared to a message stream with no embedding. It is also well understood that:

  • MQP brokers are optimized to transfer large numbers of small messages. They do not achieve great overall transfer rates; as a data transfer protocol, they are not a good choice for large volumes.
  • Large messages cause memory management issues and can slow down brokers as a whole, and are not recommended for any MQP.
  • If large data is to be embedded, we will likely need to implement a message segmentation method, which we have so far avoided. Past WMO segmentation was complicated, so there is little appetite for it.
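The client-side filtering and queueing costs above can be put in rough numbers. A minimal sketch in Python, where the message count, sizes, and subscription ratio are all illustrative assumptions, not measurements:

```python
# Rough model: bytes a subscriber must receive when payloads are embedded
# versus linked. All numbers below are illustrative assumptions.

def stream_bytes(total_msgs, header_bytes, payload_bytes, embedded):
    """Bytes received over the message channel before filtering can happen."""
    per_msg = header_bytes + (payload_bytes if embedded else 0)
    return total_msgs * per_msg

total = 10_000               # messages published (assumed)
wanted = 0.10                # subscriber cares about 10% of them (assumed)
hdr, payload = 512, 100_000  # 512 B notification, 100 kB payload (assumed)

# Links: receive every small notification, then fetch only the wanted payloads.
links_only = stream_bytes(total, hdr, payload, embedded=False) \
             + int(total * wanted) * payload
# Embedding: every payload travels through the broker, wanted or not.
embedded_stream = stream_bytes(total, hdr, payload, embedded=True)

print(links_only, embedded_stream)  # embedding moves ~9.5x more data here
```

With 90% of messages filtered out client-side, the embedded stream pushes roughly 9.5 times more data through the broker in this toy scenario; the gap grows with the payload size and the filtering ratio.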

If we want to avoid segmentation, then we need to establish a maximum message size for embedded data, and that size has to be supported efficiently by the protocols.

So we have one case: if every message in a publication stream is of interest to the subscriber, and the messages are all "small", then a completely embedded stream will perform better than third-party downloads, assuming the entire stream bandwidth can be supported. Another option to reduce client-side filtering is to have separate channels for warnings. These are guaranteed not to be busy, and we could simply elect to embed everything sent on such channels.

On the other hand, it is well understood that if an advertised stream includes numerical weather prediction outputs, satellite and RADAR imagery, as well as observations, pushing the large data through MQP channels will be awful. Such channels will universally experience queueing, and the warnings may be stuck behind the large products.

Using standard protocols such as HTTPS and SFTP allows us to achieve much higher total transfer bandwidth, without taxing the MQP brokers with something they are not intended for (large-scale data transfers).

Another benefit of out-of-band, non-MQP transfers is that they use protocols in very wide use, with many opportunities to employ content distribution networks, web acceleration appliances, etc. The analogue for MQP would be to have third parties implementing brokers, a far more cumbersome prospect.

Summary. Embedding:

  • increases the size of messages; MQP works best on small messages.
  • slows down the posting of messages, as the content needs to be serialized into each one.
  • slows down receipt of messages, delaying the start of content transfers... reducing parallelism.
  • slows down receipt of messages much more when there is a lot of client-side filtering.
  • makes the throughput of brokers much more important.
  • is more in line with GTS and aviation messaging traditions.
  • reduces the ability to leverage standard transfer protocol accelerations, particularly those available for HTTP.
  • may force us to consider segmentation.
@petersilva
Contributor Author

petersilva commented May 24, 2021

A thought experiment, picking ugly numbers for illustrative purposes:

If we have a maximum message size, then experience with WMO indicates the average message size will converge to about half of the maximum (when the maximum was 14K, we observed an average message size of 7K on our links... but there was segmentation in the story... who knows how it will change in the new methods). If we pick 8MB as the new maximum (8MB is the max I am hearing in aviation circles), then imagine 4MB becomes the average. Assume a message without payload is about 512 bytes. At any given message transfer rate, the size difference is then roughly 8192:1, so if we can support 100 messages/second over the message protocol without payloads, we should expect on the order of 0.01 messages/second with embedded content. Or you need the message protocol to run some 8000 times faster, or some combination.
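A quick check of this arithmetic (taking 4 MB as 4 MiB and a 512-byte bare message, per the figures above):

```python
# Back-of-envelope check of the ratio in the thought experiment above.
max_size = 8 * 1024 * 1024   # 8 MB proposed maximum
avg_payload = max_size // 2  # ~4 MB average, per the WMO half-of-max observation
bare_msg = 512               # bytes for a message without payload (assumed)

ratio = avg_payload / bare_msg
print(ratio)                 # 8192.0 -- the "roughly 8000:1" in the text

# At 100 bare messages/second, the same byte budget carries:
embedded_rate = 100 / ratio
print(round(embedded_rate, 3))  # ~0.012 embedded messages/second
```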

It becomes a question of how good brokers are at actual data transfer (as opposed to switching).
The traditional answer is: not very good. I fully get that people intuitively think it should be faster, and in many cases it should be, but in a lot of other common cases I expect it to turn out to be counter-productive.

@golfvert

My understanding is that brokers are intended to work best with "small" messages. Everyone will agree that a 1GB file is not small. At the other end of the spectrum, 1kB is small. So, a range of 1 to 1,000,000 between the two. For brokers with the anticipated workload, can we consider 100kB, or 1MB, to be "small"? I agree we need to run some experiments.

@petersilva
Contributor Author

So far, in the committee, we had proposed 4KB as the definition of "small enough to be embedded", and the strategy was simply to embed everything smaller than that. I wanted 512 bytes... but in the interest of consensus, we have been using 4KB.

I expect that CAP messages are typically in the 100KB to 500KB range, so about 50x-100x larger. I think the content-agnostic approach (embedding all messages smaller than an embedding threshold) is no longer reasonable at that size, as it would slow things down unacceptably for a large number of cases.

We could get more sophisticated and embed bigger messages as long as they are sufficiently rare and important, but that means we need to understand what we are sending, as opposed to remaining content-agnostic, as we have so far succeeded in being.
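The content-agnostic rule discussed here (embed everything under a fixed threshold, link to the rest) can be sketched as follows. The dict field names and URLs are illustrative assumptions, not the actual WIS2 notification schema:

```python
import base64

# Committee working hypothesis (assumed here): embed anything at or under 4 kB.
EMBED_THRESHOLD = 4 * 1024

def build_notification(url: str, payload: bytes) -> dict:
    """Content-agnostic rule: embed small payloads, link to everything else.
    Field names below are illustrative, not the real notification schema."""
    msg = {"href": url, "size": len(payload)}
    if len(payload) <= EMBED_THRESHOLD:
        # small enough: the literal content travels in the message itself
        msg["content"] = {"encoding": "base64",
                          "value": base64.b64encode(payload).decode("ascii")}
    return msg

small = build_notification("https://example.org/obs.bufr", b"x" * 1000)
cap = build_notification("https://example.org/warning.cap", b"x" * 300_000)
print("content" in small, "content" in cap)  # True False
```

Under this rule a 1 kB observation rides inside the message, while a 300 kB CAP alert (typical of the ECCC sizes mentioned in this thread) stays a link; embedding CAP would require either raising the threshold or abandoning content agnosticism.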

@eliot-christian

In my experience, CAP messages are typically quite small. My guess is that they average about 1,000 bytes (1K).

To my mind, sudden onset emergencies provide the primary Use Case for embedding CAP alerts in a message queue. In this case, seconds can be the difference between a life-saving alert and an alert that arrives too late. Examples of sudden-onset emergencies include earthquake-early-warning (alerts trying to outrace the earthquake wave propagation) and its analogues in tsunami, flash floods, landslides, volcanic eruption, space weather, et al. Also qualifying as sudden-onset emergencies are public safety matters such as 'active shooter' situations.

We should bear in mind that CAP alerts are often handled without human mediation, as in the triggering of sirens, traffic signals, bridge and tunnel gates, etc.

@petersilva
Contributor Author

petersilva commented May 27, 2021

You can see Canadian CAP here for the last few weeks: https://dd.weather.gc.ca/alerts/cap/

Looking at today, for one of our seven storm prediction centres, we have issued about a dozen warnings; most of them are in the 200KB to 400KB range, and about 1/3 are around 18KB. None are smaller than that. Those are from ECCC (Environment and Climate Change Canada, the current name for the met service's parent organization). To see all Canadian ones, I had a look here:

https://alertsarchive.pelmorex.com/en.php

ECCC produces the vast majority of CAP, but I found some others, and none of the alerts from other organizations were smaller than 5KB.

I just looked a bit more closely at a CAP message from ECCC, and the digital signature alone is 3KB.

@petersilva
Contributor Author

On the other hand, if people are OK with messages under 4KB being embedded (that is the committee's working hypothesis anyway), and folks agree that the convention covers CAP (i.e. we do not expect CAP messages bigger than 4KB to be embedded), then there is already complete agreement.

@eliot-christian

eliot-christian commented May 27, 2021 via email

@petersilva
Contributor Author

Further discussion here: https://github.com/wmo-im/wis2-notification-message

@amilan17
Member

decision in: wmo-im/wis2-notification-message#6

@eliot-christian

eliot-christian commented Sep 12, 2023 via email
