
Fatal error in Airnode feed #207

Closed
metobom opened this issue Jan 31, 2024 · 25 comments
Labels: bug (Something isn't working), wontfix (This will not be worked on)

metobom (Member) commented Jan 31, 2024

Nodary's Airnode feed process failed with this error. Because we read the config from a raw GitHub URL and move the deployment file from candidate-deployments to active-deployments after the deployment, CF tried to redeploy the app but wget kept throwing a 404 error.

As a solution, I will update the CF EntryPoint to try the candidate-deployments path first and, if that fails, fall back to active-deployments.
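A minimal sketch of what that EntryPoint fallback could look like, assuming the config is fetched with wget from raw GitHub URLs (the <ORG>/<REPO> path, file name, target location, and start command below are placeholders, not the real ones):

#!/bin/sh
# Sketch only: <ORG>, <REPO>, the file name, and the paths are placeholders.
BASE="https://raw.githubusercontent.com/<ORG>/<REPO>/main"
CONFIG_FILE="airnode-feed.json"
TARGET="/app/config/airnode-feed.json"

# Try the candidate-deployments path first, then fall back to active-deployments.
wget -q -O "$TARGET" "$BASE/candidate-deployments/$CONFIG_FILE" ||
  wget -q -O "$TARGET" "$BASE/active-deployments/$CONFIG_FILE" ||
  { echo "Failed to fetch config from both locations" >&2; exit 1; }

exec node dist/index.js  # placeholder start command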

@Siegrift for visibility.

Siegrift (Collaborator) commented Jan 31, 2024

The error message doesn't tell much :/

I couldn't find related issues for "Check failed: is_clonable_js_type || is_clonable_wasm_type". Do we have access to the Nodary Airnode feed to check some metrics? Maybe it was a CPU/memory spike or a memory leak.

Regarding the GitHub URLs, we need to make sure that the production ones are immutable.

(I've assigned both of us on this issue for now)

metobom (Member, Author) commented Jan 31, 2024

> Do we have access to the Nodary Airnode feed to check some metrics?

Unfortunately, we don't have any metrics.

bdrhn9 (Contributor) commented Jan 31, 2024

> Do we have access to the Nodary Airnode feed to check some metrics?

For debugging purposes, I activated ECS CloudWatch Container Insights to collect metrics from the containers. If we experience the issue again, it will be helpful.

It incurs extra charges, so it shouldn't be enabled by default. But it's easy to enable in the CF template:

"AppCluster": {
      "Type": "AWS::ECS::Cluster",
      "Properties": {
        "ClusterName": "AirnodeFeedCluster-<SOME_ID>",
+       "ClusterSettings": [
+         {
+           "Name": "containerInsights",
+           "Value": "enabled"
+         }
+       ]
      }
    }
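As a side note, for a cluster that is already deployed, the same setting can also be toggled with the AWS CLI without redeploying the stack (the cluster name is the placeholder from the template above); changing it outside CloudFormation will show up as stack drift, though:

aws ecs update-cluster-settings \
  --cluster AirnodeFeedCluster-<SOME_ID> \
  --settings name=containerInsights,value=enabled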

Siegrift (Collaborator) commented:

We should be able to see at least CPU and memory usage, but when I was stress testing the container it was able to handle the load so I'd be surprised if it was caused by this. Hopefully, we will be able to reproduce it again with more insights.

metobom (Member, Author) commented Feb 1, 2024

The same error occurred in TwelveData's deployment too.

Siegrift added the "bug" label on Feb 2, 2024
Siegrift (Collaborator) commented Feb 2, 2024

Thanks. For reference, this is the error in Grafana.

The service seems operational again after the AWS restart, so I suspect there is some memory leak. I will try to reproduce it and fix it.
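One cheap way to confirm or rule out a leak before the next crash would be a periodic memory heartbeat in the feed's logs, e.g. something like the sketch below (not something that exists in the codebase today):

// Sketch: log process memory once a minute; a leak would show up in Grafana as
// steadily growing heapUsed/rss long before the process dies.
const MB = 1024 * 1024;

setInterval(() => {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  console.info(
    `memory rss=${Math.round(rss / MB)}MB heapUsed=${Math.round(heapUsed / MB)}MB ` +
      `heapTotal=${Math.round(heapTotal / MB)}MB external=${Math.round(external / MB)}MB`
  );
}, 60_000).unref(); // unref() so the timer does not keep the process alive during shutdown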

Siegrift (Collaborator) commented Feb 2, 2024

Btw, it seems that the message in Grafana is trimmed. E.g. the error message pasted in this issue contains more information and suggests a race condition inside Node.js.

bdrhn9 (Contributor) commented Feb 2, 2024

I was mistaken; the metrics are still in place. Here is a snapshot of them. I expected to see a gradual increase in memory usage, but it seems that's not the case.

[Screenshot from 2024-02-02 14-55-16: container metrics snapshot]

Siegrift (Collaborator) commented Feb 3, 2024

Happened again with Finage.

Siegrift (Collaborator) commented Feb 3, 2024

I created an issue on the Node.js repo (nodejs/node#51652) and hope someone responds.

An idea would be to try migrating to a different Node.js image (or version). In particular, there are some mentions of using the Slim image instead of Alpine.

Siegrift added the "on hold" label on Feb 3, 2024
metobom (Member, Author) commented Feb 4, 2024

It happened to coinpaprika too. One possibly useful piece of information: it happens with the Airnode feeds that include more data feeds.

bbenligiray (Member) commented:

I wonder if this will happen with a configuration that excludes the Grafana log shipping stuff

aquarat (Contributor) commented Feb 5, 2024

> An idea would be to try migrating to a different Node.js image (or version). In particular, there are some mentions of using the Slim image instead of Alpine.

A good idea. AFAIK they use different C libraries (Alpine uses musl, Slim uses glibc) 👌 This definitely looks like a runtime issue.

aquarat (Contributor) commented Feb 6, 2024

I'm currently trying to recreate this issue by simulating lots of feeds.

vponline commented Feb 6, 2024

As already mentioned, this doesn't seem to be related to memory. I managed to limit the RAM for a local airnode-feed and make it crash due to running out of memory, and the error looks different:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

You can reproduce this by using this script in package.json:

"dev": "NODE_OPTIONS=--max-old-space-size=100 nodemon --ext ts,js,json,env  --exec \"pnpm ts-node src/index.ts\"",

aquarat (Contributor) commented Feb 6, 2024

I've been running it locally (docker images built from main on amd64, both intervals set to 0) with 15 000 feeds and so far it's been fine. I had to increase the memory (tentatively gave it 2048m). I'll let it run for a few hours and see what happens.

With so many feeds I've noticed there's a bottleneck when running post-processing, so it could be useful to move that logic into a worker thread in the future.
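For reference, a rough sketch of what moving that step to a worker could look like with node:worker_threads; postProcess and the number[] payload are made-up stand-ins for the feed's real post-processing logic:

// --- post-processing.worker.ts (sketch) ---
import { parentPort, workerData } from 'node:worker_threads';

// Hypothetical stand-in for the real post-processing step.
const postProcess = (values: number[]) => values.map((value) => value * 2);

parentPort?.postMessage(postProcess(workerData as number[]));

// --- caller on the main thread (sketch) ---
import path from 'node:path';
import { Worker } from 'node:worker_threads';

export const postProcessInWorker = (values: number[]) =>
  new Promise<number[]>((resolve, reject) => {
    const worker = new Worker(path.resolve(__dirname, 'post-processing.worker.js'), {
      workerData: values,
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });

Whether this is worth it depends on how heavy the post-processing actually is; spawning a worker per batch adds its own overhead, so a small worker pool might be the better shape.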

aquarat (Contributor) commented Feb 6, 2024

So it's been running locally now with 15 000 feeds for about 4 hours and it hasn't died, so it may be something specific to AWS or to the RAM allocation (which affects processor resources). I'll try with reduced RAM.

aquarat (Contributor) commented Feb 7, 2024

It ran overnight with 15000 feeds and a reduced-speed CPU to try and simulate resource constraints. It still didn't crash, so I'm thinking even more that this may be specific to AWS.

I'm now running it with fuzzed responses from the data-provider API: randomly every 3rd API response is corrupted and every 2nd response is delayed by 0 to 3000 ms. I'll let it run like this for a few hours.
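For anyone who wants to reproduce the same setup, the fuzzing can be approximated with a small proxy in front of the data-provider API. The upstream URL and port below are placeholders; only the every-3rd-corrupted / every-2nd-delayed ratios come from the description above:

// Sketch of a fuzzing proxy: forwards requests to the real data-provider API,
// corrupting every 3rd response and delaying every 2nd one by 0-3000 ms.
import http from 'node:http';

const UPSTREAM = process.env.UPSTREAM_URL ?? 'http://localhost:8080'; // placeholder
let counter = 0;

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

http
  .createServer(async (req, res) => {
    counter += 1;
    try {
      const upstream = await fetch(new URL(req.url ?? '/', UPSTREAM)); // global fetch, Node 18+
      let body = await upstream.text();

      if (counter % 3 === 0) body = body.slice(0, Math.floor(body.length / 2)); // corrupt
      if (counter % 2 === 0) await sleep(Math.random() * 3000); // delay 0-3000 ms

      res.writeHead(upstream.status, { 'content-type': 'application/json' });
      res.end(body);
    } catch {
      res.writeHead(502).end();
    }
  })
  .listen(9090); // point the Airnode feed's API base URL at this port instead of the provider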

aquarat (Contributor) commented Feb 9, 2024

It's been running for three days, 15k feeds, some fuzzing and it's still running, no crashes, so... this is a hard bug to trace 😆

metobom (Member, Author) commented Feb 10, 2024

It happened again in TwelveData's Airnode feed.

aquarat (Contributor) commented Feb 12, 2024

My local instance eventually crashed because it ran out of log space (400 GB), so I haven't been able to recreate this locally. Upgrading to Node 20 may help.

Siegrift (Collaborator) commented Mar 7, 2024

Let's close this one, otherwise it's going to remain on the board forever. The service restarts after crashing, so we are not much affected by this as of now.

We've tried using a different Node image (it didn't help) and upgraded the Node version (not yet confirmed whether that helps).

Siegrift closed this as completed on Mar 7, 2024
Siegrift added the "wontfix" label and removed the "on hold" label on Mar 7, 2024
aquarat (Contributor) commented Mar 15, 2024

Has this happened again with the updated Node version? Just curious.

bbenligiray (Member) commented:

I think it did

metobom (Member, Author) commented Mar 21, 2024

API providers' current deployments are on 0.5.1, and Node 20 is used in 0.6.0.
