Analysis-ready Parquet download? #2
I built an ETL script that turns the current download into a Parquet file. It has names for every field, is columnar-formatted so it is much quicker to query, and it is compressed with ZStandard so a day's worth of data is still around 1.2 GB. There are also H3 indices, which help filter specific geographies quickly.

https://tech.marksblogg.com/global-flight-tracking-adsb.html

Is there any chance the above ETL script could work its way into your infrastructure and produce a daily Parquet file in addition to the current daily download tar file?
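(For readers curious what such a conversion looks like: below is a minimal sketch, not the blog post's actual ETL script, assuming pyarrow and h3-py v4 are installed. The column names and input filename are illustrative.)

```python
# Minimal sketch, not the blog post's ETL script. Flattens one readsb
# trace JSON into a ZStandard-compressed Parquet file with an H3 column.
# Per readsb's README-json.md, each trace entry starts with:
# [seconds after "timestamp", lat, lon, altitude-or-"ground", ...].
import json

import h3                       # assumes h3-py v4 (pip install h3)
import pyarrow as pa
import pyarrow.parquet as pq


def trace_to_rows(path):
    with open(path) as f:
        doc = json.load(f)
    for point in doc["trace"]:
        alt = point[3]
        yield {
            "icao": doc["icao"],
            "timestamp": doc["timestamp"] + point[0],
            "lat": point[1],
            "lon": point[2],
            # "ground" marks surface positions; mapped to NULL here.
            "altitude_ft": None if str(alt).strip().lower() == "ground" else alt,
            # H3 resolution 7 gives neighbourhood-sized cells,
            # useful for coarse geographic filtering.
            "h3_7": h3.latlng_to_cell(point[1], point[2], 7),
        }


# Input filename is illustrative.
rows = list(trace_to_rows("trace_full_a835af.json"))
pq.write_table(pa.Table.from_pylist(rows), "traces.parquet",
               compression="zstd")
```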
Hi Mark, thank you for making this issue.

While I am in principle not opposed to having other formats of the data, before considering something like this I need the files to have their 'gaps' accounted for.

As you know, when readsb restarts for any reason (a configuration change being the most common), one readsb (let's say -0) will go down while the other keeps running. Then, once 0 is back up, 1 will go down and restart. This results in a few minutes of unique data in each file, which is why they are both there.

So basically, I need to solve this problem with the globe_history format first before moving forward.

Make sense?
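(For illustration only: one way the two instances' files could be reconciled is to union the points from -0 and -1 and deduplicate on the absolute timestamp, so each file fills the other's restart gap. A hypothetical sketch, not the project's planned fix; the file paths are made up.)

```python
# Hypothetical sketch: merge one aircraft's traces from both readsb
# instances, keyed on the absolute timestamp, so each instance's
# restart gap is filled by the other instance's data.
import json


def load_points(path):
    """Return {absolute_timestamp: trace_entry} for one trace JSON."""
    with open(path) as f:
        doc = json.load(f)
    return {round(doc["timestamp"] + t[0], 3): t for t in doc["trace"]}


# Paths are illustrative; entries from -1 win on shared timestamps.
merged = load_points("prod-0/trace_full_a835af.json")
merged.update(load_points("prod-1/trace_full_a835af.json"))
combined = [merged[ts] for ts in sorted(merged)]
print(f"{len(combined)} unique points after merging both instances")
```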
Hey, nice blog post! :) If you're going to make such a nice new format, you should include info on whether the airplane is on the ground. It gets dropped here:

```python
'altitude': trace[3]
            if str(trace[3]).strip().lower() != 'ground'
            else None,
```

I didn't see it saved anywhere. You probably already referenced it while using the data, but here is some explanation of the format: https://github.com/wiedehopf/readsb/blob/dev/README-json.md#trace-jsons

Also, sorry for the format, it's a bit of a mess.
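(A minimal way to act on this suggestion, building on the sketch near the top of the thread: derive two columns from trace[3] so the on-ground information is kept rather than discarded. Column names are illustrative.)

```python
# Hypothetical follow-up: keep the on-ground information as its own
# boolean column instead of losing it when altitude becomes None.
def altitude_columns(trace):
    on_ground = str(trace[3]).strip().lower() == "ground"
    return {
        "altitude_ft": None if on_ground else trace[3],
        "on_ground": on_ground,
    }
```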
@marklit Of course, nothing is preventing you from tackling this project yourself and making the Parquet-ready data available, similar to this repo. :)
@marklit, I've created a ClickHouse database with the data and also added ADSB-E: https://github.com/ClickHouse/adsb.exposed/