EDDB will soon cease operations #110
Well that's not helpful. |
EDDB has now shut down. Are there any plans to update TD to use something else, like Inara for example? Edit: Looks like Inara doesn't have an API for exporting data |
Working on it. |
For now TD is working, but it uses stations and systems from the day EDDB died. That said, the first phase of server work for this change is now functionally complete. We are producing our own listings.csv and continue to produce listings-live.csv as normal. The next thing is dealing with new systems and stations, as those need entirely new code so they are imported correctly. |
I don't think any of them have a means of obtaining bulk data. I know I looked at this back when the end of EDDB was first announced, and things didn't pan out. IIRC, one of the places I looked at, I think it was Inara, did have an API, but it was for single queries only, as in "give me the data for this station", so that wouldn't work. I would love to be wrong about this, because figuring out how to do it ourselves sucks. |
I'm not an expert in the slightest, so I don't know if this is even feasible, but can @Tromador read and aggregate EDDN's commodities feed for pricing purposes? Start with what we have now and update hourly/daily from an EDDN feed. I'm pretty sure Inara does this. Perhaps EDSM can be approached for system information. Or we can ask @spansh (I believe that's Gareth) if we can download his data dumps for systems. What is the major problem, not having an authoritative source for ships, modules, components? |
This is what we have now. Tromador's server runs a Python script that does exactly that; it's how listings-live.csv was generated before the end of EDDB, and since the "first phase" Tromador mentioned, it's also how the listings.csv file is generated. For details: https://github.com/eyeonus/EDDBlink-listener
Yes. As far as market data is concerned, we've got that covered. However, we have no means of updating TD with anything new: new commodities, new stations, any of it. (Actually, I think I did make it so new commodities get added to the DB when they show up, but I'm not certain, and I'm too lazy to look right now.) For some things, like rare items and ship modules, that's not a big problem, because it isn't very often that new ones get added to the game, so adding them manually isn't a huge deal, even if it would be nice to have it all done automagically. Basically, right now we can get all the information contained in an EDDN commodity message and process it for inclusion in the DB, but some information TD needs isn't contained in that message, so we need to start processing other EDDN message types in the script as well. For example, if we want to know what star system a station is in, we need to process the docking event of a Journal message. |
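A minimal sketch of the kind of extra processing described above, assuming the public EDDN relay endpoint and the commodity-v3 / journal-v1 schemas. This is not the EDDBlink-listener code, and the handler functions are illustrative placeholders:

```python
# Sketch only: subscribe to the EDDN relay and split out the two message
# types discussed above. handle_market/handle_docked are hypothetical stubs.
import json
import zlib

import zmq

EDDN_RELAY = "tcp://eddn.edcd.io:9500"
COMMODITY_SCHEMA = "https://eddn.edcd.io/schemas/commodity/3"
JOURNAL_SCHEMA = "https://eddn.edcd.io/schemas/journal/1"


def handle_market(system, station, commodities):
    # placeholder: update commodity listings for this station in the database
    print(f"market: {system}/{station} ({len(commodities)} items)")


def handle_docked(system, station):
    # placeholder: record which system a (possibly new) station belongs to
    print(f"docked: {station} is in {system}")


def listen():
    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt_string(zmq.SUBSCRIBE, "")
    sub.connect(EDDN_RELAY)
    while True:
        # EDDN messages are zlib-compressed JSON envelopes
        envelope = json.loads(zlib.decompress(sub.recv()))
        schema = envelope.get("$schemaRef", "")
        body = envelope.get("message", {})
        if schema == COMMODITY_SCHEMA:
            handle_market(body["systemName"], body["stationName"], body["commodities"])
        elif schema == JOURNAL_SCHEMA and body.get("event") == "Docked":
            handle_docked(body["StarSystem"], body["StationName"])


if __name__ == "__main__":
    listen()
```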
I'm not going to lie, my life is in a bit of an upheaval right now, so I haven't had time to work on this very much. If anyone who reads this wants to take a crack at it please feel free. |
Is there any interest in bringing this back? @eyeonus @Tromador in particular. I've got the most egregious problems with eddblink_listener hammered out and running on my machine, and I'm working on a mechanism to replay the archived eddn streams to load data from between when EDDB went out and now. That should get TD up and going with old (EDDB era) star systems. After that, I don't think it would be a very big lift to get a star system feed out of the EDDN journals. I haven't actually looked at the guts of TD to see what else it might need. |
Hi,

The last position we had (bearing in mind my memory is not what it once was) was that there were some issues causing threads to hang up. The database was still having issues in that it liked to grow larger, though eyeonus had done a lot of work on that area and it was miles better than it once was. There was some testing to be done to try and pin down the cause of some problems - I can't even remember now what - but my health took a downturn, my cat became diabetic, eyeonus was injured in a road accident, and as nobody was asking for TD to be mended it really seemed like it was very low on the list of priorities, if indeed it was on the list at all.

Assuming good data goes into the database, I think TD should basically work - I mean, why not? Its problem may be that it was written at a time when we were still in beta, with a lot fewer stations, and perhaps it hasn't scaled well. It doesn't seem to matter how much memory or CPU it gets, or how fast the storage is, some queries are disgustingly slow. That said, I still maintain my belief that TD's user-side query syntax is streets ahead of anything else in the questions it can answer.

I am more than willing to host a working version of the software with associated datasets for download. What I may not have the energy for is a lot of convoluted testing if weird and wonderful bugs crop up.

Cheers

Trom
|
You should absolutely feel encouraged to submit a PR to either/both repositories, I would love to have some help with this stuff.
That said, to the best of my knowledge Tromador's server is still running, and I long ago patched the listener to work without EDDB, so assuming that's true, TD is still up to date, at least regarding market data for the systems that existed when EDDB went down.
I look forward to seeing your fixes.
|
I note that @spansh (https://github.com/spansh), of neutron plotter fame, now has market data. Here is an example. He also has system data dumps. Should we reach out to him to see if there is anything TradeDangerous can leverage? |
We could potentially use the dumps (https://spansh.co.uk/dumps). Also, whatever happened to @EyeMWing? I expected to see a PR at some point from that one. |
I don't have the know-how to help with this in any way, but I'm so pleased that TD has not been completely forgotten. I could probably help with testing as a user, though. |
I have been trying to update TDHelper every time I play Elite Dangerous Odyssey, but it seems to have stopped updating about 7 months ago. Hopefully something will soon happen. I would also be happy to help with testing of a new and improved version. |
TDHelper is run by somebody else, it's not something I have anything to do with. |
I'm more than happy to help populate data. We have the new system dumps at https://spansh.co.uk/dumps which are purely system data (no bodies). However, if you also want station data you can grab the full galaxy file, though that's probably a little large for players to download.

If you only care about station data you can get galaxy_stations.json.gz. That contains all known data about every system which contains a station, including player fleet carriers. I'm parsing that for my new trade router and it takes 3-5 minutes to parse using RapidJSON. I'm not as familiar with Python, but there are fast JSON parsers available, and if you're worried about memory usage and don't have access to a SAX/streaming parser, I've made some concessions to make it relatively easy to create a streaming parser for those files manually.

If you'd like more help with this you can catch me on the EDCD Discord.
|
Thanks for the offer of support. Big files don't really scare me. Potentially we can have the server grab it and output something smaller for clients. I always used to have the server configured to grab and hold more data than the average user would download, at least by default (they could still grab it via options if they really wanted it). I too was hoping for this PR from @EyeMWing. That said, with @spansh willing to help with a reliable data source, I am willing to run up the ED server on current code, on the assumption we can start looking again at some of the long standing issues - I mean it does work, but there were some niggles. Assuming we do that, I would ask for patience (especially from @eyeonus 🙂) it's been a very long time since I looked at this and the brain fog from my illness and associated meds will likely have me going over old ground previously discussed as though it never happened. I know this can be a little frustrating at times, it certainly annoys me when I know my cranium isn't firing on all cylinders. |
I'm still here, just got pulled away from ED for a little bit by some priority work. I was actually right in the middle of trying to work out a solution for new star systems - looks like we've got a solution for that now. I've got some time this evening; I'll pull down the dump and see about getting it parsed. Shouldn't be too bad.
|
@EyeMWing You posted over a month ago that you had some time "this evening". Please can you have a think and honestly decide if you have the time and inclination to do this work. If you don't, that's fine, everything here is voluntary. We'll decide if/how we want to proceed without you and that's ok. |
I had a bit of free time today, so I put together a quick parser for @spansh's dump files. I did some (very cursory) research into fast JSON parsers and settled on cysimdjson (https://pypi.org/project/cysimdjson/). It can ingest the 8.8GB (uncompressed size) galaxy_stations.json in about 23 seconds on my M1 Pro MacBook (without doing anything with the data, that's just load time). It does process the input line by line to avoid needing insane amounts of memory, which means it makes some assumptions about the format of the galaxy dumps, namely that each system is on a single line, and the first and last lines of the JSON are the opening and closing square brackets.

Here it is as a proof of concept:

```python
import cysimdjson
from collections import namedtuple

DEFAULT_INPUT = 'galaxy_stations.json'

Commodity = namedtuple('Commodity', 'name,sell,buy,demand,supply,ts')


def ingest(filename):
    parser = cysimdjson.JSONParser()
    with open(filename, 'r') as f:
        f.readline()  # skip over initial open bracket
        for line in f:
            line = line.rstrip().rstrip(',')
            if line == ']':
                # end of dump
                break
            system_data = parser.loads(line)
            yield from _ingest_system_data(system_data)


def _ingest_system_data(system_data):
    for station_name, update_time, commodities in _find_markets_in_system(system_data):
        yield f'{system_data["name"]}/{station_name}', _ingest_commodities(commodities, update_time)


def _ingest_commodities(commodities, update_time):
    for category, category_commodities in commodities.items():
        yield category, _ingest_category_commodities(category_commodities, update_time)


def _ingest_category_commodities(commodities, update_time):
    for commodity, market_data in commodities.items():
        yield Commodity(
            name=commodity,
            sell=market_data["sellPrice"],
            buy=market_data["buyPrice"],
            demand=market_data["demand"],
            supply=market_data["supply"],
            ts=update_time,
        )


def _find_markets_in_system(system_data):
    for station in system_data['stations']:
        if 'Market' not in station.get('services', []):
            continue
        if not station.get('market', {}).get('commodities', []):
            continue
        yield (
            station['name'],
            station['market'].get('updateTime', None),
            _categorize_commodities(station['market']['commodities']),
        )


def _categorize_commodities(commodities):
    commodities_by_category = {}
    for commodity in commodities:
        commodities_by_category.setdefault(commodity['category'], {})[commodity['name']] = commodity
    return commodities_by_category


if __name__ == '__main__':
    print('# {name:35s} {sell:>7s} {buy:>7s} {demand:>10s} {supply:>10s} {ts}'.format(
        name='Item Name',
        sell='SellCr',
        buy='BuyCr',
        demand='Demand',
        supply='Supply',
        ts='Timestamp',
    ))
    print()
    for station_name, market in ingest(DEFAULT_INPUT):
        print(f'@ {station_name}')
        for category, commodities in market:
            print(f' + {category}')
            for commodity in commodities:
                print(' {name:35s} {sell:7d} {buy:7d} {demand:10d} {supply:10d} {ts}'.format(
                    name=commodity.name,
                    sell=commodity.sell,
                    buy=commodity.buy,
                    demand=commodity.demand,
                    supply=commodity.supply,
                    ts=commodity.ts,
                ))
        print()
```

That POC prints out the result in Trade Dangerous' prices format, but it is intended to be used to provide the data in a programmatically convenient way, so it doesn't necessarily need to pass through a conversion step; Trade Dangerous could potentially just load the prices directly from the galaxy dumps. |
You can trust that assumption about the file format; I specifically formatted it that way so that people without streaming JSON parsers can roll their own easily. One option for parsing would be pysimdjson, which is a hook into purportedly the fastest JSON parser there is currently.
|
I've fixed a bug - it wasn't picking up surface stations - so now ingestion times have jumped to the 50-70 second range.

```python
import cysimdjson
import simdjson
import time
from collections import namedtuple

DEFAULT_INPUT = 'galaxy_stations.json'
DEFAULT_PARSER = cysimdjson.JSONParser().loads
ALT_PARSER = lambda line: simdjson.Parser().parse(line)

Commodity = namedtuple('Commodity', 'name,sell,buy,demand,supply,ts')


def ingest(filename, parser):
    """Ingest a spansh-style galaxy dump and emit a generator cascade yielding the market data."""
    with open(filename, 'r') as f:
        f.readline()  # skip over initial open bracket
        for line in f:
            line = line.rstrip().rstrip(',')
            if line == ']':
                # end of dump
                break
            system_data = parser(line)
            yield from _ingest_system_data(system_data)


def _ingest_system_data(system_data):
    for station_name, update_time, commodities in _find_markets_in_system(system_data):
        yield f'{system_data["name"].upper()}/{station_name}', _ingest_commodities(commodities, update_time)


def _ingest_commodities(commodities, update_time):
    for category, category_commodities in commodities.items():
        yield category, _ingest_category_commodities(category_commodities, update_time)


def _ingest_category_commodities(commodities, update_time):
    for commodity, market_data in commodities.items():
        yield Commodity(
            name=commodity,
            sell=market_data["sellPrice"],
            buy=market_data["buyPrice"],
            demand=market_data["demand"],
            supply=market_data["supply"],
            ts=update_time,
        )


def _find_markets_in_system(system_data):
    # look for stations in the system and on all bodies
    targets = [system_data, *system_data.get('bodies', [])]
    for target in targets:
        for station in target['stations']:
            if 'Market' not in station.get('services', []):
                continue
            if not station.get('market', {}).get('commodities', []):
                continue
            yield (
                station['name'],
                station['market'].get('updateTime', None),
                _categorize_commodities(station['market']['commodities']),
            )


def _categorize_commodities(commodities):
    commodities_by_category = {}
    for commodity in commodities:
        commodities_by_category.setdefault(commodity['category'], {})[commodity['name']] = commodity
    return commodities_by_category


def benchmark(filename, parser, parser_name=None, iterations=5):
    """Benchmark a JSON parser.

    Prints timing for consuming the entire stream, without doing anything with the data.
    """
    times = []
    for _ in range(iterations):
        start_ts = time.perf_counter()
        stream = ingest(filename, parser)
        for _, market in stream:
            for _, commodities in market:
                for _ in commodities:
                    pass
        end_ts = time.perf_counter()
        elapsed = end_ts - start_ts
        times.append(elapsed)
    min_time = min(times)
    avg_time = sum(times) / len(times)
    max_time = max(times)
    if parser_name is None:
        parser_name = repr(parser)
    print(f'{min_time:6.2f} {avg_time:6.2f} {max_time:6.2f} {parser_name}')


def benchmark_parsers(filename=DEFAULT_INPUT, **parsers):
    """Benchmark all parsers passed in as keyword arguments."""
    for name, parser in parsers.items():
        benchmark(filename, parser, parser_name=name)


def convert(filename, parser=DEFAULT_PARSER):
    """Convert a spansh-style galaxy dump into TradeDangerous-style prices."""
    print('# {name:35s} {sell:>7s} {buy:>7s} {demand:>10s} {supply:>10s} {ts}'.format(
        name='Item Name',
        sell='SellCr',
        buy='BuyCr',
        demand='Demand',
        supply='Supply',
        ts='Timestamp',
    ))
    print()
    for station_name, market in ingest(filename, parser=parser):
        print(f'@ {station_name}')
        for category, commodities in market:
            print(f' + {category}')
            for commodity in commodities:
                print(' {name:35s} {sell:7d} {buy:7d} {demand:10d} {supply:10d} {ts}'.format(
                    name=commodity.name,
                    sell=commodity.sell,
                    buy=commodity.buy,
                    demand=commodity.demand,
                    supply=commodity.supply,
                    ts=commodity.ts,
                ))
        print()


if __name__ == '__main__':
    benchmark_parsers(
        cysimdjson=DEFAULT_PARSER,
        pysimdjson=ALT_PARSER,
    )
```

I've benchmarked them and
|
Very nice. Do me a favour and submit a PR for this, formatted as an import plugin. |
Yeah, that's WIP, I was just focusing on getting the parsing logic right first 👍 |
You don't need the category in the price file (saves some bytes), see: |
@lanzz Probably a stupid question but I rather ask and not need to: I presume that carriers count as "stations in the system"? |
I did not even see that. Thanks! |
It's like the four doctors in here :) 👋 @bgol :) |
Yeah, hi Oliver, nice to hear from you. Now, where is madavo? :) |
Sorry I haven't been following up the developments in this thread. Yes, the reason I implemented it to generate .prices file was because that seemed like what the plugin system itself was expecting. |
Chiming in for the 50+ crew 👴 |
Lol, only if I'm Tennant. Also bowties are not cool.
No worries. I've changed it to go directly into the database myself, we all kind of assumed that's the reason why you did that. The RAM usage was just too much for the server doing it the way you had it. Returning False means that the plugin handled everything itself. There's no expectation either way, it's just to give the plugin author the ability to go either way. Your work is sincerely appreciated, after all, you did the hard part, all I did was some refactoring to make it less RAM intensive. |
@lanzz The simdjson usage seemed to run into a problem I hit at Super Evil recently with our Python-based asset pipeline and recent CPython optimizations that make it garbage collect less often. As you'd spotted, you have to allocate a new simdjson parser each loop or else it complains you still have references.
Also, the TD code is full of make-a-string-with-joins, because that was the optimal way to join two short strings back in those versions of Python. Now it seems to be bad because, with the aforementioned garbage reduction, the likelihood Python will actually have to
That got me to looking at
(Please note: I live in a near perpetual state of "wtf python, really, why?" these days -- if any of that seems to be pointed at anything but Python and/or myself, I derped at typing.) |
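To illustrate the join-versus-f-string point above, here is a rough micro-benchmark. It is not TD code; the strings and iteration count are arbitrary, and relative timings vary by Python version:

```python
# Rough illustration: on recent CPython an f-string is typically at least as
# fast as str.join for gluing two short strings together.
import timeit

system, station = "SOL", "Abraham Lincoln"

join_time = timeit.timeit(lambda: "/".join((system, station)), number=1_000_000)
fstr_time = timeit.timeit(lambda: f"{system}/{station}", number=1_000_000)

print(f"join: {join_time:.3f}s   f-string: {fstr_time:.3f}s   (1M iterations)")
```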
If you let me know what the missing/extra data is I can point you to the fields if they're available in the dump and/or where I normally source that data from when/if I put them into my search index if they're not. |
@spansh
It'd also be nice to have a way to automatically add new rares to the RareItem table:
|
Does that make me the Tardis, as I'm hosting?
--
Omnia dicta fortiora si dicta Latina!
|
Is there a reason to prefer the human-readable/strptime datetimes? Having them as UTC timestamps, either int or float, would make parsing and import much, much faster. |
Historical inertia (that's how it was setup when I took over TD maintenance), and potentially backwards compatibility. Regarding the former, I've no problems with changing it. Regarding the latter, as long as it doesn't break anything, I've no problems changing it. |
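To put a rough number on the parsing-cost difference being discussed, a small standalone comparison. This is not TD code; the sample values and datetime format are made up for illustration:

```python
# Compare parsing a human-readable datetime via strptime against an integer
# epoch timestamp; exact numbers vary by machine and Python version.
import time
from datetime import datetime, timezone

SAMPLES = 100_000
human = "2024-04-22 17:13:44"
epoch = "1713806024"

start = time.perf_counter()
for _ in range(SAMPLES):
    datetime.strptime(human, "%Y-%m-%d %H:%M:%S")
strptime_secs = time.perf_counter() - start

start = time.perf_counter()
for _ in range(SAMPLES):
    datetime.fromtimestamp(int(epoch), tz=timezone.utc)
epoch_secs = time.perf_counter() - start

print(f"strptime: {strptime_secs:.2f}s   epoch: {epoch_secs:.2f}s   ({SAMPLES} rows)")
```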
"Your fault, you daft old fart" is perfectly fine by me :) |
@eyeonus Seeing this at the moment with latest - is this my fault?
that seems like something that shouldn't be missing at the end of an import? |
Oh, that's not the same as regenerating prices file. Duh. I used to figure that eventually the cost of generating the .prices file and stability of TD would mean we didn't have to keep generating the thing, is that all this is? |
Nope, it's fine. It's a warning by tdb.reloadCache(), and is expected since it's a clean run. |
Also since you did solo, it didn't download or import the listings, there's nothing to export to a prices file, and so you won't have a prices file after the command finishes, either. |
No, it will regenerate prices IFF listings are imported, but not otherwise. |
While I'm refinding my feet, I've made a number of QoL changes - at least, if you're using an IDE like PyCharm/VSCode. I tend to configure tox so that my IDEs pick up the settings from it and I get in-IDE guidance and refactoring options. I've also introduced a little bit of flair to make watching it do its import thing a little less tedious, but I'm trying to stagger how I do it so that there's always an easy way to dump the new presentation. This is what happens when I've been watching Sebastian Lague videos lately (https://www.youtube.com/watch?v=SO83KQuuZvg)... but it's probably also going to be nice for end-users too.
Recording.2024-04-22.171344.mp4
These are currently in my kfsone/cleanup-pass branch. |
I'm doing some tuning of the tox config; it doesn't seem like we were actually running a "just flake8" run, or we weren't really using it? I got it enabled in my test branch. It should be fast (demo from PyCharm):
Recording.2024-04-22.221802.mp4
Also, both the eddblink and spansh plugins use the existence of that file to determine if the database needs to be built. |
I was checking a few of the 1MR posts/videos about how they tackle it in Python. We don't have 1 billion but it's not that dissimilar to what we do. Discovering that just using "rb" and doing byte-related operations was a bit of a stunner, but it's annoying trying to switch large tracts of code from utf8-to-bytes. However, it can provide a 4-8x speed up. For instance, we count the number of lines in some of our files so we can do a progress bar, right? If the file is 86mb that takes ~250ms. Just using "rb" gets that down to 50ms and a little use of fixed-size buffering gets it down to 41ms. https://gist.github.com/kfsone/dcb0d7811570e40e73136a14c23bf128 |
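A minimal sketch of the buffered binary line-counting trick described above (the linked gist is the author's version; the buffer size and filename here are arbitrary choices):

```python
# Count lines by scanning fixed-size binary chunks for b"\n" instead of
# decoding the whole file as UTF-8; this is the cheap pre-pass used to size
# a progress bar before the real import starts.
def count_lines(path, bufsize=1024 * 1024):
    total = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(bufsize)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


if __name__ == "__main__":
    print(count_lines("listings.csv"))
```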
Faster is good. I like faster. |
ooorrrrr.... https://github.com/KingFisherSoftware/traderusty/ :) I'm thinking I should have called it "tradedangersy" since rusty projects like to end with "rs" and python with "y" :) |
Don't read too much into that - it was an excuse to try a Rust-Python extension in anger (see https://github.com/kfsone/rumao3) and see how much pain setting up PyPI and everything was (it wasn't). And I'm not sure eyeonus is likely to want a second language added to the problem :) |
@Tromador is listings.csv guaranteed to be in station,item order? I think I can optimize by doing a lock-step walk thru the database and listings (you create two generators, one with database entries the other with listing entries, and you keep advancing the one that is "behind"; if the listings one runs out, you stop; if the database one runs out you just don't need to compare) |
Yes, both listings.csv and listings-live.csv are guaranteed to be sorted by station, then item. |
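Since both inputs are sorted the same way, the lock-step walk EyeMWing describes could look roughly like this. It is a sketch only; the (station_id, item_id, ...) row shape is assumed for illustration and is not TD's actual schema:

```python
# Lock-step walk over two streams sorted by (station_id, item_id): advance
# whichever side is behind, pair up matching keys, and stop when the listings
# run out; leftover database rows never need to be compared.
def merge_walk(db_rows, listing_rows, key=lambda row: (row[0], row[1])):
    db_iter = iter(db_rows)
    db_row = next(db_iter, None)
    for listing in listing_rows:
        lkey = key(listing)
        # advance the database side until it catches up with the listing key
        while db_row is not None and key(db_row) < lkey:
            db_row = next(db_iter, None)
        if db_row is not None and key(db_row) == lkey:
            yield db_row, listing   # present in both: candidate for an update check
        else:
            yield None, listing     # not in the DB yet: candidate for an insert


# tiny usage example with made-up rows
if __name__ == "__main__":
    db = [(1, 1, "old"), (1, 3, "old"), (2, 1, "old")]
    listings = [(1, 1, "new"), (1, 2, "new"), (2, 1, "new"), (3, 1, "new")]
    for pair in merge_walk(db, listings):
        print(pair)
```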
In case you didn't notice:
https://forums.frontier.co.uk/threads/eddb-a-site-about-systems-stations-commodities-and-trade-routes-in-elite-dangerous.97059/page-37#post-10114765