Integrate Lemmy Explorer for Community Search #508

Fmstrat · 2023-07-31T10:49:37Z

Is your feature request related to a problem? Please describe.
Related: tgxn/lemmy-explorer#137

LE is a web crawler for communities that works very well. Hopefully the above is accepted as a path forward and we can directly leverage the great work they are doing via an API. If not, and if there is interest here, I can set up a containerized API service that pulls the Redis data dump nightly to provide access to the app for Thunder users.

Describe the solution you'd like
A search that finds communities no matter where they are, directly built into Thunder.

Describe alternatives you've considered
Using LE outside of Thunder.

Fmstrat · 2023-07-31T11:45:58Z

There is also the Data library file: https://data.lemmyverse.net/data/community.full.json

Which contains ~18M of data on all communities. Two options present themselves outside of direct integration:

1. Nightly pulls of the JSON, with an API that searches
2. If a user chooses "search all" in communities, it displays a modal letting them know it's pre-downloading community data for a faster experience, and just download the JSON locally for searching.
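
As a rough illustration of the second option, here is a minimal Python sketch of a search over the locally downloaded dump. It assumes the file is a JSON array of objects with `name` and `title` fields. Those field names, and the helpers `load_dump` and `search_communities`, are hypothetical and would need checking against the real file.

```python
import json


def load_dump(path):
    """Read a previously downloaded community dump from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def search_communities(communities, query, limit=20):
    """Return up to `limit` communities whose name or title contains `query`."""
    needle = query.casefold()
    matches = []
    for community in communities:
        # Assumed fields: "name" and "title"; missing or null values count as empty.
        name = community.get("name") or ""
        title = community.get("title") or ""
        if needle in f"{name} {title}".casefold():
            matches.append(community)
            if len(matches) >= limit:
                break
    return matches
```

A linear scan like this is the simplest thing that could work; whether it stays fast enough as the dump grows is exactly the kind of question a load test would answer.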

hjiangsu · 2023-08-01T20:34:34Z

This is a duplicate of #14 - I'll close this to keep things clean but feel free to let me know if you think they are separate issues @Fmstrat!

tgxn · 2023-08-07T09:08:34Z

> There is also the Data library file: https://data.lemmyverse.net/data/community.full.json
>
> Which contains ~18M of data on all communities. Two options present themselves outside of direct integration:
>
> 1. Nightly pulls of the JSON, with an API that searches
> 2. If a user chooses "search all" in communities, it displays a modal letting them know it's pre-downloading community data for a faster experience, and just download the JSON locally for searching.

I would love for you to use the data dumps. I wanted to pre-load the processing for my frontend so I wouldn't need any backend server infrastructure (other than the crawler). That's why they are so big.
I don't really plan on hosting a dedicated search API, but I could probably whip up something simple that people could host in Docker or something...

For my frontend, I split the data into chunks of ~150 items, and then load them all in parallel, with TanStack.

Undocumented, but I have the chunked data on the data page too:
https://data.lemmyverse.net/data/community.json will give you the count of chunks, and then the chunks are https://data.lemmyverse.net/data/community/0.json.
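
For anyone who wants to try the chunked files, the following Python sketch shows one way to read the chunk count and then fetch all chunks in parallel. Since the format is undocumented, two things here are assumptions: that `community.json` looks like `{"count": N}`, and that chunks are numbered `0` to `N - 1` and each holds a JSON array. The function names are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

BASE_URL = "https://data.lemmyverse.net/data"


def fetch_json(url):
    """Download and decode one JSON document."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def load_all_communities(fetch=fetch_json, base_url=BASE_URL, max_workers=8):
    """Fetch every community chunk concurrently and merge them into one list."""
    # Assumption: the index file has the shape {"count": <number of chunks>}.
    index = fetch(f"{base_url}/community.json")
    urls = [f"{base_url}/community/{i}.json" for i in range(index["count"])]
    # pool.map() returns results in the same order as `urls`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(fetch, urls))
    # Assumption: each chunk is a JSON array of community objects.
    return [community for chunk in chunks for community in chunk]
```

The `fetch` parameter is injectable so the merging logic can be exercised with canned data, without hitting the network.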

Fmstrat · 2023-08-07T11:20:27Z

@hjiangsu Since we have the Lemmyverse Dev participating in this issue (in response to our discussion), should we reopen this and close #14 instead?

@tgxn First off, thanks for joining the conversation! My one concern with the loaded data dump is size over time (especially in Flutter). I'm not sure how the static file will hold up over years of new instances and communities. While it may be fine, have you done any load testing in JS with hundreds of thousands of data points yet? (If not we could generate and see).

hjiangsu · 2023-08-07T19:16:13Z

Reopened - feel free to continue discussion here!

tgxn · 2023-10-07T17:25:43Z

> @tgxn First off, thanks for joining the conversation! My one concern with the loaded data dump is size over time (especially in Flutter). I'm not sure how the static file will hold up over years of new instances and communities. While it may be fine, have you done any load testing in JS with hundreds of thousands of data points yet? (If not we could generate and see).

hey sorry, I am not the best at replying or following things up in a reasonable amount of time 😂

i reckon it could be an issue for communities at the least. i found it's a looot of data, especially if you want to let people search by stuff in the descriptions. I have to split it apart and do as much processing on the crawler as I can, and even then, it's ~14MB or something.

I did a few load tests with different ways of splitting or compressing it, but i've still got work to do on that.

as for suggestions, I would only bundle a minified list of community names and instances (and think about whether you really need this), and maybe offer a way for users to "fetch new data" - which could download the chunked version of the data from lemmyverse.
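
To make the "minified list" idea concrete, here is a small Python sketch that strips full community records down to name and instance pairs and serializes them compactly. The `name` and `baseurl` field names and the `minify_communities` helper are assumptions for illustration, not confirmed parts of the lemmyverse data format.

```python
import json


def minify_communities(communities):
    """Reduce full records to sorted, de-duplicated [name, instance] pairs."""
    # Assumed fields: "name" (community name) and "baseurl" (instance host).
    pairs = sorted({(c["name"], c["baseurl"]) for c in communities})
    # Compact separators drop the whitespace json.dumps adds by default.
    return json.dumps(pairs, separators=(",", ":"))
```

Descriptions and counts are dropped entirely, so an app bundling this output could only search by name and instance until the user fetches fresh data.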

Fmstrat added the enhancement (New feature or request) label Jul 31, 2023

hjiangsu closed this as completed Aug 1, 2023

tgxn mentioned this issue Aug 7, 2023: Feature: API layer for direct App integration tgxn/lemmy-explorer#137 (Closed)

hjiangsu reopened this Aug 7, 2023

hjiangsu mentioned this issue Aug 7, 2023: Integration with Lemmyverse #14 (Closed)

micahmo mentioned this issue Oct 4, 2023: Add instance pipeline #799 (Merged, 3 tasks)

This was referenced Oct 8, 2023:
[Feature]: integrate with browse.feddit.de for community subscriber counts Memmy-App/memmy#972 (Open)
Sort by Published tgxn/lemmy-explorer#146 (Merged)

hjiangsu mentioned this issue Feb 27, 2024: Add instance explorer #1133 (Merged, 3 tasks)
