Integrate Lemmy Explorer for Community Search #508

Open
Fmstrat opened this issue Jul 31, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@Fmstrat
Contributor

Fmstrat commented Jul 31, 2023

Is your feature request related to a problem? Please describe.
Related: tgxn/lemmy-explorer#137

LE is a web crawler for communities that works very well. Hopefully the above is accepted as a path forward and we can directly leverage the great work they are doing via an API. If not, and if there is interest here, I can set up a containerized API service that pulls the Redis data dump nightly to provide access to the app for Thunder users.

Describe the solution you'd like
A search that finds communities no matter where they are, directly built into Thunder.

Describe alternatives you've considered
Using LE outside of Thunder.

@Fmstrat Fmstrat added the enhancement New feature or request label Jul 31, 2023
@Fmstrat
Contributor Author

Fmstrat commented Jul 31, 2023

There is also the Data library file: https://data.lemmyverse.net/data/community.full.json

Which contains ~18M of data on all communities. Two options present themselves outside of direct integration:

  1. Nightly pulls of the JSON, with an API that searches
  2. If a user chooses "search all" in communities, it displays a modal letting them know it's pre-downloading community data for a faster experience, and just download the JSON locally for searching.
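As a rough illustration of what either option would do once the JSON is available, here is a minimal search sketch. The `CommunityRecord` shape and its field names are assumptions for illustration only; the actual fields in `community.full.json` are not described in this thread and may differ.

```typescript
// Hypothetical record shape. The real fields in community.full.json may differ.
interface CommunityRecord {
  name: string; // e.g. "technology"
  instance: string; // e.g. "lemmy.world" (assumed field name)
  description?: string;
}

// Case-insensitive substring search over community names and,
// optionally, descriptions. A blank query returns no results.
function searchCommunities(
  communities: CommunityRecord[],
  query: string,
  includeDescriptions: boolean = false,
): CommunityRecord[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return [];
  }
  return communities.filter((c) => {
    if (c.name.toLowerCase().includes(needle)) {
      return true;
    }
    return (
      includeDescriptions &&
      (c.description ?? "").toLowerCase().includes(needle)
    );
  });
}
```

The same function could sit behind a small API (option 1) or run on the device against the downloaded file (option 2); only where the data lives changes.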

@hjiangsu
Member

hjiangsu commented Aug 1, 2023

This is a duplicate of #14 - I'll close this to keep things clean but feel free to let me know if you think they are separate issues @Fmstrat!

@hjiangsu hjiangsu closed this as completed Aug 1, 2023
@tgxn

tgxn commented Aug 7, 2023

> There is also the Data library file: https://data.lemmyverse.net/data/community.full.json
>
> Which contains ~18M of data on all communities. Two options present themselves outside of direct integration:
>
> 1. Nightly pulls of the JSON, with an API that searches
> 2. If a user chooses "search all" in communities, it displays a modal letting them know it's pre-downloading community data for a faster experience, and just download the JSON locally for searching.

I would love for you to use the data dumps. I wanted to pre-load the processing for my frontend so I wouldn't need any backend server infrastructure (other than the crawler). That's why they are so big.
I don't really plan on hosting a dedicated search API, but I could probably whip something simple up that people could host in Docker or something...

For my frontend, I split the data into chunks of ~150 items, and then load them all in parallel, with TanStack.

Undocumented, but I have the chunked data on the data page too:
https://data.lemmyverse.net/data/community.json will give you the count of chunks, and then the chunks are https://data.lemmyverse.net/data/community/0.json.
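A minimal sketch of loading the chunks in parallel, without TanStack. Since the endpoints are undocumented, two things here are assumptions: that `community.json` exposes the chunk count under a `count` key, and that chunks are numbered from 0 and each one is a JSON array. The `fetchJson` parameter is a hypothetical helper passed in by the caller.

```typescript
const DATA_BASE = "https://data.lemmyverse.net/data";

// Assumption: community.json looks like { "count": <number of chunks> }.
interface ChunkIndex {
  count: number;
}

// Caller-supplied function that fetches a URL and parses it as JSON.
type JsonFetcher = (url: string) => Promise<unknown>;

// Build the chunk URLs, assuming chunks are numbered 0..count-1.
function chunkUrls(count: number, base: string = DATA_BASE): string[] {
  return Array.from(
    { length: count },
    (_, i) => `${base}/community/${i}.json`,
  );
}

// Fetch the chunk count, then request every chunk in parallel.
async function loadAllCommunities(
  fetchJson: JsonFetcher,
  base: string = DATA_BASE,
): Promise<unknown[]> {
  const index = (await fetchJson(`${base}/community.json`)) as ChunkIndex;
  const chunks = await Promise.all(
    chunkUrls(index.count, base).map((url) => fetchJson(url)),
  );
  // Assumption: each chunk is an array of community records.
  const all: unknown[] = [];
  for (const chunk of chunks) {
    all.push(...(chunk as unknown[]));
  }
  return all;
}
```

In a browser or Node 18+, `fetchJson` could be `(url) => fetch(url).then((r) => r.json())`. `Promise.all` rejects as soon as any single chunk fails, so a real client would probably want retries or partial results.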

@Fmstrat
Contributor Author

Fmstrat commented Aug 7, 2023

@hjiangsu Since we have the Lemmyverse Dev participating in this issue (in response to our discussion), should we reopen this and close #14 instead?

@tgxn First off, thanks for joining the conversation! My one concern with the loaded data dump is size over time (especially in Flutter). I'm not sure how the static file will hold up over years of new instances and communities. While it may be fine, have you done any load testing in JS with hundreds of thousands of data points yet? (If not we could generate and see).

@hjiangsu hjiangsu reopened this Aug 7, 2023
@hjiangsu
Member

hjiangsu commented Aug 7, 2023

Opened - feel free to continue discussion here!

@tgxn

tgxn commented Oct 7, 2023

> @tgxn First off, thanks for joining the conversation! My one concern with the loaded data dump is size over time (especially in Flutter). I'm not sure how the static file will hold up over years of new instances and communities. While it may be fine, have you done any load testing in JS with hundreds of thousands of data points yet? (If not we could generate and see).

hey sorry I am not the best at replying or following things up in a reasonable amount of time 😂

i reckon it could be an issue for communities at the least. i found it's a looot of data, especially if you want to let people search by stuff in the descriptions. I have to split it apart and do as much processing on the crawler as I can, and even then, it's ~14MB or something.

I did a few load tests with different ways of splitting or compressing it, but i've still got work to do on that.

as for suggestions, I would only bundle a minified list of community names and instances (and think about if you reallly need this), and maybe offer a way for users to "fetch new data" - which could download the chunked version of the data from lemmyverse.
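One way that suggestion could look, as a sketch only: strip each record down to a name/instance pair for the bundled copy, then merge in whatever a later "fetch new data" download returns. The `name` and `instance` field names and both function names are hypothetical, not taken from the lemmyverse data.

```typescript
// Assumed shape of a full record; only these two fields are used.
interface FullCommunity {
  name: string;
  instance: string;
  [extra: string]: unknown;
}

// Compact entry for the bundled index: [community name, instance host].
type MinimalEntry = [string, string];

// Drop everything except name and instance to keep the bundle small.
function toMinimalIndex(communities: FullCommunity[]): MinimalEntry[] {
  return communities.map((c): MinimalEntry => [c.name, c.instance]);
}

// Merge the bundled index with freshly fetched entries, de-duplicating
// on a case-insensitive "name@instance" key. Fresh entries replace
// bundled ones that share a key.
function mergeIndexes(
  bundled: MinimalEntry[],
  fresh: MinimalEntry[],
): MinimalEntry[] {
  const byKey = new Map<string, MinimalEntry>();
  for (const entry of bundled.concat(fresh)) {
    byKey.set(`${entry[0]}@${entry[1]}`.toLowerCase(), entry);
  }
  return Array.from(byKey.values());
}
```

Searching descriptions would not be possible with an index this small, which seems to be the trade-off being suggested.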


3 participants