Slow Startup During Content Sync with Starlight 0.31 #13050

justin5267 opened this issue Jan 21, 2025 · 6 comments
Labels
- P3: minor bug An edge case that only affects very specific usage (priority) feat: content layer Related to the Content Layer feature (scope) feat: markdown Related to Markdown (scope)

Comments

@justin5267

justin5267 commented Jan 21, 2025

What version of starlight are you using?

0.31.1

What version of astro are you using?

5.1.7

What package manager are you using?

npm

What operating system are you using?

windows

What browser are you using?

edge

Describe the Bug

Summary

After upgrading to Astro 5.1.7 and Starlight 0.31, loading content takes significantly longer compared to previous versions, and large file sizes or a high number of files can result in memory errors.

Details

Setup:

  • 800 Markdown (.md) files organized in a deeply nested folder structure:
    src/content/docs/
    ├── 1.文件夹/
    │ ├── A.文件夹/
    │ │ ├── A1.文件夹/
    │ │ │ ├── file1.md
    │ │ │ ├── file2.md
    │ │ │ └── ...
    │ │ └── fileX.md
    │ └── B.文件夹/
    │ │ ├── B1.文件夹/
    │ │ │ ├── file1.md

  • Each Markdown (.md) file has an average size of 400KB.

  • Files include Chinese filenames and use slug in their frontmatter for URL generation.

  • Memory allocation set to 16GB for Node.js.

Observed Behavior:

  • During the "content sync" phase, Node.js memory usage gradually increases until it exceeds 16GB, at which point the process crashes with a memory error.

  • Reducing the number of Markdown files to around 200 allows the development server to start, but startup takes 161,258 ms.
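
The report does not say how the 16GB limit was applied. A common way is Node's `--max-old-space-size` flag (for example via `NODE_OPTIONS=--max-old-space-size=16384`), but that is an assumption here. A minimal sketch for checking which heap limit the running process actually sees, using the built-in `node:v8` module (the helper name `heapLimitMiB` is hypothetical):

```typescript
import { getHeapStatistics } from "node:v8";

// Return the V8 heap size limit of the current process in MiB.
// The value reported by V8 is usually slightly above the configured
// old-space size, so treat it as approximate.
function heapLimitMiB(): number {
  return Math.round(getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMiB()} MiB`);
```

If the printed value is far below 16GB, the flag is probably not reaching the process that runs the dev server.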

Expected Behavior:

Under Astro 4.16.7 and Starlight 0.29.3, the same setup (800 Markdown files) allows the development server to start in 619 ms without memory issues.

Link to Minimal Reproducible Example

n/a

Participation

  • I am willing to submit a pull request for this issue.
@delucis
Member

delucis commented Jan 21, 2025

Thank you for the issue @justin5267!

Would you be able to share a reproduction we can look at? Although your description is quite detailed (thank you), it would take quite a bit of time for us to try to build a new project matching that description, and there's no guarantee it would be exactly the same. As an example, Astro's Docs has thousands of content files and does not have the same issue, so I guess it's not just the volume of content in this case.

If you could share a reproduction, then we can take a look.

@justin5267
Author

Thank you for your response!

To reproduce the issue, please follow these steps:

git clone https://github.com/justin5267/test.git
cd ./test
npm install
npm run dev

This test project was created using npx create astro, with the original project’s markdown files—sanitized for anonymity—added under src/content/docs/. The directory and heading structure are identical to the original project. After running npm run dev, the project gets stuck at "Syncing content" and fails to start properly.

@HiDeoo
Member

HiDeoo commented Jan 23, 2025

Sharing my first observations after quickly playing with the repro:

  • This can repro with just this file from the repro
  • This can repro with the file above at the root and removing the folder structure
  • This can repro by removing non-ASCII characters from the file content
  • This can repro with the Starlight integration entirely commented out

This seems to be related to the file sizes, e.g. with the file linked above which is 3.24 MB. By progressively removing content from the file, I was able to get it to pass the sync step.
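
Since the problem appears to track file size, one way to find candidate files in a project is to list the largest Markdown files under the content directory. The following is a rough sketch using only Node's built-in `fs` and `path` modules; the function names and the `.md`/`.mdx` filter are assumptions made for illustration, not part of Astro or Starlight.

```typescript
import { existsSync, readdirSync, statSync } from "node:fs";
import { extname, join } from "node:path";

interface FileSize {
  path: string;
  bytes: number;
}

// Recursively collect Markdown files under `dir` with their sizes in bytes.
// Symlinks are skipped because only regular files and directories are followed.
function collectMarkdownSizes(dir: string): FileSize[] {
  const results: FileSize[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      results.push(...collectMarkdownSizes(fullPath));
    } else if (entry.isFile() && [".md", ".mdx"].includes(extname(entry.name).toLowerCase())) {
      results.push({ path: fullPath, bytes: statSync(fullPath).size });
    }
  }
  return results;
}

// Return the `limit` largest Markdown files, biggest first.
function largestMarkdownFiles(dir: string, limit = 10): FileSize[] {
  return collectMarkdownSizes(dir)
    .sort((a, b) => b.bytes - a.bytes)
    .slice(0, limit);
}

// Only run the demo when the conventional docs directory exists.
if (existsSync("src/content/docs")) {
  console.table(largestMarkdownFiles("src/content/docs"));
}
```

Run from the project root, this prints a table of the biggest files, which can then be trimmed or split one at a time to see when the sync step starts passing.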

@justin5267
Author

Thank you for your thorough testing! I conducted similar tests and arrived at the same conclusions: by adjusting the number of "just for test" filler segments in the content paragraphs to reduce the overall size of the project files, the test project was indeed able to start successfully. However, this does not address the issue I initially reported regarding the excessive loading time for content in version 0.31.

In fact, under my testing conditions (with the size of the a1-1h.md file reduced to 107 KB and other files reduced to 20–50 KB, which are more typical sizes), Starlight version 0.29.3 (Astro 4.16.7) takes only 562 ms to start, while Starlight version 0.31 (Astro 5.1.8) requires 21,430 ms to start.

justin5267 changed the title from "High Memory Usage During Content Sync with Astro 5.1.7 and Starlight 0.31" to "Slow Startup During Content Sync with Starlight 0.31" on Jan 23, 2025
@delucis
Member

delucis commented Jan 23, 2025

Thank you for the repro @justin5267 and for doing that initial digging @HiDeoo! I’ll move this to the main Astro repository, given that changes to Astro itself would presumably be required to improve this.

delucis transferred this issue from withastro/starlight on Jan 23, 2025
github-actions bot added the "needs triage" label on Jan 23, 2025
@ascorbic
Contributor

This seems to be all markdown rendering time. Dev is slower because, since Astro 5, markdown is parsed during content loading rather than during server rendering, so it parses every file at startup. In most cases this works out a lot faster (and in builds it always will be), but for these really big sites it does seem to be causing problems. I don't know if there is an actual problem with the markdown parser, because the time for those individual large files is very large. For context, we do benchmark with 10000 markdown files and it's very quick, but they are small files.

On the plus side: once it has run once, the data will be cached and subsequent starts will be quick, though that's clearly not much help if it's not able to start at all.
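
The trade-off described above can be illustrated with a toy model. This is not Astro's implementation: `loadEager`, `loadLazy`, and the `Render` type are hypothetical names, and `render` stands in for whatever markdown parser is used. The eager loader renders every entry before anything is served, so startup cost grows with total content size, while the lazy loader renders an entry only when it is first requested.

```typescript
// Stand-in for a markdown renderer: source text in, rendered output out.
type Render = (source: string) => string;

// Eager: render every entry up front, before the server is ready.
// Startup time is the sum of all render times.
function loadEager(entries: Map<string, string>, render: Render): Map<string, string> {
  const rendered = new Map<string, string>();
  for (const [id, source] of entries) {
    rendered.set(id, render(source));
  }
  return rendered;
}

// Lazy: render an entry on first request and memoise the result.
// Startup does no rendering; the first request for each entry pays its cost.
function loadLazy(entries: Map<string, string>, render: Render): (id: string) => string | undefined {
  const cache = new Map<string, string>();
  return (id) => {
    const cached = cache.get(id);
    if (cached !== undefined) return cached;
    const source = entries.get(id);
    if (source === undefined) return undefined;
    const output = render(source);
    cache.set(id, output);
    return output;
  };
}
```

In this model, a handful of very slow entries delay eager startup for everyone, whereas under the lazy scheme they only slow down the pages that use them.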

ascorbic added the "feat: markdown", "- P3: minor bug", and "feat: content layer" labels and removed the "needs triage" label on Jan 24, 2025