RFC: Registry UI #450

Closed
ghost opened this issue May 6, 2024 · 15 comments
Labels
enhancement New feature or request

Comments

@ghost

ghost commented May 6, 2024

Abstract

The registry needs a user interface to display documentation to users and make it searchable. This RFC will detail the technical implementation of the registry UI for the purposes of making a decision.

Design considerations

During the preliminary discussions, we have outlined the following criteria:

  1. The registry must only expose properly licensed files. Providers that do not exhibit an OSI-approved license should not have their documentation exposed.
  2. The UI should be updated alongside the registry itself. If a new version of a provider or module is added to the registry, the documentation should be updated too.
  3. The UI should feel almost instantaneous, so that users prefer to use it for documentation.
  4. The UI should be indexed by search engines.
  5. The UI should require very little maintenance.
  6. The UI must make sure that the markdown parser does not let through HTML tags that could be dangerous (e.g. script tags). It may be preferable to not allow HTML tags at all.

Background research

To evaluate the size of the registry, I have downloaded about half of all providers present in the OpenTofu registry. The docs folders amounted to roughly 6 GB of data and 850k files. By extrapolation, this would mean that the registry UI would be roughly 10-15 GB in size and contain 1.5-2M files if hosted statically, assuming one markdown file corresponds to one HTML file. Packing all files related to a provider/version is not feasible because some providers (AWS, Azure, etc) contain hundreds of files and would negatively impact loading times.
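The extrapolation above can be reproduced with a few lines of arithmetic. This is only a sketch: the 0.4 to 0.6 coverage fractions are assumptions that bracket "about half", and the `Estimate` type and `Extrapolate` function are hypothetical names, not part of any existing tooling.

```go
package main

import "fmt"

// Estimate holds a projected size for the full registry UI.
type Estimate struct {
	GB    float64
	Files float64
}

// Extrapolate scales a measured sample up to the whole registry, given the
// fraction of providers the sample is assumed to cover.
func Extrapolate(sampleGB, sampleFiles, fraction float64) Estimate {
	return Estimate{GB: sampleGB / fraction, Files: sampleFiles / fraction}
}

func main() {
	// Measured sample from the text: roughly 6 GB and 850k files.
	// The fractions below are assumptions, since the sample was "about half".
	for _, f := range []float64{0.4, 0.5, 0.6} {
		e := Extrapolate(6, 850_000, f)
		fmt.Printf("sample covers %.0f%%: ~%.1f GB, ~%.2fM files\n",
			f*100, e.GB, e.Files/1e6)
	}
}
```

A sample covering between 40% and 60% of providers gives totals close to the 10-15 GB and 1.5-2M file range quoted above.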

Front-end technology

When it comes to front-end technology, we have five options:

  1. Static files with minimal JS for search functionality. This is the simplest solution and the least likely to break. It also allows for heavy optimization for loading times.
  2. Server-side rendering using Cloudflare Workers and caching. This is more complex and harder to test, but saves us from having to pre-generate HTML files and potentially reupload the entire registry UI if we change the layout.
  3. SPA loading from GitHub. The raw.githubusercontent.com domain has CORS headers set, so this would be feasible. We would only need to index the repositories.
  4. SPA that loads markdown files from an R2 bucket. This would require us to upload the markdown files and keep them in sync.
  5. XML data files with XSLT style sheets. It's a remnant from a bygone era, but browsers support it and Google indexes it. This would allow us to change the layout as we please without having to recreate all HTML files but still enjoy the benefits of static files. It would cause one additional request to load the XSLT file.

Given the performance and search indexing requirements above, solution 1 seems to be the most appropriate, although solution 3 is definitely intriguing because it would require very little maintenance.

Front-end optimization

HTTP/2 promised faster front-end loading times with Server Push, but the feature had several problems and never gained wide adoption. Browsers have since removed support for it. The alternatives to Server Push require extra round trips.

It is worth noting that external resources, such as linked CSS files, may cause caching issues across such a vast number of HTML files. CSS files should therefore either be inlined or strictly URL-versioned.

One of the alternatives that we should make use of is inlining critical resources, such as CSS files. Since the registry UI can be fairly minimalistic in its appearance, inlining resources is feasible. Incidentally, this technique also bypasses caching issues with linked resources.
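The two approaches mentioned here, strict URL versioning and inlining, could look roughly like the following. This is a minimal sketch using only the Go standard library; `VersionedName` and `InlineCSS` are hypothetical helpers, the 12-character hash length is an arbitrary choice, and the stylesheet is assumed to be our own trusted content.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// VersionedName returns a file name that embeds a short content hash, so a
// changed stylesheet always gets a new URL and old cached copies are not reused.
func VersionedName(base string, content []byte) string {
	sum := sha256.Sum256(content)
	return fmt.Sprintf("%s.%s.css", base, hex.EncodeToString(sum[:])[:12])
}

// InlineCSS places the stylesheet into the page head, which avoids the extra
// request altogether. Only the first </head> is touched.
func InlineCSS(page, css string) string {
	return strings.Replace(page, "</head>", "<style>"+css+"</style></head>", 1)
}

func main() {
	css := "body{font-family:sans-serif}"
	page := "<html><head></head><body>docs</body></html>"

	fmt.Println(VersionedName("style", []byte(css)))
	fmt.Println(InlineCSS(page, css))
}
```

With content-hashed names, the CSS file can be cached indefinitely, but every HTML file that links to it must be regenerated when the hash changes. Inlining has the same regeneration cost and saves one round trip.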

Build process

In order to build the HTML files, we should create a Go library that does the following:

  1. Perform a sparse clone of the given registry. It should also provide the option to use an already existing clone for bulk updates.
  2. Sparse checkout the LICENSE/LICENSE.txt/LICENSE.md files, as well as the docs/ and/or website/docs folders.
  3. Perform a license detection. I used go-license-detector in the past and it seems satisfactory.
  4. Render the markdown. Hugo uses Goldmark because it's extensible in how it parses and renders HTML. Hugo also uses Chroma for pre-generated syntax highlighting, which is beneficial for performance. For safety, we may want to re-parse the HTML and sanitize for disallowed tags and attributes. We should also make sure we add CSP meta tags to prevent some classes of attacks.
  5. Embed the existing HTML into our template.
  6. Upload the resulting HTML files to an R2 bucket, cleaning out any files that should no longer be there. We can do this externally, or build it into the application.
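Steps 2 and 6 above are mostly bookkeeping and can be sketched without any external dependency. The following is an illustration only: `WantedFile` and `StaleKeys` are hypothetical names, the license file is assumed to sit at the repository root, and the actual listing, uploading, and deleting of R2 objects is left out.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// WantedFile reports whether a repository path is needed for the build:
// a top-level license file, or anything under docs/ or website/docs/.
func WantedFile(p string) bool {
	p = path.Clean(p)
	switch p {
	case "LICENSE", "LICENSE.txt", "LICENSE.md":
		return true
	}
	return strings.HasPrefix(p, "docs/") || strings.HasPrefix(p, "website/docs/")
}

// StaleKeys returns the bucket keys that exist remotely but were not produced
// by the current build, i.e. the objects to delete after the upload.
func StaleKeys(remote, generated []string) []string {
	keep := make(map[string]struct{}, len(generated))
	for _, k := range generated {
		keep[k] = struct{}{}
	}
	var stale []string
	for _, k := range remote {
		if _, ok := keep[k]; !ok {
			stale = append(stale, k)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	for _, p := range []string{"LICENSE", "docs/resources/instance.md", "main.go"} {
		fmt.Printf("%-28s wanted=%v\n", p, WantedFile(p))
	}
	remote := []string{"a.html", "b.html", "c.html"}
	fmt.Println(StaleKeys(remote, []string{"b.html"}))
}
```

Computing the stale set from a key listing keeps the cleanup step independent of how the upload itself is performed, whether externally or inside the application.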

For development purposes, we may want to provide a separate binary that lets provider authors start this in their local development version.

Costs

Based on the R2 pricing, the initial upload and any full refresh will cost us 5-10 USD, and storage will cost us $0.015 per GB per month, which is well under 1 USD per month for 10-15 GB.
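As a rough check of these figures, the sketch below multiplies the file and size estimates from the background research by per-unit prices. The prices are assumptions (4.50 USD per million write operations and 0.015 USD per GB stored per month) and should be checked against the current R2 pricing page; any free tier is ignored, and the function names are hypothetical.

```go
package main

import "fmt"

const (
	// Assumed list prices; verify against the R2 pricing page before relying on them.
	writeOpsPerMillionUSD = 4.50  // per million write (upload) operations
	storagePerGBMonthUSD  = 0.015 // per GB stored per month
)

// UploadCostUSD estimates the cost of writing the given number of objects once.
func UploadCostUSD(objects float64) float64 {
	return objects / 1e6 * writeOpsPerMillionUSD
}

// StorageCostUSD estimates the monthly cost of storing the given number of GB.
func StorageCostUSD(gb float64) float64 {
	return gb * storagePerGBMonthUSD
}

func main() {
	// Ranges taken from the background research: 1.5-2M files, 10-15 GB.
	fmt.Printf("full upload: %.2f to %.2f USD\n", UploadCostUSD(1.5e6), UploadCostUSD(2e6))
	fmt.Printf("storage:     %.3f to %.3f USD per month\n", StorageCostUSD(10), StorageCostUSD(15))
}
```

Under these assumptions, the write operations dominate the cost, which is why a full refresh is the expensive event rather than the ongoing storage.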

@ghost ghost added the enhancement New feature or request label May 6, 2024
@cube2222
Contributor

cube2222 commented May 6, 2024

I'm personally not a huge fan of pre-generating all pages, as then each change we make to the frontend will:

  1. Take a while to deploy, as it has to download 10 GB worth of markdown files, regenerate them, and then upload them.
  2. Cost us 10 USD for all the operations involved, every time we make a change.

I would instead heavily suggest going with either

  • JavaScript frontend rendering based on markdown files fetched from our API
  • rendering a fully-static page dynamically via Cloudflare Workers, then caching it heavily

Slightly relevant blog post: https://blog.cloudflare.com/serverless-rendering-with-cloudflare-workers/

Though I personally believe a simple JavaScript frontend that fetches the markdown files and renders them is the most boring solution.

Static generation also heavily limits our flexibility for the future, like wanting to add new dynamically fetched components, which I strongly dislike.

@DicsyDel

DicsyDel commented May 6, 2024

To consider: Documentation is not 1 version per module/provider, it needs to be available for each module / provider version. So, it's much, much more than 10GB.

@ghost
Author

ghost commented May 6, 2024

@DicsyDel I downloaded all versions for the experiment.

image

@matteoredaelli

OK for a web UI, but a command-line option for searching providers would also be useful, like

tofu providers search postgres

Matteo

@ghost
Author

ghost commented Jun 25, 2024

@matteoredaelli thank you for your input. Since the CLI search is a separate effort, please open a separate issue for it.

@flickerfly

Is the UI intended to be an app that uses an API the registry exposes, or is it built into the registry so that it cannot be separated from it?

@ghost
Author

ghost commented Jul 11, 2024

@flickerfly the current UI in development indexes the registry data in the registry repo and uploads it to a separate R2 bucket, which is consumed by a React frontend. Why do you ask?

@flickerfly

flickerfly commented Jul 11, 2024

I'm interested in the registry as a tool to enable air-gapped and scripted installs/updates of infrastructure. In that case, I wouldn't need the UI and would consider it to introduce potential additional vulnerabilities that I'd rather not track. Keeping the two as isolated capabilities would enhance my registry experience in these disconnected spaces.

@ghost
Author

ghost commented Jul 11, 2024

As indicated in the other thread, the OpenTofu registry is not suitable for that. Please look for one of the many self-hosted Terraform/OpenTofu registry implementations.

@NoelJacob

UI code?

@ghost
Author

ghost commented Jul 16, 2024

@NoelJacob I'm sorry, I don't understand your question.

@gedw99

gedw99 commented Aug 7, 2024

I don't know what @NoelJacob's thumb means exactly either, but I would vote for a golang with htmx style GUI system rather than React. A ton of golang devs are using it to escape the "Web GUI has gotten way too complex" problem, and we want control back.

See https://htmx.org/examples/ if you're curious.
A single JavaScript include is all that's needed.

Here is a golang todo that uses htmx:

Here is a golang todo using a htmx also:

Things that I like:

  • no compilation needed. It's just an ESM module include and everything runs.

  • golang devs can work on backend and frontend. Same set of skills for everything.

  • htmx GUI examples are everywhere and don't depend on ANY GUI framework. https://shoelace.style is widely used, for example.

  • htmx is a nice mix of dynamic and static. You can do both with it. It's battle-proven. V2 is simple to use. Progressive enhancement of the GUI is 100% driven from the server. No split-brain stuff like with React: all state is on the server and nowhere else.

No need for S3, etc.; you can run the backend and frontend in this repo. You're keeping state in one place: on the server. You can cache using Caddy. Basically, it's a very standard golang stack.

You can use bleve or zinc for search and indexing if you want.

@ghost
Author

ghost commented Aug 7, 2024

Hey @gedw99 thank you for your input. The registry UI is almost complete, we plan to release it in the next few weeks. Given that we have a generous sponsorship from Cloudflare, we are using the R2 buckets to host everything and GitHub Actions to generate static files.

@gedw99

gedw99 commented Aug 7, 2024

Ah, makes sense then.

No worries. S3 and statics is a decent start.

I run golang as WASM on Cloudflare that does htmx too. It works well, using https://github.com/syumai/workers

@abstractionfactory
Contributor

Closing this as https://search.opentofu.org is now available in beta.


7 participants