"Clean" URLs without ending slash cause redirects to "Cleaner" URLs with ending slash #81

forbytten · 2025-03-09T11:01:31Z

Hi,

Soupault generates clean URLs without a trailing slash, resulting in a redirect for each requested URL from, for example, "/blog" to "/blog/". Both https://soupault.app and https://baturin.org contain URLs of the former type, as does the blueprints blog.

Impact:

Potential performance impact from unnecessary requests.
Potential SEO penalty.

Details

The reference manual (https://soupault.app/reference-manual/#clean-urls) states:

Soupault uses clean URLs by default. If you add a page to site/, for example, site/about.html, it will turn into build/about/index.html so that it can be accessed as https://mysite.example.com/about.

That statement seems to imply that a request for https://mysite.example.com/about should resolve directly to https://mysite.example.com/about/index.html. Both https://soupault.app and the blueprints blog contain links without a trailing slash, such as the https://soupault.app/blog link in the header.
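
To make the mapping quoted from the manual concrete, here is a minimal Python sketch. It is an illustration only, not Soupault's actual implementation: the function name clean_url_target is hypothetical, and the handling of index pages is an assumption.

```python
from pathlib import PurePosixPath


def clean_url_target(source, site_dir="site", build_dir="build"):
    """Return (output_path, url) for a page under a clean-URL scheme.

    Illustrative only: mirrors the site/about.html -> build/about/index.html
    example from the manual. Leaving index pages in place is an assumption.
    """
    rel = PurePosixPath(source).relative_to(site_dir)
    if rel.stem == "index":
        # Assumed: an index page stays in its own directory.
        out_dir = rel.parent
    else:
        # about.html becomes the directory about/
        out_dir = rel.parent / rel.stem
    output = PurePosixPath(build_dir) / out_dir / "index.html"
    # URL in the slash-less form that the manual shows.
    url = "/" + "/".join(out_dir.parts)
    return str(output), url
```

The URL returned here has no trailing slash, while the file on disk lives in a directory, which is the combination that leads web servers to issue a redirect.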

However, those URLs cause a redirect in a number of web servers, including those used by:

https://soupault.app
https://baturin.org
GitLab Pages sites, GitLab Pages being the provider I am using.

I have attached screenshots from different tools showing how https://soupault.app/blog is being redirected to https://soupault.app/blog/. I have used multiple tools because it was confusing at first, as the response body of the 301 Moved Permanently is actually the same response body as the subsequent 200 OK response.

Burp Suite Community Edition: the redirect response is 10247 bytes, seemingly containing a gzipped body that Burp doesn't bother unzipping since it was a 301, whereas the subsequent 200 OK response is 44185 bytes and is unzipped by Burp.
Chromium seems to throw away the 301 redirect body, so the 301 response is only 226 B.
Firefox seems to download both responses, so they are both about 10 kB.

The redirect can also be seen with curl:

curl with explicit --no-location option so it does not follow the redirect:

curl -i --no-location --no-progress-meter https://soupault.app/blog |head -20
HTTP/2 301
accept-ranges: bytes
age: 1004
cache-control: public,max-age=0,must-revalidate
cache-status: "Netlify Edge"; hit
content-type: text/html; charset=UTF-8
date: Sun, 09 Mar 2025 09:53:19 GMT
etag: "122509aae14b19098ca372918952081b-ssl"
location: /blog/
server: Netlify
strict-transport-security: max-age=31536000
x-nf-request-id: 01JNX55GXJ0T9XZ9PREDS2HFQT
content-length: 43771

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta name="generator" content="soupault 4.11.0">
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

curl with --location option so it does follow the redirect. The -i option makes it
show the response headers for both the original request and the second request:

curl -i --location --no-progress-meter https://soupault.app/blog |head -30
 HTTP/2 301
 accept-ranges: bytes
 age: 1067
 cache-control: public,max-age=0,must-revalidate
 cache-status: "Netlify Edge"; hit
 content-type: text/html; charset=UTF-8
 date: Sun, 09 Mar 2025 09:54:22 GMT
 etag: "122509aae14b19098ca372918952081b-ssl"
 location: /blog/
 server: Netlify
 strict-transport-security: max-age=31536000
 x-nf-request-id: 01JNX57E809J8AEDZV6TD5M1E0
 content-length: 43771

 HTTP/2 200
 accept-ranges: bytes
 age: 0
 cache-control: public,max-age=0,must-revalidate
 cache-status: "Netlify Edge"; fwd=miss
 content-type: text/html; charset=UTF-8
 date: Sun, 09 Mar 2025 09:54:22 GMT
 etag: "122509aae14b19098ca372918952081b-ssl"
 server: Netlify
 strict-transport-security: max-age=31536000
 x-nf-request-id: 01JNX57E9K6KJ8XJ535SV68ZPV
 content-length: 43771

 <!DOCTYPE html>
 <html lang="en">
  <head>

Impact

Potential performance impact from unnecessary requests, the degree of impact depending on the end user's environment, the weight of the requested page and the behavior and configuration of the site's webserver.
Potential SEO penalty. I'm unsure of this because SEO is confusing but:
1. Google's Search Console explicitly complained about the redirects on two of my URLs. I don't know why it didn't complain about the others but I have generally had trouble with Google not indexing a bunch of pages. Notably, Gitlab pages, which I use, does a 302 Found redirect instead of a 301 Moved Permanently and Google describes the former as providing a weak signal that the redirect target should be canonical: https://developers.google.com/search/docs/crawling-indexing/301-redirects
2. Some other random SEO info on the internet seems to also suggest that 302 Found should only be used for temporary redirects as it will not pass SEO value to the target URL.

Potential Remediation

I was able to fix the issue in my blog without needing any changes to Soupault itself. I have attached the equivalent diff for the blueprint blog, generated by git diff > clean-url-issue/soupault-blueprints-blog.patch and updated to use the latest Soupault 4.11.0. However, I am unsure whether the fix would be better made in Soupault itself. Some schools of thought:

Some URLs are generated by Soupault's indexing and used in templates, such as {{e.url}} and {{t.url}}, so it seems reasonable to expect those URLs to already have a trailing slash.
Changing Soupault's generated URLs may break existing blogs, resulting in two slashes being added.
If Soupault's generated URLs were changed to end in a slash, then the code in the blueprint blog would need a bit more care to edit with two conventions to be followed:
1. If using a URL generated by Soupault, don't add a trailing slash in the HTML.
2. Otherwise, add a trailing slash in the HTML.
  In contrast, my attached patch has a simpler rule: every HTML href should end in an explicit slash.
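
The "every href ends in a slash" rule, together with the double-slash concern above, can be sketched as an idempotent helper. This is a hypothetical illustration in Python, not part of Soupault or the attached patch; the function name ensure_trailing_slash and the "a dot in the last segment means a file" heuristic are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(href):
    """Append a trailing slash to a page URL unless one is already there.

    Assumption: a last path segment containing a dot (style.css, feed.xml)
    names a file and is left untouched.
    """
    parts = urlsplit(href)
    path = parts.path
    last_segment = path.rsplit("/", 1)[-1]
    if path and not path.endswith("/") and "." not in last_segment:
        path += "/"
    # Query string and fragment are preserved as they were.
    return urlunsplit(parts._replace(path=path))
```

Because the helper leaves URLs that already end in a slash unchanged, applying it to a generated URL is safe whether or not the generator adds the slash itself.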

The issue can also sometimes be addressed by webserver configuration, but not all providers allow such access; GitLab Pages, for example, does not.

If you think changing the blueprint blog is the way to go, I would be happy to raise a pull request with my changes.

Attachments

[Screenshots: Burp, Chromium, Firefox]

Blueprint blog patch:

diff --git a/helpers/blog-index.lua b/helpers/blog-index.lua
index 0f4f20b..7a73458 100644
--- a/helpers/blog-index.lua
+++ b/helpers/blog-index.lua
@@ -112,7 +112,7 @@ local template = [[
 <h1>Posts by tag</h1>
 <ul>
 {% for t in tag_links %}
-  <li> <a href="{{t.url}}">{{t.title}}</a> </li>
+  <li> <a href="{{t.url}}/">{{t.title}}</a> </li>
 {% endfor %}
 </ul>
 ]]
diff --git a/netlify.sh b/netlify.sh
index 2d3d9f6..50b4d5d 100755
--- a/netlify.sh
+++ b/netlify.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-SOUPAULT_VERSION="4.0.0-beta1"
+SOUPAULT_VERSION="4.11.0"
 
 if [ -z "${SOUPAULT_VERSION}" ]; then
     echo "Error: soupault version is undefined, cannot decide what to download"
diff --git a/soupault.toml b/soupault.toml
index c5f62d6..1df50d7 100644
--- a/soupault.toml
+++ b/soupault.toml
@@ -99,19 +99,19 @@
   # Jingoo template for rendering extracted metadata
   index_template = """
     {% for e in entries %}
-    <h2><a href="{{e.url}}">{{e.title}}</a></h2>
+    <h2><a href="{{e.url}}/">{{e.title}}</a></h2>
     <div><strong>Last update:</strong> {{e.date}}</div>
     {% if e.tags %}
     <div class="post-tags">
        <strong>Tags: </strong>
        {%- for t in e.tags -%}
-         <a href="/blog/tag/{{t}}"><span class="post-tag">{{t}}</span></a>{% if not loop.last %}, {% endif %}
+         <a href="/blog/tag/{{t}}/"><span class="post-tag">{{t}}</span></a>{% if not loop.last %}, {% endif %}
        {%- endfor -%}
     </div>
     {% endif %}
     <div><strong>Reading time:</strong> {{e.reading_time}}</div>
     <p>{{e.excerpt}}</p>
-    <a href="{{e.url}}">Read more</a>
+    <a href="{{e.url}}/">Read more</a>
     {% endfor %}
   """
 
@@ -132,7 +132,7 @@
     <dl>
       {% for e in sublist(0, limit, entries) %}
       <dt>{{e.date}}</dt>
-      <dd> <a href="{{e.url}}">{{e.title}}</a> </dd>
+      <dd> <a href="{{e.url}}/">{{e.title}}</a> </dd>
       {% endfor %}
       </ul>
     </dl>
@@ -195,7 +195,7 @@
         <div class="post-tags">
          <span><strong>Tags:</strong> </span>
          {%- for t in tags -%}
-           <a href="/blog/tag/{{t}}"><span class="post-tag">{{t}}</span></a>{% if not loop.last %}, {% endif %}
+           <a href="/blog/tag/{{t}}/"><span class="post-tag">{{t}}</span></a>{% if not loop.last %}, {% endif %}
          {%- endfor -%}
          </div>
         {% endif %}
diff --git a/templates/header.html b/templates/header.html
index 89af48c..963f822 100644
--- a/templates/header.html
+++ b/templates/header.html
@@ -3,6 +3,6 @@
 </header>
 <nav>
   <a href="/">Home</a> |
-  <a href="/blog">Blog</a> |
-  <a href="/about">About</a>
+  <a href="/blog/">Blog</a> |
+  <a href="/about/">About</a>
 </nav>

dmbaturin · 2025-03-09T14:24:23Z

I never paid attention to that, but you are right. In fact, even the simple HTTP server started by python -m http.server behaves exactly that way and responds to everything like /about with a 301 redirect to /about/.
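
The http.server behavior can be reproduced locally with a short Python sketch. It assumes a throwaway site containing only about/index.html; the names QuietHandler and probe are hypothetical, and the server binds to a free port on 127.0.0.1.

```python
import functools
import http.client
import tempfile
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        # Suppress per-request logging on stderr.
        pass


def probe(path):
    """Serve a temporary clean-URL site and request `path` once.

    Returns (status code, Location header or None). Redirects are not
    followed, because http.client never follows them.
    """
    with tempfile.TemporaryDirectory() as root:
        page_dir = Path(root) / "about"
        page_dir.mkdir()
        (page_dir / "index.html").write_text("<p>About</p>", encoding="utf-8")

        handler = functools.partial(QuietHandler, directory=root)
        server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            port = server.server_address[1]
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()
            result = (response.status, response.getheader("Location"))
            conn.close()
            return result
        finally:
            server.shutdown()
            server.server_close()
```

Calling probe("/about") should show the redirect to /about/, while probe("/about/") should be served directly.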

I think trailing slashes should be the default for URLs produced by site indexing, since that is the de facto canonical URL form in most web servers. 6232d56 makes it so, but also adds a settings.clean_url_trailing_slash option for reverting to the old behavior, if anyone wants it.

forbytten · 2025-03-09T15:01:44Z

Sounds good! Thanks for the very speedy response!

dmbaturin added a commit that referenced this issue Mar 9, 2025

Add trailing slashes to clean URLs by default (#81)
but give the user an option to disable that

6232d56
