This issue by Parand Darugar pointed out that Datasette doesn't currently run on Linux systems with FIPS enabled, due to the way it uses MD5 hashes.
I hadn't heard of FIPS before. It stands for the Federal Information Processing Standard, and it's a set of US government standards around how cryptographic algorithms should be used.
Certain operating systems can run in a FIPS mode, and running them like that disables certain cryptographic algorithms that are considered to be too weak.
Unsurprisingly, MD5 is one of the algorithms that is disabled in FIPS mode.
The problem here is that Datasette uses MD5 for non-cryptographic purposes. It uses it in two places.
@property
def color(self):
if self.hash:
return self.hash[:6]
return hashlib.md5(self.name.encode("utf8")).hexdigest()[:6]
This method generates a color associated with a database, using the first 6 characters of the MD5 hash of the database name. I picked this trick up years ago from Dopplr.
The second place is here:
def to_css_class(s):
"""
Given a string (e.g. a table name) returns a valid unique CSS class.
For simple cases, just returns the string again. If the string is not a
valid CSS class (we disallow - and _ prefixes even though they are valid
as they may be confused with browser prefixes) we strip invalid characters
and add a 6 char md5 sum suffix, to make sure two tables with identical
names after stripping characters don't end up with the same CSS class.
"""
if css_class_re.match(s):
return s
md5_suffix = hashlib.md5(s.encode("utf8")).hexdigest()[:6]
# Strip leading _, -
s = s.lstrip("_").lstrip("-")
# Replace any whitespace with hyphens
s = "-".join(s.split())
# Remove any remaining invalid characters
s = css_invalid_chars_re.sub("", s)
# Attach the md5 suffix
bits = [b for b in (s, md5_suffix) if b]
return "-".join(bits)
The code comment explains what's happening here - this is a way to turn any string into a valid CSS class name. It's actually used for custom template names as well.
Both of these uses of MD5 are legitimate. But... attempting to run Datasette on a FIPS system produces the following runtime error:
else hashlib.md5(name.encode("utf8")).hexdigest()[:6],
ValueError: [digital envelope routines] unsupported
Python 3.9, released in 2020, introduced a fix for this issue: the usedforsecurity=False
parameter, documented here. This had been under discussion since 2010!
If you run MD5 like this, FIPS systems won't complain:
value = b"These are some bytes"
digest = hashlib.md5(value, usedforsecurity=False).hexdigest()
The default value there is =True
, so you have to explicitly pass =False
to pass the FIPS check.
This is a great solution... if you are running Python 3.9 or later.
Datasette supports all versions of Python that are not yet EOL for support. This means Python 3.8 is supported until at least October 2024.
Since this is a runtime error, I figured out a way to fix this with a utility function:
def md5_not_usedforsecurity(s):
try:
return hashlib.md5(s.encode("utf8"), usedforsecurity=False).hexdigest()
except TypeError:
# For Python 3.8 which does not support usedforsecurity=False
return hashlib.md5(s.encode("utf8")).hexdigest()
This catches the TypeError
thrown by hashlib.md5
when usedforsecurity=False
is not supported, and falls back to running the function without that parameter.
This means that Datasette can now run on FIPS-enabled systems provided they are running Python 3.9 or higher, and can continue to work on Python 3.8 systems that have not enabled FIPS mode.
The hardest part of fixing this was figuring out how to test it.
Eventually I found a recent Docker image with FIPS enabled, by searching Docker Hub for "fips" and sorting by recently updated:
https://hub.docker.com/search?q=fips&sort=updated_at&order=desc
I found ubuntu-ruby-fips which includes a recent Ubuntu with FIPS enabled.
I started a container like this:
docker run -it --rm cyberark/ubuntu-ruby-fips /bin/bash
Then installed Python 3 and Git in that container like this:
apt-get update && apt-get install -y python3 git python3.10-venv
Then I could run the Datasette test suite like so:
cd /tmp
git clone https://github.com/simonw/datasette
cd datasette
python3 -m venv venv
source venv/bin/activate
pip install -e '.[test]'
pytest -n auto
This failed the first time with the expected ValueError: [digital envelope routines] unsupported
errors - but after I applied this change the tests passed, showing my fix worked.