Fix disk space "leak" in extract-and-vector, limit disk usage on all services #809
Comments
Dunno, as I don't really know what exactly it limits :) Is it a cap on the container image's own files, or on the files that the container creates while running? Could you test it out for me? Also, what happens if the container exceeds that limit? Does it get killed, or can it just not write anything anymore? Generally, containers aren't supposed to do much writing to their own root partitions (only to volumes) while running, and our containers don't write much anywhere. Some exceptions:
If, say, If you can, find out what does
All apps can decide to write things, so I'd think we'd be looking into adding storage caps on all apps.
Looks like this is for setting the container's rootfs size at creation time: https://docs.docker.com/engine/reference/commandline/run/#set-storage-driver-options-per-container From the docs:
The problem is that it only works for overlay over xfs, and we use ext4, so this option isn't compatible with our setup. Per our discussion earlier, I'm just gonna go ahead and fix the jieba cache issue and call it a day.
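To double-check the xfs vs. ext4 point on a given host, something like the sketch below could work. It parses `/proc/mounts`-style text and reports which filesystem backs a path. Assumptions on my side: Docker's data root is the default `/var/lib/docker`, and the project quota option shows up as `pquota` or `prjquota` in the mount options. The function names are made up for illustration.

```python
import os


def parse_mounts(text):
    """Return (mount_point, fs_type, options) tuples from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            mounts.append((fields[1], fields[2], fields[3].split(",")))
    return mounts


def backing_mount(path, mounts):
    """Pick the mount with the longest mount point that contains path."""
    path = os.path.normpath(path)
    best = None
    for mount_point, fs_type, options in mounts:
        prefix = mount_point.rstrip("/") + "/"
        # Appending "/" keeps "/var" from matching "/variable".
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type, options)
    return best


def supports_size_quota(path, mounts):
    """True if path sits on xfs mounted with a project quota option (assumed names)."""
    mount = backing_mount(path, mounts)
    if mount is None:
        return False
    _, fs_type, options = mount
    return fs_type == "xfs" and any(opt in ("pquota", "prjquota") for opt in options)
```

On a real host you'd feed it the contents of `/proc/mounts` and ask about `/var/lib/docker` (or wherever the data root actually lives); with ext4 underneath, `supports_size_quota` comes back `False`.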
extract-and-vector workers tend to fill up /var/tmp with gigabytes of pretty much identical files, each of size either 0 or 3332489:
It took me a while to notice that a temporary file with a random name and a temporary file with a not-so-random name have identical file sizes:
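For anyone who wants to spot this kind of pattern faster, here's a rough sketch that groups the files in a directory by size, so a pile of same-sized temp files stands out. This is not what I actually ran; the function names are hypothetical, and it only looks at the top level of the directory.

```python
import os
from collections import defaultdict


def group_files_by_size(directory):
    """Map each file size in bytes to the sorted names of regular files of that size."""
    groups = defaultdict(list)
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only plain files matter here.
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                groups[size].append(entry.name)
    return {size: sorted(names) for size, names in groups.items()}


def summarize(directory):
    """Return (size, file count, total bytes) tuples, largest total first."""
    rows = [
        (size, len(names), size * len(names))
        for size, names in group_files_by_size(directory).items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Calling `summarize("/var/tmp")` inside a worker container would put the sizes that eat the most space at the top of the list.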
Jieba is a Python library which does Chinese language tokenization for us. Given that it uses a dictionary to do that, it has to pre-load some stuff:
backend/apps/common/Dockerfile
Lines 139 to 144 in 04bc9c6
but it seems that the resulting /var/tmp/jieba.cache does not become accessible to its users, as that file gets created with root:root owner and 600 permissions while its users run as mediacloud:mediacloud, so Jieba resorts to rebuilding that cache file on every call.
@jtotoole, could you:
- Fix jieba.cache's file permissions at build time so that the Jieba library could access it; probably you just need to run that cache creation script as a different user in the Dockerfile
- Limit disk usage in docker-compose.yml where appropriate; you'll probably need storage_opt for that
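To make the permission mismatch concrete, here's a small sketch of the classic Unix read check: a file owned by root with mode 600 is not readable by a non-root user. It ignores ACLs and capabilities. The uid/gid value 1000 used in the usage note is a made-up stand-in for the mediacloud user, and the function names are hypothetical.

```python
import os
import stat


def can_read(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the kernel's read check from plain permission bits only."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


def describe_access(path):
    """Report owner, group, mode and whether the current user can read path."""
    info = os.stat(path)
    return {
        "uid": info.st_uid,
        "gid": info.st_gid,
        "mode": oct(stat.S_IMODE(info.st_mode)),
        "readable": os.access(path, os.R_OK),
    }
```

With these, `can_read(0o600, 0, 0, 1000, {1000})` is `False` while `can_read(0o644, 0, 0, 1000, {1000})` is `True`, and running `describe_access("/var/tmp/jieba.cache")` as the worker's user inside the container should show whether the build-time fix took effect.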