Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to install index for silk.Request model: (1071, 'Specified key was too long; max key length is 767 bytes') #38

Closed
GitFree opened this issue Sep 15, 2014 · 30 comments
Assignees
Labels

Comments

@GitFree
Copy link

GitFree commented Sep 15, 2014

I get this warning when run "python manage.py syncdb".

@mtford90
Copy link
Collaborator

Interesting, what database are you using?

@GitFree
Copy link
Author

GitFree commented Sep 19, 2014

mysql5.6
2014年9月19日 上午4:02于 "Michael Ford" [email protected]写道:

Interesting, what database are you using?


Reply to this email directly or view it on GitHub
#38 (comment).

joaofrancese added a commit to joaofrancese/silk that referenced this issue Sep 23, 2014
… 767 bytes' when creating the Request table in MySQL
joaofrancese added a commit to joaofrancese/silk that referenced this issue Sep 23, 2014
… 767 bytes' when creating the Request table in MySQL
@utkuyaman
Copy link

http://dev.mysql.com/doc/refman/5.6/en/create-index.html says:

Prefix support and lengths of prefixes (where supported) are storage engine dependent. For example, a prefix can be up to 767 bytes long for InnoDB tables or 3072 bytes if the innodb_large_prefix option is enabled. For MyISAM tables, the prefix limit is 1000 bytes.

this means the length of the varchar column is too long to apply index on so we must decrease the size of the column or size of the index to 255.
but as far as i know indexing the first xxx characters of a column is not supported in django.

I tried changing the table definition to:

class Request(models.Model):
    path = CharField(max_length=255, db_index=True)
    ...
    view_name = CharField(max_length=255, db_index=True, blank=True, default='')

and it worked.

but i don't know if changing the length of these 2 fields is good or bad for the system to work.
I need a comment for this from a commiter.

@mtford90
Copy link
Collaborator

mtford90 commented Nov 4, 2014

@utkuyaman, the only issue I see with that is if a path is longer than 255 characters it's going to break when we try to save down to the db. Not sure how rare that would be.

I totally support making this change though, but I think if you do make a pull request for this you should add code that truncates any paths that are >255

@prismic
Copy link

prismic commented May 28, 2015

Is there any update on this? I can take a stab at making a pull request to truncate the path if that's the way to go.

@utkuyaman
Copy link

I changed the table definition as i described in my reply. and it worked for me.
Unfortunately I couldn't find time to test it completely and commit the change.
You can try to change the codes and reply here if it works, then i can commit a patch.

@prismic
Copy link

prismic commented May 29, 2015

I just tried changing the length of the two fields and it did indeed work for me.

@mtford90
Copy link
Collaborator

@prismic @utkuyaman I def think truncating is the way to go. It would be nice if there was a way to make the max_length configurable however - if the database supports >255 e.g. postgres, it does seem a shame to not support that. I suspect it would be a bad idea linking the settings with model fields though...

@prismic
Copy link

prismic commented Jun 8, 2015

If changing maxlength is not desirable, what about the ability to turn off indexing on those fields via settings? I imagine running on a dev box indexing would not be a huge performance issue, although in production it would be. Would a warning message in the docs about turning off indexing suffice?

@avelis
Copy link
Collaborator

avelis commented Oct 19, 2015

@mtford90 To create a pull request that simply truncates a Request.path that is larger than 255 characters can be done in one of two ways:

  1. Override the path model fields setter using a properties technique.
  2. Augment the save method to truncate before persistence to the DB.

Both have pro's and cons. Whichever you lean on I can probably develop, write a test and make a pull request. Both solutions put the development at the model level. This simplifies the responsibility to one location that is very visible to discover and understand.

Thoughts?

@smaccona
Copy link
Collaborator

@utkuyaman @prismic @mtford90 hard-coding the max_length to 255 characters, even if you also safely truncate it to 255 characters via a model property, a pre-save signal or a similar technique won't cover all eventualities, because the character encoding of the underlying table/column is also a factor.

By default, MySQL uses a character encoding it calls utf8, however it's not actually utf8 at all but a non-standard 3-byte encoding. To really support UTF8 in MySQL, you need a 4-byte encoding, and the one MySQL provides is called utf8mb4. This is what you need to do if you want to fully support internationalization, so many people will be using utf8mb4.

The upshot of this is that the maximum index length is actually 191 characters (767/4) instead of 255 characters (767/3). For more details on how the Django team already ran into this issue, see https://code.djangoproject.com/ticket/18392 (in particular the comment at https://code.djangoproject.com/ticket/18392#comment:16 is enlightening). They haven't fully resolved this yet.

Also, to be clear: it only makes sense to both change max_length and implement truncate-before-save, right? One allows you to create the index, the second prevents an error on attempted save of paths that are too long.

What was the rationale for the original max_length=300? I don't believe Django itself places a limit on Request.path length - it's just a string.

@avelis
Copy link
Collaborator

avelis commented Nov 17, 2015

@smaccona I believe the attempt was to ensure the ability to capture long URL's that django-silk could be stored. Further up in the thread you will read that indexing a varchar column longer than 255 is not possible in MySQL but it is PostgreSQL. That difference is at the heart of this issue.

@smaccona
Copy link
Collaborator

@avelis what I meant was that the current Request model's name and view fields had their max_length originally set to 300, regardless of database (before this indexing issue was raised). I was just wondering why the current value of 300 was picked as the max_length in the first place.

Also, indexing a varchar column longer than 255 characters is possible in MySQL by setting the server option innodb_large_prefix to be true, setting innodb_file_format to be "Barracuda" and using a row format of ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED. You can do this without restarting MySQL by using set global commands from the MySQL client, and also place the relevant lines in your equivalent of the my.cnf file so they survive a server restart. See https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_large_prefix for details and http://mechanics.flite.com/blog/2014/07/29/using-innodb-large-prefix-to-avoid-error-1071/ for a walkthrough.

The point of my post above was that even reducing the max_length to 255 for those two fields won't help when your MySQL setup is using a Unicode character set such as utf8mb4 (that's the only one that works for real internationalization support). If you're using utf8mb4, then a varchar(255) column takes up 255 * (4 bytes per character) = 1020 bytes, which is still greater than the default 767-byte index maximum index length in MySQL, so the error will still occur when python manage.py syncdb tries to create the index.

@mtford90
Copy link
Collaborator

@avelis sorry for not getting back to you sooner! Have been rather overworked recently ;) @smaccona there was no rationale behind 300 as far as I remember... It was probably arbitrary ¯_(ツ)_/¯.

Interesting analysis though... I've never really used MySQL before but seems to have some pretty baffling design decisions.

I really doubt I'll have time to dig into this myself (and would be a poor use of my time I think with lack of mysql experience). In fact I'm looking for contributors to silk anyway having just moved it into it's own org - @avelis & @smaccona if you two are able to think of a solution (or at least a mitigation given that the django team haven't solved this either) would be happy to add you both as contributors so that my shitty response time doesn't hold back a fix any longer.

@smaccona
Copy link
Collaborator

@mtford90 agreed on the baffling MySQL design decisions ... I don't think the Django team are going to have an easy fix for this given it's a limitation in one of their dependencies and not really something they control. I have no doubt it's stuff like this that makes them highly recommend PostgreSQL over MySQL.

I propose that we simply reduce max_length for both of these fields to 190 and have a secondary strategy to handle cases where the path ends up being longer than this. This will cover the worst case scenario I can think of:

  • User is using MySQL
  • MySQL is installed with the default InnoDB setup (i.e user hasn't enabled innodb_large_prefix)
  • User has enabled utf8mb4 tables so they can support internationalization

In that case our maximum length for an indexable varchar column is 191 characters.

For the secondary strategy, here are my thoughts and proposal:

  • Simple truncation will lose the last part of the path and the view_name, which may contain information that is important to the user (e.g. the end of the path is arguably more important than all the directories in the hierarchy above it).
  • Instead of simple truncation, I think replacing enough characters from the middle of the path and view_name with the characters "..." to make them fit makes most sense, so we would get something like this:
    • /blog/category/start/of/long/path...end/of/long/path/by_id/123

The Request model already has an overridden save() method, so I would just put the "middle-removal" logic in there.

If this is acceptable to everyone, I'll make a PR for it.

@avelis
Copy link
Collaborator

avelis commented Nov 17, 2015

@smaccona Your precision strategy and solution works for me. I was previously not aware of those settings for MySQL so I do appreciate you enlightening me with that information.

@mtford90 I am open to contribution access. Would enjoy seeing this project gain more traction among the open source community.

@mtford90
Copy link
Collaborator

@smaccona Reckon it's possible to attempt to write the view name, detect the error & fall back to 190 chars? Not sure how viable or accurate detection of this specific error could be though...

@avelis @smaccona, gave you both access to the repo, thanks again for the help & let me know if either of you have any questions on the source code etc. Look forward to the PR ;)

@smaccona
Copy link
Collaborator

@mtford90 we could certainly trap an error on save (and I believe MySQL has a specific error for a column's value being too long) but it won't help because we won't get as far as model.save() before running into issues. Here are the two possible scenarios:

  • max_length in the Request model for the path and view_name fields is set to some value <= 191 characters. Then all databases will limit those columns to 190 (or whatever) characters, and Django will not allow saves of model instances where those fields exceed that length.
  • max_length in the Request model for the path and view_name fields is set to some value > 191 characters (like the current 300 or the originally proposed 255 above). In this case, MySQL won't be able to handle python manage.py syncdb because it will barf on the index creation, so everything will work fine for PostgreSQL but MySQL users won't even be able to get off the ground.

Django doesn't support different field lengths for different database engines, and even if it did, I think that's unneeded complexity. Nor does it support indexing a field when using one database engine but not another.

I think the bottom line is that to support MySQL and keep the index, we have to limit these two fields to 190 characters. To be honest, I don't think 190 characters is that bad a limitation - with 16-character directory, file and view names you would have to have a hierarchy 11-deep to run into a situation where we truncate (querystrings etc. don't count). Here's what one of those paths would look like:

  • 1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF/1234567890ABCDEF

I don't know about you guys but I have nothing that long in any of my apps!

@mtford90
Copy link
Collaborator

@smaccona haha very good point... Ok I would say go for the 190 char limit!

@smaccona smaccona added the bug label Nov 18, 2015
@smaccona smaccona self-assigned this Nov 18, 2015
@erback
Copy link

erback commented Mar 7, 2016

what is the status on this issue?

@smaccona
Copy link
Collaborator

smaccona commented Mar 7, 2016

@erback this is not as straightforward to fix as I originally thought because migrations for both Django and South will need to be supported for existing installations.

For now, I suggest you go with the steps outlined in the second paragraph of my comment above at #38 (comment) to enable longer indexes in MySQL. This can be done without downtime.

@erback
Copy link

erback commented Mar 7, 2016

@smaccona thanks for the swift response. Somehow I got it to work anyway by just running syncdb again after it failed. Is there something that I'm missing, because everythin seems to be working fine now?

@smaccona
Copy link
Collaborator

smaccona commented Mar 7, 2016

@erback I'm pretty sure your models have been created but the two indexes affected by this bug are missing. This will mean that Silk will still work, but will perform very slowly once you have more than a small amount of data.

You can confirm this by examining the silk_request table in your database and checking whether or not the path and view_name columns have indexes. If not, then queries against that table may perform slowly.

@erback
Copy link

erback commented Mar 7, 2016

@smaccona aha now I get it hadn't read through the issue well enough. Thanks for the input!

@avelis
Copy link
Collaborator

avelis commented Mar 17, 2016

@smaccona One path forward I see is doing a backwards incompatible change to just reset the initial migration file to the desired new max length. I realize this sort of cleaves current users from the project but could have the desired effect of laying this issue to rest.

@smaccona
Copy link
Collaborator

This has been hanging around for a long time now and I haven't had time to come up with a good fix which takes migrations into account properly (there's no strictly non-destructive way to truncate the data). We could do as @avelis suggested and just do a backwards incompatible change and reset the length. Another option is to just more fully document in the installation instructions the procedure for making MySQL support longer indexes - how does everyone feel about that?

@avelis
Copy link
Collaborator

avelis commented Nov 14, 2016

@smaccona For developer ease of use, I don't mind going forward with a fully backwards incompatible change. While making MySQL support longer indexes is possible that might not work for everyone as easily.

Not sure what is the best transition strategy for the backwards incompatible change. Open to ideas on that effort.

@teeberg
Copy link

teeberg commented Mar 9, 2017

I think it's fairly customary to communicate backwards-incompatible changes by bumping the major version number

@avelis
Copy link
Collaborator

avelis commented Mar 9, 2017

@teeberg I agree and given my recent experience no one seems to surprised when that happens. I can look into changing this. I am also open to a PR for this I can cut a release with a major version bump.

@auvipy
Copy link
Contributor

auvipy commented Mar 12, 2017

@avelis which fields need adjustments? I will be happy to send the suggested changes

avelis pushed a commit that referenced this issue Mar 14, 2017
Adds support for `utf8` and `utf8mb4` character sets in MySQL's default
configuration.

Closes #38.
smaccona added a commit to smaccona/django-silk that referenced this issue Oct 26, 2018
MySQL index restrictions prevent us from having path or view_name be longer than 190 characters (see jazzband#38), but there is no logic in the Request object to enforce this limit when saving, so long URLs cause a 500 error.

Closes jazzband#179.
avelis pushed a commit that referenced this issue Oct 28, 2018
MySQL index restrictions prevent us from having path or view_name be longer than 190 characters (see #38), but there is no logic in the Request object to enforce this limit when saving, so long URLs cause a 500 error.

Closes #179.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants