-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Certain unexpected characters in an entry title can cause a stack trace #1535
Comments
You know, I may have slightly mis-reported this. The actual content in the original database is hex C2A0, which is however the Unicode representation of the same reported, A0 -- Latin-1 nonbreaking space. I'm including here a text file of what is actually in the external database being read from (look between BS and EN), and it should have propagated through as the full C2A0 anyway by the time Craft saw it, but not sure about that, or which/both are causing your stripHTML preg_remove to blow up. It's a tedious thing to set up the test case actually in situ, if aided well by PhpStorm's breakpoint from conditioned breakpoint nice abilities, so I'll try to close this loop later, possibly also by a separated example. As you'll know, discerning what is actually present in the world of various viewers and editors and database interfaces vs. Unicode can be pretty tricky. In any case, that preg ought to be made safe, or safetied, imagine you'd agree. |
Ok, I've pretty well verified it is indeed a single 0xA0 that exists in the text sent to the stripHtml regex during slugbuilding, which causes it to fail. Apparently the MsSql database driver just changes original Unicode into equivalent Latin-1, when it can - in this case U+C2A0 to 0xA0. 'Nice' -- not exactly, but sort of their style. Hence a lot of fun matching out bad characters. I'm still going to use my preg_replace matcher on this single \hA0 char to replace it with a space at least for the present,, as the HtmlPurify in StringHelper::convertToUtf8() while safe entirely takes this Latin-1 nbsp out. Thanks for the extra notes offline, Brad -- much appreciated and leads to some still-advancing thinking here. |
I can see the offending whitespace in the sample text file you provided, but C3 beta 8 handles it gracefully for me either via an auto-generated slug or if I manually insert the whitespace into the slug. Maybe something to do with the version of PCRE installed? Running |
Hmm - could well be. Handy beta 8 phpinfo says I'm running PCRE 8.39 2016-06-14 on this fragrant , flagrantly up-to-date-by-Otwell-standards Vagrant. Without looking too deeply into it, I did find some murky water having to do with decisions on how to handle various POSIX (?) classes and/or a bug in the jit, which is turned on in this php load. Two links -- first has a plod through for alternative issues, second lists what they fixed on 8.40. It sounds a bit of a mess, actually. https://bugzilla.redhat.com/show_bug.cgi?id=1400267 I guess you could do as I did, and just convert that particular character to safety Craft titles -- as people could paste that in as they did in this client database. Or if Macish make it in some case. Otherwise, I liked the idea of tuning up use of HtmlPurifier some year when there might be time to sensibly do that -- and if it didn't reinvent similar problems...what is it using if not REs. I want a Yogi Berra-ism 'judgement is hard when you need to decide', or something, for a case like this - think you have had a few of them...and I really like the way you guys choose, quite dependably... |
@narration-sd Can you confirm if replacing |
Sigh. Hope you didn't get emailed with my first results, which were incorrect since I didn't read your note carefully at this hour. Answer, unfortunately what you propose doesn't work. StringHelper::stripHtml() fails out with a TypeError because it is trying to return null. Null in turn probably means the same (PCRE?) fault as drives preg_replace() to fail is occurring somewhere underneath. Strip_tags() didn't get the chance to be involved. I will keep my special two-step breakpoint around, if disabled. And now that I have more of the plugin up, while still having the dirty database, it's not so much trouble to run a test. |
So you're saying:
returns null with your null space sample input? |
No, php raised an exception, the TypeError on StringHelper::stripHtml() -- because of its attempt to return null. I'm not sure I like the type handling system zapping an old use of null for strings, but there it is. And again, the null return has the scent of the preg_replace thing we are trying to work around. |
You know, I[m still not reading you straight. Slowing down and doing it right. (also still-db-related vision issues about to be well treated, but still my fault) -- though the reference to replacing StringHelper::stripHtml() also threw me, as I'd been talking about StringHelper::convertToUtf8(). Am setting up the test again to do it right. What i just said about stripHtml() is true I believe, though - will verify that as well, maybe trace deeper.... |
Ok, now some more detailed information.
I don't know if it's something we might be missing about tuning PCRE 8.40, or rather the rude basket of known faults with my version, reported by that team links above -- they ve pretty evidently abandoned even trying with it, in favor of producing PCRE 10.x, not in public. Brad, I know it's messy, I don't know if the simple /\xa0/ - replace w/ordinarly-space remover I put in can be a general answer for titles inside of Craft. If we were sure PCRE wouldn't get it mixed up as part of a longer utf8 entity, this could be ok. Really, it shouldn't, and my Chinese string thrown at it gave a first approximation, but that wasn't a real test. I guess I should find some Chinese characters with A0s in them...can't now for this afternoon. |
Yeah... seems like when you add UTF-8 into the mix, there's just not an easy answer. https://stackoverflow.com/questions/1279774/to-strip-whitespaces-inside-a-variable-in-php http://stackoverflow.com/a/1176923/684 Given that it seems to be edge case (tried on multiple computers and couldn't reproduce), no one else has reported a similar issue and all of the solutions I've found have their own issues and potential side-effects, going to go ahead and close this. Can always re-open if necessary. |
Well, pretty much agree that's sound judgement, Brad. I do have one more thing on a note to try here, just to be sure I actually covered all the bases. News at some 11 if I learn anything :) And thanks for all the consideration... |
Hi! I have same error but with russian chars in slug field. It works fine on local dev machine but fail on hosting: Return value of craft\helpers\StringHelper::stripHtml() must be of the type string, null returned
|
If remove /u flag then regex return "Сертификация" -> "-е---------к------" |
For my this fixed by adding: Also there is error in createSlug but it was fixed in latest commit |
@sokolovstas I just tried that slug locally and got a “Malformed UTF-8 characters, possibly incorrectly encoded” error, even with cms/src/helpers/ElementHelper.php Line 69 in 02739d2
I’ve updated that line to be unicode-aware for the next release, which fixes this. To get the fix early, change your "require": {
"craftcms/cms": "dev-develop#ab3095b6447dadcf17ce7a3197975b31efe609b1 as 3.2.6",
"...": "..."
} Then run |
Description
I'm going to write this up here to save @takobell from having to parse a detailed late-night note. It's not likely to show up unless you are doing as follows (common enough possible occurances, though):
What will happen if in particular your title text includes a Latin-1 non-breaking space (hex a0), is that a php preg_replace will mysteriously error when creating the slug, returng a null which will give the stack trace exception. Other odd characters will likely do the same thing, so very possibly this Entity range ought to be filtered out -- which preg_replace actually will handle. Apparently it's the html tag-removing matcher that blows up.
Steps to reproduce
Additional info
PHP version 7.1.2-4+deb.sury.org
xenial+1xenial+1Database driver & version MySQL 5.7.17-0ubuntu0.16.04.1
Image driver & version GD 7.1.2-4+deb.sury.org
Craft edition & version Craft Personal 3.0.0-beta.6
Yii version 2.0.11.2
Twig version 2.1.0
Guzzle version 6.2.1
Imagine version 0.7-dev
Plugins: yes, proprietary StandardsAgent, a big one near full conversion, and this string with the divot is coming from an external database it watches for reliable and timely creation, update, delete, etc. tracking.
The text was updated successfully, but these errors were encountered: