Fix downcasing of unicode tag names #492

leo-souza · 2014-03-15T14:16:02Z

I ran into some issues when a tag name contains uppercase Unicode characters.
They affect both creating and retrieving tags:

When a second tag that differs only in case is created, the existing tag is not found and a duplicate is created.
When tagged_with is called, it returns an empty result, even when called with the exact tag name.
Examples:

Post.create(title: 'First', tag_list: 'Ruby')
Post.create(title: 'Second', tag_list: 'ruby')
Post.create(title: 'Third', tag_list: 'Ábaco')
Post.create(title: 'Fourth', tag_list: 'ábaco')
ActsAsTaggableOn::Tag.all
#<ActiveRecord::Relation [
  #<ActsAsTaggableOn::Tag id: 1, name: "Ruby">, 
  #<ActsAsTaggableOn::Tag id: 2, name: "Ábaco">, 
  #<ActsAsTaggableOn::Tag id: 3, name: "ábaco">]>

Post.tagged_with('ruby')
#<ActiveRecord::Relation [
  #<Post id: 1, title: "First", body: nil>,
  #<Post id: 2, title: "Second", body: nil>]>
Post.tagged_with('Ruby')
 #<ActiveRecord::Relation [
  #<Post id: 1, title: "First", body: nil>,
  #<Post id: 2, title: "Second", body: nil>]>
Post.tagged_with('ábaco')
#<ActiveRecord::Relation []>
Post.tagged_with('Ábaco')
#<ActiveRecord::Relation []>

This behaviour was introduced when .force_encoding('BINARY') was added to the Tag class: it is called before the string is downcased, so non-ASCII characters are never downcased.
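
The ordering problem can be shown in isolation. The following is a minimal sketch, not the gem's code: both helper names are hypothetical, and it assumes Ruby 2.4 or later, where String#downcase handles Unicode on UTF-8 strings (on the Rubies discussed in this thread, mb_chars.downcase would be needed for that step instead).

```ruby
# Hypothetical helpers for illustration; they are not the gem's actual methods.

# Forces BINARY first, then downcases: only the ASCII bytes A-Z are changed.
def binary_then_downcase(string)
  string.to_s.dup.force_encoding('BINARY').downcase
end

# Downcases while the string is still UTF-8, then forces BINARY.
# Assumption: Ruby 2.4+, where String#downcase applies Unicode case mapping.
def downcase_then_binary(string)
  string.to_s.downcase.force_encoding('BINARY')
end

upper = "\u00C1baco"                              # "Ábaco"
lower = "\u00E1baco".dup.force_encoding('BINARY') # bytes of "ábaco"

puts binary_then_downcase(upper) == lower # false: "Á" was left untouched
puts downcase_then_binary(upper) == lower # true on Ruby 2.4+
```

With the first ordering, 'Ábaco' and 'ábaco' produce different byte strings, which would explain both the duplicate tag and the empty tagged_with result.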

When SQLite is used, the only way around this is to load the ICU extension.

PR #472 only fixes the tagged_with part of this issue.

seuros · 2014-03-15T14:25:14Z

Tests are failing.

leo-souza · 2014-03-15T14:30:27Z

My mistake. Do I need to re-push these commits as one?

seuros · 2014-03-15T14:33:27Z

Since you will need to push the changelog, rebase and push as one.

leo-souza · 2014-03-15T15:25:30Z

This is related to #464 as well

seuros · 2014-03-15T15:31:24Z

Would you mind updating the changelog as well?
And add a note about the ICU extension to the README.

Fix downcasing of unicode tag names

seuros · 2014-03-15T15:44:47Z

Thank you

seuros · 2014-03-15T16:10:55Z

Sqlite3 tests are failing. :/

leo-souza · 2014-03-15T17:30:12Z

I'll check this

leo-souza · 2014-03-15T19:18:52Z

How can I reproduce those test failures locally?

seuros · 2014-03-15T19:23:33Z

I could not. All tests passed on my machine.
You need to use Ruby 1.9.3.
cc @bf4 @mbleigh

leo-souza · 2014-03-15T20:41:31Z

Maybe rolling back #as_8bit_ascii to what it was and relying on the database's lower function for queries is another valid approach.
Something like a75a061.
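
As a rough sketch of what "relying on the database's lower function" could look like (this is not the content of a75a061; the helper name and the 'tags.name' column are assumptions made for illustration), the condition can wrap both the column and the bound value in LOWER so that no case folding happens in Ruby:

```ruby
# Hypothetical helper: builds a condition that lets the database do the
# case folding. `column` must be a trusted identifier, since it is
# interpolated into the SQL; the tag names are passed as bind values.
def lower_match_condition(column, names)
  clause = names.map { "LOWER(#{column}) = LOWER(?)" }.join(' OR ')
  [clause, *names]
end

condition = lower_match_condition('tags.name', ["\u00C1baco", 'Ruby'])
puts condition.first
# The array form could then be handed to ActiveRecord, for example:
#   ActsAsTaggableOn::Tag.where(condition)
```

The trade-off is that the result depends on the database's own LOWER implementation. SQLite's built-in version only folds ASCII, which is why the ICU extension is mentioned above.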

seuros · 2014-03-15T22:48:58Z

Do you want to send a PR?

nicolaslazartekaqui · 2014-03-17T12:43:37Z

lib/acts_as_taggable_on/tag.rb

        else
-          string.to_s.mb_chars
+          string.to_s


@leo-souza this now returns a String; shouldn't it return an ActiveSupport::Multibyte::Chars?
Maybe this broke the build.

Suggestion:

def as_8bit_ascii(string, downcase=false)
  string = string.to_s
  string.downcase! if downcase
  if defined?(Encoding)
    string.dup.force_encoding('BINARY')
  else
    string.mb_chars
  end
end

#mb_chars has to be called before #downcase, or else Á would not become á. Either way, I don't think the else block is being called at all. Still, this is a good suggestion; maybe just removing #to_s is worth a try. If only the tests failed locally.

The mb_chars code was legacy 1.8 code that I left in there. That's why it's in the else branch, where Encoding is not defined.

Ruby's downcase method doesn't apply to multibyte chars unless you call mb_chars before downcase, even on 1.9+.

leo-souza · 2014-03-19T15:01:26Z

Yes, I do.
This build https://travis-ci.org/mbleigh/acts-as-taggable-on/builds/21049248 threw me off: a test on MySQL broke, and in this other build https://travis-ci.org/mbleigh/acts-as-taggable-on/builds/20835215 a Postgres test broke, so it's not only the SQLite tests that are failing.
Should I modify the changelog again?

seuros · 2014-03-19T15:10:14Z

I noticed that. If you restart the tests, they often pass.

bf4 · 2014-03-20T02:34:25Z

lib/acts_as_taggable_on/tag.rb

@@ -111,11 +111,13 @@ def binary
        /mysql/ === ActiveRecord::Base.connection_config[:adapter] ? "BINARY " : nil
      end

-      def as_8bit_ascii(string)
+      def as_8bit_ascii(string, downcase=false)
+        string = string.to_s.dup.mb_chars


Why use mb_chars here? Unnecessary in Ruby 1.9

Fix downcasing of unicode tag names

ad2bf97

seuros added a commit that referenced this pull request Mar 15, 2014

Merge pull request #492 from leo-souza/master

95424df

Fix downcasing of unicode tag names

seuros merged commit 95424df into mbleigh:master Mar 15, 2014

leo-souza mentioned this pull request Mar 19, 2014

Use database's lower function for case-insensitive match #498

Merged

tekniklr pushed a commit to tekniklr/acts-as-taggable-on that referenced this pull request Mar 19, 2021

Merge pull request mbleigh#492 from leo-souza/master

95bf577

Fix downcasing of unicode tag names
