Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix downcasing of unicode tag names #492

Merged
merged 1 commit into from
Mar 15, 2014
Merged

Conversation

leo-souza
Copy link
Contributor

I ran into some issues when a tag name contains upcased unicode chars.
It happens in both creation and retrieving:

  • It doesn't find the previous tag when a second tag is being created and duplicates it.
  • When tagged_with is called, it returns an empty array, even when calling with the exact tag name.
    Examples:
Post.create(title: 'First', tag_list: 'Ruby')
Post.create(title: 'Second', tag_list: 'ruby')
Post.create(title: 'Third', tag_list: 'Ábaco')
Post.create(title: 'Fourth', tag_list: 'ábaco')
ActsAsTaggableOn::Tag.all
#<ActiveRecord::Relation [
  #<ActsAsTaggableOn::Tag id: 1, name: "Ruby">, 
  #<ActsAsTaggableOn::Tag id: 2, name: "Ábaco">, 
  #<ActsAsTaggableOn::Tag id: 3, name: "ábaco">]>

Post.tagged_with('ruby')
#<ActiveRecord::Relation [
  #<Post id: 1, title: "First", body: nil>,
  #<Post id: 2, title: "Second", body: nil>]>
Post.tagged_with('Ruby')
 #<ActiveRecord::Relation [
  #<Post id: 1, title: "First", body: nil>,
  #<Post id: 2, title: "Second", body: nil>]>
Post.tagged_with('ábaco')
#<ActiveRecord::Relation []>
Post.tagged_with('Ábaco')
#<ActiveRecord::Relation []>

This behaviour was introduced when .force_encoding('BINARY') was added in Tag class, which was being called before downcasing the string

When sqlite is being used, the only way around this is loading the ICU extension.

PR #472 only fixes the tagged_with part of this issue.

@seuros
Copy link
Collaborator

seuros commented Mar 15, 2014

Tests are failing.

@leo-souza
Copy link
Contributor Author

My mistake. do I need to repush this commits as one?

@seuros
Copy link
Collaborator

seuros commented Mar 15, 2014

Since you will need to push the changelog , rebase and push as one.

@leo-souza
Copy link
Contributor Author

This is related to #464 as well

@seuros
Copy link
Collaborator

seuros commented Mar 15, 2014

Would you mind modifying the change-log as well ?
And add the note about ICU extension in the readme.

seuros added a commit that referenced this pull request Mar 15, 2014
Fix downcasing of unicode tag names
@seuros seuros merged commit 95424df into mbleigh:master Mar 15, 2014
@seuros
Copy link
Collaborator

seuros commented Mar 15, 2014

Thank you

@seuros
Copy link
Collaborator

seuros commented Mar 15, 2014

Sqlite3 tests are failing. :/

@leo-souza
Copy link
Contributor Author

I'll check this

@leo-souza
Copy link
Contributor Author

How can I reproduce those testing errors in my local code?

@seuros
Copy link
Collaborator

seuros commented Mar 15, 2014

I could not. All tests passed on my machine.
You need use ruby 1.9.3.
cc @bf4 @mbleigh

@leo-souza
Copy link
Contributor Author

Maybe rolling back #as_8bit_ascii to what it was and relying on database's lower function for queries is another valid approach.
Like this a75a061

@seuros
Copy link
Collaborator

seuros commented Mar 15, 2014

Do you want to send a PR ?

else
string.to_s.mb_chars
string.to_s

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leo-souza is returning a string, should not be returning a ActiveSupport::Multibyte::Chars?
Maybe this broke the build.

Suggestion:

def as_8bit_ascii(string, downcase=false)
  string = string.to_s
  string.downcase! if downcase
  if defined?(Encoding)
    string.dup.force_encoding('BINARY')
  else
    string.mb_chars
  end
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#mb_chars have to be called before #downcase or else Á would not become á. But either way, I don't think the else block is being called at all. But this is a good suggestion, maybe just removing #to_s is a good try. If only the tests fail locally

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mb_chars code was legacy 1.8 that I left in there. That's why it's in the elsif Encoding is not defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ruby downcase method doesn't apply to multibyte chars unless you call mb_chars before downcase, even for 1.9+

@leo-souza
Copy link
Contributor Author

Yes, I do.
This build https://travis-ci.org/mbleigh/acts-as-taggable-on/builds/21049248 throw me off of understanding what's going on, a test on mysql broke, and in this other build https://travis-ci.org/mbleigh/acts-as-taggable-on/builds/20835215 a postgres test broke, so not only the sqlite ones are failing.
Should i modify changelog again?

@seuros
Copy link
Collaborator

seuros commented Mar 19, 2014

I noticed that. If you restart the tests , they often pass.

@@ -111,11 +111,13 @@ def binary
/mysql/ === ActiveRecord::Base.connection_config[:adapter] ? "BINARY " : nil
end

def as_8bit_ascii(string)
def as_8bit_ascii(string, downcase=false)
string = string.to_s.dup.mb_chars
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use mb_chars here? Unnecessary in Ruby 1.9

tekniklr pushed a commit to tekniklr/acts-as-taggable-on that referenced this pull request Mar 19, 2021
Fix downcasing of unicode tag names
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants