[k] crc32 filters throw UnicodeEncodeErrors on non-ASCII characters

VERIFIED FIXED in 2.1

Status

defect
VERIFIED FIXED
9 years ago
9 years ago

People

(Reporter: paulc, Assigned: paulc)

Wherever we use crc32, passing it a unicode string that contains non-ASCII characters raises a UnicodeEncodeError.

STR:
1. Go to: http://support.mozilla.com/en-US/search?a=1&tags=日##語

More detail (from stack trace):
  File "/data/www/support.mozilla.com/kitsune/apps/search/views.py", line 213, in search
    tags = [crc32(t.strip()) for t in cleaned['tags'].split()]

  File "/data/www/support.mozilla.com/kitsune/apps/search/utils.py", line 7, in <lambda>
    crc32 = lambda x: zlib.crc32(x) & 0xffffffff

UnicodeEncodeError: 'ascii' codec can't encode character u'\u65e5' in position 0: ordinal not in range(128)
Figured this out! The problem is that zlib.crc32 (as well as binascii.crc32) expects binary data, not unicode. Turning a unicode string into UTF-8 encoded bytes in Python is as simple as:
   mystr.encode('utf-8')
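
For reference, a minimal sketch of how that encode step could be folded into the crc32 helper from the stack trace. The name safe_crc32 is hypothetical and the sample tag string is made up for illustration; the committed change may look different.

```python
# -*- coding: utf-8 -*-
import zlib


def safe_crc32(value):
    """Hypothetical helper: unsigned CRC-32 of a text or byte string."""
    # zlib.crc32 operates on bytes, so encode text as UTF-8 first.
    if not isinstance(value, bytes):
        value = value.encode('utf-8')
    # Mask to an unsigned 32-bit value, as the original lambda does.
    return zlib.crc32(value) & 0xffffffff


# Same shape as the list comprehension in search/views.py,
# fed with an illustrative non-ASCII tag string.
tags = [safe_crc32(t.strip()) for t in u'日 有効'.split()]
```

Checking for bytes rather than for a text type keeps the sketch usable on both Python 2 and Python 3, and already-encoded input passes through unchanged.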

This resource helped:
http://boodebr.org/main/python/all-about-python-and-unicode#UNI_TO_BINARY

I updated the search URL. Testing with the tag 有効, I got the article below as a search result:
Cookie を有効または無効にする

r? at:
http://github.com/pcraciunoiu/kitsune/commit/8ec1514049cd201782a649f35cfc698d8233508e
Merged and pushed:
http://github.com/jsocol/kitsune/commit/066535a1ed886418ae35ee1efe3139355fc0b7f7

Build #65:
https://hudson.mozilla.org/job/support.mozilla.com/65
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Verified FIXED on http://support-stage-new.mozilla.com/en-US/search?q=&language=ja&tags=%E6%9C%89%E5%8A%B9&a=1&w=1.
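
The tags value in that URL is the percent-encoded UTF-8 form of 有効, the same tag used in the earlier test. A quick way to confirm this, sketched for Python 3 with the standard urllib.parse module (variable names are illustrative only):

```python
from urllib.parse import quote, unquote

# The tags parameter exactly as it appears in the verified URL.
encoded_tag = '%E6%9C%89%E5%8A%B9'

# unquote() decodes percent-escapes as UTF-8 by default.
decoded_tag = unquote(encoded_tag)

# quote() goes the other way: text -> UTF-8 bytes -> percent-escapes.
round_trip = quote(decoded_tag)
```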

tests++; you guys rock!
Status: RESOLVED → VERIFIED