Closed Bug 1316339 Opened 8 years ago Closed 8 years ago

Exception in .parse method reading a .inc file with utf-8 characters in comments

Categories

(Localization Infrastructure and Tools :: compare-locales, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: flod, Assigned: flod)

Attachments

(3 files, 1 obsolete file)

Attached file bookmarks.inc
This is the exception I get when parsing the attached file:

Traceback (most recent call last):
  File "./app/scripts/tmx_products.py", line 226, in <module>
    main()
  File "./app/scripts/tmx_products.py", line 221, in main
    extracted_strings.extractStrings()
  File "./app/scripts/tmx_products.py", line 139, in extractStrings
    entities, map = file_parser.parse()
  File "/home/flodolo/transvision/libraries/compare-locales/compare_locales/parser.py", line 224, in parse
    for e in self:
  File "/home/flodolo/transvision/libraries/compare-locales/compare_locales/parser.py", line 248, in walk
    entity, offset = self.getEntity(ctx, offset)
  File "/home/flodolo/transvision/libraries/compare-locales/compare_locales/parser.py", line 492, in getEntity
    return (self.createEntity(ctx, m), offset)
  File "/home/flodolo/transvision/libraries/compare-locales/compare_locales/parser.py", line 280, in createEntity
    pre_comment = str(self.last_comment) if self.last_comment else ''
UnicodeEncodeError: 'ascii' codec can't encode character u'\u010d' in position 216: ordinal not in range(128)
I'm not completely sure if it's relevant, but for some strange reason the exception is reported multiple times: once for the file with the issue, and once for each later .inc file I analyze, almost as if the parser keeps the exception stored somewhere.
I think I've identified the code responsible for the exception: str() should be unicode()
https://hg.mozilla.org/l10n/compare-locales/file/tip/compare_locales/parser.py#l280

But I have no clue about the exception piling up.
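
To illustrate the str() problem: in Python 2, calling str() on a unicode object implicitly encodes it with the ASCII codec, which fails on any character above 127. The sketch below makes that implicit step explicit so it also runs on Python 3. The comment text and the helper names are hypothetical stand-ins, not taken from the attached file or from parser.py.

```python
# Hypothetical comment text containing U+010D; not the content of bookmarks.inc.
comment = u"# Translated by Fran\u010dek"


def to_ascii_bytes(text):
    """Mimic what Python 2's str(text) does for a unicode object:
    an implicit encode with the 'ascii' codec."""
    return text.encode("ascii")


def to_utf8_bytes(text):
    """Explicit UTF-8 encoding, which can represent any code point."""
    return text.encode("utf-8")


try:
    to_ascii_bytes(comment)
    failed = False
    error_message = ""
except UnicodeEncodeError as exc:
    # Same exception class as in the traceback above.
    failed = True
    error_message = str(exc)

# Keeping the value as text (unicode() in Python 2) avoids the encode step;
# when bytes are really needed, an explicit UTF-8 encode round-trips cleanly.
round_tripped = to_utf8_bytes(comment).decode("utf-8")
```

Keeping the comment as a unicode object, as proposed above, sidesteps the implicit ASCII encode entirely.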
I think there might be another issue in the parser with the attached file: since the line

#define seamonkey_l10n_long

doesn't assign a value to 'seamonkey_l10n_long', the following '#unfilter' instruction is lost.
(In reply to Francesco Lodolo [:flod] from comment #3)
> I think there might be another issue in the parser with the attached file:
> since the line
> 
> #define seamonkey_l10n_long
> 
> doesn't assign a value to the 'seamonkey_l10n_long', the following
> instruction '#unfilter' is lost.

Never mind: it works as expected if there's a space or tab after the entity name. My editor was trimming the trailing whitespace.
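
A minimal sketch of why trailing whitespace matters here, assuming a pattern that requires whitespace between the name and the value. Both regular expressions and the sample text are hypothetical illustrations, not the actual patterns used in compare-locales.

```python
import re

# Hypothetical strict pattern: whitespace is mandatory after the name,
# so a define with nothing after the name does not match at all.
STRICT = re.compile(
    r"^#define[ \t]+(?P<key>\w+)[ \t]+(?P<val>[^\n]*)$", re.M
)

# Hypothetical lenient pattern: the whitespace and value are optional,
# so a bare "#define name" is accepted with no value.
LENIENT = re.compile(
    r"^#define[ \t]+(?P<key>\w+)(?:[ \t]+(?P<val>[^\n]*))?$", re.M
)

# Illustrative input; the second line stands in for the '#unfilter' instruction.
trimmed = "#define seamonkey_l10n_long\n#unfilter emptyLines\n"
with_space = "#define seamonkey_l10n_long \n#unfilter emptyLines\n"

strict_trimmed = STRICT.search(trimmed)        # no match: name is followed by a newline
strict_spaced = STRICT.search(with_space)      # matches with an empty value
lenient_trimmed = LENIENT.search(trimmed)      # matches, value group is unset
```

With the strict form, the trimmed line is not recognized as a define, which is consistent with the behavior described above; the lenient form accepts both variants.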
Comment on attachment 8809771 [details]
Bug 1316339 - Support UTF-8 characters in comments within .inc files;

https://reviewboard.mozilla.org/r/92298/#review92308

r=me with the follow-up
Attachment #8809771 - Flags: review?(l10n) → review+
Attachment #8809772 - Flags: review?(francesco.lodolo)
Comment on attachment 8809772 [details]
bug 1316339, follow up to allow defines with no value

https://reviewboard.mozilla.org/r/92300/#review92312

Thanks, it makes a lot more sense like this.
Attachment #8809772 - Flags: review?(francesco.lodolo) → review+
Attachment #8809387 - Attachment is obsolete: true
Attachment #8809387 - Flags: review?(l10n)
Assignee: nobody → francesco.lodolo
Pushed upstream: https://hg.mozilla.org/l10n/compare-locales/pushloghtml?changeset=0effb60622ea
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED