Remove transliteration

RESOLVED FIXED in Firefox 40

Status

()

Core
Internationalization
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: hsivonen, Assigned: AndyP, Mentored)

Tracking

(Blocks: 1 bug, {footprint})

unspecified
mozilla40
x86_64
Linux
footprint
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 fixed)

Details

(Whiteboard: [lang=c++][good first bug])

Attachments

(1 attachment, 8 obsolete attachments)

(Reporter)

Description

3 years ago
It appears that the use case for transliteration in Gecko was exporting text to clipboard on systems that didn't support Unicode on the clipboard. We haven't supported such systems in a long while, since it appears that even the Gtk clipboard API takes UTF-8 regardless of the system locale.

To make the size of Gecko a bit smaller, we should remove transliterate.properties and the transliteration capability from nsEntityConverter.
(Reporter)

Comment 1

3 years ago
Oh, and apparently transliteration was used by the old gfx, too. Anyway, gfx has long since been rewritten.
(Reporter)

Updated

3 years ago
Blocks: 956982
Mentor: hsivonen@hsivonen.fi
Whiteboard: [mentor=hsivonen][lang=c++][good first bug] → [lang=c++][good first bug]

Comment 2

3 years ago
Hi there! I would like to take this on!
I do not fully understand the second part of Henri's message. I need more info, please!

> To make the size of Gecko a bit smaller, we should remove
> transliterate.properties and the transliteration capability from
> nsEntityConverter.
Flags: needinfo?(hsivonen)
(Reporter)

Comment 3

3 years ago
I mean that we should:
 1) Delete transliterate.properties.
 2) Make nsEntityConverter not use it via htmlEntityVersions.properties and the associated loading mechanism.

https://mxr.mozilla.org/mozilla-central/source/toolkit/components/viewsource/content/viewPartialSource.js#20 might be a complication.
Flags: needinfo?(hsivonen)

Comment 4

3 years ago
Created attachment 8477908 [details] [diff] [review]
1003731.diff

This patch removes transliterate.properties from moz.build and htmlEntityVersions.properties. Builds successfully. What tests can be done to ensure it's not causing any problems?
(In reply to Henri Sivonen (:hsivonen) from comment #3)
> https://mxr.mozilla.org/mozilla-central/source/toolkit/components/viewsource/
> content/viewPartialSource.js#20 might be a complication.

Wow, that looks like an awful hack! What happens if the file being viewed happens to contain a string of ZWSP characters?

(Answer: it breaks. The highlighting of the selection in the source view is cut short.)
FTR, I just filed bug 1057841 about that; the proposed patch completely removes the reference to transliterate.properties from viewPartialSource.js. (Though even now it doesn't actually depend on it. So no need to worry about it here.)

Comment 7

3 years ago
hello, is somebody working on this bug because i would like to work on this bug, this would be my first bug so i might need some extra guidance debugging it. thanks!
(In reply to diwas joshi from comment #7)
> hello, is somebody working on this bug because i would like to work on this
> bug, this would be my first bug so i might need some extra guidance
> debugging it. thanks!

It looks like someone has already made a start - see comment 4.

(In reply to Anuj Pahuja from comment #4)
> Created attachment 8477908 [details] [diff] [review]
> 1003731.diff
> 
> This patch removes transliterate.properties from moz.build and
> htmlEntityVersions.properties. Builds successfully. What tests can be done
> to ensure it's not causing any problems?

I've pushed your patch to tryserver to see if it affects any existing unit tests (I don't know offhand whether we have any tests that will be affected by this). Results should appear in a couple of hours or so at https://tbpl.mozilla.org/?tree=Try&rev=d2ea12b343fb. Henri, any other thoughts about testing?

In any case, it looks like your patch didn't actually delete (hg rm ...) the transliterate.properties file from the tree, it only removed the references to it. We should also remove the file itself, if it's truly obsolete.
Comment on attachment 8477908 [details] [diff] [review]
1003731.diff

Review of attachment 8477908 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/unicharutil/tables/htmlEntityVersions.properties
@@ +9,5 @@
>  #  Do not translate anything in this file
>  
>  # list supported versions number/name pair
>  # length should not be greater than 32
>  length=5

It looks to me like "length" here is the number of items in the list; in which case it should presumably be changed to 4 when you delete the "transliterate" item.

Comment 10

3 years ago
Created attachment 8485455 [details] [diff] [review]
patch.v1

Taking reference from the existing patch & your feedback, I've created my first patch. Please review & let me know if it needs any changes.
Attachment #8485455 - Flags: review?(hsivonen)
Comment on attachment 8477908 [details] [diff] [review]
1003731.diff

Review of attachment 8477908 [details] [diff] [review]:
-----------------------------------------------------------------

Fixed via attachment : 8485455
Attachment #8477908 - Flags: review-
(Reporter)

Comment 12

3 years ago
Comment on attachment 8485455 [details] [diff] [review]
patch.v1

This patch is a bit incomplete.

You also need to remove intl/uconv/tests/unit/test_bug365345.js . 

>-length=5
>+length=4
> 1=html40Latin1
> 2=html40Symbols
> 3=html40Special
>-4=transliterate
>-5=mathml20
>+4=mathml20

Unfortunately, it seems that these numbers are a leaky abstraction that's visible to existing callers of the nsIEntityConverter API: The bit masks exposed to callers are built by shifting 1 to the left by the number in the key minus 1, so the patch as written changes the mask bit previously reserved to transliteration to mean mathml entities and unmaps the previousl bit for mathml.

I suggest removing htmlEntityVersions.properties and then changing nsEntityConverter.cpp so that you hard-code a mapping from the bitmasks to the remaining property files as follows:

1: html40Latin1.properties
2: html40Symbols.properties
4: html40Special.properties
16: mathml20.properties

And then return an error for 8, which was the mask bit for transliterate.properties.

Also, in nsIEntityConverter.idl, please document that the transliterate is obsolete.
Attachment #8485455 - Flags: review?(hsivonen) → review-

Comment 13

3 years ago
Hi, Sudheesh - Are you still working on this bug?
(Assignee)

Comment 14

3 years ago
Hi, I'd be interested in working on this bug but I have a question/proposal to make. Instead of removing htmlEntityVersions.properties and rewriting most of the nsEntityConverter.cpp, wouldn't it be better to just return nullptr when the versionNumber parameter is 8? This way almost all of the code could stay the way it is and we could continue to use the htmlEntityVersions.properties file. I have a patch ready, so if you are fine with it I'll attach it.

If we were to delete htmlEntityVersions.properties we would make nsEntityVersionList and probably a few of the methods obsolete, however nsEntityVersionList is used quite a lot and like I said above would basically require to rewrite most of the code and I'm not sure which methods are save to remove as none of them are private.
Flags: needinfo?(hsivonen)
(Reporter)

Comment 15

3 years ago
(In reply to Andy Pusch from comment #14)
> Hi, I'd be interested in working on this bug but I have a question/proposal
> to make. Instead of removing htmlEntityVersions.properties and rewriting
> most of the nsEntityConverter.cpp, wouldn't it be better to just return
> nullptr when the versionNumber parameter is 8? This way almost all of the
> code could stay the way it is and we could continue to use the
> htmlEntityVersions.properties file.

The having this data in a property file is a bad design to begin with. This is a good opportunity to remove htmlEntityVersions.properties, so I think we should remove it here.

> If we were to delete htmlEntityVersions.properties we would make
> nsEntityVersionList and probably a few of the methods obsolete, however
> nsEntityVersionList is used quite a lot and like I said above would
> basically require to rewrite most of the code and I'm not sure which methods
> are save to remove as none of them are private.

It's safe to remove anything that doesn't implement a method inherited from nsIEntityConverter.
Flags: needinfo?(hsivonen)
(Assignee)

Comment 16

3 years ago
Created attachment 8562926 [details] [diff] [review]
bug1003731_transliterationRemoval.diff

The file htmlEntityVersions.properties is now removed and a hard coded mapping for the versionNumbers is used. I've slimmed nsEntityConverter.cpp down a little and I've also fixed the tabs, they were inconsistent in nsEntityConverter.h and most of the time used 2 spaces instead of 4 like the first line suggested. 
It built and ran successful. Please tell me if there is anything specific you'd like me to test.
Attachment #8562926 - Flags: review?(hsivonen)
(Reporter)

Comment 17

3 years ago
Comment on attachment 8562926 [details] [diff] [review]
bug1003731_transliterationRemoval.diff

Review of attachment 8562926 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, looks very good, but marking r- over a performance concern arising from the removal of the string bundle cache.

::: intl/unicharutil/nsEntityConverter.cpp
@@ +22,2 @@
>  {
> +    nsAutoCString url(NS_LITERAL_CSTRING("resource://gre/res/entityTables/"));

Please drop the NS_LITERAL_CSTRING wrapped from around the literal.

@@ +25,5 @@
> +
> +    nsCOMPtr<nsIStringBundleService> bundleService =
> +        do_GetService(NS_STRINGBUNDLE_CONTRACTID, &rv);
> +    NS_ENSURE_SUCCESS(rv, nullptr);
> +  

Stray whitespace at the beginning of the line.

@@ +26,5 @@
> +    nsCOMPtr<nsIStringBundleService> bundleService =
> +        do_GetService(NS_STRINGBUNDLE_CONTRACTID, &rv);
> +    NS_ENSURE_SUCCESS(rv, nullptr);
> +  
> +    const char *fileName = GetFileName(versionNumber); 

Stray whitespace at the end of the line.

@@ +32,5 @@
> +
> +    url.Append(fileName);
> +
> +    nsCOMPtr<nsIStringBundle> bundle;
> +    rv = bundleService->CreateBundle(url.get(), getter_AddRefs(bundle));

By removing nsEntityVersionList altogether, it looks like this code in now no longer caching the bundles in RAM unconditionally but instead relying on the LRU cache inside nsStringBundleService. Maybe that's a good thing for memory usage, though.

For performance of *this* class (which can be indirectly invoked by Web content thanks to the behavior of non-UTF-8, non-GB18030 form submission), going through the cache in nsStringBundleService in inner loops seems like a bad idea and this method should maintain a cache of the nsIStringBundles it has already loaded and return plain nsIStringBundle*, since the cache would own the objects anyway.

r- due to this caching concern.

@@ +34,5 @@
> +
> +    nsCOMPtr<nsIStringBundle> bundle;
> +    rv = bundleService->CreateBundle(url.get(), getter_AddRefs(bundle));
> +    NS_ENSURE_SUCCESS(rv, nullptr);
> +  

Stray whitespace.

@@ +44,3 @@
>  {
> +    switch(versionNumber){
> +    case(1):

Please indent 'case' relative to switch. (And 'return' relative to 'case'.)

Please use 
case 1:
instead of
case(1):
for consistency with the rest of the codebase.

@@ +44,5 @@
>  {
> +    switch(versionNumber){
> +    case(1):
> +        return kHTML40LATIN1;
> +        break;

No need for 'break' after 'return'. (Below, too.)

@@ +70,5 @@
>  // nsIEntityConverter
>  //
>  NS_IMETHODIMP
>  nsEntityConverter::ConvertToEntity(char16_t character, uint32_t entityVersion, char **_retval)
>  { 

Stray whitespace.

@@ +79,5 @@
>  nsEntityConverter::ConvertUTF32ToEntity(uint32_t character, uint32_t entityVersion, char **_retval)
>  {
> +    NS_ASSERTION(_retval, "null ptr- _retval");
> +    if(nullptr == _retval)
> +        return NS_ERROR_NULL_POINTER;

Since you are causing these lines to appear changed anyway, please take the opportunity to add braces for 'if' even when it's followed by a single statement.

@@ +81,5 @@
> +    NS_ASSERTION(_retval, "null ptr- _retval");
> +    if(nullptr == _retval)
> +        return NS_ERROR_NULL_POINTER;
> +    *_retval = nullptr;
> +    

Stray whitespace.

@@ +83,5 @@
> +        return NS_ERROR_NULL_POINTER;
> +    *_retval = nullptr;
> +    
> +    for (uint32_t mask = 1, mask2 = 0xFFFFFFFFL; (0!=(entityVersion & mask2)); mask<<=1, mask2<<=1) {
> +        if (0 == (entityVersion & mask)) 

Stray whitespace. (And please add braces.)

@@ +85,5 @@
> +    
> +    for (uint32_t mask = 1, mask2 = 0xFFFFFFFFL; (0!=(entityVersion & mask2)); mask<<=1, mask2<<=1) {
> +        if (0 == (entityVersion & mask)) 
> +            continue;
> +      

Stray whitespace.

@@ +86,5 @@
> +    for (uint32_t mask = 1, mask2 = 0xFFFFFFFFL; (0!=(entityVersion & mask2)); mask<<=1, mask2<<=1) {
> +        if (0 == (entityVersion & mask)) 
> +            continue;
> +      
> +        nsIStringBundle* entities = ((nsCOMPtr<nsIStringBundle>)GetVersionBundleInstance(entityVersion & mask)).get();

This weird cast should go away once GetVersionBundleInstance returns a plain pointer thanks to there being a cache that owns the objects.

@@ +92,2 @@
>  
> +        if (!entities) 

Stray whitespace. (And please add braces.)

@@ +95,2 @@
>  
> +        nsAutoString key(NS_LITERAL_STRING("entity."));

Please drop the NS_LITERAL_CSTRING wrapped from around the literal.

@@ +99,5 @@
> +        nsXPIDLString value;
> +        nsresult rv = entities->GetStringFromName(key.get(), getter_Copies(value));
> +        if (NS_SUCCEEDED(rv)) {
> +            *_retval = ToNewCString(value);
> +            if(nullptr == *_retval)

ToNewCString is infallible, so there's no need to check the return value for null anymore.

@@ +121,5 @@
>  
> +    // per character look for the entity
> +    uint32_t len = NS_strlen(inString);
> +    for (uint32_t i = 0; i < len; i++) {
> +        nsAutoString key(NS_LITERAL_STRING("entity."));

Please drop the NS_LITERAL_CSTRING wrapped from around the literal.

@@ +126,5 @@
> +        if (NS_IS_HIGH_SURROGATE(inString[i]) && i + 2 < len && NS_IS_LOW_SURROGATE(inString[i + 1])) {
> +            key.AppendInt(SURROGATE_TO_UCS4(inString[i], inString[i+1]), 10);
> +            ++i;
> +        }
> +        else {

Please move 'else' to the previous line now that you are changing indent and line wrap anyway.

@@ +129,5 @@
> +        }
> +        else {
> +            key.AppendInt(inString[i],10);
> +        }
> +    

Stray whitespace.

@@ +134,5 @@
> +        nsXPIDLString value;
> +        const char16_t *entity = nullptr;
> +
> +        for (uint32_t mask = 1, mask2 = 0xFFFFFFFFL; (0!=(entityVersion & mask2)); mask<<=1, mask2<<=1) {
> +            if (0 == (entityVersion & mask)) 

Stray whitespace and please add braces.

@@ +139,5 @@
> +                continue;
> +            nsIStringBundle* entities = ((nsCOMPtr<nsIStringBundle>)GetVersionBundleInstance(entityVersion & mask)).get();
> +            NS_ASSERTION(entities, "Cannot get the property file");
> +
> +            if (!entities) 

Stray whitespace and please add braces.

@@ +159,4 @@
>  
> +    *_retval = ToNewUnicode(outString);
> +    if (!*_retval) 
> +        return NS_ERROR_OUT_OF_MEMORY;

No need to null-check what ToNewUnicode returns anymore thanks to the infallible allocator.
Attachment #8562926 - Flags: review?(hsivonen) → review-
(Assignee)

Comment 18

3 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #17)
> @@ +95,2 @@
> >  
> > +        nsAutoString key(NS_LITERAL_STRING("entity."));
> 
> Please drop the NS_LITERAL_CSTRING wrapped from around the literal.
> @@ +121,5 @@
> >  
> > +    // per character look for the entity
> > +    uint32_t len = NS_strlen(inString);
> > +    for (uint32_t i = 0; i < len; i++) {
> > +        nsAutoString key(NS_LITERAL_STRING("entity."));
> 
> Please drop the NS_LITERAL_CSTRING wrapped from around the literal.

I'm not sure if I understood you correctly but when I remove NS_LITERAL_STRING the build process fails because of invalid conversion from ‘const char*’ to ‘nsAString_internal::char_type {aka char16_t}’: http://pastebin.mozilla.org/8725141
Flags: needinfo?(hsivonen)
(Reporter)

Comment 19

3 years ago
(In reply to Andy Pusch from comment #18)
> (In reply to Henri Sivonen (:hsivonen) from comment #17)
> > @@ +95,2 @@
> > >  
> > > +        nsAutoString key(NS_LITERAL_STRING("entity."));
> > 
> > Please drop the NS_LITERAL_CSTRING wrapped from around the literal.
> > @@ +121,5 @@
> > >  
> > > +    // per character look for the entity
> > > +    uint32_t len = NS_strlen(inString);
> > > +    for (uint32_t i = 0; i < len; i++) {
> > > +        nsAutoString key(NS_LITERAL_STRING("entity."));
> > 
> > Please drop the NS_LITERAL_CSTRING wrapped from around the literal.
> 
> I'm not sure if I understood you correctly but when I remove
> NS_LITERAL_STRING the build process fails because of invalid conversion from
> ‘const char*’ to ‘nsAString_internal::char_type {aka char16_t}’:
> http://pastebin.mozilla.org/8725141

Oh, nevermind then, sorry. I just kept copying and pasting my comment for the nsAutoCString case and didn't notice that this one was an nsAutoString instead.
Flags: needinfo?(hsivonen)
Perhaps MOZ_UTF16 will work.
(Assignee)

Comment 21

3 years ago
Created attachment 8564525 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v2

I've tried to make use of the fact that the available property files are hard-coded by creating a fixed size array for caching. I've moved the cache check into the switch case which now returns the nsIStringBundle*.
Attachment #8562926 - Attachment is obsolete: true
Attachment #8564525 - Flags: review?(hsivonen)
(Reporter)

Comment 22

3 years ago
Comment on attachment 8564525 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v2

Review of attachment 8564525 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you. Looks very good. r+ provided that the comments below are addressed.

::: intl/unicharutil/nsEntityConverter.cpp
@@ +27,5 @@
> +    	        NS_ASSERTION(entitiesArray[kHTML40LATIN1_INDEX], "LoadEntityBundle failed");
> +    	    }
> +            return entitiesArray[kHTML40LATIN1_INDEX].get();
> +    	case 2:
> +        	if (!entitiesArray[kHTML40SYMBOLS_INDEX]) {

A tab among the spaces causes mis-indent.

@@ +33,5 @@
> +    	        NS_ASSERTION(entitiesArray[kHTML40SYMBOLS_INDEX], "LoadEntityBundle failed");
> +    	    }
> +            return entitiesArray[kHTML40SYMBOLS_INDEX].get();
> +    	case 4:
> +        	if (!entitiesArray[kHTML40SPECIAL_INDEX]) {

A tab among the spaces causes mis-indent.

@@ +39,5 @@
> +    	        NS_ASSERTION(entitiesArray[kHTML40SPECIAL_INDEX], "LoadEntityBundle failed");
> +    	    }
> +            return entitiesArray[kHTML40SPECIAL_INDEX].get();
> +    	case 16:
> +        	if (!entitiesArray[kMATHML20_INDEX]) {

A tab among the spaces causes mis-indent.

@@ +45,5 @@
> +    	        NS_ASSERTION(entitiesArray[kMATHML20_INDEX], "LoadEntityBundle failed");
> +    	    }
> +            return entitiesArray[kMATHML20_INDEX].get();
> +    	default:
> +        	return nullptr;

A tab among the spaces causes mis-indent.

@@ +53,5 @@
> +already_AddRefed<nsIStringBundle>
> +nsEntityConverter:: LoadEntityBundle(const char *fileName)
> +{
> +    NS_ENSURE_TRUE(fileName, nullptr);
> +    

Stray whitespace.

::: intl/unicharutil/nsEntityConverter.h
@@ +41,2 @@
>  
> +    nsCOMPtr<nsIStringBundle> entitiesArray[4];

For reference counting to work correctly, please change this to:
nsAutoTArray<nsCOMPtr<nsIStringBundle>, 4> entitiesArray;

@@ +46,5 @@
> +    const char* kMATHML20 = "mathml20.properties";
> +    const int kHTML40LATIN1_INDEX = 0;
> +    const int kHTML40SYMBOLS_INDEX = 1;
> +    const int kHTML40SPECIAL_INDEX = 2;
> +    const int kMATHML20_INDEX = 3;

Please make the above four types const size_t instead of const int. (It won't matter in practice in this particular case, but the general rule is to never use 'int' and to always use size_t, uint32_t, etc. as appropriate.)
Attachment #8564525 - Flags: review?(hsivonen) → review+
(Assignee)

Comment 23

3 years ago
Created attachment 8564719 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v3

I've made the requested adjustments. Sorry for the tabs and whitespaces, I've added a few plugins to gedit now so that won't happen again.
Attachment #8564525 - Attachment is obsolete: true
Attachment #8564719 - Flags: review?(hsivonen)
(Reporter)

Comment 24

3 years ago
Comment on attachment 8564719 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v3

Review of attachment 8564719 [details] [diff] [review]:
-----------------------------------------------------------------

Great. Thanks for fixing this.

Has this patch been through the tryserver?
Attachment #8564719 - Flags: review?(hsivonen) → review+
(Reporter)

Updated

3 years ago
Assignee: nobody → drag
(Assignee)

Comment 25

3 years ago
Not yet, I was waiting for the final r+. Here is the link:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2982e5a58f8b

I've used the settings from comment #8, I hope that's alright.
(Assignee)

Comment 26

3 years ago
Looks good to me, I don't think the failed test is related. What do you think?
Flags: needinfo?(hsivonen)
(Reporter)

Comment 27

3 years ago
(In reply to Andy Pusch [:AndyP] from comment #26)
> Looks good to me, I don't think the failed test is related. What do you
> think?

The failures do look related to me. I think the problem is that 
nsAutoTArray<nsCOMPtr<nsIStringBundle>, 4> entitiesArray;
only ensures that the array can hold 4 items without further heap allocations but doesn't set the logical length to 4.

The problem should go away by adding the call
entitiesArray->SetLength(4);
to the constructor of nsEntityConverter.
Flags: needinfo?(hsivonen)
Comment on attachment 8564719 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v3

Review of attachment 8564719 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/unicharutil/nsEntityConverter.h
@@ +42,2 @@
>  
> +    nsAutoTArray<nsCOMPtr<nsIStringBundle>, 4> entitiesArray;

A drive-by comment: it looks like we're dealing with a fixed set of 4 string bundles here, in which case I don't see any benefit in wrapping these nsCOMPtrs in an array at all.

Why not simply have four individual nsCOMPtr members here, and address them directly by name in the GetVersionBundleInstance switch? That'd result in less overhead and more clarity, IMO.
(Reporter)

Comment 29

3 years ago
(In reply to Jonathan Kew (:jfkthame) from comment #28)
> Why not simply have four individual nsCOMPtr members here, and address them
> directly by name in the GetVersionBundleInstance switch? That'd result in
> less overhead and more clarity, IMO.

Good point. That's a better fix than the one I suggested. AndyP, I suggest doing what Jonathan says.

(Still, https://treeherder.mozilla.org/#/jobs?repo=try&revision=2459ed39d528 tests my hypothesis for the cause of failure.)
(Assignee)

Comment 30

3 years ago
Created attachment 8567510 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v4

I've made the changes as suggested by Jonathan.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=feb83aab19c3

The test test_bug_427350_1.js fails because the transliteration is gone. Is that okay for now or should I do something about it?
Attachment #8564719 - Attachment is obsolete: true
Attachment #8567510 - Flags: review?(hsivonen)
(Reporter)

Comment 31

3 years ago
Comment on attachment 8567510 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v4

Review of attachment 8567510 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Andy Pusch [:AndyP] from comment #30)
> Created attachment 8567510 [details] [diff] [review]
> bug1003731_transliterationRemoval.diff -v4
> 
> I've made the changes as suggested by Jonathan.
> 
> try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=feb83aab19c3
> 
> The test test_bug_427350_1.js fails because the transliteration is gone. Is
> that okay for now or should I do something about it?

Please remove the test.

r+ provided that the above-mentioned test is removed and the below-mentioned nits are fixed. Thank you!

::: intl/unicharutil/nsEntityConverter.cpp
@@ +23,5 @@
> +    switch(versionNumber){
> +        case 1:
> +            if (!html40latin1_bundle) {
> +                html40latin1_bundle = LoadEntityBundle(kHTML40LATIN1);
> +                NS_ASSERTION(html40latin1_bundle, "LoadEntityBundle failed");

Please use MOZ_ASSERT instead of NS_ASSERTION to make sure that people notice if the loading mechanism starts failing in debug builds. (Same thing for the other three asserts below.

@@ +25,5 @@
> +            if (!html40latin1_bundle) {
> +                html40latin1_bundle = LoadEntityBundle(kHTML40LATIN1);
> +                NS_ASSERTION(html40latin1_bundle, "LoadEntityBundle failed");
> +            }
> +            return html40latin1_bundle.get();

Please omit ".get()", since it is unnecessary. (Same thing thrice below.)

@@ +88,5 @@
>  NS_IMETHODIMP
>  nsEntityConverter::ConvertUTF32ToEntity(uint32_t character, uint32_t entityVersion, char **_retval)
>  {
> +    NS_ASSERTION(_retval, "null ptr- _retval");
> +    if(nullptr == _retval) {

Now that indent is being fixed, please add a space after "if", too.

::: intl/unicharutil/nsEntityConverter.h
@@ +45,5 @@
> +    const char* kMATHML20 = "mathml20.properties";
> +    nsCOMPtr<nsIStringBundle> html40latin1_bundle;
> +    nsCOMPtr<nsIStringBundle> html40symbols_bundle;
> +    nsCOMPtr<nsIStringBundle> html40special_bundle;
> +    nsCOMPtr<nsIStringBundle> mathml20_bundle;

Please name these mHTML40Latin1Bundle, mHTML0SymbolsBundle, mHTML40SpecialBundle and mMathML20Bundle for consistency with the dominant Gecko naming style.
Attachment #8567510 - Flags: review?(hsivonen) → review+
(Reporter)

Comment 32

3 years ago
Comment on attachment 8567510 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v4

(In reply to Henri Sivonen (:hsivonen) from comment #31)
> > +    switch(versionNumber){
> > +        case 1:

Oh, and please use nsIEntityConverter::html40Latin1, etc., instead plain numbers.

Also, let's get a review from emk, too.
Attachment #8567510 - Flags: review?(VYV03354)
nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText still uses nsIEntityConverter::transliterate.
Depends on: 943296
Comment on attachment 8567510 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v4

Review of attachment 8567510 [details] [diff] [review]:
-----------------------------------------------------------------

I would like to see the fixed patch. r- for now.

::: intl/unicharutil/nsEntityConverter.cpp
@@ +19,5 @@
>  
>  nsIStringBundle*
>  nsEntityConverter:: GetVersionBundleInstance(uint32_t versionNumber)
>  {
> +    switch(versionNumber){

We are changing the entire source tree to 2 spaces indent, not 4.
Attachment #8567510 - Flags: review?(VYV03354) → review-
(Assignee)

Comment 35

3 years ago
Created attachment 8572268 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v5

I've addressed comment #31, comment #32 and comment #34.

I'm not sure what to do about comment #33 though. Does "nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText" have to be rewritten or should we get rid of it all together once bug #943296 lands?
Attachment #8567510 - Attachment is obsolete: true
Attachment #8572268 - Flags: feedback?(hsivonen)
Attachment #8572268 - Flags: feedback?(VYV03354)
Comment on attachment 8572268 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v5

Review of attachment 8572268 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM with comments addressed.

REgarding comment #33, we will have to fix bug 943296 first before landing this.

::: intl/unicharutil/nsEntityConverter.h
@@ +41,2 @@
>  
> +  const char* kHTML40LATIN1 = "html40Latin1.properties";

I'd prefer |const char kHTML40LATIN1[]| (same for the following three members). kHTML40LATIN1 can change to point another string although the string itself is const.

::: intl/unicharutil/tests/unit/xpcshell.ini
@@ +1,1 @@
>  [DEFAULT]

You should |hg rm| this file itself (and the directory).
Attachment #8572268 - Flags: feedback?(VYV03354) → feedback+
(Reporter)

Comment 37

3 years ago
Comment on attachment 8572268 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v5

(In reply to Andy Pusch [:AndyP] from comment #35)
> Created attachment 8572268 [details] [diff] [review]
> bug1003731_transliterationRemoval.diff -v5
> 
> I've addressed comment #31, comment #32 and comment #34.

Thank you.

> I'm not sure what to do about comment #33 though. Does
> "nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText" have to be rewritten
> or should we get rid of it all together once bug #943296 lands?

In principle, it would be nice to fix bug 943296 first, but I don't want this patch to rot or to otherwise block this on bug 943296. I suggest just removing the transliteration flag from nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText() to get this landed, since there seems to be no real functional harm in doing so when we don't truly support non-UTF-8 Gtk platforms anyway anymore. But whether that's OK is really emk's call.
Attachment #8572268 - Flags: feedback?(hsivonen) → feedback+
(Assignee)

Comment 38

3 years ago
(In reply to Masatoshi Kimura [:emk] from comment #36)
> ::: intl/unicharutil/nsEntityConverter.h
> @@ +41,2 @@
> >  
> > +  const char* kHTML40LATIN1 = "html40Latin1.properties";
> 
> I'd prefer |const char kHTML40LATIN1[]| (same for the following three
> members). 

How would that work? I get "error: array used as initializer" when I try to build it.

(In reply to Henri Sivonen (:hsivonen) from comment #37)
> In principle, it would be nice to fix bug 943296 first, but I don't want
> this patch to rot or to otherwise block this on bug 943296. I suggest just
> removing the transliteration flag from
> nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText() to get this landed,
> since there seems to be no real functional harm in doing so when we don't
> truly support non-UTF-8 Gtk platforms anyway anymore. But whether that's OK
> is really emk's call.

What flag do you mean? I'm not quite sure what exactly to remove in that method.
Generally I could try working on bug #943296 but I'm not exactly sure how much work that'd involve and I'd definitely need a mentor for that.
Flags: needinfo?(hsivonen)
Flags: needinfo?(VYV03354)
Ishikawa-san, are you still using EUC-JP locale? If so, is the change of bug 943296 causes any issues?
Flags: needinfo?(VYV03354) → needinfo?(ishikawa)

Comment 40

3 years ago
(In reply to Masatoshi Kimura [:emk] from comment #39)
> Ishikawa-san, are you still using EUC-JP locale? If so, is the change of bug
> 943296 causes any issues?

I will look into this over the weekend. Oh wait, the particular machine that uses EUC-JP is at the office :-(
So please wait until next Monday (Japan  Standard Time) to confirm what happens on the machine.

This change will affect both TB and FF, correct.
But, to be honest, I am not exactly sure when and how the potential problems appear.
Anyway I will try to see what happens (by looking at local file systems, etc.)

CI
Flags: needinfo?(ishikawa)

Comment 41

3 years ago
I didn't mean to clear the needinfo flag to myself since I am not answering the question for real: setting needinfo flag to myself again.
Flags: needinfo?(ishikawa)
(Reporter)

Comment 42

3 years ago
Comment on attachment 8572268 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v5

(In reply to Andy Pusch [:AndyP] from comment #35)
> Created attachment 8572268 [details] [diff] [review]
> bug1003731_transliterationRemoval.diff -v5
> 
> I've addressed comment #31, comment #32 and comment #34.

Thank you.

> I'm not sure what to do about comment #33 though. Does
> "nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText" have to be rewritten
> or should we get rid of it all together once bug #943296 lands?

To get this bug fixed ahead of bug 943296 (if emk is OK with fixing this before that one), I meant changing
  rv = converter->Init(platformCharset.get(),
                  nsISaveAsCharset::attr_EntityAfterCharsetConv +
                  nsISaveAsCharset::attr_FallbackQuestionMark,
                  nsIEntityConverter::transliterate);
to
  rv = converter->Init(platformCharset.get(),
                  nsISaveAsCharset::attr_EntityAfterCharsetConv +
                  nsISaveAsCharset::attr_FallbackQuestionMark,
                  0);
in nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText.

To fix bug 943296, the first step would be replacing calls to nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText with UTF-16 to UTF-8 conversion and nsPrimitiveHelpers::ConvertPlatformPlainTextToUnicode with UTF-8 to UTF-16 conversion in widget/gtk/nsDragService.cpp. This should be the right thing to do because a) we haven't truly supported configurations that set the *nix locale encoding to something other than UTF-8 for many years, so non-UTF-8 configurations are likely broken in various ways already, and b) other than file system paths, the Gtk API is documented to use UTF-8 regardless of the system locale encoding, so passing data in a non-UTF-8 system locale encoding to the Gtk API seems bogus to begin with.

The next step is trickier: Figuring out what code becomes redundant or dead code in widget/gtk/nsDragService.cpp once UTF-8 is always assumed and cleaning up the code accordingly.
Flags: needinfo?(hsivonen)

Comment 43

2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #39)
> Ishikawa-san, are you still using EUC-JP locale? If so, is the change of bug
> 943296 causes any issues?

Testing with TB has to wait.  I can't test TB with real setup for fear
of messing up live data/folders.

I think this is what I need to do:

I need to create a test account and see what happens if I send e-mails
in which the main text body is encoded in
 - ISO-2022-JP encoding
 - SJIS encoding (for testing purposes only)
 - UTF-8 encoding.
 - any other to test?

Maybe I should also add file attachment, the path name of which is encoded
in
 - EUC (for testing purposes only),
 - SJIS (for testing purposes only),
 - UTF-8,
respectively(?).

Oh, I should definitely attach a PLAIN TEXT file of which content is
encoded in
 - EUC
 - SJIS
 - UTF-8

with the different combination of encodings of the main body of text:
I think I need to do this because I notice a strange behavior of
display of the attached content previously when I *FORWARD* such
e-mails and receive them in thunderbird. Sometimes
the in-line display of such messages got garbled.

Does anyone have additional testing ideas?


Below I found browsing a directory with Japanese path names in SJIS,
and UTF-8 encodings did not produce problems.

Testing: Firefox Browsing of a file system where Japanese path names are used

I tested Firefox nightly (15 March. US-version) against local
directory where Japanese file pathnames have been used over the years.
Filenames encoded in S-JIS, EUC and UTF-8 have been used under the
whole directory.

However, a couple of months ago, I bit the bullet and changed the
EUC-encoded file path names in this particular partition
into UTF-8 to move into the modern era. 
[Today, I just realized I missed pathnames encoded in Shift-JIS
still remain in Shift-JIS and these became good test targets.]
(However, the content of the files still remain in EUC, SJIS, etc.
well.)

I probably have more Japanese file names in EUC and Shift-JIS 
inside many zip archives in the directory, but I dare not to touch them at the
moment :-(

So, basically, problematic encoding of file names that remain in this
whole file system are S-JIS, and really broken encoding that got
corrupted in the last 10 years or so when the programs and underlying
OS messed with proper encodings and left unreadable pathnames [in any
reasonable encoding :-( ]

Here is what I found out with the nightly FF (15 March, US version)
when I access this local file directory.

Nothing really stood out except that automatic recognition of file
name encoding when there is a Shift-JIS-encoded path name or two under a
directory does not seem to work. If I set the encoding to Shift-JIS
manually, the listing becomes OK.
So basically, nothing stood out as broken except for this.

Details:

File name encoded in Shift-JIS is shown garbled
in US version of nightly TB (15 March). Understandable.
But once the encoding is set to Shift-JIS, it is readable.

Note auto detect of file name encoding did not seem
to work.  For directory listing in
file://localhost/...directorypath-in-shift-jis...  I need to explicitly set
Shift-JIS.  This is not a big deal. I can live with it.

BTW, firefox could move into a directory with broken encoded name (4
octet, and not in recognizable encodings of EUC, Shift-JIS, UTF-8,
etc.) and I could list the names of the files below it.

In comparison, Emacs dir mode could not even move into the said
directory (of which path is shown as strange name
mixed with octet numbers).  So in this sense, FF fares better.

Content display is OK, of course:

On the other hand, the recognition of content encoding of text files
work. I could visit a Japanese text file (Shift-JIS, EUC, etc.) and
get proper display (I suppose that FF needs a longish text for correct
code encoding recognition. A single Japanese encoding of a pathname in a directory listing may not
be the successful subject of automatic recognition.)

Really broken subdirectories in my tree:

Symptom: Under some directories created more than 5 years ago, I have
directory names in Shift-JIS and file names in UTF-8 under the SAME
directory. [I think Windows XP created these Shift-JIS directory names
on shared mount, and UTF-8 file names are created by an installation of
linux and subsequently Windows 7 which share the remote mount. But to
be honest, I have no idea. Maybe something else created Shift-JIS
directory names. Or maybe these were created during expanding a zip
archive in which old Macs used SJIS encoding for pathnames, etc. ]

We can read these pathnames correctly by setting the encoding
respectively one by one (while the other file names in the other
encodings are garbled.)  Again, I can live with it. There is nothing a
sane program can do in this situation.

Again, auto detection of character encoding of path name does not
work.
The following observation is not a complaint.
For example, when there are 4 UTF-8 names and one SJIS encoded directory name
under one directory, FF does not prefer one over the other and sticks
to Western encoding and show these all 5 names as garbled.  (Emacs dir
mode uses UTF-8 and so only showed the S-JIS file name as garbled.)
Again, the above is not a complaint.
I might prefer the use of Shift-JIS encoding when there are majority
of pathnames encoded in Shift-JIS, but garbling them all is fine.

Paths in really broken encoding is shown garbled both in FF and emacs
dir mode.  Again, I can live with the current behavior.  I can see
that the encoding is broken (luckily usually broken encoding shows up
as really unreadable character sequence, not a totally different
*readable* character sequence, so we can avoid misunderstanding.)

So as far as navigating directories full of Japanese path names of
different encoding using FF nightly is OK.
(I have no idea if this would be true for Chinese and Korean users
with Chinese and Korean encodings, but I suspect the latest change
only affect Japanese users?)

This leaves the testing using TB by (sending and) receiving e-mails
encoded in Shift-JIS, ISO-2022-JP, UTF-8, etc.

TIA
Flags: needinfo?(ishikawa)
Sorry, I should have explained what should be tested. Also I should have created a test build (because this patch is not landed yet.)
Fixing bug 943296 may break copy&paste and drag&drop on Gtk. That's what I want.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d96cdcf7e580

Comment 46

2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #44)
> Sorry, I should have explained what should be tested. Also I should have
> created a test build (because this patch is not landed yet.)
> Fixing bug 943296 may break copy&paste and drag&drop on Gtk. That's what I
> want.

I see, I misunderstood the original question completely.
I will wait for the proper version to be ready and will test copy&paste and drag&drop.

TIA
Here is the test build:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/VYV03354@nifty.ne.jp-d96cdcf7e580/try-linux/ (x86)
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/VYV03354@nifty.ne.jp-d96cdcf7e580/try-linux64/ (x86-64)
I build both x86 and x86-64 because I don't know the architecture of your machine.
Please test copy&paste and drag&drop with these builds and compare the result with normal Nightly. If you found no problems, I'll proceed this bug.
Flags: needinfo?(ishikawa)

Comment 48

2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #47)
> Here is the test build:
> https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/VYV03354@nifty.ne.
> jp-d96cdcf7e580/try-linux/ (x86)
> https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/VYV03354@nifty.ne.
> jp-d96cdcf7e580/try-linux64/ (x86-64)
> I build both x86 and x86-64 because I don't know the architecture of your
> machine.
> Please test copy&paste and drag&drop with these builds and compare the
> result with normal Nightly. If you found no problems, I'll proceed this bug.

Thank you.

Mine is a x86-64. I will check it out between the weekend and Monday.
(You may know that this is the end of fiscal year in many organizations in Japan, and
my office is also very busy with the end-of-the-year activities, and I can't even spend a longish amount for testing easily.)
Flags: needinfo?(ishikawa)
I'm fine when you have the time. I just put ni to prevent this from getting lost.
(Reporter)

Comment 50

2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #44)
> Sorry, I should have explained what should be tested.

So the patch in the try build turns off transliteration for the operation where the user drags some non-ASCII text out of Gtk Firefox into some other app that accepts text drag&drop on Linux when the system locale is a non-UTF-8 locale.

What kind of findings would have to be reported for you to conclude that we shouldn't go ahead and remove transliteration in that case? In November 2013, smontagu said "There are probably vestigial remnants of support for other platform charsets still around, but I can testify that any code that I contributed in the last 5 years or more only supports UTF-8 on *nix." (https://groups.google.com/forum/#!msg/mozilla.dev.platform/gJBtRRYF26A/balSpkY0YaQJ)

So what we are worrying about here is about a configuration that hasn't really been a supported configuration for years. There are two alternatives:

 1) The Gtk drag&drop code in the non-UTF-8 locale case is fundamentally broken regardless of translation, because the code would end up converting to EUC-JP (in this case) even though Gtk (according to documentation) wants UTF-8 despite the non-UTF-8 system locale.

 2) The Gtk documentation is wrong and Gecko does the right thing by converting to the encoding of the system locale. However, transliterations for characters that are not representable in EUC-JP (in this case) are lost.

In case #1, transliteration is moot anyway, since the bulk of locale-relevant text (Japanese characters in the case of EUC-JP) breaks anyway.

In case #2, EUC-JP can represent smart quotes, dashes, etc. Transliteration would affect things like the euro sign, Latin letters with diacritics or various numerals that aren't really relevant to either Latin-script text or Japanese text. Do we *really* care about those in a configuration that hasn't really been a supported configuration for years? And why would we suddenly care about drag&drop when the Gtk copy/paste code doesn't support anything but UTF-8 (https://dxr.mozilla.org/mozilla-central/source/widget/gtk/nsClipboard.cpp#522)?

Whether we are dealing with case #1 or #2 is still interesting to know to assess the impact of bug 943296, but I don't see how it makes sense to block this bug on this investigation.
(Reporter)

Comment 51

2 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #50)
> And why would we
> suddenly care about drag&drop when the Gtk copy/paste code doesn't support
> anything but UTF-8
> (https://dxr.mozilla.org/mozilla-central/source/widget/gtk/nsClipboard.
> cpp#522)?

The UTF-8-only approach in Gtk nsClipboard.cpp goes all the way back to the Gtk1 to Gtk2 migration in 2002! https://bugzilla.mozilla.org/show_bug.cgi?id=121248#c5

Comment 52

2 years ago
(In reply to ISHIKAWA, Chiaki from comment #48)
> (In reply to Masatoshi Kimura [:emk] from comment #47)

> > Please test copy&paste and drag&drop with these builds and compare the
> > result with normal Nightly. If you found no problems, I'll proceed this bug.

Setting aside the history and policy decision, I found the following.

Using the supplied firefox binary,
under EUC locale (32-bit Debian GNU/Linux), 

I tested a simple copy&taste of text shown in firefox screen
into a native GTK editor called gedit
It did not seem to break anything special.

Then I thought, maybe I don't have enough different characters and
so I visited the following web page to show
many different Kanji characters in firefox window

http://ash.jp/code/codetbl2.htm

and copy & pasted into gedit and 
saved explicitly in utf-8 for later comparison (operation was done
under ja_JP.EUC-8 native locale).
Later I saved the same text into a file, doing the operation
under ja_JP.UTF-8 locale.

The results match: there is no corruption of characters.
copy&paste from  Firefox to gedit seems to work.

The reverse direction is rather difficult to test.

I probably need a web page that allows me to copy and paste a large
text from an application running on the desktop into the browser
screen (textarea) and then save it using whatever method into
somewhere [remote file maybe?] so that I can check it later with my
local copy of the text with enough different characters the saved
(remote) copy.

Nevertheless, I noticed a start-up error message that is probably
related to this missing encoding issue.

>Full message: TypeError: malformed UTF-8 character sequence at offset 0

Excerpt: Firefox start up warning message.

The gtk warning/error messages are also visible under native
ja_JP.UTF-8 locale, and so can be ignored for this discussion.

However, do note the coding exception due to malformed UTF-8 string, etc.
(This does not get displayed under ja_JP.UTF-8 locale, and so unique to
ja_JP.EUC-JP locale.)

$ env | grep LANG
LANG=ja_JP.EUC-JP
$ ~ishikawa/firefox/firefox  

(process:19116): GLib-CRITICAL **: g_slice_set_config: assertion 'sys_page_size == 0' failed

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-button-images after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GnomeProgram::sm-connect after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GnomeProgram::show-crash-dialog after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GnomeProgram::display after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GnomeProgram::default-icon after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-label-select-on-focus after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-can-change-accels after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-menu-popup-delay after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-menu-popdown-delay after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-menu-bar-popup-delay after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-entry-select-on-focus after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-entry-password-hint-timeout after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-im-preedit-style after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-im-status-style after class was initialised

(firefox:19116): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-menu-images after class was initialised
*************************
A coding exception was thrown and uncaught in a Task.

Full message: TypeError: malformed UTF-8 character sequence at offset 0
Full stack: toString@resource://gre/modules/osfile/osfile_unix_allthreads.jsm:92:12
task_init@resource://gre/components/nsSearchService.js:4862:11
TaskImpl_run@resource://gre/modules/Task.jsm:331:41
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:870:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:746:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:37

*************************
*************************
A coding exception was thrown and uncaught in a Task.

Full message: TypeError: ex is undefined
Full stack: task@resource://gre/components/nsSearchService.js:4129:1
TaskImpl_run@resource://gre/modules/Task.jsm:331:41
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:870:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:746:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:37

*************************

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-button-images after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-label-select-on-focus after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-can-change-accels after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-menu-popup-delay after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-menu-popdown-delay after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-menu-bar-popup-delay after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-entry-select-on-focus after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-entry-password-hint-timeout after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-im-preedit-style after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-im-status-style after class was initialised

(plugin-container:19167): GLib-GObject-WARNING **: Attempt to add property GtkSettings::gtk-menu-images after class was initialised
*************************
A coding exception was thrown in a Promise rejection callback.
See https://developer.mozilla.org/Mozilla/JavaScript_code_modules/Promise.jsm/Promise

Full message: TypeError: malformed UTF-8 character sequence at offset 0
Full stack: toString@resource://gre/modules/osfile/osfile_unix_allthreads.jsm:92:12
_formatError@resource://gre/modules/Log.jsm:133:18
exceptionStr@resource://gre/modules/Log.jsm:157:1
onStatFailure@resource://gre/modules/ProfileAge.jsm:159:44
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:870:21
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:746:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:37

*************************

In one instance, the error showed the excerpt of corrupted
string ( in 0xHH hexadecimal octet notation)
Somehow this particular error does not show it.


For my work, I don't use the local search feature (?) [to be honest, I am not sure which "nsSearchService" firefox is complaining about], so there is no serious breakage. 
[UI change in the nightly was more of a useability breakage for me as a *PC* user, and not a phone or PAD user :-( ]

Drag&Drop: Under ja_JP.EUC-JP locale, I listed a directory where I have files with names pathnames consisting of UTF-8 Japanese string (that was created under linux with ja_JP.UTF-8 locale.)
Somehow, desktop's filer application is clever enough to convert these UTF-8 pathanames in a readable manner. Firefox does not show it so cleverly and show it in garbled manner.

So if I drag the pathname and drop it into URL bar, the path name seems to be pasted, but firefox cannot grok it and I see "File not found error.". I remove the final component of the pathname and leave the leading directory part, I get the directory listing where the file pathnames are garbled.

So drag&drop may not work, but this has been often the case with filenames that have incompatible encodings, and not only for FF but for many applications, and it is not a big deal: since the file name is garbled when I list the parent directory in Firefox screen, I know something is amiss with the file encoding and so this should not be a big deal.
 
Agah, well, as I write this message on a Windows7 PC, and running Debian GNU/Linux 32-bit inside VirtualBox, my work environment, firefox nightly crashed:

This is the crashreporter's technical detail: (If this reaches mozilla, maybe someone can figure out what went wrong.)

AbortMessage: [Parent 6980] ###!!! ABORT: line 0

Add-ons: %7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D:39.0a1

AsyncShutdownTimeout: {"phase":"Metrics Storage Backend","conditions":[{"name":"FHR: Flushing storage shutdown","state":{"shutdownInitiated":false,"initialized":false,"shutdownRequested":true,"initializeHadError":false,"providerManagerInProgress":true,"storageInProgress":false,"hasProviderManager":true,"hasStorage":true,"shutdownComplete":false,"currentProviderInShutdown":null,"currentProviderInInit":"SearchesProvider","currentProviderInCollect":null},"filename":"resource://gre/modules/HealthReport.jsm","lineNumber":4356,"stack":["resource://gre/modules/HealthReport.jsm:AbstractHealthReporter.prototype<.init/<:4356","self-hosted:next:620",""]}]}

BuildID: 20150325115608

Comments: trying firefox in a ja_JP.EUC-JP locale to see if copy&paste and drag&drop works.

https://bugzilla.mozilla.org/show_bug.cgi?id=1003731

CrashTime: 1427349248

DOMIPCEnabled: 1

EMCheckCompatibility: true

Email: ishikawa@yk.rim.or.jp

FramePoisonBase: 00000000f0dea000

FramePoisonSize: 4096

InstallTime: 1427349148

Notes: OpenGL: Humper -- Chromium -- 2.1 Chromium 1.9 -- texture_from_pixmap

xpcom_runtime_abort([Parent 6980] ###!!! ABORT: line 0)

ProductID: {ec8030f7-c20a-464f-9b0e-13a3a9e97384}

ProductName: Firefox

ReleaseChannel: nightly

ShutdownProgress: profile-before-change

StartupTime: 1427349148

Theme: classic/1.0

Throttleable: 1

Vendor: Mozilla

Version: 39.0a1

useragent_locale: en-US



This report also contains technical information about the state of the application when it crashed. 

---

I hope this makes mozilla firefox (and by extension TB) more robust in a strange environment just in case
some users bump into such environments.

TIA

TIA
(Reporter)

Comment 53

2 years ago
(In reply to ISHIKAWA, Chiaki from comment #52)
> I tested a simple copy&taste of text shown in firefox screen
> into a native GTK editor called gedit
> It did not seem to break anything special.
> 
> Then I thought, maybe I don't have enough different characters and
> so I visited the following web page to show
> many different Kanji characters in firefox window
> 
> http://ash.jp/code/codetbl2.htm
> 
> and copy & pasted into gedit and 
> saved explicitly in utf-8 for later comparison (operation was done
> under ja_JP.EUC-8 native locale).
> Later I saved the same text into a file, doing the operation
> under ja_JP.UTF-8 locale.
> 
> The results match: there is no corruption of characters.
> copy&paste from  Firefox to gedit seems to work.

This strongly indicates that the right way to deal with data transfer in Gtk2 is the way that nsClipboard.cpp deals with it: Using UTF-8 between Gecko and Gtk2 regardless of locale. (This is also what the Gtk2 documentation says.)

> >Full message: TypeError: malformed UTF-8 character sequence at offset 0
> 
> Excerpt: Firefox start up warning message.
> 
> The gtk warning/error messages are also visible under native
> ja_JP.UTF-8 locale, and so can be ignored for this discussion.
> 
> However, do note the coding exception due to malformed UTF-8 string, etc.
> (This does not get displayed under ja_JP.UTF-8 locale, and so unique to
> ja_JP.EUC-JP locale.)
> 
> $ env | grep LANG
> LANG=ja_JP.EUC-JP
> $ ~ishikawa/firefox/firefox  
...
> A coding exception was thrown and uncaught in a Task.
> 
> Full message: TypeError: malformed UTF-8 character sequence at offset 0
> Full stack:
> toString@resource://gre/modules/osfile/osfile_unix_allthreads.jsm:92:12
> task_init@resource://gre/components/nsSearchService.js:4862:11
> TaskImpl_run@resource://gre/modules/Task.jsm:331:41
> Handler.prototype.process@resource://gre/modules/Promise.jsm ->
> resource://gre/modules/Promise-backend.js:870:21
> this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm ->
> resource://gre/modules/Promise-backend.js:746:7
> this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm
> -> resource://gre/modules/Promise-backend.js:688:37

It appears that OS.File on "unix" platforms simply assumes that file paths are UTF-8. When some EUC-JP bytes that are not valid UTF-8 bytes appear in a path, the conversion from UTF-8 to UTF-16 fails. This is bug 978056. This doesn't affect the parts of Firefox that don't use the OS.File API to access files, which explains why Firefox doesn't break totally.

> So if I drag the pathname and drop it into URL bar, the path name seems to
> be pasted, but firefox cannot grok it and I see "File not found error.". I
> remove the final component of the pathname and leave the leading directory
> part, I get the directory listing where the file pathnames are garbled.

The way to test text drag&drop would be writing some Japanese characters in the URL bar or in the search bar and dragging and dropping them to gedit.

> So drag&drop may not work, but this has been often the case with filenames
> that have incompatible encodings, and not only for FF but for many
> applications, and it is not a big deal: since the file name is garbled when
> I list the parent directory in Firefox screen, I know something is amiss
> with the file encoding and so this should not be a big deal.

The presence of transliteration certainly won't help when the text is a file path.

Thank you.

--

I think we should proceed with landing the patch here with nsIEntityConverter::transliterate changed to 0 in nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText.
(Reporter)

Comment 54

2 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #53)
> I think we should proceed with landing the patch here with
> nsIEntityConverter::transliterate changed to 0 in
> nsPrimitiveHelpers::ConvertUnicodeToPlatformPlainText.

emk, is there anything blocking the above course of action? It's sad if this patch rots.
Flags: needinfo?(VYV03354)
OK, go for it.
Flags: needinfo?(VYV03354)
(Assignee)

Comment 56

2 years ago
Created attachment 8595327 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v6

For some reason I cannot build it anymore, this is the error I get:
https://pastebin.mozilla.org/8830826

Do you have an idea why MOZ_OVERRIDE does not work? I've tried including "mozilla/Attributes.h" but that didn't help.
Attachment #8477908 - Attachment is obsolete: true
Attachment #8485455 - Attachment is obsolete: true
Attachment #8572268 - Attachment is obsolete: true
Attachment #8595327 - Flags: feedback?(hsivonen)
(In reply to Andy Pusch [:AndyP] from comment #56)
> Do you have an idea why MOZ_OVERRIDE does not work? I've tried including
> "mozilla/Attributes.h" but that didn't help.

MOZ_OVERRIDE is dead; use the standard 'override' keyword instead. (See bug 1145631.)
(Assignee)

Updated

2 years ago
Attachment #8595327 - Flags: feedback?(hsivonen)
(Assignee)

Comment 58

2 years ago
Created attachment 8595781 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v7

I've replaced MOZ_OVERRIDE with 'override'. Thanks for the info Jonathan.

try submission: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e07b0abd58f8
Attachment #8595327 - Attachment is obsolete: true
Attachment #8595781 - Flags: review?(hsivonen)
(Reporter)

Comment 59

2 years ago
Comment on attachment 8595781 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v7

Thanks for your patience.

emk, do you want to review as module peer or is my review enough for landing this?
Attachment #8595781 - Flags: review?(hsivonen)
Attachment #8595781 - Flags: review?(VYV03354)
Attachment #8595781 - Flags: review+
Comment on attachment 8595781 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v7

I'll trust your review.
Attachment #8595781 - Flags: review?(VYV03354)
Comment on attachment 8595781 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v7

Forgot to change the flag.
Attachment #8595781 - Flags: review+
(Reporter)

Comment 62

2 years ago
Comment on attachment 8595781 [details] [diff] [review]
bug1003731_transliterationRemoval.diff -v7

(In reply to Masatoshi Kimura [:emk] from comment #60)
> Comment on attachment 8595781 [details] [diff] [review]
> bug1003731_transliterationRemoval.diff -v7
> 
> I'll trust your review.

Thanks. Let's get this landed.
Attachment #8595781 - Flags: review+
(Reporter)

Updated

2 years ago
Keywords: checkin-needed

Comment 63

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/b42e82ef33d9
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b42e82ef33d9
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox40: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
(Reporter)

Comment 65

2 years ago
Andy, thank you for fixing this and thank you for your patience during the process.
(Assignee)

Comment 66

2 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #65)
> Andy, thank you for fixing this and thank you for your patience during the
> process.

No problem, I'm glad I could help. Thanks for your support. :)
You need to log in before you can comment on or make changes to this bug.