Bugzilla

Reporter

Comment 2

•

11 years ago

(In reply to Jason Smith [:jsmith] from comment #1)
> Steve - Why is this critically needed for 1.1? What's the user impact if
> this is not fixed?

Sorry, it just a improvement for 1.3, not a must have fixing for 1.1, thanks.

blocking-b2g: leo? → 1.3?

status-b2g-v1.1hd: --- → fixed

Flags: needinfo?(schung)

Reporter

Updated

•

11 years ago

status-b2g-v1.1hd: fixed → ---

Updated

•

11 years ago

Whiteboard: [mentor=:julienw]

Updated

•

11 years ago

Whiteboard: [mentor=:julienw] → [mentor=:steveck]

Updated

•

11 years ago

Assignee: nobody → achyland

Comment 3

•

11 years ago

Attached file github pull request — Details

Attachment #821076 - Flags: review?(waldron.rick)

Vicamo Yang [:vicamo][:vyang]

Reporter

Comment 4

•

11 years ago

Just tested the patch but it doesn't always work... Here is my test result:
1) Change the audio file "20sec_test_audio.amr" to "20秒測試音效.amr", it will be failed with internal error
2) Reduce the file name length to "20秒音效.amr", message sent success,

The string "20秒測試音效" will be encoded to string "20%E7%A7%92%E6%B8%AC%E8%A9%A6%E9%9F%B3%E6%95%88", Need more information from gecko devs: Could that be the header length limitation?

Flags: needinfo?(vyang)

Comment 5

•

11 years ago

(In reply to Steve Chung [:steveck] from comment #4)
> Just tested the patch but it doesn't always work... Here is my test result:
> 1) Change the audio file "20sec_test_audio.amr" to "20秒測試音效.amr", it will be
> failed with internal error
> 2) Reduce the file name length to "20秒音效.amr", message sent success,
> 
> The string "20秒測試音效" will be encoded to string
> "20%E7%A7%92%E6%B8%AC%E8%A9%A6%E9%9F%B3%E6%95%88", Need more information
> from gecko devs: Could that be the header length limitation?

Yes, that's 47 chars and per part name has to be less-equal than 40 chars.  '%' is part of valid Token chars.

Flags: needinfo?(vyang)

Comment 6

•

11 years ago

Thanks. I'll update the PR to address this.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #5)
> (In reply to Steve Chung [:steveck] from comment #4)
> > Just tested the patch but it doesn't always work... Here is my test result:
> > 1) Change the audio file "20sec_test_audio.amr" to "20秒測試音效.amr", it will be
> > failed with internal error
> > 2) Reduce the file name length to "20秒音效.amr", message sent success,
> > 
> > The string "20秒測試音效" will be encoded to string
> > "20%E7%A7%92%E6%B8%AC%E8%A9%A6%E9%9F%B3%E6%95%88", Need more information
> > from gecko devs: Could that be the header length limitation?
> 
> Yes, that's 47 chars and per part name has to be less-equal than 40 chars. 
> '%' is part of valid Token chars.

Reporter

Comment 7

•

11 years ago

It's 2 different issue now and it would be better to wait for ux decision before implemetation.

Hi Ayman, we got 2 issues now, and I think filename lenggth limitation would be more critical. Since mms conformance set a filename length to 40 chars, user sending a mms with attachment filename over 40 chars will get "service currently unavailable" error dialog, which will make user confused because it's not the correct message. I'll ceate another bug for this issue and it should nominate koi? because this will block user sending the mms.

Another issue is unsupport char replacement. We already replace the char with '#' for the filename. The original idea for refinement is we could reuse the text_normalize library to convert a string of characters to its ASCII equivalent. For example, á,ă,ǎ,â... => a , ḃ,ḅ,ɓ,ḇ...=> b and ect, and the filename with these chars would be benifit from this feature.
This patch applied encodeURIComponent to escape UTF-8 encoding chars might be a generic solution and providing a chance to resume with decode api, but sadly, other mobile OS message app doesn't support this ability(at least my android phone won't handle this). The filename displayed is still unreadable code within the patch, and inceasing the possibility that makes the string over 40 chars. Ayman, do you think it's necessary to escape UTF-8 encoding chars like připevnění ('attachment' in čeština) :
Current result : p#ipevn#n# (replace with #)
This patch: p%C5%99ipevn%C4%9Bn%C3%AD (escape UTF-8 encoding chars)
The result I prefer: pripevnenl (text normalize)

Flags: needinfo?(aymanmaat)

Reporter

Comment 8

•

11 years ago

Hi Adam, I've ceate bug 930882 for filename length issue, you could take it if you already work on this issue.

Comment 9

•

11 years ago

The updated patch is a substantial refinement (and will help us with bug 930882). Basically we percent encode, where that isn't possible (due to file name length), names are normalized and then the remaining non-valid characters are converted to '#' (basically falling back to the current solution). Decoding is the reverse process and with the exception of cases where the filename string has percent encoded escape characters (see attachement_test.js) is invertible. 

I prefer to percent encode where possible (especially now that names are properly decoded) before normalizing because at least percent encoding gives us a means to transmit non-latin variants.

(In reply to Steve Chung [:steveck] from comment #7)
> It's 2 different issue now and it would be better to wait for ux decision
> before implemetation.
> 
> Hi Ayman, we got 2 issues now, and I think filename lenggth limitation would
> be more critical. Since mms conformance set a filename length to 40 chars,
> user sending a mms with attachment filename over 40 chars will get "service
> currently unavailable" error dialog, which will make user confused because
> it's not the correct message. I'll ceate another bug for this issue and it
> should nominate koi? because this will block user sending the mms.
> 
> Another issue is unsupport char replacement. We already replace the char
> with '#' for the filename. The original idea for refinement is we could
> reuse the text_normalize library to convert a string of characters to its
> ASCII equivalent. For example, á,ă,ǎ,â... => a , ḃ,ḅ,ɓ,ḇ...=> b and ect, and
> the filename with these chars would be benifit from this feature. 
> This patch applied encodeURIComponent to escape UTF-8 encoding chars might
> be a generic solution and providing a chance to resume with decode api, but
> sadly, other mobile OS message app doesn't support this ability(at least my
> android phone won't handle this). The filename displayed is still unreadable
> code within the patch, and inceasing the possibility that makes the string
> over 40 chars. Ayman, do you think it's necessary to escape UTF-8 encoding
> chars like připevnění ('attachment' in čeština) :
> Current result : p#ipevn#n# (replace with #)
> This patch: p%C5%99ipevn%C4%9Bn%C3%AD (escape UTF-8 encoding chars)
> The result I prefer: pripevnenl (text normalize)

Flags: needinfo?(schung)

Reporter

Comment 10

•

11 years ago

Thanks for the update, this patch make more sence especially you utilized the text normalizing/strip and decoding to reverse the string. The only concern here is filename display on other mobile os, they don't have the decoding process to reverse our encoded string. I will let Ayman to judge whether we should apply this naming mechanim, but anyway I agree non-latin language could get benifit from this patch.

Since you may send another patch for bug 930882, I would prefer we could fix bug 930882 first and apply this patch later to avoid possible conflict becase bug 930882 might be 1.2 blocker but this one might not.

Flags: needinfo?(schung)

Comment 11

•

11 years ago

Hey Steve, since you're probably well located for this, could you see how this works on other mobile OS for such attachments ?

Comment 12

•

11 years ago

I have to see how best to generate that error message but I don't suspect these two should conflict (since the final check has to be against the unique location). 
(In reply to Steve Chung [:steveck] from comment #10)
> Since you may send another patch for bug 930882, I would prefer we could fix
> bug 930882 first and apply this patch later to avoid possible conflict
> becase bug 930882 might be 1.2 blocker but this one might not.

This is true, but the current behavior and in some cases (especially for non-latin characters) the behavior if we just normalized would be worse. When bug 918987 lands we can use String.prototype.normalize, which will be better standardized. 

>The only concern here is filename display on other mobile os, they don't have the decoding process to reverse our encoded string.

ayman maat :maat

Comment 13

•

11 years ago

(In reply to Steve Chung [:steveck] from comment #7)
> It's 2 different issue now and it would be better to wait for ux decision
> before implemetation.
> 
> Hi Ayman, we got 2 issues now, and I think filename lenggth limitation would
> be more critical. Since mms conformance set a filename length to 40 chars,
> user sending a mms with attachment filename over 40 chars will get "service
> currently unavailable" error dialog, which will make user confused because
> it's not the correct message. I'll ceate another bug for this issue and it
> should nominate koi? because this will block user sending the mms.

I think we should be handling file name length at the point of attachment, not at the point of send. but i have commented more on the bug you raised so we can discuss this there. 

> Another issue is unsupport char replacement. We already replace the char
> with '#' for the filename. The original idea for refinement is we could
> reuse the text_normalize library to convert a string of characters to its
> ASCII equivalent. For example, á,ă,ǎ,â... => a , ḃ,ḅ,ɓ,ḇ...=> b and ect, and
> the filename with these chars would be benifit from this feature. 
> This patch applied encodeURIComponent to escape UTF-8 encoding chars might
> be a generic solution and providing a chance to resume with decode api, but
> sadly, other mobile OS message app doesn't support this ability(at least my
> android phone won't handle this). The filename displayed is still unreadable
> code within the patch, and inceasing the possibility that makes the string
> over 40 chars. Ayman, do you think it's necessary to escape UTF-8 encoding
> chars like připevnění ('attachment' in čeština) :
> Current result : p#ipevn#n# (replace with #)
> This patch: p%C5%99ipevn%C4%9Bn%C3%AD (escape UTF-8 encoding chars)
> The result I prefer: pripevnenl (text normalize)

Ok. well the patch renders the file name uninterpretable and therefore unreadable …therefore turns it from information to data, which is not the path we should be following.. 

I would much more prefer to see the result you recommend Steve of text normalisation. (připevnění -> pripevnenl) …At least this way the file name is still interpretable and therefore readable. So could we go with that please.

Flags: needinfo?(aymanmaat)

Joe Cheng [:jcheng] (please needinfo)

Comment 14

•

11 years ago

Ok. Just so we're clear, both the current implementation and an implementation which would include the normalization library will still replace non-latin characters with hashmarks. So připevnění -> pripevnenl but 새끼고양이 -> #####. 

I can also refactor this so the normalization is the first step falling back to encoding for the remaining characters. 

(In reply to ayman maat :maat from comment #13)
> 
> Ok. well the patch renders the file name uninterpretable and therefore
> unreadable …therefore turns it from information to data, which is not the
> path we should be following.. 
> 
> I would much more prefer to see the result you recommend Steve of text
> normalisation. (připevnění -> pripevnenl) …At least this way the file name
> is still interpretable and therefore readable. So could we go with that
> please.

Comment 15

•

11 years ago

triage: would not block release. please land it when its ready. thanks

blocking-b2g: 1.3? → -

Comment 17

•

11 years ago

Adam, I just duped bug 935686 to this bug. Could you take care of not converting space characters to '#' in this patch as well? Thanks!

Nobody; OK to take it and work on it

Comment 18

•

10 years ago

Reseting assignee as Adam is not working anymore on this.

Assignee: achyland → nobody

Assignee

Updated

•

10 years ago

Mentor: schung

Whiteboard: [mentor=:steveck]

Updated

•

10 years ago

Whiteboard: [lang=js]

Comment 19

•

8 years ago

Steve described properly what we want in comment 7: reuse /shared/js/text_normalizer.js to normalize extended characters to ascii characters.

This is still valid and shouldn't be hard for a contributor ! :)

Comment 20

•

7 years ago

Mass closing of Gaia::SMS bugs. End of an era :(

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WONTFIX