Closed Bug 919980 Opened 11 years ago Closed 7 years ago

[B2G][messages] mms attachment file naming mechanism refinement for Latin language

Categories

(Firefox OS Graveyard :: Gaia::SMS, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:-)

RESOLVED WONTFIX
blocking-b2g -

People

(Reporter: steveck, Unassigned, Mentored)

References

Details

(Whiteboard: [lang=js])

Attachments

(1 file)

46 bytes, text/x-github-pull-request
protonk
: review?
rwaldron
Details | Review
It's follow up bug from #909622. Please ref bug 909622 comment 23. We can increasing the string readability for some other languages by reusing the text_normalize library.
Steve - Why is this critically needed for 1.1? What's the user impact if this is not fixed?
Flags: needinfo?(schung)
(In reply to Jason Smith [:jsmith] from comment #1)
> Steve - Why is this critically needed for 1.1? What's the user impact if
> this is not fixed?

Sorry, it just a improvement for 1.3, not a must have fixing for 1.1, thanks.
blocking-b2g: leo? → 1.3?
Flags: needinfo?(schung)
Whiteboard: [mentor=:julienw]
Whiteboard: [mentor=:julienw] → [mentor=:steveck]
Assignee: nobody → achyland
Attached file github pull request
Attachment #821076 - Flags: review?(waldron.rick)
Just tested the patch but it doesn't always work... Here is my test result:
1) Change the audio file "20sec_test_audio.amr" to "20秒測試音效.amr", it will be failed with internal error
2) Reduce the file name length to "20秒音效.amr", message sent success,

The string "20秒測試音效" will be encoded to string "20%E7%A7%92%E6%B8%AC%E8%A9%A6%E9%9F%B3%E6%95%88", Need more information from gecko devs: Could that be the header length limitation?
Flags: needinfo?(vyang)
(In reply to Steve Chung [:steveck] from comment #4)
> Just tested the patch but it doesn't always work... Here is my test result:
> 1) Change the audio file "20sec_test_audio.amr" to "20秒測試音效.amr", it will be
> failed with internal error
> 2) Reduce the file name length to "20秒音效.amr", message sent success,
> 
> The string "20秒測試音效" will be encoded to string
> "20%E7%A7%92%E6%B8%AC%E8%A9%A6%E9%9F%B3%E6%95%88", Need more information
> from gecko devs: Could that be the header length limitation?

Yes, that's 47 chars and per part name has to be less-equal than 40 chars.  '%' is part of valid Token chars.
Flags: needinfo?(vyang)
Thanks. I'll update the PR to address this.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #5)
> (In reply to Steve Chung [:steveck] from comment #4)
> > Just tested the patch but it doesn't always work... Here is my test result:
> > 1) Change the audio file "20sec_test_audio.amr" to "20秒測試音效.amr", it will be
> > failed with internal error
> > 2) Reduce the file name length to "20秒音效.amr", message sent success,
> > 
> > The string "20秒測試音效" will be encoded to string
> > "20%E7%A7%92%E6%B8%AC%E8%A9%A6%E9%9F%B3%E6%95%88", Need more information
> > from gecko devs: Could that be the header length limitation?
> 
> Yes, that's 47 chars and per part name has to be less-equal than 40 chars. 
> '%' is part of valid Token chars.
It's 2 different issue now and it would be better to wait for ux decision before implemetation.

Hi Ayman, we got 2 issues now, and I think filename lenggth limitation would be more critical. Since mms conformance set a filename length to 40 chars, user sending a mms with attachment filename over 40 chars will get "service currently unavailable" error dialog, which will make user confused because it's not the correct message. I'll ceate another bug for this issue and it should nominate koi? because this will block user sending the mms.

Another issue is unsupport char replacement. We already replace the char with '#' for the filename. The original idea for refinement is we could reuse the text_normalize library to convert a string of characters to its ASCII equivalent. For example, á,ă,ǎ,â... => a , ḃ,ḅ,ɓ,ḇ...=> b and ect, and the filename with these chars would be benifit from this feature. 
This patch applied encodeURIComponent to escape UTF-8 encoding chars might be a generic solution and providing a chance to resume with decode api, but sadly, other mobile OS message app doesn't support this ability(at least my android phone won't handle this). The filename displayed is still unreadable code within the patch, and inceasing the possibility that makes the string over 40 chars. Ayman, do you think it's necessary to escape UTF-8 encoding chars like připevnění ('attachment' in čeština) :
Current result : p#ipevn#n# (replace with #)
This patch: p%C5%99ipevn%C4%9Bn%C3%AD (escape UTF-8 encoding chars)
The result I prefer: pripevnenl (text normalize)
Flags: needinfo?(aymanmaat)
Hi Adam, I've ceate bug 930882 for filename length issue, you could take it if you already work on this issue.
The updated patch is a substantial refinement (and will help us with bug 930882). Basically we percent encode, where that isn't possible (due to file name length), names are normalized and then the remaining non-valid characters are converted to '#' (basically falling back to the current solution). Decoding is the reverse process and with the exception of cases where the filename string has percent encoded escape characters (see attachement_test.js) is invertible. 

I prefer to percent encode where possible (especially now that names are properly decoded) before normalizing because at least percent encoding gives us a means to transmit non-latin variants.

(In reply to Steve Chung [:steveck] from comment #7)
> It's 2 different issue now and it would be better to wait for ux decision
> before implemetation.
> 
> Hi Ayman, we got 2 issues now, and I think filename lenggth limitation would
> be more critical. Since mms conformance set a filename length to 40 chars,
> user sending a mms with attachment filename over 40 chars will get "service
> currently unavailable" error dialog, which will make user confused because
> it's not the correct message. I'll ceate another bug for this issue and it
> should nominate koi? because this will block user sending the mms.
> 
> Another issue is unsupport char replacement. We already replace the char
> with '#' for the filename. The original idea for refinement is we could
> reuse the text_normalize library to convert a string of characters to its
> ASCII equivalent. For example, á,ă,ǎ,â... => a , ḃ,ḅ,ɓ,ḇ...=> b and ect, and
> the filename with these chars would be benifit from this feature. 
> This patch applied encodeURIComponent to escape UTF-8 encoding chars might
> be a generic solution and providing a chance to resume with decode api, but
> sadly, other mobile OS message app doesn't support this ability(at least my
> android phone won't handle this). The filename displayed is still unreadable
> code within the patch, and inceasing the possibility that makes the string
> over 40 chars. Ayman, do you think it's necessary to escape UTF-8 encoding
> chars like připevnění ('attachment' in čeština) :
> Current result : p#ipevn#n# (replace with #)
> This patch: p%C5%99ipevn%C4%9Bn%C3%AD (escape UTF-8 encoding chars)
> The result I prefer: pripevnenl (text normalize)
Flags: needinfo?(schung)
Thanks for the update, this patch make more sence especially you utilized the text normalizing/strip and decoding to reverse the string. The only concern here is filename display on other mobile os, they don't have the decoding process to reverse our encoded string. I will let Ayman to judge whether we should apply this naming mechanim, but anyway I agree non-latin language could get benifit from this patch.

Since you may send another patch for bug 930882, I would prefer we could fix bug 930882 first and apply this patch later to avoid possible conflict becase bug 930882 might be 1.2 blocker but this one might not.
Flags: needinfo?(schung)
Hey Steve, since you're probably well located for this, could you see how this works on other mobile OS for such attachments ?
I have to see how best to generate that error message but I don't suspect these two should conflict (since the final check has to be against the unique location). 
(In reply to Steve Chung [:steveck] from comment #10)
> Since you may send another patch for bug 930882, I would prefer we could fix
> bug 930882 first and apply this patch later to avoid possible conflict
> becase bug 930882 might be 1.2 blocker but this one might not.

This is true, but the current behavior and in some cases (especially for non-latin characters) the behavior if we just normalized would be worse. When bug 918987 lands we can use String.prototype.normalize, which will be better standardized. 

>The only concern here is filename display on other mobile os, they don't have the decoding process to reverse our encoded string.
(In reply to Steve Chung [:steveck] from comment #7)
> It's 2 different issue now and it would be better to wait for ux decision
> before implemetation.
> 
> Hi Ayman, we got 2 issues now, and I think filename lenggth limitation would
> be more critical. Since mms conformance set a filename length to 40 chars,
> user sending a mms with attachment filename over 40 chars will get "service
> currently unavailable" error dialog, which will make user confused because
> it's not the correct message. I'll ceate another bug for this issue and it
> should nominate koi? because this will block user sending the mms.

I think we should be handling file name length at the point of attachment, not at the point of send. but i have commented more on the bug you raised so we can discuss this there. 

> Another issue is unsupport char replacement. We already replace the char
> with '#' for the filename. The original idea for refinement is we could
> reuse the text_normalize library to convert a string of characters to its
> ASCII equivalent. For example, á,ă,ǎ,â... => a , ḃ,ḅ,ɓ,ḇ...=> b and ect, and
> the filename with these chars would be benifit from this feature. 
> This patch applied encodeURIComponent to escape UTF-8 encoding chars might
> be a generic solution and providing a chance to resume with decode api, but
> sadly, other mobile OS message app doesn't support this ability(at least my
> android phone won't handle this). The filename displayed is still unreadable
> code within the patch, and inceasing the possibility that makes the string
> over 40 chars. Ayman, do you think it's necessary to escape UTF-8 encoding
> chars like připevnění ('attachment' in čeština) :
> Current result : p#ipevn#n# (replace with #)
> This patch: p%C5%99ipevn%C4%9Bn%C3%AD (escape UTF-8 encoding chars)
> The result I prefer: pripevnenl (text normalize)

Ok. well the patch renders the file name uninterpretable and therefore unreadable …therefore turns it from information to data, which is not the path we should be following.. 

I would much more prefer to see the result you recommend Steve of text normalisation. (připevnění -> pripevnenl) …At least this way the file name is still interpretable and therefore readable. So could we go with that please.
Flags: needinfo?(aymanmaat)
Ok. Just so we're clear, both the current implementation and an implementation which would include the normalization library will still replace non-latin characters with hashmarks. So připevnění -> pripevnenl but 새끼고양이 -> #####. 

I can also refactor this so the normalization is the first step falling back to encoding for the remaining characters. 

(In reply to ayman maat :maat from comment #13)
> 
> Ok. well the patch renders the file name uninterpretable and therefore
> unreadable …therefore turns it from information to data, which is not the
> path we should be following.. 
> 
> I would much more prefer to see the result you recommend Steve of text
> normalisation. (připevnění -> pripevnenl) …At least this way the file name
> is still interpretable and therefore readable. So could we go with that
> please.
triage: would not block release. please land it when its ready. thanks
blocking-b2g: 1.3? → -
Adam, I just duped bug 935686 to this bug. Could you take care of not converting space characters to '#' in this patch as well? Thanks!
Reseting assignee as Adam is not working anymore on this.
Assignee: achyland → nobody
Mentor: schung
Whiteboard: [mentor=:steveck]
Whiteboard: [lang=js]
Steve described properly what we want in comment 7: reuse /shared/js/text_normalizer.js to normalize extended characters to ascii characters.

This is still valid and shouldn't be hard for a contributor ! :)
Mass closing of Gaia::SMS bugs. End of an era :(
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Mass closing of Gaia::SMS bugs. End of an era :(
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: