Closed Bug 1492938 Opened 6 years ago Closed 6 years ago

Proxy autoconfig scripts should be loaded as UTF-8 if they are valid UTF-8, otherwise as Latin-1 (a byte is a code point)

Tracking

()

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

firefox64

---

affected

People

(Reporter: Waldo, Assigned: Waldo)

References

(
URL
)

Details

(Whiteboard: [necko-triaged])

Attachments

(2 files, 1 obsolete file)

Move PAC script compilation (currently as Latin-1 only) into a lambda for readability 6 years ago Jeff Walden [:Waldo] 1.41 KB, patch	bagder : review+	Details \| Diff \| Splinter Review
Attempt compiling PAC scripts as UTF-8 first if they're valid UTF-8, and only if that fails inflate to UTF-16 and compile that 6 years ago Jeff Walden [:Waldo] 10.35 KB, patch	bagder : review+	Details \| Diff \| Splinter Review
Take two, with less/less-verbose renaming? 6 years ago Jeff Walden [:Waldo] 9.54 KB, patch	bagder : review+	Details \| Diff \| Splinter Review

Jeff Walden [:Waldo]

Assignee

Description

•

6 years ago

PAC scripts are currently compiled as Latin-1 text -- a byte equals that code point. In the modern world of emojis that I am sure are now required to be used in every PAC script in existence and constitute critical functionality therein, this really ought to be UTF-8. bz suspects we may not be able to Just Change The Encoding here, and we might need to try to first parse as UTF-8, then if that fails fall back to Latin-1. (Or more precisely, we ought to attempt a UTF-8 to UTF-16 conversion, then if that fails do a Latin-1 to UTF-16 conversion, then call a pure UTF-16 version. Or we could attempt UTF-8 validation and if it passes use the UTF-8 API, otherwise translate Latin-1 to UTF-8. But this latter tack is going to be slower until we have proper UTF-8 compilation support -- coming soon.) Someone should implement such conversion-attempt code in PAC code. *touches nose*

Honza Bambas (:mayhemer)

Updated

•

6 years ago

Priority: -- → P3

Whiteboard: [necko-triaged]

Jeff Walden [:Waldo]

Assignee

Comment 1

•

6 years ago

Attached patch Move PAC script compilation (currently as Latin-1 only) into a lambda for readability — Details — Splinter Review

...said readability not being necessary now, but being quite necessary after the next patch.

Attachment #9029467 - Flags: review?(daniel)

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Assignee: nobody → jwalden+bmo

Status: NEW → ASSIGNED

Jeff Walden [:Waldo]

Assignee

Comment 2

•

6 years ago

Attached patch Attempt compiling PAC scripts as UTF-8 first if they're valid UTF-8, and only if that fails inflate to UTF-16 and compile that (obsolete) — Details — Splinter Review

Most of the change here is alpha-renaming. This could be much more minimal, but I think the clearer-about-encodings names help a ton for understandability and confidence in encoding-safety, which seems especially important where we have dodgy encoding behavior.

Attachment #9029468 - Flags: review?(daniel)

Daniel Stenberg [:bagder]

Comment 3

•

6 years ago

Comment on attachment 9029468 [details] [diff] [review] Attempt compiling PAC scripts as UTF-8 first if they're valid UTF-8, and only if that fails inflate to UTF-16 and compile that Review of attachment 9029468 [details] [diff] [review]: ----------------------------------------------------------------- The code change looks good. I don't like the renamed variables, but can live with them if you insist. ::: netwerk/base/ProxyAutoConfig.cpp @@ +662,4 @@ > } > > nsresult ProxyAutoConfig::Init(const nsCString &aPACURI, > + const nsCString &aUtf8OrLatin1PACScript, I think this rename of the argument and the member variable makes the code harder to read. Why does the name itself have to say that it holds latin-1 or UTF-8?

Attachment #9029468 - Flags: review?(daniel) → review+

Daniel Stenberg [:bagder]

Updated

•

6 years ago

Attachment #9029467 - Flags: review?(daniel) → review+

Jeff Walden [:Waldo]

Assignee

Comment 4

•

6 years ago

Attached patch Take two, with less/less-verbose renaming? — Details — Splinter Review

(In reply to Daniel Stenberg [:bagder] from comment #3) > > nsresult ProxyAutoConfig::Init(const nsCString &aPACURI, > > + const nsCString &aUtf8OrLatin1PACScript, > > I think this rename of the argument and the member variable makes the code > harder to read. Why does the name itself have to say that it holds latin-1 > or UTF-8? Mm. I want some sort of clarity of encoding for this because it's quite unclear what the encoding is throughout the process of handling full PAC data. The comment at the very start (immediately post-download) is nice...but getting from that to where the actual UTF-8 test happens is convoluted. And going backward -- as users might tend to do, to understand why UTF-8 testing is desirable -- is even harder. I agree the naming goo is fugly; ideally we would use the type system for this, but nsACString is woefully unclear whether it holds ASCII only, UTF-8 more broadly, or Latin-1 distinctly more broadly (or at least possibly something else). Here's an idea. What about if I change stuff so the downloaded data is named as "data" -- which to my mind better connotes bag'o'bytes of uncertain encoding -- and then move the "this is what this data is" discussion to ProxyAutoConfig::{Init,SetupJS} far closer to the actual consumption of the bytes with specific encodings? Take a gander, let me know what you think. I'm very not happy with the current naming/approach for clarity, but I also would rather not just bull my way past half-muted reviewer concerns in code I don't regularly touch. :-)

Attachment #9030382 - Flags: review?(daniel)

Jeff Walden [:Waldo]

Assignee

•

6 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/a27d4d17da85 Attempt compiling PAC scripts as UTF-8 first if they're valid UTF-8, and only if that fails inflate to UTF-16 and compile that. r=bagder

Razvan Maries

Comment 10

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a27d4d17da85

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Keywords: leave-open

Resolution: --- → FIXED

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Summary: Proxy autoconfig script loading shouldn't load PAC scripts as Latin-1 → Proxy autoconfig scripts should be loaded as UTF-8 if they are valid UTF-8, otherwise as Latin-1 (a byte is a code point)

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Proxy autoconfig scripts should be loaded as UTF-8 if they are valid UTF-8, otherwise as Latin-1 (a byte is a code point)

Categories

(Core :: Networking, enhancement, P3)

Tracking

()

People

(Reporter: Waldo, Assigned: Waldo)

References

(
URL
)

Details

(Whiteboard: [necko-triaged])

Crash Data

Security

(public)

User Story

Attachments

(2 files, 1 obsolete file)

Description

Updated

Comment 1

Updated

Comment 2

Comment 3

Updated

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Updated

Attachment

General

Description

File Name

Content Type