FHR payloads contain invalid UTF-8 Characters

VERIFIED FIXED in Firefox 32

Status

defect
P1
critical
VERIFIED FIXED
5 years ago
9 months ago

People

(Reporter: mreid, Assigned: gps)

Tracking

unspecified
Firefox 34
x86
All
Dependency tree / graph
Bug Flags:
firefox-backlog +
qe-verify +

Firefox Tracking Flags

(firefox32+ fixed, firefox33+ fixed, firefox34 verified)

Details

Attachments

(1 attachment, 2 obsolete attachments)

As part of bug 1043504, I've been investigating some cases where FHR payloads contain invalid UTF-8 characters.

So far, from what I have seen, the problem is that addon names and descriptions contain characters that may be in a different encoding.

One example is that we see addon descriptions containing the Windows 1252 Trademark character 0x99, which is not a valid byte in UTF-8.

There may be other cases where bad characters sneak through too, but this is definitely one such case.

The client should either send valid UTF-8, or send the correct charset with the request so that the server can decode accordingly.
Blair: is there anything we can do here? What guarantees does the Addon Manager have over the add-on name and description strings, if any?
Flags: needinfo?(bmcbride)
mreid do you have example payloads including HTTP headers? Is this plugin names or addon names or either?

Since the interface between the addon manager and FHR is JS strings not narrow strings, I really doubt we have encoding issues there.

However, reading through rest.js which is used for FHR uploads (via BagheeraClient), it looks like it's trying to use nsIStringStream.setData which will eat any high-bit characters: http://hg.mozilla.org/mozilla-central/annotate/bf27e27c994d/services/common/rest.js#l323 And immediately below the content-type should have utf8 specified.

nsIStringOutputStream.setData is declared with XPIDL "string" which is inherently lossy. I'm about 99% certain that what we want here is nsIUnicharOutputStream.writeString initialized via nsIConverterOutputStream.
The data going to rest.js should be a JSON string, which will have high-bytes encoded as "\uXXYY".

I suspect the internal JS strings aren't proper Unicode or are getting munged to invalid UTF-8.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> mreid do you have example payloads including HTTP headers? Is this plugin
> names or addon names or either?
I have example payloads, but not the HTTP headers. Is that still useful?

They appear in both "org.mozilla.addons.addons" (v2) and "org.mozilla.addons.plugins" (v1).

The easiest way to get HTTP headers might be to insert some of the bad characters in a local addon/plugin name and submit to a test server.
Yes, a payload by itself would be interesting. I'm assuming from reading the code that the request content-type is text/plain without a charset.

> The data going to rest.js should be a JSON string, which will have high-bytes encoded as "\uXXYY".

Greg, I don't see this as required by the JSON spec or by JSON.stringify in JS: JSON strings are allowed to have any unicode character. From my browser console:

JSON.stringify("©яږ")
""©яږ""
Gah, I mixed the wires: Python does the \u encoding by default.
>>> data[75210:75230]
'???????\xef","scope":8,'

\xef which is the first byte of a 3-byte UTF8 character, perhaps hangul?

Worth checking whether an plugin/addon name of "ᄃᄄᄅ" would reproduce this. I'm trying that with the default theme now. What's the simplest way to run a server which logs requests/request bodies? cherrypy seems like overkill for this.
Flags: needinfo?(bmcbride)
If you have node.js available, you can run 'server.js' from the telemetry-server codebase.

Edit server_config.json to point at paths that exist on your machine, then:
$ node server.js ./server_config.json

Shut it down (ctrl+c) after you've submitted your example(s).

To view the data:
>>> import telemetry.util.files as fu
>>> for r in fu.unpack('/path/to/example_file'):
...   print r.path, r.data[0:50]

The above will gunzip the request body if it was compressed. If you want to see the original bytes of the request, add a `raw=True` parameter to the `unpack` call.
I can reliably reproduce this by changing the title of one of my addons to have korean characters. I expect that these invalid payloads may currently be being thrown away by Bagheera before they even make it to hbase because we parse and re-serialize the FHR JSON packets to insert the geodata.
Points: --- → 5
Flags: firefox-backlog+
Will the fix for this also apply to Telemetry payloads (which also report Addon information)?
No. Are telemetry payloads also broken? They use a different upload mechanism.
I'll check.  I know we see the occasional "JSONDecodeError: Invalid control character" error in Telemetry data too.
This added test reproduces the issue.

We see the provider collect successfully:

 0:20.66 Services.Metrics.ProviderManagerINFOProvider successfully initialized: UnicodeProvider
 0:20.66 Services.Metrics.ProviderManagerDEBUGPull-only provider registration requested.
 0:20.6666 Services.Metrics.ProviderManagerDEBUGRequested pull-only provider    registration and providers are already registered.
 0:20.66 Sqlite.Connection.unicode_payloadTRACEConn #0: Stmt #105 INSERT OR REPLACE INTO Linesast_text VALUES (:field_id, :days, :value) - {"field_id":14,"days":16301,"value":"ᄃᄄᄅ"} 0:20.66
 0:20.66 Sqlite.Connection.unicode_payloadDEBUGConn #0: Stmt #105 finished.

Then the world blows up during upload:

 0:20.69 Services.BagheeraClientINFOUploading data to http://Leaveocalhost:55640/1.0/submit/test/51a1baea-5683-8b4e-8559-5a8c24b6ff7f
 0:20.69 [96641] WARNING: jschar out of char range; high bits of data lost: 0x1103: file /Users/gps/src/firefox/js/xpconnect/src/XPCConvert.cpp, line 363
 0:20.69 Services.BagheeraClientINFORequest body length: 304 0:20.69
 0:20.69 Services.BagheeraClientDEBUGPOST Length: 304
 0:20.69 Sqlite.Connection.unicode_payloadDEBUGConnBUGConn #0: Stmt #111 finished.
 0:20.69 metrics.BagheeraServerINFOReceived request: POST /1.0/submit/test/51a1baea-5683-8b4e-8559-5a8c24b6ff7f HTTP/1.1
 0:20.69 metrics.BagheeraServerINFOHandling data upload for test:51a1baea-5683-8b4e-8559-5a8c24b6ff7f
 0:20.69 metrics.BagheeraServerINFORaw body length: 304 0:20.69
 0:20.69 metrics.BagheeraServerDEBUGUncompressing entity body with deflate.
 0:20.69 metrics.BagheeraServerDEBUGHTTP request body: {"version":2,"clientID":"58bc2fc8-0743-ee4c-a473-f63ef8238e34","clientIDVersion":1,"thisPingDate":"2014-08-19","geckoAppInfo":{"_v":1,"vendor":"Mozilla","name":"xpcshell","id":"xpcshell@tests.mozilla.org","version":"1","appBuildID":"20121107","platformVersion":"p-ver","platformBuildID":"20121106","os":"XPCShell","xpcomabi":"noarch-spidermonkey","updateChannel":"default"},"data":{"last":{"UnicodeProvider.UnicodeMeasurement":{"_v":1,"last-text":"\x03\x04\x05"}},"days":{}}}
 0:20.69 metrics.BagheeraServerINFOJSON parse error.
 0:20.69 metrics.BagheeraServerINFOHttpError thrown: 400 Bad Request

Good times.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Potential failure and fix points:

1) Inserting into SQLite doesn't preserve encoding
2) Retrieving from SQLite doesn't preserve encoding
3) Serializing to JSON doesn't preserve encoding
4) Gzipping the HTTP body doesn't preserve encoding (I doubt it)
5) Sending data over wire doesn't preserve encoding (I doubt it, since data is binary at that point)
The "WARNING: jschar out of char range; high bits of data lost: 0x1103: file /Users/gps/src/firefox/js/xpconnect/src/XPCConvert.cpp, line 363" message is getting printed during CommonUtils.convertString().

https://hg.mozilla.org/mozilla-central/file/4d94eeca89f3/services/common/utils.js#l590

It sure sounds like convertString() is losing data and #4 is the problem after all. I swore we had test coverage for this in Sync and elsewhere. Sadness.
It was observed that FHR was sending invalid JSON payloads to the
server. Specifically, JSON payloads contained invalid Unicode strings.

Investigation revealed that the culprint was CommonUtils.convertString()
silently swallowing high bytes. When the Bagheera client went to gzip
the JSON payload, the input buffer into gzip was missing high bytes.

This patch changes the bagheera client to manually escape Unicode
in serialized JSON. This hacks around SpiderMonkey's default (and
spec valid) behavior of using literal Unicode byte sequences in JSON.
The new behavior of using \u literals is allowed via the JSON spec and
is the default serialization behavior of some json libraries, including
Python. This may consume a few more bytes over the wire, but it is
a quick and easy fix that should meet upload criteria.

Alternatively, we could have changed CommonUtils.convertString() to
be high byte aware. However, many consumers rely on this function.
This patch is written with the intent of being uploaded and the change
performed is targeted at the specific problem.

Tests for Unicode preserving behavior have been added to both the
generic Bagheera client and to FHR. The latter test is arguably
not necessary, but peace of mind is a good thing, especially with
FHR.
Attachment #8475285 - Flags: review?(benjamin)
Attachment #8475246 - Attachment is obsolete: true
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #11)
> No. Are telemetry payloads also broken? They use a different upload
> mechanism.

I checked a bunch of telemetry payloads and they look OK, with the exception of one submission from Fx version 13 (which pre-dates Bug 915850, which was supposed to fix a similar problem for Telemetry).
[Tracking Requested - why for this release]:

FHR could be submitting malformed payloads to the server, likely causing us to lose data and under-report valuable information. We need to get this data flowing again ASAP.
Comment on attachment 8475285 [details] [diff] [review]
Properly handle Unicode in Bagheera payloads

This is correct for characters in the BMP. It's almost certainly incorrect for characters above that because JS is UCS2 not UCS4/UTF16. Although it will at least produce valid JSON output.

I understand the bit about having a minimally-safe patch right now. Longer-term is this really better than passing data through CommonUtils.convertUTF8?

Does it also make sense to add a check for high bytes in CommonUtils.convertString and throw?
Attachment #8475285 - Flags: review?(benjamin) → review+
I didn't want to modify CommonUtils.convertString() because Sync, etc rely on it and I don't want to risk changing behavior, even if the new behavior is sane.

We're having conversations in #fhr now. We should definitely convert a lot of this stuff to ArrayBuffer. Most of this code was written before ArrayBuffer or cargo culted into the present.
QA Whiteboard: [qa+]
I just swapped in CommonUtils.encodeUTF8(). It works and is less
hacky than direct string manipulation.
Attachment #8475315 - Flags: review?(benjamin)
Attachment #8475285 - Attachment is obsolete: true
Attachment #8475315 - Flags: review?(benjamin) → review+
Iteration: --- → 34.3
QA Whiteboard: [qa+]
Flags: qe-verify+
Can you attach an example payload generated with this patch applied?
QA Contact: kamiljoz
Do you need a full payload? The following should be similar to what goes over the wire (it is UTF-8).

{"data": "test":"πόλλ' οἶδ' ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα"}
Comment on attachment 8475315 [details] [diff] [review]
Properly handle Unicode in Bagheera payloads

Approval Request Comment
[Feature/regressing bug #]: FHR deficiency since dawn of time
[User impact if declined]: FHR may not be submitting where it should be, skewing data analysis, especially in locales that don't use the Latin alphabet.
[Describe test coverage new/current, TBPL]: New tests were written. FHR has extensive test coverage.
[Risks and why]: Low risk. We should have been doing this all along. Telemetry does something very similar.
[String/UUID change made/needed]: None

We'll want to hold on uplift until after QA verification. That should happen within a day or two.
Attachment #8475315 - Flags: approval-mozilla-beta?
Attachment #8475315 - Flags: approval-mozilla-aurora?
Tracking for beta but that doesn't necessarily mean that this fix will make it into 32. The last beta goes to build this Thu, Aug 21. We'll need to have our ducks in a row before then if this is going to have a chance for 32.
(In reply to Gregory Szorc [:gps] from comment #24)
> Do you need a full payload? The following should be similar to what goes
> over the wire (it is UTF-8).
> 
> {"data": "test":"πόλλ' οἶδ' ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα"}

This should be fine for my purposes. Thanks.
https://hg.mozilla.org/mozilla-central/rev/f150e176a823
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 34
(In reply to Lawrence Mandel [:lmandel] from comment #27)
> Tracking for beta but that doesn't necessarily mean that this fix will make
> it into 32. The last beta goes to build this Thu, Aug 21. We'll need to have
> our ducks in a row before then if this is going to have a chance for 32.

What do you need? I plan to have this QA'd today as soon as a PGO build (Nightly candidate) comes through.

I'm willing to champion this patch through to 32. I believe it is low risk, high reward.
Flags: needinfo?(lmandel)
Attachment #8475315 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
ducks in a row = patch landed on m-c (comment 29) and QA verification to the point where both QA and you think that this fix is unlikely to cause a late regression. The patch is simple enough (one liner) so given that this has already landed and QA will get on this today I expect that we will be able to take this for beta9.
Flags: needinfo?(lmandel)
We should be able to verify this with latest m-c builds (since we didn't get an fx-team merge last night).

https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-pgo/1408523401/
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-pgo/1408545007/firefox-34.0a1.en-US.linux-x86_64.tar.bz2
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1408544126/firefox-34.0a1.en-US.mac.dmg

STR:

1) Install an add-on or plug-in that has Unicode characters in the name or description (I think https://addons.mozilla.org/en-US/firefox/addon/lowest-price-checker/?src=search will do)
2) Trigger an FHR upload by adjusting preferences and waiting up to 90 seconds (see https://wiki.mozilla.org/QA/Desktop_Firefox/Firefox_Health_Report)
3) Go to about:healthreport -> Raw Data and verify the org.mozilla.healthreport.submissions measurement for today has a successful submission

I'm not 100% sure the add-on I linked has high-byte/Unicode characters. Please reproduce upload failure in current Firefox release as part of verification.
I don't think the server(In reply to Gregory Szorc [:gps] from comment #32)
> I'm not 100% sure the add-on I linked has high-byte/Unicode characters.
> Please reproduce upload failure in current Firefox release as part of
> verification.

I don't think the server will report failure in either case - we may need to actually check HBase to see if the records do (or don't) show up.  tmary, can you confirm?
Flags: needinfo?(tmeyarivan)
(In reply to Mark Reid [:mreid] from comment #33)
> I don't think the server will report failure in either case - we may need to
> actually check HBase to see if the records do (or don't) show up.  tmary,
> can you confirm?

Payloads which aren't valid JSON+UTF8 are ignored by the Bagheera consumer => it doesn't make it to HBase.

FWIW - all submissions/payloads are retained in Kafka for a period of ~5 days (consumer reads from Kafka and writes to HBase)

--
Flags: needinfo?(tmeyarivan)
Discussion in IRC revealed this issue appears to be dropping ~5% of FHR records.

When this is all over we should talk about having a better monitoring story around server ingestion. The graphs clearly show an uptick in malformed JSON from nearly 0 to significant when Firefox 28 was released. The reason is bug 928575 added recording of add-on names and descriptions to FHR. This was likely the first significant source of Unicode in the payload.
Severity: normal → critical
Depends on: 928575
Priority: -- → P1
My STR in comment 32 are wrong as Mark pointed out in comment 33: we'll need to verify things on the server. This testing is a bit more involved. tmary and I will figure something out.
Gregory, is there anything that I can do to help out with the testing portion??
Flags: needinfo?(gps)
Doing an actual end-to-end verification would be extremely time consuming.

The code rejecting invalid payloads is https://github.com/mozilla-metrics/fhr-toolbox/blob/f6d019ce1dd725edea2c4c0098fcd8b28405ae03/src/main/java/com/mozilla/fhr/consumer/FHRConsumer.java#L223.

At the end of the day, this boils down to invalid UTF-8 JSON in the payload. So, any server that verifies JSON decoding works should be a sufficient proxy for testing this.

I installed https://addons.mozilla.org/en-us/firefox/addon/%E5%BE%AE%E4%BF%A1%E7%BD%91%E9%A1%B5%E7%89%88%E5%8A%A9%E6%89%8B/, which definitely contains Unicode in the name and description fields (I found this add-on in a payload mreid sent me).

I ran a custom Python server that behaves a lot like the HTTP submission endpoint (http://hastebin.com/gerunokoke.py). Basically, json.loads(zlib.decompress(request.body), 'utf-8').

With old build:

Invalid payload received 620057cb-d6db-5440-815a-bfded5ad51fa

With new build with this patch:

valid payload received 450ddbb6-9983-7346-a7cd-05b8473f1f03

I'm willing to call this verified and ready for beta uplift.
QA Whiteboard: [qa+]
Flags: needinfo?(gps)
I just uploaded a formerly-failing record from a new build to production and tmary confirmed the record made it to HBase.

Not sure what other verification we could do.
Blocks: 1056284
Depends on: 1056289
Comment on attachment 8475315 [details] [diff] [review]
Properly handle Unicode in Bagheera payloads

Thanks for testing as best as you can gps. I'm satisfied enough to take this fix in beta9. We won't have much time to monitor FHR input for beta9 but it would be good to know that this is working as expected before we release. (I don't have any reason to doubt that it will at this point.) beta+
Attachment #8475315 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
So this was verified on 34 based on comment 38 and 39. Is verification needed for 33 (which is currently in Beta)?
Status: RESOLVED → VERIFIED
Flags: needinfo?(gps)
Responding to comment 37, I performed testing when this landed. If this has been verified on 34, I don't think verification on 33 is needed. My steps to test are in comment 32.
Flags: needinfo?(gps)
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.