Open Bug 1040231 Opened 10 years ago Updated 2 years ago

Malformed payloads being uploaded by hotfix addon

Categories

(Firefox :: General, defect)

defect

Tracking

()

People

(Reporter: gps, Unassigned)

References

(Blocks 1 open bug)

Details

The Telemetry server is detecting bad JSON in some client records. More details will be posted shortly.
Clarification: it's not bad JSON, it's bad binary-looking data.
I'm actively looking into this.

Early indications are that a lot of the payloads are similar.
Assignee: nobody → gps
Summary: Bad JSON in client records → Malformed payloads being uploaded
Here are the most common leading bytes out of a set of 645 records.

     52 0000000 5bd5 6f0b 38db fe12 842b 0381 205a f456
     50 0000000 5bd5 6f0b 38db fe12 842b 0381 5c5a e8ac
     42 0000000 5bd5 6f0b 38db fe12 842b 0381 5c5a e8ec
     24 0000000 5cd5 6f0b 38db fe12 842b 0381 205a f476
     21 0000000 5bd5 6f0b 38db fe12 842b 0381 205a f476
     16 0000000 5bd5 6f0b 48db fe0e 032b 0703 b8b4 d158
     16 0000000 5bd5 6f0b 38db fe12 842b 0381 5c1a e8ac
     14 0000000 5bd5 6f0b 38db fe12 842b 0381 205a a856
     11 0000000 5bd5 6f0b 38db fe12 842b 0381 5c5a 50ac
      9 0000000 5bd5 6f0b 38db fe12 842b 0381 5c5a 50ec
      8 0000000 5bd5 6f0b 38db fe12 842b 0381 201a f456
      7 0000000 5bd5 6f0b 38db fe12 842b 0381 205a a876
      6 0000000 5bd5 6f0b 48db fe0e 032b 0703 b8b4 d1d8
      6 0000000 5bd5 6f0b 48db fe0e 032b 0703 40b4 e8ac
      5 0000000 5bd5 6f0b 38db fe12 842b 0381 201a f476
      4 0000000 5dd5 6f0b b8db feb2 422b 0b80 816c d1da
      4 0000000 5dd5 6f0b b8db feb2 422b 0b80 40b4 e8ed
      4 0000000 5bd5 6f0b 48db fe0e 032b 0703 b834 d1d8
      4 0000000 5bd5 6f0b 48db fe0e 032b 0703 4034 e8ac
      3 0000000 5dd5 6f0d 38db fe93 422b 0380 5c5a 48ed
      3 0000000 5cd5 6f0b 38db fe12 842b c381 b8b5 d1d8
ASCII display.

     52 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 5a 20 56 f4  |.[.o.8..+...Z V.|
     50 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 5a 5c ac e8  |.[.o.8..+...Z\..|
     42 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 5a 5c ec e8  |.[.o.8..+...Z\..|
     24 00000000  d5 5c 0b 6f db 38 12 fe  2b 84 81 03 5a 20 76 f4  |.\.o.8..+...Z v.|
     21 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 5a 20 76 f4  |.[.o.8..+...Z v.|
     16 00000000  d5 5b 0b 6f db 48 0e fe  2b 03 03 07 b4 b8 58 d1  |.[.o.H..+.....X.|
     16 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 1a 5c ac e8  |.[.o.8..+....\..|
     14 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 5a 20 56 a8  |.[.o.8..+...Z V.|
     11 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 5a 5c ac 50  |.[.o.8..+...Z\.P|
      9 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 5a 5c ec 50  |.[.o.8..+...Z\.P|
      8 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 1a 20 56 f4  |.[.o.8..+.... V.|
      7 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 5a 20 76 a8  |.[.o.8..+...Z v.|
      6 00000000  d5 5b 0b 6f db 48 0e fe  2b 03 03 07 b4 b8 d8 d1  |.[.o.H..+.......|
      6 00000000  d5 5b 0b 6f db 48 0e fe  2b 03 03 07 b4 40 ac e8  |.[.o.H..+....@..|
      5 00000000  d5 5b 0b 6f db 38 12 fe  2b 84 81 03 1a 20 76 f4  |.[.o.8..+.... v.|
      4 00000000  d5 5d 0b 6f db b8 b2 fe  2b 42 80 0b b4 40 ed e8  |.].o....+B...@..|
      4 00000000  d5 5d 0b 6f db b8 b2 fe  2b 42 80 0b 6c 81 da d1  |.].o....+B..l...|
      4 00000000  d5 5b 0b 6f db 48 0e fe  2b 03 03 07 34 b8 d8 d1  |.[.o.H..+...4...|
      4 00000000  d5 5b 0b 6f db 48 0e fe  2b 03 03 07 34 40 ac e8  |.[.o.H..+...4@..|
I can confirm the data is not gunzipped. These should be the bytes that are coming over the wire.

Interpreted as UTF-16, it is gibberish:

寕漋㣛︒萫΁⁚ꡖⰻ뎐溝鵮뀶盓몁栈誉ᒵ�ﱽ᯷鋊묓≒⻶倨驛景⼸Ἆ훾펉뽎쯛凎饧熸Ḍᴭ᮫랦㰮ꖽ攼ѱ唏셍Ă䐕仐寿䬧㦮ă쩷痾넖䠈悂䥂焪ᙑ䜄㼝㓦饊麛脍湶镡໨竩駔൷�읝硭嶅㗇㓢릉�ᙇ艹咡鏥ꑵꤞ辟쐺ᤕ樉ﭟꞌ淢頬ᪿ喒辧㪏�ɽ⣐覈≷ഄē儖`ኞ搖ꚜ改뀱숭�猴驝�掄䖒苮⫲胏⑉戕ᄮ纠崺璜怠෮绵ㅾ楤䃈앴ফ偅㌨㉚蠱⤽对ˇ㿱쁾�ĥ딸ꍃﯮ堩滾퉭묛魚叧濊䮙蠯덻୹䣣ᜈ쾥ם⊉乛蠃⨆⏃⃱䙼餑切성娳㰾䓉㹜�△檓퍠髂取᭍餟䛷甚�糹䖴䀤䗀咖梓줹厀㲡醢㣇䕵埌⻊麵当๐᱁뮼艙䤄퉯祥ˮ騕株턌네ﹾĄ銜䋛栈돥靠틎鲜剮ݤᔟ�陇ꊢ藩൚줻晳큏᧚⒈萉⻝⽙㬧轛犖维薣䤐䲸⺦弫긫⤗ﱡ֕儁⨢鎎繬�﵊祸ќ穋掬ἅ쑃

Most of those are Han Chinese characters, but it doesn't make sense.
I think it may be as simple as a missing gzip header:

top 5 "bad" prefixes:
Bad:  ('d5 5b 0b 6f db 38 12 fe 2b 84 81 03 5a 20 56 f4', 67)
Bad:  ('d5 5b 0b 6f db 38 12 fe 2b 84 81 03 5a 5c ac e8', 50)
Bad:  ('d5 5b 0b 6f db 38 12 fe 2b 84 81 03 5a 5c ec e8', 33)
Bad:  ('d5 5c 0b 6f db 38 12 fe 2b 84 81 03 5a 20 76 f4', 27)
Bad:  ('d5 5b 0b 6f db 38 12 fe 2b 84 81 03 5a 20 76 f4', 20)

top 5 "good" prefixes:
Good: ('1f 8b 08 00 00 00 00 00 00 0b d5 5b 0b 6f db 38', 52841)
Good: ('1f 8b 08 00 00 00 00 00 00 0b d5 5b 09 6f db 3a', 13854)
Good: ('1f 8b 08 00 00 00 00 00 00 0b cd 5b 0b 6f db 38', 10730)
Good: ('1f 8b 08 00 00 00 00 00 00 0b d5 5c 0b 6f db 38', 6822)
Good: ('1f 8b 08 00 00 00 00 00 00 0b d5 5b 0b 6f db 48', 4167)

I'm going to try simply inserting 1f 8b 08 00 00 00 00 00 00 0b and see if that's the issue.
I'll need to packet capture to be sure, but I suspect that certain versions of Firefox (presumably old versions) aren't sending the expected header when using the stream converter service to convert "uncompressed" to "gzip."

Code at https://hg.mozilla.org/releases/firefox-hotfixes/file/d29a3d24405c/v20140527.01/resource/update.jsm#l2746

Honza: does this ring any bells?
Flags: needinfo?(honzab.moz)
I have some code I can use to uncompress this data so we don't lose these payloads - is that something you want to do?
Summary: Malformed payloads being uploaded → Malformed payloads being uploaded by hotfix addon
(In reply to Mark Reid [:mreid] from comment #8)
> I have some code I can use to uncompress this data so we don't lose these
> payloads - is that something you want to do?

If you can recover the payloads while introducing little risk, go for it.

The important thing is that we don't lose *all* the data going forward.

I'd be fine discarding the bad payloads if that is what you feel is necessary. My measurements showed these bad payloads only account for 1-1.5% of total payloads. Unfortunate, but not the end of the world.
I was able to packet capture an offending request:

POST /submit/hotfix/672759ba-6292-4881-ae93-1e004a9c1e6b HTTP/1.1
host: hotfix.telemetry.mozilla.org
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: es-MX,es;q=0.8,en-us;q=0.5,en;q=0.3
Cache-Control: no-cache
Content-Encoding: gzip
Content-Type: application/json; charset=UTF-8
Pragma: no-cache
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
X-Forwarded-For: <redacted>
X-Forwarded-Port: 443
X-Forwarded-Proto: https
Content-Length: 3385
Connection: keep-alive

Sample size of 1, but Firefox 10. Good times.
(In reply to Gregory Szorc [:gps] from comment #9)
> If you can recover the payloads while introducing little risk, go for it.
Ok, I've updated the code to attempt to recover the payloads with missing header info.  There is minimal risk in the added code.  It is running now and I haven't seen any further errors.

I extracted the relevant payloads from the backlog of incoming data - here are some stats.

Breakdown by os, windowsMajorVersion, windowsServicePackVersion:
      5 ["WINNT",5,0]
    114 ["WINNT",6,3]
    149 ["WINNT",6,0]
    220 ["WINNT",6,2]
    919 ["WINNT",5,1]
   2239 ["WINNT",6,1]

Breakdown by version, state.upgradedFrom (top 10 only):
     95 ["30.0",null]
    103 ["23.0.1","23.0.1"]
    128 ["25.0.1","25.0.1"]
    180 ["24.0","24.0"]
    202 ["30.0","26.0"]
    254 ["26.0","26.0"]
    260 ["30.0","27.0.1"]
    315 ["27.0.1","27.0.1"]
    487 ["30.0","28.0"]
    664 ["28.0","28.0"]

By channel:
      2 null
   3644 release

If there's anything else that could be helpful in diagnosing the problem, let me know.
Maybe ask Jonas, he knows XHR better than me.  I'm not sure you enforce compression by setting content-encoding request header on the XHR..
Flags: needinfo?(honzab.moz)
Jonas: Any clue what could be causing all versions of Firefox to "randomly" not send a gzip header with XHR?

Code at https://hg.mozilla.org/releases/firefox-hotfixes/file/d29a3d24405c/v20140527.01/resource/update.jsm#l2746
Flags: needinfo?(jonas)
Are you saying that the |request.setRequestHeader("Content-Encoding", "gzip");| call sometimes appears to be ignored?

Sounds strange to me, I have no idea why that would be. XHR code doesn't do anything special with that header. Possibly necko touches that header sometimes?
Flags: needinfo?(jonas) → needinfo?(jduell.mcbugs)
Yes, we set Content-Encoding: gzip via setRequestHeader.

https://hg.mozilla.org/releases/firefox-hotfixes/file/afc361041e8a/v20140527.01/resource/update.jsm#l2757

Ruling out a Gecko bug, the only other thing I can think of is in intermediate network device mucking about with the payload. I'd hope those network agents would forward the original request as-is after inspecting. But if I've learned anything developing networked applications it is that you can't trust network stacks and proxies.

If we wanted to rule out Gecko, we could:

a) have the client verify/record the payload begins with the gzip header
b) Include a hash of the payload with the request so the server can detect payload tampering
I don't see any code in the http stack that ever strips content-encoding from an HTTP request.  It's certainly possible that an intermediate device could strip it.  

The payload hash is a possible diagnostic tool, yes.  We could also consider using https for these transactions?
Flags: needinfo?(jduell.mcbugs)
(In reply to Jason Duell (:jduell) from comment #16)
> The payload hash is a possible diagnostic tool, yes.  We could also consider
> using https for these transactions?

We are using TLS. But trusted CA certs could be in play.
Assignee: gps → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.