add quoting to add-on name/version pairs in crash report annotations

RESOLVED FIXED in mozilla17

Status

()

Toolkit
Add-ons Manager
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: lars, Assigned: mossop)

Tracking

unspecified
mozilla17
x86_64
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
one of the most common failures in Socorro crash processing involves the list of add-ons.  Breakpad offers it to Socorro as a comma delimited list of id-name/version pairs.  Unfortunately, there are no standards for what the add-on developer can use for their version number scheme.  That means that they embed commas and wreck Socorro's ability to parse the string easily. Just the addition of quotes around the  id-name/version pairs would simplify the Socorro code and gain more accuracy in listing add-ons in the crash reporter.

example:
  xxx@adblock.mozdev.org:2,1,3,{a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7}:2011,11,07

becomes:
  "xxx@adblock.mozdev.org:2,1,3","{a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7}:2011,11,07"

that simple change would eliminate a entire class of Socorro processing problems (until we find an add-on using quotes)...
Component: Breakpad Integration → Add-ons Manager
QA Contact: breakpad.integration → add-ons.manager
Summary: requesting a change in meta data crash format → add quoting to add-on name/version pairs in crash report annotations
(Reporter)

Comment 1

5 years ago
you know, it might be even simpler to just use pipe '|' as the separator instead of the comma.  I've not seen any add-on us the pipe in a versions number.
(Assignee)

Comment 2

5 years ago
We could also run the name and version through encodeURIComponent to encode any quotes or pesky ":" characters. Would that be useful?
(Reporter)

Comment 3

5 years ago
if encodeURIComponent were to be used, then I'd have to be able to decode with a python function.  It appears that the neither the python functions urllib.unquote nor urllib.unquote_plus are antipodal functions to javascript's encodeURIComponent.  

I think that perhaps we can resolve this problem more simply with just changing the delimeter to pipe '|' instead of comma.  I've just searched the database and the pipe character doesn't appear anywhere in a version string.

example:
  xxx@adblock.mozdev.org:2,1,3,{a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7}:2011,11,07

becomes:
  xxx@adblock.mozdev.org:2,1,3|{a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7}:2011,11,07
but if the version strings are not in any way limited, then that means someone can still break you in the future by including whatever delimiter you chose.
(Assignee)

Comment 5

5 years ago
(In reply to K Lars Lohn [:lars] [:klohn] from comment #3)
> if encodeURIComponent were to be used, then I'd have to be able to decode
> with a python function.  It appears that the neither the python functions
> urllib.unquote nor urllib.unquote_plus are antipodal functions to
> javascript's encodeURIComponent.  

All encodeURIComponent does is replace anything outside of a set of characters with their %xx encoded version. Looks like urllib.unquote would decode that perfectly to me.

(In reply to Ted Mielczarek [:ted] from comment #4)
> but if the version strings are not in any way limited, then that means
> someone can still break you in the future by including whatever delimiter
> you chose.

Yeah I'd like to avoid a game of making changes whenever we see them. Encoding in some fashion should make us future proof.
(Reporter)

Comment 6

5 years ago
a quick test of both urllib.unquote and urllib.unquote_plus fails on the output on this site:  http://www.w3schools.com/jsref/jsref_encodeuricomponent.asp

however, on reexamination, I see that it only fails on the 'special' unicode characters.  It works for ASCII: comma, pipe, single quote, double quote.  I think this could work.

so the plan would have to be
  1 - encode the name and version separately
  2 - stick encoded name and version together with a colon
  3 - do the same with rest of the add-ons, delimiting the add-ons with commas

on the Socorro side:
  1 - split add-ons by the comma delimiter
  2 - split the name version pair by the colon delimiter
  3 - unquote the name and version for saving in the database

how does that sound?
(Assignee)

Comment 7

5 years ago
Pushed a patch to try that does exactly that.

The data we send is cached and at the moment I haven't forced it to rebuild the cache, it will do so naturally whenever the user upgrades to a new version of Firefox or they make a change to their installed extensions. Let me know if that seems like a problem.
Assignee: nobody → dtownsend+bugmail
Status: NEW → ASSIGNED
(Assignee)

Comment 8

5 years ago
Created attachment 623164 [details] [diff] [review]
patch rev 1

Said patch.
Attachment #623164 - Flags: review?(bmcbride)
Attachment #623164 - Flags: review?(bmcbride) → review+
(In reply to Dave Townsend (:Mossop) from comment #7)
> The data we send is cached and at the moment I haven't forced it to rebuild
> the cache, it will do so naturally whenever the user upgrades to a new
> version of Firefox or they make a change to their installed extensions. Let
> me know if that seems like a problem.

The data for bootstrapped (restartless) extensions will get escaped right away. Which means that the data the crash reporter sends can have a mix of escaped and non-escaped addon info.

Lars: Can you confirm that that would be ok?
(Assignee)

Comment 10

5 years ago
I'm guessing this shouldn't be a problem, unless we have version numbers with % in them, in which case the unescaping will probably go wrong. Do we have any of those? Crash stats probably has a better database of used version numbers than anything else!
(Assignee)

Comment 11

5 years ago
Waiting on feedback from Lars before landing.
(Assignee)

Comment 12

5 years ago
Lars: ping, I'd like to hear feedback on the above before landing this.
(Reporter)

Comment 13

5 years ago
sorry, I'm lost in a turbulent sea of urgent bugs.

Unfortunately, there are addons with the '%' character in the version number.  While the examples of this are clearly formatting errors on the part of the add-on writer, we get them never the less:

breakpad=> select distinct extension_id, extension_version from extensions_20120521 where extension_version like E'%\\%%';
              extension_id              |     extension_version      
----------------------------------------+----------------------------
 {D19CA586-DD6C-4a0a-96F8-14644F340D60} | %ScriptScanProductVersion%
 moveplayer@movenetworks.com            | 1.0.0.%(version)s
(2 rows)

If this change were to land soon, what versions of FF would get it?  How soon would Socorro start receiving crashes with encoded addon info?

I've added the encoding to the new processor slated for Socorro 13 (three weeks out - conceivably delayed by the newly urgent Rapid Beta system), but if the crashes of this type were to start coming in before Socorro 13, I'm going to have to backport the encoding stuff into the current processor.  While not particularly onerous, I'd have to work it into my schedule.
(Assignee)

Comment 14

5 years ago
(In reply to K Lars Lohn [:lars] [:klohn] from comment #13)
> sorry, I'm lost in a turbulent sea of urgent bugs.
> 
> Unfortunately, there are addons with the '%' character in the version
> number.  While the examples of this are clearly formatting errors on the
> part of the add-on writer, we get them never the less:
> 
> breakpad=> select distinct extension_id, extension_version from
> extensions_20120521 where extension_version like E'%\\%%';
>               extension_id              |     extension_version      
> ----------------------------------------+----------------------------
>  {D19CA586-DD6C-4a0a-96F8-14644F340D60} | %ScriptScanProductVersion%
>  moveplayer@movenetworks.com            | 1.0.0.%(version)s
> (2 rows)

Looks like urllib.unquote just ignores %'s that aren't followed by two hex characters so at least those two cases would remain the same with the additional unescaping we're talking about.

> If this change were to land soon, what versions of FF would get it?  How
> soon would Socorro start receiving crashes with encoded addon info?

If I landed it today then Nightly users from as early as tomorrow would start submitting crash data, Aurora users from June 5th, Beta users from 6 weeks after that and Release users 6 weeks later again.

> I've added the encoding to the new processor slated for Socorro 13 (three
> weeks out - conceivably delayed by the newly urgent Rapid Beta system), but
> if the crashes of this type were to start coming in before Socorro 13, I'm
> going to have to backport the encoding stuff into the current processor. 
> While not particularly onerous, I'd have to work it into my schedule.

Ok, so sounds like we should just put this on hold till the Socorro side is updated. Let me know when that happens.
(Reporter)

Updated

5 years ago
Depends on: 757650
(Reporter)

Comment 15

5 years ago
the Socorro half of this issue (Bug 757650) has been merged into master on github.  It is targeted for Socorro 11 that goes into staging tomorrow, May 24.  It is slated for pushing into production on Wednesday, May 30.  On that day, production Socorro will be able to handle encoded add-on data, and this bug is released from being on hold.
(Assignee)

Comment 16

5 years ago
Lost this. Are we good to go now? Did everything make it into production as expected?
(Assignee)

Comment 17

5 years ago
(In reply to Dave Townsend (:Mossop) from comment #16)
> Lost this. Are we good to go now? Did everything make it into production as
> expected?

Lars?
(Reporter)

Comment 18

5 years ago
this feature landed in Socorro in late May.
(Assignee)

Comment 19

5 years ago
On inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/1f302f26be3c
Target Milestone: --- → mozilla17
https://hg.mozilla.org/mozilla-central/rev/1f302f26be3c
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.