Last Comment Bug 753900 - add quoting to add-on name/version pairs in crash report annotations
: add quoting to add-on name/version pairs in crash report annotations
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Add-ons Manager (show other bugs)
: unspecified
: x86_64 Linux
: -- normal (vote)
: mozilla17
Assigned To: Dave Townsend [:mossop]
:
:
Mentors:
Depends on: 757650
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-10 11:27 PDT by K Lars Lohn [:lars] [:klohn]
Modified: 2012-07-28 18:37 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch rev 1 (6.98 KB, patch)
2012-05-11 08:50 PDT, Dave Townsend [:mossop]
blair: review+
Details | Diff | Splinter Review

Description K Lars Lohn [:lars] [:klohn] 2012-05-10 11:27:07 PDT
one of the most common failures in Socorro crash processing involves the list of add-ons.  Breakpad offers it to Socorro as a comma delimited list of id-name/version pairs.  Unfortunately, there are no standards for what the add-on developer can use for their version number scheme.  That means that they embed commas and wreck Socorro's ability to parse the string easily. Just the addition of quotes around the  id-name/version pairs would simplify the Socorro code and gain more accuracy in listing add-ons in the crash reporter.

example:
  xxx@adblock.mozdev.org:2,1,3,{a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7}:2011,11,07

becomes:
  "xxx@adblock.mozdev.org:2,1,3","{a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7}:2011,11,07"

that simple change would eliminate a entire class of Socorro processing problems (until we find an add-on using quotes)...
Comment 1 K Lars Lohn [:lars] [:klohn] 2012-05-10 11:38:37 PDT
you know, it might be even simpler to just use pipe '|' as the separator instead of the comma.  I've not seen any add-on us the pipe in a versions number.
Comment 2 Dave Townsend [:mossop] 2012-05-10 11:41:11 PDT
We could also run the name and version through encodeURIComponent to encode any quotes or pesky ":" characters. Would that be useful?
Comment 3 K Lars Lohn [:lars] [:klohn] 2012-05-10 12:00:45 PDT
if encodeURIComponent were to be used, then I'd have to be able to decode with a python function.  It appears that the neither the python functions urllib.unquote nor urllib.unquote_plus are antipodal functions to javascript's encodeURIComponent.  

I think that perhaps we can resolve this problem more simply with just changing the delimeter to pipe '|' instead of comma.  I've just searched the database and the pipe character doesn't appear anywhere in a version string.

example:
  xxx@adblock.mozdev.org:2,1,3,{a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7}:2011,11,07

becomes:
  xxx@adblock.mozdev.org:2,1,3|{a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7}:2011,11,07
Comment 4 Ted Mielczarek [:ted.mielczarek] 2012-05-10 12:18:21 PDT
but if the version strings are not in any way limited, then that means someone can still break you in the future by including whatever delimiter you chose.
Comment 5 Dave Townsend [:mossop] 2012-05-10 12:23:47 PDT
(In reply to K Lars Lohn [:lars] [:klohn] from comment #3)
> if encodeURIComponent were to be used, then I'd have to be able to decode
> with a python function.  It appears that the neither the python functions
> urllib.unquote nor urllib.unquote_plus are antipodal functions to
> javascript's encodeURIComponent.  

All encodeURIComponent does is replace anything outside of a set of characters with their %xx encoded version. Looks like urllib.unquote would decode that perfectly to me.

(In reply to Ted Mielczarek [:ted] from comment #4)
> but if the version strings are not in any way limited, then that means
> someone can still break you in the future by including whatever delimiter
> you chose.

Yeah I'd like to avoid a game of making changes whenever we see them. Encoding in some fashion should make us future proof.
Comment 6 K Lars Lohn [:lars] [:klohn] 2012-05-10 12:38:20 PDT
a quick test of both urllib.unquote and urllib.unquote_plus fails on the output on this site:  http://www.w3schools.com/jsref/jsref_encodeuricomponent.asp

however, on reexamination, I see that it only fails on the 'special' unicode characters.  It works for ASCII: comma, pipe, single quote, double quote.  I think this could work.

so the plan would have to be
  1 - encode the name and version separately
  2 - stick encoded name and version together with a colon
  3 - do the same with rest of the add-ons, delimiting the add-ons with commas

on the Socorro side:
  1 - split add-ons by the comma delimiter
  2 - split the name version pair by the colon delimiter
  3 - unquote the name and version for saving in the database

how does that sound?
Comment 7 Dave Townsend [:mossop] 2012-05-10 15:53:06 PDT
Pushed a patch to try that does exactly that.

The data we send is cached and at the moment I haven't forced it to rebuild the cache, it will do so naturally whenever the user upgrades to a new version of Firefox or they make a change to their installed extensions. Let me know if that seems like a problem.
Comment 8 Dave Townsend [:mossop] 2012-05-11 08:50:45 PDT
Created attachment 623164 [details] [diff] [review]
patch rev 1

Said patch.
Comment 9 Blair McBride [:Unfocused] (UNAVAILABLE) 2012-05-12 00:21:42 PDT
(In reply to Dave Townsend (:Mossop) from comment #7)
> The data we send is cached and at the moment I haven't forced it to rebuild
> the cache, it will do so naturally whenever the user upgrades to a new
> version of Firefox or they make a change to their installed extensions. Let
> me know if that seems like a problem.

The data for bootstrapped (restartless) extensions will get escaped right away. Which means that the data the crash reporter sends can have a mix of escaped and non-escaped addon info.

Lars: Can you confirm that that would be ok?
Comment 10 Dave Townsend [:mossop] 2012-05-12 19:17:02 PDT
I'm guessing this shouldn't be a problem, unless we have version numbers with % in them, in which case the unescaping will probably go wrong. Do we have any of those? Crash stats probably has a better database of used version numbers than anything else!
Comment 11 Dave Townsend [:mossop] 2012-05-14 13:05:34 PDT
Waiting on feedback from Lars before landing.
Comment 12 Dave Townsend [:mossop] 2012-05-21 14:01:54 PDT
Lars: ping, I'd like to hear feedback on the above before landing this.
Comment 13 K Lars Lohn [:lars] [:klohn] 2012-05-21 14:40:56 PDT
sorry, I'm lost in a turbulent sea of urgent bugs.

Unfortunately, there are addons with the '%' character in the version number.  While the examples of this are clearly formatting errors on the part of the add-on writer, we get them never the less:

breakpad=> select distinct extension_id, extension_version from extensions_20120521 where extension_version like E'%\\%%';
              extension_id              |     extension_version      
----------------------------------------+----------------------------
 {D19CA586-DD6C-4a0a-96F8-14644F340D60} | %ScriptScanProductVersion%
 moveplayer@movenetworks.com            | 1.0.0.%(version)s
(2 rows)

If this change were to land soon, what versions of FF would get it?  How soon would Socorro start receiving crashes with encoded addon info?

I've added the encoding to the new processor slated for Socorro 13 (three weeks out - conceivably delayed by the newly urgent Rapid Beta system), but if the crashes of this type were to start coming in before Socorro 13, I'm going to have to backport the encoding stuff into the current processor.  While not particularly onerous, I'd have to work it into my schedule.
Comment 14 Dave Townsend [:mossop] 2012-05-21 14:52:10 PDT
(In reply to K Lars Lohn [:lars] [:klohn] from comment #13)
> sorry, I'm lost in a turbulent sea of urgent bugs.
> 
> Unfortunately, there are addons with the '%' character in the version
> number.  While the examples of this are clearly formatting errors on the
> part of the add-on writer, we get them never the less:
> 
> breakpad=> select distinct extension_id, extension_version from
> extensions_20120521 where extension_version like E'%\\%%';
>               extension_id              |     extension_version      
> ----------------------------------------+----------------------------
>  {D19CA586-DD6C-4a0a-96F8-14644F340D60} | %ScriptScanProductVersion%
>  moveplayer@movenetworks.com            | 1.0.0.%(version)s
> (2 rows)

Looks like urllib.unquote just ignores %'s that aren't followed by two hex characters so at least those two cases would remain the same with the additional unescaping we're talking about.

> If this change were to land soon, what versions of FF would get it?  How
> soon would Socorro start receiving crashes with encoded addon info?

If I landed it today then Nightly users from as early as tomorrow would start submitting crash data, Aurora users from June 5th, Beta users from 6 weeks after that and Release users 6 weeks later again.

> I've added the encoding to the new processor slated for Socorro 13 (three
> weeks out - conceivably delayed by the newly urgent Rapid Beta system), but
> if the crashes of this type were to start coming in before Socorro 13, I'm
> going to have to backport the encoding stuff into the current processor. 
> While not particularly onerous, I'd have to work it into my schedule.

Ok, so sounds like we should just put this on hold till the Socorro side is updated. Let me know when that happens.
Comment 15 K Lars Lohn [:lars] [:klohn] 2012-05-23 07:35:08 PDT
the Socorro half of this issue (Bug 757650) has been merged into master on github.  It is targeted for Socorro 11 that goes into staging tomorrow, May 24.  It is slated for pushing into production on Wednesday, May 30.  On that day, production Socorro will be able to handle encoded add-on data, and this bug is released from being on hold.
Comment 16 Dave Townsend [:mossop] 2012-07-18 15:19:47 PDT
Lost this. Are we good to go now? Did everything make it into production as expected?
Comment 17 Dave Townsend [:mossop] 2012-07-27 10:29:44 PDT
(In reply to Dave Townsend (:Mossop) from comment #16)
> Lost this. Are we good to go now? Did everything make it into production as
> expected?

Lars?
Comment 18 K Lars Lohn [:lars] [:klohn] 2012-07-27 10:43:23 PDT
this feature landed in Socorro in late May.
Comment 19 Dave Townsend [:mossop] 2012-07-27 13:47:38 PDT
On inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/1f302f26be3c
Comment 20 Ryan VanderMeulen [:RyanVM] 2012-07-28 18:37:30 PDT
https://hg.mozilla.org/mozilla-central/rev/1f302f26be3c

Note You need to log in before you can comment on or make changes to this bug.