Closed
Bug 770605
Opened 13 years ago
Closed 12 years ago
poison write during shutdown in release builds too
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: espindola, Assigned: espindola)
References
Details
(Whiteboard: [snappy:p1])
Attachments
(1 obsolete file)
In bug 732173 we got write poisoning enabled in debug builds. We now need to make that available on release builds too. The final objective is to have:
* Release builds default to calling _exit(0), but can optionally poison write and do a full shutdown.
* Debug builds default to poisoning write and doing a full shutdown, but can optionally call _exit(0) instead.
Before we get there we also want to have write poisoning enabled on a nightly to try to find writes not covered by our tests, so the objective of this bug is to get write poisoning enabled on both release and debug builds.
Assignee | ||
Comment 1•13 years ago
|
||
I have tested it locally by moving the poisoning earlier and checking that our bug reporter shows up during shutdown of a release build.
There is a try run at
https://tbpl.mozilla.org/?tree=Try&pusher=respindola@mozilla.com
Assignee: nobody → respindola
Status: NEW → ASSIGNED
Attachment #644992 -
Flags: review?(ted.mielczarek)
Comment 2•13 years ago
|
||
I'm all for write poisoning so we can get data on what writes are being lost. However I think
a) we should call exit(0) on nightlies
b) poisoning should be done non-destructively and cause a stack to be recorded, which would get submitted via telemetry.
I think this means that we should do poisoning randomly, so we can test both exit(0) and poisoned path
Assignee | ||
Comment 3•13 years ago
|
||
(In reply to Taras Glek (:taras) from comment #2)
> I'm all for write poisoning so we can get data on what writes are being
> lost. However I think
> a) we should call exit(0) on nightlies
> b) poisoning should be done non-destructively and cause a stack to be
> recorded, which would get submitted via telemetry.
>
> I think this means that we should do poisoning randomly, so we can test both
> exit(0) and poisoned path
Calling exit(0) and poisoning (crash or telemetery) are exclusive. I am not sure I understand when you want us to do one or the other.
Comment 4•13 years ago
|
||
To clarify:
* debug should poison with crash always
* release should exit(0) or poison 50% of the time depending on rand().
Comment 5•13 years ago
|
||
(In reply to Taras Glek (:taras) from comment #4)
> * release should exit(0) or poison 50% of the time depending on rand().
I meant
* nightly should exit(0) or poison 50% of the time depending on rand().
Non-nightly branches should exit(0) exclusively.
Assignee | ||
Comment 6•13 years ago
|
||
A quick summary from IRC.
We will now have 3 modes of operation which can be selected at runtime (via an env var most likely):
MOZ_POISON=NO ./firefox : will call _exit(0)
MOZ_POISON=Crash ./firefox : will crash on a late write
MOZ_POISON=Report ./firefox : will report a late write via telemetry with a stack trace.
The defaults will be
* Debug builds: Crash
* Nightly builds: Report
* Aurora, Beta and Releases: No
Assignee | ||
Comment 7•13 years ago
|
||
Comment on attachment 644992 [details] [diff] [review]
poison write during shutdown in release builds
Per discussion with Taras this patch is obsolete.
Attachment #644992 -
Attachment is obsolete: true
Attachment #644992 -
Flags: review?(ted.mielczarek)
Comment 8•13 years ago
|
||
Also, note this would be 50% of telemetry users. It doesn't really help to poison writes for non-telemetry users since we wouldn't report those.
Updated•13 years ago
|
Whiteboard: [snappy]
Comment 9•13 years ago
|
||
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #6)
> A quick summary from IRC.
>
> We will now have 3 modes of operation which can be selected at runtime (via
> an env var most likely):
>
> MOZ_POISON=NO ./firefox : will call _exit(0)
> MOZ_POISON=Crash ./firefox : will crash on a late write
> MOZ_POISON=Report ./firefox : will report a late write via telemetry with a
> stack trace.
>
> The defaults will be
> * Debug builds: Crash
> * Nightly builds: Report
> * Aurora, Beta and Releases: No
I think this mostly makes sense except for the Release channel. Currently calling _exit(0) on release builds is data-loss by default because of late writes coming from add-ons etc. We should focus on fixing that problem first I think.
Comment 10•13 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #9)
> I think this mostly makes sense except for the Release channel. Currently
> calling _exit(0) on release builds is data-loss by default because of late
> writes coming from add-ons etc. We should focus on fixing that problem
> first I think.
Lets see how it goes on nightly before we worry about release policy too much.
Comment 11•13 years ago
|
||
(In reply to comment #10)
> (In reply to Ehsan Akhgari [:ehsan] from comment #9)
>
> > I think this mostly makes sense except for the Release channel. Currently
> > calling _exit(0) on release builds is data-loss by default because of late
> > writes coming from add-ons etc. We should focus on fixing that problem
> > first I think.
>
> Lets see how it goes on nightly before we worry about release policy too much.
For sure. But in this particular case we already know that we'll have this problem. We just don't know which add-ons. :-)
Comment 12•13 years ago
|
||
I don't like the idea of making it random. Random would mean that, 25% of the time, the same mode will be chosen for three different tests. (1:8 chance of being exit(0) plus 1:8 chance of write poisoning) It should just alternate.
Comment 13•13 years ago
|
||
Ugh, submitted too early.
Alternation means a 100% chance after two tests (or even three) that it worked. This is more in line with how I believe most addon developers do testing. It would take around 10 tests to make the possibility of all tests being in the same mode small enough to be statistically insignificant, if we stick with random.
Comment 14•13 years ago
|
||
(In reply to comment #12)
> I don't like the idea of making it random. Random would mean that, 25% of the
> time, the same mode will be chosen for three different tests. (1:8 chance of
> being exit(0) plus 1:8 chance of write poisoning) It should just alternate.
We could compute an initial random number and store the result in a pref or something, so that people would face consistent behavior across multiple runs, but we'd still end up with two buckets of results.
Assignee | ||
Comment 15•13 years ago
|
||
> > The defaults will be
> > * Debug builds: Crash
> > * Nightly builds: Report
> > * Aurora, Beta and Releases: No
>
> I think this mostly makes sense except for the Release channel. Currently
> calling _exit(0) on release builds is data-loss by default because of late
> writes coming from add-ons etc. We should focus on fixing that problem
> first I think.
Correct. That is just where we want to get. Currently we have debug builds crash and other builds do nothing. The next step I think is implementing support for reporting for the nightly builds.
Updated•13 years ago
|
Whiteboard: [snappy] → [snappy:p1]
Assignee | ||
Comment 16•12 years ago
|
||
This has been fixed some time ago. We are getting late writes reports in telemetry.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•