Closed Bug 1104918 Opened 5 years ago Closed 5 years ago

ts_paint_cold in Win7 talos-xperf is crashing starting Nov 21 as a result of fx-team push fc5225b5022b

Categories

(Toolkit :: Storage, defect)

x86
Windows 7
defect
Not set
Points:
1

Tracking

()

RESOLVED FIXED
mozilla37
Iteration:
36.3
Tracking Status
firefox36 --- fixed
firefox37 --- fixed

People

(Reporter: jmaher, Assigned: mak)

References

Details

(Keywords: crash, regression)

In bug 1101478, we started causing the xperf job on Talos (which only runs on Win7) to crash.  This crash is documented in bug 1103966.

After a bunch of retriggers:
https://treeherder.mozilla.org/#/jobs?repo=fx-team&searchQuery=xperf&fromchange=5ba06e4f49e8&tochange=c773c6be9270

we traced it down to:
https://hg.mozilla.org/integration/fx-team/rev/fc5225b5022b
The code changes contain nothing that could cause a crash by themselves. Provided we can trust the regression range, those changes altered some timing, but I'm not sure whether this is a thread-safety issue or a use-after-shutdown.
It looks like we are here:
http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageConnection.cpp#1239
and the comptr being destroyed looks like the pragma statement (mostly guessing).
It's possible the connection handle is invalid.

Does this reproduce when running ts_paint_cold on a local machine? (Is that still possible with current Talos?)
For now I'm moving this to Toolkit/Storage since we are crashing in SQLite.
Component: General → Storage
Product: Firefox → Toolkit
I am not sure whether this can be reproduced locally.  I only have a Win7 VM, which I need to rebuild because it has no space left to install anything.

Talos is easy to run locally:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

For this specific test case it would be something like:
./talos -e <path>\firefox.exe -a ts_paint_cold --develop
:mak, can you help us figure out a path to fixing this?  If you need help reproducing this locally, or want me to get a loaner machine set up for you, just name it!
Flags: needinfo?(mak77)
This is showing up on the sheriffs' top 10 list of starred issues.  RyanVM, should we back this out?  I presume that whoever has time to look into this can then fix it and reland the change :)
Flags: needinfo?(ryanvm)
I need to try setting up Talos locally and run the failing test. Do we have updated docs on how to do that? The last time I tried, the docs were really lacking.
I assume this would also fail on Win8.1? That's what I have here to test.
This should fail on Win8, but I am not certain of that.

Here are the docs:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

In general, all the setup requirements are installed while bootstrapping the Python environment.

To run it once it is set up:
talos -e `which firefox` -a ts_paint_cold --develop
30/30 green Win7 PGO xperf with that backed out.
I thought it was ts_paint_cold, not xperf. Is there a difference in how to run those?

I was able to run the test (ts_paint_cold) as explained in comment 7 more than a hundred times, and it never crashed. (That was yesterday; now the tests don't work anymore for whatever reason. The documentation is lacking if you use mozilla-build on Windows.)
That means it's likely timing/scheduler sensitive, and we are unlikely to be able to crash in a debug build that could give us more details.
If you are able to crash on a tinderbox with a debug build, that could help (we'd have useful assertions from our code and the SQLite code).
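For anyone who wants to repeat the "run it a hundred times" experiment, here is a minimal sketch of a retry harness. It only assumes the Talos command line quoted earlier in this bug; the firefox path is a placeholder, and it is an assumption (not verified here) that Talos exits with a non-zero status when the browser crashes.

```python
import subprocess
import sys

# Assumption: same invocation as quoted in this bug; replace the firefox path.
TALOS_CMD = ["talos", "-e", "/path/to/firefox", "-a", "ts_paint_cold", "--develop"]


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for i in range(runs):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failures += 1
            # Report which iteration failed so the matching log can be found.
            print(f"run {i + 1}: exit code {result.returncode}", file=sys.stderr)
    return failures


# Example (not executed here): count_failures(TALOS_CMD, runs=100)
```

A count of zero after many runs does not rule out the crash; it only suggests the local machine does not hit the same timing as the test slaves.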

Offhand, the problem seems related to running initializeClone very late in the shutdown cycle, but all of the calls seem to verify that we can proceed, so I couldn't find a clear issue for now.
I guess I'll have to make some experimental patches and test those on Try.
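To illustrate the kind of problem being guessed at (a clone racing with shutdown), here is a generic sketch of a check-then-use guard. It is not the mozStorage code and does not claim to describe the actual cause; the `Connection` class, its `_handle` field, and the method names are all hypothetical stand-ins. The point is only that the "can we proceed?" check and the use of the handle need to happen under the same lock as the close.

```python
import threading


class Connection:
    """Hypothetical stand-in for a storage connection; not mozStorage code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handle = object()  # stands in for the native database handle

    def close(self):
        # Shutdown path: drop the handle while holding the lock.
        with self._lock:
            self._handle = None

    def clone(self):
        # Check and use the handle under the same lock, so close() cannot
        # run between the validity check and the use.
        with self._lock:
            if self._handle is None:
                return None  # already closed; refuse to clone
            return Connection()
```

If the check were done outside the lock, a close on another thread could invalidate the handle right after the check passed, which would be timing dependent in the way described above.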

For now we are not blocked on the fix in bug 1101478 (we will be in the future, though). While I don't think it is the cause of the crash, it is likely improving shutdown time and thus uncovering an existing bug there. If this is very problematic for sheriffs, you can back it out for now and we can keep investigating on Try.
Assignee: nobody → mak77
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #10)
> I thought it was ts_paint_cold, not xperf. Is there a difference in how to
> run those?

ts_paint_cold runs in xperf. From bug 1103966:
"buildname: Windows 7 32-bit mozilla-inbound talos xperf"
Summary: ts_paint_cold is crashing on win7 starting Nov 21 as a result of fx-team push fc5225b5022b → ts_paint_cold and xperf are crashing on win7 starting Nov 21 as a result of fx-team push fc5225b5022b
Summary: ts_paint_cold and xperf are crashing on win7 starting Nov 21 as a result of fx-team push fc5225b5022b → ts_paint_cold in Win7 talos-xperf is crashing starting Nov 21 as a result of fx-team push fc5225b5022b
https://hg.mozilla.org/mozilla-central/rev/8e5fce59af55
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37