Open Bug 1544735 Opened 2 years ago Updated 2 years ago

Firefox is crashing with builds from around 2017-07-19

Categories

(Testing :: mozregression, defect, P3)

Version 3
defect

Tracking

(Not tracked)

People

(Reporter: calixte, Unassigned)

Details

When I run:
mozregression --good 2017-07-01

Firefox is crashing (no way to get a crash report)

The output in the console:
0:00.25 INFO: No 'bad' option specified, using 2019-04-16
0:00.45 WARNING: Skipping build 2019-04-16: Unable to find build info for 2019-04-16
0:01.28 INFO: Testing good and bad builds to ensure that they are really good and bad...
0:01.28 INFO: Downloading build from: https://archive.mozilla.org/pub/firefox/nightly/2017/07/2017-07-01-10-02-36-mozilla-central/firefox-56.0a1.en-US.linux-x86_64.tar.bz2
===== Downloaded 100% =====
0:02.11 INFO: Running mozilla-central build for 2017-07-01
0:13.08 INFO: Launching /tmp/tmpQl0FxU/firefox/firefox
0:13.08 INFO: Application command: /tmp/tmpQl0FxU/firefox/firefox -profile /tmp/tmpDvaVXd.mozrunner
0:13.08 INFO: application_buildid: 20170701100236
0:13.08 INFO: application_changeset: 587daa4bdc4b40b7053f4ca3b74723ca747f3b52
0:13.08 INFO: application_name: Firefox
0:13.08 INFO: application_repository: https://hg.mozilla.org/mozilla-central
0:13.08 INFO: application_version: 56.0a1

and if I run manually "/tmp/tmpQl0FxU/firefox/firefox -profile /tmp/tmpDvaVXd.mozrunner", it's ok.

I tried to use mozregression itself to find the regression:
mozregression --good 2017-07-01 --bad 2017-08-01

where a crashing Firefox counted as "good" and a non-crashing one as "bad", and I got this regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1b065ffd8a535a0ad4c39a912af18e948e6a42c1&tochange=0985725c848ec0cfc6f2f3c3a5aa3d71321e7620
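
For reference, the two changesets bounding that range can be pulled out of the pushlog URL with the Python standard library. This is only a sketch: the helper name pushlog_range is made up for illustration and is not part of mozregression.

```python
from urllib.parse import urlparse, parse_qs


def pushlog_range(url):
    """Return (fromchange, tochange) from an hg.mozilla.org pushloghtml URL.

    Hypothetical helper, not a mozregression API.
    """
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; take the first of each.
    return query["fromchange"][0], query["tochange"][0]


url = (
    "https://hg.mozilla.org/mozilla-central/pushloghtml"
    "?fromchange=1b065ffd8a535a0ad4c39a912af18e948e6a42c1"
    "&tochange=0985725c848ec0cfc6f2f3c3a5aa3d71321e7620"
)
first, last = pushlog_range(url)
print(first, last)
```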

My OS is Debian (#1 SMP Debian 4.19.16-1 (2019-01-17)) and the mozregression version is 2.3.40.

(In reply to Calixte Denizet (:calixte) from comment #0)

> and if I run manually "/tmp/tmpQl0FxU/firefox/firefox -profile /tmp/tmpDvaVXd.mozrunner", it's ok.

Hmm, how odd. I wonder what the difference could be? There shouldn't be any difference between running that command standalone vs. running it with mozregression, to my knowledge.

Here's where we're launching the browser, via mozrunner:

https://github.com/mozilla/mozregression/blob/master/mozregression/launchers.py#L224

It's possible there's a mozbase bug here, maybe in mozrunner or mozprofile. I've been planning to unpin those dependencies soon when not in CI (I think they're currently pinned to rather old versions); perhaps that will fix it.

I found something:
if I comment out line 36 of mozrunner/base/browser.py (i.e. self.env['MOZ_NO_REMOTE'] = '1' in class GeckoRuntimeRunner), there is no crash.
So if I run the same command that mozrunner runs, but with MOZ_NO_REMOTE=1, I get:

[calixte@rouxpanda] /tmp$ MOZ_NO_REMOTE=1 /tmp/tmpUSScSX/firefox/firefox -profile /tmp/tmprpIZU9.mozrunner
Sandbox: seccomp sandbox violation: pid 14039, tid 14039, syscall 217, args 38 140417745686576 32768 1472 0 140418363687608.  Killing process.
[Parent 13968] WARNING: pipe error (54): Connexion ré-initialisée par le correspondant: file /home/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353

###!!! [Parent][MessageChannel] Error: (msgtype=0x28007E,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

[Parent 13968] WARNING: waitpid failed pid:14039 errno:10: file /home/worker/workspace/build/src/ipc/chromium/src/base/process_util_posix.cc, line 276
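
To make the sandbox line easier to read, here is a rough sketch that extracts the fields from it. The regular expression is only fitted to the single line quoted above, and the syscall table is a deliberately partial assumption: in the x86_64 Linux numbering (which matches the linux-x86_64 build downloaded earlier), syscall 217 is getdents64. The names parse_violation and X86_64_SYSCALLS are made up for illustration. This only decodes the message; it does not by itself explain why the sandbox rejected the call.

```python
import re

# Partial x86_64 Linux syscall table (assumption: the process is x86_64).
# Only the number seen in the log above is listed.
X86_64_SYSCALLS = {217: "getdents64"}

SECCOMP_RE = re.compile(
    r"seccomp sandbox violation: pid (?P<pid>\d+), tid (?P<tid>\d+), "
    r"syscall (?P<syscall>\d+), args (?P<args>[\d ]+)\."
)


def parse_violation(line):
    """Return the fields of a seccomp violation line, or None if no match."""
    match = SECCOMP_RE.search(line)
    if match is None:
        return None
    number = int(match.group("syscall"))
    return {
        "pid": int(match.group("pid")),
        "tid": int(match.group("tid")),
        "syscall": number,
        "name": X86_64_SYSCALLS.get(number, "unknown"),
        "args": [int(arg) for arg in match.group("args").split()],
    }


line = (
    "Sandbox: seccomp sandbox violation: pid 14039, tid 14039, syscall 217, "
    "args 38 140417745686576 32768 1472 0 140418363687608.  Killing process."
)
info = parse_violation(line)
print(info["name"], info["args"])
```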

It also crashes with --no-remote or --new-instance.

It works fine with "MOZ_DISABLE_CONTENT_SANDBOX=1".
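
In case someone wants to repeat the comparison, here is a minimal sketch that runs one command under the three environments discussed above. It uses only the standard library rather than mozrunner; run_with_env, compare and VARIANTS are names invented for this example. One caveat: the exit code of the parent process may not reflect a killed content process, so the console output is still worth watching.

```python
import os
import subprocess

# Environment variants taken from the comments above.
VARIANTS = {
    "plain": {},
    "no-remote": {"MOZ_NO_REMOTE": "1"},
    "no-remote, sandbox off": {
        "MOZ_NO_REMOTE": "1",
        "MOZ_DISABLE_CONTENT_SANDBOX": "1",
    },
}


def run_with_env(cmd, extra_env, timeout=30):
    """Run cmd with extra_env layered over the current environment.

    Returns the exit code, or None if the process was still running when
    the timeout expired (subprocess.run kills it in that case).
    """
    env = dict(os.environ)
    env.update(extra_env)
    try:
        return subprocess.run(cmd, env=env, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None


def compare(cmd, timeout=30):
    """Run cmd once per variant and return {label: exit code or None}."""
    return {
        label: run_with_env(cmd, extra, timeout)
        for label, extra in VARIANTS.items()
    }


# Example call, using the paths from the session above:
# compare(["/tmp/tmpUSScSX/firefox/firefox",
#          "-profile", "/tmp/tmprpIZU9.mozrunner"])
```

A result of None here just means the browser was still up when the timeout hit, which for this bug is the "no crash" outcome.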

The priority flag is not set for this bug.
:wlach, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(wlachance)

Going to redirect this to :ahal, as this is far from my area of expertise.

Flags: needinfo?(wlachance) → needinfo?(ahal)

I think the needinfo was just to set a priority :). I'll set P3 to satisfy the nag.

As for the problem itself, I'm not sure what's going on, but a good next step would be to find someone who's familiar with MOZ_DISABLE_CONTENT_SANDBOX and try to figure out what mozregression does differently with respect to it.

Flags: needinfo?(ahal)
Priority: -- → P3

Hmm, reading over the bug I'm wondering if this is a problem with current builds of Firefox. Calixte, can you confirm this is only a problem with builds from 2017?

Realistically I'm not sure if making mozregression work for this narrow edge case of very old builds on a minority platform would be a good use of time.

Flags: needinfo?(cdenizet)

:wlach, yep, the crashes stopped in September 2017.

Flags: needinfo?(cdenizet)