Performance (b2gperf) tests crashing on b2g-inbound builds

RESOLVED FIXED

Status

P1
blocker
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: davehunt, Assigned: hub)

Tracking

(4 keywords)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [c=automation p=1 s= u=])

Attachments

(2 attachments)

Since yesterday the performance tests have been crashing after just a few iterations. I can reproduce this locally and it doesn't appear to matter which application is being tested (replicated with both Phone and Contacts).

device_firmware_date: 1403855878
device_firmware_version_incremental: 110
device_firmware_version_release: 4.3
device_id: flame

Last good:
application_buildid: 20140730104209
application_changeset: b8d783033da7
build_changeset: 3aa6abd313f965a84aa86c6b213dc154e4875139
gaia_changeset: b67ddd7d40b52e65199478b8d6631c2c28fdf41d
gaia_date: 1406740488
platform_buildid: 20140730104209
platform_changeset: b8d783033da7

First bad:
application_buildid: 20140730105005
application_changeset: 4cc9e0c5dd67
build_changeset: 3aa6abd313f965a84aa86c6b213dc154e4875139
gaia_changeset: c2d7dafab9dcadf1b5a099972d4c7647dcc4e276
gaia_date: 1406740488
platform_buildid: 20140730105005
platform_changeset: 4cc9e0c5dd67
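
To make the difference between the two builds easier to see, here is a minimal sketch that compares the fields listed above. The dictionaries are copied from this report; the helper name `changed_fields` is only illustrative and is not part of b2gperf.

```python
# Build metadata copied from the "Last good" and "First bad" lists above.
last_good = {
    "application_buildid": "20140730104209",
    "application_changeset": "b8d783033da7",
    "build_changeset": "3aa6abd313f965a84aa86c6b213dc154e4875139",
    "gaia_changeset": "b67ddd7d40b52e65199478b8d6631c2c28fdf41d",
    "gaia_date": "1406740488",
    "platform_buildid": "20140730104209",
    "platform_changeset": "b8d783033da7",
}
first_bad = {
    "application_buildid": "20140730105005",
    "application_changeset": "4cc9e0c5dd67",
    "build_changeset": "3aa6abd313f965a84aa86c6b213dc154e4875139",
    "gaia_changeset": "c2d7dafab9dcadf1b5a099972d4c7647dcc4e276",
    "gaia_date": "1406740488",
    "platform_buildid": "20140730105005",
    "platform_changeset": "4cc9e0c5dd67",
}


def changed_fields(good, bad):
    """Return {field: (good_value, bad_value)} for every field that differs."""
    return {
        key: (good.get(key), bad.get(key))
        for key in sorted(set(good) | set(bad))
        if good.get(key) != bad.get(key)
    }


diff = changed_fields(last_good, first_bad)
for field, (good_value, bad_value) in diff.items():
    print(f"{field}: {good_value} -> {bad_value}")
```

With the values above, both the gecko (application/platform) changeset and the gaia changeset differ between the two builds, while build_changeset and gaia_date are identical.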

Comment 1

4 years ago
Hub,

I need you to look into this and identify the root cause. Please work with Dave Hunt and anyone else needed to resolve this.

Thanks,
Mike
Severity: normal → blocker
Status: NEW → ASSIGNED
Component: General → Performance
Keywords: perf
Priority: -- → P1
Whiteboard: [c=automation p= s= u=]
Example console output demonstrating the issue is below. When I replicate this locally, I see the device reboot.

b2gperf --address=localhost:2828 --device=356cd072 --delay=10 --sources=sources.xml --testvars=/home/webqa/webqa-credentials/b2g/b2g-13.1.json --dz-project=b2g --dz-branch=master --dz-device=flame --dz-key=**** --dz-secret=**** --dz-build-url=http://jenkins1.qa.scl3.mozilla.com/job/flame.b2g-inbound.perf.b2gperf/950/ --reset Phone Contacts Messages Settings Gallery Video Music Camera Email Calendar Clock FM Radio Usage Template Browser
2014-07-30 11:56:51,959 B2GPerfRunner INFO | Running B2GPerfLaunchTest
2014-07-30 11:58:37,820 B2GPerfRunner INFO | Phone [1/30]
2014-07-30 11:58:49,098 B2GPerfRunner INFO | Phone [2/30]
2014-07-30 11:59:00,470 B2GPerfRunner INFO | Phone [3/30]
2014-07-30 11:59:11,797 B2GPerfRunner INFO | Phone [4/30]
2014-07-30 11:59:23,183 B2GPerfRunner INFO | Phone [5/30]
2014-07-30 11:59:34,167 B2GPerfRunner INFO | Phone [6/30]
2014-07-30 11:59:45,144 B2GPerfRunner INFO | Phone [7/30]
2014-07-30 11:59:56,051 B2GPerfRunner INFO | Phone [8/30]
2014-07-30 12:00:07,158 B2GPerfRunner INFO | Phone [9/30]
Traceback (most recent call last):
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/bin/b2gperf", line 9, in <module>
    load_entry_point('b2gperf==0.32', 'console_scripts', 'b2gperf')()
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 595, in cli
    b2gperf.measure_app_perf(args)
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 201, in measure_app_perf
    test.run()
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 338, in run
    self.test()
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 377, in test
    'launch("%s")' % self.app_name)
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette/marionette.py", line 1166, in execute_async_script
    filename=os.path.basename(frame[0]))
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette/decorators.py", line 35, in _
    return func(*args, **kwargs)
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette/marionette.py", line 590, in _send_message
    response = self.client.send(message)
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette_transport/transport.py", line 100, in send
    response = self.receive()
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette_transport/transport.py", line 57, in receive
    raise IOError(self.connection_lost_msg)
IOError: Connection to Marionette server is lost. Check gecko.log (desktop firefox) or logcat (b2g) for errors.
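
The traceback above ends the whole run with an uncaught IOError. As a rough sketch only (this is not how b2gperf is written), a loop like the following would record how many iterations completed before the connection dropped. `run_iterations`, `launch_once`, and `fake_launch` are hypothetical names, and the fake device fails on its 10th call purely for illustration.

```python
def run_iterations(launch_once, iterations):
    """Call launch_once() up to `iterations` times.

    Returns (completed, error): the number of calls that finished and the
    IOError that stopped the run, or None if every call succeeded.
    """
    completed = 0
    for _ in range(iterations):
        try:
            launch_once()
        except IOError as exc:
            # Connection lost: stop here and report how far we got.
            return completed, exc
        completed += 1
    return completed, None


# Hypothetical stand-in for a device that drops the connection on call 10.
calls = {"n": 0}


def fake_launch():
    calls["n"] += 1
    if calls["n"] == 10:
        raise IOError("Connection to Marionette server is lost.")


completed, error = run_iterations(fake_launch, 30)
print(completed, error)
```

Returning the count alongside the exception keeps the partial result available for reporting instead of losing it with the traceback.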
Summary: Performance tests crashing on b2g-inbound builds → Performance (b2gperf) tests crashing on b2g-inbound builds
I've just tested locally with the two b2g-inbound builds around the regression (without resetting gaia) and was unable to replicate the crash. This would imply it's a gaia issue between b67ddd7d40b52e65199478b8d6631c2c28fdf41d and c2d7dafab9dcadf1b5a099972d4c7647dcc4e276; however, I've run out of time today to investigate further.
(In reply to Dave Hunt (:davehunt) from comment #3)
> I've just tested locally with the two b2g-inbound builds around the
> regression (without resetting gaia) and was unable to replicate the crash.
> This would imply it's a gaia issue between
> b67ddd7d40b52e65199478b8d6631c2c28fdf41d and
> c2d7dafab9dcadf1b5a099972d4c7647dcc4e276 however I've run out of time today
> for investigating this.

https://github.com/mozilla-b2g/gaia/compare/b67ddd7d40b52e65199478b8d6631c2c28fdf41d...c2d7dafab9dcadf1b5a099972d4c7647dcc4e276
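
For reference, the compare link above is just the two gaia changesets joined with "...". A small sketch, with `gaia_compare_url` being an illustrative helper name rather than an existing tool:

```python
def gaia_compare_url(good, bad):
    """Build a GitHub compare URL for the gaia range good...bad."""
    return f"https://github.com/mozilla-b2g/gaia/compare/{good}...{bad}"


url = gaia_compare_url(
    "b67ddd7d40b52e65199478b8d6631c2c28fdf41d",  # gaia_changeset, last good
    "c2d7dafab9dcadf1b5a099972d4c7647dcc4e276",  # gaia_changeset, first bad
)
print(url)
```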
(In reply to Jason Smith [:jsmith] - At Work Week, Slow to Respond from comment #4)
> https://github.com/mozilla-b2g/gaia/compare/b67ddd7d40b52e65199478b8d6631c2c28fdf41d...c2d7dafab9dcadf1b5a099972d4c7647dcc4e276

Maybe bug 1045132 caused this?

ahal - what do you think?
Flags: needinfo?(ahalberstadt)
(In reply to Jason Smith [:jsmith] - At Work Week, Slow to Respond from comment #5)
> Maybe bug 1045132 caused this?
> 
> ahal - what do you think?

Ack. Mistyped the bug #. Meant to say bug 1045142.
I just dug into our smoketest reports for today as a point of comparison. The bug that's causing us trouble on our side is bug 1038854. It's causing the camera to fail to start (might also be the reason why email is crashing too).

Comment 9

4 years ago
Ting, can you help here? We need to understand what about your changes in bug 1038854 is causing these test crashes.
Flags: needinfo?(tchou)
ahal mentioned in IRC that bug 1045132 was unlikely to be the cause of this bug, as he thinks the runner service isn't used by anything yet.
Flags: needinfo?(ahalberstadt)
(In reply to Mike Lee [:mlee] from comment #9)
> Ting, can you help here? We need to understand what about your changes in
> bug 1038854 is causing these test crashes.

Note - if bug 1038854 is the cause, then this should be resolved when a new build gets spun with the backout included.
(Assignee)

Comment 12

4 years ago
I noticed this yesterday on my device too. I'll update my tree and try again.
(Assignee)

Comment 13

4 years ago
By "noticed this", I mean with |make test-perf|. It is an actual Gecko crash: the screen says "B2G crashed".

Looking at bug 1045142, I doubt this code is used anywhere with |make test-perf|.
(Assignee)

Comment 14

4 years ago
I just updated and rebuilt. My top gecko commit is:

commit c60b44a7b137ed1ebb3444efebb089d755424d54
Author: Wes Kocher <wkocher@mozilla.com>
Date:   Thu Jul 31 15:04:49 2014 -0700

    Backed out changeset f73cd738c1fe (bug 1038854) a=backout


which is the backout for the bug mentioned above.

It still crashes. I'll try to dig further, but we might need to bisect.
(Assignee)

Comment 15

4 years ago
Bisecting it right now.
(Assignee)

Updated

4 years ago
Keywords: regression
(Assignee)

Updated

4 years ago
Assignee: nobody → hub
Whiteboard: [c=automation p= s= u=] → [c=automation p=1 s= u=]
Clear NI per comment 14.
Flags: needinfo?(tchou)
(Assignee)

Comment 17

4 years ago
To reproduce: |APP=clock RESTART_B2G=0 make test-perf|

It crashes b2g when doing that.
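
If the reproduction needs to be scripted, a minimal sketch could look like the following. It assumes the command is run from a checkout where `make test-perf` is available; the `cwd` argument and both function names are hypothetical. Only APP=clock and RESTART_B2G=0 come from this comment.

```python
import os
import subprocess


def test_perf_command(**variables):
    """Return (argv, env) for a `make test-perf` run.

    `variables` are passed through as environment variables unchanged,
    e.g. APP="clock", RESTART_B2G="0" as in the comment above.
    """
    env = dict(os.environ)
    env.update({name: str(value) for name, value in variables.items()})
    return ["make", "test-perf"], env


def run_test_perf(cwd, **variables):
    """Run `make test-perf` in `cwd` and return its exit code."""
    argv, env = test_perf_command(**variables)
    return subprocess.run(argv, cwd=cwd, env=env).returncode


argv, env = test_perf_command(APP="clock", RESTART_B2G="0")
print(" ".join(argv), env["APP"], env["RESTART_B2G"])
```

Splitting command construction from execution means the environment can be inspected without a device attached; `run_test_perf` is only called when a real checkout path is supplied.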
(Assignee)

Comment 18

4 years ago
I confirm that bug 1038854 isn't the source as the crash occurs before this bug was checked in and after it was reverted.
[Blocking Requested - why for this release]:

Regression in an existing test suite that must stay up to allow us to do performance measurements.
blocking-b2g: --- → 2.1?
Keywords: qablocker
(In reply to Hubert Figuiere [:hub] from comment #18)
> I confirm that bug 1038854 isn't the source as the crash occurs before this
> bug was checked in and after it was reverted.

hub - can you get a crash stack for the crash being seen here? If I know the crash stack, I can search Bugzilla for an existing bug with the same crash.
Flags: needinfo?(hub)
Not sure if I am running into the same issue here, but when I do |make test-perf|, the b2g process eventually crashes, and it keeps restarting and crashing even after a reboot. The only fix is to reflash the phone. Attachment 8466516 contains the dmesg output and attachment 8466517 shows the gdb stack trace.
Flags: needinfo?(hub)
(You can work around this by not using a debug build.)
(Assignee)

Comment 26

4 years ago
I don't see the "keeps restarting" behavior that Wander describes, though.

Bisect result

8062fdbcecee32574f64f4a0553a4da053a91d93 is the first bad commit
commit 8062fdbcecee32574f64f4a0553a4da053a91d93
Author: Sean Lin <selin@mozilla.com>
Date:   Tue Jun 24 10:51:48 2014 +0800

    Bug 874353 - Remove CPU wake lock control from ContentParent. r=gene, khuey

:040000 040000 08e93bb32d9c44606f7ea3860e37ed657258c16f f96f62965e32c498aafb5b384843b3bf08ac4dcc M	dom
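
For readers unfamiliar with bisection, the idea behind the result above can be sketched as a binary search over an ordered history. This is a simplified illustration, not git's implementation: it assumes a linear history, a known-good first commit, a known-bad last commit, and that every commit after the first bad one is also bad. The commit names and the `is_bad` predicate are made up.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search `commits` (oldest first) for the first bad one.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Made-up history of 16 commits where everything from c11 onward crashes.
history = [f"c{i}" for i in range(16)]
culprit = first_bad_commit(history, lambda commit: int(commit[1:]) >= 11)
print(culprit)
```

Each step halves the remaining range, so the number of builds to test grows with the logarithm of the number of commits rather than linearly.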
(Assignee)

Comment 27

4 years ago
It is a git bisect using the B2G tree. The sha1 needs to be mapped to the actual hg sha1. Just in case we weren't clear on that.
Blocks: 874353
(Assignee)

Comment 28

4 years ago
Also, the patch has already been backed out, possibly because of bug 1046956.

After the backout it no longer crashes. To confirm:

Just before the backout: still crashes.
At the backout: no longer crashes.

Back out is git revision:

commit 19e5d2d26c417bd79a6c33d7fb1b4bedfb4ec713
Author: Kyle Huey <khuey@kylehuey.com>
Date:   Fri Aug 1 11:02:55 2014 -0700

    Back out bug 874353, which is suspected of causing bug 1046956. r=me a=backout
Hub - Can we close this then if we've confirmed this no longer reproduces with the backout of bug 874353?
Flags: needinfo?(hub)
(Assignee)

Comment 30

4 years ago
Of course we can.

See comment 28 for the resolution.
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(hub)
Resolution: --- → FIXED

Updated

4 years ago
Duplicate of this bug: 1049881
blocking-b2g: 2.1? → ---
Keywords: crash