Closed Bug 1133865 Opened 9 years ago Closed 9 years ago

crash in IPC::Message::EnsureFileDescriptorSet()

Categories

(Core :: Graphics, defect)

All
Android
defect
Not set
critical

Tracking

VERIFIED FIXED
2.2 S6 (20feb)
blocking-b2g 2.5+
Tracking Status
firefox36 --- unaffected
firefox37 --- unaffected
firefox38 --- fixed
b2g-v2.2 --- unaffected
b2g-master --- verified

People

(Reporter: pbylenga, Assigned: mchang)

Details

(Keywords: crash, regression, smoketest)

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-193eb207-0d1a-46d0-bf88-0d30c2150217.
=============================================================

I have been able to reproduce this crash just by flashing a recent nightly flame-kk build.

Also see: https://crash-stats.mozilla.com/report/index/467c2fbb-77fe-49c9-a354-de9602150217

STRs:
Run the following automation command to reproduce the crash easily:

adb forward tcp:2828 tcp:2828 && gaiatest --testvars=testvars.json --address=localhost:2828 --timeout=30000 gaiatest/tests/functional/clock/test_clock_create_new_alarm.py --restart --repeat=5
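
For anyone scripting the STR, here is a minimal Python sketch that assembles the same two commands as argument lists. The function name repro_commands and its parameters are hypothetical; only the flags that appear in the command above are used, and nothing is executed.

```python
import shlex


def repro_commands(test_path, testvars="testvars.json", port=2828,
                   timeout_ms=30000, repeat=5):
    """Build the adb forward and gaiatest commands from the STR above."""
    forward = ["adb", "forward", f"tcp:{port}", f"tcp:{port}"]
    gaiatest = [
        "gaiatest",
        f"--testvars={testvars}",
        f"--address=localhost:{port}",
        f"--timeout={timeout_ms}",
        test_path,
        "--restart",
        f"--repeat={repeat}",
    ]
    return [forward, gaiatest]


def as_shell_line(commands):
    """Join the commands with && so that the second runs only if the first succeeds."""
    return " && ".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    test = "gaiatest/tests/functional/clock/test_clock_create_new_alarm.py"
    print(as_shell_line(repro_commands(test)))
```

Raising repeat is the simplest way to gather more samples for an intermittent crash like this one.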

Environmental Variables:
Device: Flame 3.0 Eng
Build ID: 20150217074221
Gaia: ae02fbdeae77b2002cebe33c61aedeee4b9439fd
Gecko: 4bb425001d8a
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0
[Blocking Requested - why for this release]:
blocking-b2g: --- → 3.0?
Flags: needinfo?(nhirata.bugzilla)
blocking-b2g: 3.0? → 3.0+
QA Contact: jmercado
Finding a range for this issue has been difficult. There is a large gap in the available tinderbox builds, which makes it hard to find a window. Note that the changeset from the 2015-02-16-01-03-44 nightly build produces the same pushlog as the one spanning the gap.

More work is being done to narrow this down further.

Central Regression Window:

Last Working 
Environmental Variables:
Device: Flame 3.0
BuildID: 20150214032245
Gaia: f0b93e0668ef9565bd6f050b15b4f794d59feb65
Gecko: e0cb32a0b1aa
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0

First Broken 
Environmental Variables:
Device: Flame 3.0
BuildID: 20150216070747
Gaia: ae02fbdeae77b2002cebe33c61aedeee4b9439fd
Gecko: 09f4968d5f42
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0

Gecko Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e0cb32a0b1aa&tochange=09f4968d5f42

B2g-inbound Regression Window:

Last Working 
Environmental Variables:
Device: Flame 3.0
BuildID: 20150213141858
Gaia: fa244edb7b89bf5331da2ddc87875845eec8e675
Gecko: 2ae1fcb56411
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0

First Broken 
Environmental Variables:
Device: Flame 3.0
BuildID: 20150213145444
Gaia: fa244edb7b89bf5331da2ddc87875845eec8e675
Gecko: 0e2e7780755b
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0

Last Working gaia / First Broken gecko - Issue DOES occur
Gaia: fa244edb7b89bf5331da2ddc87875845eec8e675
Gecko: 0e2e7780755b

First Broken gaia / Last Working gecko - Issue does NOT occur
Gaia: fa244edb7b89bf5331da2ddc87875845eec8e675
Gecko: 2ae1fcb56411

Gecko Pushlog: http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=2ae1fcb56411&tochange=0e2e7780755b
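
The pushlog links in this comment all follow the same pattern, so a small helper can build one from a last-working / first-broken pair. This is only a sketch: the REPOS table and the pushlog_url name are hypothetical, and the URL layout is copied from the links above rather than from any documented API.

```python
from urllib.parse import urlencode

# Repository base URLs as they appear in the pushlog links in this bug.
REPOS = {
    "mozilla-central": "http://hg.mozilla.org/mozilla-central",
    "b2g-inbound": "http://hg.mozilla.org/integration/b2g-inbound",
}


def pushlog_url(repo, last_working, first_broken):
    """Return the pushloghtml URL covering last_working..first_broken."""
    query = urlencode({"fromchange": last_working, "tochange": first_broken})
    return f"{REPOS[repo]}/pushloghtml?{query}"


if __name__ == "__main__":
    # Gecko changesets from the b2g-inbound window above.
    print(pushlog_url("b2g-inbound", "2ae1fcb56411", "0e2e7780755b"))
```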
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Attached file: omni.ja
The regression window points to this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1123762

That bug is just a preference change.
I reverted the change locally after flashing the latest nightly, and the device seems stable.

To try this yourself:
1) download the omni.ja file
2) adb remount
3) adb push omni.ja /system/b2g/omni.ja
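
Steps 2 and 3 can also be scripted. The sketch below is a hedged example, not part of the original workaround: the names omni_push_commands and push_omni are hypothetical, it assumes adb is on the PATH with a single device attached, and it assumes omni.ja has already been downloaded (step 1) into the current directory.

```python
import subprocess


def omni_push_commands(local_path="omni.ja",
                       device_path="/system/b2g/omni.ja"):
    """Return the adb commands for steps 2 and 3 as argument lists."""
    return [
        ["adb", "remount"],
        ["adb", "push", local_path, device_path],
    ]


def push_omni(run=subprocess.run, **kwargs):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in omni_push_commands(**kwargs):
        run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call push_omni() to execute.
    for cmd in omni_push_commands():
        print(" ".join(cmd))
```

Passing run as a parameter keeps the helper easy to dry-run without a device attached.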
Depends on: 1123762
Flags: needinfo?(nhirata.bugzilla)
Mason, can you take a look at this please?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(mchang)
Mason, Jerry, Wesly, the gfx.vsync.refreshdriver is causing the flame-kk build to crash often.

We would need:
1) to be able to have this temporarily turned off while T2M looks into the issue,
2) T2M to look into why it's causing the crash.
Flags: needinfo?(wehuang)
Flags: needinfo?(hshih)
Flags: needinfo?(mchang)
RyanVM|sheriffduty: nhirata_: fun story, I was targeting bug 1123762 for another backout too
RyanVM|sheriffduty: nhirata_: you already missed the 4pm nightlies, so I'm going to pass on it for a little bit
RyanVM|sheriffduty: but I'll get it done today still

We'll have to wait for it to get backed out and then picked up in a new build. The fix should be in tomorrow's build. In the meantime, there's a workaround listed in comment 5.
See Also: → 1130051
Assignee: nobody → mchang
Bug 1123762 backed out.
https://hg.mozilla.org/mozilla-central/rev/8f0354b995ea
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S6 (20feb)
Hi, bug 1123762 has been backed out. I haven't been able to reproduce this locally; can you please verify that the issue is no longer occurring? Thanks!
Flags: needinfo?(pbylenga)
Flags: needinfo?(pbylenga)
Keywords: verifyme
This issue no longer occurs on the latest 3.0 Flame KK build.

Actual Results: The phone does not show crash messages when running automation.

Environmental Variables:
Device: Flame 3.0
BuildID: 20150218010226
Gaia: 82f286f10a41aab84a0796c89fbefe67b179994b
Gecko: 9696d1c4b3ba
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Hi Peter,

I just used the pvt build with build ID 20150217074221,
and I can't reproduce the crash. All tests passed in my local run.
Reproduction rate: 0/40
How do you get the crash log? From adb logcat or some other method?

Here is my testing command:
adb forward tcp:2828 tcp:2828 && gaiatest --testvars=testvars_bignose.json --address=localhost:2828 --timeout=30000 tests/python/gaia-ui-tests/gaiatest/tests/functional/clock/test_clock_create_new_alarm.py --restart --repeat=20

I also added the following settings to my test JSON profile:
  "acknowledged_risks": true,
  "skip_warning": true,

The log:
Results will not be posted to Treeherder. Please set the following environment variables to enable Treeherder reports: TREEHERDER_KEY, TREEHERDER_SECRET
starting httpd
running webserver on http://10.247.25.138:64201/
mozversion application_buildid: 20150217074221
mozversion application_changeset: 4bb425001d8a
mozversion application_display_name: B2G
mozversion application_id: {3c2e2abc-06d4-11e1-ac3b-374f68613e61}
mozversion application_name: B2G
mozversion application_remotingname: b2g
mozversion application_repository: https://hg.mozilla.org/mozilla-central
mozversion application_vendor: Mozilla
mozversion application_version: 38.0a1
mozversion device_firmware_date: 1424189930
mozversion device_firmware_version_base: L1TC100118D0
mozversion device_firmware_version_incremental: eng.cltbld.20150217.111839
mozversion device_firmware_version_release: 4.4.2
mozversion device_id: flame
mozversion gaia_changeset: ae02fbdeae77b2002cebe33c61aedeee4b9439fd
mozversion gaia_date: 1424081833
mozversion platform_buildid: 20150217074221
mozversion platform_changeset: 4bb425001d8a
mozversion platform_repository: https://hg.mozilla.org/mozilla-central
mozversion platform_version: 38.0a1
SUITE-START | Running 1 tests
TEST-START | test_clock_create_new_alarm.py TestClockCreateNewAlarm.test_clock_create_new_alarm
TEST-PASS | test_clock_create_new_alarm.py TestClockCreateNewAlarm.test_clock_create_new_alarm | took 146117ms
......
Flags: needinfo?(hshih) → needinfo?(pbylenga)
It's not a 100% crash rate; it's intermittent. I'm not sure why you aren't seeing the crash. Could you try the build from 2/16?
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #7)
> Mason, Jerry, Wesly, the gfx.vsync.refreshdriver is causing the flame-kk
> build to crash often.
> 
> We would need:
> 1) to be able to have this temporarily turned off while T2M looks into the
> issue,
> 2) T2M to look into why it's causing the crash.

Is this issue reproducible on a pure T2M release (base image v188 or v18D)? I assume not, since this is specific to v3, in which case it will be hard for T2M to help (unless we have identified something strange or wrong in their base image release). Anyway, the issue seems to be verified fixed now?
Flags: needinfo?(wehuang) → needinfo?(nhirata.bugzilla)
I also tried the first broken version from comment 3, but I still can't reproduce the crash. :(

adb forward tcp:2828 tcp:2828 && gaiatest --testvars=testvars_bignose.json --address=localhost:2828 --timeout=30000 tests/python/gaia-ui-tests/gaiatest/tests/functional/clock/test_clock_create_new_alarm.py --restart --repeat=20
Results will not be posted to Treeherder. Please set the following environment variables to enable Treeherder reports: TREEHERDER_KEY, TREEHERDER_SECRET
starting httpd
running webserver on http://10.247.25.138:56608/
mozversion application_buildid: 20150213145444
mozversion application_changeset: 0e2e7780755b
mozversion application_display_name: B2G
mozversion application_id: {3c2e2abc-06d4-11e1-ac3b-374f68613e61}
mozversion application_name: B2G
mozversion application_remotingname: b2g
mozversion application_repository: https://hg.mozilla.org/integration/b2g-inbound
mozversion application_vendor: Mozilla
mozversion application_version: 38.0a1
mozversion device_firmware_date: 1423869324
mozversion device_firmware_version_base: L1TC100118D0
mozversion device_firmware_version_incremental: eng.cltbld.20150213.181513
mozversion device_firmware_version_release: 4.4.2
mozversion device_id: flame
mozversion gaia_changeset: fa244edb7b89bf5331da2ddc87875845eec8e675
mozversion gaia_date: 1423863236
mozversion platform_buildid: 20150213145444
mozversion platform_changeset: 0e2e7780755b
mozversion platform_repository: https://hg.mozilla.org/integration/b2g-inbound
mozversion platform_version: 38.0a1
....
SUMMARY
-------
passed: 21
failed: 0
todo: 0
SUITE-END | took 3048s
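
To turn a log like the one above into a "failed/total" figure across repeated runs, the SUMMARY block can be parsed with a short script. This is a sketch under assumptions: the parse_summary and failed_over_total names are hypothetical, the line format is taken only from the log pasted here, and a failed test is not necessarily a crash, so crash reports still have to be checked separately.

```python
import re

# Matches lines such as "passed: 21" from the SUMMARY block of the log.
SUMMARY_LINE = re.compile(r"^(passed|failed|todo):\s*(\d+)$")


def parse_summary(log_text):
    """Collect the passed/failed/todo counts found in a gaiatest log."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def failed_over_total(counts):
    """Format the counts as 'failed/total', ignoring todo entries."""
    failed = counts.get("failed", 0)
    total = failed + counts.get("passed", 0)
    return f"{failed}/{total}"


if __name__ == "__main__":
    sample = "SUMMARY\n-------\npassed: 21\nfailed: 0\ntodo: 0\nSUITE-END | took 3048s"
    print(failed_over_total(parse_summary(sample)))
```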
Wesly, this bug happened in our build as a smoke test blocker over the weekend ( 2/14 ).  It should not happen on 188 nor 18D as they were made before the bug occurred.  I think we are safe there.

Jerry, I'm not sure.  The only other thing I can think of is that you don't have the pref turned on.  You don't necessarily have to run the automation to hit the crash.  I hit the crash 10 seconds on idle after flashing the device and waiting for the OS to come up with the build you mentioned.
Flags: needinfo?(nhirata.bugzilla)
Clearing NI; comment 16 served the request. It's verified fixed by the backout of bug 1123762.
Flags: needinfo?(pbylenga)
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #16)
> Wesly, this bug happened in our build as a smoke test blocker over the
> weekend ( 2/14 ).  It should not happen on 188 nor 18D as they were made
> before the bug occurred.  I think we are safe there.

Thanks Naoki, clear.
Well, we still plan to land bug 1123762, so it'd be nice if we could reproduce this before we reland it. I haven't been able to reproduce this either.

(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #16)
> Wesly, this bug happened in our build as a smoke test blocker over the
> weekend ( 2/14 ).  It should not happen on 188 nor 18D as they were made
> before the bug occurred.  I think we are safe there.
> 
> Jerry, I'm not sure.  The only other thing I can think of is that you don't
> have the pref turned on.  You don't necessarily have to run the automation
> to hit the crash.  I hit the crash 10 seconds on idle after flashing the
> device and waiting for the OS to come up with the build you mentioned.

Was this consistent, or did the device eventually boot up? So did the problem go away once the phone hit the homescreen? It'd be good to know if it only crashes during startup. Thanks!
Flags: needinfo?(nhirata.bugzilla)
I don't know if this makes a difference: we use 18D as the base build and then fully flash the build on top of it (i.e. fastboot flash images).

The crash happens at random intervals. It happens fairly consistently when running the automation (not 100% of the time); having said that, you can leave the device idle and still hit the crash.
Flags: needinfo?(nhirata.bugzilla)
Mason, I think what we could do is build on cedar or send QA a build to test against.
Please also make sure your backup-flame is using 18D as the base blob.
This is possibly related to this crash: https://crash-stats.mozilla.com/report/index/7c23e927-c2b9-4f80-b225-ec7302150302

I was stuck in a crash loop forever on today's nightly on OS X.
Let me check the nightly build on my Mac.
See Also: → 1139090
Just a heads up, bug 1123762 is in inbound again.
Moving the bug to the component where the regression came from.
Component: General → Graphics
Product: Firefox OS → Core