Closed Bug 1185239 Opened 6 years ago Closed 6 years ago

Crash in Fennec and FirefoxOS due to unguarded access to nsINavHistoryService in nsPermissionManager

Categories

(Core :: DOM: Core & HTML, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla42
Tracking Status
firefox41 --- unaffected
firefox42 + fixed
fennec 42+ ---

People

(Reporter: gerard-majax, Assigned: nika)

References

Details

(5 keywords, Whiteboard: [dogfood-blocker])

Crash Data

Attachments

(3 files, 3 obsolete files)

Attached file gdb backtrace
STR: ./repo sync, build.sh

Expected: Gecko works.

Actual: Gecko boot loop.
Bisecting from acf6cee4eaba7976298c54e4c04915d1c78869e8 which I know was a good one.
In attachment 8635705 [details] we can see the segfault happens in extensions/cookie/nsPermissionManager.cpp and we have bug 1184397 that landed just a few hours ago ...
Flags: needinfo?(josh)
Flags: needinfo?(ehsan)
And FTR, this is on KK Z3c.
Or it's bug 1165263 ?
Cannot bisect anymore, this is the current status:

> $ git bisect log
> git bisect start
> # bad: [9d71fb0927b61a93a711af3faa4965f4e0d4980a] No bug, Automated blocklist update from host bld-linux64-spot-135 - a=blocklist-update
> git bisect bad 9d71fb0927b61a93a711af3faa4965f4e0d4980a
> # good: [acf6cee4eaba7976298c54e4c04915d1c78869e8] Merge b2g-inbound to m-c. a=merge
> git bisect good acf6cee4eaba7976298c54e4c04915d1c78869e8
> # bad: [423e03562bed6a192f3116ffd6b50c30600a714c] Bug 452800 - use ThreeDLightShadow for borders instead of ThreeDFace or ButtonFace, r=bz
> git bisect bad 423e03562bed6a192f3116ffd6b50c30600a714c
> # bad: [68793c3765f9e69d56894294dd01ec9f1a78c1c6] Bug 1131451 followup - Remove one more bidi-in-vertical-mode hack that is no longer required. r=dholbert
> git bisect bad 68793c3765f9e69d56894294dd01ec9f1a78c1c6
Yep, this is all new code from bug 1165263.
Blocks: 1165263
Flags: needinfo?(josh) → needinfo?(michael)
Some of the commits cannot be reverted easily.
Attached patch Potential fix possibility (obsolete) — Splinter Review
Is this specifically a B2G bug? I assumed that the tests would have caught something this bad :-/. I'll try to look into this, but I have no mechanism for testing the changes I'm making, as I don't have access to a B2G device.

My first impression is that this is because of the history API. My guess is that on whatever B2G setup you are booting into, the permissions manager is booting before history is initialized.

It would be great if someone could try out this patch. (replacing SOME_CONDITION_FOR_B2G with a real condition)
Flags: needinfo?(michael)
Flags: needinfo?(lissyx+mozillians)
(In reply to Michael Layzell [:mystor] from comment #8)
> Created attachment 8635716 [details] [diff] [review]
> Potential fix possibility
> 
> Is this specifically a B2G bug? I assumed that the tests would have caught
> something this bad :-/. I'll try to look into this, but I have no mechanism
> for testing the changes I'm making, as I don't have access to a B2G device.
> 
> My first impression is that this is because of the history API. My guess is
> that on whatever B2G setup you are booting into, the permissions manager is
> booting before history is initialized.
> 
> It would be great if someone could try out this patch. (replacing
> SOME_CONDITION_FOR_B2G with a real condition)

This shouldn't break migrations, it just means that permissions won't be using the history API to upgrade the permissions. Worst case scenario, some permissions will be lost in the migration, however, we have stored a backup copy of the pre-migration database, so we can fix that after the fact if we need to without breaking nightly.
This is a patch with the (AFAIK) correct macro. I also disabled the test which is very likely to fail if this change is made on B2G.

I still haven't actually tried this on a device, so I would appreciate it if someone did that for me.
Attachment #8635716 - Attachment is obsolete: true
Attachment #8635720 - Flags: review?(josh)
Assignee: nobody → michael
I can give this a try.
Flags: needinfo?(lissyx+mozillians)
Comment on attachment 8635720 [details] [diff] [review]
Don't use the Places API to migrate permissions database on B2G

My phone is booting again \o/
Attachment #8635720 - Flags: feedback+
Comment on attachment 8635720 [details] [diff] [review]
Don't use the Places API to migrate permissions database on B2G

Why not use feature detection instead? That is, null-check the result of getting the history service.
Good point jdm, that's what happens when you try to fix a bug like this on a train, without coffee :)

I removed the b2g test exclusion, because it was passing before on the emulator, which means that the places service should be avaliable in the b2g tests, and the tests should continue to pass.
Attachment #8635720 - Attachment is obsolete: true
Attachment #8635720 - Flags: review?(josh)
Attachment #8635730 - Flags: review?(josh)
Comment on attachment 8635730 [details] [diff] [review]
Feature-check for the Places API in nsPermissionManager migration code

Could you also check to make sure this also fixes the problem?

Also clearing ehsan's NI flag, as I think we've figured this out.
Flags: needinfo?(ehsan)
Attachment #8635730 - Flags: feedback?(lissyx+mozillians)
Duplicate of this bug: 1185277
This affects Fennec too
(In reply to Mark Finkle (:mfinkle) from comment #17)
> This affects Fennec too

Yeah, I've marked 1185277 as duplicate, the patch should fix both.
Comment on attachment 8635730 [details] [diff] [review]
Feature-check for the Places API in nsPermissionManager migration code

Review of attachment 8635730 [details] [diff] [review]:
-----------------------------------------------------------------

Wouldn't you need to disable test_permmanager_migrate_4-5.js?
Attachment #8635730 - Flags: review?(josh) → review+
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #19)
> Wouldn't you need to disable test_permmanager_migrate_4-5.js?

I didn't disable it because the tests were passing before on try, but I can disable it on fennec and b2g.
Here's a try push with the current setup: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f727f333ab3
A local fennec build from cset 9c919ce631ea (current m-c tip) doesn't repro the startup crash for me; tried with a pre-existing profile as well as a new profile. So I can't verify the patch on a local build at least. I'll check the try build once it's done as that should displace my current crashing nightly build.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #21)
> A local fennec build from cset 9c919ce631ea (current m-c tip) doesn't repro
> the startup crash for me; tried with a pre-existing profile as well as a new
> profile. So I can't verify the patch on a local build at least. I'll check
> the try build once it's done as that should displace my current crashing
> nightly build.

It'll only happen if you have a dirty profile which has permissions entries and needs to be migrated to the new permissions system. Many people will probably have empty moz_hosts databases which won't need to be migrated. Could you try copying the profile from your crashing build and using that?
Ok, I copied my nightly profile to my local build and reproduced the crashing using current m-c. Applying your patch seems to fix it.
Comment on attachment 8635730 [details] [diff] [review]
Feature-check for the Places API in nsPermissionManager migration code

Removed the previous hack, applied this one, rebuilt and Gecko still boots. Since it booted previously with the older patch, I'm not sure I'm testing it properly though.
Attachment #8635730 - Flags: feedback?(lissyx+mozillians) → feedback+
Status: NEW → ASSIGNED
Component: General → DOM
Product: Firefox OS → Core
Summary: Gecko not booting at all on current master → Crash in Fennec and FirefoxOS due to unguarded access to nsINavHistoryService in nsIPermissionManager
Version: unspecified → Trunk
tracking-fennec: --- → 42+
Summary: Crash in Fennec and FirefoxOS due to unguarded access to nsINavHistoryService in nsIPermissionManager → Crash in Fennec and FirefoxOS due to unguarded access to nsINavHistoryService in nsPermissionManager
Duplicate of this bug: 1185265
Duplicate of this bug: 1185336
Crash Signature: [@ UpgradeHostToOriginAndInsert ]
[Tracking Requested - why for this release]: This is a pretty serious regression and should be tracked.  See comment 0.
Duplicate of this bug: 1185272
https://hg.mozilla.org/mozilla-central/rev/176f0a5ce9c7
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Is there a way to test this? A reproducible crash on start seems like something the CI should have caught.
(In reply to Fred Wenzel [:wenzel] from comment #31)
> Is there a way to test this? A reproducible crash on start seems like
> something the CI should have caught.

It only happened with a old profile which needed to be migrated, and for some reason didn't occur in XPCShell tests (because I think they are run in a different context than normal startups). I'm not sure what the best way to test that would be.
Verified that this is fixed in latest nightly, by manually installing latest from http://nightly.mozilla.org/ over my existing Nightly, which had been startup-crashing with this bug's crash signature. No more startup crashes.
Status: RESOLVED → VERIFIED
(In reply to Fred Wenzel [:wenzel] from comment #31)
> Is there a way to test this? A reproducible crash on start seems like
> something the CI should have caught.

Hmm, so I was thinking about how to test this.  I think we should be able to simulate the nav history service not existing by doing something like the MockFilePicker does.  Specifically, we could register a new factory similar to <https://dxr.mozilla.org/mozilla-central/source/testing/specialpowers/content/MockFilePicker.jsm#22> which in its createInstance function, it just throws an error unconditionally.  That should help reproduce the nav history service not existing at all in an xpcshell test.

Michael, can you please have it a go?  If that trick works, we probably want a separate migration test with such as MockNavHistoryService next to the existing one.  Thanks!
Flags: needinfo?(michael)
Whiteboard: [dogfood-blocker]
This is a copy of one of the tests from bug 1186034 except with the nsINavHistoryService shimmed such that attempts to access it will fail. It is intended to help detect problems such as this crash.
Attachment #8637434 - Flags: review?(ehsan)
I'm not sure if this is the right place to put the test, so I can create a new bug if I need to.
Flags: needinfo?(michael)
Comment on attachment 8637434 [details] [diff] [review]
Test migration logic in the non-presence of a nsINavHistoryService

Review of attachment 8637434 [details] [diff] [review]:
-----------------------------------------------------------------

::: extensions/cookie/test/unit/test_permmanager_migrate_4-7_no_history.js
@@ +161,5 @@
> +    Cc['@mozilla.org/browser/nav-history-service;1'].getService(Ci.nsINavHistoryService);
> +    do_check_true(false, "There shouldn't have been a nsINavHistoryService");
> +  } catch (e) {
> +    do_check_true(true, "There wasn't a nsINavHistoryService");
> +  }

Please move this check to the beginning of run_test().
Attachment #8637434 - Flags: review?(ehsan) → review+
Moved the navHistoryService check to the top of the test
Attachment #8637434 - Attachment is obsolete: true
Please check in Attachment #8637434 [details] [diff] after bug 1186034 is checked in.
Keywords: checkin-needed
(In reply to Michael Layzell [:mystor] from comment #39)
> Please check in Attachment #8637434 [details] [diff] [diff] after bug 1186034 is
> checked in.

oops, Attachment #8637530 [details] [diff] not #8637434
can you attach a try run, they are mandatory for checkin-needed bugs per https://groups.google.com/forum/#!topic/mozilla.dev.platform/9w0-Bh_3vVI
Flags: needinfo?(michael)
Keywords: checkin-needed
I should stop marking bugs as checkin-needed in the evening. Here's a fresh try run for the patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e547614f79a7
Flags: needinfo?(michael)
Duplicate of this bug: 1187621
Depends on: 1190418
Tracking in 42 because regression.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.