Beta and Aurora lock up when using a profile that's recently used Nightly

RESOLVED FIXED

Status

()

RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: mconley, Unassigned)

Tracking

Trunk
x86
All
Points:
---

Firefox Tracking Flags

(firefox18+ verified, firefox19+ fixed)

Details

Attachments

(2 attachments)

I don't have further information about this, and I'm about to head home, but I want to get it down:

I've been using Nightly primarily for the past few months - though I frequently shift back and forth between Beta and Aurora (using the same profile) to test things, etc.

Starting yesterday, or perhaps the day before, I've noticed that if I shift to Beta or Aurora from Nightly, Firefox locks up on start-up.

I haven't been able to repro on my Linux machine, so I can't attach gdb - and debugging on Windows is not something I know how to do yet, so no dice there.

I know a few others have been experiencing this too (both Windows and OSX) - I've CC'd them on this bug.
Created attachment 676780 [details]
sample of hanging Aurora build

FWIW, here's a sample from Activity Monitor when my local (debug) build from the Aurora tree hangs on startup.
(Reporter)

Comment 2

6 years ago
Hey Felipe - I'm Cc'ing Hilary, who's also been hitting this one for the past few days.

Let us know if there's any other data you need from us. Thanks!
This matches what I saw last week: Stuck somewhere under ReadAnnotationEntry. Interestingly, the function is gone in nightlies. It was removed in bug 789224.
Looking briefly, I'm guessing the format of nsPrincipal data changed in object streams?
Since bug 789224's patch 4 appears to be the bug which causes this hang, I'm going to move this to a component where we can maybe get some more visibility to this bug than an untriaged component.
Blocks: 789924
Component: Untriaged → Security: CAPS
Product: Firefox → Core
Blocks: 789224
No longer blocks: 789924
Hrm, that's bad. Bug 789224 just removed the code that read/wrote principals as prefs. So the Nightly builds just wouldn't be reading/writing in this way at all. I'm surprised that breaks stuff, but I don't know that code much. I just removed it. :-)

Can someone confirm that this bug can in fact be triggered by using the same profile for a build before+after bug 789224?
Keywords: qawanted
> Can someone confirm that this bug can in fact be triggered by using the same
> profile for a build before+after bug 789224?

Yeah I bisected that. With a profile saved with a current nightly, older builds lock up until http://hg.mozilla.org/mozilla-central/rev/d7c0dc5df119, and after that cset they don't.
BTW you don't need an entire profile, just the sessionstore.js file. And the hang happens when this is called from SessionStore.jsm: http://mxr.mozilla.org/mozilla-central/source/browser/components/sessionstore/src/SessionStore.jsm#3419
Oh, I see! This isn't related to reading/writing principals from prefs. This is related to the implementation of nsISerializable, which is used by session store.

Do we have a general story for what happens when the layout of a serializable object changes? I know that Java fixes this with a kSerialVersionID or something of the sort. Do we have a way to indicate that the format has changed?
No, we do not.  If you plan to change the layout of serializables, you have to include some sort of version identifier in the serialization (or out-of-band) yourself. I really wish I had for nsPrincipal.  :(
One option is to change the serialization on trunk in such a way that older build will just fail to deserialize when reading it...
Actually, I bet we can just rev the CID on nsPrincipal, which should cause ReadObject to abort on the do_CreateInstance call, before any of the sensitive stuff has been read. Let me put together a patch.
Created attachment 689089 [details] [diff] [review]
Rev the CID on nsPrincipal. v1

Theoretically, this patch should do the trick. However, it would need to be applied to Nightly, aurora, and beta at this point, since those branches all contain the new format. The incompatibility in data formats only lasts until the next branch.

I'm wondering, though. Could this be causing a much wider degree of corruption in session history? What happens when users upgrade? We might well want to take this patch just to be safe.

Can someone who has reproduced the problem confirm that this patch does the trick?
Attachment #689089 - Flags: review?(bzbarsky)
Comment on attachment 689089 [details] [diff] [review]
Rev the CID on nsPrincipal. v1

r=me
Attachment #689089 - Flags: review?(bzbarsky) → review+
I've verified that this fixes the problem here. I can't reliably start a new session and end up with corrupt data, but I do have a copy of a sessionstore.js file known to cause the problem, so here's what I did to test:

1 - copy that file to the profile
2 - start nightly built from current m-c tip and close it so that hopefully the same data is serialized and written again
3 - start firefox 17 release: it hangs

and then I redid it replacing 2 with a build from m-c tip + this patch, and firefox 17 doesn't hang.



This means there will be some potential dataloss when going with a session from 18+ to 17, right? Which is probably fine I believe (not the entire session is lost, just some binary-encoded parts)
> This means there will be some potential dataloss when going with a session from 18+ to 17,

And vice versa, when going from 17 to 18+.

> not the entire session is lost, just some binary-encoded parts

Yes.  What's lost is the security context of about:blank, data:, wyciwyg:, and javascript: documents.  Basically, it would reintroduce bug 389274 across this move.  I suspect this is acceptable, personally, given the other options here.
Comment on attachment 689089 [details] [diff] [review]
Rev the CID on nsPrincipal. v1

Ok, given the confirmation in comment 15, I'm flagging this for m-a and m-b approval. We'll want to land it everywhere simultaneously to avoid unnnecessary breakage of people moving between m-c, m-a, and m-b builds.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 789224

User impact if declined: Using the same profile for Release and {Nightly,Aurora,Beta} can cause data corruption in both directions. In particular, principals can end up getting corrupted, which is bad. The current user-visible impact of this change is a hang. This has the potential to corrupt data for users moving their profile from 17 to 18, which is exactly what our entire worldwide audience will do when they upgrade after the next release.

Note that this will cause the data loss bz mentions in comment 16. But this kind of well-understood, localized data loss is much preferable to poorly-understood data corruption, which could lead to hangs and security vulnerabilities.

Testing completed (on m-c, etc.): None, unfortunately. The CID gets written out for session-store, so forcing an incompatibility with a CID rev on Nightly (but not on Aurora or Beta) will cause artificial incompatibilities for users using the same profile for Nightly and {Aurora,Beta}, even though the data format is identical. Given that this is a pretty well-understood change, it seems preferable to land this on all 3 branches at once.
 
Risk to taking this patch (and alternatives if risky): Least risk of any option, IMO.
String or UUID changes made by this patch: None. CID change, but bz says that's ok.
Attachment #689089 - Flags: approval-mozilla-beta?
Attachment #689089 - Flags: approval-mozilla-aurora?
(In reply to Bobby Holley (:bholley) from comment #17)
> Note that this will cause the data loss bz mentions in comment 16. But this
> kind of well-understood, localized data loss is much preferable to
> poorly-understood data corruption, which could lead to hangs and security
> vulnerabilities.

(In reply to Boris Zbarsky (:bz) from comment #16)
> Yes.  What's lost is the security context of about:blank, data:, wyciwyg:,
> and javascript: documents.  Basically, it would reintroduce bug 389274
> across this move.  I suspect this is acceptable, personally, given the other
> options here.

In https://bugzilla.mozilla.org/show_bug.cgi?id=389274#c0 I see "The text that was in the editor is NOT restored and you cannot type anything in the editor". Is there any way to prevent that? When will the user be able to type something into an affected editor again?
tracking-firefox18: --- → +
tracking-firefox19: --- → +
(In reply to Alex Keybl [:akeybl] from comment #18)
> In https://bugzilla.mozilla.org/show_bug.cgi?id=389274#c0 I see "The text
> that was in the editor is NOT restored and you cannot type anything in the
> editor". Is there any way to prevent that? When will the user be able to
> type something into an affected editor again?

IIUC, the issue there is that the appropriate permissions for the page to be in design mode aren't restored. The user will probably have to manually retoggle design mode.

But this all assumes that the user has a page in design mode (uncommon) in their session history (even more uncommon).
> Is there any way to prevent that?

Not really.

> When will the user be able to type something into an affected editor again?

After hitting the reload button on that page.
Comment on attachment 689089 [details] [diff] [review]
Rev the CID on nsPrincipal. v1

Thanks for the explanation - this risk is manageable (and much better than the alternative as bholley points out).
Attachment #689089 - Flags: approval-mozilla-beta?
Attachment #689089 - Flags: approval-mozilla-beta+
Attachment #689089 - Flags: approval-mozilla-aurora?
Attachment #689089 - Flags: approval-mozilla-aurora+
Also assigning juanb as the qacontact here to try to find unexpected/critical regressions when moving between Firefox versions (17->18 specifically) and session restore.
QA Contact: jbecerra
https://hg.mozilla.org/mozilla-central/rev/5ed8e209a05c
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Started 2012-10-30 Nightly with a new profile. 

I made this profile dirty with lots of history, bookmarks, add-ons, tabs, pinned tabs, plugin content loaded. I set "When Firefox starts: Show my windows and tabs from last time". Closed Nightly.

With this dirty profile, I started then Firefox 17.0.1 and it did not respond on loading tabs from previous session. After 1-2 minutes I forced a quit because Firefox was still not responding. Same behavior for Firefox 16.0.2.

I started Firefox 18.0 Beta 3 with the same dirty profile and the tabs were opened quickly and Firefox was working normally. 

Verified on Windows 7 x86 and Mac OS X 10.8.
Verified also a dirty profile from Latest Nightly to Firefox 18.0 Beta 3 and worked normally.
status-firefox18: fixed → verified
Keywords: qawanted
You need to log in before you can comment on or make changes to this bug.