Closed Bug 1055766 Opened 5 years ago Closed 5 years ago

new @ JS::Handle<JSObject*>::operator JSObject* const&() startup crash in firefox 31

Categories

(Core :: JavaScript Engine, defect, critical)

31 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
Tracking Status
firefox-esr31 34+ verified

People

(Reporter: philipp, Assigned: sfink)

References

()

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-18a48b35-0ac4-4e57-b89d-a0b992140819
=============================================================

this startup crash appears to be a regression in firefox 31. 
related sumo forum thread: https://support.mozilla.org/questions/1016155
Component: Untriaged → JavaScript Engine
Keywords: crash
Product: Firefox → Core
Crash Signature: @ JS::Handle<JSObject*>::operator JSObject* const&() → [@ JS::Handle<JSObject*>::operator JSObject* const&()]
I would be interested if this happens in 32 as well, we have no crash reports for this signature there atm. Could we ask affected people if they can check with Beta?
based on the feedback by the two affected users this can be addressed by a clean reinstall and manually deleting the program folder beforehand.

one user was kind enough to restore his deleted program folder and it appears that there was a chrome.manifest file in place registering various components. in this setup firefox 30 and lower would start successfully, 31 crashes with the signature above and subsequent versions will fail to launch silently without generating a crash report at all.
Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID
We are still seeing this on ESR this seems like a valid bug because cruft is being left behind. Will the solution described in Comment 3 does work the problem with ESR is we have orgs that have thousands of machines and removing cruft manually would be difficult as would deploying a fresh 31.2.0 install.

Thoughts on this?
Flags: needinfo?(kairo)
this appears to be a pretty similar issue than bug 1063052 was for ff31. i still have the zipped-up left overs of the program folder after an uninstall from an affected user, which will cause 31 to crash at each startup in case this can be helpful for debugging...
corr: similar issue than bug 1063052 was for ff32
I agree with Philipp, it does look the same.
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---
[Tracking Requested - why for this release]:
(In reply to Sylvestre Ledru [:sylvestre] from comment #7)
> I agree with Philipp, it does look the same.

Sounds reasonable to me. Did we change something that triggers the same kind of incompatibility?
Flags: needinfo?(kairo)
(In reply to Henrik Skupin (:whimboo) from comment #8)
> [Tracking Requested - why for this release]:

We believe this crash to actually be Bug #1063052 and I suspect this bug is a dupe of that. Can you confirm it is not? We just sent out a plan to take a patch from Bug #1063052 in order to resolve crashes upon update to 31.2.0.

Since we will GTB early next week we would need to know soon.
Flags: needinfo?(hskupin)
Well, if it is a dupe, we should mark it as such. If we are not fully sure, we should get the patch(es) on the other bugs backported and recheck this bug afterward? Personally I have not seen this crash, so I cannot give a qualified answer nor can I check if the landed patches fixed the problem.
Flags: needinfo?(hskupin)
Benjamim,

Do you have any thoughts on this and it being related to Bug #1063052
Flags: needinfo?(benjamin)
It should be pretty easy to tell whether this is the same kind of issue as bug 1063052 by unpacking omnijar and doing an upgrade. I really couldn't tell you; I think you need to get QA to try it.
Flags: needinfo?(benjamin)
Florin,

Can you reproduce this?
Flags: needinfo?(florin.mezei)
I've tried to reproduce this issue using scenario from https://bugzilla.mozilla.org/show_bug.cgi?id=1063052#c83, and I do get a startup crash after performing a pave-over upgrade from 24.6.0/24.8.1 to 31.1.0/31.1.1/31.2.0, but it's got a different signature: [@ nsGlobalWindow::GetOuterWindowInternal() ]. See for example: https://crash-stats.mozilla.com/report/index/68f09807-5427-41a4-9e55-1331d2141021. Given that it's a different signature, I'm not quite sure whether it's related to this or not.

Testing environment: 
Windows 7 x64

Testing scenario:
1. Install Firefox 24.6.0 or 24.8.1 cleanly in a new empty folder.
2. In the installation folder rename omni.ja to omni.zip.
3. Using Windows Explorer extract all contents of omni.zip to the installation folder.
4. Rename omni.zip back to omni.ja.
5. Download one of the affected builds (31.1.0, 31.1.1, or 31.2.0) and choose to install it at the same location as step #1, thus performing a pave-over upgrade.
6. When upgrade completes, choose to launch Firefox.

Results:
24.8.1 -> 31.1.0 – [@ nsGlobalWindow::GetOuterWindowInternal() ]
24.8.1 -> 31.1.1 - [@ nsGlobalWindow::GetOuterWindowInternal() ]
24.8.1 -> 31.2.0 - [@ nsGlobalWindow::GetOuterWindowInternal() ]
24.6.0 -> 31.1.0 – [@ nsGlobalWindow::GetOuterWindowInternal() ]
24.6.0 -> 31.1.1 - [@ nsGlobalWindow::GetOuterWindowInternal() ]
24.6.0 -> 31.2.0 - [@ nsGlobalWindow::GetOuterWindowInternal() ]

Notes:
- there is no crash when updating from 31.1.0 or 31.1.1 to 31.2.0 (via About->Help or Pave-Over)
- I could not test upgrading through About->Help, since Firefox (24.6.0 or 24.8.1) said that it's up to date
Flags: needinfo?(florin.mezei)
Naveed,

Now that we have a reproduce can we have someone assigned?
Flags: needinfo?(naveed)
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #15)
> - I could not test upgrading through About->Help, since Firefox (24.6.0 or
> 24.8.1) said that it's up to date

You could use the "app.update.url.override" pref and let it point to a locally served update snippet (XML file), e.g. via Apache. You can get the content of it from this update snippet, which will update the 24.x build to 31.2.0esr:

https://aus3.mozilla.org/update/3/Firefox/31.0/20140717132905/WINNT_x86-msvc/en-US/esr/Windows_NT%205.1/default/default/update.xml?force=1
(In reply to Henrik Skupin (:whimboo) from comment #17)
> You could use the "app.update.url.override" pref and let it point to a
> locally served update snippet (XML file), e.g. via Apache. You can get the
> content of it from this update snippet, which will update the 24.x build to
> 31.2.0esr:
> 
> https://aus3.mozilla.org/update/3/Firefox/31.0/20140717132905/WINNT_x86-msvc/
> en-US/esr/Windows_NT%205.1/default/default/update.xml?force=1

Thank you Henrik for the info! However, given our extremely tight schedule, I do not intend to spend QA time doing this unless we consider we really need to. So please let us know if you think this is needed, and with what priority.
Steve, could you take a look at this? It appears to be tripping up in StructuredClone -- I guess we need to be more defensive when reading data from an older blob?
Flags: needinfo?(sphink)
Actually, these STR give a totally unrelated crash with no JS in the stack, so there's not much we can do here. Can we find an owner in Gecko for this?
Flags: needinfo?(sphink) → needinfo?(nihsanullah)
I believe this bug is two separate problems. Comment 0 is a standard action-at-a-distance GC marking crash. Comment 1 shows this same crash repeatedly. Comment 3 describes two users with the "same" crash, which then turns out to be reproducible via the same steps as bug 1063052 and most likely *is* that bug. It appears that the same steps to fix the problem from bug 1063052 (delete the program folder, reinstall) fixed the original bug here. Though given that this is a GC bug, it may just be because it perturbed the timing enough.

Also, the top frame on the stack (in StructuredClone.h) is suspicious. It appears to point to JSAutoStructuredCloneBuffer::clear(), which should not be called during marking, and the function in the stack trace is something that I could easily imagine being coalesced across multiple copies. So it's very possible that it's just bogus.

I don't think there's anything we can do with the original bug here (from comments 0 and 1) without STR or more occurrences.
philipp: You said in https://support.mozilla.org/en-US/questions/1016988#answer-620374 that you were able to replicate the crash. Do you have a stack or crash report url for one of your crashes?
Flags: needinfo?(nihsanullah)
Flags: needinfo?(naveed)
Flags: needinfo?(madperson)
hello, this is a crash report i just produced: https://crash-stats.mozilla.com/report/index/4f3d1dca-2c47-4a39-a1e3-745fe2141023

as mentioned in comment 5, i also have the leftover files of the program folder available in case you would need them for further inspection/debugging (they look like the extracted contents of omni.ja). they are too big to attach to bugzilla, but i could submit them via mail...
Flags: needinfo?(madperson)
(In reply to philipp from comment #23)
> hello, this is a crash report i just produced:
> https://crash-stats.mozilla.com/report/index/4f3d1dca-2c47-4a39-a1e3-
> 745fe2141023
> 
> as mentioned in comment 5, i also have the leftover files of the program
> folder available in case you would need them for further
> inspection/debugging (they look like the extracted contents of omni.ja).
> they are too big to attach to bugzilla, but i could submit them via mail...

Thank you. That is indeed the crash from comment 0. It seems everything you said in comment 3 was accurate.

However, it also sounds as if this was fixed in Firefox 32. So unless someone sees a reason not to, I am re-resolving it as philipp did after comment 3.
Assignee: nobody → sphink
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → INVALID
Bleh. I can't read. "...and subsequent versions will fail to launch silently without generating a crash report at all." does not mean "fixed".
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
(In reply to philipp from comment #23)
> as mentioned in comment 5, i also have the leftover files of the program
> folder available in case you would need them for further
> inspection/debugging (they look like the extracted contents of omni.ja).
> they are too big to attach to bugzilla, but i could submit them via mail...

Yes, please. Can you send them to sphink@gmail.com? Thanks.
ni? philipp for a copy of the wreckage.
Flags: needinfo?(madperson)
Never mind, philipp. I can reproduce the (correct) crash by:

1. install esr24.8.1
2. run it and exit
3. unpack omni.ja in the install directory (note: this omni.ja does not have a defaults/preferences dir. That seems to come from esr31.)
4. run it again. Go to about:config and add a new preference app.update.url.override=http://10.0.0.7/~sfink/update.xml
5. Save https://aus3.mozilla.org/update/3/Firefox/31.0/20140717132905/WINNT_x86-msvc/en-US/esr/Windows_NT%205.1/default/default/update.xml?force=1 to the above url. Download the referenced esr31 build into a local file, update the update.xml file to point to the local file. (This is just because it takes a long time to pull it down over the network from where I am.)
6. To to help/About Firefox and do the update
7. Allow it to restart

That gives me a startup crash with the operator JSObject* const&() stack. \o/

Now to figure out how to pull down symbols and things...
Flags: needinfo?(madperson)
From looking at the disassembly, it looks like I was right -- the association with structured clone is spurious. The top frame should actually be |JS::ObjectPtr::operator JSObject*&()| or something like that. At any rate, it is crashing when accessing mJSProtoObject in XPCWrappedNativeProto::TraceSelf: http://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/xpcprivate.h#1861

The non-inlined stack would be something like

ObjectPtr::operator JSObject*()  http://dxr.mozilla.org/mozilla-central/source/js/public/RootingAPI.h#1250
XPCWrappedNativeProto::TraceSelf   http://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/xpcprivate.h#1861
XPCWrappedNative::TraceInside  http://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/xpcprivate.h#2185
MarkWrappedNative  http://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCWrappedNativeJSOps.cpp#611
XPCWrappedNative::Trace  http://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCWrappedNativeJSOps.cpp#617

Those urls might not quite line up, and will change over time, but close enough.

So I think this means either the mJSProtoObject or the whole wrapper is null, probably the latter.
Bobby, can you tell what's going on here?
Flags: needinfo?(bobbyholley)
We dug through the assembly in a live run and it looks like XPConnect is assuming that if the low bit (XPC_SCOPE_TAG) is set on mMaybeProto, then the proto is non-null. The crash is when dereferencing (mMaybeProto&3)->mJSProtoObject.

Neither Steve nor I know enough about XPConnect's invariants to know if this is correct, or if allowing one of these objects to continue existing will cause massive problems later.
(In reply to Terrence Cole [:terrence] from comment #32)
> We dug through the assembly in a live run and it looks like XPConnect is
> assuming that if the low bit (XPC_SCOPE_TAG) is set on mMaybeProto, then the
> proto is non-null.

I assume you mean _not_ set?

We have a union of mMaybeScope and mMaybeProto, only one of which is valid at a given time. We tag the union via the bottommost bit of the member. If it's set we have a scope, otherwise we have a proto. See xpcprivate.h.

Naively it would make sense for HasProto() to also null-check its member (in addition to verifying that the member isn't a scope), but I think the idea is that it's never supposed to be null. Barring memory corruption, the only way I can see for it to be null is if XPCWrappedNative::Destroy was invoked. Can you verify that? It would be especially helpful to dump the values of all of the other XPCWrappedNative members (there aren't many).

The two callsites for Destroy are ~XPCWrappedNative and FlatJSObjectFinalized. In either case we shouldn't be invoking Trace() on the XPCWrappedNative afterwards, but I'd be very curious to see _why_ we're tracing the XPCWrappedNative. Steve, can you add a few more stack frames to the ones in comment 31?
Flags: needinfo?(bobbyholley)
If I remove the chrome.manifest file, then I do not see this crash.

It looks like this is a shutdown crash. I'm not sure exactly what it hit that caused it to decide to shut down, but it seems like it does it at a bad time (when the state here is inconsistent or something.)

So it looks like the backport(s) from bug 1063052 should be enough to resolve this issue. I don't know if the bug where we get into this state is still present in current versions or not.

(In reply to Bobby Holley (:bholley) from comment #33)
> The two callsites for Destroy are ~XPCWrappedNative and
> FlatJSObjectFinalized. In either case we shouldn't be invoking Trace() on
> the XPCWrappedNative afterwards, but I'd be very curious to see _why_ we're
> tracing the XPCWrappedNative. Steve, can you add a few more stack frames to
> the ones in comment 31?

Yes, the ones in the comment were just my manually-uninlined ones. The VS reported stack is:

	mozjs.dll!JSAutoStructuredCloneBuffer::data()  Line 184	C++
 	xul.dll!XPCWrappedNative::Trace(JSTracer * trc=0x0490b20c, JSObject * obj=0x05dc5040)  Line 597 + 0x82 bytes	C++
 	mozjs.dll!JS_GlobalObjectTraceHook(JSTracer * trc=0x0490b20c, JSObject * global=0x05dc5040)  Line 2557 + 0x8 bytes	C++
 	mozjs.dll!js::GCMarker::processMarkStackTop(js::SliceBudget & budget={...})  Line 1555 + 0x2bb bytes	C++
 	mozjs.dll!js::GCMarker::drainMarkStack(js::SliceBudget & budget={...})  Line 1609	C++
 	mozjs.dll!IncrementalCollectSlice(JSRuntime * rt=0x0490b000, __int64 budget=0x0000000000000000, JS::gcreason::Reason reason=API, js::JSGCInvocationKind gckind=GC_NORMAL)  Line 4404 + 0x25 bytes	C++
 	mozjs.dll!GCCycle(JSRuntime * rt=0x00000000, bool incremental=false, __int64 budget=0x0000000000000000, js::JSGCInvocationKind gckind=GC_NORMAL, JS::gcreason::Reason reason=API)  Line 4565 + 0x12 bytes	C++
 	mozjs.dll!Collect(JSRuntime * rt=0x00000000, bool incremental=false, __int64 budget=0x0000000000000000, js::JSGCInvocationKind gckind=GC_NORMAL, JS::gcreason::Reason reason=API)  Line 4698 + 0x23 bytes	C++
 	mozjs.dll!js::GC(JSRuntime * rt=0x00000000, js::JSGCInvocationKind gckind=GC_NORMAL, JS::gcreason::Reason reason=API)  Line 4730	C++
 	mozjs.dll!JS_GC(JSRuntime * rt=0x0490b000)  Line 1891 + 0xd bytes	C++
 	xul.dll!nsXREDirProvider::DoShutdown()  Line 866 + 0x9 bytes	C++
 	xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup()  Line 1198	C++
 	xul.dll!ScopedXPCOMStartup::`scalar deleting destructor'()  + 0x8 bytes	C++
 	xul.dll!XREMain::XRE_main(int argc=0x00000000, char * * argv=0x00221270, const nsXREAppData * aAppData=0x003cf858)  Line 4118	C++
 	xul.dll!XRE_main(int argc=0x00000003, char * * argv=0x00221270, const nsXREAppData * aAppData=0x003cf858, unsigned int aFlags=0x00000000)  Line 4300 + 0xd bytes	C++
        (boring main() stuff elided)
Yes, so that would imply that we're invoking the trace hook on a global after we've already invoked finalize.
I reproduced the original crash using the steps from comment #28. Once the crash was reproduced, fx crashed every single time I re-opened the application:

JS::Handle<JSObject*>::operator JSObject* const&():
- https://crash-stats.mozilla.com/report/index/0fd10387-7e29-4fd4-849f-7cfd92141029
- https://crash-stats.mozilla.com/report/index/2d42a2b4-09ca-4af0-ab48-8764a2141029
- https://crash-stats.mozilla.com/report/index/cbb33c9e-815d-4172-ab19-9a8432141029

Build: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-esr31-win32/1414605546/

I'm trying to verify the fix but it looks like the tinderbox build listed above doesn't include a .mar file. I tried looking around and found two wiki pages that explain how to create full/partial .mar files but I couldn't figure it out after a few hours of attempts. I'm not sure how I can verify that the issue has been fixed using the STR from comment # 28 without a complete mar file. Documents used:

- https://wiki.mozilla.org/Software_Update:HowToManuallyGenerateMARFiles
- https://wiki.mozilla.org/UpdateGeneration

However I did go through the pave-over upgrade method mentioned in comment #15 which still caused issues once the tinderbox build was installed over 24.8.1. Now I'm not certain if this is valid/correct method of testing as you're replacing a release build with a tinderbox nightly build. Crashes:

- https://crash-stats.mozilla.com/report/index/6a0dc682-6f98-4ddd-b9c6-a5ada2141030
- https://crash-stats.mozilla.com/report/index/e042e204-fff5-463d-91fc-792572141030

Please let me know if I'm going about this the wrong way and point me in the right direction if so.. Shouldn't take long to verify as I have everything already setup.
(In reply to Kamil Jozwiak [:kjozwiak] from comment #36)
> I'm trying to verify the fix but it looks like the tinderbox build listed
> above doesn't include a .mar file. I tried looking around and found two wiki
> pages that explain how to create full/partial .mar files but I couldn't
> figure it out after a few hours of attempts. I'm not sure how I can verify
> that the issue has been fixed using the STR from comment # 28 without a
> complete mar file. Documents used:
> 
> - https://wiki.mozilla.org/Software_Update:HowToManuallyGenerateMARFiles
> - https://wiki.mozilla.org/UpdateGeneration

That's about as far as I got. I haven't tried generating my own mar.

> However I did go through the pave-over upgrade method mentioned in comment
> #15 which still caused issues once the tinderbox build was installed over
> 24.8.1. Now I'm not certain if this is valid/correct method of testing as
> you're replacing a release build with a tinderbox nightly build. Crashes:

Cool, I wasn't able to get a crash with a pave-over upgrade. But my testing environment is suspicious (eg my hard drive is dying), and I'm not sure if I really ran that scenario correctly.

> Please let me know if I'm going about this the wrong way and point me in the
> right direction if so.. Shouldn't take long to verify as I have everything
> already setup.

Your testing sounds good to me. But I think this is a shutdown crash that we'll avoid via the bug 1063052 backport. Which is not to say there isn't a separate bug here in our shutdown finalization handling; I strongly suspect there is. If you have a fix for *that*, that would be great, though I think the bug 1063052 fix is enough to unblock updates. What is your fix?
Right now I think we only have nightly builds available for the latest fx-31.2.0esr which is making this difficult to test via updating as it's on the Nightly channel and 24.8.1 is on another channel. I tried using the complete .mar from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-10-30-00-05-01-mozilla-esr31/ but that didn't work:

- unpacked omni.jar into the Mozilla Firefox installation path
- changed app.update.url.override;http://kamiljozwiak.com/update.xml (replaced all the checksums with the correct values)
- attempted to update 24.8.1 -> 31.2.0

[14:03:31.725] AUS:SVC Downloader:onStopRequest - attempting to stage update: Firefox 31.2.0esr
[14:03:32.734] AUS:SVC readStatusFile - status: failed: 19, path: C:\Users\Kamil Jozwiak\AppData\Local\Mozilla\updates\E7CF176E110C211B\updates\0\update.status
[14:03:32.753] AUS:SVC handleFallbackToCompleteUpdate - install of complete or only one patch offered failed.
[14:03:32.762] AUS:SVC UpdateManager:refreshUpdateStatus - Notifying observers that the update was staged. state: failed, status: failed: 19

I need a complete .mar file that was created after 2014-10-29 (bug # 1063052 comment # 112) that will work with the released version of 24.8.1 and 24.6.0.

Is there a way to get around this or am I doing something wrong?
The only way to properly verify this using actual esr bits would be with a mar file generated by releng for esr. Since we have plenty of tests that verify the functionality that this uses and this has been verified on other channels verifying this seems like a lot of effort with little return.

As for the pave over case with the installer, that was done in bug 1070988 and I don't think that was rolled up into the patch that landed on esr.
We have went ahead and gone to build with a dummy release to test whether the fix works.
I think we could have done this with an old enough nightly build, and set the update channel to nightly-esr31 (instead of nightly-esr24). That would have matched signing certs, and avoided the mismatch in comment 38. C'est la vie.
Once bug 1093313 is fixed a channel change will not be necessary. Just take an old nightly build, set up the bad files and check for updates.
Used the following nightly builds for verification:

- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/08/2014-08-01-00-05-01-mozilla-esr24/firefox-24.7.0esrpre.en-US.win32.installer.exe
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-02-00-05-03-mozilla-esr24/firefox-24.7.0esrpre.en-US.win32.installer.exe
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-24-00-05-05-mozilla-esr24/firefox-24.8.0esrpre.en-US.win32.installer.exe
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-01-00-05-02-mozilla-esr24/firefox-24.8.0esrpre.en-US.win32.installer.exe

- extracted the contents of omni.jar into the Mozilla installation directory
- Updated to 31.2.0esr (BuildID: 20141104000504 changeset: d38075a0a0f6)
- ensured that there weren't any crashes/issues during or after the update process
- re-opened fx several times after updating and ensured it opened without any problems
- restarted the machine and re-opened fx a few times ensuring there weren't any issues
- visited several websites and ensured everything was working correctly

Nick, I couldn't find any 24.8.1 nightly builds.. Perhaps I'm looking in the wrong area?? Updating using the above builds worked without any issues and didn't produce any crashes during or after the update.
Flags: needinfo?(nthomas)
The last esr24 nightly I can find is at 
 http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-16-00-05-01-mozilla-esr24/
and that's still 24.8.0. There is a known issue where the nightly version isn't bumped after a new release is out (bug 902738) so that's not unexpected. If you look in about:buildconfig (or application.ini) for the actual code revision, you'll probably find 9cd2ab4d0029. That's what the 24.8.1 release was based off, plus a thunderbird version bump (see http://hg.mozilla.org/releases/mozilla-esr24/graph for more info).

Are we happy that this is a reliable bug to reproduce ? Would it help if I set up an update to a 31.x nightly build before we landed the fix, to verify we do crash ? If so, what build would I use ?
Flags: needinfo?(nthomas)
Yes, Nick that's a great idea that we should ensure the same tests being done now on the new updates crash before, so let's do an update from 24 build to 31.2.0 build (pre-landings, on nightly channel) to check. Can we go for c5ffa5e63d99 which seems to be right before the fix landed?
Flags: needinfo?(nthomas)
Agreed, I'll go through it once 24.8 points to an older version of 31.2.0 (pre-landings). I reproduced the problem in comment #36 on release and will go through the same steps on nightly.
Looks like bug 1063052 landed at rev d071f8dfe4d6, and the previous nightly was 2 pushes earlier at 3c5b07daee61 - https://tbpl.mozilla.org/?tree=Mozilla-Esr31&fromchange=a0524e2a9b06&tochange=d071f8dfe4d6

Please take windows 24.x nightly, create a new string pref app.update.url.override and set it to http://people.mozilla.org/~nthomas/update-bug1055766.xml, then check for updates.
Flags: needinfo?(nthomas)
I updated the builds listed below using the the information from comment #47 and received a crash during the update process. However I'm receiving a [@ JSAutoStructuredCloneBuffer::data()] crash rather than the original [@ JS::Handle<JSObject*>::operator JSObject* const&()] crash. These two look very similar but I'm not sure if they're the same crashes.

As per comment #43, this crash was definitely not happening when I updating to the latest 31.2.0esr which included the fix.

* http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-16-00-05-01-mozilla-esr24/
--> https://crash-stats.mozilla.com/report/index/dcab844c-82d9-4777-a84d-29ed22141105

* http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-24-00-05-05-mozilla-esr24/
--> https://crash-stats.mozilla.com/report/index/62477313-fa7d-4f82-914c-16e622141105

* http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-02-00-05-03-mozilla-esr24/
--> https://crash-stats.mozilla.com/report/index/c0414a84-825b-4179-991f-0ec3c2141105
(In reply to Kamil Jozwiak [:kjozwiak] from comment #48)
> I updated the builds listed below using the the information from comment #47
> and received a crash during the update process. However I'm receiving a [@
> JSAutoStructuredCloneBuffer::data()] crash rather than the original [@
> JS::Handle<JSObject*>::operator JSObject* const&()] crash. These two look
> very similar but I'm not sure if they're the same crashes.
> 
> As per comment #43, this crash was definitely not happening when I updating
> to the latest 31.2.0esr which included the fix.
> 
> *
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-16-00-
> 05-01-mozilla-esr24/
> -->
> https://crash-stats.mozilla.com/report/index/dcab844c-82d9-4777-a84d-
> 29ed22141105
> 
> *
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-24-00-
> 05-05-mozilla-esr24/
> -->
> https://crash-stats.mozilla.com/report/index/62477313-fa7d-4f82-914c-
> 16e622141105
> 
> *
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-02-00-
> 05-03-mozilla-esr24/
> -->
> https://crash-stats.mozilla.com/report/index/c0414a84-825b-4179-991f-
> 0ec3c2141105

So are we confirmed the patch does fix the original to crashes on update you earlier ran into using the STR?
Flags: needinfo?(kamiljoz)
Steve, are these the same crashes?

* [@ JS::Handle<JSObject*>::operator JSObject* const&()]
--> https://crash-stats.mozilla.com/report/index/0fd10387-7e29-4fd4-849f-7cfd92141029

* [@ JSAutoStructuredCloneBuffer::data()]
--> https://crash-stats.mozilla.com/report/index/dcab844c-82d9-4777-a84d-29ed22141105
Flags: needinfo?(kamiljoz) → needinfo?(sphink)
(In reply to Kamil Jozwiak [:kjozwiak] from comment #50)
> Steve, are these the same crashes?
> 
> * [@ JS::Handle<JSObject*>::operator JSObject* const&()]
> -->
> https://crash-stats.mozilla.com/report/index/0fd10387-7e29-4fd4-849f-
> 7cfd92141029
> 
> * [@ JSAutoStructuredCloneBuffer::data()]
> -->
> https://crash-stats.mozilla.com/report/index/dcab844c-82d9-4777-a84d-
> 29ed22141105

Yes, those are the same. I'm not sure why it sometimes picks one or the other, but it's probably just a build/link order thing. (The code for those methods is trivial and it looks like all of the trivial implementations get merged together, and the first? last? one will be reported for a crash in any of them.) The rest of the stack is the same.
Flags: needinfo?(sphink)
So it looks like the crash has been fixed when using the FX updater. I reproduced the original issue in comment # 36 and verified that it's been fixed via comment # 43. Once we verified that the original crash was fixed, we reproduced it once again in comment # 48 using the esr-nightly channel rather than the esr-release channel. However, there's still an issue when doing pave-over upgrades using the method Florin mentioned in comment #15:

* download/install either of the following builds:
** http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-02-00-05-03-mozilla-esr24/
** http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-16-00-05-01-mozilla-esr24/
* once installed, open/close fx
* unpacked omni.jar into the Mozilla Firefox installation path
* download/install to the same location used above: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-11-10-00-05-01-mozilla-esr31/
* run fx once the newer version is installed over the older build and FX will crash every single time it's launched

You'll receive the following crashes every single time you attempt to open FX:

[@ JSAutoStructuredCloneBuffer::data() ]
* https://crash-stats.mozilla.com/report/index/157f31cd-ad00-4309-a4b7-714712141110
* https://crash-stats.mozilla.com/report/index/18f94d42-d61c-43f9-b419-f25f62141110
* https://crash-stats.mozilla.com/report/index/7477fd9b-38b1-4457-8221-c3de22141110
* https://crash-stats.mozilla.com/report/index/b337c7fd-554d-4ba5-ae23-6b9672141110

Steve, should this also have been fixed by the patch? Is testing via the this method (comment #15) valid? I also reproduced this earlier with the tinderbox build in comment #36.
Flags: needinfo?(sphink)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #39)
>...
> As for the pave over case with the installer, that was done in bug 1070988
> and I don't think that was rolled up into the patch that landed on esr.
(In reply to Kamil Jozwiak [:kjozwiak] from comment #52)
> Steve, should this also have been fixed by the patch? Is testing via the
> this method (comment #15) valid? I also reproduced this earlier with the
> tinderbox build in comment #36.

I haven't been following which patch does what and what has landed.

My understanding is that if you start up with a chrome.manifest (probably from unpacking omni.ja), then you'll crash. This is a bug in the shutdown code, which we have not identified and do not have a fix for, but also don't strictly need if we can ensure that chrome.manifest does not exist at startup.

One patch deletes that file during an upgrade, which prevents the bug from happening as a result of upgrading. This patch has landed and been verified.

For a pave-over upgrade, I would NOT expect the same patch to fix the problem (so I disagree with comment 53). The patch rstrong pointed to in comment 54 DOES look to me like it would solve the problem, as it would remove chrome.manifest on install. So I can only reiterate comment 54; we need to backport the patch from bug 1070988 in order to fix the pave-over install.
Flags: needinfo?(sphink)
Kamil, We have now landed bug 1070988's pave-over install patch into ESR31 (https://hg.mozilla.org/releases/mozilla-esr31/rev/4ef68b98588f) will we need to build another release or can testing be done without a new build?
Flags: needinfo?(kamiljoz)
Reproduced the original issue using the following builds:
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-16-00-05-01-mozilla-esr24/
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-11-10-00-05-01-mozilla-esr31/

- once the pave over update was completed, fx crashed every time it was opened (ran through this two different times to make sure I can reproduce)

Went through verification using the following build:
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-02-00-05-03-mozilla-esr24/
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-24-00-05-05-mozilla-esr24/
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-16-00-05-01-mozilla-esr24/
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-11-11-00-05-04-mozilla-esr31/ (latest build used as the pave over update)

Test Cases used for each of the above builds once the pave over update was applied:

- opened fx several times from the desktop, taskbar and metro menu without any issues
- restarted the machine and ensured fx still opens without any issues
- checked about:support and ensured the correct version was installed/used
- set app.update.log;true and ensured that the updater is still working (looking for updates)
- opened several websites (facebook, twitter, google, espnfc, )

When uninstalling fx, the original omni.ja files that were exported into "C:\Program Files (x86)\Nightly" are not being removed and thus leaving behind the "Nightly" folder. I installed the latest fx over these files and ensured it wasn't causing any crashes. I'll mention this issue in bug # 1070988. Everything else seems like it's working correctly. Let me know if I've missed something obvious!
Flags: needinfo?(kamiljoz)
As per bug # 1070988 comment # 20, the above problem that I mentioned is the intended behavior and therefore an issue.

Summary:

Crash via the fx updater:

- Reproduced crash via comment # 36 (release channel)
- Reproduced crash via comment # 48 (esr-nightly channel)
- Verified crash wasn't occurring when updating via comment # 43 (esr-nightly channel)

Crash via the pave over update:

- Verified via comment # 57
> As per bug # 1070988 comment # 20, the above problem that I mentioned is the
> intended behavior and therefore an issue.

* intended behavior and therefore NOT an issue.
Absent any opposition I guess we can mark this fixed?
Could someone please mark this as FIXED?
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Thanks Benjamin, marking this as verified as per comment # 58.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.