Closed Bug 1403348 Opened 7 years ago Closed 4 years ago

Crash in mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase[STALLED]

Categories

(Core :: DOM: Service Workers, defect, P2)

57 Branch
All
Windows
defect

Tracking

()

RESOLVED DUPLICATE of bug 1471720
Tracking Status
firefox-esr52 --- unaffected
firefox-esr68 --- wontfix
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- wontfix

People

(Reporter: philipp, Assigned: kmag)

References

(Blocks 1 open bug)

Details

(4 keywords)

Crash Data

Attachments

(4 files, 3 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-ce330305-b3ff-4dea-9841-484570170926.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase() 	dom/workers/ServiceWorkerRegistrar.cpp:1128
1 	xul.dll 	mozilla::dom::ServiceWorkerRegistrar::ProfileStarted() 	dom/workers/ServiceWorkerRegistrar.cpp:1026
2 	xul.dll 	mozilla::dom::ServiceWorkerRegistrar::Observe(nsISupports*, char const*, char16_t const*) 	dom/workers/ServiceWorkerRegistrar.cpp:1155
3 	xul.dll 	nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) 	xpcom/ds/nsObserverList.cpp:112
4 	xul.dll 	nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) 	xpcom/ds/nsObserverService.cpp:296
5 	xul.dll 	nsXREDirProvider::DoStartup() 	toolkit/xre/nsXREDirProvider.cpp:1036
6 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4530
7 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4865
8 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4960
9 	firefox.exe 	NS_internal_main(int, char**, char**) 	browser/app/nsBrowserApp.cpp:309
10 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
11 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
12 	kernel32.dll 	BaseThreadInitThunk 	
13 	ntdll.dll 	RtlUserThreadStart

this crash signature is newly showing up in firefox 57. so far 2/3 of reports come with "MOZ_RELEASE_ASSERT(client)" & 1/3 with "MOZ_RELEASE_ASSERT(svc)", that got added in bug 1399646.
In general, those should only ever be null at OOM, that doesn't seem likely in this crash report.

The only other possibility I can think of is that initialization of the async shutdown failed on the JS side, for one reason or another. If that's the case, the browser is probably not going to be usable, since any other JS code that touches the AsyncShutdown module will throw as well.

It might be a good idea instrument this code with the result values of those getters, which might give us a better idea of exactly what's going on.
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #1)
> It might be a good idea instrument this code with the result values of those
> getters, which might give us a better idea of exactly what's going on.

baku, can you try adding some instrumentation like Kris suggests? I don't see an STR here or in bug 1399646 so I guess we'll just have to watch the crash reports?
Assignee: nobody → amarchesini
Component: DOM: Workers → DOM: Service Workers
Flags: needinfo?(amarchesini)
Priority: -- → P1
Sorry, when I said "we", I meant "I" :) I don't want someone else to have to spend time fixing my crashes.
Assignee: amarchesini → kmaglione+bmo
Comment on attachment 8912937 [details]
Bug 1403348: Include details about AsyncShutdown failure in crash reason.

https://reviewboard.mozilla.org/r/184258/#review189604

::: dom/workers/ServiceWorkerRegistrar.cpp:1136
(Diff revision 1)
> +  // memory), and there's no point in continuing startup. Include as much
> +  // information as possible in the crash report.
> +  if (NS_FAILED(rv)) {
> +    if (rv == NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS) {
> +      CycleCollectedJSContext* context = CycleCollectedJSContext::Get();
> +      if (nsCOMPtr<nsIException> exn = context->GetPendingException()) {

if (context && ...
Attachment #8912937 - Flags: review?(amarchesini) → review+
Comment on attachment 8912937 [details]
Bug 1403348: Include details about AsyncShutdown failure in crash reason.

https://reviewboard.mozilla.org/r/184258/#review189606

::: dom/workers/ServiceWorkerRegistrar.cpp:1149
(Diff revision 1)
> +    }
> +
> +    nsAutoCString errorName;
> +    GetErrorName(rv, errorName);
> +    MOZ_CRASH_UNSAFE_PRINTF("Failed to get profileBeforeChange shutdown blocker: %s",
> +                            errorName.get());

have you checked if this can happen when firefox -P is executed? Or xpcshell tests?
Comment on attachment 8912937 [details]
Bug 1403348: Include details about AsyncShutdown failure in crash reason.

Requesting data review for the use of MOZ_CRASH_UNSAFE_PRINTF.

This adds an error message from a JS to crash reports. The message may contain a filename where the error occurred, but that filename should always be from system JS.
Attachment #8912937 - Flags: review?(rweiss)
Comment on attachment 8912937 [details]
Bug 1403348: Include details about AsyncShutdown failure in crash reason.

https://reviewboard.mozilla.org/r/184258/#review189606

> have you checked if this can happen when firefox -P is executed? Or xpcshell tests?

It doesn't happen when I run `firefox -P`. It can only happen in xpcshell tests if something is wrong, since we already depend heavily on the async shutdown service in them.
Flags: needinfo?(amarchesini)
Comment on attachment 8912937 [details]
Bug 1403348: Include details about AsyncShutdown failure in crash reason.

https://reviewboard.mozilla.org/r/184258/#review189970

In the future, we will try to enforce a new policy of encouraging all requests for data collection to answer a set of 8 questions.  Seeing as how this is marked critical, and since this is category 1, I'll do the review with the information provided in the bug as is.  If you have time, please clarify answers to my "not sures" in the bug thread.  Thanks.

1) Is there documentation that describes the schema for the ultimate data set available publicly, complete and accurate? 
Information about the implementation in the bug seems sufficient.

2) Is there a control mechanism that allows the user to turn the data collection on and off? (Note, for data collection not needed for security purposes, Mozilla provides such a control mechanism)  
Yes, this information is submitted only resulting from opt-in crash data.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?
Not sure if this is for permanent collection.  I will assume this is temporary.  I will default to asserting that :kmag or :overholt can either claim this over time or indicate who is a better choice.

4) Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under? 
This is type 1.

5) Is the data collection default-on or default-off? 
Default on for crashes, I assume.

6) Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)? 
No.

7) Is the data collection covered by the existing Firefox privacy notice? If unsure: escalate to legal.
Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data? 
Not sure, depends on whether this is permanently on or not.
Attachment #8912937 - Flags: review?(rweiss) → review+
(In reply to Rebecca Weiss from comment #9)
> In the future, we will try to enforce a new policy of encouraging all
> requests for data collection to answer a set of 8 questions.  Seeing as how
> this is marked critical, and since this is category 1, I'll do the review
> with the information provided in the bug as is.  If you have time, please
> clarify answers to my "not sures" in the bug thread.  Thanks.

Thanks. I'll keep that in mind in the future.

> 3) If the request is for permanent data collection, is there someone who
> will monitor the data over time?
> Not sure if this is for permanent collection.  I will assume this is
> temporary.  I will default to asserting that :kmag or :overholt can either
> claim this over time or indicate who is a better choice.

Yes, it is temporary. I'll remove it as soon as we understand the cause of the
crash. I may consider keeping the second MOZ_CRASH that only contains an error
code, but that's a trivial piece of information.

> 5) Is the data collection default-on or default-off? 
> Default on for crashes, I assume.

Default on for crashes that users choose to submit, yes.

> 8) Does there need to be a check-in in the future to determine whether to
> renew the data? 
> Not sure, depends on whether this is permanently on or not.

This bug will remain open until the logging is removed.
https://hg.mozilla.org/integration/mozilla-inbound/rev/772f1641588ac2fc20ff6ec5fc14df98314f4660
Bug 1403348: Include details about AsyncShutdown failure in crash reason. r=baku data-r=rweiss
Keywords: leave-open
Hey Kris, anything we can pull from the diagnostics you landed? This is currently a 57 bug marked as P1. Need to figure out whether or not this blocks 57.
Flags: needinfo?(kmaglione+bmo)
The crash rates for this bug in nightly are extremely low, and so far all of them have been for the first assertion that doesn't have additional logging. We may have to uplift to beta to get better results.
Flags: needinfo?(kmaglione+bmo)
Comment on attachment 8912937 [details]
Bug 1403348: Include details about AsyncShutdown failure in crash reason.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1399646
[User impact if declined]: This is a startup crash with significant volume on beta that we don't know the cause of. This is a diagnostic patch which should give us more information, but the crash volume in nightly is too low for us to get meaningful data in a reasonable time frame.
[Is this code covered by automated tests?]: No, this code only takes effect in catastrophic circumstances that we don't currently know how to reproduce.
[Has the fix been verified in Nightly?]: N/A
[Needs manual test from QE? If yes, steps to reproduce]: N/A 
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: This change only adds additional diagnostics in a situation where we're already crashing.
[String changes made/needed]: None
Attachment #8912937 - Flags: approval-mozilla-beta?
Comment on attachment 8912937 [details]
Bug 1403348: Include details about AsyncShutdown failure in crash reason.

Diagnostic patch to investigate a top crasher, Beta57+
Attachment #8912937 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Priority: P1 → P2
Waiting for info once more beta users get this.
OK, something very strange is going in here.

At this point, the vast majority of the crashes are MOZ_RELEASE_ASSERT(svc), which don't have additional error reporting.

The ones that crash at the second assertion, however, crash with:

Failed to get profileBeforeChange shutdown blocker: [JavaScript Error: "Services.prefs.getBoolPzef is not a function" {file: "resource://gre/modules/AsyncShutdown.jsm" line: 204}]'[JavaScript Error: "Services.prefs.getBoolPzef is not a function" {file: "resource://gre/modules/AsyncShutdown.jsm" line: 204}]' when calling method: [nsIAsyncShutdownService::profileBeforeChange]


Those all appear to be from the same user. That user also has "ZVFORT32.DLL" module loaded, which I'm pretty sure is responsible for other startup crashes as well.

I'm going to wait for some more data to see if other crashes are similarly strange, but it may make sense at this point to also add additional error reporting to the MOZ_RELEASE_ASSERT(svc) failure case.
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #19)
> Those all appear to be from the same user. That user also has "ZVFORT32.DLL"
> module loaded, which I'm pretty sure is responsible for other startup
> crashes as well.

Sorry. Rather, the last user I looked at with the MOZ_RELEASE_ASSERT(svc) signature has this DLL.

The user with the getBoolPzef crash has "eplgFirefox.dll", which appears to be malware, and "eOppMonitor.dll", which is probably from EPROT, and I'm already suspicious of.
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #20)
> The user with the getBoolPzef crash has "eplgFirefox.dll", which appears to
> be malware, and "eOppMonitor.dll", which is probably from EPROT, and I'm
> already suspicious of.

*ESET
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #21)
> (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> blocked) from comment #20)
> > The user with the getBoolPzef crash has "eplgFirefox.dll", which appears to
> > be malware, and "eOppMonitor.dll", which is probably from EPROT, and I'm
> > already suspicious of.
> 
> *ESET

Hi Kris, by reading your latest comment, I am not sure I got what the next step for 57 is. Significant crashes have correlation with certain .dll... wondering is there anything we could do?
Flags: needinfo?(kmaglione+bmo)
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #22)
> Hi Kris, by reading your latest comment, I am not sure I got what the next
> step for 57 is. Significant crashes have correlation with certain .dll...
> wondering is there anything we could do?

I'm going to submit a patch to add logging to the other failure case, since the number of reports with the additional logging is still quite low.

Given the assertions I've seen so far, though, I don't think this is a new issue in 57. It looks like there's some sort of corruption that just happens to show up as a crash now because of the additional assertion. But these instances would be broken or crashing even without the assertion.
Flags: needinfo?(kmaglione+bmo)
Attachment #8912937 - Attachment is obsolete: true
Attachment #8923224 - Attachment is obsolete: true
Attachment #8923224 - Flags: review?(amarchesini)
Attachment #8923234 - Attachment is obsolete: true
Attachment #8923234 - Flags: review?(continuation)
Comment on attachment 8923237 [details]
Bug 1403348: Add debugging assertions for more AsyncStartup failure modes.

https://reviewboard.mozilla.org/r/194402/#review199450

r+ for the ServiceWorkerRegistrar.cpp. For jxpconnect you must have a r+ from a peer.
Attachment #8923237 - Flags: review?(amarchesini) → review+
(In reply to Andrea Marchesini [:baku] from comment #27)
> r+ for the ServiceWorkerRegistrar.cpp. For jxpconnect you must have a r+
> from a peer.

I am a peer :)
Comment on attachment 8923237 [details]
Bug 1403348: Add debugging assertions for more AsyncStartup failure modes.

r?mccr8 for the component loader part
Attachment #8923237 - Flags: review?(continuation)
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b08f946beb50
Add debugging assertions for more AsyncStartup failure modes. r=baku
Comment on attachment 8923237 [details]
Bug 1403348: Add debugging assertions for more AsyncStartup failure modes.

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: This patch adds diagnostic code which should help us diagnose a high volume crash.
[Is this code covered by automated tests?]: The surrounding code is, but the added diagnostic code is not.
[Has the fix been verified in Nightly?]: N/A
[Needs manual test from QE? If yes, steps to reproduce]: N/A
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It simply adds diagnostic code which is only activated in situations where we would otherwise crash.
[String changes made/needed]: None.
Attachment #8923237 - Flags: approval-mozilla-beta?
Comment on attachment 8923237 [details]
Bug 1403348: Add debugging assertions for more AsyncStartup failure modes.

https://reviewboard.mozilla.org/r/194402/#review200100
Attachment #8923237 - Flags: review?(continuation) → review+
Comment on attachment 8923237 [details]
Bug 1403348: Add debugging assertions for more AsyncStartup failure modes.

More diagnostic data to debug startup top crasher, beta57+
Attachment #8923237 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Crash Signature: [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase] → [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase] [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase const]
Results so far are interesting, but still not much to go on.

One user's crashes contain:

Failed to get profileBeforeChange shutdown blocker: [JavaScript Error: "illegal character" {file: "resource://gre/modules/AsyncShutdown.jsm" line: 742 column: 60 source: " getOrigin: () => getOrigin(topFrame, filename, lineNum\x06U\x04"}]

That's clearly corruption, but it seems to be corruption in the JS source, rather than the XDR data, unless we somehow have lazy parsing enabled for this file.


Several crashes, probably from the same user, contain:

Failed to load module "jar:.../omni.ja!/components/nsAsyncShutdown.js": ["[Object]" {file: "", line: <garbage>}]

Which means nsAsyncShutdown.js is failing to load, and throwing some non-Error object as its exception value (probably Components.Exception)


So, one of the issues we're clearly dealing with is corruption, but there may be others.

I'm going to update the component loader logging to handle Components.Exception errors, so we can find out more about what's going on in that failure case, and also update these assertions to re-read the source of the module we're trying to load and annotate the crash report with its contents, so we can try to narrow down the source of the corruption.
Comment on attachment 8925313 [details]
Bug 1403348: Part 4 - Annotate certain startup crashes with AsyncShutdown script contents.  data-r?rweiss

https://reviewboard.mozilla.org/r/196478/#review201936
Attachment #8925313 - Flags: review?(continuation) → review+
Comment on attachment 8925312 [details]
Bug 1403348: Part 3 - Support plain XPC Exceptions in ExtractErrorValues.

https://reviewboard.mozilla.org/r/196476/#review202124

::: dom/base/nsContentUtils.cpp:10913
(Diff revision 1)
>      !aName.LowerCaseEqualsLiteral("_top") &&
>      !aName.LowerCaseEqualsLiteral("_parent") &&
>      !aName.LowerCaseEqualsLiteral("_self");
>  }
>  
> +// Ugly.

Can you please improve this comment? :)

::: dom/base/nsContentUtils.cpp:10940
(Diff revision 1)
> +    CopyUTF16toUTF8(filename, aSourceSpecOut);
> +    *aLineOut = exn->LineNumber(aCx);
> +    *aColumnOut = exn->ColumnNumber();
> +  }
> +
> +  exn->GetName(aMessageOut);

We should check the return value of GetName, GetFilename and GetMessageMoz
Attachment #8925312 - Flags: review?(amarchesini) → review+
Comment on attachment 8925313 [details]
Bug 1403348: Part 4 - Annotate certain startup crashes with AsyncShutdown script contents.  data-r?rweiss

Requesting data review for additional crash reports annotations.

1) Is there documentation that describes the schema for the ultimate data set available publicly, complete and accurate? 

This data is collected as part of ordinary crash reporter metadata.

This patch adds two annotations to crash reports for certain startup failures, "nsAsyncShutdownComponent" and "AsyncShutdownModule", which are the on-disk contents of the nsAsyncShutdown.js and AsyncShutdown.jsm scripts, respectively.

These scripts are shipped as part of Firefox, and should contain no user data. The annotations are intended to help us determine whether the corruption we're seeing is in our on-disk data, or in memory after the scripts are read.

2) Is there a control mechanism that allows the user to turn the data collection on and off? (Note, for data collection not needed for security purposes, Mozilla provides such a control mechanism)

Yes, the data is only submitted if the user chooses to submit a crash report.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?

No, the assertion will be removed when we understand the cause of the crash. This bug will remain open until it's removed.

4) Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under? 

This is type 1.

5) Is the data collection default-on or default-off? 

Default on, in the cases where users choose to submit crash reports.

6) Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)? 

No.

7) Is the data collection covered by the existing Firefox privacy notice? If unsure: escalate to legal.

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data?

The removal should be verified when this bug is closed.
Attachment #8925313 - Flags: review?(rweiss)
Comment on attachment 8925312 [details]
Bug 1403348: Part 3 - Support plain XPC Exceptions in ExtractErrorValues.

https://reviewboard.mozilla.org/r/196476/#review202124

> Can you please improve this comment? :)

Heh. Sorry, I meant to do that before I submitted :)

> We should check the return value of GetName, GetFilename and GetMessageMoz

(These methods are infallible, and return void)
https://hg.mozilla.org/integration/mozilla-inbound/rev/8daf75d35c4843b10c40652abce7fd0ff8514114
Bug 1403348: Part 3 - Support plain XPC Exceptions in ExtractErrorValues. r=baku
Comment on attachment 8925313 [details]
Bug 1403348: Part 4 - Annotate certain startup crashes with AsyncShutdown script contents.  data-r?rweiss

https://reviewboard.mozilla.org/r/196478/#review207104

1.Yes, standard crash reporter documentation is available.
2.Yes, standard data preferences and crash reporting is not opt-out
3.The request is not permanent
4.Type 1
5.Default off
6.No new identifiers
7.Yes, covered by existing privacy policy
8.No need for further check-in
Attachment #8925313 - Flags: review?(rweiss) → review+
Seems this was last touched two months ago. Do we have any updated status on this?
Flags: needinfo?(kmaglione+bmo)
As I understand the state of things:
- Part 4 got r+'d but didn't land.
- Parts 1-3 did land, and they are telling us that do_getService is returning NS_ERROR_FACTORY_NOT_REGISTERED.  Because both XPCOMUtils.jsm's generateNSGetFactory and the methods in nsComponentManager can return this error code, the part 4 source dumps so seem like they'd help disambiguate what's going on a little further, although I haven't done the full call-chain analysis.
- There are a very small number of affected installations.  There is of course a sampling bias here since we're talking about startup crashes that, at least for the people affected here, happens repeatedly and so we'd expect them to either become ex-users very quick or perform a re-install that fixes them.
  - Relatedly, since we purged all of the old crash stats data, it's possible that the impacted population was originally much higher and they're all simply ex-users or successful re-installations at this point.

Action-wise, since some of the crash reports are from the nightly channel, perhaps we want to land part 4 once trunk becomes nightly?  Or perhaps this is the perfect time to land this in beta's (with the plan to remove it from the beta once we know what's happening.)  Since :kmag is already needinfo'ed leaving things at that.
Sorry, I got pulled into other things while waiting for data-r.

Bug 1422087 should significantly ameliorate this problem if it's related to the preloader or startup cache, since those will be purged after the first startup crash.

But since we don't actually know whether this is caused by startup cache corruption, omnijar corruption, or something else, the diagnostic patch will still be useful. So I'll land it now.
Flags: needinfo?(kmaglione+bmo)
https://hg.mozilla.org/integration/mozilla-inbound/rev/cf8613751378c8790b56131cd2a1be68573f9286
Bug 1403348: Part 4 - Annotate certain startup crashes with AsyncShutdown script contents. r=mccr8 data-r=rweiss
Backed out for build bustage ServiceWorkerRegistrar.cpp:29

backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/d4b3440ba8c95ec6352814a5a710758c93366f3c

push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=cf8613751378c8790b56131cd2a1be68573f9286&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-classifiedState=unclassified

failure log : https://treeherder.mozilla.org/logviewer.html#?job_id=166859329&repo=mozilla-inbound&lineNumber=17677

[task 2018-03-08T22:52:56.212Z] 22:52:56     INFO -  mkdir -p '.deps/'
[task 2018-03-08T22:52:56.212Z] 22:52:56     INFO -  make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/dom/plugins/test/testplugin/flashplugin'
[task 2018-03-08T22:52:56.213Z] 22:52:56     INFO -  make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/dom/plugins/test/testplugin/flashplugin'
[task 2018-03-08T22:52:56.213Z] 22:52:56     INFO -  make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/dom/plugins/test/testplugin/flashplugin'
[task 2018-03-08T22:52:56.691Z] 22:52:56     INFO -  make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/dom/serviceworkers'
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/clang/bin/clang++ -target x86_64-apple-darwin11 -B /builds/worker/workspace/build/src/cctools/bin -isysroot /builds/worker/workspace/build/src/MacOSX10.11.sdk -std=gnu++14 -o Unified_cpp_dom_serviceworkers0.o -c -fvisibility=hidden -fvisibility-inlines-hidden -DDEBUG=1 -DOS_POSIX=1 -DOS_MACOSX=1 -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/dom/serviceworkers -I/builds/worker/workspace/build/src/obj-firefox/dom/serviceworkers -I/builds/worker/workspace/build/src/obj-firefox/ipc/ipdl/_ipdlheaders -I/builds/worker/workspace/build/src/ipc/chromium/src -I/builds/worker/workspace/build/src/ipc/glue -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -Qunused-arguments -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -Qunused-arguments -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wloop-analysis -Wc++1z-compat -Wcomma -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-unknown-warning-option -Wno-return-type-c-linkage -fno-sized-deallocation -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fno-exceptions -fno-strict-aliasing -stdlib=libc++ -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-firefox/build/clang-plugin/libclang-plugin.dylib -Xclang -add-plugin -Xclang moz-check -O3 -fno-omit-frame-pointer -Werror  -MD -MP -MF .deps/Unified_cpp_dom_serviceworkers0.o.pp   /builds/worker/workspace/build/src/obj-firefox/dom/serviceworkers/Unified_cpp_dom_serviceworkers0.cpp
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  In file included from /builds/worker/workspace/build/src/obj-firefox/dom/serviceworkers/Unified_cpp_dom_serviceworkers0.cpp:128:
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  /builds/worker/workspace/build/src/dom/serviceworkers/ServiceWorkerRegistrar.cpp:29:10: fatal error: 'mozJSComponentLoader.h' file not found
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  #include "mozJSComponentLoader.h"
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -           ^~~~~~~~~~~~~~~~~~~~~~~~
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  1 error generated.
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  /builds/worker/workspace/build/src/config/rules.mk:1047: recipe for target 'Unified_cpp_dom_serviceworkers0.o' failed
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  make[4]: *** [Unified_cpp_dom_serviceworkers0.o] Error 1
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/dom/serviceworkers'
[task 2018-03-08T22:52:56.693Z] 22:52:56     INFO -  make[4]: *** Waiting for unfinished jobs....
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(kmaglione+bmo)
https://hg.mozilla.org/integration/mozilla-inbound/rev/3db841301bfd8d7638b92575e76ecc550a2101af
Bug 1403348: Part 4 - Annotate certain startup crashes with AsyncShutdown script contents. r=mccr8 data-r=rweiss
Andrew, can you get me the annotations from some of the recent crash reports?

https://crash-stats.mozilla.com/signature/?version=61.0a1&signature=mozilla%3A%3Adom%3A%3AServiceWorkerRegistrar%3A%3AGetShutdownPhase&date=%3E%3D2018-03-19T05%3A03%3A47.000Z&date=%3C2018-04-19T05%3A03%3A47.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports

So far, unfortunately, they're all from the less interesting "Failed to get async shutdown service: NS_ERROR_FACTORY_NOT_REGISTERED" failure, rather than the ones that point to file corruption, but they might still be useful.
Flags: needinfo?(continuation)
What do the annotations look like? I don't see anything named "nsAsyncShutdownComponent" or "AsyncShutdownModule" under "details" or "metadata".
Flags: needinfo?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #55)
> What do the annotations look like? I don't see anything named
> "nsAsyncShutdownComponent" or "AsyncShutdownModule" under "details" or
> "metadata".

Hm. In that case, it's possible that we're failing to read them altogether, and that's what's causing the crash in the factory not registered cases (which wouldn't be surprising). I guess I should update the annotation to explicitly note a read failure.

The other failure modes, where we clearly are running scripts, haven't shown up in nightly since this landed.
This is still a pretty frequent startup crash on Beta & Release. Can we try to revisit this?
Sorry, we've revisited this in other bugs.

This is clearly a case of Omnijar corruption that's causing other bugs as well. Exactly what the nature of that corruption is (memory corruption, disk corruption, something else...) we don't know. I'd like to get omnijar checksums added to the build metadata and checked by the crash reporter, but I don't have time to work on that, and I don't have anyone obvious to redirect to either.
Flags: needinfo?(kmaglione+bmo)
Marion, any thoughts on who we might redirect this to?
Flags: needinfo?(mdaly)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #59)
> Marion, any thoughts on who we might redirect this to?

Hmm, is it being corrupted on the omnijar side itself or on the SW side? 

I'm going to ask :asuth to give this an overview and give his recommendations. This would be a backlogged item for us if we did take it on.
Flags: needinfo?(mdaly) → needinfo?(bugmail)
(In reply to Marion Daly [:mdaly] from comment #60)
> (In reply to Ryan VanderMeulen [:RyanVM] from comment #59)
> > Marion, any thoughts on who we might redirect this to?
> 
> Hmm, is it being corrupted on the omnijar side itself or on the SW side? 

We don't know, which is part of the problem. We know that here and in other bugs, we're winding up with script sources that clearly contain corrupted data, but it's not obvious from those whether it's memory corruption or disk corruption. They do tend to repeatedly happen in the same way for the same users, but there are ways that could happen in either case.

There are some other failure cases where we fail to load the app omnijar entirely. Some of those are probably from network errors, but some of them may be from corruption as well.
Although, the network error issue is an unlikely cause of the omnijar failures, since we get crash reports for these failures, and the crash reporter is in the same location as the omnijar. It's more often the cause of failures in users with roaming profiles, when we touch memmapped data loaded from them.
This is currently the #2 topcrash on beta, but as a repeated startup crash for a smaller number of users than the total crashes.
Should we mark this bug stalled as stalled? There is still a significant number of startup crashes on 63 beta.
Yes let's mark it as stalled.
Summary: Crash in mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase → Crash in mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase[STALLED]
removing old NI as it's stalled.
Flags: needinfo?(bugmail)
Assignee: kmaglione+bmo → nobody
See Also: → 1528348

Hey :kmag, in bug 1553401 when looking at the raw crash JSON for https://crash-stats.mozilla.org/report/index/aee3cc64-bb79-4e9a-a03f-1e09f0190522 I noticed that we still seem to be trying to annotate the contents of "nsAsyncShutdown.js"[1] but you renamed that[2] in bug 1524688. The crash report in question seems to think it has the contents of nsAsyncShutdown.js.

I guess my Q is:

  • Is there anything informative in that bug's crash/friend crashes.
  • Should we be updating the annotation code?

1: https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/js/xpconnect/loader/mozJSComponentLoader.cpp#371
2: https://hg.mozilla.org/mozilla-central/rev/fe9b6695212ec3960ab9e730ccc446a8c3da9375#l6.1

Flags: needinfo?(kmaglione+bmo)
Crash Signature: [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase] [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase const] → [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase] [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase const] [@ AnnotateScriptContents]

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #68)

Hey :kmag, in bug 1553401 when looking at the raw crash JSON for https://crash-stats.mozilla.org/report/index/aee3cc64-bb79-4e9a-a03f-1e09f0190522 I noticed that we still seem to be trying to annotate the contents of "nsAsyncShutdown.js"[1] but you renamed that[2] in bug 1524688. The crash report in question seems to think it has the contents of nsAsyncShutdown.js.

I guess my Q is:

  • Is there anything informative in that bug's crash/friend crashes.
  • Should we be updating the annotation code?

I don't know, at this point. I'd still like to see the contents of those files, but I asked Andrew McCreight to get the annotations for me and he said they weren't there. If there actually are crash reports with that annotation data now, then I'll update the annotation code and maybe we can actually get some useful information from them.

If not, I guess the annotation code should just be removed.

Crash Signature: [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase] [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase const] [@ AnnotateScriptContents] → [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase] [@ mozilla::dom::ServiceWorkerRegistrar::GetShutdownPhase const] [@ AnnotateScriptContents]
Flags: needinfo?(kmaglione+bmo)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #68)

Hey :kmag, in bug 1553401 when looking at the raw crash JSON for https://crash-stats.mozilla.org/report/index/aee3cc64-bb79-4e9a-a03f-1e09f0190522 I noticed that we still seem to be trying to annotate the contents of "nsAsyncShutdown.js"[1] but you renamed that[2] in bug 1524688. The crash report in question seems to think it has the contents of nsAsyncShutdown.js.

So, interestingly, based on this crash, it looks like the user is running a newer Firefox binary with an omni.ja from an older version. It has the actual contents of the old version of nsAsyncShutdown.js. Which would explain why loading the component would fail, since the JSM wouldn't actually exist at the new location we try to load it from.

I suppose that may be the result of an incomplete update attempt or a botched system restore of some sort.

But it's also different from the other errors I've seen, where a lot of the time we're failing with an undefined symbol error for a symbol which is clearly a corrupted version of the symbol the script is supposed to be referencing.

I suppose maybe both types of errors could possibly be explained by a partial MAR update gone wrong...

This is the #5 overall crash so far in 68.0b13.

Just to be clear, at this point we have pretty high confidence that this crash is caused by some sort of corruption that just happens to show up here. It doesn't actually have anything to do with the ServiceWorker code. There's omni.ja (or possibly in some cases memory) corruption that's going to make the browser crash or at least be unusably broken.

It's still not clear whether there's more than one source of corruption, though.

Good day. I was having this problem with starting my Firefox 32 bit (I think)
https://crash-stats.mozilla.org/report/index/17dfacfa-391a-4883-b5b8-58ef80190803

As a test, I installed the 64 bit and it works.

Adding updated affected branches, based on the fact this is seen in the top 25 crashes in 70.0b5.

Keywords: top50, topcrash

(In reply to Kris Maglione [:kmag] from comment #72)

Just to be clear, at this point we have pretty high confidence that this crash is caused by some sort of corruption that just happens to show up here. It doesn't actually have anything to do with the ServiceWorker code. There's omni.ja (or possibly in some cases memory) corruption that's going to make the browser crash or at least be unusably broken.

Does this mean, we can/should change the components assignment accordingly?

Flags: needinfo?(bugmail)

:rstrong, would it make more sense for this bug to be moved to some installer component where it can hopefully make more progress given that the corruption presumably involves app update?

The general situation here is that, at startup, the ServiceWorkerRegistrar tries to get the the nsIAsyncShutdownService ("@mozilla.org/async-shutdown-service;1") so it can register itself as a blocker. The service is implemented in JS and fails to load and by all indications the load is corrupt. This manifests itself in ServiceWorkers in crash-stats, but the underlying problem is systemic

Flags: needinfo?(bugmail) → needinfo?(robert.strong.bugs)

I just checked crash-stats and this bug also affects FennecAndroid which doesn't use Application Update so there must be something else causing this other than Application Update... perhaps a cache hasn't been invalidated.

Flags: needinfo?(robert.strong.bugs)
Flags: needinfo?(bugmail)
Blocks: 1616059

Duping to bug 1471720 where there is a plan at https://bugzilla.mozilla.org/show_bug.cgi?id=1471720#c18 to help move the attribution out of ServiceWorkers, but which won't magically fix omni.ja corruption.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bugmail)
Resolution: --- → DUPLICATE

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled

They've done their job. It's clear at this point that these failures are
happening because of omni.ja corruption.

Assignee: nobody → kmaglione+bmo
Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7c67e3cb1ded
Follow-up: Remove crash annotations for AsyncShutdown load failures. r=mccr8
Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/1965db355287
Follow-up: Remove crash annotations for AsyncShutdown load failures. r=mccr8
Flags: needinfo?(kmaglione+bmo)
See Also: 1471720
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: