Implement failure logging for bootstrap process
Categories
(Firefox :: General, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox67 | --- | fixed |
People
(Reporter: bugzilla, Assigned: bugzilla)
References
(Blocks 1 open bug)
Details
(Whiteboard: inj+)
Attachments
(6 files, 3 obsolete files)
Assignee | ||
Comment 1•6 years ago
|
||
Assignee | ||
Comment 2•6 years ago
|
||
Assignee | ||
Comment 3•6 years ago
|
||
Comment 4•6 years ago
|
||
Assignee | ||
Comment 5•6 years ago
|
||
Comment 6•6 years ago
|
||
Assignee | ||
Comment 7•6 years ago
|
||
Assignee | ||
Comment 8•6 years ago
|
||
Comment 9•6 years ago
|
||
Updated•6 years ago
|
Assignee | ||
Comment 10•6 years ago
|
||
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #9)
Given the amount of work that is still outstanding to get Fluent up and running on the standalone binaries, I'm not sure it is worth doing for the purposes of the launcher process.
Can you elaborate on what fluent-rs is missing for that purpose?
Sure. The primary issue is that instantiating a FluentBundle requires a list of locales. But the launcher process is the very first thing that is run in Firefox; we don't have prefs, we don't have omnijar, we don't have anything other than the C++ runtime and Win32.
It appears to me that the only way that we could obtain information about the desired locale(s) is to dig it out from somewhere, which currently appears to be difficult when so few services are available to us.
Having said that, concurrently with this discovery, I also concluded that displaying error messages to users is of limited utility anyway, so we're not going that route regardless.
But in case we need to use fluent-rs from the launcher process in the future, being able to determine the desired locale for the Firefox installation without having to dive into the Omnijar (or some other non-trivial method) would be ideal.
Comment 11•6 years ago
|
||
Great, thank you!
It seems like a not-that-rare request to get locale info (and locale resources) out of the Firefox package for the external app to use via fluent-rs. @ted mentioned that for the crash reporter as well.
In this particular case, my guess is that you don't actually want Firefox locale - you want the OS locale which you could get out of some winapi.
Assignee | ||
Comment 12•6 years ago
|
||
Okay, back to working on this: I'm going to make the error handler generate a telemetry ping, and then invoke our existing pingsender binary to send it, since it already does everything that I need it to do.
Assignee | ||
Comment 13•6 years ago
|
||
Assignee | ||
Comment 14•6 years ago
|
||
BasicDllServices is used to gain access to the authenticode APIs in non-Gecko
contexts. One feature that WinDllServices provides is the ability to register
a callback interface to be notified when a DLL has been loaded.
This is not particularly useful in the BasicDllServices use case, and in the
"handle a launcher process failure on a background thread" use case, would
actually be harmful.
This patch modifies the DLLServices backend to offer a "basic" option that
omits the callback stuff.
Assignee | ||
Comment 15•6 years ago
|
||
This patch does a few things:
- Fleshes out the launcher process failure ping;
- Sends that ping via pingsender;
- If there is any failure in doing so, we fall back to the Windows event log;
- Any launcher process failures will result in us falling back to the normal
startup code path, ensuring that users will still see a browser.
A sample ping will be attached to the bug.
Depends on D19696
Assignee | ||
Comment 16•6 years ago
|
||
Assignee | ||
Comment 17•6 years ago
|
||
Comment 18•6 years ago
|
||
Before I get started on Data Collection Review, allow me to put on my Firefox Telemetry Engineer hat.
First allow me to apologize for our C++ ping-sending story not being great. Ultimately one of the future aims of the glean project is to have a cross-platform library with lovely interfaces that'll take care of a lot of this for you. But that's a ways off (Firefox Desktop isn't even the second product we plan to support), so for now it isn't the friendliest.
Now to the tech: if possible and if it makes sense in this case I would recommend having this ping adhere to the Common Ping Format. This'll help with ingestion as quite a few things may depend on, for example, the ping having a unique identifying "type" and "id" and both being present in the outermost part of the JSON body. (Having the "application" structure match is also helpful for channel normalization and other stuff). Of course, this depends on the information being present at the time you're assembling the ping and relevant to your data needs. We can always adapt ingestion to your needs, but it might be faster/more helpful for your analysis requirements to fit it into the same rough shape as other Firefox Telemetry pings.
The example ping is fairly small, and seems like it should only be sent a maximum of once per session which is nice from a bandwidth/budget standpoint. It appears as though there are some structures with potentially-unbounded sizes that could be exploited to make our lives a bit more annoying, though. If possible, consider putting byte limits on the strings and length limits on the arrays to ensure things are sane, and to ensure those persistent haxxors fuzzing our Telemetry ingestion point don't waste too many of our buffers with their SQL injection and buffer overflow attempts. We use 80 bytes for string Scalars, and shorter ones for Telemetry Event strings, if you're looking for ideas.
The considerations of the pipeline aren't exactly my wheelhouse, so consider reaching out to mreid to put you in touch with a pipeline Data Engineer to consult. You'll probably need to talk to them about how to shape your data for analysis anyway :) (ie, do you want this in sql.tmo? Will you need scheduled jobs?)
Comment 19•6 years ago
|
||
Assignee | ||
Comment 20•6 years ago
|
||
(In reply to Chris H-C :chutten from comment #19)
Comment on attachment 9045098 [details]
Data review request for launcher failure pingThis is data-review- for a couple of reasons.
- We're going to need documentation (like
[this](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/
telemetry/data/downgrade-ping.html)) for the ping to describe the collection.
OK, will do.
- The path of the build machine is Category 4 data, as anyone's allowed to
build Firefox and user names can be considered personal. Think of a Linux
distro volunteer building Firefox at home and packaging it for everyone.
Makes sense.
There do exist mechanisms (which I think you helped write and review) for
sanitizing path names to only include Cat 1 data (mostly by ensuring that we
only include subfolders (or, for externally-located files, only the file
name)).
I can't use those functions in the launcher process because they contain Gecko dependencies, but I can write new ones.
Would your analysis requirements be satisfied by sanitized paths?
The leaf name of the file is probably sufficient. I'm adding that to a subsequent revision.
- The opt-out must be available to Nightly users as well. It sounds as
though the launcher process is enabled for all Nightly users irrespective of
their study/data choices prefs so we'll need some signal to disable data
reporting if the pref's off. (Or maybe I'm misunderstanding the precedence)
I filed (and am in the process of landing) dependent bug 1529429 that will tie this to the SHIELD enable pref. I'll update the data review request accordingly.
Assignee | ||
Comment 21•6 years ago
|
||
(In reply to Chris H-C :chutten from comment #18)
Now to the tech: if possible and if it makes sense in this case I would recommend having this ping adhere to the Common Ping Format. This'll help with ingestion as quite a few things may depend on, for example, the ping having a unique identifying "type" and "id" and both being present in the outermost part of the JSON body. (Having the "application" structure match is also helpful for channel normalization and other stuff). Of course, this depends on the information being present at the time you're assembling the ping and relevant to your data needs. We can always adapt ingestion to your needs, but it might be faster/more helpful for your analysis requirements to fit it into the same rough shape as other Firefox Telemetry pings.
Unfortunately some of the stuff in the common ping format is not usable or sensible in the case of the launcher process. Having said that, I did see some value in some of the fields that were included in that format, so I have modified both the patch and the schema to add those additional fields.
Assignee | ||
Comment 22•6 years ago
|
||
Updated schema for reference purposes
Assignee | ||
Comment 23•6 years ago
|
||
Updating the sample ping. (This is a multi-line sample because it was generated by a debug build. Release builds generate single-line JSON)
Assignee | ||
Updated•6 years ago
|
Comment 24•6 years ago
|
||
Assignee | ||
Comment 25•6 years ago
|
||
Assignee | ||
Comment 26•6 years ago
|
||
All of your concerns should be handled now. The data review request has been updated accordingly.
(In reply to Chris H-C :chutten from comment #19)
Comment on attachment 9045098 [details]
Data review request for launcher failure pingThis is data-review- for a couple of reasons.
- We're going to need documentation (like
[this](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/
telemetry/data/downgrade-ping.html)) for the ping to describe the collection.
Okay, done with reluctance - I have added it as a Part 3 to this patch. However, I really think that this is a waste of developer time and should be automatically derived from the schema. mreid has filed an issue on GitHub for looking into this.
- The path of the build machine is Category 4 data, as anyone's allowed to
build Firefox and user names can be considered personal. Think of a Linux
distro volunteer building Firefox at home and packaging it for everyone.
There do exist mechanisms (which I think you helped write and review) for
sanitizing path names to only include Cat 1 data (mostly by ensuring that we
only include subfolders (or, for externally-located files, only the file
name)). Would your analysis requirements be satisfied by sanitized paths?
I am now just including the leaf name.
- The opt-out must be available to Nightly users as well. It sounds as
though the launcher process is enabled for all Nightly users irrespective of
their study/data choices prefs so we'll need some signal to disable data
reporting if the pref's off. (Or maybe I'm misunderstanding the precedence)
Okay, in bug 1529429 I tied launcher process to the SHIELD pref.
Comment 27•6 years ago
|
||
bugherder |
Comment 28•6 years ago
|
||
bugherder |
Comment 29•6 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #26)
(In reply to Chris H-C :chutten from comment #19)
Comment on attachment 9045098 [details]
Data review request for launcher failure pingThis is data-review- for a couple of reasons.
- We're going to need documentation (like
[this](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/
telemetry/data/downgrade-ping.html)) for the ping to describe the collection.Okay, done with reluctance - I have added it as a Part 3 to this patch. However, I really think that this is a waste of developer time and should be automatically derived from the schema. mreid has filed an issue on GitHub for looking into this.
It's the price we pay for data collection. Telemetry tries to make this easy for default types via the "description" field, and I'm glad to hear :mreid's looking into making it similarly easy to generate from schemas.
Comment 30•6 years ago
|
||
Comment 31•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Comment 32•6 years ago
|
||
Comment 33•6 years ago
|
||
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=busted&selectedJob=230350963&revision=7733f36c6e38cf684cf612e81f6b89cad4df3566
Failure log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=230354802&repo=autoland&lineNumber=29458
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=230354785&repo=autoland&lineNumber=33407
Backout link: https://hg.mozilla.org/integration/autoland/rev/73f63e74eaa1b62e0d702c21c1284757e51d6c08
18:33:45 INFO - z:/build/build/src/browser/app/winlauncher/ErrorHandler.cpp(667,27): error: unused function 'SendPingThread' [-Werror,-Wunused-function]
18:33:45 INFO - static unsigned __stdcall SendPingThread(void* aContext) {
18:33:45 INFO - ^
18:33:45 INFO - 1 error generated.
18:33:45 INFO - z:/build/build/src/config/rules.mk:805: recipe for target 'Unified_cpp_app_winlauncher0.obj' failed
18:33:45 INFO - mozmake.EXE[4]: *** [Unified_cpp_app_winlauncher0.obj] Error 1
18:33:45 INFO - mozmake.EXE[4]: Leaving directory 'z:/build/build/src/obj-firefox/browser/app/winlauncher'
18:33:45 INFO - z:/build/build/src/config/recurse.mk:74: recipe for target 'browser/app/winlauncher/target' failed
18:33:45 INFO - mozmake.EXE[3]: *** [browser/app/winlauncher/target] Error 2
18:33:45 INFO - mozmake.EXE[3]: *** Waiting for unfinished jobs....
18:33:45 INFO - mozmake.EXE[4]: Entering directory 'z:/build/build/src/obj-firefox/mfbt/tests'
18:33:45 INFO - z:/build/build/src/sccache2/sccache.exe z:/build/build/src/clang/bin/clang.exe --driver-mode=cl -FoTestAlgorithm.obj -c -DDEBUG=1 -DIMPL_MFBT -Iz:/build/build/src/mfbt/tests -Iz:/build/build/src/obj-firefox/mfbt/tests -Iz:/build/build/src/obj-firefox/dist/include -Iz:/build/build/src/obj-firefox/dist/include/testing -Iz:/build/build/src/obj-firefox/dist/include/nspr -Iz:/build/build/src/obj-firefox/dist/include/nss -MD -FI z:/build/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -Qunused-arguments -guard:cf -U_FORTIFY_SOURCE -Xclang -fno-common -Qunused-arguments -fsanitize=address -fsanitize-blacklist=z:/build/build/src/build/sanitizers/asan_blacklist_win.txt -fcrash-diagnostics-dir=z:/build/public/build -fcrash-diagnostics-dir=/z/build/public/build -TP -nologo -w15038 -wd5026 -wd5027 -Zc:sizedDealloc- -wd4091 -wd4577 -D_HAS_EXCEPTIONS=0 -guard:cf -U_FORTIFY_SOURCE -Xclang -fno-common -W3 -Gy -Zc:inline -wd4251 -wd4244 -wd4267 -wd4800 -wd4595 -wd4065 -Wno-inline-new-delete -Wno-invalid-offsetof -Wno-microsoft-enum-value -Wno-microsoft-include -Wno-unknown-pragmas -Wno-ignored-pragmas -Wno-deprecated-declarations -Wno-invalid-noreturn -Wno-inconsistent-missing-override -Wno-implicit-exception-spec-mismatch -Wno-unused-local-typedef -Wno-ignored-attributes -Wno-used-but-marked-unused -we4553 -D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING -GR- -Z7 -Xclang -load -Xclang z:/build/build/src/obj-firefox/build/clang-plugin/clang-plugin.dll -Xclang -add-plugin -Xclang moz-check -O1 -Oy- -WX -wd4275 -wd4530 -Xclang -MP -Xclang -dependency-file -Xclang .deps/TestAlgorithm.obj.pp -Xclang -MT -Xclang TestAlgorithm.obj -FdTestAlgorithm.pdb z:/build/build/src/mfbt/tests/TestAlgorithm.cpp
18:33:45 INFO - mozmake.EXE[4]: Leaving directory 'z:/build/build/src/obj-firefox/mfbt/tests'
18:33:45 INFO - mozmake.EXE[4]: Entering directory 'z:/build/build/src/obj-firefox/mfbt/tests'
[task 2019-02-25T18:11:42.122Z] 18:11:42 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/interceptor/VMSharingPolicies.h:141:12: note: remove std::move call here
[task 2019-02-25T18:11:42.123Z] 18:11:42 INFO - return std::move(items);
[task 2019-02-25T18:11:42.123Z] 18:11:42 INFO - ^~~~~~~~~~ ~
[task 2019-02-25T18:11:42.123Z] 18:11:42 INFO - In file included from /builds/worker/workspace/build/src/obj-firefox/browser/app/winlauncher/Unified_cpp_app_winlauncher0.cpp:11:
[task 2019-02-25T18:11:42.123Z] 18:11:42 ERROR - /builds/worker/workspace/build/src/browser/app/winlauncher/ErrorHandler.cpp:162:52: error: unknown type name '_bstr_t'
[task 2019-02-25T18:11:42.123Z] 18:11:42 INFO - static mozilla::UniquePtr<char[]> WideToUTF8(const _bstr_t& aStr) {
[task 2019-02-25T18:11:42.123Z] 18:11:42 INFO - ^
[task 2019-02-25T18:11:42.123Z] 18:11:42 ERROR - /builds/worker/workspace/build/src/browser/app/winlauncher/ErrorHandler.cpp:325:9: error: no matching function for call to '_wcslwr_s'
[task 2019-02-25T18:11:42.123Z] 18:11:42 INFO - if (_wcslwr_s(leaf)) {
[task 2019-02-25T18:11:42.123Z] 18:11:42 INFO - ^~~~~~~~~
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 34•6 years ago
|
||
Comment 35•6 years ago
|
||
Comment 36•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9105b6f8d62c
https://hg.mozilla.org/mozilla-central/rev/7f034553129d
Updated•6 years ago
|
Description
•