Update can get caught in a loop when it fails due to a write error
Categories
(Toolkit :: Application Update, defect, P3)
Tracking
()
People
(Reporter: bytesized, Assigned: erchen)
References
Details
(Whiteboard: [fidedi-ope])
Attachments
(2 files, 1 obsolete file)
It appears that it's possible for update to get caught in a loop.
- An update fails due to a write error.
_postUpdateProcessing
sees this and callshandleUpdateFailure
.handleUpdateFailure
identifies the failure as a write failure and thinks "this is a failure using the service. We should fall back to not use the service". But it is possible to get this type of error without using the service, and this code does not check that the service was actually used.- Repeat.
We should fix this. I think it makes sense to fall back once. But once we have already fallen back, we should not just keep retrying over and over.
Updated•9 months ago
|
Updated•18 days ago
|
Updated•17 days ago
|
Updated•16 days ago
|
Pushed by erchen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/03d24dcffbb0 Fix update error loop when service is used r=bytesized,application-update-reviewers
Comment 3•14 days ago
•
|
||
Backed out for causing xpcshell failures on updateManagerXML.js.
[task 2024-06-14T14:38:17.059Z] 14:38:17 INFO - TEST-START | toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js
[task 2024-06-14T14:38:17.305Z] 14:38:17 WARNING - TEST-UNEXPECTED-FAIL | toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js | xpcshell return code: 0
[task 2024-06-14T14:38:17.305Z] 14:38:17 INFO - TEST-INFO took 242ms
[task 2024-06-14T14:38:17.305Z] 14:38:17 INFO - >>>>>>>
[task 2024-06-14T14:38:17.307Z] 14:38:17 INFO - PID 7023 | [Parent 7023, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /builds/worker/checkouts/gecko/toolkit/crashreporter/nsExceptionHandler.cpp:2888
[task 2024-06-14T14:38:17.308Z] 14:38:17 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2024-06-14T14:38:17.309Z] 14:38:17 INFO - "14:38:17:168 | TEST-INFO | xpcshellUtilsAUS.js | [setupTestCommon : 883] start - general test setup"
[task 2024-06-14T14:38:17.309Z] 14:38:17 INFO - TEST-PASS | toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js | setupTestCommon - [setupTestCommon : 885] gTestID should be 'undefined' (setupTestCommon should only be called once) - "undefined" === "undefined"
[task 2024-06-14T14:38:17.309Z] 14:38:17 INFO - (xpcshell/head.js) | test pending (2)
[task 2024-06-14T14:38:17.310Z] 14:38:17 INFO - "14:38:17:173 | TEST-INFO | xpcshellUtilsAUS.js | [setupTestCommon : 979] Updates Directory (UpdRootD) Path: /builds/worker/workspace/build/tests/xpcshell/tests/toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML/dir.app"
[task 2024-06-14T14:38:17.311Z] 14:38:17 INFO - "14:38:17:174 | TEST-INFO | xpcshellUtilsAUS.js | [setupTestCommon : 996] resetting update lock"
[task 2024-06-14T14:38:17.311Z] 14:38:17 INFO - "14:38:17:175 | TEST-INFO | xpcshellUtilsAUS.js | [setupTestCommon : 1026] finish - general test setup"
[task 2024-06-14T14:38:17.311Z] 14:38:17 INFO - "14:38:17:175 | TEST-INFO | /builds/worker/workspace/build/tests/xpcshell/tests/toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js | [run_test : 9] testing addition of a successful update to updates.xml and verification of update properties including the format prior to bug 530872"
[task 2024-06-14T14:38:17.311Z] 14:38:17 INFO - "14:38:17:175 | TEST-INFO | shared.js | [setUpdateChannel : 226] setting default pref app.update.channel to test_channel"
[task 2024-06-14T14:38:17.311Z] 14:38:17 INFO - PID 7023 | (xpcshell:7023): GLib-GObject-WARNING **: 14:38:17.187: invalid (NULL) pointer instance
[task 2024-06-14T14:38:17.311Z] 14:38:17 INFO - PID 7023 | (xpcshell:7023): GLib-GObject-CRITICAL **: 14:38:17.187: g_signal_connect_data: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
[task 2024-06-14T14:38:17.311Z] 14:38:17 INFO - PID 7023 | (xpcshell:7023): GLib-GObject-WARNING **: 14:38:17.187: invalid (NULL) pointer instance
<...>
[task 2024-06-14T14:38:17.331Z] 14:38:17 INFO - TEST-PASS | toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js | run_test - [run_test : 259] the update patch custom2_attr property should equal the expected value - "custom2 patch value" == "custom2 patch value"
[task 2024-06-14T14:38:17.331Z] 14:38:17 INFO - "14:38:17:243 | TEST-INFO | /builds/worker/workspace/build/tests/xpcshell/tests/toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js | [run_test : 265] checking the first update patch enumerator"
[task 2024-06-14T14:38:17.331Z] 14:38:17 INFO - TEST-PASS | toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js | run_test - [run_test : 266] patch enumerator should be an instance of nsISimpleEnumerator - true == true
[task 2024-06-14T14:38:17.332Z] 14:38:17 WARNING - TEST-UNEXPECTED-FAIL | toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js | run_test - [run_test : 271] the length of the array created from the patch enumerator should equal the expected value - 3 == 2
[task 2024-06-14T14:38:17.332Z] 14:38:17 INFO - /builds/worker/workspace/build/tests/xpcshell/tests/toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js:run_test:271
[task 2024-06-14T14:38:17.333Z] 14:38:17 INFO - /builds/worker/workspace/build/tests/xpcshell/head.js:_do_main:245
[task 2024-06-14T14:38:17.333Z] 14:38:17 INFO - /builds/worker/workspace/build/tests/xpcshell/head.js:_execute_test:596
[task 2024-06-14T14:38:17.333Z] 14:38:17 INFO - -e:null:1
[task 2024-06-14T14:38:17.333Z] 14:38:17 INFO - exiting test
[task 2024-06-14T14:38:17.334Z] 14:38:17 INFO - PID 7023 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2024-06-14T14:38:17.334Z] 14:38:17 INFO - PID 7023 | [Parent 7023, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/worker/checkouts/gecko/dom/media/CubebUtils.cpp:467
[task 2024-06-14T14:38:17.334Z] 14:38:17 INFO - "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2024-06-14T14:38:17.335Z] 14:38:17 INFO - PID 7023 | [Parent 7023, Main Thread] WARNING: 'mColorSchemePreference == ComputeColorSchemeSetting()', file /builds/worker/checkouts/gecko/widget/gtk/nsLookAndFeel.cpp:1544
[task 2024-06-14T14:38:17.335Z] 14:38:17 INFO - PID 7023 | JavaScript error: /builds/worker/workspace/build/tests/xpcshell/head.js, line 869: NS_ERROR_ABORT
[task 2024-06-14T14:38:17.335Z] 14:38:17 INFO - "CONSOLE_MESSAGE: (error) [JavaScript Error: "NS_ERROR_ABORT" {file: "/builds/worker/workspace/build/tests/xpcshell/head.js" line: 869}]"
[task 2024-06-14T14:38:17.336Z] 14:38:17 WARNING - TEST-UNEXPECTED-FAIL | toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js | assertNoUncaughtRejections - [assertNoUncaughtRejections : 266] A promise chain failed to handle a rejection: [Exception... "Abort" nsresult: "0x80004004 (NS_ERROR_ABORT)" location: "JS frame :: /builds/worker/workspace/build/tests/xpcshell/head.js :: _abort_failed_test :: line 869" data: no] - stack: _abort_failed_test@/builds/worker/workspace/build/tests/xpcshell/head.js:869:20
[task 2024-06-14T14:38:17.337Z] 14:38:17 INFO - do_report_result@/builds/worker/workspace/build/tests/xpcshell/head.js:981:5
[task 2024-06-14T14:38:17.337Z] 14:38:17 INFO - Assert<@/builds/worker/workspace/build/tests/xpcshell/head.js:70:21
[task 2024-06-14T14:38:17.338Z] 14:38:17 INFO - Assert.prototype.report@resource://testing-common/Assert.sys.mjs:251:10
[task 2024-06-14T14:38:17.338Z] 14:38:17 INFO - equal@resource://testing-common/Assert.sys.mjs:293:8
[task 2024-06-14T14:38:17.338Z] 14:38:17 INFO - run_test@/builds/worker/workspace/build/tests/xpcshell/tests/toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML.js:271:10
[task 2024-06-14T14:38:17.339Z] 14:38:17 INFO - _do_main@/builds/worker/workspace/build/tests/xpcshell/head.js:245:6
[task 2024-06-14T14:38:17.339Z] 14:38:17 INFO - _execute_test@/builds/worker/workspace/build/tests/xpcshell/head.js:596:5
[task 2024-06-14T14:38:17.340Z] 14:38:17 INFO - @-e:1:1
[task 2024-06-14T14:38:17.340Z] 14:38:17 INFO - Rejection date: Fri Jun 14 2024 14:38:17 GMT+0000 (Coordinated Universal Time) - false == true
[task 2024-06-14T14:38:17.341Z] 14:38:17 INFO - resource://testing-common/PromiseTestUtils.sys.mjs:assertNoUncaughtRejections:266
[task 2024-06-14T14:38:17.341Z] 14:38:17 INFO - /builds/worker/workspace/build/tests/xpcshell/head.js:_execute_test:597
[task 2024-06-14T14:38:17.342Z] 14:38:17 INFO - -e:null:1
[task 2024-06-14T14:38:17.342Z] 14:38:17 INFO - exiting test
[task 2024-06-14T14:38:17.344Z] 14:38:17 INFO - "14:38:17:254 | TEST-INFO | xpcshellUtilsAUS.js | [AGP_cleanup : 4553] start - unregistering directory provider"
[task 2024-06-14T14:38:17.344Z] 14:38:17 INFO - "14:38:17:255 | TEST-INFO | xpcshellUtilsAUS.js | [cleanupTestCommon : 1035] start - general test cleanup"
[task 2024-06-14T14:38:17.345Z] 14:38:17 INFO - "14:38:17:256 | TEST-INFO | xpcshellUtilsAUS.js | [cleanupTestCommon : 1144] attempting to remove directory. Path: /builds/worker/workspace/build/tests/xpcshell/tests/toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML"
[task 2024-06-14T14:38:17.345Z] 14:38:17 INFO - "14:38:17:256 | TEST-INFO | shared.js | [removeDirRecursive : 605] attempting to remove directory. Path: /builds/worker/workspace/build/tests/xpcshell/tests/toolkit/mozapps/update/tests/unit_aus_update/updateManagerXML"
[task 2024-06-14T14:38:17.346Z] 14:38:17 INFO - "14:38:17:257 | TEST-INFO | xpcshellUtilsAUS.js | [cleanupTestCommon : 1160] finish - general test cleanup"
[task 2024-06-14T14:38:17.346Z] 14:38:17 INFO - "14:38:17:257 | TEST-INFO | xpcshellUtilsAUS.js | [AGP_cleanup : 4617] finish - unregistering directory provider"
[task 2024-06-14T14:38:17.347Z] 14:38:17 INFO - PID 7023 | (xpcshell:7023): GLib-GObject-WARNING **: 14:38:17.293: invalid (NULL) pointer instance
[task 2024-06-14T14:38:17.347Z] 14:38:17 INFO - PID 7023 | (xpcshell:7023): GLib-GObject-CRITICAL **: 14:38:17.293: g_signal_handlers_disconnect_matched: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
[task 2024-06-14T14:38:17.347Z] 14:38:17 INFO - PID 7023 | [Parent 7023, Main Thread] WARNING: OOPDeinit() without successful OOPInit(): file /builds/worker/checkouts/gecko/toolkit/crashreporter/nsExceptionHandler.cpp:3460
[task 2024-06-14T14:38:17.348Z] 14:38:17 INFO - PID 7023 | [Parent 7023, Main Thread] WARNING: XPCOM object PerThreadData destroyed from static ctor/dtor: file /builds/worker/checkouts/gecko/xpcom/base/nsTraceRefcnt.cpp:216
[task 2024-06-14T14:38:17.348Z] 14:38:17 INFO - PID 7023 | [Parent 7023, Main Thread] WARNING: XPCOM object SourceList destroyed from static ctor/dtor: file /builds/worker/checkouts/gecko/xpcom/base/nsTraceRefcnt.cpp:216
[task 2024-06-14T14:38:17.348Z] 14:38:17 INFO - <<<<<<<
[task 2024-06-14T14:38:17.349Z] 14:38:17 INFO - TEST-START | toolkit/mozapps/update/tests/unit_aus_update/cleanupSuccessLogsFIFO.js
Updated•10 days ago
|
Updated•10 days ago
|
Updated•10 days ago
|
Updated•10 days ago
|
It looks like the updated behaviour will also affect this test (https://searchfox.org/mozilla-central/source/toolkit/mozapps/update/tests/unit_base_updater/marFileLockedFailureComplete_win.js#16, and the equivalent in unit_service_updater).
Unlike the previous test that we handled in D214179 this is actually simulating a write error (with the case where the .mar file is locked), I'm wondering if we can change the test so I verify if handleFallbackToCompleteUpdate() is called after the update is applied, instead of checking if the state falls back to pending?
Reporter | ||
Comment 7•8 days ago
•
|
||
Oh no. I had been wondering for a while why this code was ever written this way. It seems to so clearly have the exact flaw that we are trying to fix. Well, this test explains why. Some of these write errors may be transient failures that will be resolved when a lock is released. In that case, it does actually make sense to try again, though I don't know that trying again infinitely was a good idea.
I think that the way that this ought to work is to, if possible, detect these transient failures and use a different error code if they are detected. Then, in that case, we can try again up to some maximum number of times.
This project just keeps getting more and more complex than I originally intended for it to be. Let's chat with Amir about how to proceed here.
Leaving this ticket in the backlog for now as I need to work on other projects at this time
Description
•