Closed Bug 1601707 Opened 2 months ago Closed 1 month ago

segfault in DoDatabaseWork (this makes Firefox 71 to crash at startup)

Categories

(Core :: Storage: IndexedDB, defect, P2)

71 Branch
x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox-esr68 --- unaffected
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 + fixed
firefox74 --- fixed

People

(Reporter: jh, Assigned: sg, NeedInfo)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: crash, regression)

Attachments

(5 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0

Steps to reproduce:

I build Firefox 71 with GCC 9 and LTO enabled

Actual results:

It segfaults on startup. The problem is with following code:
#5 mozilla::dom::indexedDB::(anonymous namespace)::ObjectStoreAddOrPutRequestOp::DoDatabaseWork (this=0x7fffd7687430, aConnection=0x7fffd92d6640)
at /aux/hubicka/firefox-2019-2/dom/indexedDB/ActorsParent.cpp:24402
24402 rv = aConnection->GetCachedStatement(
(gdb) l
24397 // detectable errors rather than corrupting data.
24398 DatabaseConnection::CachedStatement stmt;
24399 const auto& optReplaceDirective = (!mOverwrite || keyUnset)
24400 ? NS_LITERAL_CSTRING("")
24401 : NS_LITERAL_CSTRING("OR REPLACE ");
24402 rv = aConnection->GetCachedStatement(
24403 NS_LITERAL_CSTRING("INSERT ") + optReplaceDirective +
24404 NS_LITERAL_CSTRING("INTO object_data "
24405 "(object_store_id, key, file_ids, data) "
24406 "VALUES (:") +
(gdb)
24407 kStmtParamNameObjectStoreId + NS_LITERAL_CSTRING(", :") +
24408 kStmtParamNameKey + NS_LITERAL_CSTRING(", :") +
24409 kStmtParamNameFileIds + NS_LITERAL_CSTRING(", :") +
24410 kStmtParamNameData + NS_LITERAL_CSTRING(");"),
24411 &stmt);
24412 if (NS_WARN_IF(NS_FAILED(rv))) {
24413 return rv;
24414 }
24415
24416 rv = stmt->BindInt64ByName(kStmtParamNameObjectStoreId, osid);
Here GCC optimizes out constructors in

24399     const auto& optReplaceDirective = (!mOverwrite || keyUnset)                                                                                                      
24400                                           ?  NS_LITERAL_CSTRING("")                                                                                                  
24401                                           : NS_LITERAL_CSTRING("OR REPLACE ");                                                                                       

Because their lifetime ends at the end of the statement. A

self-contained testcase is:
struct a {int val;
a(int v) : val(v) {}
};
int
test(int bar)
{
const struct a &ref=bar ? static_cast<const a &>(a(1)):static_cast <const a &>(a(2));
return ref.val;
}

Here initialization will be optimized out while

struct a {int val;
a(int v) : val(v) {}
};
int
test(int bar)
{
const struct a &ref=static_cast<const a &>(a(1));
return ref.val;
}

Let me see if the I cut&paste the code and get it formated nicely

24402     rv = aConnection->GetCachedStatement(                                                                                                                            
(gdb) l                                                                                                                                                                    
24397     // detectable errors rather than corrupting data.                                                                                                                
24398     DatabaseConnection::CachedStatement stmt;                                                                                                                        
24399     const auto& optReplaceDirective = (!mOverwrite || keyUnset)                                                                                                      
24400                                           ?  NS_LITERAL_CSTRING("")                                                                                                  
24401                                           : NS_LITERAL_CSTRING("OR REPLACE ");                                                                                       
24402     rv = aConnection->GetCachedStatement(                                                                                                                            
24403         NS_LITERAL_CSTRING("INSERT ") + optReplaceDirective +                                                                                                        
24404             NS_LITERAL_CSTRING("INTO object_data "                                                                                                                   
24405                                "(object_store_id, key, file_ids, data) "                                                                                             
24406                                "VALUES (:") +                                                                                                                        
(gdb)                                                                                                                                                                      
24407             kStmtParamNameObjectStoreId + NS_LITERAL_CSTRING(", :") +                                                                                                
24408             kStmtParamNameKey + NS_LITERAL_CSTRING(", :") +                                                                                                          
24409             kStmtParamNameFileIds + NS_LITERAL_CSTRING(", :") +                                                                                                      
24410             kStmtParamNameData + NS_LITERAL_CSTRING(");"),                                                                                                           
24411         &stmt);                                                                                                                                                      
24412     if (NS_WARN_IF(NS_FAILED(rv))) {                                                                                                                                 
24413       return rv;                                                                                                                                                     
24414     }                                                                                                                                                                
24415                                                                                                                                                                      
24416     rv = stmt->BindInt64ByName(kStmtParamNameObjectStoreId, osid);
Blocks: build-gcc-9
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Version: 70 Branch → 71 Branch
Duplicate of this bug: 1601608
Attached patch workaround.patchSplinter Review

The lifetime of the objects in ?: was extended by 2017 DR as discussed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92831
This is implemented by Clang7+ but not yet by GCC. So Clang 6 also miscompiles the simplified testcase.
I am attaching a workaround.

I needed to rebase the patch to make it apply correctly.
The version I rebased with comments documenting the issue is attached.

Thanks. I think dom/localstorage/ActorsParent.cpp needs also some care to get extensions working.

The patch is incomplete, I'm running asan build to identify all affected places.

Duplicate of this bug: 1601365

This patch fixes the extensions for me.

Is it really Linux specific? (I arrived here by chain of "duplicate of")

User of uBlock Origin on Reddit thread reports that uBO fails to load properly on fresh profile -
https://www.reddit.com/r/uBlockOrigin/comments/e6lr3x/ublock_origin_no_filter_lists_appear/

This apparently started in 70 and continues in 71.

He uses Windows 10.

After turning uBO off/on this appears in console:

Webconsole context has changed

console.trace() Error: "An unexpected error occurred" start.js:298:13 &lt;anonymous> moz-extension://7d76b2af-3a6a-40d5-9c27-8162efc90c3d/js/start.js:298

UnknownError: The operation failed for reasons unrelated to the database itself and not covered by any other error code. ExtensionStorageIDB.jsm:812

normalizeStorageError resource://gre/modules/ExtensionStorageIDB.jsm:812
method chrome://extensions/content/child/ext-storage.js:273
AsyncFunctionThrow self-hosted:693

Ok, so https://www.reddit.com/r/uBlockOrigin/comments/e6lr3x/ublock_origin_no_filter_lists_appear/ is resolved.

I narrowed it down to not being able to write to the profile directory, which is on a shared drive. That said I have full permissions to it and I am able to write to it without issue. To fix the issue I set the profile to save on my C drive in ProgramData. Now it is working as it should. I am not sure why it was not able to write to the shared drive.

Comment on attachment 9114512 [details] [diff] [review]
workaround_dom_indexdb_actorsparent_allignment-v2.patch

Review of attachment 9114512 [details] [diff] [review]:
-----------------------------------------------------------------

It's nice to find the problem, but I don't think this patch is acceptable as it stands.

We'd also need a patch against trunk.

::: firefox-71.0/dom/indexedDB/ActorsParent.cpp.gcc-workaround
@@ -25869,5 @@
>      }
>    }
>  
> -  const auto& comparisonChar =
> -      isIncreasingOrder ? NS_LITERAL_CSTRING(">") : NS_LITERAL_CSTRING("<");

Rather than substituting the expression everywhere, does something like:

```
NS_NAMED_LITERAL_CSTRING(kGreaterThan, ">");
NS_NAMED_LITERAL_CSTRING(kLessThan, "<");
const auto& comparisonChar = isIncreasingOrder ? kGreaterThan : kLessThan;
```

work?  And similar changes to other places in the file?  That would be a significantly more acceptable change.
Attachment #9114512 - Flags: feedback+

ni? for comment 14.

Flags: needinfo?(stransky)
Flags: needinfo?(jh)

The status of this code is funky. Bug 1168606 has landed what broke things in 71, but also landed things years ago, as well as in the 72 cycle. The code Nathan specifically points out in comment 14 doesn't exist anymore, and instead, uses something kind of like what he suggested, but using a GetSortKeyClause function (per https://hg.mozilla.org/integration/autoland/rev/cf53f4e57313). OTOH, the others are still there, but indexTable and table have been using ?: for as long as the file has existed...

Component: Untriaged → Storage: IndexedDB
Product: Firefox → Core
Regressions: 1168606

Simon, maybe you can investigate the issue per comment 16?

Flags: needinfo?(sgiesecke)
Priority: -- → P2

I will look into this.

Assignee: nobody → sgiesecke
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(sgiesecke)

Thanks for your work on the patches. I think that only minimal changes are necessary to fix this, only the reference needs to removed, so instead of, e.g.

  const auto& optReplaceDirective = (!mOverwrite || keyUnset)
                                        ? NS_LITERAL_CSTRING("")
                                        : NS_LITERAL_CSTRING("OR REPLACE ");

we can just write:

  const auto optReplaceDirective = (!mOverwrite || keyUnset)
                                       ? NS_LITERAL_CSTRING("")
                                       : NS_LITERAL_CSTRING("OR REPLACE ");

which does not rely on lifetime extension implemented differently by different compilers, and it does not impact the quality of the source code nor the generated code negatively, since it is equivalent in this case, given that both arms are temporaries.

I will prepare patches for trunk and uplift requests to beta and release to fix that.

(In reply to Jan Hubicka from comment #5)

The lifetime of the objects in ?: was extended by 2017 DR as discussed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92831
This is implemented by Clang7+ but not yet by GCC. So Clang 6 also miscompiles the simplified testcase.
I am attaching a workaround.

In my understanding, this behaviour wasn't changed in the context of http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1299, but merely made explicit. While it was obviously open to different interpretations before that, I don't think that the standard said before that that the lifetime of temporaries in the arms of ?: should not be extended. The behaviour of gcc and clang <7 is rather surprising IMO.

We didn't catch this earlier, since while we have base Linux toolchain builds (linux64-base-toolchains and linux64-base-toolchains-clang) building with the oldest supported compiler version, I think that tests are only run using clang 9 builds. Mike, is that accurate?

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] (high latency) from comment #16)

OTOH, the others are still there, but indexTable and table have been using ?: for as long as the file has existed...

They have been using ?: forever, but that's not the sole condition for the problem, but binding the result of ?: with temporaries constructed in the arms to a const& local variable constitutes the problem. This was only introduced by https://hg.mozilla.org/integration/autoland/rev/cddcfdcb52f6a465f62bdfb36bf9dc9a8dbd11a7

Attachment #9115402 - Attachment description: Bug 1601707 - Workaround for compilers that do not extend the lifetime of temporaries resulting from ?: expressions. r!#dom-storage-and-workers → Bug 1601707 - Workaround for compilers that do not extend the lifetime of temporaries resulting from ?: expressions.

Jan, Martin could you please check if applying https://bugzilla.mozilla.org/attachment.cgi?id=9115405 to release fixes the issue with your toolchain? That one is the patch against release, which also fixes one additional code location that no longer exists on trunk and therefore is missing in the other patch, as Mike mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1601707#c16

Flags: needinfo?(stransky)
Flags: needinfo?(jh)

Comment on attachment 9115402 [details]
Bug 1601707 - Workaround for compilers that do not extend the lifetime of temporaries resulting from ?: expressions.

Beta/Release Uplift Approval Request

  • User impact if declined: Optimized builds using gcc or clang<7 will crash on startup.

Note that this patch is the one that can be applied to trunk and beta, while the other patch is the one to be applied to release, which needs a fix at one additional code location.

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The change only changes the type of some local (string) variables from a const reference to a const value, which are initialized from temporaries.
  • String changes made/needed:
Attachment #9115402 - Flags: approval-mozilla-release?
Attachment #9115402 - Flags: approval-mozilla-beta?
Attachment #9115405 - Flags: approval-mozilla-release?
Attachment #9115405 - Flags: approval-mozilla-beta?
Flags: needinfo?(stransky)
Flags: needinfo?(jh)
Regressed by: 1168606
No longer regressions: 1168606
Attachment #9115402 - Flags: approval-mozilla-release?
Attachment #9115405 - Flags: approval-mozilla-beta?

User impact if declined: Optimized builds using gcc or clang<7 will crash on startup.

Note this doesn't necessarily lead to crashes. In some cases, it "just" breaks addons.

only run using clang 9 builds. Mike, is that accurate?

Yes.

Flags: needinfo?(mh+mozilla)
Keywords: crash

Thanks for the patch, I'll test it this week.

The bug doesn't impact Mozilla builds. We are half way to 72 with no dot release planned in 71 and distros can apply the fix to their toolchain as a temporary workaround. Marking 71 as wontfix.

Attachment #9115405 - Flags: approval-mozilla-release? → approval-mozilla-release-

Just a heads up, gcc-9.x branch has had the backport and is now working on expected. I do believe the best solution is provided, I doubt many distros are gonna roll a new update to the toolchain.

Looks like we are still waiting for more feedback. I don't want to land this in late beta 72 during the holidays so it will likely need to wait till 73/74 in January.

Comment on attachment 9115402 [details]
Bug 1601707 - Workaround for compilers that do not extend the lifetime of temporaries resulting from ?: expressions.

72 is now on m-r.

Attachment #9115402 - Flags: approval-mozilla-beta? → approval-mozilla-release?

Still waiting for feedback from distro, but I am landing the patch to central, as the underlying issue affects our coverage builds as well, so we will hopefully see if it fixes the issue in Bug 1607001.

Pushed by sgiesecke@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/810f85be6ba9
Workaround for compilers that do not extend the lifetime of temporaries resulting from ?: expressions. r=dom-workers-and-storage-reviewers,janv
Duplicate of this bug: 1607001
Duplicate of this bug: 1606863

So I was told this issue is the cause of
https://github.com/vector-im/riot-web/issues/11711
which is not a crash but a massive problem with using local storage.

Apparently:
https://phabricator.services.mozilla.com/D56873
is supposed to fix this but I tried this on nixos and it didn‘t solve my problem.

Debian uses an alternative patch for firefox 71:
https://salsa.debian.org/mozilla-team/firefox/commit/e888ce1578d68116949782035f8149cdbc5e832c
I tried that one and that does actually solve the problem.

(In reply to Malte Brandy from comment #36)

So I was told this issue is the cause of
https://github.com/vector-im/riot-web/issues/11711
which is not a crash but a massive problem with using local storage.

Apparently:
https://phabricator.services.mozilla.com/D56873
is supposed to fix this but I tried this on nixos and it didn‘t solve my problem.

Debian uses an alternative patch for firefox 71:
https://salsa.debian.org/mozilla-team/firefox/commit/e888ce1578d68116949782035f8149cdbc5e832c
I tried that one and that does actually solve the problem.

It helps if you let people know what version you are working with. The patches both solve the same problem just different code sets. Many of the fixes in the debian patch where drop'd already in the beta/72.0 final hense it wouldn't solve your issue in 71.

I tested on 71.0.

Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

(In reply to Malte Brandy from comment #36)

So I was told this issue is the cause of
https://github.com/vector-im/riot-web/issues/11711
which is not a crash but a massive problem with using local storage.

Apparently:
https://phabricator.services.mozilla.com/D56873
is supposed to fix this but I tried this on nixos and it didn‘t solve my problem.

Sorry if that was not clear, but the patch you need is https://phabricator.services.mozilla.com/D56876. https://phabricator.services.mozilla.com/D56873 is missing one code location, which was removed in the meantime. This is probably the reason it doesn't solve the issue with FF 71.

This patch missed Fx73 moving to the Beta channel. Please nominate the patch for Beta approval assuming we'd like this fix on 73 :)

Flags: needinfo?(sgiesecke)

Comment on attachment 9115402 [details]
Bug 1601707 - Workaround for compilers that do not extend the lifetime of temporaries resulting from ?: expressions.

Beta/Release Uplift Approval Request

  • User impact if declined: Optimized builds using gcc or clang<7 will crash on startup. gcc/gcov coverage builds will show several test failures, some permanent, some intermittent.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The change only changes the type of some local (string) variables from a const reference to a const value, which are initialized from temporaries.
  • String changes made/needed:
Flags: needinfo?(sgiesecke)
Attachment #9115402 - Flags: approval-mozilla-beta?

(In reply to Simon Giesecke [:sg] [he/him] from comment #40)

Sorry if that was not clear, but the patch you need is https://phabricator.services.mozilla.com/D56876. https://phabricator.services.mozilla.com/D56873 is missing one code location, which was removed in the meantime. This is probably the reason it doesn't solve the issue with FF 71.

Thank you! That fixed it!

Comment on attachment 9115402 [details]
Bug 1601707 - Workaround for compilers that do not extend the lifetime of temporaries resulting from ?: expressions.

Fixes a crash with some older compilers. Approved for 73.0b2.

Attachment #9115402 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9115402 [details]
Bug 1601707 - Workaround for compilers that do not extend the lifetime of temporaries resulting from ?: expressions.

let's leave this alone for 72.

Attachment #9115402 - Flags: approval-mozilla-release? → approval-mozilla-release-
Flags: needinfo?(stransky)
You need to log in before you can comment on or make changes to this bug.