Closed Bug 1496581 Opened Last year Closed Last year

Crash in arena_dalloc | nsLocalFile::Release

Categories

(Core :: DOM: Core & HTML, defect, P2, critical)

64 Branch
Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox-esr60 --- unaffected
firefox62 --- unaffected
firefox63 --- unaffected
firefox64 + fixed

People

(Reporter: calixte, Assigned: baku)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 5 obsolete files)

This bug was filed from the Socorro interface and is
report bp-01b4d072-4d31-42bb-84dd-40d490181004.
=============================================================

Top 10 frames of crashing thread:

0 mozglue.dll static void arena_dalloc memory/build/mozjemalloc.cpp:3527
1 xul.dll nsLocalFile::Release xpcom/io/nsLocalFileWin.cpp:862
2 xul.dll nsFileStreamBase::DoOpen netwerk/base/nsFileStreams.cpp:378
3 xul.dll nsresult nsFileInputStream::SeekInternal netwerk/base/nsFileStreams.cpp:548
4 xul.dll mozilla::net::HttpBaseChannel::EnsureUploadStreamIsCloneable netwerk/protocol/http/HttpBaseChannel.cpp:928
5 xul.dll bool mozilla::net::HttpChannelParent::DoAsyncOpen netwerk/protocol/http/HttpChannelParent.cpp:729
6 xul.dll mozilla::net::HttpChannelParent::Init netwerk/protocol/http/HttpChannelParent.cpp:139
7 xul.dll mozilla::net::NeckoParent::RecvPHttpChannelConstructor netwerk/ipc/NeckoParent.cpp:323
8 xul.dll mozilla::net::PNeckoParent::OnMessageReceived ipc/ipdl/PNeckoParent.cpp:991
9 xul.dll void mozilla::ipc::MessageChannel::DispatchMessageW ipc/glue/MessageChannel.cpp:2175

=============================================================

There are 8 crashes (from 2 installations) in nightly 64 with buildid 20181004100222. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1494176.

[1] https://hg.mozilla.org/mozilla-central/rev?node=f2ff07352cbe
Flags: needinfo?(amarchesini)
I also got this crash in latest Nightly in one of my mobile.twitter.com tabs, when switching to it from another one. The one that crashes is in the main container, the one that works prior to that is in a Private container tab.

Crash ID: bp-09bbc02b-320b-48e9-b779-563420181005
QA Contact: overholt
Crash Signature: [@ arena_dalloc | nsLocalFile::Release] → [@ arena_dalloc | nsLocalFile::Release] [@ nsFileStreamBase::DoOpen] [@ <name omitted> | nsFileStreamBase::DoOpen]
Yes, I can reliably reproduce this when just opening a tab in the standard context, or the Private context I had previously also logged into Twitter, and bring up https://mobile.twitter.com. The main Firefox process then immediately crashes. I marked further submitted reports with my handle, comments start with MarcoZ: .
Thanks Marco. Can you tell me the correct STR? Are you able to reproduce it on linux or mac?
Flags: needinfo?(amarchesini)
Flags: needinfo?(mzehe)
All I have to do is:

1. Press CTRL+T to open a new tab.
2. Go to https://mobile.twitter.com and press Enter.

The crash occurs immediately upon page load.

I currently have no Mac or Linux at my disposal to try, sorry.
Flags: needinfo?(mzehe)
I'm not able to reproduce it.
Marion, do you know anybody who can help here?
Flags: needinfo?(mdaly)
Perry, can you try to reproduce this? Once you get it reproduced, consult with Andrew on best next steps.
Flags: needinfo?(mdaly) → needinfo?(perry)
Will do.
Flags: needinfo?(perry)
QA Contact: overholt
I'm unable to reproduce this on Windows 10, macOS (High Sierra), and Linux (Ubuntu 18.04).
## Repro-Wise ##

If I go to https://mobile.twitter.com logged in and do "Work offline" from the hamburger menu's "More" sub-menu to force Firefox offline, I find that a number of things get stored in the "sync" IndexedDB data's "queue" object store.

These end up looking like the following devtools console output:
"""
body: Blob
​​​  size: 162
​​​  type: "application/x-www-form-urlencoded"
​​​  <prototype>: BlobPrototype { slice: slice(), size: Getter, type: Getter, … }
​​cache: "no-store"
​​credentials: "include"
​​headers: Object { accept: "*/*", "accept-language": "en-US,en;q=0.5", authorization: "ELIDED", … }
​​method: "POST"
​​mode: "cors"
​​redirect: "follow"
​​referrer: "https://mobile.twitter.com/compose/tweet"
​​url: "https://api.twitter.com/1.1/jot/client_event.json"
"""

Note that this is manually from doing the following in the console; the devtools' storage tab is great but hides the fact that the "body" is a Blob because it just JSONs the object or something like that.
"""
mdbReq = indexedDB.open('sync', 1)
mdb = mdbReq.result
mdbAll = mdb.transaction('queue').objectStore('queue').getAll()
mdbAll.result
"""

The contents of the Blob are URL-encoded JSON that seem to be metadata about all the videos I was potentially shown.  The Blobs can become as large as 20KiB if I scroll around a lot.  If I had let Twitter download more tweets before going offline, it's possible the Blob got huge.

I haven't been able to quite repro on this.  See below for a theory about what's going on.  To maximize repro, I expect you'd want to browse to mobile.twitter.com, switch to be offline and let the IndexedDB stuff really pile up, then close the site.  Then go online again and go back to the site.

## Theory-wise ##

Looking at the crash signatures, it looks like half of the crashes happen on the StreamTransportService's thread-pool in InputStreamLengthHelper while the main thread is still in HttpChannelParent::DoAsyncOpen in InputStreamLengthHelper-related logic.  This suggests that there may be a race that involves calling methods on the underlying file stream concurrently from the main thread and the STS thread.
Hm, actually, maybe don't "work offline" with twitter open too long.  My mobile twitter is now broken for some reason, possible due to the massive number of backlogged data?  Not sure.
(In reply to Andrew Sutherland [:asuth] from comment #11)
> Hm, actually, maybe don't "work offline" with twitter open too long.  My
> mobile twitter is now broken for some reason, possible due to the massive
> number of backlogged data?  Not sure.

Then yours is at the same state as mine now. I can't open it without immediately crashing, either. And I didn't even work offline, at least not intentionally.
Duplicate of this bug: 1496695
Same issue here, though it seems a bit less consistent with today's Nightly - I can load Twitter and scroll a bit sometimes, and then it crashes.
nsFileInputStream can be easily broken because it pretends to be thread-safe, but it doesn't protect its members. This crash seems to be a race condition. I'm going to add a mutex and some security checks as first step.
Attached patch lock.patch (obsolete) — Splinter Review
This patch adds a mutex to protect internal members for nsFileStreams.
What I suspect it's a race condition, and, probably, this patch would help to understand more what is going on.
Assignee: nobody → amarchesini
Attachment #9015497 - Flags: review?(honzab.moz)
Comment on attachment 9015497 [details] [diff] [review]
lock.patch

Review of attachment 9015497 [details] [diff] [review]:
-----------------------------------------------------------------

There is one major bad thing about this patch: you are doing likely blocking I/O (various PR_* calls) under the lock.  When main thread happens to try to acquire the lock as well, we have a jank.

You will have to have a "ref counter" for mFD, similarly to 
https://searchfox.org/mozilla-central/rev/80ac71c1c54af788b32e851192dfd2de2ec18e18/netwerk/base/nsSocketTransport2.cpp#2045
https://searchfox.org/mozilla-central/rev/80ac71c1c54af788b32e851192dfd2de2ec18e18/netwerk/base/nsSocketTransport2.cpp#2115
Attachment #9015497 - Flags: review?(honzab.moz) → review-
Priority: -- → P2
Attached patch lock.patch (obsolete) — Splinter Review
Attachment #9015497 - Attachment is obsolete: true
Attachment #9016229 - Flags: review?(honzab.moz)
Comment on attachment 9016229 [details] [diff] [review]
lock.patch

dropping r after an irc discussion.
Attachment #9016229 - Flags: review?(honzab.moz)
I get this one regularly (bp-4a210d4a-a3ac-45f3-8cff-536600181014 etc), although I'm not sure how to reproduce it in an isolated environment.

Just now, I got a deadlock opening mobile.twitter.com, and trying to send SIGABRT deadlocked again on breakpad. The backtrace doesn't look the same but I hope it's related:

(gdb) bt
#0  0x00007febbcc91edc in __lll_lock_wait () at /usr/lib/libpthread.so.0
#1  0x00007febbcc8b298 in pthread_mutex_lock () at /usr/lib/libpthread.so.0
#2  0x00007febb3262e19 in google_breakpad::ExceptionHandler::SignalHandler(int, siginfo*, void*) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#3  0x00007febb327506f in nsProfileLock::FatalSignalHandler(int, siginfo*, void*) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#4  0x00007febbcc933c0 in <signal handler called> () at /usr/lib/libpthread.so.0
#5  0x00007febbcc91edc in __lll_lock_wait () at /usr/lib/libpthread.so.0
#6  0x00007febbcc8b3f3 in pthread_mutex_lock () at /usr/lib/libpthread.so.0
#7  0x00005558aa5d28f9 in arena_dalloc(void*, unsigned long, arena_t*) ()
#8  0x00007febb369da47 in mozilla::Runnable::Release() () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#9  0x00007febb09db949 in mozilla::net::HttpChannelParent::DoAsyncOpen(mozilla::ipc::URIParams const&, mozilla::ipc::OptionalURIParams const&, mozilla::ipc::OptionalURIParams const&, mozilla::ipc::OptionalURIParams const&, unsigned int const&, mozilla::ipc::OptionalURIParams const&, mozilla::ipc::OptionalURIParams const&, unsigned int const&, nsTArray<mozilla::net::RequestHeaderTuple> const&, nsTString<char> const&, mozilla::ipc::OptionalIPCStream const&, bool const&, short const&, unsigned int const&, unsigned char const&, bool const&, unsigned int const&, bool const&, unsigned long const&, nsTString<char> const&, bool const&, nsTString<char> const&, bool const&, bool const&, bool const&, unsigned int const&, mozilla::net::OptionalLoadInfoArgs const&, mozilla::net::OptionalHttpResponseHead const&, nsTString<char> const&, unsigned int const&, unsigned long const&, mozilla::net::OptionalCorsPreflightArgs const&, unsigned int const&, bool const&, bool const&, bool const&, nsTString<char> const&, unsigned int const&, unsigned int const&, unsigned long const&, nsTString<char16_t> const&, unsigned long const&, nsTString<char> const&, unsigned long const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, bool const&, mozilla::TimeStamp const&) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#10 0x00007febb09da57e in mozilla::net::HttpChannelParent::Init(mozilla::net::HttpChannelCreationArgs const&) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#11 0x00007febb0a4a0bc in mozilla::net::NeckoParent::RecvPHttpChannelConstructor(mozilla::net::PHttpChannelParent*, mozilla::dom::PBrowserOrId const&, IPC::SerializedLoadContext const&, mozilla::net::HttpChannelCreationArgs const&) ()
    at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#12 0x00007febb0c39f44 in mozilla::net::PNeckoParent::OnMessageReceived(IPC::Message const&) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#13 0x00007febb3bf3847 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#14 0x00007febb3bf43fe in mozilla::ipc::MessageChannel::MessageTask::Run() () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#15 0x00007febb3a3aeb2 in nsThread::ProcessNextEvent(bool, bool*) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#16 0x00007febb3a3d06a in NS_ProcessNextEvent(nsIThread*, bool) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#17 0x00007febb3bf55b9 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#18 0x00007febb3be4bc5 in MessageLoop::Run() () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#19 0x00007febb251c7f9 in nsBaseAppShell::Run() () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#20 0x00007febb31f1be3 in nsAppStartup::Run() () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#21 0x00007febb3280f3a in XREMain::XRE_mainRun() () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#22 0x00007febb3281fc7 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#23 0x00007febb32826a9 in XRE_main(int, char**, mozilla::BootstrapConfig const&) () at /home/dx/test/asd/homer/firefox-nightly/libxul.so
#24 0x00005558aa5c0973 in do_main(int, char**, char**) ()
#25 0x00005558aa5c077a in main ()


While typing this comment, i got bp-4a210d4a-a3ac-45f3-8cff-536600181014 - this time mobile.twitter.com wasn't even focused but it did open earlier in the same session.
Attached patch tell_1.patch (obsolete) — Splinter Review
Instead of looking for a workaround, here I split ::Seek() and ::Tell() in 2 different interfaces. nsPipeInputStream implements only nsITellableStream; all the other interfaces implement nsISeekableStream + nsITellableStream.

This is nice because it fixes the issue we have, plus, it removes the hack around the implementation of ::Seek(), which causes I/O on main-thread.
Attachment #9016229 - Attachment is obsolete: true
Attachment #9017152 - Flags: review?(nfroyd)
Attachment #9017152 - Flags: feedback?(honzab.moz)
Comment on attachment 9017152 [details] [diff] [review]
tell_1.patch

Review of attachment 9017152 [details] [diff] [review]:
-----------------------------------------------------------------

I actually like the "tellable" name :)

::: xpcom/io/nsITellableStream.idl
@@ +16,5 @@
> +    /**
> +     *  tell
> +     *
> +     *  This method reports the current offset, in bytes, from the start of the 
> +     *  stream. 

nit, a bit richer comments would be good and trailing ws
Attachment #9017152 - Flags: feedback?(honzab.moz) → feedback+
Comment on attachment 9017152 [details] [diff] [review]
tell_1.patch

Review of attachment 9017152 [details] [diff] [review]:
-----------------------------------------------------------------

Please address Honza's comment.

Please also write a commit message that explains what the problem is and why doing this addresses it.  I can see that the refactoring is correct, but I'm completely in the dark as to why it might be the correct thing to do to add more indirection here.

::: netwerk/base/nsInputStreamPump.cpp
@@ +573,5 @@
>          //       an infinite loop.  we do our best here to try to catch
>          //       such an error.  (see bug 189672)
>  
>          // in most cases this QI will succeed (mAsyncStream is almost always
>          // a nsPipeInputStream, which implements nsISeekableStream::Tell).

Nit: comment needs updating here.
Attachment #9017152 - Flags: review?(nfroyd) → review-
Attached patch tell_1.patch (obsolete) — Splinter Review
In this patch comments are updated and improved.

I don't think it's nice to write a comment explaining that initially, nsISeekableStream and nsITellableStream were the same class, because after X years, that comment would be nonsense.

I can tell more why this patch is needed here on bugzilla:

In the current code there are 3 main issues:

1. nsFileStream is not really thread-safe. There is nothing to protect the internal members and we see crashes.

2. nsPipeInputStream doesn't implement ::Seek() method and that caused issues in devtools when a nsHttpChannel sends POST data using a pipe. In order to fix this, bug 1494176 added a check in nsHttpChannel: if the stream doesn't implement ::Seek(), let's clone it. This was an hack around nsPipeInputStream, and it's bad.

3. When nsHttpChannel sends POST data using a file stream, nsFileStream does I/O on main-thread because of the issue 2. Plus, ::Seek() is called on the main-thread causing issue 1.

Note that nsPipeInputStream implements only ::Tell(), of the nsISeekableStream methods. It doesn't implement ::Seek() and it doesn't implement ::SetEOF().
 
With this patch I want to fix point 2 and point 3 (and consequentially issue 1 - but we need a separate fix for it - follow up). The patch does:

1. it splits nsISeekableStream in 2 interfaces: nsITellableStream and nsISeekableStream.
2. nsPipeInputStream implements only nsITellableStream. Doing this, we don't need the ::Seek() check for point 2 in nsHttpChannel: a simple QI check is enough.
3. Because we don't call ::Seek() in nsHttpChannel, nsFileStream doesn't do I/O on the main-thread, and we don't crash doing so.
Attachment #9017152 - Attachment is obsolete: true
Attachment #9017802 - Flags: review?(nfroyd)
A commit message seems like exactly the right place for those explanations...
Attached patch part 2 (obsolete) — Splinter Review
Attachment #9017883 - Flags: review?(nfroyd)
Attached patch patchSplinter Review
Attachment #9017802 - Attachment is obsolete: true
Attachment #9017883 - Attachment is obsolete: true
Attachment #9017802 - Flags: review?(nfroyd)
Attachment #9017883 - Flags: review?(nfroyd)
Attachment #9017892 - Flags: review?(nfroyd)
Comment on attachment 9017892 [details] [diff] [review]
patch

Review of attachment 9017892 [details] [diff] [review]:
-----------------------------------------------------------------

Commit message final paragraph looks like it needs some line breaks to maintain the bullet points.

r=me with the changes below, thank you!

::: xpcom/io/nsISeekableStream.idl
@@ +11,5 @@
>   *
> + * This class inherits nsITellableStream in order to let streams to implement
> + * ::Tell() without implementing the whole nsISeekableStream interface. Callers
> + * can QI the stream to know what is implemented. This is mainly done for
> + * nsPipeInputStream.

I feel like this kind of comment--slightly rewritten, of course, really belongs in nsITellableStream, 'cause it's not really obvious why nsITellableStream is a standalone interface.

::: xpcom/io/nsITellableStream.idl
@@ +19,5 @@
> +    /**
> +     *  tell
> +     *
> +     *  This method reports the current offset, in bytes, from the start of the 
> +     *  stream. 

Nit: trailing whitespace on these two lines.
Attachment #9017892 - Flags: review?(nfroyd) → review+
(In reply to Julien Cristau [:jcristau] from comment #27)
> A commit message seems like exactly the right place for those explanations...

Don't want to start political discussion here, but I find it way more useful to have an explanation in bugzilla (maybe as well?).  Commit messages just pollute hg logs and when I first find the bug (not the changeset) when hunting for explanations, I will never look at the commit messages, I will just give up with "there is no explanation!"
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2da6504c901e
Split nsISeekableStream in 2 classes: nsISeekableStream and nsITellableStream, f=mayhemer, r=froydnj
https://hg.mozilla.org/mozilla-central/rev/2da6504c901e
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Duplicate of this bug: 1500287
Depends on: 1500370
Some notes from chat with baku:

there are 3 issues here:
1. I/O on main-thread
2. the crash in nsFileStream (because of concurrent access to mOpenParams.localFile)
3. the bug in nsPipeInputStream not implementing ::Seek()

#1 and #3 from the above listed issues have been fixed by this bug, but we definitely need a followup for the issue #2.  it should at least add diagnostic assertions on single thread access to mOpenParams.localFile (which is a nsCOMPtr<nsIFile>.
Component: HTML: Form Submission → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.