<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 2

•

11 years ago

Michal, can you please elaborate?

Flags: needinfo?(michal.novotny)

Comment 3

•

11 years ago

BTW, we only have reentrant monitors (also has condvar, which here would have no use), AFAIK.

Comment 4

•

11 years ago

BTW2, if this were split to two protected blocks with writer() call between them, you wouldn't probably file this bug. This is the same.

Assignee

Comment 5

•

11 years ago

I don't think that releasing the lock is really necessary, but the common rule is to not call an unknown callback while holding the lock. The CacheFile's lock is shared among CacheFileInputStream, CacheFileOutputStream and CacheFileChunk and it's OK when e.g. some output stream method is called while the lock is released in CacheFileInputStream::ReadSegments(). The only problem at this place would be if some method of the _same_ input stream would be called. Such input stream usage doesn't make sense and the fact that it doesn't happen is verified in debug build with the mInReadSegments checks.

Flags: needinfo?(michal.novotny)

Kelsey Gilbert [:jgilbert]

Comment 6

•

11 years ago

Thanks. WONTFIX based on comment 5.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WONTFIX

Reporter

Comment 7

•

11 years ago

(In reply to Michal Novotny (:michal) from comment #5) > I don't think that releasing the lock is really necessary, but the common > rule is to not call an unknown callback while holding the lock. The > CacheFile's lock is shared among CacheFileInputStream, CacheFileOutputStream > and CacheFileChunk and it's OK when e.g. some output stream method is called > while the lock is released in CacheFileInputStream::ReadSegments(). The only > problem at this place would be if some method of the _same_ input stream > would be called. Such input stream usage doesn't make sense and the fact > that it doesn't happen is verified in debug build with the mInReadSegments > checks. I'm not super-satisfied with this, since I'm getting this crash about once every two weeks. I think it's reasonable to say that it could be missed by DEBUG builds if it is a race condition in this case. Is it impossible for contention by construction? I would argue that "don't call an unknown callback while holding a lock" does not necessarily apply here, since if you're worried the callback might deadlock, the possibility of the deadlock means you are preventing the race condition. Removing the lock just removes the symptom, and will let it work *in many cases*. What situations are there where a callback might re-enter the lock, but would not constitute a race condition?

Flags: needinfo?(michal.novotny)

Assignee

Comment 8

•

11 years ago

I've spent some time trying find out whether releasing of the lock is really necessary. I've removed releasing of the lock and I've run FF locally as well as I've pushed it to try (https://tbpl.mozilla.org/?tree=Try&rev=0d60b0a69f21). So far I didn't experience reentering the lock, so this problem is probably just theoretical and I think we could demand that callback passed to CacheFileInputStream::ReadSegments() won't call the cache code. There is really a problem when the same entry is written on a different thread while the lock is released. When the data won't fit into the existing buffer, it is reallocated in CacheFileChunk::EnsureBufSize() which invalidates the buffer returned by CacheFileInputStream::CanRead(). So my proposal is to hold the lock all the time. If this is not acceptable, then I don't see any other solution than removing ReadSegments() method and using just Read().

Severity: normal → critical

Status: RESOLVED → REOPENED

Flags: needinfo?(michal.novotny)

Resolution: WONTFIX → ---

Assignee

Comment 9

•

11 years ago

Attached patch patch v1 — Details — Splinter Review

Assignee: nobody → michal.novotny

Attachment #8455814 - Flags: review?(honzab.moz)

Comment 10

•

11 years ago

Michal, this is not about reentering the lock, it's about potential deadlock warning we may get. The code is designed to enter locks in cascades like: cache service lock -> cache entry lock -> cache file lock. But when we don't release the lock in ReadSegments before calling the callback, we may get cascade of cache file lock -> cache entry lock. This will be detected (correctly!) as a potential deadlock since there are two codepaths that may enter two locks (the file's and the entry's) in opposite ways.

Comment 11

•

11 years ago

Comment on attachment 8455814 [details] [diff] [review] patch v1 Review of attachment 8455814 [details] [diff] [review]: ----------------------------------------------------------------- We need to release the lock, that's obvious. What's wrong here is that the buffer we pass as an argument may be released. Can we hold the chunk locally (in a local var, on the stack) and maybe keep it in some 'locked' or 'just being operated' state so that it cannot go away sooner than the callback returns?

Attachment #8455814 - Flags: review?(honzab.moz) → review-

Comment 12

•

11 years ago

Few notes: - the crash reports all indicate the buffer is corrupted (supports my suggestion) - nsBufferedInputStream::ReadSegments is not very smart too (doesn't count with call to Seek() while inside ReadSegments()->writer()) and doesn't keep any locks despite the class implements thread-safe ref counter, [1] [1] http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsBufferedStreams.cpp#334

Assignee

Comment 13

•

11 years ago

(In reply to Honza Bambas (:mayhemer) from comment #10) > Michal, this is not about reentering the lock, it's about potential deadlock > warning we may get. The code is designed to enter locks in cascades like: > cache service lock -> cache entry lock -> cache file lock. But when we > don't release the lock in ReadSegments before calling the callback, we may > get cascade of cache file lock -> cache entry lock. This will be detected > (correctly!) as a potential deadlock since there are two codepaths that may > enter two locks (the file's and the entry's) in opposite ways. I clearly wrote that we could demand that the callback won't call any cache code which would mean that this cannot happen. > We need to release the lock, that's obvious. What's wrong here is that the > buffer we pass as an argument may be released. Can we hold the chunk > locally (in a local var, on the stack) and maybe keep it in some 'locked' or > 'just being operated' state so that it cannot go away sooner than the > callback returns? No, you didn't read my comment correctly. It's not about releasing the chunk and its buffer. The problem is that on one thread we use some buffer outside the lock and on another thread this buffer is reallocated and so likely moved, so the previous pointer becomes invalid.

Assignee

Comment 14

•

11 years ago

(In reply to Honza Bambas (:mayhemer) from comment #12) > Few notes: > > - the crash reports all indicate the buffer is corrupted (supports my > suggestion) > - nsBufferedInputStream::ReadSegments is not very smart too (doesn't count > with call to Seek() while inside ReadSegments()->writer()) and doesn't keep > any locks despite the class implements thread-safe ref counter, [1] > > [1] > http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/ > nsBufferedStreams.cpp#334 I don't know what you're trying to say with this, please read comment #13 to understand the problem.

Comment 15

•

11 years ago

(In reply to Michal Novotny (:michal) from comment #13) > (In reply to Honza Bambas (:mayhemer) from comment #10) > > Michal, this is not about reentering the lock, it's about potential deadlock > > warning we may get. The code is designed to enter locks in cascades like: > > cache service lock -> cache entry lock -> cache file lock. But when we > > don't release the lock in ReadSegments before calling the callback, we may > > get cascade of cache file lock -> cache entry lock. This will be detected > > (correctly!) as a potential deadlock since there are two codepaths that may > > enter two locks (the file's and the entry's) in opposite ways. > > I clearly wrote that we could demand that the callback won't call any cache > code which would mean that this cannot happen. I'm afraid we cannot enforce this. The problem comes from an old code we just undermine with this change, but read on... > > > > We need to release the lock, that's obvious. What's wrong here is that the > > buffer we pass as an argument may be released. Can we hold the chunk > > locally (in a local var, on the stack) and maybe keep it in some 'locked' or > > 'just being operated' state so that it cannot go away sooner than the > > callback returns? > > No, you didn't read my comment correctly. It's not about releasing the chunk > and its buffer. The problem is that on one thread we use some buffer outside > the lock and on another thread this buffer is reallocated and so likely > moved, so the previous pointer becomes invalid. OK, reallocated means released as well. So, the problem is that the buffer may get invalid while we are in writer(). Can we somehow prevent it? I assume reallocation may only happen when we do partial request? (one thread reads, other writes and prolongs = reallocates the buffer.) This is rather complicated... hmm... maybe we should find the original stack that calls from the callback back on the cache and fix that code and try to enforce (I suspect this will be very hard tho and will cause us just troubles in the future). Unfortunatelly this change was introduced in times we were working on Gum w/o recording bugs. I'll try to find any logs on my hdd somewhere to find the stack... The changeset we have added the unlock change: https://hg.mozilla.org/projects/gum/rev/abd66f7c85b8

Comment 16

•

11 years ago

(In reply to Michal Novotny (:michal) from comment #14) > (In reply to Honza Bambas (:mayhemer) from comment #12) > > Few notes: > > > > - the crash reports all indicate the buffer is corrupted (supports my > > suggestion) > > - nsBufferedInputStream::ReadSegments is not very smart too (doesn't count > > with call to Seek() while inside ReadSegments()->writer()) and doesn't keep > > any locks despite the class implements thread-safe ref counter, [1] > > > > [1] > > http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/ > > nsBufferedStreams.cpp#334 > > I don't know what you're trying to say with this, please read comment #13 to > understand the problem. After further understanding this bug all this seems moot now, right.

Comment 17

•

11 years ago

Comment on attachment 8455814 [details] [diff] [review] patch v1 Review of attachment 8455814 [details] [diff] [review]: ----------------------------------------------------------------- Ah, I didn't finish reading comment 8 first! Sorry. OK, let's get this in and see what happens. For now it's better to have a potential deadlock code path we can then find and fix than a bad crash. r+

Attachment #8455814 - Flags: review- → review+

Updated

•

11 years ago

Status: REOPENED → ASSIGNED

Jason Duell

Comment 18

•

11 years ago

Comment on attachment 8455814 [details] [diff] [review] patch v1 Approval Request Comment [Feature/regressing bug #]: cache2 [User impact if declined]: crash [Describe test coverage new/current, TBPL]: needs to bake on m-c for a few days [Risks and why]: This is a change in locking code, so always some risk. That's why I think this should bake for a few days on m-c. [String/UUID change made/needed]: none

Attachment #8455814 - Flags: approval-mozilla-aurora?

https://hg.mozilla.org/integration/mozilla-inbound/rev/6f0a5bfa00a6

Assignee

Comment 19

•

11 years ago

Lawrence Mandel [:lmandel] (use needinfo)

Comment 20

•

11 years ago

This looks like it will hit m-c today. Let's give it until at least Friday before considering for Aurora approval. (Note that we're almost out of time for Aurora uplift.)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 21

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/6f0a5bfa00a6

Status: ASSIGNED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla33

Sylvestre Ledru [:Sylvestre]

Updated

•

11 years ago

status-firefox32: --- → affected

status-firefox33: --- → fixed

Sylvestre Ledru [:Sylvestre]

Updated

•

11 years ago

Attachment #8455814 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 22

•

11 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/741090be8784

status-firefox32: affected → fixed

Updated

•

11 years ago

Keywords: verifyme

Updated

•

11 years ago

Depends on: 1052266

Comment 23

•

11 years ago

Looking through Socorro [1] for [@ nsUTF8ToUnicode::Convert(char const*, int*, wchar_t*, int*)], I can see that there are 3 crashes on Firefox 32 Beta, in builds with IDs: 20140722195627, 20140804164216, 20140807212602. Michal, should I reopen this issue? Is there more work needed here? [1] - https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=nsUTF8ToUnicode%3A%3AConvert%28char+const%2A%2C+int%2A%2C+wchar_t%2A%2C+int%2A%29#tab-reports

Flags: needinfo?(michal.novotny)

Assignee

Comment 24

•

11 years ago

Please file a new bug. It seems that this crash is not caused by releasing the lock in CacheFileInputStream::ReadSegments(), but as I described in comment #8 it could cause a crash, so let's keep this bug about that issue.

Flags: needinfo?(michal.novotny)

Updated

•

11 years ago

Depends on: 1055529