Closed Bug 1005991 Opened 10 years ago Closed 9 years ago

crash in NS_CycleCollectorSuspect3 under mozilla::storage::Statement::internalFinalize(bool)

Categories

(Toolkit :: Storage, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox36 + wontfix
firefox37 + fixed
firefox38 --- fixed
firefox39 --- fixed
firefox-esr31 --- wontfix
firefox-esr38 --- fixed
b2g-v2.0 --- wontfix
b2g-v2.0M --- wontfix
b2g-v2.1 --- wontfix
b2g-v2.1S --- wontfix
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: mccr8, Assigned: mak)

References

Details

(4 keywords, Whiteboard: [adv-main37+][b2g-adv-main2.2+][post-critsmash-triage])

Crash Data

Attachments

(2 files, 6 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-80d445f7-8ef7-4c94-8916-773ad2140504.
=============================================================

It looks like |data| is null in NS_CycleCollectorSuspect3, which either means we're late in shutdown or we're calling it on a thread that does not support cycle collection.  Given that we're not on the main thread, I think it is the latter.  This field has type nsIXPConnectJSObjectHolder, which I don't think is ever okay to use off main thread.  I'm not sure if something changed here recently or what.

This is the #23 crash on Nightly.
Looking at Statement::internalClose, it seems that we manipulate a few nsCOMPtr<nsIXPConnectWrappedNative> off the main thread. Is that legal?
The only implementation of that interface is XPCWrappedNative, which does not use threadsafe refcounting, so I think that is not okay.  XPConnect has XPCWNs in various tables, so this seems like it would lead to badness.  You should be able to dispatch releases to the main thread or something.
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on May 8th-12th) from comment #1)
> Looking at Statement::internalClose, it seems that we manipulate a few
> nsCOMPtr<nsIXPConnectWrappedNative> off the main thread. Is that legal?

Non, c'est strictement interdit. :-(
Assignee: nobody → dteller
Ah yeah, so I think this was introduced by http://hg.mozilla.org/mozilla-central/rev/c7d907403794#l2.1 because of some fast-and-loose badness on mozStorage's part with XPCVariants which I may or may not have been involved in.  And with sdwilsh escaped, I was the last one who knew were the bodies were, but I forgot about them when doing the review on that.

If I'm remembering things correctly, JS code using magic bindings hands nsIVariants to mozStorage and we take those at face value.  We were careful to only release them on the main thread so their thread safety theoretically didn't matter.  (Famous last words, right?)

A smoking gun comment is:
http://dxr.mozilla.org/mozilla-central/source/storage/src/mozStorageBindingParams.cpp#180

Basically I think the fix is in mozStorageBindingParams and our JS magic to make sure that all nsIVariants are coerced to mozStorage internal SQLite variants.  I have no idea why we didn't do this originally; probably schedule pressure for both Firefox and Thunderbird at the time.


From a security risk perspective, note that mozStorage is never exposed directly to content.  Although content can impact code that uses mozStorage like the localStorage code, etc.
I'm looking at replacing these two nsCOMPtr<> with nsMainThreadPtrHandle<>.
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on May 8th-12th) from comment #5)
> I'm looking at replacing these two nsCOMPtr<> with nsMainThreadPtrHandle<>.

Yes, that's the correct approach.
(In reply to Andrew Sutherland (:asuth) from comment #4)
> Basically I think the fix is in mozStorageBindingParams and our JS magic to
> make sure that all nsIVariants are coerced to mozStorage internal SQLite
> variants.  I have no idea why we didn't do this originally; probably
> schedule pressure for both Firefox and Thunderbird at the time.

I have an early prototype that uses nsMainThreadPtrHandle to ensure that only ever release the xpc stuff on the main thread. Should I still investigate what you wrote above?
I would go with :bholley's judgement over mine most any day :)

The docs for nsMainThreadPtr* at http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsProxyRelease.h#67 do seem to indicate this is an okay use-case, and that's consistent with what we did/rationalized originally.

Mainly I was thinking we probably want to address the variant problem more at the root is that:
- Our use of the XPCVariants was always sketchy.  In general we convert the types to native types later or something Blobby.  And Blobby things can be converted to thread-safe (I think?) nsDOMFile* classes or something like that.
- The comment for the changed code from bug 874814 seemed to be saying that the goal was to simplify the release semantics.

If the proposed changes ensures that it's just the XPConnect-objects we're releasing on the main thread and we're not changing the life-cycle of the mozStorage stuff again, then that absolutely sounds fine/safe to me.  It's likely also okay to change the mozStorage stuff back again if needed, that'll just take more thinking.
(In reply to Andrew Sutherland (:asuth) from comment #8)
> I would go with :bholley's judgement over mine most any day :)

Well, my comment was just based on the fact that nsMainThreadPtrHolder is the general solution for this sort of problem. I haven't looked at the code at all, so if there's a domain-specific refactoring that would make this go away, that's even better!
Here's a first attempt, which doesn't attempt to address the root cause, only makes sure that we don't manipulate the xpc pointers safely.
Attachment #8417916 - Flags: feedback?(mak77)
Let's take the opportunity to apply the same treatment to async statements.
Attachment #8417916 - Attachment is obsolete: true
Attachment #8417916 - Flags: feedback?(mak77)
Attachment #8417927 - Flags: feedback?(mak77)
Andrew, I've been looking at the core issue and I don't really see a way out. From what I understand, these nsIXPC* values are purely a cache for the JS helper. We can remove them without changing the semantics, but I assume that it will have performance repercussions. Am I missing something?
Unless we can just clear the reference to `mStatementParamsHolder` & co, without clearing its `mStatement` field? Why do we need to do both?
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on May 8th-12th) from comment #12)
> Andrew, I've been looking at the core issue and I don't really see a way
> out. From what I understand, these nsIXPC* values are purely a cache for the
> JS helper. We can remove them without changing the semantics, but I assume
> that it will have performance repercussions. Am I missing something?

Apologies, I was confusing the XPCWrappedNative from the helpers with my guilt over the XPCVariant risks I mention in https://bugzilla.mozilla.org/show_bug.cgi?id=507414#c19

In a nutshell, it was then, and it appears is still possible now, to do something like the following from JS and get an XPCVariant into a BindingParams instance:
  statement.BindByIndex(1, 'blah');
There is no magic JS wrapper in that case, just XPConnect creating an nsIVariant and passing it to us.

The statement method shunts to the mozIStorageBindingParams implementation via the macro glue here:
http://dxr.mozilla.org/mozilla-central/source/storage/src/StorageBaseStatementInternal.h#246

BindingParams just does mParameters.ReplaceObjectAt():
http://dxr.mozilla.org/mozilla-central/source/storage/src/mozStorageBindingParams.cpp#386

mParameters is defined as "nsCOMArray<nsIVariant> mParameters;"
http://dxr.mozilla.org/mozilla-central/source/storage/src/mozStorageBindingParams.h#60


I think the change at http://hg.mozilla.org/mozilla-central/rev/c7d907403794#l2.63 to call mStatements.Clear(); likely made this bug (XPCWrappedNative) happen as well as my concern about XPCVariants potentially getting freed on the async thread a bug again as well.

I do not believe your fix will solve this XPCVariant problem since it relates only to cleanup of the statement and not to the StatementData::mParamsArray.  (Although it's been a long day and I could be missing something.)

I should note that I believe the JS helpers avoid this problem since they have explicit calls to our convertJSValToVariant implementation in their ::SetProperty magic:
http://dxr.mozilla.org/mozilla-central/source/storage/src/mozStorageStatementParams.cpp#46


The key thing there is to either ensure the nsIVariants are ones we created rather than XPCVariants or that we free them on the main thread.  Since we already have our very own variant implementation with coercion from JSVal types and an nsIVariant-to-SQLite type thing, it seems like we could mash them all up to make a sanitizing variant wrapper dude:
- http://dxr.mozilla.org/mozilla-central/source/storage/src/Variant.h
- http://dxr.mozilla.org/mozilla-central/source/storage/src/mozStoragePrivateHelpers.cpp#117
- http://dxr.mozilla.org/mozilla-central/source/storage/src/variantToSQLiteT_impl.h
(In reply to Andrew Sutherland (:asuth) from comment #14)
> I think the change at
> http://hg.mozilla.org/mozilla-central/rev/c7d907403794#l2.63 to call
> mStatements.Clear(); likely made this bug (XPCWrappedNative) happen as well
> as my concern about XPCVariants potentially getting freed on the async
> thread a bug again as well.

So, if I understand correctly, David's patch would be fine to fix the holders being released on the wrong thread, though that's not the only bug here, we also have non thread-safe nsIVariant implementations (XPCVariant) passed from xpconnect that are stored locally and we stopped taking care of releasing those on the main-thread.
Though we have out own thread-safe version of nsIVariant, that is http://dxr.mozilla.org/mozilla-central/source/storage/src/Variant.h
and you suggest we may always convert passed-in nsIVariant to our own thread-safe implementation, like the JS helpers already do.

Though, even if we do this to fix XPCVariants (and I think it's a very good suggestion, so there's less risk we do the wrong thing later), we still need 
David's patch to release the nsIXPConnectJSObjectHolder on the right thread.

Though, if we do that, do we still need to release mStatement on the main-thread? it has thread-safe isupports, the ClearBackReferencesEvent thing seems just a way to try to release mParamsArray, but if that keeps onto thread-safe values, there may be no more the need to.

So in the end the patch would look like:
- change nsCOMPtr<nsIXPConnectJSObjectHolder> to nsMainThreadPtrHandle<nsIXPConnectJSObjectHolder>
- always use our internal thread-safe refcounted nsIVariant implementation, so that mParamsArray can be released on any thread

Am I correct? Is this what you had in mind?

There's still the blobby thing to figure, we need a thread-safe refcounting container for that, but off-hand it looks like we are just keeping an array of uint8_t, where do you think we may need to use nsDOMFile* classes?

> it seems like we could mash
> them all up to make a sanitizing variant wrapper dude:
> - http://dxr.mozilla.org/mozilla-central/source/storage/src/Variant.h
> -
> http://dxr.mozilla.org/mozilla-central/source/storage/src/
> mozStoragePrivateHelpers.cpp#117
> -
> http://dxr.mozilla.org/mozilla-central/source/storage/src/
> variantToSQLiteT_impl.h

My assumption was that we just had to copy the XPCVariant into our own Variant, which kind of wrapper are you suggesting to build here?
Flags: needinfo?(bugmail)
(In reply to Marco Bonardo [:mak] from comment #15)
> So, if I understand correctly, David's patch would be fine to fix the
> holders being released on the wrong thread, though that's not the only bug
> here, we also have non thread-safe nsIVariant implementations (XPCVariant)
> passed from xpconnect that are stored locally and we stopped taking care of
> releasing those on the main-thread.

Yes.

> Though we have out own thread-safe version of nsIVariant, that is
> http://dxr.mozilla.org/mozilla-central/source/storage/src/Variant.h
> and you suggest we may always convert passed-in nsIVariant to our own
> thread-safe implementation, like the JS helpers already do.

Yes.
 
> Though, even if we do this to fix XPCVariants (and I think it's a very good
> suggestion, so there's less risk we do the wrong thing later), we still need 
> David's patch to release the nsIXPConnectJSObjectHolder on the right thread.

Yes.
 
> Though, if we do that, do we still need to release mStatement on the
> main-thread? it has thread-safe isupports, the ClearBackReferencesEvent
> thing seems just a way to try to release mParamsArray, but if that keeps
> onto thread-safe values, there may be no more the need to.

I did not read the ClearBackReferencesEvent patch piece so much so far.  To be honest I was already on-board with the nsMainThreadPtrHandle and didn't quite understand what the ClearBackReferencesEvent was trying to do because at that point I realized I had gotten confused.  My apologies to :Yoric if my confusion about this bug caused him to go down that complex path needlessly :(

What you are saying sounds reasonable to me.  And given that the bug that introduced this regression was trying to release more things on the async thread, it generally seems appealing to me to stick with that strategy so as to minimize changes in behaviour for this bug.

> So in the end the patch would look like:
> - change nsCOMPtr<nsIXPConnectJSObjectHolder> to
> nsMainThreadPtrHandle<nsIXPConnectJSObjectHolder>
> - always use our internal thread-safe refcounted nsIVariant implementation,
> so that mParamsArray can be released on any thread
> 
> Am I correct? Is this what you had in mind?

Yes, this jives with my thoughts, noting that I mainly was thinking about the second thing.

> There's still the blobby thing to figure, we need a thread-safe refcounting
> container for that, but off-hand it looks like we are just keeping an array
> of uint8_t, where do you think we may need to use nsDOMFile* classes?

Ignore the nsDOMFile stuff comments.  Too much IndexedDB and DOM Blob/File stuff on my mind.  It is not relevant to mozStorage.

> > it seems like we could mash
> > them all up to make a sanitizing variant wrapper dude:
> > - http://dxr.mozilla.org/mozilla-central/source/storage/src/Variant.h
> > -
> > http://dxr.mozilla.org/mozilla-central/source/storage/src/
> > mozStoragePrivateHelpers.cpp#117
> > -
> > http://dxr.mozilla.org/mozilla-central/source/storage/src/
> > variantToSQLiteT_impl.h
> 
> My assumption was that we just had to copy the XPCVariant into our own
> Variant, which kind of wrapper are you suggesting to build here?

Sorry, no new wrapper.  I was trying to indicate that those code locations are what you would look at to make the copying function that uses mozStorage's existing variant implementation.  These callouts were mainly for my own edification since I had to refamiliarize myself with our variant code.

It sounds like you 100% understand what I was getting it (whereas I understood less than 100% :)
Flags: needinfo?(bugmail)
I have a patch implementing that proposal, since David is on PTO I decided to experiment with it. I still want to cleanup a couple details though, but I wanted to advice David to wait a little while before touching the current patch since it may not be needed.
Attached patch patch v1 (obsolete) — Splinter Review
So, I think this is sort-of what you had in mind.
Attachment #8422297 - Flags: review?(bugmail)
Attachment #8422297 - Flags: feedback?(dteller)
Comment on attachment 8422297 [details] [diff] [review]
patch v1

Review of attachment 8422297 [details] [diff] [review]:
-----------------------------------------------------------------

If I understand correctly your patch, the following line in mozStorageStatement.cpp is untouched:

nsCOMPtr<nsIXPConnectWrappedNative> wrapper = do_QueryInterface(mStatementParamsHolder);

This will still cause an assertion failure during off main thread finalization, won't it?

::: storage/src/mozStorageAsyncStatementJSHelper.cpp
@@ +40,2 @@
>  #endif
>  

Might be a good idea to add a comment (or an assertion) about the fact that this code is always executed on the main thread.

::: storage/src/mozStorageBindingParams.cpp
@@ +99,5 @@
>  #include "variantToSQLiteT_impl.h"
>  
> +/**
> + * Pad a Variants array with NULL Variants to a given size.
> + */

I don't understand the point of this function. Why not simply nullptr?

::: storage/src/mozStoragePrivateHelpers.cpp
@@ +149,5 @@
>    return nullptr;
>  }
>  
> +Variant_base *
> +convertVariantToStorageVariant(nsIVariant* aVariant)

Shouldn't this go to Variant.{h, cpp}?

@@ +153,5 @@
> +convertVariantToStorageVariant(nsIVariant* aVariant)
> +{
> +  nsRefPtr<Variant_base> variant = do_QueryObject(aVariant);
> +  if (variant) {
> +    // This is already a Storage variant, just passthrough.

When can this happen?

@@ +215,5 @@
> +      rv = aVariant->GetAsArray(&type, &iid, &len, &rawArray);
> +      NS_ENSURE_SUCCESS(rv, nullptr);
> +      if (type == nsIDataType::VTYPE_UINT8) {
> +        std::pair<const void *, int> v(static_cast<const void *>(rawArray), len);
> +        return new BlobVariant(v);

Does this cause memory copies?

@@ +217,5 @@
> +      if (type == nsIDataType::VTYPE_UINT8) {
> +        std::pair<const void *, int> v(static_cast<const void *>(rawArray), len);
> +        return new BlobVariant(v);
> +      }
> +    } 

Nit: whitespace.
Attachment #8422297 - Flags: feedback?(dteller) → feedback+
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on May 8th-12th) from comment #20)
> If I understand correctly your patch, the following line in
> mozStorageStatement.cpp is untouched:
> 
> nsCOMPtr<nsIXPConnectWrappedNative> wrapper =
> do_QueryInterface(mStatementParamsHolder);
> 
> This will still cause an assertion failure during off main thread
> finalization, won't it?

well, it doesn't assert, but I think I overlooked it and it still needs to be clarified. Off-hand I think it is not complaining cause this comptr is created and release on this thread, while mStatementParamsHolder = nullptr will be proxyreleased on mainthread, later.

> ::: storage/src/mozStorageBindingParams.cpp
> @@ +99,5 @@
> >  #include "variantToSQLiteT_impl.h"
> >  
> > +/**
> > + * Pad a Variants array with NULL Variants to a given size.
> > + */
> 
> I don't understand the point of this function. Why not simply nullptr?

nsCOMArray was creating null XPCVariants to pad the array, and we bind null when a param is not provided, I'm just replicating that behavior. Using nullptr causes fancy release crashes fwiw.

> ::: storage/src/mozStoragePrivateHelpers.cpp
> @@ +149,5 @@
> >    return nullptr;
> >  }
> >  
> > +Variant_base *
> > +convertVariantToStorageVariant(nsIVariant* aVariant)
> 
> Shouldn't this go to Variant.{h, cpp}?

I don't think so, this is just an helper and not part of the Variant implementation

> @@ +153,5 @@
> > +convertVariantToStorageVariant(nsIVariant* aVariant)
> > +{
> > +  nsRefPtr<Variant_base> variant = do_QueryObject(aVariant);
> > +  if (variant) {
> > +    // This is already a Storage variant, just passthrough.
> 
> When can this happen?

Through the js helper, we may invoke BindByIndex with a Storage Variant, I preferred to manage this here than to differentiate in the caller.

> @@ +215,5 @@
> > +      rv = aVariant->GetAsArray(&type, &iid, &len, &rawArray);
> > +      NS_ENSURE_SUCCESS(rv, nullptr);
> > +      if (type == nsIDataType::VTYPE_UINT8) {
> > +        std::pair<const void *, int> v(static_cast<const void *>(rawArray), len);
> > +        return new BlobVariant(v);
> 
> Does this cause memory copies?

Good question, I should check.
Comment on attachment 8417927 [details] [diff] [review]
Using nsMainThreadPtrHandle instead of nsCOMPtr for xpc references, v2

clear request for now
Attachment #8417927 - Flags: feedback?(mak77)
Comment on attachment 8422297 [details] [diff] [review]
patch v1

I will first address David's comments, let's not waste resources. Though if you have any further comments on the current patch please bring them out so I can go through them.
Attachment #8422297 - Flags: review?(bugmail)
Attached patch patch v1 (obsolete) — Splinter Review
Sorry if it took a while, most of the patch has been built on trains in small chunks, it took a little while to make it clean.

It seems to work fine on my Mac, I'm going to do a Try run now.
Attachment #8417927 - Attachment is obsolete: true
Attachment #8422297 - Attachment is obsolete: true
Attachment #8427110 - Flags: review?(bugmail)
Attachment #8427110 - Flags: feedback?(dteller)
Comment on attachment 8427110 [details] [diff] [review]
patch v1

Review of attachment 8427110 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good and makes sense once my brain came to terms that nsMainThreadPtrHolder had different-looking boilerplate than the standard XPCOM stuff.  (Kudos to whoever wrote the doc-block for that!)

r=asuth assuming there is nothing tricky about addressing the variant reuse that you commented out.

::: storage/src/mozStoragePrivateHelpers.cpp
@@ +157,5 @@
> +Variant_base *
> +convertVariantToStorageVariant(nsIVariant* aVariant)
> +{
> +  /*
> +  nsRefPtr<Variant_base> variant = do_QueryObject(aVariant);

Why is this commented out?  From the bug I understood that you want this logic to happen and I think it would be a good idea too.

I'm guessing this either doesn't work or you thought this might be the source of a crash and didn't uncomment and put it back.  Please fix this and make sure to run another try run with this.  (Unless you already fixed this in your try run.)
Attachment #8427110 - Flags: review?(bugmail) → review+
(In reply to Andrew Sutherland (:asuth) from comment #25)
> ::: storage/src/mozStoragePrivateHelpers.cpp
> @@ +157,5 @@
> > +Variant_base *
> > +convertVariantToStorageVariant(nsIVariant* aVariant)
> > +{
> > +  /*
> > +  nsRefPtr<Variant_base> variant = do_QueryObject(aVariant);
> 
> Why is this commented out?  From the bug I understood that you want this
> logic to happen and I think it would be a good idea too.

Uh damn, I had commented out that piece of code some days ago while investigating a crash, and I forgot to re-enable it :/ I will re-enable and see if it's still an issue, clearly.
now I remember, I was missing the IID in Variant_base to be able to queryObject.
The new try is https://tbpl.mozilla.org/?tree=Try&rev=a72134a64d70
Assignee: dteller → mak77
Status: NEW → ASSIGNED
Comment on attachment 8427110 [details] [diff] [review]
patch v1

clearly there is still something wrong.
Attachment #8427110 - Flags: feedback?(dteller)
Blocks: 1137343
It seems that this bug raised with the 36 release (cf 1137343).
Do we have plans for 36.0.1 ?
Flags: needinfo?(mak77)
Noting this is still a topcrash in 36 and the #21 crash in Firefox 37.  Most of the time it's a startup crash.
(In reply to Sylvestre Ledru [:sylvestre] from comment #31)
> Do we have plans for 36.0.1 ?

At this time, I don't have the resources to work on this.
Someone else could complete the wip patch here, or it will take far more time than the release timeframe
Flags: needinfo?(mak77)
This bug has been identified as the cause of bug 1137343, which is the #11 topcrash in 37 Beta. 

mak - If you really can't work on this, can you please help us find someone else to pick up where you left off? Note that the remainder of the Beta gtb schedule is
Mar 12 - Beta 5
Mar 16 - Beta 6
Mar 19 - Beta 7
Flags: needinfo?(mak77)
I'm updating the patch and retesting on Try, I hope to have good news soon.
Flags: needinfo?(mak77)
Attached patch patch v2 (obsolete) — Splinter Review
This one might be good, let's see what Try thinks about it.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=edc0fe0836ab
Attachment #8427110 - Attachment is obsolete: true
Attached patch patch v2.1 (obsolete) — Splinter Review
Wow, I did something very wrong to move from 1 test failure to all-tests failure.
This one should be better:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=33417bd07a6e
Attachment #8577165 - Attachment is obsolete: true
Attached patch patch v2.2Splinter Review
and fixed a couple implicit constructors.

It looks good so far on Try (apart the static analysis due to the implicit constructors I fixed here) so I think we can start (again) the review process.
Whatever happens the next changes will be interdiffs, so we don't have to review again the full thing.
Attachment #8577305 - Attachment is obsolete: true
Attachment #8577365 - Flags: review?(bugmail)
Comment on attachment 8577365 [details] [diff] [review]
patch v2.2

Review of attachment 8577365 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/test/unit/test_statement_executeAsync.js
@@ +717,5 @@
>    let bp = array.newBindingParams();
> +  try {
> +    bp.bindByIndex(0, run_test);
> +  } catch (e if e.result == Cr.NS_ERROR_UNEXPECTED) {
> +    /* expected */

You need to set a flag in the catch block here and check it outside the catch block so we can verify the error fired.  (The point of this test hunk is to verify we throw an error.)  It's of course fine to use some other helper glue where you call a function with your function and the expected exception.

@@ +737,5 @@
>    let bp = array.newBindingParams();
> +  try {
> +    bp.bindByName("blob", run_test);
> +  } catch (e if e.result == Cr.NS_ERROR_UNEXPECTED) {
> +    /* expected */

ditto on changing test semantics.
Attachment #8577365 - Flags: review?(bugmail) → review+
Thank you for the quick review

https://hg.mozilla.org/integration/fx-team/rev/4bfaf7fd30bd
Target Milestone: --- → mozilla39
https://hg.mozilla.org/mozilla-central/rev/4bfaf7fd30bd
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment on attachment 8577365 [details] [diff] [review]
patch v2.2

Approval Request Comment
[Feature/regressing bug #]: Old bug, crash ratio increased in Firefox 36
[User impact if declined]: top crash
[Describe test coverage new/current, TreeHerder]: m-c
[Risks and why]: The risk is due to the fact this touches the code binding values to any Sqlite statement, but we have a lot of tests going through these code paths in the tree. We should at least get confirmation this change fixes the crash for Nightly population.
[String/UUID change made/needed]: none
Attachment #8577365 - Flags: approval-mozilla-beta?
Attachment #8577365 - Flags: approval-mozilla-aurora?
Triage drive-by: waiting for some confirmation of reduced crash volume on nightly
There are no crashes reported with this signature for 37.0b5, so far.  

It still seems too soon to tell if this fix affected the signature in bug 1137343.
I'm sure I'm saying the obvious here, but the first build with the fix is 20150318030202 Nightly.
(In reply to Liz Henry (:lizzard) from comment #45)
> My mistake, I was looking at beta not at nightly. Searching on nightly from
> yesterday onwards gives me 12 crashes: 

I looked at all of those crashes, and they are different than the ones reported here, as they are happening under FilePickerParent::FileSizeAndDateRunnable.  I'll file a bug for that.  That said, the signature in bug 1137343 is XPCWrappedNative::AddRef(), so the crashes here might be showing up differently.
I filed bug 1144759 for the file picker crashes.
Crash Signature: [@ NS_CycleCollectorSuspect3] → [@ NS_CycleCollectorSuspect3][XPCWrappedNative::AddRef()]
Crash Signature: [@ NS_CycleCollectorSuspect3][XPCWrappedNative::AddRef()] → [@ NS_CycleCollectorSuspect3] [@ XPCWrappedNative::AddRef()]
We don't yet have enough data to warrant including this fix in Beta 7. We will check again on Monday and make a call about whether we should take this fix in the 37 RC.
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #49)
> We don't yet have enough data to warrant including this fix in Beta 7. We
> will check again on Monday and make a call about whether we should take this
> fix in the 37 RC.

Note that the signature of bug 1137343 is #3 in 36 release with ~2.5% of all crashes, and it looks like the patch here probably will fix that, so our release population is hitting this more than our beta population and it would be good to get a fix to them.
Comment on attachment 8577365 [details] [diff] [review]
patch v2.2

Give a chance to this fix in aurora then.
Attachment #8577365 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I see 30 crashes reported on Nightly with build ids from 20150318 or later. All of the reports reference FilePickerParent::FileSizeAndDateRunnable (comment 47) except one:

https://crash-stats.mozilla.com/report/index/c3fccd5e-042e-4939-8751-b91dd2150323

I don't see any reports in Aurora from 20150320 or later.

mak/mccr8 - Although this patch has some risk, 37 is still not in a good state stability wise and this may fix a top crash. What do you think about uplifting to Beta at this point?
Flags: needinfo?(mak77)
Flags: needinfo?(continuation)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #53)
> I see 30 crashes reported on Nightly with build ids from 20150318 or later.
> All of the reports reference FilePickerParent::FileSizeAndDateRunnable
> (comment 47) except one:
> 
> https://crash-stats.mozilla.com/report/index/c3fccd5e-042e-4939-8751-
> b91dd2150323

I meant to also ask if either of you know what the deal is with this one different report.
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #54)
> > https://crash-stats.mozilla.com/report/index/c3fccd5e-042e-4939-8751-
> > b91dd2150323
> 
> I meant to also ask if either of you know what the deal is with this one
> different report.

It is probably due to a similar thread-safety bug, but I don't know that code area, sorry.

By the way, I took a look at the current reported crashes on Nightly and Aurora for the 2 signatures, and the data looks good, all of the reports I checked were pointing to the old code.
So the data looks good.
I also didn't see reported regressions related to this change in Nightly so far. While we didn't have a long baking time, this makes me feel a little bit more optimistic.
So yes, if we think this can have a positive impact on crash rate, I'd be fine to uplift.

now let me see if the patch applies cleanly to the beta tree.
Flags: needinfo?(mak77)
Yes, it applies cleanly, this is a Try run, just in case:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b706bf80723
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #54)
> I meant to also ask if either of you know what the deal is with this one
> different report.

I'll file a bug about it.  The issue is unrelated to this one.  The signatures in this bug show up any time somebody uses a refcounted cycle-collected object off the main thread.
Flags: needinfo?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #57)
> I'll file a bug about it.
I filed bug 1146416. It looks e10s-related so it should only affect trunk.
Attached patch compile followupSplinter Review
everything looks good, but the Linux debug build (e10s and static analysis failures can be ignored on beta). Looks like beta doesn't use unified builds, so it needs some minor adjustments to includes and forward declarations.

This is an additional patch with a try run of just linux debug. I think it doesn't need an explicit review, since it's just minor compilation spew, moreover a couple of declarations have been wrongly removed by me in the original patch (I didn't know of the unified build issue).

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7de515afec03

I'd possibly push these minor changes to all branches, since even if not explicitly needed with unified builds, it's still good to have them for declaration correctness.
Attachment #8581846 - Flags: approval-mozilla-beta?
Attachment #8581846 - Flags: approval-mozilla-aurora?
QA Contact: lhenry
Comment on attachment 8577365 [details] [diff] [review]
patch v2.2

With the additional confidence in this patch from having tested it and the need to improve the stability of 37, we're going to take this in Firefox 37.0 RC1. Beta+
Attachment #8577365 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8581846 - Flags: approval-mozilla-beta?
Attachment #8581846 - Flags: approval-mozilla-beta+
Attachment #8581846 - Flags: approval-mozilla-aurora?
Attachment #8581846 - Flags: approval-mozilla-aurora+
Lawrence you asked earlier about crash volume on this bug. There are only a few reports for the first signature listed, and for XPCWrappedNative signature, I commented in bug 1137343: #12 crash for 37.0b7 with 462 crashes in the last week, 1.1% of all beta7 crashes.
QA Contact: lhenry
Let's do one thing at a time.

Follow-up compile patch to fx-team:
https://hg.mozilla.org/integration/fx-team/rev/3fcf3b402ec7

Follow-up compile patch to aurora:
https://hg.mozilla.org/releases/mozilla-aurora/rev/b9480ce8f288
I'm going to wait fx-team and aurora builds before pushing to beta, to be very sure everything is at its place, that means likely mid morning CET (also cause it's very late now).
if there's bigger hurry than that, you can push the 2 aurora changesets (comment 52 and comment 62) to beta, they should work as-is.
Whiteboard: [adv-main37+]
Does this impact ESR31?
Flags: needinfo?(abillings)
I filed this a year ago, so almost certainly.  We landed it for the purposes of improving stability, so I don't know if this crash is showing up on ESR31 or not.
Flags: needinfo?(abillings)
This is a sec-low. We don't take sec-low or sec-moderate bugs on ESR without an overriding reason.
Whiteboard: [adv-main37+] → [adv-main37+][b2g-adv-main2.2+]
Group: core-security → core-security-release
Whiteboard: [adv-main37+][b2g-adv-main2.2+] → [adv-main37+][b2g-adv-main2.2+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.