Closed Bug 1408631 Opened 4 years ago Closed 4 years ago

Crash in shutdownhang | nsThread::Shutdown | nsUrlClassifierDBService::Shutdown

Categories

(Toolkit :: Safe Browsing, defect, P1)

45 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 58+ fixed
firefox56 --- wontfix
firefox57 - wontfix
firefox58 --- fixed

People

(Reporter: skywalker333, Assigned: dimi)

References

(Blocks 1 open bug)

Details

(Keywords: crash, topcrash, Whiteboard: #sbv4-m10)

Crash Data

Attachments

(3 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-318d46dc-b5f5-4477-8a8e-842bf0171014.
=============================================================

Firefox crashed while shutting down. 

Four crash reports were generated during the shutdown process. I submitted them manually via about:crashes.


bp-318d46dc-b5f5-4477-8a8e-842bf0171014
	10/14/2017	2:33 am
bp-8481b722-082c-4a64-8a84-fce970171014
	10/14/2017	2:32 am
bp-d800ad24-94aa-44e3-98de-b74930171014
	10/14/2017	2:30 am
bp-dadfd412-a085-4a06-a216-0a7320171014
	10/14/2017	2:18 am
bp-3d0413f6-08d5-4837-b974-f064e0171014
	10/14/2017	1:33 am
bp-318d46dc-b5f5-4477-8a8e-842bf0171014
	10/14/2017	2:33 am
	
Firefox 57.0b8 Crash Report [@ shutdownhang | nsThread::Shutdown | nsUrlClassifierDBService::Shutdown ]

Processor Notes 	processor_prod-processor-i-092db6dce0c531c7c_1383; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang	
	
bp-8481b722-082c-4a64-8a84-fce970171014
	10/14/2017	2:32 am
	
Firefox 57.0b8 Crash Report [@ IPCError-browser | ShutDownKill ]

Processor Notes 	processor_prod-processor-i-0df39363997b2d110_1395; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; Signature replaced with an IPC Channel Error, was: "RtlAnsiStringToUnicodeString | MsgWaitForMultipleObjectsEx | mozilla::widget::WinUtils::WaitForMessage | nsAppShell::ProcessNextNativeEvent"; skunk_classifier: reject - not a plugin hang
	
bp-d800ad24-94aa-44e3-98de-b74930171014
	10/14/2017	2:30 am

Firefox 57.0b8 Crash Report [@ IPCError-browser | ShutDownKill ]

Processor Notes 	processor_prod-processor-i-0807ffe45d5860cab_1392; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; Signature replaced with an IPC Channel Error, was: "js::GCMarker::lazilyMarkChildren"; skunk_classifier: reject - not a plugin hang	
	
	
bp-dadfd412-a085-4a06-a216-0a7320171014
	10/14/2017	2:18 am
	
Firefox 57.0b8 Crash Report [@ IPCError-browser | ShutDownKill ]

Processor Notes 	processor_prod-processor-i-0adac543901330d10_1396; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; Signature replaced with an IPC Channel Error, was: "RtlAnsiStringToUnicodeString | MsgWaitForMultipleObjectsEx | mozilla::widget::WinUtils::WaitForMessage | nsAppShell::ProcessNextNativeEvent"; skunk_classifier: reject - not a plugin hang
	
bp-3d0413f6-08d5-4837-b974-f064e0171014
	10/14/2017	1:33 am
	
Firefox 57.0b8 Crash Report [@ shutdownhang | nsThread::Shutdown | nsUrlClassifierDBService::Shutdown ]

Processor Notes 	processor_prod-processor-i-0dd4f3221581842b7_1390; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
Signature report for shutdownhang | nsThread::Shutdown | nsUrlClassifierDBService::Shutdown

Showing results from 7 days ago
5,364 Results

Windows 7 	2329 	43.4%
Windows Vista 	1565 	29.2%
Windows XP 	1016 	18.9%
Windows 10 	349 	6.5%
Windows 8.1 	85 	1.6%
Windows 8 	10 	0.2%
Windows Srv2003	10 	0.2%

Firefox 	52.4.0esr 	2840 	52.9% 	2138
Firefox 	45.2.0esr 	1004 	18.7% 	443
Firefox 	52.3.0esr 	380 	7.1% 	377
Firefox 	52.2.1esr 	206 	3.8% 	90
Firefox 	44.0b1 	        200 	3.7% 	168
Firefox 	52.2.0esr 	79 	1.5% 	47
Firefox 	52.0esr 	75 	1.4% 	56
Firefox 	52.4.1esr 	54 	1.0% 	40
Firefox 	45.1.1esr 	51 	1.0% 	37

Uptime Range
1-5 min 	2341 	43.6%
5-15 min 	1471 	27.4%
> 1 hour 	846 	15.8%
15-60 min 	700 	13.0%
< 1 min 	6 	0.1%

Architecture
x86 	5199 	96.9%
amd64 	165 	3.1%
Signature report for shutdownhang | nsThread::Shutdown | nsUrlClassifierDBService::Shutdown
	
ccd9b01d-80e0-4f9d-9b10-f20af0171013 	not connecting 	2017-10-13 10:17:03 	en-GB
f0fd3cac-1a71-4edd-a1d5-3a9670171012 	Firefox keeps not responding I have refreshed firefox at least 8 times today but it is still a problem 	2017-10-12 19:38:28 	en-GB
c4db96d7-54cb-40cd-8613-b6a350171012 	I can't do a Google search. 	2017-10-12 19:15:39 	en-US
4ae3099e-b2fb-4f04-ba1b-4b9da0171012 	have no idea why 	2017-10-12 10:47:43 	en-US
34e61f50-0cd5-4522-acd8-503310171012 	Unable to access. No connection. 	2017-10-12 09:53:25 	en-GB
9fa3153d-389a-48d9-a049-006820171012 	Impossible d'avoir accès aux sites de mes avoris Internet 	2017-10-12 07:25:45 	fr
3c4805b9-42da-41cd-bce8-0699e0171011 	it takes 30 seconds to go from one tab to another tab and the screen freezes a lot. 	2017-10-11 18:48:38 	en-US
06930b16-316c-4175-ba25-5cf990171011 	Unable to open the desk icon. 	2017-10-11 12:35:16 	en-GB
7a61de79-5375-4124-87ef-bd18c0171008 	no internet 	2017-10-08 18:00:12 	en-US
9a2a7b56-ef73-4341-9e2a-e40bb0171008 	it does it alot 	2017-10-08 01:49:04 	en-US
b2c29f1a-7952-49c3-b604-b5eed0171007 	why does firefox ALWAYS crash? 	2017-10-07 23:06:04 	en-US
f3586715-54e8-47e8-a642-678100171007 	Was trying to update Firefox, would not do so, have had these problems since. 	2017-10-07 18:33:10 	en-US
Not sure what happened but I guess that "URL classifier" thread could not be shutdowned. 

Frame	Module	Signature	Source
0	xul.dll	nsTArray_Impl<nsTArray<unsigned short>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned __int64, unsigned __int64)	xpcom/ds/nsTArray.h:2059
1	xul.dll	nsUrlClassifierPrefixSet::~nsUrlClassifierPrefixSet()	toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:66
2	xul.dll	nsUrlClassifierPrefixSet::`scalar deleting destructor'(unsigned int)	
3	xul.dll	nsUrlClassifierPrefixSet::Release()	toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp:36

The above frames appeared in "URL classifier" thread of all crash report
Component: Untriaged → Safe Browsing
Priority: -- → P1
Product: Firefox → Toolkit
As Thomas mentioned in Comment 4, almost every shutdown hang happened while releasing nsUrlClassifierPrefixSet::mIndexDeltas [1] which is a two-dimension array stores all the prefix data.

From crash reports, it doesn't look like we have any deadlock here because we are just freeing memory at this moment (nsTArray).

I guess most cases free |mIndexDelta| won't take too much time but maybe when CPU is very busy then freeing this big array may cause shutdown hang, not sure if we should add a telemetry to confirm this.

If free the array is really time-consuming in certain circumstances, one solution I can think of now is replace
fixed-length prefix set with variable-length prefix set implementation. The drawback of this approach will be lost the memory space optimization we have done for 4-bytes prefix set, and also, it is a big change for safebrowsing.

[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.h#83
(In reply to Dimi Lee[:dimi][:dlee] from comment #5)
> I guess most cases free |mIndexDelta| won't take too much time but maybe
> when CPU is very busy then freeing this big array may cause shutdown hang,
> not sure if we should add a telemetry to confirm this.

GCP, have you dealt with something like that before in the Safe Browsing code?

> If free the array is really time-consuming in certain circumstances, one
> solution I can think of now is replace
> fixed-length prefix set with variable-length prefix set implementation. The
> drawback of this approach will be lost the memory space optimization we have
> done for 4-bytes prefix set, and also, it is a big change for safebrowsing.

While it could in theory help with this shutdown hang, it's likely to make memory usage significantly worse.
Flags: needinfo?(gpascutto)
Whiteboard: #sbv4-m10
> > If free the array is really time-consuming in certain circumstances, one
> > solution I can think of now is replace
> > fixed-length prefix set with variable-length prefix set implementation. The
> > drawback of this approach will be lost the memory space optimization we have
> > done for 4-bytes prefix set, and also, it is a big change for safebrowsing.
> 
> While it could in theory help with this shutdown hang, it's likely to make
> memory usage significantly worse.

Not sure if we could replace nsTArray<nsTArray<uint16_t>> with nsTArray<nsCString>.
Maybe it is faster to release nsCString ? Also need to check the "append" performance for nsTArray & nsCString
(In reply to François Marier [:francois] from comment #6)
> (In reply to Dimi Lee[:dimi][:dlee] from comment #5)
> > I guess most cases free |mIndexDelta| won't take too much time but maybe
> > when CPU is very busy then freeing this big array may cause shutdown hang,
> > not sure if we should add a telemetry to confirm this.
> 
> GCP, have you dealt with something like that before in the Safe Browsing
> code?

No.

>Not sure if we could replace nsTArray<nsTArray<uint16_t>> with nsTArray<nsCString>.

If you try to use contiguous storage, you'll just move this bug from a slow shutdown to OOM crashes, see the original bug which added this code, bug 1046038.

If the string uses ropes or some other kind of fragmented storage, then you're at a fairly large risk of simply ending back up at the beginning of this bug.

I see 3 approaches:

1) Try to find a middle ground. Right now we use ~6000 arrays instead of one. Maybe a 100 or so will solve this bug while not regressing the OOM problem much. Of course the handling code might end up more complicated :-/
2) Fix bug 662444. This has enormous benefits for the project aside from making this bug moot.
3) Investigate why the freeing takes so long. 6000 arrays is indeed a lot, but should it really take so long that we pick it up? Surely a complicated webpage is worse? Note that the user comments in comment 3 talk about the browser being entirely broken.
Flags: needinfo?(gpascutto)
From reading bug 1046038, fixing this might be as simple as tuning DELTAS_LIMIT = 120. There is some trickiness there if you want to keep the file format compatible, but you might be in luck that for increasing the value there's no issue...
...and note that DELTAS_LIMIT also affects the time for a SafeBrowsing lookup (it's your worst case linear scan for a certain prefix). But I suspect you can get away with quite a bit bigger values without any measurable performance effect.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #8)
> 1) Try to find a middle ground. Right now we use ~6000 arrays instead of
> one. Maybe a 100 or so will solve this bug while not regressing the OOM
> problem much. Of course the handling code might end up more complicated :-/
> 2) Fix bug 662444. This has enormous benefits for the project aside from
> making this bug moot.
> 3) Investigate why the freeing takes so long. 6000 arrays is indeed a lot,
> but should it really take so long that we pick it up? Surely a complicated
> webpage is worse? Note that the user comments in comment 3 talk about the
> browser being entirely broken.

GCP, Thanks for all these great suggestions!

Hi francois,
The shutdown hang happened when clear lookupCache[1], I am thinking maybe we could do this first in "quit-application" WITHOUT blocking main-thread. which may give us some extra time before the blocking call inside nsUrlClassifierDBService::Shutdown[2].

I just checked the code I think it is doable in current SafeBrowinsg implementation (After Henry modified the update mechanism).

This approach doesn't fix the huge nsTArray release problem mentioned by GCP and if "quit-application" to "profile-before-change" doesn't take too long it also won't solve this issue. But I think we can give it a try because in theory, it should be able to improve shutdown time a little bit. If the crash rate is still high, then we can take one of the approaches suggested by GCP, how do you think?

[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/Classifier.cpp#906
[2] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#2441
Flags: needinfo?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #11)
> This approach doesn't fix the huge nsTArray release problem mentioned by GCP
> and if "quit-application" to "profile-before-change" doesn't take too long
> it also won't solve this issue. But I think we can give it a try because in
> theory, it should be able to improve shutdown time a little bit. If the
> crash rate is still high, then we can take one of the approaches suggested
> by GCP, how do you think?

That's a good idea. It's worth trying this simple change before playing with the DELTA_LIMITS.

Also, thanks GCP for all of this background on the existing storage code!
Flags: needinfo?(francois)
Assignee: aosmond → dlee
Status: NEW → ASSIGNED
Comment on attachment 8920265 [details]
Bug 1408631 - Release SafeBrowsing lookupcache in worker thread while shutdown.

https://reviewboard.mozilla.org/r/191278/#review196546

Code looks great, as always :)

Just a few documentation suggestions to make the intention clearer.

::: commit-message-31af3:2
(Diff revision 1)
> +Bug 1408631 - Release SafeBrowsing lookupcache in worker thread while shutdown. r?francois
> +The blocking shutdown code is handled in ::Shutdown while receive

I'm not entirely sure, but I think you might need an empty line between the short description and the long one.

However, given that this comment is already in the code, you can simply remove it from the commit message.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.h:227
(Diff revision 1)
>  
>    // Provide a way to forcibly close the db connection.
>    nsresult GCC_MANGLING_WORKAROUND CloseDb();
>  
> +  // Release objects that won't affect update thread.
> +  // Currently only Classifier::mLookupcache is released.

We can drop that comment since it's obvious from the code in the cpp file.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:820
(Diff revision 1)
>    LOG(("urlclassifier db closed\n"));
>  
>    return NS_OK;
>  }
>  
> +// PreShutdown is executed prior than close the update thread.

"prior to terminating the update thread."

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:831
(Diff revision 1)
> +  if (mClassifier) {
> +    // Classifier close will release all lookup caches which may be a time-consuming job.
> +    // See Bug 1408631.
> +    mClassifier->Close();
> +  }
> +

I would actually move the comment that you have before the function name, right in here.

That way, it will be closer to the place where a person would mistakenly add more cleanups:

    // WARNING: nothing we put here should affect an ongoing update in the update thread. When in doubt, put things in Shutdown() instead.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:2446
(Diff revision 1)
>      }
>    } else if (!strcmp(aTopic, "quit-application")) {
>      // Tell the update thread to finish as soon as possible.
>      gShuttingDownThread = true;
> +
> +    // The blocking shutdown code is handled in ::Shutdown while receive

I think I know what you mean, but this comment is a little bit confusing.

Does the following match what you meant?

> The code in ::Shutdown() is run on a 'profile-before-change' event and ensures that objects are freed by blocking on this freeing.
>
> We can however speed up the shutdown time by using the worker thread to release, in an earlier event, any objects that cannot affect an ongoing update on the update thread."

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:2461
(Diff revision 1)
>    }
>  
>    return NS_OK;
>  }
>  
> +// Post a PreShutdown task to work thread to release objects without blocking

"worker thread"
Attachment #8920265 - Flags: review?(francois) → review-
(In reply to François Marier [:francois] from comment #14)
> 
> Just a few documentation suggestions to make the intention clearer.
> 

Thanks for your review! I am really appreciate that help me make the comment way much better than the original version :)
Comment on attachment 8920265 [details]
Bug 1408631 - Release SafeBrowsing lookupcache in worker thread while shutdown.

https://reviewboard.mozilla.org/r/191278/#review196890
Attachment #8920265 - Flags: review?(francois) → review+
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1a4d6bea605f
Release SafeBrowsing lookupcache in worker thread while shutdown. r=francois
https://hg.mozilla.org/mozilla-central/rev/1a4d6bea605f
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
This signature has over 2500 reports from ESR52 in the last week. Is this something we can/should try to uplift?
Flags: needinfo?(dlee)
We should probably consider a fix for ESR52.5.0. I wonder if we can figure out the reason for the regression in ESR 52.4.0.
Reopen this bug because the patch landed in Comment 19 is only an improvement of nsUrlClassifierDBService shutdown time, it doesn't guarantee the bug is fixed. I'll still need to keep tracking the crash rate to see if it really helps.

Hi Ryan, Liz
I'll set this bug resolve fixed if the crash rate is decreased, then we can see if we want to uplift this patch or not.
Status: RESOLVED → REOPENED
Flags: needinfo?(dlee)
Resolution: FIXED → ---
Target Milestone: mozilla58 → ---
Status: REOPENED → ASSIGNED
Just an update that there is no crash on nightly since we landed this patch.
There is no crash on nightly for the past two weeks.
I think the patch landed did reduce the reproduce rate of this bug.

Hi francois,
I'll set this bug as resolve fixed if you are ok with it.
Flags: needinfo?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #25)
> I'll set this bug as resolve fixed if you are ok with it.

Sounds good to me. Do you want to port your fix to ESR52 too?
Flags: needinfo?(francois)
We would normally build ESR 52.5.0 today, but I have put it off until tomorrow morning.

I don't think we would want to uplift this to 57, but if it lands on ESR52 branch by the time I go to build then, great. If not then it will have to wait till 52.6.0.
Flags: needinfo?(dlee)
Hi Liz,
Sorry I found ESR52 doesn't contain the latest update scheme introduced in Bug 1338017.
This patch is built on the top of that, so I can't uplift this.
Flags: needinfo?(dlee)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Leaving 57 set to fix-optional in case the crash rate is high enough to warrant dot release consideration eventually.
The crash rate on ESR52 is extremely high (~40000 crashes in the last week) - is there anything we can do to stop the bleeding there?
[Tracking Requested - why for this release]:

55,632 Results in the past 7 days.

Top Crasher for 52.5.0esr.

Still high crash rate for 57.x.

Signature report for shutdownhang | nsThread::Shutdown | nsUrlClassifierDBService::Shutdown

Showing results from 7 days ago 

Windows Vista 	22024 	39.6%
Windows 7 	17702 	31.8%
Windows XP 	6549 	11.8%
Windows 10 	6182 	11.1%
Windows 8.1 	2611 	4.7%
Windows 8 	331 	0.6%
Windows Srv'03 	232 	0.4%
Linux 	1 	0.0%

Firefox 	57.0.1  	62 	0.1% 	85
Firefox 	57.0    	499 	0.9% 	399
Firefox 	52.5.0esr 	23123 	41.8% 	12435
Firefox 	52.4.1esr 	2240 	4.1% 	1582
Firefox 	52.4.0esr 	8834 	16.0% 	8314
Firefox 	52.3.0esr 	4162 	7.5% 	3251
Firefox 	52.2.1esr 	1123 	2.0% 	1198
Firefox 	52.2.0esr 	1148 	2.1% 	839
Firefox 	52.1.2esr 	361 	0.7% 	224
Firefox 	52.1.1esr 	158 	0.3% 	134
Firefox 	52.1.0esr 	148 	0.3% 	135
Firefox 	52.0esr 	985 	1.8% 	722
Firefox 	52.0.2esr 	359 	0.6% 	200
Firefox 	52.0.2  	488 	0.9% 	522
Firefox 	52.0.1esr 	484 	0.9% 	222

x86 	54029 	97.1%
amd64 	1603 	2.9%

Top Crasher for Firefox 52.5.0esr
Top 50 Crashing Signatures. 7 days ago 

1 	12.46% 	2.73% 	shutdownhang | nsThread::Shutdown | nsUrlClassifierDBService::Shutdown
application
	23123 	23123 	0 	0 	17052 	1 	2016-10-22 
	
	
Recent crash report: 

Report ID 	Date Submitted
bp-0d12ed6b-2951-490b-9e2a-dfe400171204
	12/03/17	7:09 PM
Firefox 52.5.0esr
Status: RESOLVED → REOPENED
Keywords: topcrash
Resolution: FIXED → ---
I just checked the crash report for ESR52, most of shutdown hangs happened because we were updating.
We can apply the fix in Bug 1315386 to ESR 52 to reduce the crash rate.

As for 57, we can use the patch in thus bug.

Hi francois, do you have any concern about this?
Flags: needinfo?(dlee)
Flags: needinfo?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #32)
> I just checked the crash report for ESR52, most of shutdown hangs happened
> because we were updating.
> We can apply the fix in Bug 1315386 to ESR 52 to reduce the crash rate.
> 
> As for 57, we can use the patch in thus bug.
> 
> Hi francois, do you have any concern about this?

That sounds fine to me. It's not a small patch, but since it's the top crasher for 52, we probably need it.

BTW, we'll have to grab the patch on bug 1336005 too.

Thomas, do you remember any other fixes you may have made to the shutdown-aware patch that we should also backport to 52?
Flags: needinfo?(francois) → needinfo?(tnguyen)
I think that's all. Other shutdown fixes are landed after we rewrote thread model of url-classifier.
Flags: needinfo?(tnguyen)
Hi francois,
This patch merged the fix in Bug 1315386 and Bug 1336005.
Would you mind taking a quick look to make sure everything is ok?
Attachment #8934406 - Flags: review?(francois)
Comment on attachment 8934406 [details] [diff] [review]
esr52-uplift.patch

Review of attachment 8934406 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for taking care of this Dimi!
Attachment #8934406 - Flags: review?(francois) → review+
Comment on attachment 8934406 [details] [diff] [review]
esr52-uplift.patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
Not a sec bug
User impact if declined:
High crash rate when shutdown
Fix Landed on Version:
53
Risk to taking this patch (and alternatives if risky):
The risk is that this fix doesn't reduce the rate of shutdown hang or causes other shutdown crash.
But since we have test this for a while (This fix first landed on Nov 2016), I think it won't be too risky.

String or UUID changes made by this patch: 
None
Attachment #8934406 - Flags: approval-mozilla-esr52?
Comment on attachment 8934406 [details] [diff] [review]
esr52-uplift.patch

This patch isn't trivial. Typically ESR uplifts are limited to security and stability fixes. The crash volume for this signature is high enough to justify inclusion in ESR52.6.
Attachment #8934406 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Keywords: crash, topcrash
Uplifts don't need checkin-needed set on them.
Keywords: checkin-needed
Also, not seeing any crash reports on Fx58, so re-resolving this as fixed. Trevor - status flags track older branches, bug resolution tracks m-c.
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Dimi, it would appears that the ESR52 patch is hitting Valgrind leaks extremely frequently. Can you please take a look?
https://treeherder.mozilla.org/logviewer.html#?job_id=155218195&repo=mozilla-esr52
Flags: needinfo?(dlee)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #41)
> Dimi, it would appears that the ESR52 patch is hitting Valgrind leaks
> extremely frequently. Can you please take a look?
> https://treeherder.mozilla.org/logviewer.html#?job_id=155218195&repo=mozilla-
> esr52

Hi Ryan,
I suspect this is related to a fix(Bug 1336240) we didn't include in the patch uplifted, but I am not 100% sure
I'll upload a patch that includes the fix in Bug 1336240, could you help check if it fixes the Valgrind leaks? Thanks!
I am out of office tomorrow, also, I don't know how to run try on mozilla-esr52.
Flags: needinfo?(ryanvm)
Attached patch valgrind leak fix patch (obsolete) — Splinter Review
For Valgrind, pushing to Try from ESR52 should Just Work with the |try: -b o -p linux64-valgrind -u none -t none| syntax. That said, I ran a push your attached patch and it still leaks :(

https://treeherder.mozilla.org/logviewer.html#?job_id=155645562&repo=try
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #44)
> For Valgrind, pushing to Try from ESR52 should Just Work with the |try: -b o
> -p linux64-valgrind -u none -t none| syntax. That said, I ran a push your
> attached patch and it still leaks :(
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=155645562&repo=try

Thanks for help, I am working on this now.
Attached patch valgrind.patchSplinter Review
Attachment #8941769 - Attachment is obsolete: true
Flags: needinfo?(dlee)
Comment on attachment 8942859 [details] [diff] [review]
valgrind.patch

Hi Thomas,
This patch fixes the Valgrind leaks in esr52, could you help check this? Thanks!
Attachment #8942859 - Flags: review?(tnguyen)
Comment on attachment 8942859 [details] [diff] [review]
valgrind.patch

Review of attachment 8942859 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +1514,5 @@
>      return NS_OK;
>    }
> +  if (gShuttingDownThread) {
> +    *didLookup = false;
> +    return NS_ERROR_ABORT;

This lgtm. But I am still a bit concerned about the root cause of the leak.
Given that adding this could fix the leak (or at lease slim the leak). Because we add a shutdown check to mWorkerProxy->Lookup https://dxr.mozilla.org/mozilla-esr52/rev/e8ff30ea74ad56ef53d435613672f05ff7675740/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1577, we could guess that the leak is related to our queue (thread queue and lookup request queue).
The leak log looks weird, I don't see anything from that. But if it relates to thread queue, I think it would be worth to add the shutdown check to other calls contain mWorkerProxy like GetTables
Attachment #8942859 - Flags: review?(tnguyen) → review+
(In reply to Thomas Nguyen[:tnguyen] ni? plz from comment #49)
> This lgtm. But I am still a bit concerned about the root cause of the leak.
> Given that adding this could fix the leak (or at lease slim the leak).
> Because we add a shutdown check to mWorkerProxy->Lookup
> https://dxr.mozilla.org/mozilla-esr52/rev/
> e8ff30ea74ad56ef53d435613672f05ff7675740/toolkit/components/url-classifier/
> nsUrlClassifierDBService.cpp#1577, we could guess that the leak is related
> to our queue (thread queue and lookup request queue).
> The leak log looks weird, I don't see anything from that. But if it relates
> to thread queue, I think it would be worth to add the shutdown check to
> other calls contain mWorkerProxy like GetTables

Thanks for your review, thomas.
I agree it would worth to add shutdown check to other calls, also maybe need to check why
we don't have this in central.

But for uplift, I want to make the changes as few as possible, so I'll only add one more in this case.

try looks good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=87c81806244cf581865306927a49bb1319f4cbf6
Comment on attachment 8942859 [details] [diff] [review]
valgrind.patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
Not a sec bug
User impact if declined:
No, this fix a try error. The memory leaks only occurs while shutdown 
Fix Landed on Version:
This fix previous try error for the uplift patch, it doesn't include in other branch.
Risk to taking this patch (and alternatives if risky): 
I think this patch should be safe seems it just add one more shutdown check to prevent memory leak.
String or UUID changes made by this patch: 
No
Attachment #8942859 - Flags: approval-mozilla-esr52?
Comment on attachment 8942859 [details] [diff] [review]
valgrind.patch

Low risk, ESR52+
Attachment #8942859 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
You need to log in before you can comment on or make changes to this bug.