Bug 1503704 (Open), opened 6 years ago, updated 1 month ago

Very high CPU usage on fast.com

Categories: Core :: Networking (defect, P2)

Tracking: Performance Impact: low; firefox65: affected

People: Reporter: rowbot; Unassigned

References: Blocks 1 open bug

Details: Keywords: perf:resource-use; Whiteboard: [necko-triaged]

STR:
1) Visit https://fast.com
2) Run the speed test.

When running the speed test, my CPU usage is at 100%. This is on an Intel Core i5-2500K. Internet speed is roughly 1 Gbps. I'm not sure if this is normal for my CPU, which is kind of old now, or if this is a performance problem in Firefox or Windows. The profile is showing some pretty long wait times on the DNS resolver threads. Let me know if I can provide any more information.

Profile: http://bit.ly/2yHS12C
I should also add that this is on Windows 10 64-bit with the latest version of Nightly.
Markus, I'm looking at the profile, but it's not clear how to recognize which thread(s) are behind the load. I can see some dispatch delays on the socket thread and also on the content (main?) thread. Is that the ultimate indicator?

It's been a long time since I last looked at perf.html. The new Cleopatra is so pretty!!
The Gecko profiler is not well suited to debugging CPU usage. It's more targeted at finding sources of jank and at looking at how things are scheduled.

I do see though that there's a lot of activity on the parent process main thread: https://perfht.ml/2RusYr4
That's the thread that's most critical to responsiveness and should be kept as idle as possible. But it's busy running all kinds of code in response to the data that's arriving over the network, especially in these two time ranges: https://perfht.ml/2RqAl2w and https://perfht.ml/2RzomQz

In the first of those sections:
 - it's telling the new tab page about icons: https://perfht.ml/2RvYxkl
 - it's servicing IndexedDB requests, probably for kinto (bug 1502146): https://perfht.ml/2RsVh9b
 - it's allocating + deallocating memory when processing network responses: https://perfht.ml/2RtfR9k
 - there's lots of event loop and thread coordination overhead from tons of tiny events

In the second section, it's:
 - notifying WebRequest listeners in extensions: https://perfht.ml/2RqQuVR
 - actually, I think that's all it's doing. There are just tons of messages and promise callbacks that it's coordinating with the extension it's notifying.

Oh, and the extension is uBlock Origin, whose code is running in the WebExtension process, which is called "Content (5 of 5)" in this profile: https://perfht.ml/2RnRuK8
However, there may be more CPU usage happening on other threads that are not profiled by default. On Linux, the system "perf" profiler is a good tool to investigate such issues.
Here is a profile recorded in a clean Firefox profile: https://perfht.ml/2yKTvJQ

I still see 100% CPU usage on my end. Let me know if there are other threads you would like me to include in a profile.
Hi Vicky,
Could you find someone on your team to take a look first?
Thanks.
Flags: needinfo?(vchin)
Whiteboard: [qf]
Flags: needinfo?(vchin)
Whiteboard: [qf] → [qf:p3:resource]
Here's a profile recorded with the "perf" tool (sorry for the long URL, this is what perf-html.io is giving me right now):
https://perf-html.io/public/f6040fb789fbf772e0887327616349bb742ec901/calltree/?globalTrackOrder=0-1-2-3-4-5-6-7-8-9&localTrackOrderByPid=27831-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-60-61-62-63-64-65-66-67-68-69-70-71-72-73-74-75-76-77-78-79-80-81-82-83-84-85-86-87-88-89-90-91-92-93-94-95-96-97-98-99-100-101-102-103-104-105-106-107-108-109-110-111-112-113-114-115-116-117-118-119-120-121-122-123-124-125-126-127-128-129-130-131-132-133-134-135-136-137-138-139-140-141-142-143-144-145-146-147-148-149-150-151-152-153-154-155-156-157-158-159-160-161-162-163-164-165-166-167-168-169-170-171-172-173-174-175-176-177-178-179-180-181-182-183-184-185-186-187-188-189-190-191-192-193-194-195-196-197-198-199-200-201-202-203-204-205-206-207-208-209-210-211-212-213-214-215-216-217-218~27832-0~27834-0~27859-0~27905-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-60-61-62-63-64-65-66-67-68-69-70-71-72-73-74-75-76-77-78-79-80-81-82-83-84-85-86-87-88-89-90-91-92-93-94-95-96-97-98-99-100-101-102-103-104-105-106-107-108-109-110-111-112-113-114-115-116-117-118-119-120-121-122-123-124-125-126-127-128-129-130-131-132-133-134-135-136-137-138-139-140-141-142-143-144-145-146-147-148-149-150-151-152-153-154-155-156-157-158-159-160-161-162-163-164-165-166-167-168-169-170-171-172-173-174-175-176-177-178-179-180-181-182-183-184-185-186-187-188-189-190-191-192-193-194-195-196-197-198-199-200-201-202-203-204-205-206-207-208-209-210-211-212-213-214-215-216-217-218-219-220-221-222-223-224-225-226-227-228-229-230-231-232-233-234-235-236-237-238-239-240-241-242-243-244-245-246-247-248-249-250-251-252-253-254-255-256-257-258-259-260-261-262-263-264-265-266-267-268-269-270-271-272-273-274-275-276-277-278-279-280-281-282-283-284-285-286-287-288-289-290-291-292-293-294-295-296-297-298-299-300-301-302-303-304-305-306-307-308-309-310-311-312-313-314-315-316-317-318-319-320-321-322-323-324-325-326-327-328-329-330-331-332-333-334-335-336-337-338-339-340-341-342-343-344-345-346-347-348-349-350-351-352-353-354-355-356-357-358-359-360-361-362-363-364-365-366-367-368-369-370-371-372-373-374-375-376-377-378-379-380-381-382-383-384-385-386-387-388-389-390-391-392-393-394-395-396-397-398-399-400-401-402-403-404-405-406-407-408-409-410-411-412-413-414-415-416-417-418-419-420-421-422-423-424-425-426-427-428-429-430-431-432-433-434-435-436-437-438-439-440-441-442-443-444-445-446-447-448-449-450-451-452-453-454-455-456-457-458-459-460-461-462-463-464-465-466-467-468-469-470-471-472-473-474-475-476-477-478-479-480-481-482-483-484-485-486-487-488-489-490-491-492-493-494-495-496-497-498-499-500-501-502-503-504-505-506-507-508-509-510-511-512-513-514-515-516-517-518-519-520-521-522-523-524-525-526-527-528-529-530-531~27948-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29~27989-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22~28091-0-1-2-3-4-5-6-7-8-9-10-11-12~29709-0-1~29712-0-1~&thread=127&v=3
This profile shows a number of things responsible for the CPU usage:
 - The main threads of the parent process and of the content process are doing a bunch of work to handle the large number of input stream ready events. Some of this work is in JS so it's easier to look at the profile from comment 5 for the main threads:
   - A lot of it seems to be "death by a thousand cuts" simply from event loop overhead + scheduling + copying + notifying.
   - In the content process, the DOM API Performance.getEntriesByName shows up pretty heavily, and it seems to be spending most of its time sorting entries.
 - The IPDL background thread in the parent process is using a bunch of CPU just forwarding data along to the content process.
 - The IOThread uses some amount of CPU, but most of it seems to be spent in the kernel; I'm not sure what's so expensive there. It could be event loop overhead again.
 - The URL Classifier thread shows some serious CPU usage, mostly updating hashes and in locking overhead.
 - The Socket Thread in the parent process is showing a lot of CPU usage. I can't identify a single culprit.
 - The Socket Thread in the content process is also showing a lot of CPU usage, almost completely due to event loop / runnable overhead.
 - A ton of StreamTransport threads seem to be getting created (and probably destroyed) during the course of this run, both in the parent process and in the content process. The parent process goes from StreamTrans#52 all the way up to StreamTrans#260 in the window from 18.5s to 21s, that's a rate of about 80 threads being created per second. The content process goes from StreamTrans#1 all the way to StreamTrans#1363 in the window from 5.5s to 13.3s, which is a rate of 175 threads being created per second.

I think, of all the issues identified above, the large number of StreamTransport threads is the most concerning and the most actionable. If this could use a thread pool of limited size, that should help a fair amount.
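
As a purely illustrative sketch of that thread-pool idea (C++, but not Gecko's actual nsThreadPool or necko code; the BoundedPool name and all details are made up), a fixed set of workers draining a shared queue turns each transport event into a queue push instead of a thread create/destroy pair:

// Hypothetical sketch only: a bounded worker pool that reuses threads
// instead of spawning one per stream transport event.
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class BoundedPool {
 public:
  explicit BoundedPool(std::size_t aThreadCount) {
    for (std::size_t i = 0; i < aThreadCount; ++i) {
      mWorkers.emplace_back([this] {
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mMutex);
            mCondVar.wait(lock, [this] { return mShutdown || !mTasks.empty(); });
            if (mShutdown && mTasks.empty()) {
              return;  // Queue drained and pool shutting down.
            }
            task = std::move(mTasks.front());
            mTasks.pop();
          }
          task();  // Run the work item with the lock released.
        }
      });
    }
  }

  // Queue a work item; costs a lock + push instead of a thread spawn.
  void Dispatch(std::function<void()> aTask) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mTasks.push(std::move(aTask));
    }
    mCondVar.notify_one();
  }

  ~BoundedPool() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mShutdown = true;
    }
    mCondVar.notify_all();
    for (auto& worker : mWorkers) {
      worker.join();
    }
  }

 private:
  std::vector<std::thread> mWorkers;
  std::queue<std::function<void()>> mTasks;
  std::mutex mMutex;
  std::condition_variable mCondVar;
  bool mShutdown = false;
};

A pool capped at roughly the core count would turn the 80-175 thread creations per second measured above into plain dispatches onto existing threads.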

We should also check if Performance.getEntriesByName really has to sort a list every time it is called.
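
For what it's worth, here is a hedged sketch of how the repeated sort could be avoided (hypothetical types, not the actual Performance implementation): keep the entry list ordered by startTime at insertion time, so each getEntriesByName call only has to filter.

// Hypothetical sketch only; not the actual Performance API implementation.
#include <algorithm>
#include <string>
#include <vector>

struct PerfEntry {
  std::string name;
  double startTime;  // milliseconds since navigation start
};

class PerfEntryList {
 public:
  // Entries arrive roughly in time order, so inserting at the right spot is
  // cheap and keeps the list permanently sorted by startTime.
  void Insert(PerfEntry aEntry) {
    auto it = std::upper_bound(
        mEntries.begin(), mEntries.end(), aEntry.startTime,
        [](double aTime, const PerfEntry& aOther) { return aTime < aOther.startTime; });
    mEntries.insert(it, std::move(aEntry));
  }

  // With the invariant above, a lookup is a linear filter with no sort.
  std::vector<PerfEntry> GetEntriesByName(const std::string& aName) const {
    std::vector<PerfEntry> result;
    for (const PerfEntry& entry : mEntries) {
      if (entry.name == aName) {
        result.push_back(entry);  // Already in startTime order.
      }
    }
    return result;
  }

 private:
  std::vector<PerfEntry> mEntries;  // Invariant: sorted by startTime.
};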

And it would be nice to make each individual load completion cheaper, by reducing the number of copies and threads involved. It would be very nice to not have to go through the parent process's main thread for each load, but I don't know how realistic that is.
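
One generic pattern that could reduce the per-chunk main-thread traffic, sketched below with made-up names (NotificationCoalescer is not an actual necko interface), is to coalesce many small "data arrived" notifications into a single dispatched batch:

// Hypothetical sketch only; names do not correspond to real necko interfaces.
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class NotificationCoalescer {
 public:
  NotificationCoalescer(std::size_t aFlushThreshold,
                        std::function<void(std::vector<std::size_t>)> aDispatch)
      : mFlushThreshold(aFlushThreshold), mDispatch(std::move(aDispatch)) {}

  // Called once per incoming network chunk (possibly thousands per second).
  void OnDataAvailable(std::size_t aByteCount) {
    std::vector<std::size_t> batch;
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mPending.push_back(aByteCount);
      if (mPending.size() < mFlushThreshold) {
        return;  // Keep accumulating; no cross-thread dispatch yet.
      }
      batch.swap(mPending);
    }
    mDispatch(std::move(batch));  // One dispatch for the whole batch.
  }

  // Call on a timer or at end-of-stream so the tail is not left pending.
  void Flush() {
    std::vector<std::size_t> batch;
    {
      std::lock_guard<std::mutex> lock(mMutex);
      batch.swap(mPending);
    }
    if (!batch.empty()) {
      mDispatch(std::move(batch));
    }
  }

 private:
  const std::size_t mFlushThreshold;
  std::function<void(std::vector<std::size_t>)> mDispatch;
  std::mutex mMutex;
  std::vector<std::size_t> mPending;
};

Whether something like this is applicable to the real delivery path depends on ordering and latency constraints that the sketch ignores.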

Oh, and the URL classifier thread's locking behavior might warrant investigation.
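
If that locking overhead turns out to be contention, one common mitigation (sketched here with hypothetical names, not the real URL classifier code) is to build the updated hash set outside the lock and swap it in during a short critical section, so lookups never wait on an in-progress update:

// Hypothetical sketch only; not the real URL classifier code.
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_set>
#include <vector>

class ClassifierStore {
 public:
  // The expensive part (copying and inserting hashes) runs without the lock.
  void ApplyUpdate(const std::vector<uint32_t>& aNewHashes) {
    std::shared_ptr<std::unordered_set<uint32_t>> current;
    {
      std::lock_guard<std::mutex> lock(mLock);
      current = mTable;  // Cheap pointer copy under the lock.
    }
    auto next = current
        ? std::make_shared<std::unordered_set<uint32_t>>(*current)
        : std::make_shared<std::unordered_set<uint32_t>>();
    next->insert(aNewHashes.begin(), aNewHashes.end());  // Lock released here.
    std::lock_guard<std::mutex> lock(mLock);
    mTable = std::move(next);  // O(1) swap inside a short critical section.
  }

  // Lookups grab a reference and never wait on an in-progress update.
  bool Contains(uint32_t aHash) const {
    std::shared_ptr<const std::unordered_set<uint32_t>> table;
    {
      std::lock_guard<std::mutex> lock(mLock);
      table = mTable;
    }
    return table && table->count(aHash) != 0;
  }

 private:
  mutable std::mutex mLock;
  std::shared_ptr<std::unordered_set<uint32_t>> mTable;
};
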
This is something to be analyzed and maybe split into sub-bugs. It would be good to look at this soon, while we (or somebody) can still reproduce it.
Priority: -- → P2
Whiteboard: [qf:p3:resource] → [qf:p3:resource][necko-triaged]

Hey everyone,

I happened to have this computer turned on this weekend (it's no longer my daily driver) and figured I'd check whether anything had changed. I updated to the latest Nightly and updated my Ethernet controller driver, and Firefox's CPU usage dropped by roughly 15%, which is good, though my computer as a whole still reports 100% CPU usage.

I took two new profiles, both on clean Firefox profiles. The second profile is quite large because I ticked the "StreamTrans" option in the Gecko Profiler, since Markus brought up StreamTransport threads. I'm no expert at reading these things, but it looks like there are over 9200 StreamTransport threads in a 32-second window in the second profile. That's roughly 287 threads per second.

https://perfht.ml/2Bjt6Uq
https://perfht.ml/2Bjy8QX (StreamTrans enabled)

The number of StreamTransport threads will very likely be addressed by bug 1535361. But even with that patch I can still see (in an opt build) a huge CPU load (two cores at full load), so that's not the main factor.

See Also: → 1716156
Performance Impact: --- → P3
Whiteboard: [qf:p3:resource][necko-triaged] → [necko-triaged]
Severity: normal → S3

Current profile: https://share.firefox.dev/49MAd5C
Threadripper, 1 Gbps connection (fast.com reports ~900 Mbps).
CPU usage during the transfer: ~30-40% on the socket thread, 5-10% on the main thread.

(In reply to Markus Stange [:mstange] from comment #8)

  • A ton of StreamTransport threads seem to be getting created (and probably
    destroyed) during the course of this run, both in the parent process and in
    the content process. The parent process goes from StreamTrans#52 all the way
    up to StreamTrans#260 in the window from 18.5s to 21s, that's a rate of
    about 80 threads being created per second. The content process goes from
    StreamTrans#1 all the way to StreamTrans#1363 in the window from 5.5s to
    13.3s, which is a rate of 175 threads being created per second.

This part of the issue may have been related to bug 1667581. If we're still creating a large number of StreamTransport threads, we should look into it; it could be a similar but different issue.

See Also: → 1667581