Closed Bug 1286895 Opened 8 years ago Closed 8 years ago

No limit on number of dedicated JS web workers in Firefox

Categories

(Core :: DOM: Workers, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: egoktas, Assigned: baku)

References

Details

Attachments

(1 file, 1 obsolete file)

Tested on Linux: there is no limit on the number of dedicated JS web workers in Firefox.
Although there is a limit on the number of web workers per page or iframe, there is no limit on the total number of web workers in an opened tab. It is possible to spawn many web workers by spawning many iframes, each of which spawns the maximum allowed number of web workers.
Since each new dedicated web worker corresponds to a new thread in the process, spawning many web workers essentially sprays the memory with stacks, which makes it easier to find a stack in the program memory. Preventing the spawning of an unlimited number of workers would mitigate this issue. Think of stack spraying as the stack analogue of heap spraying.
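A rough sketch of such a page (illustrative only; the file names and loop counts below are made up, and the per-iframe worker count is whatever the per-page limit allows):

// Illustrative sketch: multiply the per-page worker limit by spawning many
// iframes, each of which spawns its own workers.
// spawn-workers.html (a hypothetical inner page) would itself run e.g.:
//   for (let j = 0; j < 20; j++) new Worker("busy-worker.js");
for (let i = 0; i < 1000; i++) {
  const frame = document.createElement("iframe");
  frame.src = "spawn-workers.html";
  document.body.appendChild(frame);
}
// Each dedicated worker gets its own OS thread and therefore its own stack,
// so the process address space fills up with thread stacks.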
Summary: No limit on number of web workers in Firefox → No limit on number of dedicated JS web workers in Firefox
Component: Security → DOM: Workers
Product: Firefox → Core
We used to limit worker threads but recently removed that to conform to how other browsers operate.
I see. Would it hurt to cap the number of web workers per page at a certain amount, e.g. 100? I think it is an easy mitigation against stack spraying.
From my experience other browsers allow tens of web workers, while FF allows thousands of web workers.
(In reply to Enes Göktaş [:egoktas] from comment #2)
> From my experience other browsers allow tens of web workers, while FF allows
> thousands of web workers.

Are you using something to measure this?  My impression was that Chrome did not limit workers.

What is the issue with "stack spraying"?
I did some empirical tests to see how many web workers other browsers would allow.
On Chrome I noticed that each new web worker uses so much memory that after spawning tens of web workers the process runs out of memory for new workers and throws an error, so there is effectively a soft limit.
Spawning many stacks increases the chance of randomly hitting a stack; if an unlimited number of stacks can be spawned, the chance of missing a stack approaches zero. This becomes more significant when, for example, a defense mechanism like SafeStack is deployed. SafeStack is a new feature in LLVM that separates return addresses and some other safety-critical values from the regular stack to protect against buffer overflows. It ensures there are no pointers to this separate stack (the "safe stack") and randomizes its location to protect against arbitrary writes. Stack spraying can potentially reduce its security.

Also, spawning many threads slows down the system and can bring it to a halt if the total number of processes/threads on the system is exhausted, e.g. on Linux. Imagine a page injected with a malicious script that just aims at DoSing the system.
Thanks for the explanation.

Sounds like we should put the limit back in with some quite large constant.  For example, limit threads to 256 or 512.  Andrea, what do you think?
Flags: needinfo?(amarchesini)
> Sounds like we should put the limit back in with some quite large constant. 
> For example, limit threads to 256 or 512.  Andrea, what do you think?

There are many kinds of DoS a script can perform to test the limits of a browser: memory allocation and workers are just two of them.

The reason we decided to remove the limit is that, if a malicious page wants to crash the browser, it can use multiple domains to work around the per-domain limit.
Flags: needinfo?(amarchesini)
But I think the issue here is a bit different from normal DoS:

1) You can DoS the entire OS by exhausting process space.  Effectively a fork bomb.
2) The ability to spray stacks reduces the effectiveness of the ASLR security technique. [1]

These suggest we should have *some* limit, even if it's very high.

[1]: https://en.wikipedia.org/wiki/Address_space_layout_randomization
See comment 7.
Flags: needinfo?(amarchesini)
OK. I think a limit of 512 should be more than enough for any webpage. I'll reintroduce the limit.
Maybe I'll also add some new telemetry IDs (in case the old ones have been removed) to see if this limit is reached by some website.
Flags: needinfo?(amarchesini)
Thanks!
Assignee: nobody → amarchesini
Status: NEW → ASSIGNED
Attached file backed_out_changeset_feb0e93ac30d (obsolete) —
This is just the backout of bug 1280241 with these changes:

1. 512 limit
2. a telemetry ID for dedicated workers.
Attachment #8782789 - Flags: review?(bkelly)
I could not find the number 512 in this patch.
wrong patch.
Attachment #8782789 - Attachment is obsolete: true
Attachment #8782789 - Flags: review?(bkelly)
Attachment #8782803 - Flags: review?(bkelly)
Attachment #8782803 - Attachment is patch: true
Comment on attachment 8782803 [details] [diff] [review]
backed_out_changeset_feb0e93ac30d

Review of attachment 8782803 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.  I did not reverse engineer the mechanism completely since we are restoring code that used to run before.  If you made extensive changes, please separate the backout patch from your additions.  (I assume you did not make extensive changes, though.)

Thanks!

::: dom/workers/RuntimeService.cpp
@@ +2113,5 @@
>  RuntimeService::CancelWorkersForWindow(nsPIDOMWindowInner* aWindow)
>  {
>    AssertIsOnMainThread();
>  
> +  AutoTArray<WorkerPrivate*, MAX_WORKERS_PER_DOMAIN> workers;

Looks like these are about 4k on the stack.  I guess that's ok.

@@ +2456,5 @@
>        numberOfProcessors = 1; // Must be one there somewhere
>      }
> +    uint32_t clampedValue = std::min(uint32_t(numberOfProcessors),
> +                                     gMaxWorkersPerDomain);
> +    clampedHardwareConcurrency.compareExchange(0, clampedValue);

I thought we wanted this clamped to a smaller value to reduce fingerprinting.  Is that not the case?

::: modules/libpref/init/all.js
@@ +139,5 @@
>  pref("dom.select_events.enabled", true);
>  
>  // Whether or not Web Workers are enabled.
>  pref("dom.workers.enabled", true);
> +// The number of workers per domain allowed to run concurrently.

Maybe extend the comment to say that we're going for an effectively infinite limit while still preventing abuse.
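For example, something along these lines (just a suggested wording; I'm assuming the pref keeps its old dom.workers.maxPerDomain name):

// The number of workers per domain allowed to run concurrently. The value is
// high enough to be effectively infinite for legitimate pages, but it still
// prevents abuse (e.g. stack spraying or thread-exhaustion DoS).
pref("dom.workers.maxPerDomain", 512);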

::: toolkit/components/telemetry/Histograms.json
@@ +8478,5 @@
>    },
> +  "SERVICE_WORKER_SPAWN_GETS_QUEUED": {
> +    "alert_emails": ["amarchesini@mozilla.com"],
> +    "bug_numbers": [1286895],
> +    "expires_in_version": "58",

Are there guidelines for when to expire telemetry?  Any reason to expire these at all?  Hardware is going to change over time, so we may want to keep these in place.
Attachment #8782803 - Flags: review?(bkelly) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f4c8b3de527e
Reintroduce a limit on number of dedicated JS web workers in Firefox, r=bkelly
Backed this out for crashing in test_bug1241485.html, at least on Windows 7 opt:

https://hg.mozilla.org/integration/mozilla-inbound/rev/0a3247080eba

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=f4c8b3de527ee6b47a3167248e8a398b6254f119
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=34342370&repo=mozilla-inbound

01:28:20  WARNING -  TEST-UNEXPECTED-FAIL | dom/workers/test/test_bug1241485.html | application terminated with exit code 1
01:28:20     INFO -  runtests.py | Application ran for: 0:00:13.246000
01:28:20     INFO -  zombiecheck | Reading PID log: c:\users\cltbld\appdata\local\temp\tmpocpsz3pidlog
01:28:20     INFO -  mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/F__ynWzdSZeb0aeG5T-izA/artifacts/public/build/firefox-51.0a1.en-US.win32.crashreporter-symbols.zip
01:28:25     INFO -  mozcrash Copy/paste: C:\slave\test\build\win32-minidump_stackwalk.exe c:\users\cltbld\appdata\local\temp\tmpe1ozr8.mozrunner\minidumps\5b90142c-3ce9-46aa-95b7-84f65d563bfe.dmp c:\users\cltbld\appdata\local\temp\tmpmsng85
01:28:33     INFO -  mozcrash Saved minidump as C:\slave\test\build\blobber_upload_dir\5b90142c-3ce9-46aa-95b7-84f65d563bfe.dmp
01:28:33     INFO -  mozcrash Saved app info as C:\slave\test\build\blobber_upload_dir\5b90142c-3ce9-46aa-95b7-84f65d563bfe.extra
01:28:33  WARNING -  PROCESS-CRASH | dom/workers/test/test_bug1241485.html | application crashed [@ mozilla::dom::workers::GetCurrentThreadJSContext()]
01:28:33     INFO -  Crash dump filename: c:\users\cltbld\appdata\local\temp\tmpe1ozr8.mozrunner\minidumps\5b90142c-3ce9-46aa-95b7-84f65d563bfe.dmp
01:28:33     INFO -  Operating system: Windows NT
01:28:33     INFO -                    6.1.7601 Service Pack 1
01:28:33     INFO -  CPU: x86
01:28:33     INFO -       GenuineIntel family 6 model 30 stepping 5
01:28:33     INFO -       8 CPUs
01:28:33     INFO -  Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
01:28:33     INFO -  Crash address: 0x2d0
01:28:33     INFO -  Process uptime: 13 seconds
01:28:33     INFO -  Thread 393 (crashed)
01:28:33     INFO -   0  xul.dll!mozilla::dom::workers::GetCurrentThreadJSContext() [RuntimeService.cpp:f4c8b3de527e : 1291 + 0x5]
01:28:33     INFO -      eip = 0x61bba83b   esp = 0x749df80c   ebp = 0x749df900   ebx = 0x7e7a0100
01:28:33     INFO -      esi = 0x00000000   edi = 0x19e53b20   eax = 0x00000000   ecx = 0x0000002c
01:28:33     INFO -      edx = 0x749df94c   efl = 0x00010246
01:28:33     INFO -      Found by: given as instruction pointer in context
01:28:33     INFO -   1  xul.dll!mozilla::dom::workers::WorkerRunnable::Run() [WorkerRunnable.cpp:f4c8b3de527e : 289 + 0x5]
01:28:33     INFO -      eip = 0x61bdf377   esp = 0x749df810   ebp = 0x749df900
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -   2  xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:f4c8b3de527e : 1058 + 0x6]
01:28:33     INFO -      eip = 0x60c7d307   esp = 0x749df908   ebp = 0x749df970
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -   3  xul.dll!NS_ProcessNextEvent(nsIThread *,bool) [nsThreadUtils.cpp:f4c8b3de527e : 290 + 0xd]
01:28:33     INFO -      eip = 0x60c937bc   esp = 0x749df978   ebp = 0x749df984
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -   4  xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate *) [MessagePump.cpp:f4c8b3de527e : 338 + 0x8]
01:28:33     INFO -      eip = 0x60f0c64c   esp = 0x749df98c   ebp = 0x749df9ac
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -   5  xul.dll!MessageLoop::RunHandler() [message_loop.cc:f4c8b3de527e : 225 + 0x8]
01:28:33     INFO -      eip = 0x60ef948e   esp = 0x749df9b4   ebp = 0x749df9e4
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -   6  xul.dll!MessageLoop::Run() [message_loop.cc:f4c8b3de527e : 205 + 0x7]
01:28:33     INFO -      eip = 0x60ef9281   esp = 0x749df9ec   ebp = 0x749dfa04
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -   7  xul.dll!nsThread::ThreadFunc(void *) [nsThread.cpp:f4c8b3de527e : 459 + 0x7]
01:28:33     INFO -      eip = 0x60c804a0   esp = 0x749dfa0c   ebp = 0x749dfa24
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -   8  nss3.dll!_PR_NativeRunThread [pruthr.c:f4c8b3de527e : 397 + 0x6]
01:28:33     INFO -      eip = 0x6a011520   esp = 0x749dfa2c   ebp = 0x749dfa40
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -   9  nss3.dll!pr_root [w95thred.c:f4c8b3de527e : 95 + 0xa]
01:28:33     INFO -      eip = 0x6a006b2c   esp = 0x749dfa48   ebp = 0x749dfa4c
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -  10  ucrtbase.dll!_crt_at_quick_exit + 0x104
01:28:33     INFO -      eip = 0x69d162a4   esp = 0x749dfa54   ebp = 0x749dfa88
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -  11  kernel32.dll!BaseThreadInitThunk + 0x12
01:28:33     INFO -      eip = 0x77743c45   esp = 0x749dfa90   ebp = 0x749dfa94
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -  12  ntdll.dll!__RtlUserThreadStart + 0x27
01:28:33     INFO -      eip = 0x77c837f5   esp = 0x749dfa9c   ebp = 0x749dfad4
01:28:33     INFO -      Found by: call frame info
01:28:33     INFO -  13  ntdll.dll!_RtlUserThreadStart + 0x1b
01:28:33     INFO -      eip = 0x77c837c8   esp = 0x749dfadc   ebp = 0x749dfaec
01:28:33     INFO -      Found by: call frame info
Flags: needinfo?(amarchesini)
Furthermore, Windows XP and 7 opt + pgo failed with dom/workers/test/test_transferable.html | uncaught exception - uncaught exception: out of memory at undefined:

https://treeherder.mozilla.org/logviewer.html#?job_id=34343632&repo=mozilla-inbound
Sorry, meant only Windows XP.
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1a91a95d5db4
Reintroduce a limit on number of dedicated JS web workers in Firefox, r=bkelly
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9e935bd97a1c
Remove a test for checking the number of created workers, r=me
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/328618be592c
Fix a talos regression, r=me
https://hg.mozilla.org/mozilla-central/rev/1a91a95d5db4
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Flags: needinfo?(amarchesini)
Comment on attachment 8782803 [details] [diff] [review]
backed_out_changeset_feb0e93ac30d

can you do a data review for this patch?
Attachment #8782803 - Flags: review?(francois)
Comment on attachment 8782803 [details] [diff] [review]
backed_out_changeset_feb0e93ac30d

Review of attachment 8782803 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, except for the expiry. Please use 56.

::: toolkit/components/telemetry/Histograms.json
@@ +8478,5 @@
>    },
> +  "SERVICE_WORKER_SPAWN_GETS_QUEUED": {
> +    "alert_emails": ["amarchesini@mozilla.com"],
> +    "bug_numbers": [1286895],
> +    "expires_in_version": "58",

The guidelines are here: https://wiki.mozilla.org/Firefox/Data_Collection

In this case, can you change 58 to 56? It's not a problem to renew this later if you are still using the data.
Attachment #8782803 - Flags: review?(francois) → review-
> > +    "expires_in_version": "58",

Actually I prefer to have 'never' here. In this patch I increased the number of running workers per domain up to 512.
If a domain has more than 512 workers, the following ones are queued. It seems reasonable to have expire: never because this telemetry ID is meant to be used as an alert. Feedback?
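For reference, from a page's point of view the limit should behave roughly like this (just a sketch; the worker script name is made up, and 512 is the value in this patch):

// Sketch: with a per-domain limit of 512 running workers, extra workers are
// not rejected; they are queued and only start once earlier workers terminate.
const workers = [];
for (let i = 0; i < 600; i++) {
  workers.push(new Worker("busy-worker.js")); // hypothetical worker script
}
// At most 512 of these run concurrently; the rest wait in the queue, and the
// new telemetry probe records that queueing happened.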
Flags: needinfo?(francois)
Benjamin, what would you recommend (re: comment 26)?
Flags: needinfo?(francois) → needinfo?(benjamin)
Has somebody committed to building and maintaining that alerting mechanism? If so, this seems like a reasonable permanent quality metric.
Flags: needinfo?(benjamin)
Depends on: 1468036