Closed Bug 1021274 Opened 10 years ago Closed 10 years ago

Worker does not receive message in Massive

Categories

(Core :: DOM: Workers, defect)

x86_64
Linux
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: azakai, Unassigned)

Details

Attachments

(1 file)

Attached file testcase.zip
The Massive benchmark often stalls near the end. I reduced this as much as I could to the attached testcase. STR:

1. unzip into a dir
2. Make sure window.dump is enabled in about:config (for the debugging output discussed below)
3. Run a webserver there (e.g. python -m SimpleHTTPServer)
4. Browse to that location, start the benchmark, wait for it to stop working

It should stall at sqlite-warm-preparation (happens 100% consistently for me on 2 Linux machines). You can see that there is no CPU activity, and the window.dump output ends with

===

requesting benchmark sqlite-warm-preparation
posted 1402002933553
later 1402002933553
===

Dumps come from driver.js and sqlite/benchmark-worker.js. Of those 3 lines, the first is logged when we are about to send a message to the worker, the second right after we post it, and the third later on the main thread, showing that time has passed and the main event loop is running fine. Yet the stall happens: we expect to see "worker received msg" dumped from the worker when the message arrives, and it never shows up. So the main thread sent a message, but it did not reach the worker.
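For readers without the testcase at hand, the logging described above could look roughly like the sketch below. This is a hypothetical reconstruction, not the actual driver.js: the function name postWithLogging, the message shape { benchmark: ... }, and the injected log and schedule callbacks are all assumptions made so the snippet is self-contained.

```javascript
// Hypothetical reconstruction of the main-thread logging; not the real driver.js.
// `log` and `schedule` are injected so the sketch also runs outside a browser.
function postWithLogging(worker, benchmarkName, log, schedule) {
  log("requesting benchmark " + benchmarkName); // about to send
  const stamp = Date.now();
  worker.postMessage({ benchmark: benchmarkName }); // assumed message shape
  log("posted " + stamp); // right after posting
  // Logged from a later main-thread task: proves the event loop keeps running.
  schedule(function () {
    log("later " + stamp);
  });
}

// Worker side, as it might appear in the worker script (assumption):
//   self.onmessage = function (event) { dump("worker received msg\n"); /* run job */ };
```

In a browser with window.dump enabled, this would be called as postWithLogging(worker, "sqlite-warm-preparation", function (s) { dump(s + "\n"); }, function (fn) { setTimeout(fn, 0); }).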

When stalled, the line with sqlite-warm-preparation shows "(..running..)".

To "break" the stall, opening and closing the web console will work. Then a number will show up next to sqlite-warm-preparation. The benchmark will then halt with "(..running..)" on the next line, box2d-variance, which *IS* expected (that benchmark is not included in this testcase), and is the proper way for the testcase to stop.

This testcase works in Chrome, and it also works in Firefox if the web console is opened and closed after the stall, which is why I suspect a bug in the message passing code.
When the stall is broken, the dump output continues to show

===


worker received msg


requesting benchmark sqlite-warm-preparation
posted 1402003799553
later 1402003799553
worker received msg


requesting benchmark box2d-variance
posted 1402003799637
later 1402003799637


requesting benchmark box2d-variance
JavaScript error: http://localhost:8003/driver.js, line 268: jobMap['box2d-throughput'] is undefined
===

which is the expected output (the last error is because box2d is not included here).
I tried to see if the worker responds after being created, and it does not. It seems to just be in a zombie-like state.

I also tried to kill it and create another worker as a workaround, but the other workers hit the same problem. It's like at some point, creating new workers is not going to work.
Hi bent, khuey, this seems to be a bug where a web page creating lots of workers eventually finds they are unusable, and I can't seem to find a workaround. This is blocking Massive, a benchmark project for asm.js I am working on. I would really appreciate it if you could take a look.
There's a per-domain limit of 20 workers to prevent DOSing the system.  Are they hitting it?
Definitely more than 20 workers are created; however, only one is active at a time, since we call terminate() before creating the next. Perhaps that isn't good enough though?

It doesn't feel like we're hitting a hard limit: Sometimes this works, and if not then opening the web console prods it into working, as mentioned above. Or is it just that sometimes gc happens to occur fast enough for the limit not to be hit (and opening the console triggers a gc or something like that)?
Calling terminate is enough to free up the thread. That's much better than relying on the GC.
Ok, thanks, with that information I took another look and it seems I had a bug where terminate() was not always immediately called. With that fixed, it looks like things work ok.
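A minimal sketch of the pattern that avoids the problem, assuming each job runs in its own worker and reports back with a single message. The names runJobsSequentially, createWorker, and onDone are illustrative and not taken from the Massive source; createWorker stands in for something like function () { return new Worker("benchmark-worker.js"); }.

```javascript
// Sketch: run jobs one at a time and terminate each worker as soon as its
// result arrives, so at most one worker is ever alive.
function runJobsSequentially(jobs, createWorker, onDone) {
  const results = [];
  let index = 0;

  function next() {
    if (index >= jobs.length) {
      onDone(results);
      return;
    }
    const job = jobs[index++];
    const worker = createWorker(job);
    worker.onmessage = function (event) {
      results.push(event.data);
      worker.terminate(); // free the thread now instead of waiting for GC
      next();             // only then create the next worker
    };
    worker.postMessage(job);
  }

  next();
}
```

The point of the design is that terminate() sits on the same code path that creates the next worker, so there is no path on which a new worker is created while the old one is still alive.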

I wonder if we could throw an error on the 21st live worker creation? Or emit a warning to the console "more than 20 workers active"? Currently it fails silently (the worker never starts to run code), which is confusing.
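Until the browser warns on its own, a page can approximate this with a small wrapper around worker creation. The sketch below is a page-side guard only and says nothing about how the browser implements the limit. The name createTrackedWorkerFactory and its parameters are illustrative, the limit of 20 is the figure quoted in this bug, and workers that shut themselves down with close() are not accounted for.

```javascript
// Sketch: count live workers created through this factory and warn when a
// new one is created while `limit` are already alive.
function createTrackedWorkerFactory(createWorker, limit, warn) {
  let live = 0;
  return function () {
    if (live >= limit) {
      warn("creating a worker while " + live + " are still live (limit " + limit + ")");
    }
    const worker = createWorker.apply(null, arguments);
    live++;
    const originalTerminate = worker.terminate.bind(worker);
    let terminated = false;
    // Wrap terminate() so the live count drops exactly once per worker.
    worker.terminate = function () {
      if (!terminated) {
        terminated = true;
        live--;
      }
      return originalTerminate();
    };
    return worker;
  };
}
```

In a browser this might be used as var makeWorker = createTrackedWorkerFactory(function (url) { return new Worker(url); }, 20, function (msg) { console.warn(msg); });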
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
Yeah, we should definitely warn the console, at least.
Ok, filed bug 1037725.