Bug 1021274
Opened 10 years ago
Closed 10 years ago
Worker does not receive message in Massive
Categories
(Core :: DOM: Workers, defect)
RESOLVED
INVALID
People
(Reporter: azakai, Unassigned)
Details
Attachments
(1 file)
3.37 MB, application/x-zip
The Massive benchmark often stalls near the end. I reduced this as much as I could to the attached testcase.

STR:
1. Unzip into a dir
2. Make sure window.dump is enabled in about:config (for the debugging output discussed below)
3. Run a webserver there (e.g. python -m SimpleHTTPServer)
4. Browse to that location, start the benchmark, wait for it to stop working

It should stall at sqlite-warm-preparation (happens 100% consistently for me on 2 linux machines). You can see that there is no CPU activity, and the window.dump output ends with

===
requesting benchmark sqlite-warm-preparation
posted 1402002933553
later 1402002933553
===

Dumps come from driver.js and sqlite/benchmark-worker.js. The first of those 3 lines is logged when we are about to send a message to the worker, the second after we post, and the third later on the main thread, showing that time passed and the main event loop is running ok. Yet the stall happens, and the worker never receives the message. We expect to see "worker received msg" dumped from the worker when the message arrives, but it does not show up, which shows that the main thread sent a message that did not reach the worker.

When stalled, the line with sqlite-warm-preparation shows "(..running..)". Opening and closing the web console "breaks" the stall. Then a number will show up next to sqlite-warm-preparation. The benchmark will then halt with "(..running..)" on the next line, box2d-variance, which *IS* expected (that benchmark is not included in this testcase), and is the proper way for the testcase to stop.

This testcase works in Chrome, and works if the web console is opened and closed after the stall, which is why I suspect a bug in the message passing code.
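A stall like this can be surfaced from the page itself instead of by reading dump output. The following is only a sketch under assumptions: it presumes the worker sends some message back after receiving a request (the testcase's worker only dumps "worker received msg", so the reply is hypothetical), and `postWithStallCheck`, `onStall`, and the `timers` parameter are made-up names, not part of driver.js or any browser API.

```javascript
// Hypothetical helper: post a message and report a stall if the worker
// has not sent anything back within timeoutMs.
// `timers` is injectable so the logic can be exercised without real delays.
function postWithStallCheck(
  worker,
  message,
  timeoutMs,
  onStall,
  timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (id) => clearTimeout(id),
  }
) {
  let acknowledged = false;

  // Arm the watchdog before posting, so a fast reply can always cancel it.
  const timer = timers.set(() => {
    if (!acknowledged) onStall(message);
  }, timeoutMs);

  // Wrap any existing handler; the first message from the worker counts
  // as proof that it is alive (an assumption about the worker's protocol).
  const previous = worker.onmessage;
  worker.onmessage = (event) => {
    if (!acknowledged) {
      acknowledged = true;
      timers.clear(timer);
    }
    if (previous) previous.call(worker, event);
  };

  worker.postMessage(message);
}
```

In a page this might be called as `postWithStallCheck(worker, msg, 5000, (m) => console.warn('no reply for', m))`, where the 5000 ms timeout is an arbitrary illustrative value.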
Comment 1 (Reporter) • 10 years ago
When the stall is broken, the dump output continues to show

===
worker received msg
requesting benchmark sqlite-warm-preparation
posted 1402003799553
later 1402003799553
worker received msg
requesting benchmark box2d-variance
posted 1402003799637
later 1402003799637
requesting benchmark box2d-variance
JavaScript error: http://localhost:8003/driver.js, line 268: jobMap['box2d-throughput'] is undefined
===

which is the expected output (the last error is because box2d is not included here).
Comment 2 (Reporter) • 10 years ago
I tried to see if the worker responds after being created, and it does not. It seems to just be in a zombie-like state. I also tried to kill it and create another worker as a workaround, but the other workers hit the same problem. It's like at some point, creating new workers is not going to work.
Comment 3 (Reporter) • 10 years ago
Hi bent, khuey, this seems to be a bug where a web page creating lots of workers eventually finds they are unusable, and I can't seem to find a workaround. This is blocking Massive, a benchmark project for asm.js I am working on. I would really appreciate it if you could take a look.
There's a per-domain limit of 20 workers to prevent DOSing the system. Are they hitting it?
Comment 5 (Reporter) • 10 years ago
Definitely over 20 workers are created; however, only one is active at a time, since we call terminate() before creating the next. Perhaps that isn't good enough though? It doesn't feel like we're hitting a hard limit: sometimes this works, and if not then opening the web console prods it into working, as mentioned above. Or is it just that sometimes gc happens to occur fast enough for the limit not to be hit (and opening the console triggers a gc or something like that)?
Calling terminate() is enough to free up the thread. That's much better than relying on the GC.
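The "one worker at a time, terminate before creating the next" pattern discussed here might look roughly like the sketch below. It is not the code from driver.js: `runBenchmarksInSequence`, the `{ benchmark: name }` message shape, and the injected `createWorker` factory are all assumptions made for illustration. In a page the factory could be something like `() => new Worker('benchmark-worker.js')`, with a hypothetical script path.

```javascript
// Hypothetical driver: run benchmarks one after another, each in a fresh
// worker, and terminate every worker as soon as its result arrives so the
// thread is freed before the next worker is created.
function runBenchmarksInSequence(names, createWorker, onDone) {
  const results = [];
  let index = 0;

  function next() {
    if (index >= names.length) {
      onDone(results);
      return;
    }
    const name = names[index++];
    const worker = createWorker();

    worker.onmessage = (event) => {
      results.push({ name, result: event.data });
      // Terminate explicitly instead of waiting for garbage collection.
      worker.terminate();
      next();
    };

    // Assumed message shape; the real benchmark protocol may differ.
    worker.postMessage({ benchmark: name });
  }

  next();
}
```

The point of the design is that terminate() sits on the same code path as result handling, so no path exists where a finished worker stays alive while the next one is created.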
Comment 7 (Reporter) • 10 years ago
Ok, thanks, with that information I took another look and it seems I had a bug where terminate() was not always immediately called. With that fixed, it looks like things work ok. I wonder if we could throw an error on the 21st live worker creation? Or emit a warning to the console "more than 20 workers active"? Currently it fails silently (the worker never starts to run code), which is confusing.
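Until the browser reports this itself, a page could keep its own count of live workers and warn when it is about to exceed the limit. This is a minimal sketch, not a browser API: `WorkerTracker` and its methods are hypothetical names, the limit of 20 is taken from the comment above rather than from any specification, and the tracker only knows about workers created through it.

```javascript
// Limit quoted in this bug; treated here as an assumption, not a spec value.
const ASSUMED_WORKER_LIMIT = 20;

// Hypothetical page-side bookkeeping for live workers.
class WorkerTracker {
  constructor(
    createWorker,
    limit = ASSUMED_WORKER_LIMIT,
    warn = (msg) => console.warn(msg)
  ) {
    this.createWorker = createWorker;
    this.limit = limit;
    this.warn = warn;
    this.live = new Set();
  }

  // Create a worker, warning first if it would exceed the assumed limit.
  spawn() {
    if (this.live.size >= this.limit) {
      this.warn(
        `creating worker ${this.live.size + 1} while the assumed limit is ` +
          `${this.limit}; it may never start running`
      );
    }
    const worker = this.createWorker();
    this.live.add(worker);
    return worker;
  }

  // Terminate a tracked worker exactly once and stop counting it.
  terminate(worker) {
    if (this.live.delete(worker)) {
      worker.terminate();
    }
  }

  get liveCount() {
    return this.live.size;
  }
}
```

A forgotten terminate() call, like the one found in this bug, would then show up as a console warning on the 21st spawn rather than as a silent stall.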
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
Yeah, we should definitely warn the console, at least.
Comment 9 (Reporter) • 10 years ago
Ok, filed bug 1037725.