Closed Bug 1021274 Opened 11 years ago Closed 11 years ago

Worker does not receive message in Massive

Status:

RESOLVED INVALID

People

(Reporter: azakai, Unassigned)

Attachments

(1 file)

testcase.zip, 11 years ago, Alon Zakai (:azakai), 3.37 MB, application/x-zip

Alon Zakai (:azakai)

Reporter

Description

11 years ago

Attached file testcase.zip

The Massive benchmark often stalls near the end. I reduced this as much as I could to the attached testcase.

STR:
1. Unzip into a dir
2. Make sure window.dump is enabled in about:config (for the debugging output discussed below)
3. Run a webserver there (e.g. python -m SimpleHTTPServer)
4. Browse to that location, start the benchmark, wait for it to stop working

It should stall at sqlite-warm-preparation (happens 100% consistently for me on 2 linux machines). You can see that there is no CPU activity, and the window.dump output ends with

===
requesting benchmark sqlite-warm-preparation
posted 1402002933553
later 1402002933553
===

Dumps come from driver.js and sqlite/benchmark-worker.js. The first of those 3 lines is printed when we are about to send a message to the worker, the second after we post, and the third later on the main thread, showing that time passed and the main event loop is running ok. Yet the stall happens, and the worker never receives the message. We expect to see "worker received msg" dumped from the worker when the message arrives, but it never shows up. So the main thread sent a message, but it does not reach the worker.

When stalled, the line with sqlite-warm-preparation shows "(..running..)". To "break" the stall, open and close the web console. A number will then show up next to sqlite-warm-preparation. The benchmark will then halt with "(..running..)" on the next line, box2d-variance, which *IS* expected (that benchmark is not included in this testcase), and is the proper way for the testcase to stop.

This testcase works in chrome, and works if the web console is opened and closed after the stall, which is why I suspect a bug in the message passing code.
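For readers who do not have the testcase at hand, the three-line trace described above could be produced by instrumentation along these lines. This is only a sketch: `postWithTrace`, the `{ benchmark }` message shape, and the injected `log`, `now`, and `defer` parameters are assumptions for illustration, not code taken from driver.js.

```javascript
// Hypothetical helper (not from driver.js): logs before the post, right
// after it, and once more from a later main-thread task.
function postWithTrace(
  worker,
  benchmark,
  log,
  now = () => Date.now(),
  defer = (fn) => setTimeout(fn, 0)
) {
  log('requesting benchmark ' + benchmark); // about to send
  worker.postMessage({ benchmark });        // assumed message shape
  log('posted ' + now());                   // postMessage returned
  // If this line appears, the main event loop is still running.
  defer(() => log('later ' + now()));
}
```

In Firefox with window.dump enabled, `log` could be something like `(s) => dump(s + '\n')`. If the "later" line appears while the worker-side dump never does, the main thread is alive and the message is not being processed by the worker, which is the symptom reported here.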

Alon Zakai (:azakai)

Reporter

Comment 1

11 years ago

When the stall is broken, the dump output continues to show

===
worker received msg
requesting benchmark sqlite-warm-preparation
posted 1402003799553
later 1402003799553
worker received msg
requesting benchmark box2d-variance
posted 1402003799637
later 1402003799637
requesting benchmark box2d-variance
JavaScript error: http://localhost:8003/driver.js, line 268: jobMap['box2d-throughput'] is undefined
===

which is the expected output (the last error is because box2d is not included here).

Alon Zakai (:azakai)

Reporter

Comment 2

11 years ago

I tried to see if the worker responds after being created, and it does not. It seems to just be in a zombie-like state. I also tried to kill it and create another worker as a workaround, but the other workers hit the same problem. It's like at some point, creating new workers is not going to work.
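A responsiveness check of the kind described here might look like the sketch below. It assumes a hypothetical protocol in which the worker script answers the string 'ping' with 'pong'; nothing in the thread says the testcase does this. `probeWorker` and the injectable `timers` object are illustrative names.

```javascript
// Hypothetical probe: reports true if the worker answers 'ping' with 'pong'
// before the timeout, false otherwise. Note that it overwrites
// worker.onmessage, so it is only suitable as a diagnostic.
function probeWorker(
  worker,
  timeoutMs,
  onResult,
  timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (handle) => clearTimeout(handle),
  }
) {
  let settled = false;
  const settle = (alive) => {
    if (settled) return; // ignore late replies or late timers
    settled = true;
    timers.clear(handle);
    onResult(alive);
  };
  const handle = timers.set(() => settle(false), timeoutMs);
  worker.onmessage = (event) => {
    if (event.data === 'pong') settle(true);
  };
  worker.postMessage('ping');
}
```

A worker that was never started would report `false` after the timeout, which matches the zombie-like state described in this comment.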

Alon Zakai (:azakai)

Reporter

Comment 3

11 years ago

Hi bent, khuey, this seems to be a bug where a web page creating lots of workers eventually finds they are unusable, and I can't seem to find a workaround. This is blocking Massive, a benchmark project for asm.js I am working on. I would really appreciate it if you could take a look.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 4

11 years ago

There's a per-domain limit of 20 workers to prevent DOSing the system. Are they hitting it?

Alon Zakai (:azakai)

Reporter

Comment 5

11 years ago

Definitely over 20 workers are created; however, only one is active at a time, since we call terminate() before creating the next. Perhaps that isn't good enough though? It doesn't feel like we're hitting a hard limit: sometimes this works, and if not, opening the web console prods it into working, as mentioned above. Or is it just that sometimes gc happens to occur fast enough for the limit not to be hit (and opening the console triggers a gc or something like that)?

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 6

11 years ago

Calling terminate is enough to free up the thread. That's much better than relying on the GC.
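As an illustration of that advice, a driver that runs one worker per job could terminate each worker as soon as it reports back or errors, before the next one is created. The sketch below is not the Massive driver: `runJobsSequentially`, the `createWorker` factory, and the `{ benchmark }` message shape are assumptions.

```javascript
// Hypothetical sequential runner: at most one live worker at a time, and
// terminate() is called on every exit path before the next worker exists.
function runJobsSequentially(jobNames, createWorker, onResult) {
  let index = 0;
  function next() {
    if (index >= jobNames.length) return;
    const name = jobNames[index++];
    const worker = createWorker(name);
    let finished = false;
    const finish = (result) => {
      if (finished) return; // guard against a message followed by an error
      finished = true;
      worker.terminate();   // free the thread first
      onResult(name, result);
      next();               // only now create the next worker
    };
    worker.onmessage = (event) => finish(event.data);
    worker.onerror = (event) => finish({ error: event.message });
    worker.postMessage({ benchmark: name }); // assumed message shape
  }
  next();
}
```

In a page this might be called with a factory such as `() => new Worker('sqlite/benchmark-worker.js')`. Routing both the success and the error path through `finish` is what keeps a failed job from leaving a live worker behind.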

Alon Zakai (:azakai)

Reporter

Comment 7

11 years ago

Ok, thanks, with that information I took another look and it seems I had a bug where terminate() was not always immediately called. With that fixed, it looks like things work ok. I wonder if we could throw an error on the 21st live worker creation? Or emit a warning to the console "more than 20 workers active"? Currently it fails silently (the worker never starts to run code) which is confusing.
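Until the browser reports this itself, a page can approximate the suggested warning on its own side by counting the workers it has created and not yet terminated. The following is a page-level sketch only, not browser behavior: `makeTrackedWorkerFactory` is a hypothetical name, and the default of 20 is simply the per-domain figure quoted in comment 4.

```javascript
// Figure quoted in comment 4; treated here as an assumption, not a constant
// read from the browser.
const ASSUMED_LIMIT = 20;

// Wraps a worker factory so that the page warns when it is about to exceed
// the assumed limit of live (created but not terminated) workers.
function makeTrackedWorkerFactory(createWorker, warn, limit = ASSUMED_LIMIT) {
  let live = 0;
  return function createTracked(...args) {
    if (live >= limit) {
      warn('more than ' + limit + ' workers active');
    }
    const worker = createWorker(...args);
    live++;
    const originalTerminate = worker.terminate.bind(worker);
    let terminated = false;
    worker.terminate = () => {
      if (!terminated) {
        terminated = true; // count each worker's termination only once
        live--;
      }
      return originalTerminate();
    };
    return worker;
  };
}
```

For example, `makeTrackedWorkerFactory((url) => new Worker(url), (m) => console.warn(m))`. The count only sees explicit terminate() calls, so workers that close themselves or are collected by the GC would still be counted as live.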

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → INVALID

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 8

11 years ago

Yeah, we should definitely warn the console, at least.

Alon Zakai (:azakai)

Reporter

Comment 9

11 years ago

Ok, filed bug 1037725.


Categories

(Core :: DOM: Workers, defect)
