Closed Bug 1227494 Opened 9 years ago Closed 2 years ago

Atomics.futexWait glitching

Categories

(Core :: JavaScript Engine, defect)

x86_64
macOS
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: grantgalitz, Unassigned)

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20151123030237

Steps to reproduce:

Ran various GameBoy Advance ROMs within http://taisel.github.io/IodineGBA/

Audio was enabled, and BIOS was loaded in as well.


Actual results:

The emulator would flip between paused and playing whenever I clicked on UI elements or typed in the web console. Calling futexWake from the console unstuck the stuck web workers. Restarting Firefox Nightly fixed the issue (the problem persisted across page refreshes, but not across a browser restart).

This is extremely hard to reproduce, and it looks like a glitched state that Firefox gets into with futexes.


Expected results:

Futex wakeups occurring when they should.
OS: Unspecified → Mac OS X
Hardware: Unspecified → x86_64
When did you first notice this happening?
Could you try to reproduce these glitches using Firefox with a new profile?
Component: Untriaged → Graphics
Product: Firefox → Core
Component: Graphics → JavaScript Engine
Version: 45 Branch → Trunk
This is probably a bug in your locking code, as follows.  We have two threads, T0 and T1, that are contending for the lock.  Locking and unlocking are implemented as follows in your code:

lock()
    v = exchange(flag, 1)
    if (v == 0) return
    wait(flag, 1)

unlock()
    store(flag, 0)
    wake(flag)

Suppose T0 tries to acquire the lock while T1 tries to acquire it, release it, and then acquire it again.  The following is a possible interleaving of the two threads (I'm leaving out the test, for clarity):

T0                                          T1
                                            v = exchange(flag, 1) // read 0, store 1
v = exchange(flag, 1) // read 1, store 1
                                            store(flag, 0)
                                            wake(flag)            // no effect
wait(flag, 1) // flag=0, won't wait
                                            v = exchange(flag, 1) // read 0, store 1

Now both T0 and T1 think they hold the lock, which is wrong.
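
For illustration, the interleaving above can be replayed step by step on a single thread. This is only a sketch, not the emulator's code: it assumes the later standardized names Atomics.wait and Atomics.notify in place of futexWait and futexWake, the variable names are made up, and it is meant to be run in Node.js or a worker, since browsers do not allow Atomics.wait on the main thread.

```javascript
// Single-threaded replay of the T0/T1 interleaving (illustrative only).
// flag[0] is the lock word: 0 = free, 1 = held.
const flag = new Int32Array(new SharedArrayBuffer(4));

// T1: lock() succeeds because the exchange reads 0.
const t1First = Atomics.exchange(flag, 0, 1);    // reads 0, stores 1

// T0: lock() reads 1, so it is about to wait.
const t0Read = Atomics.exchange(flag, 0, 1);     // reads 1, stores 1

// T1: unlock() runs before T0 has started waiting.
Atomics.store(flag, 0, 0);
const woken = Atomics.notify(flag, 0, 1);        // nobody is waiting yet

// T0: the wait returns at once because flag[0] is no longer 1.
// The timeout of 0 is just a safety net for this replay.
const waitResult = Atomics.wait(flag, 0, 1, 0);
// Without a retry loop, T0 now returns from lock() and assumes it holds the lock.

// T1: lock() again, and the exchange reads 0 once more.
const t1Second = Atomics.exchange(flag, 0, 1);   // reads 0, stores 1

console.log({ t1First, t0Read, woken, waitResult, t1Second });
```

T1's second exchange reads 0 even though T0 has already returned from its lock() call, so both threads proceed into the critical section.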

The bug is that returning from a call to futexWait is not in itself enough to guarantee that the lock is held, because we don't know the state of the lock variable at that point.  The thread trying to acquire the lock must always repeat the acquisition attempt after being woken, i.e., there needs to be a loop here:

lock()
  again:
    v = exchange(flag, 1)
    if (v == 0) return
    wait(flag, 1)
    goto again
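
In JavaScript the same loop might look roughly like the following. This is a minimal sketch under a few assumptions: it uses Atomics.wait and Atomics.notify, the later standardized names for futexWait and futexWake, the function and constant names are hypothetical, and lock() should only be called where blocking is allowed (a worker, or Node.js).

```javascript
// Sketch of a lock whose acquire path retries after every wakeup.
// Names (UNLOCKED, LOCKED, lock, unlock) are illustrative, not from IodineGBA.
const UNLOCKED = 0;
const LOCKED = 1;

function lock(flag, index) {
  // Only an exchange that reads UNLOCKED means we acquired the lock.
  while (Atomics.exchange(flag, index, LOCKED) !== UNLOCKED) {
    // Sleeps only while flag[index] is still LOCKED; if the value has
    // already changed, this returns immediately and we retry the exchange.
    Atomics.wait(flag, index, LOCKED);
  }
}

function unlock(flag, index) {
  Atomics.store(flag, index, UNLOCKED);
  // Wake at most one waiter; it will re-run the exchange in lock().
  Atomics.notify(flag, index, 1);
}

// Uncontended usage on one thread: the loop body never runs here.
const flag = new Int32Array(new SharedArrayBuffer(4));
lock(flag, 0);
// ... critical section ...
unlock(flag, 0);
```

The point of the loop is that a return from the wait is treated only as a hint to try again, never as proof of ownership.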

(Your current code may not actually be buggy, I'm not sure, if the threads run in lock-step, with first one and then the other entering the critical section, back and forth; I've not looked deeply enough at your code to see whether that's the case here.)
I should say, though, that it's worrisome if the problem is able to persist across page reloads; that could indicate that there's a separate issue having to do with how reloads are (or are not) able to break waiting futexes.
By the way, are you running with e10s (multiprocess) enabled?  (Go to Preferences, there's a checkbox near the top of the General tab.)
Flags: needinfo?(grantgalitz)
I changed the if to a while loop on the exchanges, in case of threading problems. I have not been able to reproduce this bug since, though.

Yes, e10s is enabled.

The code in question hasn't experienced threading bugs since.
Flags: needinfo?(grantgalitz)
Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID