Closed Bug 1027612 Opened 10 years ago Closed 8 years ago

OdinMonkey: browser stops when loading asm.js code, maybe cache related.

Categories

(Core :: JavaScript Engine: JIT, defect)

ARM
Android
defect
Not set
normal

RESOLVED INCOMPLETE
Tracking Status
firefox30 --- unaffected
firefox31 --- unaffected
firefox32 --- affected
firefox33 --- affected
firefox-esr24 --- unaffected

People

(Reporter: dougc, Unassigned)

Noticed problems reloading asm.js code using Firefox for Android, m-c unpatched local build, Nexus 4 Android 4.4.3.

How to repeat:

1. Load an asm.js demo with a clean profile. Note that it compiles but reports that it has not been stored in the cache, perhaps because it is a large unminified source file.

2. Close the tab, and reopen the same demo. There is a little activity in the browser's progress bar, but loading then stops and CPU usage drops. There is no message in the log reporting that the asm.js code has been loaded.

3. Close the tab, and clear the cache in the privacy settings.

4. Reopen the same demo. It compiles again and loads.

I see similar problems on b2g on the Flame: loading of asm.js pages just stops. This might be unrelated.
Are these demos with large heaps?  If so, this sounds like the usual symptoms of an OOM: at step 2, the heap from step 1 is likely still in memory, so the second heap allocation fails.  By step 4, GC has cleared out the heap.  You could confirm this by looking at the error console log during step 2 (looking for "out of memory").

If OOM is what is happening here, it'd be good to understand why the OOM-mitigation path added in bug 936236 isn't working (it should force a full sync GC that clears out any dead tabs).  Could be bug 865959.
Thank you for the clues. I'll probably need to instrument it to make progress, but not now.

The heaps are large, but not huge. The device has 2G RAM. I have seen some OOM messages, but not all the time when it stops. Not all of the OOMs seem right: the device still has free memory, but maybe it hits some fragmentation problems. The problem persists if the browser is closed and reopened, which would rule out a GC issue, but it is not 100% reproducible.

If the browser is opened fresh, and the cache cleared, then the demo usually loads.

If the browser is opened fresh, and the cache not cleared, then loading usually stops.
(In reply to Douglas Crosher [:dougc] from comment #2)
> The heaps are large, but not huge. The device has 2G RAM.

It's pretty easy for even a small bit of fragmentation to cause allocations to fail even when there is 2x as much free.
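
To illustrate the fragmentation point, here is a minimal sketch. It is a toy model, not Gecko or kernel code: the free address space is represented as a list of contiguous free region sizes, and `FreeSpace`, `summarize` and `canAllocateContiguous` are hypothetical names made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model: each entry is the size (in MiB) of one free,
// contiguous region of the address space. Illustrative names only.
struct FreeSpace {
    std::size_t total;    // sum of all free regions
    std::size_t largest;  // largest single contiguous region
};

// Walk the free regions once, tracking both the total and the largest run.
FreeSpace summarize(const std::vector<std::size_t>& freeRegions) {
    FreeSpace s{0, 0};
    for (std::size_t r : freeRegions) {
        s.total += r;
        s.largest = std::max(s.largest, r);
    }
    return s;
}

// A contiguous allocation (such as a large asm.js heap) succeeds only if a
// single region can hold it, no matter how much memory is free in total.
bool canAllocateContiguous(const std::vector<std::size_t>& freeRegions,
                           std::size_t request) {
    return summarize(freeRegions).largest >= request;
}
```

With free regions of 200, 180, 150 and 120 MiB, 650 MiB is free in total, yet a 256 MiB contiguous request fails in this model because no single region is large enough.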

> I have seen some OOM messages, but not all the time when it stops.

It'd be good to double check this... perhaps the OOM messages are getting lost or happening in some part of Gecko that doesn't report as well as the JS engine?

Also, does this only happen with unminified (and, I assume, quite large) asm.js modules?  It could be that compilation itself is using up all the memory and caching is adding a bit of memory pressure that pushes it over the edge.  One way to test this would be, at the end of (I assume) CheckFunctionsParallel, sum the size of all the allocated memory in all the LifoAllocs in 'tasks' (there is no function to do this in LifoAlloc, but you can add one by copying 'used()' and taking out the "if (chunk == latest) break;").  This problem is often exacerbated by parallel compilation and the fact that Emscripten puts all the biggest functions first.  A fix we've discussed is to throttle the number of outstanding compilation jobs based on total LifoAlloc usage and the size of physical memory.
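
A rough sketch of the measurement and the throttling idea, under stated assumptions: `TaskArena` is a hypothetical stand-in, not SpiderMonkey's `LifoAlloc`, and `usedUpToLatest`, `totalAllocated`, `mayStartAnotherJob` and the `budgetFraction` parameter are invented for illustration. The only thing taken from the description above is that the existing count stops at the latest chunk and the proposed one does not.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-task allocator; NOT the real LifoAlloc.
struct TaskArena {
    std::vector<std::size_t> chunkBytes;  // bytes in each chunk, in order
    std::size_t latest;                   // index of the chunk being filled
};

// Models a count that stops at the latest chunk ("if (chunk == latest) break;").
std::size_t usedUpToLatest(const TaskArena& a) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < a.chunkBytes.size(); ++i) {
        sum += a.chunkBytes[i];
        if (i == a.latest) break;
    }
    return sum;
}

// Same loop with the early exit removed, so every chunk is counted.
std::size_t totalAllocated(const TaskArena& a) {
    std::size_t sum = 0;
    for (std::size_t bytes : a.chunkBytes) sum += bytes;
    return sum;
}

// Throttle: allow another compilation job only while the memory held by all
// outstanding tasks stays under a chosen fraction of physical memory.
bool mayStartAnotherJob(const std::vector<TaskArena>& tasks,
                        std::size_t physicalBytes, double budgetFraction) {
    std::size_t inUse = 0;
    for (const TaskArena& t : tasks) inUse += totalAllocated(t);
    return static_cast<double>(inUse) <
           budgetFraction * static_cast<double>(physicalBytes);
}
```

The fraction would need tuning per device. The sketch only shows the shape of the check, and a real implementation would also have to decide what to do with jobs that are held back.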
Oh, a quick partial test of the theory in comment 3 is to try setting javascript.options.ion.offthread_compilation = false.
(In reply to Luke Wagner [:luke] from comment #4)
> Oh, a quick partial test of the theory in comment 3 is to try setting
> javascript.options.ion.offthread_compilation = false.

Thanks again. A quick test of this did not resolve the problem. It still stops after freshly starting the browser and loading the asm.js page unless the cache is cleared. I'll explore some of the other suggestions, and check some other releases to try and get some more clues.
Oh, I forgot, that config option just changed names.  If you are testing release/beta (maybe aurora), you'll need the old one which is javascript.options.ion.parallel_compilation.
Bisected this issue to:

changeset:   183410:d61ae091de9c
user:        Honza Bambas <honzab.moz@firemni.cz>
date:        Thu May 15 16:31:26 2014 -0700
summary:     Bug 913806 - turn HTTP cache v2 on by default, r=jduell

Will check if reverting this helps m-c tip, and on b2g.
So, what is the result?

To disable the new cache, you can just switch "browser.cache.use_new_backend_temp" pref to "false".  Maybe also make sure that "browser.cache.use_new_backend" is at "0".
Flags: needinfo?(dtc-moz)
(In reply to Honza Bambas (:mayhemer) from comment #8)
> So, what is the result?
> 
> To disable the new cache, you can just switch
> "browser.cache.use_new_backend_temp" pref to "false".  Maybe also make sure
> that "browser.cache.use_new_backend" is at "0".

Sorry, I have not had time to explore this very far. I tried the 'parallel compilation' suggestion and it did not make much difference. Disabling the new cache helped. I was seeing OOM crashes, so I suspected running out of virtual memory or fragmentation; I compiled a custom kernel with an address map that gives more room for mmap, and this helped.

Have not seen this using the ARM simulator running on Linux.

Firefox for Android and b2g on the Flame have so many issues that it's a challenge to isolate. I just disabled the new cache and any asm.js caching to work on other issues.

I am still seeing progress stop on Firefox for Android, with no CPU usage and no OOM reports, even after a cold start to minimise fragmentation risk. There is some problem here.

It might not be the new cache, but I just do not know yet. Shall follow up.
Lack of progress here and probably WFM in the meantime.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(dtc-moz)
Resolution: --- → INCOMPLETE