Closed
Bug 1334239
Opened 7 years ago
Closed 7 years ago
WebAssembly application crashes at launch, corrupted call stack and no crash handler giving a report
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla54
People
(Reporter: jujjyl, Assigned: luke)
Details
Attachments
(4 files)
5.91 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
3.50 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
2.48 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
11.49 KB,
patch
|
luke
:
review+
lizzard
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
STR: 1. Download https://autodesk.app.box.com/s/vmil3m4d4jq1dikpm1g6uz8hgwx00ce4 2. Unzip, enable javascript.options.wasm;true pref and run stingray_webgl_release.html e.g. in python -m SimpleHTTPServer (file:// might work as well) Observed: When the page loads, the content process will crash. Attempted to attach VS2015 debugger to the application (had remote.autostart/e10s disabled), and the crash does also occur and VS2015 is able to catch the crash, but the callstack is completely bogus and VS2015 says that the local stack resides outside the application module, and there's very random looking code in those locations.
Comment 1•7 years ago
|
||
Doesn't crash in the shell, but causes a RuntimeError: index out of bounds. Can repro the crash on a browser under linux x64, investigating.
Assignee | ||
Comment 2•7 years ago
|
||
Ah, I think I see the bug: we have a signature mismatch error happening when calling an imported JS function from a table (quite the corner case) and I've forgotten to call masm.wasmEmitTrapOutOfLineCode() from the GenerateImportFunction() stub. Jukka/Alon: there does seem to be a dynamic wasm error here: we're calling the 81st imported function, _glClearColor, which is imported with signature (f64,f64,f64,f64)->void but called with signature (f32,f32,f32,f32)->(). (The caller has function index 33004.) Btw, I see we import _emscripten_glClearColor *twice* with the signature (f32,f32,f32,f32)->void, but one of these imports is not used at all and the other is only called directly and not present in the (elem) section.
Flags: needinfo?(jujjyl)
Flags: needinfo?(azakai)
Reporter | ||
Comment 3•7 years ago
|
||
Hmm, what is the difference between (f32,f32,f32,f32)->() versus (f32,f32,f32,f32)->void ? Is that relevant here? In C side, glClearColor() should be GL_APICALL void GL_APIENTRY glClearColor (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha); where GL_APICALL and GL_APIENTRY expand to empty strings and GLfloat is float in C code. We expect to call the function _glClearColor(), and the function _emscripten_glClearColor() should not be called (but due to an inefficiency/bug, it is known to be present in the source code, https://github.com/kripken/emscripten/issues/4908). The function _glClearColor() is an extern import from JS. I'm not sure how it picks up the import signature in current Binaryen? We do have the glClearColor__sig: 'viiii' fields in GL code, though I think that might only apply to the glGetProcAddress() machinery.
Flags: needinfo?(jujjyl)
Reporter | ||
Comment 4•7 years ago
|
||
Oh, also, I'm surprised what could be special about glClearColor(), because a number of GL API functions take in floats/doubles, and I'd expect all of those would be wrong if this one is wrong. Is that the case?
Reporter | ||
Comment 5•7 years ago
|
||
If I do a simple test on C side with the following: > #include <emscripten/html5.h> > > extern "C" { > // void glClearColor(double red, double green, double blue, double alpha); (*) > void glClearColor(float red, float green, float blue, float alpha); > void glClear(unsigned int mask); > } > > int main() > { > EmscriptenWebGLContextAttributes attrs; > emscripten_webgl_init_context_attributes(&attrs); > attrs.majorVersion = 1; > attrs.minorVersion = 0; > EMSCRIPTEN_WEBGL_CONTEXT_HANDLE context = emscripten_webgl_create_context(0, &attrs); > emscripten_webgl_make_context_current(context); > glClearColor(1.f, 1.f, 0.f, 1.f); > glClear(0x00004000/*GL_COLOR_BUFFER_BIT*/); > return 0; > } built with > em++ a.cpp -o a.html -O3 -g2 -s BINARYEN=1 that does work properly. What gets generated as import and call are > (import "env" "_glClearColor" (func $import$19 (param f64 f64 f64 f64))) > > (call $import$19 > (f64.const 1) > (f64.const 1) > (f64.const 0) > (f64.const 1) > ) > which occurs both in float and double case (toggling the line marked with (*) above), so it looks like at least the signature that the function is declared in C side does not affect this.
Comment 6•7 years ago
|
||
Hmm, when I use binaryen to wasm-dis that wasm, I see * we indeed import it once as f32 params, once as f64 params * the first (f32s) is never called. recent binaryen should dce it out, so if this was built with a recent version, we should investigate that * the second (f64s) is called from $legalfunc$_emscripten_glClearColor, which receives f32 inputs, promotes them to f64, then calls that second import (it "legalizes" the f32 type for the call out to JS, where we accept only the basic asm.js ffi types) * $legalfunc$_emscripten_glClearColor in turn is placed in the function table, and can be called from there. So C code would have a function pointer to a function with f32s - as that is how it is declared in C - which is $legalfunc$_emscripten_glClearColor, which then converts and calls out to the f64 import. So C code calls with f32s, the stub converts, and we call out to JS with f64s. That all seems to be ok - I don't see a method imported one way and called another. Perhaps there is a difference between the disassembly of wasm-dis and spidermonkey's? How can I run spidermonkey's to compare?
Flags: needinfo?(azakai)
Comment 7•7 years ago
|
||
Jukka - in your last example, I think a difference from the original testcase is that glClearColor is only called by a function pointer there (maybe from glGetProcAddress or such?).
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to Jukka Jylänki from comment #3) > (f32,f32,f32,f32)->() versus (f32,f32,f32,f32)->void ? Is that relevant here? Sorry, just I should've typed "void" in both cases. (In reply to Alon Zakai (:azakai) from comment #6) From what I'm seeing, _glClearColor is the 81st function import and it's imported with signature (f64,f64,f64,f64) and it's going into the (elem) section as element 600.
Assignee | ||
Comment 9•7 years ago
|
||
Although this isn't the bug, I noticed that the sig-id comparison should be pointer-sized.
Comment 10•7 years ago
|
||
Oh, right, there is both glClearColor and emscripten_glClearColor. For glClearColor, it is imported as f64 (because we detect it that way based on usage, due to asm.js ffis), and as such we place it directly in the table, with no stub. Then a function pointer call to it with f32s is wrong, so yes, there is a problem here. Not obvious to me how to fix it yet.
Assignee | ||
Comment 11•7 years ago
|
||
Stepping through the code, I also noticed that, when we updated to 0xD, the new binary constant values of ValType implicitly disabled the packed-word optimization for signature checks (b/c the ValTypes are no longer in the range 0-4 but some big constant). This patch does what I should've done initially and explicitly maps the four types to their packed numeric value. It'll be interesting to see how much this wins on indirect-call-heavy workloads like bullet.
Attachment #8832665 -
Flags: review?(bbouvier)
Assignee | ||
Comment 12•7 years ago
|
||
And here's the actual, trivial fix (with test).
Attachment #8832666 -
Flags: review?(bbouvier)
Assignee | ||
Comment 13•7 years ago
|
||
Alon: maybe you have the binaryen bug understood, but just in case it helps, here's the error stack (with names from the names section, I see) of the indirect call signature mismatch: __ZN9Scaleform6Render2GL11GLImmediate12glClearColorEffff __ZN9Scaleform6Render2GL23GraphicsDeviceImmediate12glClearColorEffff __ZN9Scaleform6Render2GL3HAL19setRenderTargetImplEPNS0_16RenderTargetDataEjRKNS0_5ColorE __ZN9Scaleform6Render3HAL16PushRenderTargetEPNS0_12RenderTargetEjRKNS_4RectIfEENS0_5ColorE __ZN9Scaleform6Render3HAL11PushFiltersEPNS0_15FilterPrimitiveE __ZThn8_N9Scaleform6Render15FilterPrimitive9EmitToHALERNS0_15RenderQueueItemERNS0_20RenderQueueProcessorE __ZN9Scaleform6Render20RenderQueueProcessor23drawProcessedPrimitivesEv __ZN9Scaleform6Render20RenderQueueProcessor12ProcessQueueENS1_16QueueProcessModeE __ZN9Scaleform6Render20RenderQueueProcessor5FlushEv __ZN9Scaleform6Render3HAL5FlushEv __ZN9Scaleform6Render3HAL10EndDisplayEv __ZN9Scaleform6Render13TreeCacheRoot4DrawEPNS0_3HALE __ZN9Scaleform6Render3HAL4DrawEPNS0_8TreeRootE __ZN22scaleformstudio_plugin9S2DThread6renderEP27RenderDevicePluginArgumentsRN9Scaleform6Render11ContextImpl13DisplayHandleINS4_8TreeRootEEE __ZN22scaleformstudio_plugin9S2DSystem6renderEP27RenderDevicePluginArguments __ZN22scaleformstudio_plugin9S2DPlugin6renderEP27RenderDevicePluginArguments __ZZN8stingray18resource_generator26RenderDevicePluginModifier6renderEPNS_13RenderContextEyRNS_19ResourceGeneratorIOEEN3__08__invokeEPvS7_S7_S7_ __ZN8stingray15GLRenderContext8dispatchEjPKNS_13RenderContext7CommandE __ZN8stingray14GLRenderDevice8dispatchEjPPNS_13RenderContextEj __ZN8stingray11RenderWorld6renderERNS0_12RenderParamsE __ZN8stingray15RenderInterface15process_messageEPNS0_13RenderMessageE __ZN8stingray15RenderInterface12post_messageINS_14RenderWorldMsgEEEvjPKT_ __ZN8stingray15RenderInterface12render_worldERNS_5WorldERKNS_6CameraERKNS_8ViewportERKNS_18ShadingEnvironmentEj __ZN8stingray10plugin_api11application12render_worldEP9CApiWorldPK10CApiCameraPK12CApiViewportPK22CApiShadingEnvironmentPK10CApiWindow __ZZN12_GLOBAL__N_121load_world_managementERN8stingray14LuaEnvironmentEEN3__78__invokeEP9lua_State _luaD_precall _luaV_execute _luaD_call _f_call dynCall_vii invoke_vii _luaD_rawrunprotected _luaD_pcall _lua_pcall __ZN8stingray14LuaEnvironment4callEP9lua_StateNS_15HandleErrorTypeEii __ZN8stingray14LuaEnvironment11call_globalEPKcNS_15HandleErrorTypeENS_8LuaStackE __ZN12_GLOBAL__N_16MyGame6renderEv __ZN8stingray11Application6updateEv __Z6updatePv dynCall_vi dynCall browserIterationFunc runIter Browser_mainLoop_runner
Comment 14•7 years ago
|
||
Thanks, yeah, already managed to reduce it to a tiny testcase: https://github.com/kripken/emscripten/commit/16bc53481f7ccecc96a5c4cc7a0443300204c3ce Kind of surprising we didn't run into this before... but it does happen only on a js library function, which has a type not allowed in asm.js ffis (like f32), and which is called both as a function pointer and as a regular function. Only all those together hit the bug.
Comment 15•7 years ago
|
||
Comment on attachment 8832664 [details] [diff] [review] word-size Review of attachment 8832664 [details] [diff] [review]: ----------------------------------------------------------------- Good catch, thanks. Does the none architecture needs to be modified too?
Attachment #8832664 -
Flags: review?(bbouvier) → review+
Comment 16•7 years ago
|
||
Comment on attachment 8832665 [details] [diff] [review] reoptimize-sig-id Review of attachment 8832665 [details] [diff] [review]: ----------------------------------------------------------------- Easier to maintain indeed, thanks. ::: js/src/wasm/WasmTypes.cpp @@ +596,5 @@ > + case ValType::F32x4: > + case ValType::B8x16: > + case ValType::B16x8: > + case ValType::B32x4: > + break; nit: i find it always weird to have mixed \n decisions at the end of case lines. Can you make it consistent, please? (personal preference is \n for everyone)
Attachment #8832665 -
Flags: review?(bbouvier) → review+
Comment 17•7 years ago
|
||
Comment on attachment 8832666 [details] [diff] [review] fix-table-import-mismatch Review of attachment 8832666 [details] [diff] [review]: ----------------------------------------------------------------- Nice.
Attachment #8832666 -
Flags: review?(bbouvier) → review+
Reporter | ||
Comment 18•7 years ago
|
||
When this gets fixed, will these kinds of invalid wasm modules fail to compile altogether? Or will they abort at runtime if they happen to perform an invalid call?
Comment 19•7 years ago
|
||
(In reply to Jukka Jylänki from comment #18) > When this gets fixed, will these kinds of invalid wasm modules fail to > compile altogether? Or will they abort at runtime if they happen to perform > an invalid call? The latter, a runtime exception will be thrown.
Assignee | ||
Comment 20•7 years ago
|
||
(In reply to Jukka Jylänki from comment #18) Yeah, the fault here is essentially calling a C++ function with a wrong-typed func-ptr: void foo(float) {} typedef void (*PFD)(double); int main() { PFD pfd = (PFD)foo; pfd(1.0); } This is caught at runtime with a wasm signature check at the ptr-to-fun call.
Assignee | ||
Comment 21•7 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #15) > Does the none architecture needs to be modified too? Seems like no because it's marked as PER_SHARED_ARCH, I guess; there's no branch32/branchPtr in none/MacroAssembler.h.
Comment 22•7 years ago
|
||
Pushed by lwagner@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/3d1891948fe9 Baldr: use word-size comparison for signature pointer (r=bbouvier) https://hg.mozilla.org/integration/mozilla-inbound/rev/fb57bb8234ad Baldr: re-enable SigIdDesc immediate optimization (r=bbouvier) https://hg.mozilla.org/integration/mozilla-inbound/rev/223ff75af17c Baldr: emit trap out-of-line paths in function import wrappers (r=bbouvier)
Assignee | ||
Comment 23•7 years ago
|
||
Approval Request Comment [Feature/Bug causing the regression]: bug 1313180 [User impact if declined]: crash in corner case error behavior [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: verified manually [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: very low [Why is the change risky/not risky?]: tiny patch that fixes an obvious bug
Attachment #8833197 -
Flags: review+
Attachment #8833197 -
Flags: approval-mozilla-beta?
Attachment #8833197 -
Flags: approval-mozilla-aurora?
Comment 24•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3d1891948fe9 https://hg.mozilla.org/mozilla-central/rev/fb57bb8234ad https://hg.mozilla.org/mozilla-central/rev/223ff75af17c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment 25•7 years ago
|
||
Comment on attachment 8833197 [details] [diff] [review] branch.patch Regression from 52, causes a startup crash in some cases, let's uplift the fix.
Attachment #8833197 -
Flags: approval-mozilla-beta?
Attachment #8833197 -
Flags: approval-mozilla-beta+
Attachment #8833197 -
Flags: approval-mozilla-aurora?
Attachment #8833197 -
Flags: approval-mozilla-aurora+
Comment 26•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/0c4810ba31c7
status-firefox53:
--- → fixed
Flags: in-testsuite+
Comment 27•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/e69d59dae7b2
status-firefox52:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•