Closed Bug 1015027 Opened 6 years ago Closed 5 years ago

Crash at [@ js::ShapeTable::search(jsid, bool)]

Categories

(Core :: JavaScript Engine, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED WORKSFORME
Tracking Status
b2g-v2.0 --- affected
b2g-v2.1 --- affected

People

(Reporter: fabrice, Unassigned)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [b2g-crash])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1008357 +++

This first started happening around 7-May, but was getting starred under bug 1000883. Seems to happen on startup in mozJSComponentLoader code. It's happening with pretty high frequency, so attentions from the JS team would be much appreciated :)

https://tbpl.mozilla.org/php/getParsedLog.php?id=39377639&tree=B2g-Inbound

b2g_emulator_vm b2g-inbound opt test mochitest-2 on 2014-05-09 11:52:26 PDT for push 3422ae7c9be6
slave: tst-linux64-spot-732

11:58:02     INFO -  runtests.py | Server pid: 2901
11:58:02     INFO -  runtests.py | Websocket server pid: 2903
11:58:02     INFO -  INFO | runtests.py | SSL tunnel pid: 2905
11:58:03     INFO -  Mochitest INFO | runtestsb2g.py | Running tests: start.
11:58:15     INFO -  1399661895792	Marionette	INFO	marionette enabled via build flag and pref
11:58:17     INFO -  1399661898132	Marionette	INFO	marionette-server.js loaded
11:58:18     INFO -  1399661899368	Marionette	INFO	B2G emulator: yes
11:58:18     INFO -  1399661899381	Marionette	INFO	Device detected is generic
11:58:18     INFO -  1399661899390	Marionette	INFO	Bypassing offline status.
11:58:19     INFO -  1399661899413	Marionette	INFO	Listening on port 2828
11:58:19     INFO -  1399661899423	Marionette	INFO	Marionette server ready
11:58:19     INFO -  System JS : WARNING resource://gre/modules/Preferences.jsm:378 - mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create
11:58:21     INFO -  1399661901940	Marionette	DEBUG	accepted connection on 127.0.0.1:50004
11:58:31     INFO -  1399661911822	Marionette	DEBUG	accepted connection on 127.0.0.1:50005
11:59:08     INFO -  1399661948754	Marionette	INFO	loaded marionette-listener.js
11:59:08     INFO -  1399661948781	Marionette	INFO	sendToClient: {"from":"0","value":{"browserName":"B2G","browserVersion":"32.0a1","platformName":"ANDROID","platformVersion":"32.0a1","cssSelectorsEnabled":true,"handlesAlerts":false,"javascriptEnabled":true,"nativeEvents":false,"rotatable":true,"secureSsl":false,"takesElementScreenshot":true,"takesScreenshot":true,"platform":"ANDROID","XULappId":"{3c2e2abc-06d4-11e1-ac3b-374f68613e61}","appBuildId":"20140509095333","device":"qemu","version":"32.0a1","b2g":true}}, {f2dfce17-bb08-46d5-94cd-fc6d6057737a}, {f2dfce17-bb08-46d5-94cd-fc6d6057737a}
11:59:08     INFO -  ###################################### forms.js loaded
11:59:08     INFO -  ############################### browserElementPanning.js loaded
11:59:08     INFO -  ######################## BrowserElementChildPreload.js loaded
11:59:08     INFO -  *** UTM:SVC TimerManager:registerTimer - id: user-agent-updates-timer
11:59:11     INFO -  1399661951489	Marionette	DEBUG	Got request: setContext, data: {"to":"0","sessionId":{"rotatable":true,"browserVersion":"32.0a1","takesScreenshot":true,"appBuildId":"20140509095333","cssSelectorsEnabled":true,"javascriptEnabled":true,"XULappId":"{3c2e2abc-06d4-11e1-ac3b-374f68613e61}","secureSsl":false,"platform":"ANDROID","browserName":"B2G","version":"32.0a1","device":"qemu","b2g":true,"nativeEvents":false,"platformVersion":"32.0a1","takesElementScreenshot":true,"platformName":"ANDROID","handlesAlerts":false},"name":"setContext","parameters":{"value":"chrome"}}, id: {4b3adac6-2c17-41cf-9d01-f0abda522373}
11:59:11     INFO -  1399661951499	Marionette	INFO	sendToClient: {"from":"0","ok":true}, {4b3adac6-2c17-41cf-9d01-f0abda522373}, {4b3adac6-2c17-41cf-9d01-f0abda522373}
11:59:12     INFO -  1399661953010	Marionette	DEBUG	Got request: execute, data: {"to":"0","sessionId":{"rotatable":true,"browserVersion":"32.0a1","takesScreenshot":true,"appBuildId":"20140509095333","cssSelectorsEnabled":true,"javascriptEnabled":true,"XULappId":"{3c2e2abc-06d4-11e1-ac3b-374f68613e61}","secureSsl":false,"platform":"ANDROID","browserName":"B2G","version":"32.0a1","device":"qemu","b2g":true,"nativeEvents":false,"platformVersion":"32.0a1","takesElementScreenshot":true,"platformName":"ANDROID","handlesAlerts":false},"name":"executeScript","parameters":{"scriptTimeout":null,"specialPowers":false,"script":"\n                Components.utils.import(\"resource://gre/modules/Services.jsm\");\n                Services.io.manageOfflineStatus = false;\n                Services.io.offline = false;\n                ","newSandbox":true,"args":[],"filename":"remote.py","line":214}}, id: {64ee8f1b-6702-4104-84e4-13000b91907c}
11:59:19     INFO -  mozcrash INFO | Downloading symbols from: http://pvtbuilds.pvt.build.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/b2g-inbound-emulator/20140509095333/b2g-32.0a1.en-US.android-arm.crashreporter-symbols.zip
11:59:38  WARNING -  PROCESS-CRASH | b2ginstance.py | application crashed [@ js::ShapeTable::search(jsid, bool)]
11:59:38     INFO -  Crash dump filename: /tmp/tmpDNWWSh/2f035f84-0493-4090-54f0b6f4-72939b66.dmp
11:59:38     INFO -  Operating system: Android
11:59:38     INFO -                    0.0.0 Linux 2.6.29-00302-g586075d #31 Mon Feb 24 10:28:23 PST 2014 armv7l Android/full/generic:4.0.4.0.4.0.4/OPENMASTER/eng.cltbld.20140509.133455:eng/test-keys
11:59:38     INFO -  CPU: arm
11:59:38     INFO -       0 CPUs
11:59:38     INFO -  Crash reason:  SIGSEGV
11:59:38     INFO -  Crash address: 0x0
11:59:38     INFO -  Thread 0 (crashed)
11:59:38     INFO -   0  libxul.so!js::ShapeTable::search(jsid, bool) [Shape.cpp:3422ae7c9be6 : 189 + 0x0]
11:59:38     INFO -       r4 = 0x40210fc0    r5 = 0x43ce8058    r6 = 0xfffff5c4    r7 = 0xbee5c5b4
11:59:38     INFO -       r8 = 0x40210fc0    r9 = 0xbee5c5b0   r10 = 0x40210fc0    fp = 0x00000000
11:59:38     INFO -       sp = 0xbee5c488    lr = 0x41cb5cbd    pc = 0x41ccd986
11:59:38     INFO -      Found by: given as instruction pointer in context
11:59:38     INFO -   1  libxul.so!js::ObjectImpl::nativeLookup(js::ExclusiveContext*, jsid) [Shape-inl.h:3422ae7c9be6 : 133 + 0x3]
11:59:38     INFO -       r4 = 0x40210fc0    r5 = 0x43ce8058    r6 = 0xfffff5c4    r7 = 0xbee5c5b4
11:59:38     INFO -       r8 = 0x40210fc0    r9 = 0xbee5c5b0   r10 = 0x40210fc0    fp = 0x00000000
11:59:38     INFO -       sp = 0xbee5c4a0    pc = 0x41cb5cbd
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -   2  libxul.so!js::LookupNativeProperty(js::ExclusiveContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JSObject*>, JS::MutableHandle<js::Shape*>) [jsobj.cpp:3422ae7c9be6 : 3969 + 0x7]
11:59:38     INFO -       r4 = 0xbee5c758    r5 = 0x42579fa0    r6 = 0xfffff5c4    r7 = 0xbee5c5b4
11:59:38     INFO -       r8 = 0x40210fc0    r9 = 0xbee5c5b0   r10 = 0x40210fc0    fp = 0x00000000
11:59:38     INFO -       sp = 0xbee5c4b8    pc = 0x41c3d87d
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -   3  libxul.so!bool js::baseops::SetPropertyHelper<(js::ExecutionMode)0>(js::ExecutionModeTraits<(js::ExecutionMode)0>::ContextType, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, js::baseops::QualifiedBool, JS::MutableHandle<JS::Value>, bool) [jsobj.cpp:3422ae7c9be6 : 4848 + 0x7]
11:59:38     INFO -       r4 = 0x42579fa0    r5 = 0x40210fc0    r6 = 0xbee5c7fc    r7 = 0x00000000
11:59:38     INFO -       r8 = 0xbee5c758    r9 = 0xbee5c5b4   r10 = 0xbee5c5b0    fp = 0x00000000
11:59:38     INFO -       sp = 0xbee5c530    pc = 0x41c46115
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -   4  libxul.so!JS_SetPropertyById [jsobj.h:3422ae7c9be6 : 1024 + 0x11]
11:59:38     INFO -       r4 = 0xbee5c758    r5 = 0x449eef30    r6 = 0x43c28c40    r7 = 0xffffff87
11:59:38     INFO -       r8 = 0x00000000    r9 = 0xbee5c758   r10 = 0xbee5c708    fp = 0xbee5c7fc
11:59:38     INFO -       sp = 0xbee5c5e0    pc = 0x41bf73eb
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -   5  libxul.so!mozJSComponentLoader::ImportInto(nsACString_internal const&, JS::Handle<JSObject*>, JSContext*, JS::MutableHandle<JSObject*>) [mozJSComponentLoader.cpp:3422ae7c9be6 : 1314 + 0xb]
11:59:38     INFO -       r4 = 0x402d9d70    r5 = 0x449eef30    r6 = 0x43ae1a60    r7 = 0xbee5c710
11:59:38     INFO -       r8 = 0x00000000    r9 = 0xbee5c758   r10 = 0xbee5c708    fp = 0xbee5c7fc
11:59:38     INFO -       sp = 0xbee5c608    pc = 0x412a7997
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -   6  libxul.so!mozJSComponentLoader::Import(nsACString_internal const&, JS::Handle<JS::Value>, JSContext*, unsigned char, JS::MutableHandle<JS::Value>) [mozJSComponentLoader.cpp:3422ae7c9be6 : 1127 + 0xd]
11:59:38     INFO -       r4 = 0x43ef8c00    r5 = 0x402d9d70    r6 = 0x449eef30    r7 = 0xbee5c7f8
11:59:38     INFO -       r8 = 0xbee5c948    r9 = 0x0000001a   r10 = 0x4200e56a    fp = 0x449eef20
11:59:38     INFO -       sp = 0xbee5c7c0    pc = 0x412a7bb7
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -   7  libxul.so!nsXPCComponents_Utils::Import(nsACString_internal const&, JS::Handle<JS::Value>, JSContext*, unsigned char, JS::MutableHandle<JS::Value>) [XPCComponents.cpp:3422ae7c9be6 : 2724 + 0x11]
11:59:38     INFO -       r4 = 0x412a7b01    r5 = 0xbee5c918    r6 = 0x449eef30    r7 = 0x43ef8c00
11:59:38     INFO -       r8 = 0x00000005    r9 = 0x0000001a   r10 = 0x4200e56a    fp = 0x449eef20
11:59:38     INFO -       sp = 0xbee5c818    pc = 0x412785a9
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -   8  libxul.so!NS_InvokeByIndex [xptcinvoke_arm.cpp:3422ae7c9be6 : 164 + 0x23]
11:59:38     INFO -       r4 = 0x4127856d    r5 = 0x449eef30    r6 = 0xbee5c948    r7 = 0xbee5c870
11:59:38     INFO -       r8 = 0x00000005    r9 = 0x0000001a   r10 = 0x4200e56a    fp = 0x449eef20
11:59:38     INFO -       sp = 0xbee5c840    pc = 0x40c77f95
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -   9  libxul.so!XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [XPCWrappedNative.cpp:3422ae7c9be6 : 2397 + 0xd]
11:59:38     INFO -       r4 = 0x42579fa0    r5 = 0xbee5c948    r6 = 0xbee5c8d8    r7 = 0x00000010
11:59:38     INFO -       r8 = 0x00000003    r9 = 0x0000001a   r10 = 0x4200e56a    fp = 0x421ad62c
11:59:38     INFO -       sp = 0xbee5c8a0    pc = 0x41299835
11:59:38     INFO -      Found by: call frame info
11:59:38     INFO -  10  libxul.so!XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) [XPCWrappedNativeJSOps.cpp:3422ae7c9be6 : 1273 + 0x7]
11:59:38     INFO -       r4 = 0x43ef8c00    r5 = 0xbee5ca9c    r6 = 0x00000001    r7 = 0x00000001
11:59:38     INFO -       r8 = 0x4023b284    r9 = 0x43ce2da0   r10 = 0x43ce2da0    fp = 0xffffff87
11:59:38     INFO -       sp = 0xbee5ca78    pc = 0x4129d5cf
11:59:38     INFO -      Found by: call frame info
Bobby & Nicolas, can anyone look at that?
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(bobbyholley)
I don't have the bandwidth right now, sorry.
Flags: needinfo?(bobbyholley)
Blocking a 1.3T blocker.
blocking-b2g: --- → 1.3T+
I still haven't managed to reproduce it locally because the emulator does not boot.

Fabrice suggested that this could be caused by the fact that we are reusing the same global for the jsm.
Njn, do you have any knowledge about the reuseGlobal, and could it be the source of this failure based on the patches of Bug 993282.
blocking-b2g: 1.3T+ → ---
Flags: needinfo?(nicolas.b.pierron) → needinfo?(n.nethercote)
btw, bug 961488 just fixed a mozJSComponentLoader::Import() crash with a similar stack trace on PPC32.
See Also: → 961488
> Njn, do you have any knowledge about the reuseGlobal, and could it be the
> source of this failure based on the patches of Bug 993282.

I know very little about it. khuey, bholley and billm know more...
Flags: needinfo?(n.nethercote)
(In reply to Chris Peterson (:cpeterson) from comment #5)
> btw, bug 961488 just fixed a mozJSComponentLoader::Import() crash with a
> similar stack trace on PPC32.

per Bug 961488 comment 6, this sounds unrelated.
Thanks for the attempt.
(In reply to Nicholas Nethercote [:njn] from comment #6)
> > Njn, do you have any knowledge about the reuseGlobal, and could it be the
> > source of this failure based on the patches of Bug 993282.
> 
> I know very little about it. khuey, bholley and billm know more...
Flags: needinfo?(wmccloskey)
Flags: needinfo?(khuey)
This looks a lot like what can be found on crash-stat, where "entries" is NULL.  Which seems to happen on a stock version of Firefox.

https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=28&signature=js%3A%3AShapeTable%3A%3Asearch%28jsid%2C+bool%29#tab-reports
This is pretty deep in the JS engine.  Even if the global reusing code is causing it I wouldn't be able to tell you how ...
Flags: needinfo?(khuey)
It seems like this could be the result of an OOM at [1]. The OOM handling code there doesn't look very useful. It reports the error, but leaves |entries| null.

If someone can get this in a debugger, you could just check rt->hadOutOfMemory to test this theory.

[1] http://hg.mozilla.org/releases/mozilla-release/annotate/f60bc49e6bd5/js/src/vm/Shape.cpp#l54
Flags: needinfo?(wmccloskey)
decoder, have you ever seen this signature?
Flags: needinfo?(choller)
(In reply to Nicolas B. Pierron [:nbp] from comment #7)
> (In reply to Chris Peterson (:cpeterson) from comment #5)
> > btw, bug 961488 just fixed a mozJSComponentLoader::Import() crash with a
> > similar stack trace on PPC32.
> 
> per Bug 961488 comment 6, this sounds unrelated.
> Thanks for the attempt.

This is a new try-run. It still shows the crash :(
(In reply to Bill McCloskey (:billm) from comment #11)
> It seems like this could be the result of an OOM at [1]. The OOM handling
> code there doesn't look very useful. It reports the error, but leaves
> |entries| null.
> 
> If someone can get this in a debugger, you could just check
> rt->hadOutOfMemory to test this theory.
> 
> [1]
> http://hg.mozilla.org/releases/mozilla-release/annotate/f60bc49e6bd5/js/src/
> vm/Shape.cpp#l54

The crash from the link above happens during startup so I would be surprised if this is OOM.
Attached file shape-table-audit.txt
(In reply to Bill McCloskey (:billm) from comment #11)
> It seems like this could be the result of an OOM at [1]. The OOM handling
> code there doesn't look very useful. It reports the error, but leaves
> |entries| null.

I audited the guards used before using ShapeTable::search, as well as the conditions for allocating the entries, and the error codes for ShapeTable::change.  So far I have not found anything obvious.
I am running rr on the JS test suite, with various random scheduling parameters, hopping that I would be able to catch any issue in the shell, and crossing fingers to find a similar signature.  As I have no way to investigate on the emulator at the moment.

Gregor, fabrice, are you able to reproduce this issue outside tbpl?
(In reply to Nicolas B. Pierron [:nbp] from comment #17)
> I am running rr on the JS test suite, with various random scheduling
> parameters, hopping that I would be able to catch any issue in the shell,
> and crossing fingers to find a similar signature.  As I have no way to
> investigate on the emulator at the moment.
> 
> Gregor, fabrice, are you able to reproduce this issue outside tbpl?

No, I never hit that on device or desktop.
(In reply to Nicolas B. Pierron [:nbp] from comment #17)
> I am running rr on the JS test suite, with various random scheduling
> parameters, hopping that I would be able to catch any issue in the shell,
> and crossing fingers to find a similar signature.

This is unsuccessful so far, I did not found any failure of the JS Shell in an opt-build of release (yet).
I am working on getting the patch, started by terrence, to highlight if there is any corruption of the JS Heap from non JS code.  If I manage to get it working, and if this is indeed a memory corruption, then we should see a different kind of failure as frequent, or even more frequent than the current one.

I am working on this patch as it might also help investigate other failures which are blamed on the JS Engine (such as Bug 1008791).
After discussing with Bobby Holley, we came up with the following hypotheses:
 - the target object is a global. (Cu.import within a Window)
 - the target object is a FakeBackstagePass. (Cu.import within a jsm)
 - the target object is an empty object. (Cu.import within defineLazyGetter's callback)

The next question, is why do we have a ShapeTable on the target object, and what mutations caused us to fail.  One hypothesis is that this might be caused by the implementation of XPCU_defineLazyGetter which is doing a delete and then set the same property with the result of the Cu.import.  Note, if this indeed the source of the ShapeTable, this might be a way to make a small work-around that we can make to get this patch landed on 1.3T.

I will try to both approaches and submit try pushes to see if we can reproduce the bug.
(In reply to Nicolas B. Pierron [:nbp] from comment #21)
> After discussing with Bobby Holley, we came up with the following hypotheses:
>  - the target object is a global. (Cu.import within a Window)
>  - the target object is a FakeBackstagePass. (Cu.import within a jsm)
>  - the target object is an empty object. (Cu.import within
> defineLazyGetter's callback)

https://tbpl.mozilla.org/php/getParsedLog.php?id=40779601&tree=Try&full=1 :
> FindTargetObject: Find target object:
> FindTargetObject:  > mThisObjects.Get(script)
> FindTargetObject:  class-of(target) = FakeBackstagePass
> FindTargetObject:  nb-slot(target) = 6
> Gecko   :
> Gecko   : ###!!! [Child][MessageChannel::SendAndWait] Error: Channel error: cannot send/recv
> Gecko   :

Looking above the crash, I do not see any target being an "Object", which would have been the case if we were going through the logic added in the callback of defineLazyGetter which appears in part 2.

This also means that the work-around, to replace the delete by non-enumerable property, will not work as it is not even executed before the failure.

As this is a FakeBackstage, I will try to spew the __URI__ out of it, and do the same in Cu.import, to know where the failure is happening.
(In reply to Nicolas B. Pierron [:nbp] from comment #22)
> As this is a FakeBackstage, I will try to spew the __URI__ out of it, and do
> the same in Cu.import, to know where the failure is happening.

I tried adding more spew[1], including spewing the __URI__ and the name of the property name behind the symbolId used for SetProperty (called from ImportInto).  Strangely, the bug disappear, even with fabrice patches.

I have no explanations for that except that this might be a scheduling issue, which might involve another thread.  Still we are not supposed to modify Shapes from a thread different than the Runtime thread.  Ion's compilation is the only one that I am aware of, and it is indeed using linearSearch and not the table lookup that we are dealing with here.

Also, I tried to check if this was a corruption of the entries pointer[2], by doing a checksum of the pointer value and storing the result of the checksum on the low bits of the pointer. I was able to reproduce the bug (without the extra logging) and without triggering any of the release assertions added with the checksum.  This means that this memory is corrupted by a location which is aware of the Shape type.

I will try to assert that shapes are manipulated by the thread of the JS Runtime, and at the same time try to remove/move some spew such as I can have more idea on what is the exact context (jsm) which are involved here.

[1] https://tbpl.mozilla.org/?tree=Try&rev=98b128375ae0
[2] https://tbpl.mozilla.org/?tree=Try&rev=381f31d01431
(In reply to Gregor Wagner [:gwagner] from comment #3)
> Blocking a 1.3T blocker.

FYI, I just noticed that I accidentally removed the 1.3T+ in comment 4.
blocking-b2g: --- → 1.3T?
Flags: needinfo?(anygregor)
blocking-b2g: 1.3T? → 1.3T+
Flags: needinfo?(anygregor)
(In reply to Nicolas B. Pierron [:nbp] from comment #23)
> (In reply to Nicolas B. Pierron [:nbp] from comment #22)
> > As this is a FakeBackstage, I will try to spew the __URI__ out of it, and do
> > the same in Cu.import, to know where the failure is happening.
> 
> I tried adding more spew[1], including spewing the __URI__ and the name of
> the property name behind the symbolId used for SetProperty (called from
> ImportInto).  Strangely, the bug disappear, even with fabrice patches.

(In reply to Nicolas B. Pierron [:nbp] from comment #23)
> I will try […] to remove/move some spew such as I can
> have more idea on what is the exact context (jsm) which are involved here.

After modifying the I/O rate of the patch, and based on the following (broken) log,

https://tbpl.mozilla.org/php/getParsedLog.php?id=40872232&tree=Try&full=1#error0 :
> ImportInto target:
>  0xbecceec4['AppsUtils'] = ...
>  0xbecceec4['ManifestHelper'] = ...
>  0xbecceec4['isAbsoluteURI'] = ...
>  0xbecceec4['mozIApplication'] = ...
> ImportInto target:
>  0xbecceec4['Services'] = ...

I think I managed to restrict this issue to loading of dom/apps/src/AppsServiceChild.jsm which includes both AppsUtils.jsm, and Services.jsm.

In the mean time, to get Bug 993282's patches landed, I can think of 2 stupid work-around (which I have no idea if why they could work nor why the current one fails):
 - The first one would be to remove the modification of AppsUtils.jsm
 - The second is to swap the order of the Cu.imports of AppsServiceChild.jsm, such as we load these in the same order as OfflineCacheInstaller.jsm

(In reply to Nicolas B. Pierron [:nbp] from comment #23)
> I will try to assert that shapes are manipulated by the thread of the JS
> Runtime, […].

This was unsuccessful. This means that the ShapeTable::entries is not corrupted by another thread, nor it is manipulated concurrently.  Which might also mean that my audit might be incomplete.

I will try to track the origin of the entries pointer as part of the value stored in the pointer, and change the assertion to report which location failed and which location defined it.
(In reply to Nicolas B. Pierron [:nbp] from comment #25)
> In the mean time, to get Bug 993282's patches landed, I can think of 2
> stupid work-around (which I have no idea if why they could work nor why the
> current one fails):
>  - The first one would be to remove the modification of AppsUtils.jsm
>  - The second is to swap the order of the Cu.imports of
> AppsServiceChild.jsm, such as we load these in the same order as
> OfflineCacheInstaller.jsm

Ok, for the moment, both work-around seems to be working, so one thing we can do in order to backport patches to 1.3T, is to land one of the work-around with fabrice patches.  Note, this only hide this issue from our jsm, but this will no longer be a blocker for 1.3T.

Gregor, what do you think?

I've scheduled more jobs of the second work-around as it does not modify the intent of the patch.
First work-around: https://tbpl.mozilla.org/?tree=Try&rev=68e3372f6337
Second work-around: https://tbpl.mozilla.org/?tree=Try&rev=ca579e21501a
Flags: needinfo?(anygregor)
(In reply to Nicolas B. Pierron [:nbp] from comment #26)
> Gregor, what do you think?

I am removing the ni? as Gregor provided a r+ on the second work-around suggested. (attached to Bug 993282)

(In reply to Nicolas B. Pierron [:nbp] from comment #25)
> I will try to track the origin of the entries pointer as part of the value
> stored in the pointer, and change the assertion to report which location
> failed and which location defined it.

I tried to do so, by annotating every setter with an integer to identify the origin, but I failed to get any release assertion raising on a NULL pointer on the entries pointer. (I also verified that release assertion were working well)

My best guess now, is that the report might not be about the "entries" pointer being null, but it might be about the ShapeTable pointer.  In fact, it might better correspond to the two different crash addresses[1] observed on crash-stats and among the builds of the emulator.

So, let re-do some of these investigations with a different target.

NB: I might win the grand prize soonish [2] :)

[1] 0x0 for hashShift, and 0x10 for entries fields in the ShapeTable.
[2] https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/highscores/highscores.html
Flags: needinfo?(anygregor)
bug 993282 did not get backed out from 1.3T
the issue is only happening on master
revert the 1.3T+ flag
blocking-b2g: 1.3T+ → ---
(In reply to Nicolas B. Pierron [:nbp] from comment #27)
> My best guess now, is that the report might not be about the "entries"
> pointer being null, but it might be about the ShapeTable pointer.  In fact,
> it might better correspond to the two different crash addresses[1] observed
> on crash-stats and among the builds of the emulator.

Hypothesis confirmed \o/
https://tbpl.mozilla.org/php/getParsedLog.php?id=41029184&tree=Try&full=1#error4

Now the question is how did we managed to fail with the stacktrace displayed in comment 0.

I hardly believe the stacktrace of comment 0 to be correct, as Shape-inl.h:133 [1] is under the condition that we managed to create the table. So unless this is a compiler error with an aliasing issues of the table value, I would assume that the compiler is just sharing the code for the 3 cases which are leading to the same ending code.

Knowing that both the third and second cases ensure that the tabple pointer is non-null, I will assume that the error would be with the IN_DICTIONARY flag of the first condition.  I will audit all cases where we are setting this flag (while also making sure on tbpl that my assumption is correct).

[1] http://dxr.mozilla.org/mozilla-central/source/js/src/vm/Shape-inl.h?from=Shape-inl.h#133
See Also: → 1020805
(In reply to Nicolas B. Pierron [:nbp] from comment #29)
> (In reply to Nicolas B. Pierron [:nbp] from comment #27)
> > My best guess now, is that the report might not be about the "entries"
> > pointer being null, but it might be about the ShapeTable pointer.  In fact,
> > it might better correspond to the two different crash addresses[1] observed
> > on crash-stats and among the builds of the emulator.
> 
> Hypothesis confirmed \o/
> https://tbpl.mozilla.org/php/getParsedLog.
> php?id=41029184&tree=Try&full=1#error4
> 
> Now the question is how did we managed to fail with the stacktrace displayed
> in comment 0.
> 
> I hardly believe the stacktrace of comment 0 to be correct, as
> Shape-inl.h:133 [1] is under the condition that we managed to create the
> table. So unless this is a compiler error with an aliasing issues of the
> table value, I would assume that the compiler is just sharing the code for
> the 3 cases which are leading to the same ending code.
> 
> Knowing that both the third and second cases ensure that the tabple pointer
> is non-null, I will assume that the error would be with the IN_DICTIONARY
> flag of the first condition.  I will audit all cases where we are setting
> this flag (while also making sure on tbpl that my assumption is correct).

I have added shape consistency checks before and after the SetPropertyById which is in ImportInto.
So it seems that I was looking at the wrong process, and now I am able to say that the fakeBackstagePass is likely one instanciated by Marionette.

The assertion is reported after the request from Marionette, and before the SetPropertyId call.

https://tbpl.mozilla.org/php/getParsedLog.php?id=41102679&tree=Try&full=1#error4 :

> Gecko   : 1401961891308 Marionette DEBUG Got request: execute, data: {{{see below}}}
> MOZ_Assert: Assertion failure: shape->hasTable(), at ../../../gecko/js/src/vm/ObjectImpl.cpp:203
> 
> { "to":"0",
>   "sessionId": …,
>   "name":"executeScript",
>   "parameters":{
>     "scriptTimeout":null,
>     "specialPowers":false,
>     "script": {{{see below}}},
>     "newSandbox":true,
>     "args":[],
>     "filename":"remote.py",
>     "line":222
>   }
> }
> 
>    Components.utils.import("resource://gre/modules/Services.jsm");
>    Services.io.manageOfflineStatus = false;
>    Services.io.offline = false;

So I guess Marionette's part of Gecko might have some code which is badly initializing the fakeBackstagePass object.

The reported assertion also confirm that the problem is related to the Dictionary mode which is not correctly initialized on the Object, as indicated by:

> MOZ_Assert: Assertion failure: shape->hasTable(), at ../../../gecko/js/src/vm/ObjectImpl.cpp:203
> 
>   201 
>   202     if (inDictionaryMode()) {
>   203         MOZ_RELEASE_ASSERT(shape->hasTable());
>   204 
>

I will look into the code which handles the executeScript command in Gecko, from the creation of the fakeBackstagePass to the evaluation of the script given as argument.
blocking-b2g: --- → 1.3T+
I do not know if this is related, but I will try to check if we still have any failure after the landing of Bug 1022027.
(In reply to Joe Cheng [:jcheng] from comment #28)
> bug 993282 did not get backed out from 1.3T
> the issue is only happening on master
> revert the 1.3T+ flag

Hi Gregor, i don't believe this bug needs to be 1.3T+ with my reason in comment 28
Can you please confirm this is really needed for tarako? thanks
Flags: needinfo?(anygregor)
(In reply to Joe Cheng [:jcheng] from comment #32)
> (In reply to Joe Cheng [:jcheng] from comment #28)
> > bug 993282 did not get backed out from 1.3T
> > the issue is only happening on master
> > revert the 1.3T+ flag
> 
> Hi Gregor, i don't believe this bug needs to be 1.3T+ with my reason in
> comment 28
> Can you please confirm this is really needed for tarako? thanks

Oh I missed this. Sorry about that.
blocking-b2g: 1.3T+ → ---
Flags: needinfo?(anygregor)
Blocks: 1024157
I have other bugs on v1.4, As this is no longer blocking v1.3, I will continue my investigation as soon as we are done with the v1.4 bugs. (sorry)
No longer blocks: 950653
Fabrice, I'm hoping this was fixed by bug 1026860. Can you try relanding your patch and seeing if the crashes re-appear?
Flags: needinfo?(fabrice)
(In reply to Jason Orendorff [:jorendorff] from comment #12)
> decoder, have you ever seen this signature?

Clearing up some needinfo that I forgot. No I haven't seen this signature in fuzzing.
Flags: needinfo?(choller)
(In reply to Bobby Holley (:bholley) (PTO 6/25-6/29) from comment #35)
> Fabrice, I'm hoping this was fixed by bug 1026860. Can you try relanding
> your patch and seeing if the crashes re-appear?

Pushed to try:  https://tbpl.mozilla.org/?tree=Try&rev=430791e8220b
Flags: needinfo?(fabrice)
I relanded the patches in bug 1024157
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.