Closed
Bug 1154411
Opened 10 years ago
Closed 10 years ago
Stability test uncovered attempted to close invalid file descriptor
Categories
(Firefox OS Graveyard :: Stability, defect)
Tracking
(blocking-b2g:2.2+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.1 unaffected, b2g-v2.1S unaffected, b2g-v2.2 fixed, b2g-master fixed)
Tracking | Status | |
---|---|---|
firefox38 | --- | wontfix |
firefox39 | --- | wontfix |
firefox40 | --- | fixed |
b2g-v2.1 | --- | unaffected |
b2g-v2.1S | --- | unaffected |
b2g-v2.2 | --- | fixed |
b2g-master | --- | fixed |
People
(Reporter: ggrisco, Assigned: sotaro)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 336][caf priority: p1][CR 822157])
Crash Data
Attachments
(7 files, 3 obsolete files)
CAF debug code checks for attempt to close invalid fd and is aborting when detecting this. This instance happened during stability test.
cafbot will upload the logs.
Comment 1•10 years ago
|
||
Comment 2•10 years ago
|
||
Updated•10 years ago
|
Whiteboard: [CR 822157]
Updated•10 years ago
|
Whiteboard: [CR 822157] → [caf priority: p1][CR 822157]
Updated•10 years ago
|
Whiteboard: [caf priority: p1][CR 822157] → [b2g-crash][caf-crash 336][caf priority: p1][CR 822157]
Comment 3•10 years ago
|
||
Observed on:
Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.129
Moz BuildID: 20150409002503
Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.129.xml?h=release
Gecko Version: 37.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=ea735c21bfb0d78333213ff0376fce1eac89ead6
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=ac12d858a58574b2db482b5eed61750a3e210040
Patches: bug 1150923, bug 1133147, bug 1150271, bug 1148641, bug 1150924
Comment 4•10 years ago
|
||
Comment 5•10 years ago
|
||
Comment 6•10 years ago
|
||
sotaro/:fabrice, You guys helped in the past on identifying where the invalid fd close is happening, hence NI'ing you for help. Apparently, CAF has seen abort this ~14 times in the last run.
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(fabrice)
Assignee | ||
Comment 7•10 years ago
|
||
Okey, tomorrow, I am going to check if the crash has an useful information.
Comment 8•10 years ago
|
||
Hi! Rex,
Need your help to take a look first. Thanks
--
Keven
Flags: needinfo?(rhung)
Comment 9•10 years ago
|
||
According to the minidump result, I though that both this bug and bug 1154416 have the same issue:
"Try to close a BAD file descriptor".
Why and where this BAD file descriptor comes from still needs to be investigated, but originally closing the BAD file descriptor should not crash the process, but just returns an error code for the reason.
However, both of call stack after calling the system call, close(), are really weird to me and need someone to provide the answer if possible:
1. Where is libnoselect.so from?
I locally synced source tree based on "caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.129.xml", but cannot find this shared library. Is there any patch file not available for us?
2. Why does libnoselect.so catch the system call, close(), and trigger an abortion further for this case? Originally, libc.so, not libnoselect.so, should be used after calling the system call wraper function, __wrap_close(). So, is there any extra patch files here?
3. Based on the following call stack, it seems that the different version of libnoselect.so are used because different functions are called to trigger an abortion. Can anyone tell me the purpose of this shared library(libnoselect.so)?
Bug#1154411
0 plugin-container!mozalloc_abort [mozalloc_abort.cpp : 37 + 0x4]
^^^^^^^^^^^^^^^
1 libnoselect.so!close [noselect.c : 89 + 0x3]
2 libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3]
3 libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
Bug#1154416
0 libc.so!tgkill + 0xc
1 libc.so!pthread_kill [pthread_kill.cpp : 49 + 0xb]
2 libc.so!raise [raise.cpp : 32 + 0x9]
3 libc.so!__libc_android_abort [abort.cpp : 47 + 0x5]
4 libc.so!abort + 0x6
^^^^^
5 libnoselect.so!close [noselect.c : 89 + 0x3]
6 libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3]
7 libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
Comment 10•10 years ago
|
||
In short, libnoselect.so is the root cause which results in process crash, but we have no idea for it so far. Need someone help to clarify the purpose of libnoselect.so first.
Flags: needinfo?(rhung)
Comment 11•10 years ago
|
||
Hi, Greg,
Could you check comment 9 and give us some feedback if possible? Thank you.
Flags: needinfo?(ggrisco)
Reporter | ||
Comment 12•10 years ago
|
||
(In reply to Rex Hung[:rhung] from comment #9)
> According to the minidump result, I though that both this bug and bug
> 1154416 have the same issue:
> "Try to close a BAD file descriptor".
>
> Why and where this BAD file descriptor comes from still needs to be
> investigated, but originally closing the BAD file descriptor should not
> crash the process, but just returns an error code for the reason.
>
> However, both of call stack after calling the system call, close(), are
> really weird to me and need someone to provide the answer if possible:
> 1. Where is libnoselect.so from?
> I locally synced source tree based on
> "caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.129.xml", but cannot find this
> shared library. Is there any patch file not available for us?
In internal builds, we are overriding the close() function and check for invalid file descriptors and abort() when this situation is detected.
> 2. Why does libnoselect.so catch the system call, close(), and trigger an
> abortion further for this case? Originally, libc.so, not libnoselect.so,
> should be used after calling the system call wraper function,
> __wrap_close(). So, is there any extra patch files here?
>
We do this in hopes that this will catch problems that would otherwise be much more difficult to debug. The change is an internal one, but basically we call the system close() and if the return value is < 0 and fd is not -1 and errno is not ENODEV then we abort.
> 3. Based on the following call stack, it seems that the different version of
> libnoselect.so are used because different functions are called to trigger an
> abortion. Can anyone tell me the purpose of this shared
> library(libnoselect.so)?
> Bug#1154411
> 0 plugin-container!mozalloc_abort [mozalloc_abort.cpp : 37 + 0x4]
> ^^^^^^^^^^^^^^^
> 1 libnoselect.so!close [noselect.c : 89 + 0x3]
> 2 libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3]
> 3 libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
> Bug#1154416
> 0 libc.so!tgkill + 0xc
> 1 libc.so!pthread_kill [pthread_kill.cpp : 49 + 0xb]
> 2 libc.so!raise [raise.cpp : 32 + 0x9]
> 3 libc.so!__libc_android_abort [abort.cpp : 47 + 0x5]
> 4 libc.so!abort + 0x6
> ^^^^^
> 5 libnoselect.so!close [noselect.c : 89 + 0x3]
> 6 libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3]
> 7 libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
Flags: needinfo?(ggrisco)
Reporter | ||
Comment 13•10 years ago
|
||
BTW, we recently turned this check back on (after discovering it had been turned off for some time). So this particular crash might not be catching a new regression.
Comment 14•10 years ago
|
||
Greg, any chance you would share your noselect library? I'd like moz to run our tests with it (or something similar). thanks!
Flags: needinfo?(fabrice) → needinfo?(ggrisco)
Reporter | ||
Comment 15•10 years ago
|
||
Sorry, Fabrice, but this library can't be shared. However, I've pretty much described the close() handling in comment 12 if you want to try to implement something similar.
Flags: needinfo?(ggrisco)
Assignee | ||
Comment 16•10 years ago
|
||
I confirmed it is a problem of ArrayBufferBuilder::mapToFileInPackage(). one fd is closed twice by the following. Then it seems regression of Bug 945152.
> mozilla::AutoFDClose pr_fd;
> mozilla::ScopedClose fd;
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLHttpRequest.cpp#4082
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 17•10 years ago
|
||
Assignee | ||
Comment 18•10 years ago
|
||
Assignee | ||
Comment 19•10 years ago
|
||
By the following STR, I reproduced the problem.
[1] Created ROM by applying attachment 8593608 [details] [diff] [review] and flash the ROM.
It prevent file compression. mapToFileInPackage() is used only when zipItem is not compressed.
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLHttpRequest.cpp#4079
[2] Enable Chinese keyboard by the following.
Setting -> Keyboards -> 拼音
[3] Start Browser and touch "search or enter address".
Keyborad should pop up.
[4] Change to Chinese keyboard and type something.
Assignee | ||
Comment 20•10 years ago
|
||
Assignee | ||
Comment 21•10 years ago
|
||
Greg, can you check if attachment 8593613 [details] [diff] [review] fix the problem?
Flags: needinfo?(ggrisco)
Reporter | ||
Comment 22•10 years ago
|
||
Ok, I'm bringing this in now, thanks. We'll probably won't see results until Monday.
Flags: needinfo?(ggrisco)
Assignee | ||
Comment 23•10 years ago
|
||
Thanks!
Comment 24•10 years ago
|
||
Observed on:
Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.132
Moz BuildID: 20150413002502
Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.132.xml?h=release
Gecko Version: 37.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=cec00d643f517ffd96cde559cd3bbd43ab85816c
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=a47c1908cca4b834ae92e9a0507f09da46993f8b
Patches: bug 1150923, bug 1154439, bug 1133147, bug 1142806, bug 1150924
Comment 25•10 years ago
|
||
Comment 26•10 years ago
|
||
Assignee | ||
Comment 27•10 years ago
|
||
(In reply to cafbot (PoC: ggrisco) from comment #26)
> Created attachment 8593792 [details]
> decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.132
It seems not applying attachment 8593613 [details] [diff] [review].
Reporter | ||
Comment 29•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> (In reply to cafbot (PoC: ggrisco) from comment #26)
> > Created attachment 8593792 [details]
> > decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.132
>
> It seems not applying attachment 8593613 [details] [diff] [review].
We have to wait for AU 134. I thought it would be picked up for testing on Sunday, but there was a slight delay. I'll update you by tomorrow hopefully.
Flags: needinfo?(ggrisco)
Updated•10 years ago
|
Assignee: nobody → sotaro.ikeda.g
Reporter | ||
Comment 31•10 years ago
|
||
Hi Sotaro,
We haven't seen this crash in a full day of testing on AU134, so I think we definitely want this fix, thanks.
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 32•10 years ago
|
||
(In reply to Greg Grisco from comment #31)
> Hi Sotaro,
>
> We haven't seen this crash in a full day of testing on AU134, so I think we
> definitely want this fix, thanks.
Good! I am going to proceed to review.
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Updated•10 years ago
|
Attachment #8593613 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 33•10 years ago
|
||
Comment on attachment 8593613 [details] [diff] [review]
patch - Do not close same fd twice
Review of attachment 8593613 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/base/nsXMLHttpRequest.cpp
@@ +4124,1 @@
> mMapPtr = JS_CreateMappedArrayBufferContents(fd, offset, size);
This looks fine, but can you just do:
mozilla::AutoFDClose pr_fd;
rv = aJarFile->OpenNSPRFileDesc(PR_RDONLY, 0, &pr_fd.rwget());
// ...
mMapPtr =
JS_CreateMappedArrayBufferContents(PR_FileDesc2NativeHandle(pr_fd),
offset,
size);
Not having another fd on the stack seems simpler to reason about.
Attachment #8593613 -
Flags: review?(bent.mozilla) → review+
And let's please backport this as far as we can.
Assignee | ||
Comment 36•10 years ago
|
||
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #34)
> Comment on attachment 8593613 [details] [diff] [review]
> patch - Do not close same fd twice
>
> Review of attachment 8593613 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: dom/base/nsXMLHttpRequest.cpp
> @@ +4124,1 @@
> > mMapPtr = JS_CreateMappedArrayBufferContents(fd, offset, size);
>
> This looks fine, but can you just do:
>
> mozilla::AutoFDClose pr_fd;
> rv = aJarFile->OpenNSPRFileDesc(PR_RDONLY, 0, &pr_fd.rwget());
> // ...
> mMapPtr =
> JS_CreateMappedArrayBufferContents(PR_FileDesc2NativeHandle(pr_fd),
> offset,
> size);
>
> Not having another fd on the stack seems simpler to reason about.
Yeah, it is simpler. I'll update the patch.
Assignee | ||
Comment 37•10 years ago
|
||
Apply the comment. Carry "r=bent".
Attachment #8593607 -
Attachment is obsolete: true
Attachment #8593608 -
Attachment is obsolete: true
Attachment #8593613 -
Attachment is obsolete: true
Attachment #8596721 -
Flags: review+
Assignee | ||
Updated•10 years ago
|
status-b2g-v2.1:
--- → unaffected
status-b2g-v2.1S:
--- → unaffected
status-b2g-v2.2:
--- → affected
status-b2g-master:
--- → affected
Assignee | ||
Comment 38•10 years ago
|
||
swu, do you know if ArrayBufferBuilder::mapToFileInPackage() is used on desktop Firefox?
Flags: needinfo?(swu)
Assignee | ||
Comment 39•10 years ago
|
||
Comment 40•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #38)
> swu, do you know if ArrayBufferBuilder::mapToFileInPackage() is used on
> desktop Firefox?
No, it's only enabled on B2G.
https://dxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#1094
Flags: needinfo?(swu)
Assignee | ||
Comment 41•10 years ago
|
||
(In reply to Shian-Yow Wu [:swu] from comment #40)
>
> No, it's only enabled on B2G.
> https://dxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#1094
Thanks!
Assignee | ||
Comment 42•10 years ago
|
||
Comment 43•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S11 (1may)
Assignee | ||
Comment 44•10 years ago
|
||
Comment on attachment 8596721 [details] [diff] [review]
patch - Do not close same fd twice
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 945152
User impact if declined: System could cause a crash with unexpected reasons.
Testing completed: locally tested and codeaurora did tests.
Risk to taking this patch (and alternatives if risky): low risk.
String or UUID changes made by this patch: none.
Attachment #8596721 -
Flags: approval-mozilla-b2g37?
Updated•10 years ago
|
Attachment #8596721 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Comment 45•10 years ago
|
||
status-firefox38:
--- → wontfix
status-firefox39:
--- → wontfix
You need to log in
before you can comment on or make changes to this bug.
Description
•