Closed Bug 1154411 Opened 10 years ago Closed 10 years ago

Stability test uncovered attempted to close invalid file descriptor

Categories

(Firefox OS Graveyard :: Stability, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:2.2+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.1 unaffected, b2g-v2.1S unaffected, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
2.2 S11 (1may)
blocking-b2g 2.2+
Tracking Status
firefox38 --- wontfix
firefox39 --- wontfix
firefox40 --- fixed
b2g-v2.1 --- unaffected
b2g-v2.1S --- unaffected
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: ggrisco, Assigned: sotaro)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 336][caf priority: p1][CR 822157])

Crash Data

Attachments

(7 files, 3 obsolete files)

CAF debug code checks for attempt to close invalid fd and is aborting when detecting this. This instance happened during stability test. cafbot will upload the logs.
Attached file decoded minidump -
Whiteboard: [CR 822157]
Whiteboard: [CR 822157] → [caf priority: p1][CR 822157]
Whiteboard: [caf priority: p1][CR 822157] → [b2g-crash][caf-crash 336][caf priority: p1][CR 822157]
Keywords: crash
sotaro/:fabrice, You guys helped in the past on identifying where the invalid fd close is happening, hence NI'ing you for help. Apparently, CAF has seen abort this ~14 times in the last run.
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(fabrice)
See Also: → 1154416
Okey, tomorrow, I am going to check if the crash has an useful information.
Hi! Rex, Need your help to take a look first. Thanks -- Keven
Flags: needinfo?(rhung)
According to the minidump result, I though that both this bug and bug 1154416 have the same issue: "Try to close a BAD file descriptor". Why and where this BAD file descriptor comes from still needs to be investigated, but originally closing the BAD file descriptor should not crash the process, but just returns an error code for the reason. However, both of call stack after calling the system call, close(), are really weird to me and need someone to provide the answer if possible: 1. Where is libnoselect.so from? I locally synced source tree based on "caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.129.xml", but cannot find this shared library. Is there any patch file not available for us? 2. Why does libnoselect.so catch the system call, close(), and trigger an abortion further for this case? Originally, libc.so, not libnoselect.so, should be used after calling the system call wraper function, __wrap_close(). So, is there any extra patch files here? 3. Based on the following call stack, it seems that the different version of libnoselect.so are used because different functions are called to trigger an abortion. Can anyone tell me the purpose of this shared library(libnoselect.so)? Bug#1154411 0 plugin-container!mozalloc_abort [mozalloc_abort.cpp : 37 + 0x4] ^^^^^^^^^^^^^^^ 1 libnoselect.so!close [noselect.c : 89 + 0x3] 2 libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3] 3 libnss3.so!pt_Close [ptio.c : 1252 + 0x5] Bug#1154416 0 libc.so!tgkill + 0xc 1 libc.so!pthread_kill [pthread_kill.cpp : 49 + 0xb] 2 libc.so!raise [raise.cpp : 32 + 0x9] 3 libc.so!__libc_android_abort [abort.cpp : 47 + 0x5] 4 libc.so!abort + 0x6 ^^^^^ 5 libnoselect.so!close [noselect.c : 89 + 0x3] 6 libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3] 7 libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
In short, libnoselect.so is the root cause which results in process crash, but we have no idea for it so far. Need someone help to clarify the purpose of libnoselect.so first.
Flags: needinfo?(rhung)
Hi, Greg, Could you check comment 9 and give us some feedback if possible? Thank you.
Flags: needinfo?(ggrisco)
(In reply to Rex Hung[:rhung] from comment #9) > According to the minidump result, I though that both this bug and bug > 1154416 have the same issue: > "Try to close a BAD file descriptor". > > Why and where this BAD file descriptor comes from still needs to be > investigated, but originally closing the BAD file descriptor should not > crash the process, but just returns an error code for the reason. > > However, both of call stack after calling the system call, close(), are > really weird to me and need someone to provide the answer if possible: > 1. Where is libnoselect.so from? > I locally synced source tree based on > "caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.129.xml", but cannot find this > shared library. Is there any patch file not available for us? In internal builds, we are overriding the close() function and check for invalid file descriptors and abort() when this situation is detected. > 2. Why does libnoselect.so catch the system call, close(), and trigger an > abortion further for this case? Originally, libc.so, not libnoselect.so, > should be used after calling the system call wraper function, > __wrap_close(). So, is there any extra patch files here? > We do this in hopes that this will catch problems that would otherwise be much more difficult to debug. The change is an internal one, but basically we call the system close() and if the return value is < 0 and fd is not -1 and errno is not ENODEV then we abort. > 3. Based on the following call stack, it seems that the different version of > libnoselect.so are used because different functions are called to trigger an > abortion. Can anyone tell me the purpose of this shared > library(libnoselect.so)? > Bug#1154411 > 0 plugin-container!mozalloc_abort [mozalloc_abort.cpp : 37 + 0x4] > ^^^^^^^^^^^^^^^ > 1 libnoselect.so!close [noselect.c : 89 + 0x3] > 2 libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3] > 3 libnss3.so!pt_Close [ptio.c : 1252 + 0x5] > Bug#1154416 > 0 libc.so!tgkill + 0xc > 1 libc.so!pthread_kill [pthread_kill.cpp : 49 + 0xb] > 2 libc.so!raise [raise.cpp : 32 + 0x9] > 3 libc.so!__libc_android_abort [abort.cpp : 47 + 0x5] > 4 libc.so!abort + 0x6 > ^^^^^ > 5 libnoselect.so!close [noselect.c : 89 + 0x3] > 6 libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3] > 7 libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
Flags: needinfo?(ggrisco)
BTW, we recently turned this check back on (after discovering it had been turned off for some time). So this particular crash might not be catching a new regression.
Greg, any chance you would share your noselect library? I'd like moz to run our tests with it (or something similar). thanks!
Flags: needinfo?(fabrice) → needinfo?(ggrisco)
Sorry, Fabrice, but this library can't be shared. However, I've pretty much described the close() handling in comment 12 if you want to try to implement something similar.
Flags: needinfo?(ggrisco)
I confirmed it is a problem of ArrayBufferBuilder::mapToFileInPackage(). one fd is closed twice by the following. Then it seems regression of Bug 945152. > mozilla::AutoFDClose pr_fd; > mozilla::ScopedClose fd; https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLHttpRequest.cpp#4082
Flags: needinfo?(sotaro.ikeda.g)
By the following STR, I reproduced the problem. [1] Created ROM by applying attachment 8593608 [details] [diff] [review] and flash the ROM. It prevent file compression. mapToFileInPackage() is used only when zipItem is not compressed. https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLHttpRequest.cpp#4079 [2] Enable Chinese keyboard by the following. Setting -> Keyboards -> 拼音 [3] Start Browser and touch "search or enter address". Keyborad should pop up. [4] Change to Chinese keyboard and type something.
Greg, can you check if attachment 8593613 [details] [diff] [review] fix the problem?
Flags: needinfo?(ggrisco)
Ok, I'm bringing this in now, thanks. We'll probably won't see results until Monday.
Flags: needinfo?(ggrisco)
Thanks!
Blocks: 945152
(In reply to cafbot (PoC: ggrisco) from comment #26) > Created attachment 8593792 [details] > decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.132 It seems not applying attachment 8593613 [details] [diff] [review].
ni Greg per comment 27.
Flags: needinfo?(ggrisco)
(In reply to Sotaro Ikeda [:sotaro] from comment #27) > (In reply to cafbot (PoC: ggrisco) from comment #26) > > Created attachment 8593792 [details] > > decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.132 > > It seems not applying attachment 8593613 [details] [diff] [review]. We have to wait for AU 134. I thought it would be picked up for testing on Sunday, but there was a slight delay. I'll update you by tomorrow hopefully.
Flags: needinfo?(ggrisco)
Assignee: nobody → sotaro.ikeda.g
Hi Sotaro, We haven't seen this crash in a full day of testing on AU134, so I think we definitely want this fix, thanks.
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Greg Grisco from comment #31) > Hi Sotaro, > > We haven't seen this crash in a full day of testing on AU134, so I think we > definitely want this fix, thanks. Good! I am going to proceed to review.
Flags: needinfo?(sotaro.ikeda.g)
Attachment #8593613 - Flags: review?(bent.mozilla)
Comment on attachment 8593613 [details] [diff] [review] patch - Do not close same fd twice Review of attachment 8593613 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsXMLHttpRequest.cpp @@ +4124,1 @@ > mMapPtr = JS_CreateMappedArrayBufferContents(fd, offset, size); This looks fine, but can you just do: mozilla::AutoFDClose pr_fd; rv = aJarFile->OpenNSPRFileDesc(PR_RDONLY, 0, &pr_fd.rwget()); // ... mMapPtr = JS_CreateMappedArrayBufferContents(PR_FileDesc2NativeHandle(pr_fd), offset, size); Not having another fd on the stack seems simpler to reason about.
Attachment #8593613 - Flags: review?(bent.mozilla) → review+
And let's please backport this as far as we can.
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #34) > Comment on attachment 8593613 [details] [diff] [review] > patch - Do not close same fd twice > > Review of attachment 8593613 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: dom/base/nsXMLHttpRequest.cpp > @@ +4124,1 @@ > > mMapPtr = JS_CreateMappedArrayBufferContents(fd, offset, size); > > This looks fine, but can you just do: > > mozilla::AutoFDClose pr_fd; > rv = aJarFile->OpenNSPRFileDesc(PR_RDONLY, 0, &pr_fd.rwget()); > // ... > mMapPtr = > JS_CreateMappedArrayBufferContents(PR_FileDesc2NativeHandle(pr_fd), > offset, > size); > > Not having another fd on the stack seems simpler to reason about. Yeah, it is simpler. I'll update the patch.
Apply the comment. Carry "r=bent".
Attachment #8593607 - Attachment is obsolete: true
Attachment #8593608 - Attachment is obsolete: true
Attachment #8593613 - Attachment is obsolete: true
Attachment #8596721 - Flags: review+
swu, do you know if ArrayBufferBuilder::mapToFileInPackage() is used on desktop Firefox?
Flags: needinfo?(swu)
(In reply to Sotaro Ikeda [:sotaro] from comment #38) > swu, do you know if ArrayBufferBuilder::mapToFileInPackage() is used on > desktop Firefox? No, it's only enabled on B2G. https://dxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#1094
Flags: needinfo?(swu)
(In reply to Shian-Yow Wu [:swu] from comment #40) > > No, it's only enabled on B2G. > https://dxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#1094 Thanks!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S11 (1may)
Comment on attachment 8596721 [details] [diff] [review] patch - Do not close same fd twice NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 945152 User impact if declined: System could cause a crash with unexpected reasons. Testing completed: locally tested and codeaurora did tests. Risk to taking this patch (and alternatives if risky): low risk. String or UUID changes made by this patch: none.
Attachment #8596721 - Flags: approval-mozilla-b2g37?
Attachment #8596721 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: