Stability test uncovered attempted to close invalid file descriptor

RESOLVED FIXED in Firefox 40

Status

defect
--
critical
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: ggrisco, Assigned: sotaro)

Tracking

({crash})

unspecified
2.2 S11 (1may)
ARM
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:2.2+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.1 unaffected, b2g-v2.1S unaffected, b2g-v2.2 fixed, b2g-master fixed)

Details

(Whiteboard: [b2g-crash][caf-crash 336][caf priority: p1][CR 822157], crash signature)

Attachments

(7 attachments, 3 obsolete attachments)

Reporter

Description

4 years ago
CAF debug code checks for attempt to close invalid fd and is aborting when detecting this.  This instance happened during stability test.

cafbot will upload the logs.
Whiteboard: [CR 822157]
Whiteboard: [CR 822157] → [caf priority: p1][CR 822157]
Whiteboard: [caf priority: p1][CR 822157] → [b2g-crash][caf-crash 336][caf priority: p1][CR 822157]
Keywords: crash
sotaro/:fabrice, You guys helped in the past on identifying where the invalid fd close is happening, hence NI'ing you for help. Apparently, CAF has seen abort this ~14 times in the last run.
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(fabrice)
See Also: → 1154416
Assignee

Comment 7

4 years ago
Okey, tomorrow, I am going to check if the crash has an useful information.

Comment 8

4 years ago
Hi! Rex,

Need your help to take a look first. Thanks

--
Keven
Flags: needinfo?(rhung)

Comment 9

4 years ago
According to the minidump result, I though that both this bug and bug 1154416 have the same issue:
"Try to close a BAD file descriptor".

Why and where this BAD file descriptor comes from still needs to be investigated, but originally closing the BAD file descriptor should not crash the process, but just returns an error code for the reason.

However, both of call stack after calling the system call, close(), are really weird to me and need someone to provide the answer if possible:
1. Where is libnoselect.so from?
I locally synced source tree based on "caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.129.xml", but cannot find this shared library. Is there any patch file not available for us?

2. Why does libnoselect.so catch the system call, close(), and trigger an abortion further for this case? Originally, libc.so, not libnoselect.so, should be used after calling the system call wraper function, __wrap_close(). So, is there any extra patch files here?

3. Based on the following call stack, it seems that the different version of libnoselect.so are used because different functions are called to trigger an abortion. Can anyone tell me the purpose of this shared library(libnoselect.so)?
Bug#1154411
 0  plugin-container!mozalloc_abort [mozalloc_abort.cpp : 37 + 0x4]
                    ^^^^^^^^^^^^^^^
 1  libnoselect.so!close [noselect.c : 89 + 0x3]
 2  libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3]
 3  libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
Bug#1154416
 0  libc.so!tgkill + 0xc
 1  libc.so!pthread_kill [pthread_kill.cpp : 49 + 0xb]
 2  libc.so!raise [raise.cpp : 32 + 0x9]
 3  libc.so!__libc_android_abort [abort.cpp : 47 + 0x5]
 4  libc.so!abort + 0x6
            ^^^^^
 5  libnoselect.so!close [noselect.c : 89 + 0x3]
 6  libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3]
 7  libnss3.so!pt_Close [ptio.c : 1252 + 0x5]

Comment 10

4 years ago
In short, libnoselect.so is the root cause which results in process crash, but we have no idea for it so far. Need someone help to clarify the purpose of libnoselect.so first.
Flags: needinfo?(rhung)

Comment 11

4 years ago
Hi, Greg,

Could you check comment 9 and give us some feedback if possible? Thank you.
Flags: needinfo?(ggrisco)
Reporter

Comment 12

4 years ago
(In reply to Rex Hung[:rhung] from comment #9)
> According to the minidump result, I though that both this bug and bug
> 1154416 have the same issue:
> "Try to close a BAD file descriptor".
> 
> Why and where this BAD file descriptor comes from still needs to be
> investigated, but originally closing the BAD file descriptor should not
> crash the process, but just returns an error code for the reason.
>
> However, both of call stack after calling the system call, close(), are
> really weird to me and need someone to provide the answer if possible:
> 1. Where is libnoselect.so from?
> I locally synced source tree based on
> "caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.129.xml", but cannot find this
> shared library. Is there any patch file not available for us?

In internal builds, we are overriding the close() function and check for invalid file descriptors and abort() when this situation is detected.  

> 2. Why does libnoselect.so catch the system call, close(), and trigger an
> abortion further for this case? Originally, libc.so, not libnoselect.so,
> should be used after calling the system call wraper function,
> __wrap_close(). So, is there any extra patch files here?
> 

We do this in hopes that this will catch problems that would otherwise be much more difficult to debug.  The change is an internal one, but basically we call the system close() and if the return value is < 0 and fd is not -1 and errno is not ENODEV then we abort.

> 3. Based on the following call stack, it seems that the different version of
> libnoselect.so are used because different functions are called to trigger an
> abortion. Can anyone tell me the purpose of this shared
> library(libnoselect.so)?
> Bug#1154411
>  0  plugin-container!mozalloc_abort [mozalloc_abort.cpp : 37 + 0x4]
>                     ^^^^^^^^^^^^^^^
>  1  libnoselect.so!close [noselect.c : 89 + 0x3]
>  2  libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3]
>  3  libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
> Bug#1154416
>  0  libc.so!tgkill + 0xc
>  1  libc.so!pthread_kill [pthread_kill.cpp : 49 + 0xb]
>  2  libc.so!raise [raise.cpp : 32 + 0x9]
>  3  libc.so!__libc_android_abort [abort.cpp : 47 + 0x5]
>  4  libc.so!abort + 0x6
>             ^^^^^
>  5  libnoselect.so!close [noselect.c : 89 + 0x3]
>  6  libmozglue.so!__wrap_close [Nuwa.cpp : 1360 + 0x3]
>  7  libnss3.so!pt_Close [ptio.c : 1252 + 0x5]
Flags: needinfo?(ggrisco)
Reporter

Comment 13

4 years ago
BTW, we recently turned this check back on (after discovering it had been turned off for some time).  So this particular crash might not be catching a new regression.
Greg, any chance you would share your noselect library? I'd like moz to run our tests with it (or something similar). thanks!
Flags: needinfo?(fabrice) → needinfo?(ggrisco)
Reporter

Comment 15

4 years ago
Sorry, Fabrice, but this library can't be shared.  However, I've pretty much described the close() handling in comment 12 if you want to try to implement something similar.
Flags: needinfo?(ggrisco)
Assignee

Comment 16

4 years ago
I confirmed it is a problem of ArrayBufferBuilder::mapToFileInPackage(). one fd is closed twice by the following. Then it seems regression of Bug 945152.

>  mozilla::AutoFDClose pr_fd;
>  mozilla::ScopedClose fd;

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLHttpRequest.cpp#4082
Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 18

4 years ago
Assignee

Comment 19

4 years ago
By the following STR, I reproduced the problem.

[1] Created ROM by applying attachment 8593608 [details] [diff] [review] and flash the ROM.
      It prevent file compression. mapToFileInPackage() is used only when zipItem is not compressed.
      https://dxr.mozilla.org/mozilla-central/source/dom/base/nsXMLHttpRequest.cpp#4079
[2]  Enable Chinese keyboard by the following.
     Setting -> Keyboards -> 拼音
[3] Start Browser and touch "search or enter address".
     Keyborad should pop up.
[4] Change to Chinese keyboard and type something.
Assignee

Comment 20

4 years ago
Assignee

Comment 21

4 years ago
Greg, can you check if attachment 8593613 [details] [diff] [review] fix the problem?
Flags: needinfo?(ggrisco)
Reporter

Comment 22

4 years ago
Ok, I'm bringing this in now, thanks.  We'll probably won't see results until Monday.
Flags: needinfo?(ggrisco)
Assignee

Comment 23

4 years ago
Thanks!
Assignee

Updated

4 years ago
Blocks: 945152
Assignee

Comment 27

4 years ago
(In reply to cafbot (PoC: ggrisco) from comment #26)
> Created attachment 8593792 [details]
> decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.132

It seems not applying attachment 8593613 [details] [diff] [review].

Comment 28

4 years ago
ni Greg per comment 27.
Flags: needinfo?(ggrisco)
Reporter

Comment 29

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> (In reply to cafbot (PoC: ggrisco) from comment #26)
> > Created attachment 8593792 [details]
> > decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.132
> 
> It seems not applying attachment 8593613 [details] [diff] [review].

We have to wait for AU 134.  I thought it would be picked up for testing on Sunday, but there was a slight delay.  I'll update you by tomorrow hopefully.
Flags: needinfo?(ggrisco)

Updated

4 years ago
Assignee: nobody → sotaro.ikeda.g

Updated

4 years ago
Duplicate of this bug: 1154416
Reporter

Comment 31

4 years ago
Hi Sotaro,

We haven't seen this crash in a full day of testing on AU134, so I think we definitely want this fix, thanks.
Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 32

4 years ago
(In reply to Greg Grisco from comment #31)
> Hi Sotaro,
> 
> We haven't seen this crash in a full day of testing on AU134, so I think we
> definitely want this fix, thanks.

Good! I am going to proceed to review.
Flags: needinfo?(sotaro.ikeda.g)
Assignee

Updated

4 years ago
Attachment #8593613 - Flags: review?(bent.mozilla)
Comment on attachment 8593613 [details] [diff] [review]
patch - Do not close same fd twice

Review of attachment 8593613 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsXMLHttpRequest.cpp
@@ +4124,1 @@
>      mMapPtr = JS_CreateMappedArrayBufferContents(fd, offset, size);

This looks fine, but can you just do:

  mozilla::AutoFDClose pr_fd;
  rv = aJarFile->OpenNSPRFileDesc(PR_RDONLY, 0, &pr_fd.rwget());
  // ...
  mMapPtr =
    JS_CreateMappedArrayBufferContents(PR_FileDesc2NativeHandle(pr_fd),
                                       offset,
                                       size);

Not having another fd on the stack seems simpler to reason about.
Attachment #8593613 - Flags: review?(bent.mozilla) → review+
And let's please backport this as far as we can.
Assignee

Comment 36

4 years ago
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #34)
> Comment on attachment 8593613 [details] [diff] [review]
> patch - Do not close same fd twice
> 
> Review of attachment 8593613 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/base/nsXMLHttpRequest.cpp
> @@ +4124,1 @@
> >      mMapPtr = JS_CreateMappedArrayBufferContents(fd, offset, size);
> 
> This looks fine, but can you just do:
> 
>   mozilla::AutoFDClose pr_fd;
>   rv = aJarFile->OpenNSPRFileDesc(PR_RDONLY, 0, &pr_fd.rwget());
>   // ...
>   mMapPtr =
>     JS_CreateMappedArrayBufferContents(PR_FileDesc2NativeHandle(pr_fd),
>                                        offset,
>                                        size);
> 
> Not having another fd on the stack seems simpler to reason about.

Yeah, it is simpler. I'll update the patch.
Assignee

Comment 37

4 years ago
Apply the comment. Carry "r=bent".
Attachment #8593607 - Attachment is obsolete: true
Attachment #8593608 - Attachment is obsolete: true
Attachment #8593613 - Attachment is obsolete: true
Attachment #8596721 - Flags: review+
Assignee

Comment 38

4 years ago
swu, do you know if ArrayBufferBuilder::mapToFileInPackage() is used on desktop Firefox?
Flags: needinfo?(swu)
(In reply to Sotaro Ikeda [:sotaro] from comment #38)
> swu, do you know if ArrayBufferBuilder::mapToFileInPackage() is used on
> desktop Firefox?

No, it's only enabled on B2G.
https://dxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#1094
Flags: needinfo?(swu)
Assignee

Comment 41

4 years ago
(In reply to Shian-Yow Wu [:swu] from comment #40)
> 
> No, it's only enabled on B2G.
> https://dxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#1094

Thanks!
https://hg.mozilla.org/mozilla-central/rev/cb6dc5904aaf
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S11 (1may)
Assignee

Comment 44

4 years ago
Comment on attachment 8596721 [details] [diff] [review]
patch - Do not close same fd twice

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 945152
User impact if declined: System could cause a crash with unexpected reasons.
Testing completed: locally tested and codeaurora did tests.
Risk to taking this patch (and alternatives if risky): low risk.
String or UUID changes made by this patch: none.
Attachment #8596721 - Flags: approval-mozilla-b2g37?
Attachment #8596721 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
You need to log in before you can comment on or make changes to this bug.