Closed Bug 1264209 Opened 8 years ago Closed 5 years ago

crash in ReceivePort::WaitForMessage | mozilla::ipc::SharedMemoryBasic::ShareToProcess

Categories

(Core :: WebRTC, defect, P3)

48 Branch
Unspecified
macOS
defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox45 --- unaffected
firefox46 --- unaffected
firefox47 --- unaffected
firefox48 --- affected

People

(Reporter: adalucinet, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-72ccc986-cba3-46bf-8b73-e37c82160413.
=============================================================
[Note]: logged as requested in bug 1263929 comment 1

[Affected versions]: latest Nightly 48.0a1 (from 2016-04-12), e10s enabled/disabled

[Affected platforms]: Mac OS X 10.10.5

[Steps to reproduce]:
1. Start Firefox, visit a webpage and click the Hello icon (chat bubble with smiley face).
2. Send a copy of the room link to another tab/browser/PC and join the conversation.
3. Close the conversation on the host side.

[Regression range]:
- not reproducible with the latest Aurora 47.0a2 or with 46 beta 10; will investigate further.

[Additional notes]:
1. Unable to reproduce under Windows 8.1 x86 or Ubuntu 12.04 x64.
2. Using the same STR also encountered the signature from bug 1263667.
3. Note that with STR from bug 1260702 comment 0, this crash is no longer reproducible with latest 48.0a1, under Mac OS X 10.10.5.
4. More reports:
https://crash-stats.mozilla.com/signature/?signature=ReceivePort%3A%3AWaitForMessage+%7C+mozilla%3A%3Aipc%3A%3ASharedMemoryBasic%3A%3AShareToProcess&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
Crashes in CamerasParent in some shmem/ipc code - perhaps the ICE Restart bug she was looking at/for killed the Content process, and that hit a race condition in the parent IPC/shmem code?
Rank: 23
Flags: needinfo?(gpascutto)
Priority: -- → P2
(In reply to Randell Jesup [:jesup] from comment #1)
> Crashes in CamerasParent in some shmem/ipc code - perhaps the ICE Restart
> bug she was looking at/for killed the Content process, and that hit a race
> condition in the parent IPC/shmem code?

Yep. The IPC code hangs because one side (which has already crashed) isn't responding. (An interesting question is whether it recovers when we restart content, but that's maybe another bug.)
Flags: needinfo?(gpascutto)
Has CamerasParent::ActorDestroy already been called? This shouldn't crash if ActorDestroy hasn't happened yet, and is illegal (will probably crash) after ActorDestroy. Can you check and move this to the IPC component if we're crashing on a valid call pattern?
This looks like a result of Hello sending iceRestart=true on every create offer (including the first one).  Current nightly has a fix that should catch this case (Bug 1264344).
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> Has CamerasParent::ActorDestroy already been called? This shouldn't crash if
> ActorDestroy hasn't happened yet, and is illegal (will probably crash) after
> ActorDestroy. Can you check and move this to the IPC component if we're
> crashing on a valid call pattern?

It's right in the middle of an active conversation. The line that's in the stacks is one that can't crash (not an uncommon issue on our Mac OS X backtraces IIRC).
I can reproduce this bug (at least I think it is this bug) on demand in 49.0.1 stable. https://crash-stats.mozilla.com/report/index/3ef2016d-1c5a-469e-8498-20ebb2161012
Steps to reproduce would be useful in that case :-)
STR:
1. Use https://gist.github.com/tjsail33/712494b8f1f18849f5fea95c82b6b60d to create a RecordRTC based webcam recorder.
2. Use https://gist.github.com/tjsail33/ae3bcc5c34f8525e2b43c0b618919786 as your uploader to S3
3. Use https://gist.github.com/tjsail33/d521e5348b0f0acd5abfc5106e8cd975 as your angular view controller
4. Open webpage, run `recorder.initRTC(); recorder.startRTCRecording(); setTimeout(recorder.stopRTCRecording, 5000);`
5. Upload should trigger based on event bubbling
6. FF window crashes

I am not sure whether the crash is caused by the upload or by the end of the recording. If you need a fully functional example I can set one up, but due to company IP stuff it may take a day or two.
A quick update - upon more research I can confirm that the crash only occurs when attempting to upload the Blob to S3 via a POST request. Here are the headers from the equivalent actions in Chrome, as I can't see them in FF.

--------------------------
Request:
OPTIONS / HTTP/1.1
Host: my-bucket.s3.amazonaws.com
Connection: keep-alive
Access-Control-Request-Method: POST
Origin: http://localhost:7001
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36
Access-Control-Request-Headers:
Accept: */*
Referer: http://localhost:7001/
Accept-Encoding: gzip, deflate, sdch, br
Accept-Language: en-US,en;q=0.8

Response:
HTTP/1.1 200 OK
x-amz-id-2: FFx+DTO/pLltjo5hzIKi3cudWihpawXVA8Hobau0GFIkYp/Uht709voZ60tvhAaOZqNvFMFyo3Y=
x-amz-request-id: 6BDB28072D275909
Date: Thu, 13 Oct 2016 00:44:06 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT
Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
Content-Length: 0
Server: AmazonS3

----------------------------
Request:
POST / HTTP/1.1
Host: my-bucket.s3.amazonaws.com
Connection: keep-alive
Content-Length: 637847
Origin: http://localhost:7001
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryCehnZKhu2Nz03urz
Accept: */*
Referer: http://localhost:7001/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8

Response:
HTTP/1.1 200 OK
x-amz-id-2: 1j4fn7fqhbdj1t8MmGHBHVuTxxEo+47T/ZDA8N5aiTZpenyRswB8JZh9yF1GP6DZSuI6X5F8Qq8=
x-amz-request-id: F61FB6E79713C067
Date: Thu, 13 Oct 2016 00:44:07 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT
Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
ETag: "039824c5a79f5cd780b5cc7e221e9849"
Location: https://my-bucket.s3.amazonaws.com/files-test%2Fbdeba4ab-4fa0-4fee-8e94-b71cbecf2c98%2Fwebcam.webm
Content-Length: 0
Server: AmazonS3



------------------

Not sure which of these two requests is the trigger.

Docs for the upload function being used:
http://docs.fineuploader.com/branch/master/api/methods.html#addFiles

Which goes to https://github.com/FineUploader/fine-uploader/blob/c1b4954f21381dbd664e379abf571c988e2265c0/client/js/uploader.basic.api.js#L22

Which goes to https://github.com/FineUploader/fine-uploader/blob/c1b4954f21381dbd664e379abf571c988e2265c0/client/js/uploader.basic.api.js#L1010

Fine Uploader is reporting that it gets to "Sending upload request for 0" while in debug mode. Here's a screencast of my console: https://www.dropbox.com/s/f938m7sdt3pj7mb/ff-upload-crash.mov?dl=0

The crash occurs around 23s right after this log is printed: https://github.com/FineUploader/fine-uploader/blob/c1b4954f21381dbd664e379abf571c988e2265c0/client/js/s3/s3.xhr.upload.handler.js#L311

Hopefully this information is helpful - looks like a bug in the XHR handling of RTC blobs.
STR from bug 1310441:

Use MediaRecorder to record a short video, then upload it via XMLHttpRequest.

You can try it out here:
https://dl.dropboxusercontent.com/u/2378440/Ziggeo/Tests/firefox-crash.html
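
For readers without access to the linked page, the STR above (record with MediaRecorder, then upload via XMLHttpRequest) could be sketched roughly as follows. This is an illustration only, not the code from the test page: `recordShortClip`, `uploadBlob`, `UPLOAD_URL` and the 5-second duration are made-up names and values, and it is not confirmed that this exact shape triggers the crash.

```javascript
// Sketch of the STR: record a short clip, then POST the resulting Blob via XHR.
// UPLOAD_URL is a placeholder, not an endpoint from the bug report.
const UPLOAD_URL = "https://example.invalid/upload";

// Hypothetical helper: record camera and microphone for durationMs, resolve with a Blob.
function recordShortClip(durationMs) {
  return navigator.mediaDevices
    .getUserMedia({ video: true, audio: true })
    .then((stream) => new Promise((resolve, reject) => {
      const chunks = [];
      const recorder = new MediaRecorder(stream);
      // Each dataavailable event carries a Blob with recorded media.
      recorder.ondataavailable = (event) => chunks.push(event.data);
      recorder.onerror = () => reject(new Error("recording failed"));
      recorder.onstop = () => {
        // Release the camera and microphone before handing back the recording.
        stream.getTracks().forEach((track) => track.stop());
        resolve(new Blob(chunks, { type: recorder.mimeType }));
      };
      recorder.start();
      setTimeout(() => recorder.stop(), durationMs);
    }));
}

// Hypothetical helper: POST the Blob as the request body, resolve with the HTTP status.
function uploadBlob(blob, url) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open("POST", url);
    xhr.onload = () => resolve(xhr.status);
    xhr.onerror = () => reject(new Error("upload failed"));
    xhr.send(blob);
  });
}

// In a browser page:
// recordShortClip(5000).then((blob) => uploadBlob(blob, UPLOAD_URL));
```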

I can work around the bug by copying the blob into a new blob via an intermediate ArrayBuffer and then uploading the copy.
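
The blob-copying step described above might look roughly like the sketch below. The helper name `copyBlobViaArrayBuffer` is hypothetical. The sketch also assumes `Blob.prototype.arrayBuffer()`, which is newer than the Firefox versions discussed in this bug; on those versions the bytes would have to be read with `FileReader.readAsArrayBuffer()` instead.

```javascript
// Hypothetical helper (not from the bug report): re-create a Blob from its own bytes.
async function copyBlobViaArrayBuffer(blob) {
  // Read all bytes of the original Blob into an in-memory ArrayBuffer.
  const buffer = await blob.arrayBuffer();
  // Build a new Blob from those bytes, keeping the original MIME type.
  return new Blob([buffer], { type: blob.type });
}
```

The copy, rather than the blob produced by the recorder, would then be handed to `xhr.send()`.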

There's a testcase in attachment 8801536.
Is there any chance this is related to bug 1167730? (which was recently fixed, and then uplifted to beta a couple of weeks ago).
Just experienced this crash when someone joined my WebRTC call on meet.jit.si
https://crash-stats.mozilla.com/report/index/caeacb23-66aa-4296-8a4e-a75532170410
(In reply to Nils Ohlmeier [:drno] from comment #13)
> Just experienced this crash when someone joined my WebRTC call on meet.jit.si
> https://crash-stats.mozilla.com/report/index/caeacb23-66aa-4296-8a4e-a75532170410

What potentially happened in my case is that this crash https://crash-stats.mozilla.com/report/index/f24b91e9-4382-4fb4-8681-7b74d2170410 first brought down the content process, after which the whole browser was taken down by the crash tracked in this bug.
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
See Also: → 1409167
Blocks: meet
See Also: → 1443102
See Also: → 1461813
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX