Closed Bug 918372 Opened 11 years ago Closed 11 years ago

Repeated WebRTC sessions exhaust the JNI LocalRef table

Categories

(Core :: WebRTC: Audio/Video, defect)

ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox26 --- fixed
firefox27 --- fixed

People

(Reporter: gcp, Assigned: gcp)

References

Details

(Keywords: verifyme)

Attachments

(4 files, 5 obsolete files)

2.25 KB, patch
blassey
: review+
Details | Diff | Splinter Review
10.56 KB, patch
blassey
: review+
Details | Diff | Splinter Review
5.74 KB, patch
blassey
: review+
Details | Diff | Splinter Review
29.63 KB, patch
blassey
: review+
Details | Diff | Splinter Review
Filing this based on the issue discovered here: https://bugzilla.mozilla.org/show_bug.cgi?id=902431#c27 It may or may not be the cause of that bug, so filing separately.
Blocks: 902431
The problem is this code, which is in any WebRTC entry/setup point: http://dxr.mozilla.org/mozilla-central/source/content/media/webrtc/MediaEngineWebRTC.cpp#l234 We attach the JNI there, from what I can tell only because we want to be able to delete the GlobalRef, but then don't detach. The upstream webrtc.org code all assumes it can Attach and Detach to the JVM and doesn't need to clean up LocalRefs because Detaching cleans them up anyway. It's also smart enough to check if it's already attached, and sets a flag and avoids the detach in that case. The above code doesn't detach, which means we're always attached, never clean up LocalRefs, and eventually die after enough WebRTC usage.
There's several possible things to do here: 1) Change the GetGlobalContext thing to make a single global ref and keep a hold of it for subsequent callers, and forget about cleaning it up. This makes the attach/detach unnecessary. 2) There's some argument if we want to keep the webrtc.org behavior of doing constant attach/detatch, or clean it up to do single attach at the start and keep it attached afterwards, much like AndroidBridge is doing. This does mean we have to clean up all local refs, using PushLocalFrame, possibly coupled with a check if we're attached. I guess we can change the existing AttachAndUseAndroidDeviceInfoObjects and Detach equivalent to use Push/PopLocal if we're already attached.
Assignee: nobody → gpascutto
Attachment #807386 - Flags: review?(blassey.bugs)
Attachment #807388 - Flags: review?(blassey.bugs)
Attachment #807390 - Flags: review?(blassey.bugs)
Attachment #807391 - Flags: review?(blassey.bugs)
Comment on attachment 807391 [details] [diff] [review] Patch 4. Use JNI frames when not attaching/detaching Review of attachment 807391 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/trunk/webrtc/modules/video_capture/android/video_capture_android.cc @@ +255,5 @@ > + int res = env->PushLocalFrame(64); > + if (res != 0) { > + WEBRTC_TRACE(webrtc::kTraceError, webrtc::kTraceVideoCapture, -1, > + "%s: Could not push local stack frame"); > + } you shouldn't be pushing and popping local frames across functions. What references are you trying to hold onto with this frame? Really they should just have a global reference.
Attachment #807391 - Flags: review?(blassey.bugs) → review-
Attachment #807386 - Flags: review?(blassey.bugs) → review+
Attachment #807390 - Flags: review?(blassey.bugs) → review+
Comment on attachment 807388 [details] [diff] [review] Patch 2. Keep a single global Context reference around Review of attachment 807388 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/android/AndroidBridge.cpp @@ +1436,5 @@ > jobject > AndroidBridge::GetGlobalContextRef() { > + if (sGlobalContext) { > + return sGlobalContext; > + } else { this needs to be done per thread. I think you may be only calling this on one thread, but if that is the case record the thread id and assert that (I'd be fine with that being debug only)
Attachment #807388 - Flags: review?(blassey.bugs) → review-
>I think you may be only calling this on one thread, but if that is the case record the thread id and assert that (I'd be fine with that being debug only) I would've presumed that too, but I added the assert and BOOM: http://dxr.mozilla.org/mozilla-central/source/dom/media/MediaManager.cpp#l1336
Comment on attachment 807388 [details] [diff] [review] Patch 2. Keep a single global Context reference around Review of attachment 807388 [details] [diff] [review]: ----------------------------------------------------------------- As discussed on IRC: http://android-developers.blogspot.ca/2011/11/jni-local-reference-changes-in-ics.html " A global is usable from any thread, using that thread’s JNIEnv*, and is valid until an explicit call to DeleteGlobalRef()." This means it's okay to pass the GlobalRef'ed GlobalContext across threads. The other path in the code is using GetJNIForThread so it should use the correct JNIEnv. <blassey> then you're good <blassey> comment in the bug telling me I'm wrong <blassey> and r+
Attachment #807388 - Flags: review- → review+
We don't have any global references we want to keep, but the upstream code assumes it can Attach/Detach to free up LocalRefs. In quite a few (but not all - thanks to the GlobalRef caching of the previous patch) of our calling sequences we will actually attach the JVM already, which makes this code not do the attach/detach cycle. This patch adds an AutoLocalJNIFrame that does the right thing in both circumstances, by noting when the Detach will work and being a NOP in that case, or making a frame if it won't.
Attachment #807391 - Attachment is obsolete: true
Attachment #807902 - Flags: review?(blassey.bugs)
Comment on attachment 807902 [details] [diff] [review] Patch 4. v2 Use JNI frames when not attaching/detaching Review of attachment 807902 [details] [diff] [review]: ----------------------------------------------------------------- As discussed, let's use a RAII pattern for attaching and detaching the current thread. You can call GetEnv() to get the JNIEnv, if it returns JNI_EDETACHED, then call AttachCurrentThread() to get the JNIEnv. If you have to attach, then set a member variable so you can detach in the destructor. Whether we attach the thread or not, we should push a local frame (it allows us to catch and handle exceptions as well as clean up local refs). Just need to make sure we push after attaching and pop before detaching. Finally, this patch should be upstreamed
Attachment #807902 - Flags: review?(blassey.bugs) → review-
Comment on attachment 807388 [details] [diff] [review] Patch 2. Keep a single global Context reference around Review of attachment 807388 [details] [diff] [review]: ----------------------------------------------------------------- realized that there is another problem with this. Our context can change (and will change if the phone rotates for instance). Keeping a global reference to that context will keep a stale GeckoActivity in memory along with anything it has references to as well as leave us calling functions on the wrong context. The best option may be to have GeckoApp call a native method to set the context whenever it changes and store that statically with a global reference. If the static context is non null when we set it, that reference needs to be deleted. Setting the static context from the main ui thread will have the added benefit of not calling GetJNIForThread and attaching the thread, so we can run as the upstream code expects.
Attachment #807388 - Flags: review+ → review-
Attachment #807902 - Attachment is obsolete: true
Attachment #808748 - Flags: review?(blassey.bugs)
>Our context can change (and will change if the phone rotates for instance) This is not true because we deal with the rotation event ourselves and don't reinitialize. However I'm less certain that applies to GeckoView users. >The best option may be to have GeckoApp call a native method to set the context whenever it changes and store that statically with a global reference Alternatively we should use the Application context for GetGlobalContextRef. For your solution it's not so obvious to me what "whenever it changes is", whereas the Application Context is exactly there for things that need a long-lived reference. >Setting the static context from the main ui thread will have the added benefit of not calling GetJNIForThread and attaching the thread, so we can run as the upstream code expects. At the cost of adding locking everywhere, as the webrtc.org code is not expecting you to change the context it's operating on under it's feet...
Comment on attachment 808748 [details] [diff] [review] Patch 4. v3 Use RAII for all attach/detach/localframes in Android Video Review of attachment 808748 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/trunk/webrtc/modules/video_capture/android/video_capture_android.h @@ +80,5 @@ > + } > + > + if (VideoCaptureAndroid::g_jvm->GetEnv((void**) &mJNIEnv, JNI_VERSION_1_4) != JNI_OK) { > + // Try to attach the thread and get the env. > + // Attach this thread to JVM. should probably only attach if the return value is JNI_EDETACHED, not sure what the beviour will be if another error is returned @@ +108,5 @@ > + } > + > + int Init(JNIEnv*& env, > + jclass& javaCmDevInfoClass, > + jobject& javaCmDevInfoObject) I'm not a fan of these public Init() functions. You should be able to pass a JNIEnv into the constructor (though I don't think there will be any callers in the WebRTC code). But please get rid of the out params on the Init/constructors. Add public GetJNIEnv(), GetCmDevInfoClass() and GetCmDevInfoObject() getters.
Attachment #808748 - Flags: review?(blassey.bugs) → review+
Switch to getApplicationContext().
Attachment #807388 - Attachment is obsolete: true
Attachment #808851 - Flags: review?(blassey.bugs)
Add some error detection/logging.
Attachment #808851 - Attachment is obsolete: true
Attachment #808851 - Flags: review?(blassey.bugs)
Attachment #808858 - Flags: review?(blassey.bugs)
The review comments wrt. passing env to the constructor vs. removing the out params was a bit mutually exclusive, so passing final version for review. We use env as the out parameter and to indicate an error from the constructor. GetCmDevInfo* are a bit strange to have in AutoLocalFrame now, though I guess for device_info_android.* them being in AutoLocalFrame does mean that you can't actually use them without a valid env+being attached.
Attachment #808748 - Attachment is obsolete: true
Attachment #808882 - Flags: review?(blassey.bugs)
Comment on attachment 808882 [details] [diff] [review] Patch 4. v4 Use RAII for all attach/detach/localframes in Android Video Review of attachment 808882 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/trunk/webrtc/modules/video_capture/android/video_capture_android.h @@ +160,5 @@ > + __FUNCTION__); > + } > + > + // Scope will detach JVM when finished, no need to clean up. > + if (!mAttached) This line should go, see previous comment: "Whether we attach the thread or not, we should push a local frame (it allows us to catch and handle exceptions as well as clean up local refs)."
Comment on attachment 808882 [details] [diff] [review] Patch 4. v4 Use RAII for all attach/detach/localframes in Android Video Review of attachment 808882 [details] [diff] [review]: ----------------------------------------------------------------- r=blassey with nits addressed ::: media/webrtc/trunk/webrtc/modules/video_capture/android/device_info_android.cc @@ +56,5 @@ > uint32_t DeviceInfoAndroid::NumberOfDevices() { > JNIEnv *env; > + AutoLocalJNIFrame JNIFrame(env); > + if (!env) > + return 0; I'd really rather this be: AutoLocalJNIFrame jniFrame(); JNIEnv* env = jniFrame.GetJNIEnv(); if (!env) return 0; reasoning being that out params in constructors are really weird. It is also the same signature as the implementation in AndroidBridge but with a different meaning. Please make that change through out the patch. Also note the lower case jni, having it upper case makes it look like a class ::: media/webrtc/trunk/webrtc/modules/video_capture/android/video_capture_android.h @@ +137,5 @@ > + if (!VideoCaptureAndroid::g_jvm) { > + WEBRTC_TRACE(webrtc::kTraceError, webrtc::kTraceVideoCapture, -1, > + "%s: SetAndroidObjects not called with a valid JVM.", > + __FUNCTION__); > + return -1; null out mJNIEnv on errors, or better yet start out by nulling it out.
Attachment #808882 - Flags: review?(blassey.bugs) → review+
Attachment #808858 - Flags: review?(blassey.bugs) → review+
STR for QA: On a Nexus 4: 1) Go to http://www.simpl.info/getusermedia/ 2) Approve WebRTC permission dialog 3) Observe video starting 4) Reload site Unpatched Firefox will crash after about 10 iterations. Patched Firefox shouldn't crash before you get bored.
Keywords: verifyme
Target Milestone: mozilla27 → ---
Target Milestone: --- → mozilla27
Comment on attachment 807386 [details] [diff] [review] Patch 1. Add some debug assertions [Approval Request Comment] Bug caused by (feature/regressing bug #): Android WebRTC User impact if declined: WebRTC crash after many WebRTC sessions Testing completed (on m-c, etc.): Landed on m-c Risk to taking this patch (and alternatives if risky): The whole patchstack is a reasonable amount of changes, all of which remove technical debt/clean things up. At worst it will crash more.
Attachment #807386 - Flags: approval-mozilla-aurora?
Comment on attachment 807390 [details] [diff] [review] Patch 3. Add more logging for WebRTC startup [Approval Request Comment] Bug caused by (feature/regressing bug #): Android WebRTC User impact if declined: WebRTC crash after many WebRTC sessions Testing completed (on m-c, etc.): Landed on m-c Risk to taking this patch (and alternatives if risky): The whole patchstack is a reasonable amount of changes, all of which remove technical debt/clean things up. At worst it will crash more.
Attachment #807390 - Flags: approval-mozilla-aurora?
Comment on attachment 808858 [details] [diff] [review] Patch 2. v3 Keep a single global Context reference around [Approval Request Comment] Bug caused by (feature/regressing bug #): Android WebRTC User impact if declined: WebRTC crash after many WebRTC sessions Testing completed (on m-c, etc.): Landed on m-c Risk to taking this patch (and alternatives if risky): The whole patchstack is a reasonable amount of changes, all of which remove technical debt/clean things up. At worst it will crash more.
Attachment #808858 - Flags: approval-mozilla-aurora?
Comment on attachment 808882 [details] [diff] [review] Patch 4. v4 Use RAII for all attach/detach/localframes in Android Video [Approval Request Comment] Bug caused by (feature/regressing bug #): Android WebRTC User impact if declined: WebRTC crash after many WebRTC sessions Testing completed (on m-c, etc.): Landed on m-c Risk to taking this patch (and alternatives if risky): The whole patchstack is a reasonable amount of changes, all of which remove technical debt/clean things up. At worst it will crash more.
Attachment #808882 - Flags: approval-mozilla-aurora?
(In reply to Gian-Carlo Pascutto (:gcp) from comment #28) > Comment on attachment 808882 [details] [diff] [review] > Patch 4. v4 Use RAII for all attach/detach/localframes in Android Video > > [Approval Request Comment] > Bug caused by (feature/regressing bug #): Android WebRTC > User impact if declined: WebRTC crash after many WebRTC sessions > Testing completed (on m-c, etc.): Landed on m-c > Risk to taking this patch (and alternatives if risky): The whole patchstack > is a reasonable amount of changes, all of which remove technical debt/clean > things up. At worst it will crash more. If it risks crashing more, then why would this be safe enough for an aurora uplift?
Because the expected behavior is for it crash far less. Worst case is exactly what it says.
Comment on attachment 807386 [details] [diff] [review] Patch 1. Add some debug assertions I understand what you meant, let's get this on Aurora and watch the results - we have time to back out if there's higher crashing (which is not expected).
Attachment #807386 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #807390 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #808858 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #808882 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: