Closed Bug 605457 Opened 11 years ago Closed 11 years ago

Ogg code uses a singlethreaded hashtable on multiple threads, causing assertions during random tests (e.g. 432561-1.html)

Categories

(Core :: Audio/Video, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
Tracking Status
blocking2.0 --- -
status2.0 --- ?

People

(Reporter: mounir, Assigned: cajbir)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

###!!! ASSERTION: RECURSION_LEVEL_SAFE_TO_FINISH(table): 'RECURSION_LEVEL_SAFE_TO_FINISH(table)', file pldhash.c, line 406
PL_DHashTableFinish [pldhash.c:409]
nsTHashtable<nsBaseHashtableET<nsUint32HashKey, nsAutoPtr<nsOggCodecState> > >::~nsTHashtable [nsTHashtable.h:318]
nsBaseHashtable<nsUint32HashKey,nsAutoPtr<nsOggCodecState>,nsOggCodecState*>::~nsBaseHashtable [nsBaseHashtable.h:84]
nsClassHashtable<nsUint32HashKey,nsOggCodecState>::~nsClassHashtable [nsClassHashtable.h:56]
nsOggReader::~nsOggReader [content/media/ogg/nsOggReader.cpp:116]
nsAutoPtr<nsBuiltinDecoderReader>::~nsAutoPtr [nsAutoPtr.h:104]
nsBuiltinDecoderStateMachine::~nsBuiltinDecoderStateMachine [content/media/nsBuiltinDecoderStateMachine.cpp:164]
nsOggDecoderStateMachine::~nsOggDecoderStateMachine [content/media/ogg/nsOggDecoderStateMachine.h:45]
nsRunnable::Release [nsThreadUtils.cpp:55]
nsCOMPtr<nsDecoderStateMachine>::assign_assuming_AddRef [nsCOMPtr.h:518]
nsCOMPtr<nsDecoderStateMachine>::assign_with_AddRef [nsCOMPtr.h:1204]
nsCOMPtr<nsDecoderStateMachine>::operator= [nsCOMPtr.h:664]
nsBuiltinDecoder::Stop [content/media/nsBuiltinDecoder.cpp:137]
nsRunnableMethodImpl<void (nsBuiltinDecoder::*)(),true>::Run [nsThreadUtils.h:348]
nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:547]
NS_ProcessNextEvent_P [nsThreadUtils.cpp:250]
nsThread::Shutdown [xpcom/threads/nsThread.cpp:476]
nsBuiltinDecoder::Stop [content/media/nsBuiltinDecoder.cpp:135]
nsRunnableMethodImpl<void (nsBuiltinDecoder::*)(),true>::Run [nsThreadUtils.h:348]
nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:547]
NS_ProcessNextEvent_P [nsThreadUtils.cpp:250]
nsThread::Shutdown [xpcom/threads/nsThread.cpp:476]
nsBuiltinDecoder::Stop [content/media/nsBuiltinDecoder.cpp:135]
nsRunnableMethodImpl<void (nsBuiltinDecoder::*)(),true>::Run [nsThreadUtils.h:348]
nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:547]
NS_ProcessPendingEvents_P [nsThreadUtils.cpp:200]
nsBaseAppShell::NativeEventCallback [widget/src/xpwidgets/nsBaseAppShell.cpp:132]
nsAppShell::ProcessGeckoEvents [widget/src/cocoa/nsAppShell.mm:400]
CoreFoundation + 0x4d271
CoreFoundation + 0x4b469
CoreFoundation + 0x4ac2f
HIToolbox + 0x2ea4e
HIToolbox + 0x2e7b1
HIToolbox + 0x2e70c
AppKit + 0x441f2
-AppKit + 0x43b41
nsAppShell::ProcessNextNativeEvent [widget/src/cocoa/nsAppShell.mm:675]
nsBaseAppShell::DoProcessNextNativeEvent [widget/src/xpwidgets/nsBaseAppShell.cpp:161]
nsBaseAppShell::OnProcessNextEvent [widget/src/xpwidgets/nsBaseAppShell.cpp:299]
nsAppShell::OnProcessNextEvent [widget/src/cocoa/nsAppShell.mm:833]
nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:519]
NS_ProcessPendingEvents_P [nsThreadUtils.cpp:200]
nsBaseAppShell::NativeEventCallback [widget/src/xpwidgets/nsBaseAppShell.cpp:132]
nsAppShell::ProcessGeckoEvents [widget/src/cocoa/nsAppShell.mm:400]
CoreFoundation + 0x4d271
CoreFoundation + 0x4b469
CoreFoundation + 0x4ac2f
HIToolbox + 0x2ea4e
HIToolbox + 0x2e7b1
HIToolbox + 0x2e70c
AppKit + 0x441f2
-AppKit + 0x43b41
-AppKit + 0x9747
nsAppShell::Run [widget/src/cocoa/nsAppShell.mm:746]
nsAppStartup::Run [toolkit/components/startup/src/nsAppStartup.cpp:191]
XRE_main [toolkit/xre/nsAppRunner.cpp:3670]
main [browser/app/nsBrowserApp.cpp:158]
REFTEST TEST-PASS | file:///Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-crashtest/build/reftest/tests/layout/style/crashtests/432561-1.html | (LOAD ONLY)
REFTEST INFO | Loading a blank page
++DOMWINDOW == 16 (0x11c83c8d8) [serial = 2767] [outer = 0x11825b9c0]
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-crashtest/build/reftest/tests/layout/style/crashtests/432561-1.html | assertion count 1 is more than expected 0 assertions

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1287495658.1287496043.17882.gz
See also bug 613564 for this assertion cropping up in a different test.
Blocks: 613564
Component: XPCOM → Video/Audio
QA Contact: xpcom → video.audio
Summary: 432561-1.html | assertion count 1 is more than expected 0 assertions → Ogg code uses a singlethreaded hashtable on multiple threads, causing assertions during random tests (e.g. 432561-1.html)
Assignee: nobody → chris.double
blocking2.0: --- → final+
We could just use nsClassHashTableMT  instead of nsClassHashTable, but:

[11:57]	<bent>	cpearce: there's a gotcha though
[11:57]	<bent>	which i think makes the MT variants useless
[11:57]	<bent>	you can't do something like "get this entry, or make a new one"
[11:58]	<bent>	which i almost always want to do
[11:58]	<cpearce>	bent: so it's read only?
[11:58]	<bent>	no, it's that
[11:58]	<jlebar>	cpearce, in that case, you don't need MT!
[11:58]	<bent>	cpearce: the MT versions only lock during "Get" and "Put"
[11:59]	<bent>	there's no way to say "lock during Get *and* Put"
[11:59]	<cpearce>	ah
[11:59]	<cpearce>	this is ok for my use case, but in general, yeah, annoying.
[11:59]	<bent>	so other threads can hop in
[11:59]	<bent>	it's a mess, i wish we would remove them
[11:59]	<bent>	people think "oh, this one is threadsafe, lah dee dah"

We only ever use the hash table by one thread at a time, so we'd not be bitten by this.

Or we could change the nsBuiltinDecoderStateMachine to read the metadata and shutdown on the decoder thread as well, but that's much more work. We're going to rewrite the decoder anyway to reduce its number of threads, maybe we should do that change then?
(In reply to comment #3)
> [11:59]    <bent>    people think "oh, this one is threadsafe, lah dee dah"

Yeah, that's par for the course for threadsafe data structures in general.

> We only ever use the hash table by one thread at a time, so we'd not be bitten
> by this.

Are you sure? Because this assertion would only fire if we had concurrent access from multiple threads, AFAIK.

If we can locate the concurrent access and prevent it, then this bug will be fixed.
(In reply to comment #4)
> > We only ever use the hash table by one thread at a time, so we'd not be bitten
> > by this.
> 
> Are you sure? Because this assertion would only fire if we had concurrent
> access from multiple threads, AFAIK.
> 
> If we can locate the concurrent access and prevent it, then this bug will be
> fixed.

Wait no. We use the hash table on the decoder thread as well as on the main thread in the Ogg GetBuffered() implementation. GetBuffered() needs it to identify playing streams, and convert granulepos to timecodes. The only way we could remove the concurrent access is to duplicate the list of playing streams for Ogg, and factor out the granulepos to timecode logic. We'd need to reimplement th_granule_frame() (which is easy) and maintain it to do that.
We talked about this over lunch, it's safe enough to just use an nsTArray to store the serialno to nsOggCodecState mapping. There's usually only 3 streams, so a linear scan in an nsTArray won't be slower than a hash table lookup.
Instead of using the hash table in the GetBuffered that runs on the main thread can we just compare the serial number against the streams we are interested in (mTheoraState, mVorbisState, mSkeletonState)? This way we avoid using the hashtable completely.

I notice we call methods/access data in the state objects themselves in GetBuffered. How threadsafe is that?
(In reply to comment #7)
> Instead of using the hash table in the GetBuffered that runs on the main thread
> can we just compare the serial number against the streams we are interested in
> (mTheoraState, mVorbisState, mSkeletonState)? This way we avoid using the
> hashtable completely.

Sounds reasonable.

> I notice we call methods/access data in the state objects themselves in
> GetBuffered. How threadsafe is that?

GetBuffered() and the functions it calls access things that shouldn't change after metadata is loaded (like the Theora granuleshift and the nsOggCodecState::mActive field, so I think we're safe.
I took an approach similar to that suggested by Chris Pearce in comment 5 about reimplementing th_granule_frame.
Attachment #498017 - Flags: review?(chris)
Comment on attachment 498017 [details] [diff] [review]
Remove concurrent access to hash table

Looks good, just a small typo "Vorbis/Tehora data used to compute timestamps" in the comment starting above nsOggReader::mVorbisSerial.
Attachment #498017 - Flags: review?(chris) → review+
status2.0: --- → ?
OS: Mac OS X → All
Hardware: x86_64 → All
Keywords: checkin-needed
Status: NEW → ASSIGNED
http://hg.mozilla.org/mozilla-central/rev/c6c33df76468
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1296500234.1296500612.9600.gz&fulltext=1
OS X 10.6 comm-central-trunk debug test crashtest on 2011/01/31 10:57:14
Depends on: 631953
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.