Closed Bug 934542 Opened 9 years ago Closed 9 years ago

Investigate why "possibly lost" Valgrind errors aren't being ignored even though --show-possibly-lost=no is present

Categories

(Testing :: Mozbase, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: froydnj, Assigned: n.nethercote)

References

Details

Attachments

(2 files, 1 obsolete file)

There are a lot of these, as one might expect; they usually look like:

==19788== 96 bytes in 3 blocks are possibly lost in loss record 13,568 of 20,628
==19788==    at 0x4C28A49: malloc (vg_replace_malloc.c:270)
==19788==    by 0x9C6430B: js::PropertyTree::insertChild(js::ExclusiveContext*, js::Shape*, js::Shape*) (Utility.h:142)
==19788==    by 0x9C65450: js::PropertyTree::getChild(js::ExclusiveContext*, js::Shape*, unsigned int, js::StackShape const&) (jspropertytree.cpp:194)
==19788==    by 0x9C52347: JSObject::sealOrFreeze(JSContext*, JS::Handle<JSObject*>, JSObject::ImmutabilityType) (jsobj.cpp:1137)
==19788==    by 0x9BC7D57: JS_FreezeObject(JSContext*, JSObject*) (jsobj.h:578)
==19788==    by 0x9E7F33B: js::ctypes::InitTypeConstructor(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JSFunctionSpec, JSFunctionSpec const*, JSPropertySpec const*, JSFunctionSpec const*, JSPropertySpec const*, JS::MutableHandle<JSObject*>, JS::MutableHandle<JSObject*>) (CTypes.cpp:1045)
==19788==    by 0x9E8F2B6: JS_InitCTypesClass(JSContext*, JSObject*) (CTypes.cpp:1174)
==19788==    by 0x8999554: mozilla::dom::workers::DefineChromeWorkerFunctions(JSContext*, JS::Handle<JSObject*>) (ChromeWorkerScope.cpp:57)
==19788==    by 0x89C14C7: mozilla::dom::workers::CreateGlobalScope(JSContext*) (WorkerScope.cpp:1484)
==19788==    by 0x89B0D60: (anonymous namespace)::CompileScriptRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) (WorkerPrivate.cpp:808)
==19788==    by 0x89B3A0B: mozilla::dom::workers::WorkerRunnable::Run() (WorkerPrivate.cpp:1769)
==19788==    by 0x89BAC4C: mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) (WorkerPrivate.cpp:3573)

or

==19788== 96 bytes in 3 blocks are possibly lost in loss record 13,572 of 20,628
==19788==    at 0x4C28A49: malloc (vg_replace_malloc.c:270)
==19788==    by 0x9C6430B: js::PropertyTree::insertChild(js::ExclusiveContext*, js::Shape*, js::Shape*) (Utility.h:142)
==19788==    by 0x9C65450: js::PropertyTree::getChild(js::ExclusiveContext*, js::Shape*, unsigned int, js::StackShape const&) (jspropertytree.cpp:194)
==19788==    by 0x9D0CAA5: js::Shape::replaceLastProperty(js::ExclusiveContext*, js::StackBaseShape const&, js::TaggedProto, JS::Handle<js::Shape*>) (Shape.cpp:344)
==19788==    by 0x9D0CB8D: js::Shape::setObjectFlag(js::ExclusiveContext*, js::BaseShape::Flag, js::TaggedProto, js::Shape*) (Shape.cpp:1372)
==19788==    by 0x9D0CDBE: js::ObjectImpl::setFlag(js::ExclusiveContext*, unsigned int, js::ObjectImpl::GenerateShape) (Shape.cpp:1336)
==19788==    by 0x9C29DB2: JSObject::setNewTypeUnknown(JSContext*, js::Class const*, JS::Handle<JSObject*>) (jsinfer.cpp:3689)
==19788==    by 0x9BD656E: js_InitArrayClass(JSContext*, JS::Handle<JSObject*>) (jsarray.cpp:3077)
==19788==    by 0x9C44316: js_GetClassObject(js::ExclusiveContext*, JSObject*, JSProtoKey, JS::MutableHandle<JSObject*>) (jsobj.cpp:3075)
==19788==    by 0x9CA93CE: js::StartOffThreadParseScript(JSContext*, JS::CompileOptions const&, char16_t const*, unsigned long, JS::Handle<JSObject*>, void (*)(void*, void*), void*) (jsworkers.cpp:242)
==19788==    by 0x9BCA2EC: JS::CompileOffThread(JSContext*, JS::Handle<JSObject*>, JS::CompileOptions, char16_t const*, unsigned long, void (*)(void*, void*), void*) (jsapi.cpp:4425)
==19788==    by 0x8B2C3B9: nsXULPrototypeScript::Compile(char16_t const*, int, nsIURI*, unsigned int, nsIDocument*, nsXULPrototypeDocument*, nsIOffThreadScriptReceiver*) (nsXULElement.cpp:2645)

Should this memory get released at some point, or should we just ignore these sorts of stacks in Valgrind?
"Possibly lost" errors from Valgrind found in Gecko products should be ignored.

We should ignore them, as per:

http://hg.mozilla.org/build/tools/file/default/scripts/valgrind/valgrind.sh#l70

However, I don't know why they are still showing. Perhaps njn has ideas... ?
Yeah, should be ignored.  I'll investigate as to why they aren't getting ignored.
Assignee: nobody → n.nethercote
I think something is wrong with how Valgrind is invoked by the script -- judging from the output in
https://tbpl.mozilla.org/php/getParsedLog.php?id=30044729&tree=Mozilla-Central&full=1, --show-possibly-lost=no and --gen-suppressions=all and --num-callers=50 are all being ignored.

Line 83 of http://hg.mozilla.org/build/tools/file/default/scripts/valgrind/valgrind.sh is the invocation:

    make pgo-profile-run EXTRA_TEST_ARGS="--debugger=valgrind --debugger-args='$debugger_args'" || exit 1

The actual invocation, as reported by the shell, is this:

  + make pgo-profile-run 'EXTRA_TEST_ARGS=--debugger=valgrind --debugger-args='\''--error-exitcode=1 --smc-check=all-non-file --gen-suppressions=all --leak-check=full --num-callers=50 --show-possibly-lost=no --track-origins=yes --suppressions=/builds/slave/m-cen-l64-valgrind-00000000000/objdir/_valgrind/cross-architecture.sup --suppressions=/builds/slave/m-cen-l64-valgrind-00000000000/objdir/_valgrind/x86_64-redhat-linux-gnu.sup'\'''

Shell quoting can be confusing, and I wonder if something is wrong here.  Or does the definition of EXTRA_TEST_ARGS need to precede |make pgo-profile-run|?  Or should there not be a '=' after |--debugger-args|?

And yet, when I try running a similar command locally:

  make -C cd64 mochitest EXTRA_TEST_ARGS="--debugger=valgrind --debugger-args='$debugger_args'"

(with $debugger_args set appropriately) I see this invocation:

  + make -C cd64 mochitest 'EXTRA_TEST_ARGS=--debugger=valgrind --debugger-args='\''--show-possibly-lost=no --num-callers=3 --gen-suppressions=all'\'''

which matches the syntax of the try run, and it works fine, i.e. Valgrind's output indicate that it sees the options correctly.

So I don't know what's going on.
Ted, I'm not sure if you wrote the original Valgrind bootstrap script here, but do you possibly have any ideas/tips?
Flags: needinfo?(ted)
Summary: Valgrind on tbpl detects leak - stacks containing js::PropertyTree::insertChild → Investigate why "possibly lost" Valgrind errors aren't being ignored even though --show-possibly-lost=no is present
Ok, I've worked this out.

--debugger-args works with |make mochitest|.  That target runs testing/mochitest/runtests.py, which uses getDebuggerInfo from automationutils.py.

--debugger-args doesn't work with |make pgo-profile-run|;  I've confirmed this locally.  That target runs build/pgo/profileserver.py.  Prior to changeset 134096:8b379faea096 (bug 746244), it used getDebuggerInfo from automationutils.py, which respects --debugger-args.  Now it uses cli.debugger_arguments(), which ignores --debugger-args and instead ends up using these default arguments for Valgrind from testing/mozbase/mozrunner/mozrunner/local.py:

  debuggers = {'gdb': {'interactive': True,
                       'args': ['-q', '--args'],},
               'valgrind': {'interactive': False,
                            'args': ['--leak-check=full']}
               }

So I think this has been busted since June 5th or so, when bug 746244 landed.
Assignee: n.nethercote → ted
QA Contact: n.nethercote
There are two problems relating to debugger_args, which holds the result of
any --debugger-args options.

- First, it's not passed to debugger_arguments().

- Second, there's some type confusion.  As written, debugger_arguments is a
  list of arguments which gets appended to with multiple --debugger-arguments
  options, e.g. like this:

> EXTRA_TEST_ARGS="--debugger=valgrind --debugger-args=--num-callers=10 --debugger-args=--gen-suppressions=all"

  But this is different to runtests.py, which can only handle a single
  --debugger-arguments option, like this:

> EXTRA_TEST_ARGS="--debugger=valgrind --debugger-args='--num-callers=10 --gen-suppressions=all'"

  and stores them as a single string that uses space as a separator.

  I don't know which is preferred, but runtests.py uses the latter form, as doe
  valgrind.sh (in the build/tools/ repo), I've changed local.py to work like
  that.

With this patch in place, Valgrind correctly receives its arguments when I run
it locally.
Attachment #828597 - Flags: review?(ted)
Assignee: ted → n.nethercote
Status: NEW → ASSIGNED
Oh, the previous patch didn't work when --debugger-args was omitted.  This one
does.
Attachment #828599 - Flags: review?(ted)
Attachment #828597 - Attachment is obsolete: true
Attachment #828597 - Flags: review?(ted)
Blocks: 746244
QA Contact: n.nethercote
Comment on attachment 828599 [details] [diff] [review]
Make --debugger-args work in EXTRA_TEST_ARGS with |make pgo-profile-run|.

Review of attachment 828599 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, that's pretty obvious after looking at your patch. Sorry that this was broken in mozbase and thus in profileserver.py when I ported it over.

This needs to land upstream in github, I can do that for you:
https://github.com/mozilla/mozbase/

You can then cherry-pick it to inbound.
Attachment #828599 - Flags: review?(ted) → review+
Component: JavaScript Engine → Mozbase
Flags: needinfo?(ted)
Product: Core → Testing
QA Contact: hskupin
Pushed to github:
https://github.com/mozilla/mozbase/commit/ac5ad6efe4a8adf35c0a05c7219720bf25f2e43e

Feel free to land in inbound to get this fixed faster than the next mozbase sync from github->hg.
pushed all mozrunner changes to try: https://tbpl.mozilla.org/?tree=Try&rev=8008fa03d45f

If this goes well, I can+will bump mozrunner -> 5.27 + mirror -> m-c
jhammel, am I right in thinking that I don't need to do anything else here, I can just wait for you?  If so, thanks!

> https://tbpl.mozilla.org/?tree=Try&rev=8008fa03d45f

That push contains a couple of changes to remote.py and runner.py that weren't in my patch.  Is that intentional?
(In reply to Nicholas Nethercote [:njn] from comment #11)
> jhammel, am I right in thinking that I don't need to do anything else here,
> I can just wait for you?  If so, thanks!
 
Assuming the try run goes well, yes.

> > https://tbpl.mozilla.org/?tree=Try&rev=8008fa03d45f
> 
> That push contains a couple of changes to remote.py and runner.py that
> weren't in my patch.  Is that intentional?

Yes.  They are all outstanding changes to mozrunner in github.
bump mozrunner version in m-c
Attachment #830911 - Flags: review?(ted)
Ok, so the next Valgrind run (3am on Nov 13) will be the first one to see the changes, and will hopefully work.
(In reply to Nicholas Nethercote [:njn] from comment #15)
> Ok, so the next Valgrind run (3am on Nov 13) will be the first one to see
> the changes, and will hopefully work.

We could also politely ask the buildduty releng folk to do a manual trigger. Just mentioning an option.
> We could also politely ask the buildduty releng folk to do a manual trigger.
> Just mentioning an option.

I'm about to finish up for the day, so I didn't bother, but if someone else wants to, please do! :)
> Ok, so the next Valgrind run (3am on Nov 13) will be the first one to see
> the changes, and will hopefully work.

Oh wait, jhammel's patch hasn't landed yet.
(In reply to Nicholas Nethercote [:njn] from comment #18)
> > Ok, so the next Valgrind run (3am on Nov 13) will be the first one to see
> > the changes, and will hopefully work.
> 
> Oh wait, jhammel's patch hasn't landed yet.

no review yet.  Please feel free to review
Attachment #830911 - Flags: review?(n.nethercote)
Comment on attachment 830911 [details] [diff] [review]
ae38a09787dfb2357afb2851dcb1e0840572b9e7.diff

Review of attachment 830911 [details] [diff] [review]:
-----------------------------------------------------------------

Er, ok...

- The local.py changes are identical to the ones from my patch, so that's fine.

- The remote.py change is a trivial change from double-quotes to single-quotes, so that's fine.

- The runner.py change is just removal of what I assume is a dead function.  I don't know if it really is dead, but if try says it's ok it must be.

- The setup.py change (PACKAGE_VERSION) I also don't know about but it seems reasonable.

r=me, I guess.
Attachment #830911 - Flags: review?(n.nethercote) → review+
Attachment #830911 - Flags: review?(ted)
I landed this because I want to make progress on the Valgrind-on-TBPL work.  Hopefully I did it correctly...

https://hg.mozilla.org/mozilla-central/rev/d58ab6f6ca0a
The first run post-landing is here:
https://tbpl.mozilla.org/php/getParsedLog.php?id=30692824&tree=Mozilla-Central

The good news is that the right flags are now being passed.  Yay.

The bad news is that we still get "Output exceeded 52428800 bytes, remaining output has been truncated".  Although we are getting many fewer errors due to ignoring "possibly lost" leaks, we are printing more info for each error because we're (a) generating suppressions, and (b) printing stacks up to 50 entries deep.

The good news is that it looks like there are a small number of distinct leaks that account for most of the errors, and they show up as many separate errors due to minor variations in the stack traces.

I will investigate further and file follow-up bugs.
> The bad news is that we still get "Output exceeded 52428800 bytes, remaining
> output has been truncated".  Although we are getting many fewer errors due
> to ignoring "possibly lost" leaks, we are printing more info for each error
> because we're (a) generating suppressions, and (b) printing stacks up to 50
> entries deep.

And we're using --track-origins=yes, printing both more info and causing it to be slower. However, we get the benefit of not having to re-run any instance to get these suppression stacks/additional information.

We print 50 entries deep because of bug 714690 comment 5, likewise for the same reason. We can definitely tweak this number, though..
Oh, I'm not complaining about the current options.  If we fix 5 or 6 leaks the output will be reduced to almost nothing.
And which allows us to spot the hard-to-reproduce occasional error. :)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: valgrind-on-tbpl
No longer blocks: valgrind-tbpl-bugs
You need to log in before you can comment on or make changes to this bug.