LifoAlloc Valgrind annotations are broken

RESOLVED FIXED in mozilla21

Status

()

Core
JavaScript Engine
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Trunk
mozilla21
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

5 years ago
Bug 838150 added Valgrind annotations to LifoAlloc.  I now get two warnings when running the jit-tests with --valgrind, this one and a similar one:

> Invalid write of size 4
>    at 0x4C2DD2F: memset (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>    by 0x76AD7A: js::detail::BumpChunk::delete_(js::detail::BumpChunk*) (LifoAlloc.cpp:37)
>    by 0x76AE57: js::LifoAlloc::freeAll() (LifoAlloc.cpp:59)
>    by 0x478CE4: js::LifoAlloc::~LifoAlloc() (LifoAlloc.h:221)
>    by 0x4310B4: js::LifoAlloc::~LifoAlloc() (LifoAlloc.h:221)
>    by 0x430C8D: JSRuntime::~JSRuntime() (jsapi.cpp:1024)
>    by 0x44A6C2: _ZL9js_deleteI9JSRuntimeEvPT_ (Utility.h:503)
>    by 0x44A690: JS_DestroyRuntime(JSRuntime*) (jsapi.cpp:1137)
>    by 0x4052DB: main (js.cpp:5411)
>  Address 0x60e1670 is 32 bytes inside a block of size 4,096 alloc'd
>    at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>    by 0x407F30: js_malloc(unsigned long) (Utility.h:148)
>    by 0x76AC49: js::detail::BumpChunk::new_(unsigned long) (LifoAlloc.cpp:19)
>    by 0x76AFA7: js::LifoAlloc::getOrCreateChunk(unsigned long) (LifoAlloc.cpp:94)
>    by 0x494250: js::LifoAlloc::alloc(unsigned long) (LifoAlloc.h:236)
>    by 0x7301B4: js::frontend::ParseNodeAllocator::allocNode() (ParseNode.cpp:248)
>    by 0x732A29: js::frontend::Parser::allocParseNode(unsigned long) (Parser.h:320)
>    by 0x730282: js::frontend::ParseNode* js::frontend::Parser::new_<js::frontend::ParseNode, js::frontend::ParseNodeKind, JSOp, js::frontend::ParseNodeArity, js::
frontend::TokenPos>(js::frontend::ParseNodeKind, JSOp, js::frontend::ParseNodeArity, js::frontend::TokenPos) (Parser.h:334)
>    by 0x730248: js::frontend::ParseNode::create(js::frontend::ParseNodeKind, js::frontend::ParseNodeArity, js::frontend::Parser*) (ParseNode.cpp:260)
>    by 0x6F65A1: js::frontend::ListNode::create(js::frontend::ParseNodeKind, js::frontend::Parser*) (ParseNode.h:925)
>    by 0x740868: js::frontend::Parser::variables(js::frontend::ParseNodeKind, js::StaticBlockObject*, js::frontend::VarContext) (Parser.cpp:4025)
>    by 0x73C9C3: js::frontend::Parser::statement() (Parser.cpp:3933)

The problem is that setBump() can mark part of the BumpChunk's data area as NOACCESS, but then BumpChunk::delete_ does memset(0xcd) over the entire data area before freeing it.

It's easily fixed (patch coming shortly).  What I want to know is why am I the first one to notice this?  I'm only running the jit-tests with --valgrind, which is a very weak stress test.
(Assignee)

Comment 1

5 years ago
Created attachment 711117 [details] [diff] [review]
Fix Valgrind and ASAN annotations in LifoAlloc.
Attachment #711117 - Flags: review?(choller)
(Assignee)

Updated

5 years ago
Assignee: general → n.nethercote

Comment 2

5 years ago
Comment on attachment 711117 [details] [diff] [review]
Fix Valgrind and ASAN annotations in LifoAlloc.

I think "size_t size = ..." should be inside the "ifdef DEBUG" to avoid warnings about it being unused.

Comment 3

5 years ago
> What I want to know is why am I the first one to notice this?

Most ASan testing happens in non-debug builds.
Created attachment 711270 [details] [diff] [review]
Patch v2

Stealing this so we can land the patch right now. Patch just moves the size_t into the #ifdef DEBUG, otherwise as njn fixed it :) Thanks!
Attachment #711117 - Attachment is obsolete: true
Attachment #711117 - Flags: review?(choller)
Attachment #711270 - Flags: review?(bhackett1024)
Comment on attachment 711270 [details] [diff] [review]
Patch v2

Review of attachment 711270 [details] [diff] [review]:
-----------------------------------------------------------------

This seems like it would be cleaner if the memset just happened when !defined(MOZ_ASAN) && !defined(MOZ_VALGRIND), does the memset really do anything when these tools already know you're not supposed to use memory after free'ing it?
Attachment #711270 - Flags: review?(bhackett1024) → review+
(Assignee)

Comment 6

5 years ago
> This seems like it would be cleaner if the memset just happened when
> !defined(MOZ_ASAN) && !defined(MOZ_VALGRIND), does the memset really do
> anything when these tools already know you're not supposed to use memory
> after free'ing it?

I thought about that too.  To the tools (Valgrind at least, I don't know much about ASAN) the memset is effectively a no-op, because the js_free marks the whole thing NOACCESS immediately afterwards.  However, it might still be useful to users to have the telltale 0xcd marks.  So I figure it doesn't hurt.  

(Relatedly:  Valgrind has a --free-fill option that lets you choose a value to overwrite freed blocks with, so clearly some people think writing canary values like this while using Valgrind is useful.  Valgrind has a similar --malloc-fill option that's used when blocks are first allocated.)
(Assignee)

Comment 7

5 years ago
decoder, are you going to land this now, or should I?
(Assignee)

Comment 8

5 years ago
> (Relatedly:  Valgrind has a --free-fill option that lets you choose a value
> to overwrite freed blocks with, so clearly some people think writing canary
> values like this while using Valgrind is useful.  Valgrind has a similar
> --malloc-fill option that's used when blocks are first allocated.)

Gary, decoder:  maybe we should use these options in all of our Valgrind runs?
(In reply to Nicholas Nethercote [:njn] from comment #7)
> decoder, are you going to land this now, or should I?

Sorry, got distracted. I'm going to land this now. I also discussed with bhackett on IRC why the memset is still important. At least for Valgrind builds, it's not guaranteed that we run under Valgrind and people might be using --enable-valgrind as the default for their debug builds. In this case, we would miss use-after-free bugs if we don't do the memset and run without Valgrind, but with valgrind support enabled.
(In reply to Nicholas Nethercote [:njn] from comment #8)
> > (Relatedly:  Valgrind has a --free-fill option that lets you choose a value
> > to overwrite freed blocks with, so clearly some people think writing canary
> > values like this while using Valgrind is useful.  Valgrind has a similar
> > --malloc-fill option that's used when blocks are first allocated.)
> 
> Gary, decoder:  maybe we should use these options in all of our Valgrind
> runs?

Our current options are at: http://hg.mozilla.org/build/tools/file/default/scripts/valgrind/valgrind.sh#l70

What are your suggested additions?
(Assignee)

Comment 12

5 years ago
> Our current options are at:
> http://hg.mozilla.org/build/tools/file/default/scripts/valgrind/valgrind.
> sh#l70
> 
> What are your suggested additions?

--free-fill=0xcd seems sensible, and mirrors what we already do at least some of the time.

And --malloc-fill=0xbd, perhaps?
> --free-fill=0xcd seems sensible, and mirrors what we already do at least
> some of the time.
> 
> And --malloc-fill=0xbd, perhaps?

I've filed bug 839252.
https://hg.mozilla.org/mozilla-central/rev/c3bf9571e80d
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21

Comment 15

5 years ago
(In reply to Jesse Ruderman from comment #3)
> > What I want to know is why am I the first one to notice this?
> 
> Most ASan testing happens in non-debug builds.

Strange. I have been testing thunderbird (comm-central) using valgrind
on my local PC for a few months now, and this problem appeared first time yesterday and got me excited. (After a refresh, that is.)
I am running TB's "make mozmill" test, and this particular problem did not occur
there until yesterday.

So something must have changed in the mozilla code
to cause the problem to manifest itself.

Oh, another thing about 20 (or about 15) hours ago, the modified source file is visible
on mxr.mozilla.org, but I could not check it out from comm-central using "python client.py checkout".

So I manually copied the patch into my local file, and reran the test.
Unfortunately, I got the following message now: Any idea why this happens?
Before the modification, I got exactly the same symptom in comment 1, "Invalid write of size 4"

[Well, actually, after seeing the new invalid write of size 4 myself, 
I tried to locatel LifoAlloc.cpp using mxr.mozilla.org, and then
noticed the annotation,  and also noticed the difference between my local copy.,
I ran python client.py checkout, but it was not updated and so I manually
modified my local copy, and reran "make mozmill" with TB running under memcheck .]
 

That patch fixed the problem, but I thought the sudden appearance of this
issue in memcheck run of TB's "make mozmill" may be worth mentiong. I wonder why it did not show up before. Something did
change the internal memory handling slightly. The symptom was not observed until yesterday since I began testing TB  using valgrind in last November.

TIA
(In reply to ISHIKAWA, chiaki from comment #15)

> So something must have changed in the mozilla code
> to cause the problem to manifest itself.


Yes, this is a regression from bug 838150, the annotations added there are not right :)

Comment 17

5 years ago
(In reply to Christian Holler (:decoder) from comment #16)

> Yes, this is a regression from bug 838150, the annotations added there are
> not right :)

Aha. Mystery solved :-) 
Thank you.
You need to log in before you can comment on or make changes to this bug.