Open Bug 1329202 Opened 7 years ago Updated 2 years ago

consider adding a pointer validation assertion

Categories

(Core :: MFBT, defect)

defect

Tracking

()

People

(Reporter: bkelly, Unassigned)

References

(Depends on 1 open bug)

Details

Given that we tend to mark free'd memory with a known pattern ("e5e5" I think), can we add a stricter pointer validation macro?  Something like:

#define BYTE_VALUE_AT(a, n) (a & (0xff << (n*2)))

#define MOZ_ASSERT_PTR(a) \
  (MOZ_ASSERT(a && \
              BYTE_VALUE_AT(a, 3) != 0xef && \
              BYTE_VALUE_AT(a, 7) != 0xef))

Is it fair to make these kind of restrictions on pointers?  Or we could look for 2 consecutive bytes like 0xefef to restrict it further.

If we did this it would be nice to have MOZ_DIAGNOSTIC_ASSERT_PTR() and MOZ_RELEASE_ASSERT_PTR() as well.
I have at least one use for such a macro in mind.  We have cases where we crash on something like 0xe5e5e5e5 typically way after the value was freed.  With this kind of macro we can plant in additional checks to try to move the crash closer to the cause.  We should probably ensure somehow that the addresses we check for can never be allocated, if we don't already.
(In reply to :Ehsan Akhgari from comment #1)
> We should probably ensure somehow that the
> addresses we check for can never be allocated, if we don't already.

We don't, which is bad. (bug 1108693)
Depends on: 1108693
(In reply to Ben Kelly [:bkelly] from comment #0)
> #define BYTE_VALUE_AT(a, n) (a & (0xff << (n*2)))
> 
> #define MOZ_ASSERT_PTR(a) \
>   (MOZ_ASSERT(a && \
>               BYTE_VALUE_AT(a, 3) != 0xef && \
>               BYTE_VALUE_AT(a, 7) != 0xef))

I think that this check logic should be pulled into a function so that the expression `a` is evaluated at most once, because otherwise that could lead to weird unexpected behavior.
This sort of thing is something that standard libraries tend to frown upon.

First, what constitutes invalid bytes in a pointer depends on the pointer type.  String data, for example, can easily contain any pattern, depending where the string came from and its encoding.  It seems much better for users to implement these things themselves, without relying on some central definition.

Second, I think there's solid reason to worry that an assertion like this, unless designed for the specific case at issue (written by the authors of the relevant code, contained within that relevant code), will give a false sense of security.  Even if such a thing is limited to an assertion, and isn't provided as a general primitive for use in all builds.  Wrongly assuming "we got past this assertion, therefore we can't have a bad pointer" is going to be a pretty bad wild goose chase.

Third, having an effective central definition requires knowing *all* possible poisoning values that could be used.  There are a lot of them.  Working in the JS engine, for example, we've stumbled across many, many such values (and I doubt even this lists all of them):

440:[2014-05-02 16:29:32] <jimb> firebot: what is 0xa5a5a5a5
441-[2014-05-02 16:29:37] <firebot> jimb: hmm... I think 0xa5a5a5a5 is jemalloc allocated uninitialized junk memory (cf. 0x5a5a5a5a)

15:[2014-09-11 12:33:02] <sfink> firebot: what is 0xaaaaaaaa
16-[2014-09-11 12:33:03] <firebot> sfink: Maybe 0xAAAAAAAA is the MallocPreScribble pattern (fills uninitialized memory)

139:[2015-04-08 14:41:55] <jimb> firebot: what is 0x4f4f4f4f
140-[2015-04-08 14:41:56] <firebot> jimb: everyone knows that! 0x4f4f4f4f is The poison value written to tenured heap Chunks when they are initially allocated.

225:[2015-04-22 16:50:54] <Waldo> firebot: what is 0xdadadada
226-[2015-04-22 16:50:57] <firebot> Waldo: everyone knows that! 0xdadadada is what 0xbabybaby calls his pop or the MSVC debug-alloc boundary-marker or PL_FREE_PATTERN (deallocated PLArena) or JS_FREE_PATTERN (deallocated JSArena - GC()d)

346:[2015-06-04 16:06:08] <bbouvier> firebot: what is 0x5a5a5a5a
347-[2015-06-04 16:06:08] <firebot> bbouvier: 0x5a5a5a5a is jemalloc freed junk memory (cf. 0xa5a5a5a5)

263:[2015-07-27 14:38:21] <shu> firebot: what is 0xcdcdcdcd
264-[2015-07-27 14:38:22] <firebot> shu: iirc, 0xcdcdcdcd is crt: clean land (new, uninitd objects) or LifoAlloc's free space poison pattern

462:[2015-11-17 14:50:41] <efaust> firebot: what is 0x49494949?
463-[2015-11-17 14:50:42] <firebot> efaust: 0x49494949 is a poisoned value added when memory is moved by a compacting GC

539:[2016-07-27 22:10:57] <efaust> firebot: what is 0x2f2f2f2f
540-[2016-07-27 22:10:58] <firebot> efaust: Oh, I know this one! 0x2f2f2f2f is The poison value written over the nursery when it is initially allocated.

165:[2016-08-19 16:24:21] <sfink> firebot: what is 0xe4e4e4e4e4e4e4e4?
166-[2016-08-19 16:24:22] <firebot> sfink: 0xe4e4e4e4e4e4e4e4 is jemalloc uninitialized memory

78:[2016-10-03 14:24:23] <sfink> firebot: what is 0x2b2b2b2b
79-[2016-10-03 14:24:24] <firebot> sfink: 0x2b2b2b2b is The poison value written over the nursery after it is swept. When debugging a GGC crash, 0x2b or not 0x2b: that is the question.

149:[2016-11-15 13:02:29] <jimb> firebot: what is 0xe5e5e5e5
150-[2016-11-15 13:02:30] <firebot> jimb: 0xe5e5e5e5 is jemalloc freed junk memory

106:[2016-11-22 13:10:23] <jimb> firebot: what is 0xbad0bad1
107-[2016-11-22 13:10:25] <firebot> jimb: Well, 0xbad0bad1 is a Relocated Cell

Perhaps perfect is the enemy of good and all that.  But we're not talking hitting one of two or three, it's one of maybe a dozen.

All these points suggest to me that this is WONTFIX, but I'd at least listen to additional argument on the point.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #4)
> This sort of thing is something that standard libraries tend to frown upon.

We're not writing a standard library, are we?

> First, what constitutes invalid bytes in a pointer depends on the pointer
> type.  String data, for example, can easily contain any pattern, depending
> where the string came from and its encoding.  It seems much better for users
> to implement these things themselves, without relying on some central
> definition.

Well, I'm not proposing we automatically apply this throughout the tree.  This is providing an assertion that authors can apply themselves where it makes sense.  It doesn't need to be perfectly general purpose like ASAN.

> Second, I think there's solid reason to worry that an assertion like this,
> unless designed for the specific case at issue (written by the authors of
> the relevant code, contained within that relevant code), will give a false
> sense of security.  Even if such a thing is limited to an assertion, and
> isn't provided as a general primitive for use in all builds.  Wrongly
> assuming "we got past this assertion, therefore we can't have a bad pointer"
> is going to be a pretty bad wild goose chase.

I don't see how this assertion would be any different then the innumerable MOZ_ASSERT(ptr) checks in the tree.  We just know that the ptr is at least not nullptr there.  This is just improving these types of checks for other cases we can say in advance are bad.

> Third, having an effective central definition requires knowing *all*
> possible poisoning values that could be used.  There are a lot of them. 
> Working in the JS engine, for example, we've stumbled across many, many such
> values (and I doubt even this lists all of them):

It seems to me there is an incremental improvement by validating a pointer is not a known bad pointer.  Even if we can't prove its not any of the bad pointers, at least we have excluded some known bad pointers.

While this assertion may not apply to every piece of code in the tree it would be a great benefit in the code I work on.
I threw together this poorly commented patch: https://hg.mozilla.org/try/rev/0c405119d712555c4cd3fc118dae056bcfb270e3 which tries to use template specialization to make MOZ_ASSERT automatically check for the garbage values of pointers in debug mode. It doesn't seem to build correctly on MSVC (I haven't looked into this too much), but in a try run I didn't notice any problems which it caught.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.