> > I am a bit surprised that this bug persisted for a long time (not sure how long, though.) > > My theory as to why this bug hasn't caused more chaos is that the three char variables on the stack stomp over each other, but in order. To be honest, the compiler can do anything to store the character variables as it desires, just like CLANG compiler does for ASAN compilation. As Ben noted the direction of stack growth is also important. So it is anyone's guess which version of TB was affected in the past (depending on compiler version, optimization flag, and used CPU, of course). [I am afraid that the particular version of valgrind on my local PC is a bit buggy. It could not detect the problem somehow. I am looking into it.] (In reply to Ben Campbell from comment #3) >So, by the time they've all been scanned, they all contain the right values. And the following three bytes on the stack will also have been blatted. > But if the _next_ variable on the stack isn't a char, it'll probably be shifted to the next 8-byte boundary (or 4-byte boundary? I forget the rules for padding on 32bit vs 64bit architectures...). > So those three extra destroyed bytes will just be harmless padding. There is no guarantee that the three variables are stored on the stack in the mentioned order. (It could be reversed.) > Would also have different (likely really bad and incorrect) results on big-endian architecture. Right. There *IS* a chance either the frame pointer or return PC could be affected. > > Doesn't make the bug any better, but explains why we might not have crashed every time we sent an email :-) I hope not. :-) If it did, we would have noticed this bug earlier earlier. The problem would have been a silent weird behavior, etc. Anyway, I am glad I could address this issue.
Bug 1677338 Comment 6 Edit History
Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.
(In reply to Ben Campbell from comment #3) > > I am a bit surprised that this bug persisted for a long time (not sure how long, though.) > > My theory as to why this bug hasn't caused more chaos is that the three char variables on the stack stomp over each other, but in order. To be honest, the compiler can do anything to store the character variables as it desires, just like CLANG compiler does for ASAN compilation. As Ben noted the direction of stack growth is also important. So it is anyone's guess which version of TB was affected in the past (depending on compiler version, optimization flag, and used CPU, of course). [I am afraid that the particular version of valgrind on my local PC is a bit buggy. It could not detect the problem somehow. I am looking into it.] >So, by the time they've all been scanned, they all contain the right values. And the following three bytes on the stack will also have been blatted. > But if the _next_ variable on the stack isn't a char, it'll probably be shifted to the next 8-byte boundary (or 4-byte boundary? I forget the rules for padding on 32bit vs 64bit architectures...). > So those three extra destroyed bytes will just be harmless padding. There is no guarantee that the three variables are stored on the stack in the mentioned order. (It could be reversed. Anything goes.) So if this is how the char variables are stored back to back, we have a problem. low <--- address ---> high |codeDetail| codeSubject|codeClass|???|???... codeClass is set by PR_sscanf(). But it is trashed when codeSubject is set. Both codeSubject and codeClass are trashed when codeDetail is set. Please note that |???|???...| can contain either frame pointer or stored return PC or something related to function call mechanism on a CPU where stack grows downward. (Do we have canary for stack protection depending on compiler used?). This part can be trashed when codeClass is set. So there *IS* a chance either the frame pointer or return PC could be affected. Of course, this is again CPU-dependent (whether byte-addressable or not also counts.) > Would also have different (likely really bad and incorrect) results on big-endian architecture. Right. > > Doesn't make the bug any better, but explains why we might not have crashed every time we sent an email :-) I hope not. :-) If it did, we would have noticed this bug earlier earlier. The problem would have been a silent weird behavior, etc. which noone could explain. Anyway, I am glad I could address this issue thanks to ASAN.
(In reply to Ben Campbell from comment #3) > > I am a bit surprised that this bug persisted for a long time (not sure how long, though.) > > My theory as to why this bug hasn't caused more chaos is that the three char variables on the stack stomp over each other, but in order. To be honest, the compiler can do anything to store the character variables as it desires, just like CLANG compiler does for ASAN compilation. As Ben noted the direction of stack growth is also important. So it is anyone's guess which version of TB was affected in the past (depending on compiler version, optimization flag, and used CPU, of course). [I am afraid that the particular version of valgrind on my local PC is a bit buggy. It could not detect the problem somehow. I am looking into it.] >So, by the time they've all been scanned, they all contain the right values. And the following three bytes on the stack will also have been blatted. > But if the _next_ variable on the stack isn't a char, it'll probably be shifted to the next 8-byte boundary (or 4-byte boundary? I forget the rules for padding on 32bit vs 64bit architectures...). > So those three extra destroyed bytes will just be harmless padding. There is no guarantee that the three variables are stored on the stack in the mentioned order. (It could be reversed. Anything goes.) So if this is how the char variables are stored back to back, we have a problem. low <--- address ---> high |codeDetail| codeSubject|codeClass|???|???... codeClass is set by PR_sscanf(). But it is trashed when codeSubject is set. Both codeSubject and codeClass are trashed when codeDetail is set. (Oh, this is a scenario on a little-endian CPU.) Please note that |???|???...| can contain either frame pointer or stored return PC or something related to function call mechanism on a CPU where stack grows downward. (Do we have canary for stack protection depending on compiler used?). This part can be trashed when codeClass is set. So there *IS* a chance either the frame pointer or return PC could be affected. Of course, this is again CPU-dependent (whether byte-addressable or not also counts.) > Would also have different (likely really bad and incorrect) results on big-endian architecture. Right. > > Doesn't make the bug any better, but explains why we might not have crashed every time we sent an email :-) I hope not. :-) If it did, we would have noticed this bug earlier earlier. The problem would have been a silent weird behavior, etc. which noone could explain. Anyway, I am glad I could address this issue thanks to ASAN.
(In reply to Ben Campbell from comment #3) > > I am a bit surprised that this bug persisted for a long time (not sure how long, though.) > > My theory as to why this bug hasn't caused more chaos is that the three char variables on the stack stomp over each other, but in order. To be honest, the compiler can do anything to store the character variables as it desires, just like CLANG compiler does for ASAN compilation. As Ben noted the direction of stack growth is also important. So it is anyone's guess which version of TB was affected in the past (depending on compiler version, optimization flag, and used CPU, of course). [I am afraid that the particular version of valgrind on my local PC is a bit buggy. It could not detect the problem somehow. I am looking into it.] >So, by the time they've all been scanned, they all contain the right values. And the following three bytes on the stack will also have been blatted. > But if the _next_ variable on the stack isn't a char, it'll probably be shifted to the next 8-byte boundary (or 4-byte boundary? I forget the rules for padding on 32bit vs 64bit architectures...). > So those three extra destroyed bytes will just be harmless padding. There is no guarantee that the three variables are stored on the stack in the mentioned order. (It could be reversed. Anything goes.) So if this is how the char variables are stored back to back, we have a problem. low <--- address ---> high |codeDetail| codeSubject|codeClass|???|???... codeClass is set by PR_sscanf(). But it is trashed when codeSubject is set. Both codeSubject and codeClass are trashed when codeDetail is set. (Oh, this is a scenario on a little-endian CPU.) Please note that |???|???...| can contain either frame pointer or stored return PC or something related to function call mechanism on a CPU where stack grows downward. (Do we have canary for stack protection depending on compiler used?). This part can be trashed when codeClass is set. So there *IS* a chance either the frame pointer or return PC could be affected. Of course, this is again CPU-dependent (whether byte-addressable or not also counts.) > Would also have different (likely really bad and incorrect) results on big-endian architecture. Right. > > Doesn't make the bug any better, but explains why we might not have crashed every time we sent an email :-) I hope not. :-) If it did, we would have noticed this bug earlier earlier. The problem could have been a silent weird behavior, etc. which no one could explain. Anyway, I am glad I could address this issue thanks to ASAN.