BeOS and OS/2 should use Unix Async DNS Implementation

VERIFIED FIXED in mozilla0.9.2

Status

()

Core
Networking
P3
normal
VERIFIED FIXED
17 years ago
3 years ago

People

(Reporter: gordon, Assigned: mkaply)

Tracking

Trunk
mozilla0.9.2
Other
Other
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: critical for 0.9.2)

Attachments

(3 attachments)

(Reporter)

Description

17 years ago
The DNS cache doesn't evict lookups on BEOS and OS/2.  We need to call 
AddToEvictionQ() after processing the request on new/pending lookups for 
platforms that don't use the dns thread for async lookups.
(Reporter)

Comment 1

17 years ago
Mike, do you test on both BEOS and OS/2, or just OS/2?  Could you test a patch 
once I get it done?
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.2

Comment 2

17 years ago
afaik mkaply is just a concerned citizen and doesn't have a BeOS box.  I 
know cls builds on BeOS.
Summary: DNS cache doesn't evict on BEOS and OS/2 → DNS cache doesn't evict on BeOS and OS/2

Comment 3

17 years ago
Someone else is going to need to pick up testing this.
Keywords: qawanted
(Reporter)

Comment 4

17 years ago
Created attachment 37548 [details] [diff] [review]
proposed patch for adding eviction of dns cache entries to OS/2 and BeOS.
(Reporter)

Comment 5

17 years ago
If mkaply and cls can review the patch, I'll get an sr= and an a= and check it 
in.

Another possibility is that OS/2 and BeOS may be able to simply use the UNIX 
implementation.  That would clean up the code a bit as well.  I'll submit an 
optimistic patch so someone can try that as well.
(Assignee)

Comment 6

17 years ago
How would I actually test the patch to see if it was working/failing?
(Reporter)

Comment 7

17 years ago
Created attachment 37562 [details] [diff] [review]
Optimistic patch for OS/2 and BeOS to use same async implementation as Unix.
(Reporter)

Comment 8

17 years ago
The second patch is using PR_GetIPNodeByName() for OS/2.  Wan-Teh, is that 
available on OS/2?

Comment 9

17 years ago
Yes, PR_GetIPNodeByName() is available on OS/2.
(Reporter)

Comment 10

17 years ago
Setting target milestone to 0.9.3, but I'll check it in as soon as I get test 
feedback and reviews.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
(Assignee)

Comment 11

17 years ago
OK, without any patches applied, DNS works fine.

First patch wouldn't compile - PRCLIST_IS_EMPTY is not defined.

Second patch compiled, but I got this assert coming up:

###!!! ASSERTION: bad DNS shutdown state: 'mState == 
DNS_SHUTTING_DOWN', file D:
/builds/current/mozilla/netwerk/dns/src/nsDnsService.cpp, line 1219

And DNS did not work at all.
(Reporter)

Comment 12

17 years ago
Created attachment 37712 [details] [diff] [review]
Fix to minimal patch for PR_CLIST_IS_EMPTY typo.
(Reporter)

Comment 13

17 years ago
I don't understand the problem with the optimal patch.  I looks like the dns 
thread is failing to wait on the condition variable in DequeueLookup() or getting 
wrongly notified when there are no lookups in the queue.  It's not clear to me 
how this could be happening on OS/2, but not Unix.

Wan-Teh, are there any known differences with PR_WaitCondVar() on OS/2?

Comment 14

17 years ago
You MUST ALWAYS call PR_WaitCondVar in a while loop.

The following code is wrong.
 
nsDNSLookup *
nsDNSService::DequeuePendingQ()
{
#if defined(XP_UNIX)
    // Wait for notification of a queued request, unless  we're shutting down.
    if (PR_CLIST_IS_EMPTY(&mPendingQ) && (mState != DNS_SHUTTING_DOWN)) {
        PRStatus  status = PR_WaitCondVar(mDNSCondVar, PR_INTERVAL_NO_TIMEOUT);
        NS_ASSERTION(status == PR_SUCCESS, "PR_WaitCondVar failed.");
    }
#endif

You should say
    while (PR_CLIST_IS_EMPTY(&mPendingQ) && (mState != DNS_SHUTTING_DOWN)) {
(Reporter)

Comment 15

17 years ago
Ah, great.  I'll fix it on the trunk as soon as I can push through the review 
process.  I wonder why we haven't seen this as a problem on Unix?

Comment 16

17 years ago
in the original "async dns" XP_UNIX code that i wrote for bug 10733, we had
something similar to:

DNS thread:
  mon.enter();
  if (!pending_lookup && !shutting_down)
      mon.wait();
  if (pending_lookup)
      do_lookup();
  else
      do_shutdown();
  mon.exit();

client thread:
  mon.enter();
  pending_lookup = true;
  mon.notify();
  mon.exit();

in other words, it seemed ok for there not to be a pending_lookup when the
monitor was notified... it meant to shutdown (note: the shutting_down flag was
just used to handle the shutdown case in which the DNS thread had not yet run).
the new code seems to be basically identical.

on OS/2 it seems to be the case that a pending_lookup is added before the
monitor (convar in the new world) is notified, but after the DNS thread resumes
pending_lookup is for some reason not set.

wtc: why is the while loop needed?  i always thought that a while loop was only
need in the case of multiple threads waiting on a condition upon which only one
thread may act.  is this not the case?  thanks!

Comment 17

17 years ago
The OS/2 problem may be not caused by not using a while loop
here.  I am merely pointing out that as a rule, you must call
PR_WaitCondVar or PR_Wait in a while loop.  That's required by
the API.

Two reasons the while loop is required.
1. PR_WaitCondVar or PR_Wait may return due to a spurious wakeup.
   Note that an implementation may not actually generate spurious
   wakeups, but they are allowed by the API specification.
2. If multiple threads may call PR_WaitCondVar, another thread
   may be able to enter the monitor sooner than the awakened thread
   can (a thread awakened from a PR_WaitCondVar call needs to re-enter
   the monitor) and make the boolean expression false again.  This
   condition is called a stolen wakeup.

Again, I am not saying that the OS/2 problem is caused by a
spurious wakeup or a stolen wakeup.  In fact, you may be able
to analyze your code and the PR_WaitCondVar implementation on
OS/2 and determine that these don't happen.  I am just saying
that it is a syntactical requirement (one that can't be enforced
by a compiler) that all PR_WaitCondVar calls must be inside a
while loop.

Comment 18

17 years ago
fair enough... thanks for the thorough explanation :-)

in this case, it is not a stolen wakeup, because we only have one thread 
that may call PR_WaitCondVar.  perhaps it is the spurious wakeup problem,
since we are only seeing this on OS/2 and not any of the unix platforms.

sounds to me like we need to do a scan of all usages of PR_Wait or
PR_WaitCondVar to ensure that they are not subject to similar problems.
I suspect there may be problems with the thread pool implementation.
(Reporter)

Comment 19

17 years ago
The simple patch has been checked in.  So I'm morphing this bug to state the
desire to have BeOS and OS/2 use the Unix implementation of async dns lookups
(unless those platforms have a native async gethostbyname API that someone wants
to use).
Summary: DNS cache doesn't evict on BeOS and OS/2 → BeOS and OS/2 should use Unix Async DNS Implementation
Target Milestone: mozilla0.9.3 → mozilla1.0
(Assignee)

Comment 20

17 years ago
OK, with the current level of nsDnsService.cpp, Os/2 can follow the 
UNIX path no problem (as per the optimistic patch)

Do we want to assume BEOS can and check in the changes? Or just check 
in the pieces with XP_OS2?
(Assignee)

Comment 21

17 years ago
Gordon,

OS/2 can use the Unix implementation. Do you want to assume that BeOS can? Or 
should I create a patch that just fixes this for OS/2?
(Reporter)

Comment 22

17 years ago
You know, I'm tempted to change it for both; if it works - great! If it breaks
BeOS and somebody cares, then we should hear from them, which will let us know
who we can work with to resolve the issues.  I suppose we should make an
announcement on netscape.public.mozilla.netlib first.

Comment 23

17 years ago
you could also post to netscape.public.mozilla.beos, ... I just updated my tree 
so i might try a beos build this week.

Comment 24

17 years ago
Sorry to be late, I tested the 'optimistic' patch on BeOS build and it is
working for me so far.
I'm posting this from patched Bezilla:-)
(Assignee)

Comment 25

17 years ago
Why is this targeted for 1.0?

I'd like to see this in 0.9.2. Don't we leak without this?

Comment 26

17 years ago
mkaply: gordon is out of town for the next couple of weeks... so, this probably
won't go in for 0.9.2 unless some kind soul takes it... how 'bout you? ;-) 
(Assignee)

Comment 27

17 years ago
I'm taking this bug to push for checkin to 0.9.2.

Gordon wrote the code, and I looked at it and said it was OK, so

r=mkaply

Makoto Hamanaka verified it on BeOS, so I am going to take r=VYA04230@nifty.com 
for the BEOS part.

darin: Can I get an sr=darin so I can send this to drivers?
Assignee: gordon → mkaply
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → mozilla0.9.2

Comment 28

17 years ago
sr=darin
(Assignee)

Comment 29

17 years ago
Fix checked in to trunk and branch (branch approval from Asa by mail)
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED
Whiteboard: critical for 0.9.2

Updated

17 years ago
Keywords: verifyme
(Assignee)

Comment 30

17 years ago
Verify fixed
Status: RESOLVED → VERIFIED

Updated

9 years ago
Keywords: verifyme
Keywords: qawanted
You need to log in before you can comment on or make changes to this bug.