Do WHOIS lookups automatically for verification

REOPENED
Unassigned

Status

Webtools
ISPDB Server
--
enhancement
REOPENED
8 years ago
5 years ago

People

(Reporter: BenB, Unassigned)

Tracking

(Blocks: 1 bug)


Details

(Reporter)

Description

8 years ago
To verify that an alternative domain (e.g. yahoo.de) belongs to the main domain (e.g. yahoo.com), the best way is to make a WHOIS lookup and check the domain owner, and (via WHOIS and/or DNS) check that the main and alternate domains use the same name servers.

Ideally, the ispdb would check that automatically.
If that's not feasible, it would be good to make these lookups and display the result to the reviewer, so that the reviewer doesn't have to do it manually.
(Reporter)

Updated

8 years ago
Blocks: 543614
(Reporter)

Comment 1

8 years ago
$ python
import subprocess
# Run the system whois client and capture its output as text
data = subprocess.check_output(["whois", "mozilla.org"], text=True)
print(data)

Comment 2

6 years ago
This has been fixed in https://github.com/mozilla/ispdb/pull/7
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Reporter)

Comment 3

6 years ago
Sergio,

could you please describe your algo?

I don't see WHOIS checks, for example. And I can't see whether the DNS checks are sufficient and how exactly they work.

Sorry that I don't just read the code, but |if not tld.tld or tld.subdomain:| is hard for me to decipher. And the pull has a lot of other stuff, too.
(Reporter)

Comment 4

6 years ago
REOPEN, because there's not even a hint to WHOIS in the code, and that doesn't make me confident at all about the rest either.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 5

6 years ago
Also note that these rules are a help for the reviewer, not a requirement for submission. There are cases where the strict checks will fail, but a trusted human can tell that the domain is legitimate. This is intended to help the reviewer with the clear-cut positive cases.

Comment 6

6 years ago
Hi Ben,

DNS checks, similar to what you have described in comment 1, are being done at the beginning of the do_domain_checks method: https://github.com/mozilla/ispdb/blob/master/ispdb/config/configChecks.py#L291

We use the first domain as the main domain and the others as alternate domains. We fetch the name servers using a DNS NS query and check whether each alternate domain's set of name servers is a subset of the main domain's set.

The tld code compares the domain part of the Domain or DomainRequest objects and their MX servers against the IMAP/POP3 server and SMTP server (if not tld.tld or tld.subdomain: just checks whether a domain is valid). This is for bug 530975.

About WHOIS checks, you're right, there are no WHOIS checks, because there is no reliable way to get owner data from whois queries. You can see that in the pywhois parser: https://github.com/unpluggd/pywhois/blob/master/pywhois/parser.py . I think the other checks are sufficient.

You are right about comment 5. The code just displays the warnings and errors found, but the reviewer can approve it anyway. Do you think we should do something else? I think reviewers know automated checks can create false alarms. 

I should have split the pull request into smaller parts, sorry.

Comment 7

6 years ago
Just one fix from last comment:
Actually I've changed "if not tld.tld or tld.subdomain:" to "if not (tld.tld and tld.domain):" which checks if a domain is valid.
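
To make the difference between the two conditions concrete, here is a small sketch. Parts is a hypothetical stand-in for the object returned by extract() (it is not the tld library's own type), and the sample values are made up rather than taken from real extract() output.

```python
from collections import namedtuple

# Hypothetical stand-in for the result of extract(): three string parts.
Parts = namedtuple("Parts", ["subdomain", "domain", "tld"])


def old_is_invalid(p):
    # Earlier condition: also rejects every name that has a subdomain part.
    return bool(not p.tld or p.subdomain)


def new_is_invalid(p):
    # Revised condition: invalid only if the domain or tld part is missing.
    return not (p.tld and p.domain)


samples = [
    Parts("", "gmx", "net"),       # plain second-level domain
    Parts("mail", "gmx", "net"),   # name with a subdomain
    Parts("", "localhost", ""),    # no tld part at all (assumed values)
]
for p in samples:
    print(p, "old:", old_is_invalid(p), "new:", new_is_invalid(p))
```

With these stand-in values, the two conditions disagree only for the name that carries a subdomain: the earlier condition treats it as invalid, the revised one does not.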
(Reporter)

Comment 8

6 years ago
> We use the first domain as the main domain and the others as alternate domains and check.
> We fetch the nameserver using dns NS query and check if alternate domains name servers set
> is a subset of main domain name servers set.
> The tld code is comparing the domain part of the Domain or DomainRequest objects and its MX
> servers against IMAP/POP3 server and SMTP server (if not tld.tld or tld.subdomain:
> just checks if a domain is valid). This is for bug 530975.

Can you please tell me *what* you check against what? That's not clear to me from the code nor your description.

> there is no WHOIS checks. ... I think the other checks are sufficient.

I disagree, WHOIS data is important and a diligent reviewer should check it. Thus, the software should automate it or at least make it easy.

> parsing... owner... pywhois

Why don't you use pywhois? You don't necessarily need to compare owners, it's sufficient to compare admins. Even if you can match it most of the time, not all the time, that's still a help.

The absolute minimum (which you can easily do) would be that you show the reviewer the WHOIS (minus the boilerplate blabla of the registry) of all domains.
(Reporter)

Comment 9

6 years ago
("all domains" includes domains of servers listed.)
It would be particularly important to highlight discrepancies you found. If they are legit, the reviewer can approve.
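
As a rough illustration of highlighting discrepancies, the sketch below compares fields that have already been parsed out of the WHOIS responses. The function name, the field names ("registrant", "admin_email") and the sample records are all assumptions made for this example; how reliably such fields can be parsed is a separate question.

```python
def whois_discrepancies(records, fields=("registrant", "admin_email")):
    """Return {field: {value: [domains]}} for every field whose value differs.

    records maps a domain name to a dict of already-parsed WHOIS fields.
    Field names here are hypothetical, not those of any particular parser.
    """
    report = {}
    for field in fields:
        values = {}
        for domain, record in records.items():
            # Normalise lightly so that case and padding do not count as a mismatch.
            value = (record.get(field) or "").strip().lower()
            values.setdefault(value, []).append(domain)
        if len(values) > 1:
            report[field] = values
    return report


# Made-up records, for illustration only.
records = {
    "example.com": {"registrant": "Example Inc.", "admin_email": "dns@example.com"},
    "example.de": {"registrant": "example inc.", "admin_email": "dns@example.com"},
    "other.example": {"registrant": "Somebody Else", "admin_email": "dns@example.com"},
}
for field, values in whois_discrepancies(records).items():
    print(field, values)
```

Only the fields that differ are reported, so the reviewer sees the discrepancy and can still approve if it is legitimate.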

> The code just displays the warnings and errors found, but the reviewer can approve it anyway.

I think that's great, yes.
(Fixing the title, since this bug seems to be more about WHOIS checks at this point…)
Summary: Do WHOIS and DNS lookups automatically for verification → Do WHOIS lookups automatically for verification

Comment 11

6 years ago
> Can you please tell me *what* you check against what? That's not clear to me
> from the code nor your description.

1. Get name servers from all Domain/DomainRequest objects.
2. We take the first domain as the main domain, and the others as alternate domains.
3. For each alternate domain, check if its set of name servers is a subset of the main domain's set. In other words, we check that the main domain has all of the alternate domain's name servers.
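
The three steps can be sketched roughly as follows. This is not the code from configChecks.py: the function name is made up, and the name servers are passed in as plain sets (with made-up hostnames) instead of being fetched with a DNS NS query, so that the example runs without network access.

```python
def check_name_servers(ns_by_domain):
    """ns_by_domain: list of (domain, set of name server hostnames).

    The first entry is treated as the main domain, the rest as alternates.
    Returns one warning string per alternate whose set is not a subset.
    """
    main_domain, main_ns = ns_by_domain[0]
    warnings = []
    for domain, ns in ns_by_domain[1:]:
        extra = ns - main_ns  # name servers the main domain does not use
        if extra:
            warnings.append(
                "%s uses name servers not used by %s: %s"
                % (domain, main_domain, ", ".join(sorted(extra)))
            )
    return warnings


# Made-up data, for illustration only.
lookups = [
    ("example.com", {"ns1.example.net", "ns2.example.net"}),
    ("example.de", {"ns1.example.net"}),
    ("other.example", {"ns1.attacker.example"}),
]
for warning in check_name_servers(lookups):
    print(warning)
```

Note that an empty set is a subset of every set, so with this check alone an alternate domain whose NS lookup returned nothing would pass silently; that case probably needs to be flagged separately.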

> 
> > there is no WHOIS checks. ... I think the other checks are sufficient.
> 
> I disagree, WHOIS data is important and a diligent reviewer should check it.
> Thus, the software should automate it or at least make it easy.
> 
> > parsing... owner... pywhois
> 
> Why don't you use pywhois? You don't necessarily need to compare owners,
> it's sufficient to compare admins. Even if you can match it most of the
> time, not all the time, that's still a help.
> 
> The absolute minimum (which you can easily do) would be that you show the to
> reviewer the WHOIS (minus the boilerplate blabla of the registry) of all
> domains.

For the majority of the domains, pywhois gives only this information:
_regex = {
    'domain_name':      r'Domain Name:\s?(.+)',
    'registrar':        r'Registrar:\s?(.+)',
    'whois_server':     r'Whois Server:\s?(.+)',
    'referral_url':     r'Referral URL:\s?(.+)',  # http url of whois_server
    'updated_date':     r'Updated Date:\s?(.+)',
    'creation_date':    r'Creation Date:\s?(.+)',
    'expiration_date':  r'Expiration Date:\s?(.+)',
    'name_servers':     r'Name Server:\s?(.+)',  # list of name servers
    'status':           r'Status:\s?(.+)',  # list of statuses
    'emails':           r'[\w.-]+@[\w.-]+\.[\w]{2,4}',  # list of email addresses
}

It isn't sufficient. Parsing whois responses (manually or not) is tricky. How can we strip out the unnecessary information?

We can display the whois response, but we should present it compactly (imagine we have 10 domains: ISPDB would print 10 whois responses).
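
One possible heuristic for stripping the boilerplate, as a sketch only: keep lines that look like "Key: value" with a short key, and drop comment lines and very long lines. The function name, the regular expression and the length limit are assumptions chosen for this example, and the sample response is made up; real responses vary a lot between servers, so this will both miss fields and keep some noise.

```python
import re

# "Key: value" with a short key; free-text legal notices rarely fit this shape.
FIELD_RE = re.compile(r"^\s*([A-Za-z][\w ./-]{0,30}):\s*(\S.*)$")


def strip_whois_boilerplate(raw, max_line_length=120):
    """Return the lines of a raw WHOIS response that look like data fields."""
    kept = []
    for line in raw.splitlines():
        # Many servers mark comments and trailers with %, # or >>>.
        if line.lstrip().startswith(("%", "#", ">>>")):
            continue
        # Very long lines are usually terms-of-use paragraphs.
        if len(line) > max_line_length:
            continue
        match = FIELD_RE.match(line)
        if match:
            kept.append("%s: %s" % (match.group(1).strip(), match.group(2).strip()))
    return kept


# Made-up response, for illustration only.
sample = "\n".join([
    "% Made-up response, for illustration only.",
    "Domain Name: EXAMPLE.ORG",
    "Registrar: Example Registrar, Inc.",
    "Name Server: NS1.EXAMPLE.NET",
    "",
    "NOTICE: " + "access to this data is subject to terms of use. " * 4,
])
print("\n".join(strip_whois_boilerplate(sample)))
```

Because the result is a short list of lines per domain, ten domains could be shown side by side or collapsed, rather than as ten full responses.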
(Reporter)

Comment 12

6 years ago
> 1. Get name servers from all Domain/DomainRequest objects. 

I assume you mean the <domain> values in the XML? I.e. the email address domains.

> 3. For each alternate domain...

In the description, I'm missing the check of the domain of the IMAP/POP3/SMTP server hostnames. It must match (as in: same owner of domain) the <domain>s (email address domains) as well. In fact, that's the most important bit, but I can't find it in the code.


What happens if not tld.tld ? Does that skip all tests or lead to fail? (should: fail)

> For the majority of the domains

Have you actually tried that? It seems pywhois does parse registrant, admin and many other fields, and WHOIS of .com, .org and .net (at least) all look very much parsable. If pywhois can't do it, it should be easy to add.

Comment 13

6 years ago
(In reply to Ben Bucksch (:BenB) from comment #12)
> > 1. Get name servers from all Domain/DomainRequest objects. 
> 
> I assume you mean the <domain> values in the XML? I.e. the email address
> domains.

Yes

> 
> > 3. For each alternate domain...
> 
> In the description, I'm missing the check of the domain of the
> IMAP/POP3/SMTP server hostnames. It must match (as in: same owner of domain)
> the <domain>s (email address domains) as well. In fact, that's the most
> important bit, but I can't find it in the code.
> 

Is that the check you described in bug 530975 (comment 10)? I'm doing it here: https://github.com/mozilla/ispdb/blob/master/ispdb/config/configChecks.py#L310

Basically, I discard the subdomain part of the Domain/DomainRequest objects (the domain value in XML) and its MX servers (using tld lib) and add them to a list. Then I check if the domain (also discarding the subdomain part) of the IMAP/POP3/SMTP server hostnames is in the list.

To get owner details I would have to use WHOIS, which I found very hard to parse manually, and as I said, pywhois won't give this information for the majority of domains (all except .name, .me, .us, .uk).

> 
> What happens if not tld.tld ? Does that skip all tests or lead to fail?
> (should: fail)

It will add an error message to the list and continue the test for the next domain in the list.

Here is the relevant piece of the code:

for domain in domains:  # each Domain or DomainRequest object (depends on whether the Config is approved or not)
    mxservers = get_mxservers(domain.name)  # get MX servers of this domain using a DNS query
    tld = extract(domain.name)  # split domain name into: subdomain, domain, tld parts
    if not (tld.tld and tld.domain):  # check if domain is valid
        # add an error to the list and continue with the next domain
        domain_errors.append("Domain '%s' is not valid." % domain.name)
    else:
        # add to domain list (domain.tld part only)
        tlds.add("%s.%s" % (tld.domain, tld.tld))

> 
> > For the majority of the domains
> 
> Have you actually tried that? It seems pywhois does parse registrant, admin
> and many other fields, and WHOIS of .com, .org and .net (at least) all look
> very much parsable. If pywhois can't do it, it should be easy to add.

I have tried (but maybe there is a different whois server that gives this information that I'm not aware of). Registrars' whois server responses can be completely different. Compare, for example, google.com and gmx.com (they are both .com).

pywhois gives owner/registrant information only for .name, .me, .us and .uk domains.
(Reporter)

Comment 14

6 years ago
> I discard the subdomain part of the Domain/DomainRequest objects (the domain value in XML)
> and its MX servers (using tld lib) and add them to a list.

Why the subdomain drop? Normally, the <domain> should be a SLD, and if it's not (we currently have no such case), it should really be limited to that subdomain.

Why add the MX domain? Sounds fine in general, but it would be good to differentiate that. If the IMAP server doesn't match any of the domains, but does match the MX server's domain, I would want to have this pointed out to me. We may be missing a domain, or there's something strange.
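
A sketch of what differentiating the two kinds of match could look like. The function names and hostnames are made up, and registered_domain() is a deliberate simplification that keeps the last two labels; the actual code uses the tld library, which handles multi-part suffixes such as co.uk. The sketch also drops the subdomain of the email domains, which is exactly the behaviour questioned above.

```python
def registered_domain(hostname):
    # Simplification (assumption): keep only the last two labels.
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def classify_server(server_hostname, email_domains, mx_hostnames):
    """Say whether a server's domain matches an email domain, only an MX domain, or nothing."""
    server = registered_domain(server_hostname)
    if server in {registered_domain(d) for d in email_domains}:
        return "matches an email domain"
    if server in {registered_domain(m) for m in mx_hostnames}:
        return "matches only an MX server domain"
    return "no match"


# Made-up hostnames, for illustration only.
print(classify_server("imap.example.com", ["example.com"], ["mx1.example-hosting.net"]))
print(classify_server("imap.example-hosting.net", ["example.com"], ["mx1.example-hosting.net"]))
print(classify_server("imap.elsewhere.example", ["example.com"], ["mx1.example-hosting.net"]))
```

Returning a distinct result for the MX-only case lets the review page show it as a warning of its own instead of treating it like a plain match.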

What happens if one of the domains in the list doesn't belong there? Let's say I have gmx.com, gmx.net, gmx.de, emailme.com, vanity.com, ihopeyouwriteme.com as domains, except that ihopeyouwriteme.com is owned by the attacker. Its MX points to mx1.ispservers.com, also owned by the attacker, and the IMAP server imap.ispservers.com is the attacker's. Let's say the DNS nameserver test passes somehow.
Can we point out inconsistencies between the domains, i.e. when gmx.net and ihopeyouwriteme.com have different NS or MX servers?
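
One way such an inconsistency report could work, as a sketch: group the domains by their set of NS (or MX) hostnames and list every domain outside the largest group. The function names are made up, the records are passed in rather than looked up, and the name server hostnames below are invented for the example (they are not the real records of these domains).

```python
from collections import defaultdict


def group_by_records(records_by_domain):
    """Group domains by their exact set of NS (or MX) hostnames."""
    groups = defaultdict(list)
    for domain, hosts in records_by_domain.items():
        key = frozenset(h.lower().rstrip(".") for h in hosts)
        groups[key].append(domain)
    return groups


def outliers(records_by_domain):
    """Return the domains that are not in the largest group, sorted by name."""
    groups = group_by_records(records_by_domain)
    if len(groups) <= 1:
        return []
    # On a tie, max() keeps the first group seen; a tie is itself worth showing.
    largest = max(groups.values(), key=len)
    return sorted(
        d for members in groups.values() if members is not largest for d in members
    )


# Invented name servers, for illustration only.
ns_records = {
    "gmx.com": ["ns1.isp.example", "ns2.isp.example"],
    "gmx.net": ["ns2.isp.example", "ns1.isp.example"],
    "ihopeyouwriteme.com": ["ns1.attacker.example"],
}
print(outliers(ns_records))
```

The same function can be run once on the NS records and once on the MX records, and any domain it returns would be highlighted for the reviewer.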

> Registrars whois servers responses can be completely different. For example, google.com and gmx.com

duh, I didn't know registrars have their own WHOIS server now, it used to be the registry.

> pywhois gives owner/registrant information only for .name, .me, .us .uk domains.

Have you checked admin-c, too?
(Assignee)

Updated

5 years ago
Component: ispdb → ISPDB Server
Product: Mozilla Messaging → Webtools