Provide advice on SCM system for Public Suffix List

RESOLVED FIXED

Status

Developer Services
Mercurial: hg.mozilla.org
--
minor
3 years ago
2 years ago

People

(Reporter: gerv, Unassigned)

Tracking

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1472] )

(Reporter)

Description

3 years ago
The master copy of the Public Suffix List <http://publicsuffix.org/> currently lives in mozilla-central. That means someone needs a checkout of Firefox to modify it, and a build of Firefox to test it. This is unnecessarily heavyweight, and leads to checkins without running tests which have broken the tree on more than one occasion.

So, we'd like to set up a separate repo just to hold the list, and we'd like advice on how to do it.

* Where is the best place to host it? We want to have the canonical copy on Mozilla infrastructure but some contributors like the Github PR system for taking submissions, and Travis CI for running tests, so it would be good if we can use that. What git.mozilla.org or hg.mozilla.org integration options are there which aren't too much hassle for IT?

* Can we have a cron job or post-checkin hook or something which automatically checks in any updates to the file to mozilla-central?

Thanks for any advice you can give,

Gerv
(Reporter)

Comment 1

3 years ago
Oh, and can someone make this bug public, please?

Gerv

Updated

3 years ago
Assignee: server-ops → server-ops-webops
Group: infra
Component: Server Operations → WebOps: Source Control
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → nmaul
I think the first round of questions/advice will be best answered by RelEng; when it's time to create and deal with hooks, I'll pick it back up.
Assignee: server-ops-webops → nobody
Component: WebOps: Source Control → Repos and Hooks
Product: Infrastructure & Operations → Release Engineering
QA Contact: nmaul → hwine

Comment 3

3 years ago
(In reply to Gervase Markham [:gerv] from comment #0)
> The master copy of the Public Suffix List <http://publicsuffix.org/>
> currently lives in mozilla-central. That means someone needs a checkout of
> Firefox to modify it, and a build of Firefox to test it. This is
> unnecessarily heavyweight, and leads to checkins without running tests which
> have broken the tree on more than one occasion.

If a separate test were written to sanity check the list (this would have to be done in the separate-repo proposal anyway), then we could just have a mach command added to run it, which would make new contributor involvement pretty easy. eg |./mach check-psl|
(Reporter)

Comment 4

3 years ago
edmorley: fair point, although that would still require a full Firefox checkout, which seems overkill for a 4000-line text file. Given that the PSL is a data source used in many places, a separate repo seems logically sane, too.

Is there some reason that's difficult? Or are you just engaged in the admirable pursuit of trying to find the solution to the problem which requires the least change? :-)

Gerv

Comment 5

3 years ago
(In reply to Gervase Markham [:gerv] from comment #4)
> Is there some reason that's difficult? Or are you just engaged in the
> admirable pursuit of trying to find the solution to the problem which
> requires the least change? :-)

The latter, but only from the POV of making sure all alternatives are on the table :-)
(Release engineering/IT would be the ones that need to set things up, so this doesn't impact me much either way)
This is my first exposure to the Public Suffix List, so a few questions.

As it stands now (SSoT in m-c) any changes to the file must ride the trains, and thus be unlikely to be picked up quickly. (E.g. esr will never get it.)
 - is that the desired behavior?
 - does that latency also work for the FFOS use cases?

It looks like this file gets updated >30 times a year, and it sounds like each change does not have a test case added to verify continued inclusion.

aiui, to properly validate a proposed submission, the submitter also needs to (or should) retrieve the test file. (Side note - on the website the test file s/b referenced via a URL to hg.m.o - mxr is not a SoT, and is scheduled for replacement.) And perhaps some support files to be able to run those tests.

Okay, now to the request. Everything you mention is technically possible, and we've recently implemented some of the needed building blocks. However, let me back up and try to understand the goal here. Are you trying to:
 - automate the submission and verification process?
 - provide submitters with a lighter weight manual path for the current process?
 - loosen the coupling between this list and Firefox/Mozilla needs?
Depending on the priorities, I'd recommend different solutions.

There is an ongoing data maintenance cost to keeping the same data in 2 places -- it would be nice to avoid that if we can achieve your goals without such duplication.
Flags: needinfo?(gerv)
(Reporter)

Comment 7

3 years ago
(In reply to Hal Wine [:hwine] (use needinfo) from comment #6)
> As it stands now (SSoT in m-c) any changes to the file must ride the trains,
> and thus be unlikely to be picked up quickly. (E.g. esr will never get it.)
>  - is that the desired behavior?
>  - does that latency also work for the FFOS use cases?

It's not awesome, no. We used to try and remember to check in on ESR, but I think we've forgotten recently. It would be good if that were fixed.

> It looks like this file gets updated >30 times a year, and it sounds like
> each change does not have a test case added to verify continued inclusion.

No. We think that's fine, although we do want to have a test suite of format tests to make sure we don't accidentally break some of the formatting constraints.

> aiui, to properly validate a proposed submission, the submitter also needs
> to (or should) retrieve the test file. 

Well, many submitters don't have code which uses the list. 

> (Side note - on the website the test
> file s/b referenced via a URL to hg.m.o - mxr is not a SoT, and is scheduled
> for replacement.) And perhaps some support files to be able to run those
> tests.

The official URL as referenced on the website is now:
https://publicsuffix.org/list/effective_tld_names.dat
which can be CDNed if necessary.

> Okay, now to the request. Everything you mention is technically possible,
> and we've recently implemented some of the needed building blocks. However,
> let me back up and try to understand the goal here. Are you trying to:
>  - automate the submission and verification process?

No. Every submission needs a sanity check by our team.

>  - provide submitters with a lighter weight manual path for the current
> process?

No, not submitters. Maintainers, yes.

>  - loosen the coupling between this list and Firefox/Mozilla needs?

A little bit, perhaps.

The goals are:

* Make sure the PSL doesn't break the tree. This means making it easier for people to run its formatting tests, and write new ones. This points to a separate repo - building the entire of m-c for this purpose is unnecessary.

* Make the job of maintainers simpler and easier in other ways. So a tiny checkout rather than multiple hundreds of MB, and a simple test harness.

* Make sure the PSL is as up to date as possible on all maintained branches. So we could have a single master repo, and then bots to automatically notice changes and commit them to the various branches.

> There is an ongoing data maintenance cost to keeping the same data in 2
> places -- it would be nice to avoid that if we can achieve your goals
> without such duplication.

Well, there are always going to be multiple copies around the place. But we'd like there to be one single master copy, which is synced out to all relevant branches.

Does that answer your questions? Sorry for the delay.

Gerv
Flags: needinfo?(gerv) → needinfo?(hwine)
(In reply to Gervase Markham [:gerv] from comment #7)
> (In reply to Hal Wine [:hwine] (use needinfo) from comment #6)
> > Okay, now to the request. Everything you mention is technically possible,
> > and we've recently implemented some of the needed building blocks. 
>
> The goals are:
> 
> * Make sure the PSL doesn't break the tree. This means making it easier for
> people to run its formatting tests, and write new ones. This points to a
> separate repo - building the entire of m-c for this purpose is unnecessary.
> 
> * Make the job of maintainers simpler and easier in other ways. So a tiny
> checkout rather than multiple hundreds of MB, and a simple test harness.
> 
> * Make sure the PSL is as up to date as possible on all maintained branches.
> So we could have a single master repo, and then bots to automatically notice
> changes and commit them to the various branches.
> 
> > There is an ongoing data maintenance cost to keeping the same data in 2
> > places -- it would be nice to avoid that if we can achieve your goals
> > without such duplication.
> 
> Well, there are always going to be multiple copies around the place. But
> we'd like there to be one single master copy, which is synced out to all
> relevant branches.
> 

Thanks for the information -- that helps a lot. Unfortunately, there is not a 100% fit, so you'll need to make some tradeoffs depending on what is most important for your use cases.

a) hg.m.o for the committable repository:
    + known process to write and install commit hooks
    + easy to mirror read only copy to github
    - no "big green button" for PR merges

b) git.m.o for the committable repository:
    + easy to mirror read only copy to github
    - we haven't defined the process to write & install hooks yet
    - no "big green button" for PR merges

c) github for the committable repository:
    + easy to mirror read only copy to hg.m.o or git.m.o
    + has the "big green button" for PR merges
    - no commit hooks, you have to write, deploy, and operate a
      service to listen for web hook calls.

Note: It is not clear that a commit hook can land a change on other branches (permissions, deadlock issues). Thus the "obvious hybrid" solution of commit to github, and mirror to hg.m.o where hooks process things is not viable. There is an effort to get "auto landing" working again - that may be a route, but no ET yet.

Note: Any DVCS can do pull requests. The "big green button" is a github feature, and only usable for PR. (Updating a forked repository still requires the pull-merge-push approach.)
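
To give a feel for what the web hook listener in option c) involves, here is a minimal Python sketch of the core decision such a service would make for each GitHub push delivery. It assumes GitHub's current `X-Hub-Signature-256` header (an HMAC-SHA256 of the raw request body, formatted as `sha256=<hexdigest>`) and the `commits[].added` / `commits[].modified` lists of the push payload. The function names and the idea of watching a single file path are hypothetical; a real service would still need an HTTP front end, deployment, and monitoring.

```python
import hashlib
import hmac
import json


def verify_signature(secret, body, signature_header):
    """Check GitHub's 'sha256=<hexdigest>' HMAC over the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header or "")


def push_touches(payload, path):
    """Return True if any commit in a push payload adds or modifies `path`."""
    for commit in payload.get("commits", []):
        if path in commit.get("added", []) or path in commit.get("modified", []):
            return True
    return False


def should_sync(secret, body, signature_header, path):
    """Decide whether a delivery is authentic and changes the watched file."""
    if not verify_signature(secret, body, signature_header):
        return False
    return push_touches(json.loads(body), path)
```

Only when `should_sync(...)` returns True would the service go on to do whatever landing work is needed, which is the part with the open permission and deadlock questions.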

Let me know which way you'd like to proceed, or if you have further questions.
Flags: needinfo?(hwine) → needinfo?(gerv)
(Reporter)

Comment 9

3 years ago
I think being able to do stuff on commit is useful, and merging random PRs from other people is not so useful, so I think a) is probably our best bet. Simone/Jothan/Steve: your views?

Hal wrote:
> Note: It is not clear that a commit hook can land a change on other branches (permissions, deadlock 
> issues).

Well, that's a bit of a shame :-( At the moment, at least the change makes it to m-c with no extra work, because the repository of record _is_ m-c. If we can't do some sort of autolanding (can it work if the file is guaranteed to merge properly?) then it looks like we are adding work here. 

Hal: Am I right?

Gerv
Flags: needinfo?(gerv) → needinfo?(hwine)
Correct, more work unless we prove the nested commits can happen.

Is that something you want to have resolved before making the decision?
Flags: needinfo?(hwine)
(Reporter)

Comment 11

3 years ago
At the moment, then, rather than being an obvious win, it looks like we are trading off simplicity of management of the list in general for greater complexity in getting it shipped by Mozilla. Perhaps that's a net win.

Simone/Jothan/Steve: still hoping for your views :-)

Gerv

Comment 12

3 years ago
Personally, I'd prefer option c for the following reasons:

- I'm way more familiar with Git over Hg and I also tend to prefer the Git features and commit approach compared to the Hg one
- The GitHub pull-request feature should not be underestimated. Offering simpler tools to users would make it simpler to contribute (just think about the fact that you can edit a file in your browser and send a PR)
- We should also not underestimate the visibility of GitHub compared to the proprietary Mozilla repo
- GitHub has really simple integration with third-party services and webhooks. That would make it possible to ping CI systems and integrate more tools with minimum effort and a reasonable level of independence

From my personal POV, the PSL has so far been a very hidden project even though it is part of virtually every user's daily routine (just think about how Firefox, Google and Google Chrome use it). Keeping it up to date is currently limited to very few people, and hosting it on a globally recognized service like GitHub would increase its visibility.

I'm a huge fan of GitHub and git, I have been using both since 2007/2008 so my comment is a little bit biased. However, having maintained the PSL for almost a couple of years, my comment is also the result of an evaluation of the current maintenance workflow.
(Reporter)

Comment 13

3 years ago
Simone: I'm not convinced of the massive value of super-easy pull requests; the PSL doesn't change all that often. (The reason it seems like there's a lot of work is that we aren't very quick at dealing with the bugs!) Still, option a) allows for a read-only mirror on github so people can still use the big green button. It just means that there's a little more work in getting their patch checked in if they use that route.

The disadvantage of github is that updating the Mozilla copy of the PSL becomes a manual process which has to be triggered every time we make a change. And that's a significant downside, and extra work for us.

hwine: does hg.mozilla.org have any kind of "external repo" support, such that someone who does a checkout of the code could automatically pull a copy of the PSL from (e.g.) Github? Or do we try and avoid that because it puts an extra point of failure into our build process?

Gerv
Flags: needinfo?(hwine)
(In reply to Gervase Markham [:gerv] from comment #13)
> hwine: does hg.mozilla.org have any kind of "external repo" support, such
> that someone who does a checkout of the code could automatically pull a copy
> of the PSL from (e.g.) Github?

No - that is not a server side feature for any DVCS. It would have to be implemented as a post hook in every client. There is some talk of supporting "enforced hg configurations" via mach, but no consensus yet or timeline.

> Or do we try and avoid that because it puts
> an extra point of failure into our build process?

The main reason to avoid it is that it would require all developers to keep a client-side hook updated & deployed. If that were addressed, then for CI & release builds we'd need the repo mirrored so we don't depend on non-m.o resources. That would require a way to tell the hook to use the mirror.

So, overall, it's a brittle approach, and we don't have the developer tooling to support diagnosing it. Not recommended.
Flags: needinfo?(hwine)
(Reporter)

Comment 15

3 years ago
Hmm. Have I missed something? The b2g build process seems to pull in a pile of external repos - Android, partners etc.. Does that build have all the downsides you outline? Or did I not explain well enough that I meant something like that?

Gerv
Flags: needinfo?(hwine)
Building a bit on Ed's comment 3 -- would the following meet the needs?

For submitters: a bash or python script they download to:
 - pull the current PSL
 - pull _just_ the formatting tests via hgweb
 - run the tests
 - make the submission (create/attach to bug, whatever)

For maintainers: a bash or python script they download that:
 - pulls the submission, PSL, etc.
 - does any testing needed on whichever branches it needs
 - pushes the change to all the appropriate branches (or does the right thing with fxos branches & bugs)

These new scripts could be in a new github repo for ease of maintenance and download.
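
As a rough illustration of the "run the tests" step, here is a minimal Python sketch of a standalone format check for the list. It relies only on the published format rules (one rule per line, `//` comment lines, each line read up to the first whitespace, optional `*` wildcard labels and a leading `!` for exceptions). The specific checks chosen here (empty labels, uppercase characters, partial wildcards, duplicates) and the name `check_psl` are assumptions for illustration, not the checks the existing xpcshell test performs.

```python
def check_psl(text):
    """Return a list of (line_number, message) problems found in PSL text."""
    problems = []
    seen = {}  # rule -> line number of its first occurrence
    for lineno, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        # Blank lines and // comments carry no rule
        if not stripped or stripped.startswith("//"):
            continue
        # Only the text up to the first whitespace is the rule
        rule = stripped.split()[0]
        # A leading "!" marks an exception rule; the rest is the domain part
        body = rule[1:] if rule.startswith("!") else rule
        labels = body.split(".")
        if any(label == "" for label in labels):
            problems.append((lineno, "empty label in %r" % rule))
        if body != body.lower():
            problems.append((lineno, "uppercase characters in %r" % rule))
        if any("*" in label and label != "*" for label in labels):
            problems.append((lineno, "partial wildcard in %r" % rule))
        if rule in seen:
            problems.append((lineno, "duplicate of line %d" % seen[rule]))
        else:
            seen[rule] = lineno
    return problems
```

A submitter-side script could read the downloaded list, call `check_psl(...)` on its text, and refuse to go on to the submission step if the returned list is non-empty.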
(In reply to Gervase Markham [:gerv] from comment #15)
> Hmm. Have I missed something? The b2g build process seems to pull in a pile
> of external repos - Android, partners etc.. Does that build have all the
> downsides you outline? Or did I not explain well enough that I meant
> something like that?

In the android build world, all of that is handled on the client side by a script called "repo" which reads a manually maintained manifest. Since "repo" is a required build tool, the update issues are already addressed (the developer's build breaks if they don't have it)

For FF desktop/mobile, we don't have such a required client side tool at the moment.

Does that clarify?
Flags: needinfo?(hwine)
(Reporter)

Comment 18

3 years ago
OK, I'm going to make the call here, as any of the options are better than the status quo. Let's go with a).

a) hg.m.o for the committable repository:
    + known process to write and install commit hooks
    + easy to mirror read only copy to github
    - no "big green button" for PR merges

hwine: are you in a position to set up a new repo on hg.mozilla.org? I'm not sure what top-level directory it lives in; perhaps the catch-all "projects"? Suggestions? The repo name should be "public-suffix-list". 

If there were some way of getting the latest PSL from m-c into it _with_history_, but without making the checkout enormous, that would be truly awesome. Is that possible? Perhaps by some "replay" mechanism which reconstructs all the checkins with metadata, even if they have different hashes? If not, we'll have to do without the history.

Once we have the PSL in there, we can copy across or reproduce any tests to make sure that if it passes them, it'll pass on m-c. Then, we have easier access to commit, and (we hope) no more tree breakage. (Unless the tree depends on the PSL being of a certain form in ways we don't notice.)

Gerv
Flags: needinfo?(hwine)
(In reply to Gervase Markham [:gerv] from comment #18)
> OK, I'm going to make the call here, as any of the options are better than
> the status quo. Let's go with a).

> hwine: are you in a position to set up a new repo on hg.mozilla.org? I'm not
> sure what top-level directory it lives in; perhaps the catch-all "projects"?
> Suggestions? The repo name should be "public-suffix-list". 

While I'm usually against top level repositories, this may be the right location for a globally used system. The other candidates would be "releases/*" as this really is a release that other folks depend upon, or "integration/*" as this is integrated into many browsers. "projects" is typically for "short lived" projects, and this is not that.

> If there were some way of getting the latest PSL from m-c into it
> _with_history_, but without making the checkout enormous, that would be
> truly awesome. Is that possible?

Yep - relatively simple, but (as you note) the hashes will change. All I need are the names of the set of files you want preserved, and where you want them to end up in the new repo. (i.e. things can be renamed, moved, gathered together)

> Once we have the PSL in there, we can copy across or reproduce any tests to
> make sure that if it passes them, it'll pass on m-c. 

If you want the test history at the same time, just include those files in the list.
Flags: needinfo?(hwine) → needinfo?(gerv)
(Reporter)

Comment 20

3 years ago
(In reply to Hal Wine [:hwine] (use needinfo) from comment #19)
> While I'm usually against top level repositories, this may be the right
> location for a globally used system. The other candidates would be
> "releases/*" as this really is a release that other folks depend upon, 

Everything in "releases" is release branches of Mozilla core code held elsewhere. 

> or
> "integration/*" as this is integrated into many browsers.

I'm not sure that directory uses that meaning of "integration" :-)

I'm not keen on either of these, so if projects/ is wrong, I'd prefer at the top level.

> Yep - relatively simple, but (as you note) the hashes will change. All I
> need are the names of the set of files you want preserved, and where you
> want them to end up in the new repo. (i.e. things can be renamed, moved,
> gathered together)

OK. The files I think we should keep are:

netwerk/dns/effective_tld_names.dat
netwerk/dns/prepare_tlds.py
netwerk/test/unit/data/test_psl.txt
netwerk/test/unit/test_psl.js

I'm not totally certain we need all of these, but removing them afterwards is easy, whereas adding them with history later is (presumably) hard. They would all go in the top-level directory. Again, I assume later moves and renames are easy.

Are you now in a position to take this bug and run with it?

Gerv
Flags: needinfo?(gerv) → needinfo?(hwine)
(In reply to Gervase Markham [:gerv] from comment #20)
> Are you now in a position to take this bug and run with it?

Let's make sure we have the same definition of "run with it". What I'm on the hook for is:
 - make a new repo
 - include the history for the files in comment 20
 - I can't give an ET at this time

What also needs to be done, and has no owner afaik is:
 - any hooks to support auto landing on m-c
 - any hooks to support auto landing on other release branches
 - changes to workflows & documentation for the process changes
 - "unknown unknowns"
Flags: needinfo?(hwine)
(Reporter)

Comment 22

3 years ago
hwine: yes. You are on the hook for the things you list. The other things are not necessary straight away, in that I think things will be better once the first set of things are done, and that list makes them better still.

I've assigned this bug to you :-)

Gerv
Assignee: nobody → hwine
(Reporter)

Comment 23

3 years ago
hwine: any idea when you might be able to do this?

Gerv
Flags: needinfo?(hwine)
(Reporter)

Comment 24

3 years ago
hwine: ping?

Gerv
(Assignee)

Updated

3 years ago
Product: Release Engineering → Developer Services

Updated

3 years ago
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/128]
(Reporter)

Comment 25

3 years ago
https://kanbanize.com/ctrl_board/6/128 doesn't show me anything useful, even after signing up on the website - can someone tell me what's with this bug?

Thanks,

Gerv

Updated

3 years ago
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/128] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1455] [kanban:engops:https://kanbanize.com/ctrl_board/6/128]

Updated

3 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1455] [kanban:engops:https://kanbanize.com/ctrl_board/6/128] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1458] [kanban:engops:https://kanbanize.com/ctrl_board/6/128]

Updated

3 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1458] [kanban:engops:https://kanbanize.com/ctrl_board/6/128] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1465] [kanban:engops:https://kanbanize.com/ctrl_board/6/128]

Updated

3 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1465] [kanban:engops:https://kanbanize.com/ctrl_board/6/128] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1467] [kanban:engops:https://kanbanize.com/ctrl_board/6/128]

Updated

3 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1467] [kanban:engops:https://kanbanize.com/ctrl_board/6/128] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1472] [kanban:engops:https://kanbanize.com/ctrl_board/6/128]
(In reply to Gervase Markham [:gerv] from comment #25)
> https://kanbanize.com/ctrl_board/6/128 doesn't show me anything useful, even
> after signing up on the website - can someone tell me what's with this bug?
> 
> Thanks,

Sorry gerv, I can't answer your "status" question (and sorry for the kanban spam). The kanbanize activity is an attempt at using a work-tracking tool. We (releng/devops/relops) are evaluating it ~ now for our uses.

The tl;dr is that *all* work/discussion will happen in-bugs, kanban is just a view of "who" and "what state" for various things that visually displays for us better than bugzilla.

I've pinged hal off-bug to pop in here though.
(Assignee)

Updated

3 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1472] [kanban:engops:https://kanbanize.com/ctrl_board/6/128] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1472]
Let's see if we can get this unstuck.

Gerv: is this still something that you're interested in?

If so, things have changed a lot on the repo mgmt side -- we'll run it by gps for process comments (convenient, as he can speak to any build issues as well). There may be better (easier) solutions with the more modern versions of hg we're now running.
Flags: needinfo?(hwine) → needinfo?(gerv)
Assignee: hwine → nobody
(Reporter)

Comment 28

2 years ago
Yes, definitely still interested in this. NI'ing gps for comment.

Gerv
Flags: needinfo?(gerv) → needinfo?(gps)
We can "vendor" the PSL inside mozilla-central. What this means is that the canonical source of truth for PSL would be a separate repository (probably on GitHub). We would periodically import the contents of the external repository into mozilla-central using a script checked into mozilla-central. This import could be manual (if updates to the PSL are infrequent) or we could deploy a bot to do it automatically when the canonical PSL repo changes. There is precedent for both methods. Typically we start at manual and add automation as the human overhead becomes too much of a burden.

It's worth noting that as long as PSL is a dependency for the Firefox build, we need the data in mozilla-central. The build automation cannot connect to the Internet, and subrepos/submodules a) are a horrible UX and b) would require updating the automation tooling.

Google is currently implementing "narrow clone" support in Mercurial. This allows people to clone only specific directories from a large repository (kinda like how Subversion works). Once this is implemented and deployed at Mozilla, the argument around "cloning mozilla-central just for a small set of files takes too long" becomes invalid and we can revisit decisions around "vendoring" and the use of separate repositories.
Flags: needinfo?(gps)
(Reporter)

Comment 30

2 years ago
(In reply to Gregory Szorc [:gps] from comment #29)
> We can "vendor" the PSL inside mozilla-central. What this means is that the
> canonical source of truth for PSL would be a separate repository (probably
> on GitHub). We would periodically import the contents of the external
> repository into mozilla-central using a script checked into mozilla-central.
> This import could be manual (if updates to the PSL are infrequent) or we
> could deploy a bot to do it automatically when the canonical PSL repo
> changes. There is precedent for both methods. Typically we start at manual
> and add automation as the human overhead becomes too much of a burden.

Sounds like a plan. We should take prepare_tlds.py into the other repo too, if you want the PSL team to make sure they don't accidentally break it. Then you can import both files.

> Google is currently implementing "narrow clone" support in Mercurial. This
> allows people to clone only specific directories from a large repository
> (kinda like how Subversion works). Once this is implemented and deployed at
> Mozilla, the argument around "cloning mozilla-central just for a small set
> of files takes too long" becomes invalid 

Not totally so; one currently needs to build Firefox to run the PSL xpcshell test, which needs to be run before checkin to make sure we don't break the build.

If we move out to an external repo, we might reimplement that test standalone.

Gerv
(Reporter)

Comment 31

2 years ago
Looks like we have a plan here, to move to Github. hwine: how do we go about that? Are you able to do the VCS work to strip out the necessary files' history and create a new repo from them? See comment 20 and comment 21.

Gerv
Flags: needinfo?(hwine)
Gerv: great! :) new bike-shedding! :(

comment 20 - now need to know the file structure of the new repo, and which old files match to new files. I'd suggest getting something working at least on your machine to test out things, then yes, it's fairly straightforward to get the data out of m-c. Someone in dev-services can help out.

comment 21 - github is self serve -- you need to decide on organization & repo name. If you anticipate non-mozillians being contributors, it may be best to have a separate org. See https://wiki.mozilla.org/Github for details, including the contact list to help out on this.
Flags: needinfo?(hwine)
(Reporter)

Comment 33

2 years ago
File structure:

netwerk/dns/effective_tld_names.dat  => public_suffix_list.dat (see bug 1155581)
netwerk/dns/prepare_tlds.py          => tests/prepare_tlds.py
netwerk/test/unit/data/test_psl.txt  => tests/test_psl.txt
netwerk/test/unit/test_psl.js        => tests/test_psl.js

We'll also need a README.md and a LICENSE, but we can add those later.

> I'd suggest getting something working at least on your machine to test out things

I think it's fine to just set up the repo, then I can hack the tests to work outside the Mozilla environment.

> Someone in dev-services can help out.

Can you be more specific about what I should do to get the right people to come and do this for me?

Github:

Yes, I think we want a separate org. Organizations require money, right? So it would be best if Mozilla created it? I'll email github-owners. I'd say the org name and repo name would probably both be "publicsuffix".

Gerv
(Reporter)

Updated

2 years ago
Flags: needinfo?(hwine)
You can use `hg convert` with a file map to create a new Mercurial repo containing only the files relevant to you. Then, you can convert that to Git using hg-git or one of the Mercurial import scripts that are in the Git source repository (can't remember their names, sorry).

Alternatively, I believe Git's repo import tools have ways to filter files. But knowing what I know about the inefficiency of those tools, it's probably faster to do a Mercurial to Mercurial conversion and export the smaller Mercurial repo to Git rather than point the Git tools directly at mozilla-central.
I agree with Greg. I'd use the following steps:
 - install latest hg
 - install latest hg-git extension
 - use `hg convert` (hg help convert has details) for an hg -> hg conversion
 - use `hg gexport` to convert the resulting hg repo to git
 - push to github

The latest versions of these tools should provide the fastest conversions
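
For illustration, the `hg convert` step can be sketched in Python as follows, using the file layout from comment 33. The filemap directives `include` and `rename` are part of the convert extension; the helper names, the `FILES` constant, and the repository paths in the usage note are hypothetical. The `--config extensions.convert=` argument simply enables the bundled convert extension for that one invocation.

```python
import subprocess

# Source path in mozilla-central -> path in the new repo (from comment 33)
FILES = {
    "netwerk/dns/effective_tld_names.dat": "public_suffix_list.dat",
    "netwerk/dns/prepare_tlds.py": "tests/prepare_tlds.py",
    "netwerk/test/unit/data/test_psl.txt": "tests/test_psl.txt",
    "netwerk/test/unit/test_psl.js": "tests/test_psl.js",
}


def build_filemap(mapping):
    """Build filemap text that keeps only the mapped files and renames them."""
    lines = []
    for src, dst in mapping.items():
        # With any "include" present, everything not included is dropped
        lines.append("include %s" % src)
        lines.append("rename %s %s" % (src, dst))
    return "\n".join(lines) + "\n"


def convert(source_repo, dest_repo, filemap_path):
    """Run an hg -> hg conversion restricted by the given filemap."""
    subprocess.check_call([
        "hg", "--config", "extensions.convert=",
        "convert", "--filemap", filemap_path, source_repo, dest_repo,
    ])
```

Typical use would be to write `build_filemap(FILES)` to a file such as `psl.filemap`, call `convert("mozilla-central", "psl-hg", "psl.filemap")`, and then run `hg gexport` in the resulting repository as described in the steps above.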
Flags: needinfo?(hwine)
(Reporter)

Comment 36

2 years ago
OK, thanks to gps for his help: https://github.com/publicsuffix/list :-)

We now need to change things h

(In reply to Gregory Szorc [:gps] from comment #29)
> We can "vendor" the PSL inside mozilla-central. What this means is that the
> canonical source of truth for PSL would be a separate repository (probably
> on GitHub). We would periodically import the contents of the external
> repository into mozilla-central using a script checked into mozilla-central.

How do we arrange this? Is there an existing script we can hack? I'd like it to be automated, ideally.

Gerv
While a script would be preferred, many 3rd party sources document the update process in a README.mozilla or README_MOZILLA file. See media/*/README_MOZILLA or gfx/*/README.mozilla.
(Reporter)

Comment 38

2 years ago
gps: is there a way to make this easier than manually pulling/updating Firefox, pulling the PSL repo, copying the file across, generating a patch, filing a bug, attaching the patch, and marking it "checkin-needed"?

On Github, of course, I could click "edit this file", paste in a newer copy, select "make pull request" and be done...

Gerv
Flags: needinfo?(gps)
There are already processes that update code from 3rd party repositories without the overhead of filing a bug. B2G Bumper Bot is a good example of this. I'd say it's up to the PSL owner to make the determination that a bug isn't warranted.

As for automating things, that's something you'd have to build yourself. I would start with a script checked into mozilla-central that performs this task. If it becomes too burdensome for a human to run periodically, we can teach a machine how to do it.
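
A minimal sketch of what such an import script might look like in Python is below. It assumes the list is fetched from the official URL given in comment 7 and written to `netwerk/dns/effective_tld_names.dat` in a mozilla-central checkout; the function names are hypothetical, and the sketch deliberately stops short of committing or pushing anything.

```python
import os
import tempfile
import urllib.request

# Official list URL as given in comment 7
PSL_URL = "https://publicsuffix.org/list/effective_tld_names.dat"


def fetch(url=PSL_URL, timeout=30):
    """Download the current list and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def update_vendored_copy(new_data, dest_path):
    """Replace dest_path with new_data; return True only if it changed."""
    try:
        with open(dest_path, "rb") as existing:
            if existing.read() == new_data:
                return False  # already up to date, nothing to commit
    except FileNotFoundError:
        pass  # first import: no vendored copy yet
    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted run never leaves a half-written list behind
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    with tempfile.NamedTemporaryFile("wb", dir=dest_dir, delete=False) as tmp:
        tmp.write(new_data)
    os.replace(tmp.name, dest_path)
    return True
```

A person (or, later, a bot) would call `update_vendored_copy(fetch(), "netwerk/dns/effective_tld_names.dat")` from the top of the checkout, run the PSL tests, and commit only when the call returns True.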
Flags: needinfo?(gps)
(Reporter)

Comment 40

2 years ago
We moved to github.

https://github.com/publicsuffix/list

Gerv
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED