Open Bug 1444212 Opened 6 years ago Updated 2 years ago

update tp5 pageset - ideally replace with tp6 and better record/replay tooling


(Testing :: Talos, task, P3)



(Not tracked)


(Reporter: jmaher, Unassigned)



(Whiteboard: [PI:July])

this pageset is out of date and we need to get something more modern.  As we only do simple pageloads with this, we don't have to go overboard with logins to all websites and scripted navigation; possibly we do this for some of the pages.

the old pageset was the top 100 websites on alexa 500.  I propose taking the top 50 pages and recording a basic pageload:

Here we would need to special case some sites to have logged in information (such as facebook).

As there is work underway to update the tooling/docs for record/replay pages- possibly we can update this pageset at the same time and ensure we update at least once a year going forward (since tooling would be better).
Whiteboard: [PI:May]
previously we have used alexa 500- a few things:
1) this is a paid service now- we can preview the top 50 for free
2) we could use the wikipedia data:, not sure how up to date that is
3) do we use,,,, etc. ?
4) do we strip out pornography?  a handful of the top 50 sites are related to pornography.
5) many of the top sites require a login (instagram, twitter, facebook, netflix, etc.) - we need to establish test accounts and make sure we log in and record sites with useful content.

open to suggestions on the above items.
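
To make items 4 and 5 concrete, here is a minimal sketch of how a candidate list could be trimmed to a pageset. It is only an illustration under assumptions: the `Site` fields, the `select_pageset` and `split_by_login` helpers, and the `*.example` domains are all hypothetical and do not come from any existing Talos or record/replay tooling. It assumes we already have a ranked list annotated by hand with "adult" and "needs login" flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One candidate site; every field here is a hypothetical annotation."""
    rank: int
    domain: str
    adult: bool = False        # item 4: candidate for stripping
    needs_login: bool = False  # item 5: needs a test account before recording


def select_pageset(sites, limit=50, exclude_adult=True):
    """Return up to `limit` sites in rank order, optionally skipping adult sites."""
    ordered = sorted(sites, key=lambda s: s.rank)
    kept = [s for s in ordered if not (exclude_adult and s.adult)]
    return kept[:limit]


def split_by_login(pageset):
    """Separate domains that can be recorded as-is from those needing a login."""
    plain = [s.domain for s in pageset if not s.needs_login]
    login = [s.domain for s in pageset if s.needs_login]
    return plain, login


if __name__ == "__main__":
    # Placeholder data only; real input would be whichever ranking we settle on.
    candidates = [
        Site(1, "search.example"),
        Site(2, "social.example", needs_login=True),
        Site(3, "adult.example", adult=True),
        Site(4, "news.example"),
    ]
    pageset = select_pageset(candidates, limit=3)
    plain, login = split_by_login(pageset)
    print("record directly:", plain)
    print("need test accounts:", login)
```

Filtering before truncating means that stripped sites are backfilled from further down the ranking, so the pageset still reaches the target size if the list is long enough.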
Depends on: 1454053
Whiteboard: [PI:May] → [PI:July]
tp5 is 7+ years old (pageset) and very outdated; we currently run 49 pages as part of tp5.  When bug 1503990 is resolved, we will have almost 30 pages running in tp6, each monitored with multiple metrics.

A few things to consider:
* tp5 measures responsiveness on linux and windows
* tp5 measures xres (via xrestop) on linux

These measurements have changed over time, but I am not sure whether we have replacements for them or a strong desire to continue measuring these other metrics.  For example, responsiveness might end up being tracked as part of the metrics we collect, either as a factor of hero or ttfi.  XRes is only on linux, so while we lose some insight here, we are not testing on our #1 platform (windows), and other work on energy will greatly supersede this. In fact, there are no alerts for xres in the last year except for when we switched physical hardware in February.

:vchin, could you help us determine if the linux-xres measurement and linux/windows responsiveness metric are needed until we get a replacement?  Also, does it make sense to retire our old pageload benchmark from July 2011 and focus on the new pageload data we have?
Flags: needinfo?(vchin)
Do the linux-xres measurement and linux/windows responsiveness tests catch regressions? If so I'd be inclined to keep them until we have a replacement. We're looking at determining responsiveness metrics mostly related to scrolling and key press.

I'm ok with retiring the old pageload benchmarks once the new ones are in place and we've validated the data.
Flags: needinfo?(vchin)
the linux-xres hasn't caught any regressions, but responsiveness has and it appears that we do find some unique regressions (i.e. we only see a responsiveness regression and not a regular pageload regression).

Let's leave this around until we get the new latency replacement running and validated.
Type: enhancement → task
Priority: -- → P3
Severity: normal → S3