Closed Bug 894546 Opened 7 years ago Closed 7 years ago

Add telemetry for pdf.js

Categories

(Firefox :: PDF Viewer, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Firefox 26

People

(Reporter: benjamin, Assigned: yury)

References

Details

(Whiteboard: [pdfjs-c-integration] https://github.com/mozilla/pdf.js/pull/3532)

Attachments

(1 file, 1 obsolete file)

Currently there is no telemetry for pdf.js. I discussed this with bdahl today and I think we should have the following telemetry items:

* total number of times pdf.js renders a doc
* count the number of times the yellow "this document may be rendered incorrectly" bar appears, grouped by the internal code which caused the bar to appear

My understand from our conversation is that in general there is a marker in pdf.js for the PDF feature which caused the bar to appear which we can key off of. We need to verify that this marker is just a PDF feature tag and does not contain any personally identifying information.
gps, could you take this?
Assignee: nobody → gps
I filed bug 778800 a while ago on telemetry for the "document rendered incorrectly" banner.
I could take this. However, it would probably take me more time than someone familiar with PDF.js because I'd have to learn how PDF.js works!

The Telemetry pieces are simple: https://developer.mozilla.org/en-US/docs/Performance/Adding_a_new_Telemetry_probe

I have a feeling a PDF.js maintainer could whip this up in a fraction of the time it would take me.
Gregory,

Could you provide more information on how we can test the telemetry data after something is "whipped" in the PDF.js?

Also, could you provide real life example/code/commit that implements https://developer.mozilla.org/en-US/docs/Performance/Adding_a_new_Telemetry_probe#Adding_a_JavaScript_Probe ?

Thanks.
Bug 870104 is a recent patch that added Telemetry probes.

After you add the probe and rebuild, you can go to about:telemetry and look for your data. If the data is there, it will be uploaded.
Priority: -- → P3
Whiteboard: [pdfjs-c-integration]
bdahl, when we talked about this you said there was a common function through which the "yellow bar" notifications were sent. Can you link to that?
Flags: needinfo?(bdahl)
Attached patch Histograms for pdf.js (obsolete) — Splinter Review
Those histograms we would like to see. The logic will be added to to the github.com repository and later sync'd with m-c.

Gregory, can you review and we will land it prior m-c synchronization.
Attachment #784132 - Flags: review?(gps)
Comment on attachment 784132 [details] [diff] [review]
Histograms for pdf.js

Does this cover the reason for the fallback bar being shown? Seems like that would be really useful to have.
(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #9)
> Comment on attachment 784132 [details] [diff] [review]
> Histograms for pdf.js
> 
> Does this cover the reason for the fallback bar being shown? Seems like that
> would be really useful to have.

Not detailed one, no. At the moment PDF.js does not provide "numerical" reason for fallback and we have to redesign how we report an errors. We are planing to do that though. Once we figure out how many reasons we have, we will add a probe with enumerated type.

At the moment, we only provide probe if acroform/xfa is used in the document (which is one of the reasons for fallback)
Flags: needinfo?(bdahl)
Whiteboard: [pdfjs-c-integration] → [pdfjs-c-integration] https://github.com/mozilla/pdf.js/pull/3532
Comment on attachment 784132 [details] [diff] [review]
Histograms for pdf.js

Review of attachment 784132 [details] [diff] [review]:
-----------------------------------------------------------------

I'd feel better if someone closer to Telemetry reviewed Histograms.json changes.
Attachment #784132 - Flags: review?(gps) → review?(nfroyd)
Comment on attachment 784132 [details] [diff] [review]
Histograms for pdf.js

Review of attachment 784132 [details] [diff] [review]:
-----------------------------------------------------------------

Apologies for all the questions; I'm not that familiar with PDF files, except as a user.

I think this looks generally good, but some of the histograms look a little funny, so just f+'ing at this point until I understand what you're trying to measure.

::: toolkit/components/telemetry/Histograms.json
@@ +3093,5 @@
>      "description": "The time (in milliseconds) after showing a PopupNotification that the mainAction was first triggered"
>    },
> +  "PDF_VIEWER_USED": {
> +    "kind": "boolean",
> +    "description": "How many times PDF Viewer was used"

Either this should be "Whether the PDF Viewer was used" (in which case it should be a flag histogram), or the description needs to be changed.

@@ +3097,5 @@
> +    "description": "How many times PDF Viewer was used"
> +  },
> +  "PDF_VIEWER_FALLBACK_SHOWN": {
> +    "kind": "boolean",
> +    "description": "How many times PDF Viewer fallback bar was shown"

What are you trying to do here?

@@ +3102,5 @@
> +  },
> +  "PDF_VIEWER_DOCUMENT_VERSION": {
> +    "kind": "enumerated",
> +    "n_values": 10,
> +    "description": "The PDF document version (1.1, 1.2, etc.)"

I assume that 1.1 corresponds to 1, 1.2 to 2, etc.?  How many versions of PDF are there?  You may want to have some room to grow here (so 20 values or so).

@@ +3107,5 @@
> +  },
> +  "PDF_VIEWER_DOCUMENT_GENERATOR": {
> +    "kind": "enumerated",
> +    "n_values": 25,
> +    "description": "The PDF document generator"

Are there really only 25 here?  Do they have specific IDs assigned?

@@ +3118,5 @@
> +    "description": "The PDF document size (KB)"
> +  },
> +  "PDF_VIEWER_FORM": {
> +    "kind": "boolean",
> +    "description": "A PDF form expected: true for AcroForm and false for XFA"

Is this meant to be set on every document?

@@ +3123,5 @@
> +  },
> +  "PDF_VIEWER_STREAM_TYPES": {
> +    "kind": "enumerated",
> +    "n_values": 9,
> +    "description": "The PDF document compression stream types used"

I assume the size chosen here comes from the PDF spec, is that correct?
Attachment #784132 - Flags: review?(nfroyd) → feedback+
> 
> ::: toolkit/components/telemetry/Histograms.json
> @@ +3093,5 @@
> >      "description": "The time (in milliseconds) after showing a PopupNotification that the mainAction was first triggered"
> >    },
> > +  "PDF_VIEWER_USED": {
> > +    "kind": "boolean",
> > +    "description": "How many times PDF Viewer was used"
> 
> Either this should be "Whether the PDF Viewer was used" (in which case it
> should be a flag histogram), or the description needs to be changed.

During session user can view pdf files more than one time, we want to know how many times they view the pdf documents.

> 
> @@ +3097,5 @@
> > +    "description": "How many times PDF Viewer was used"
> > +  },
> > +  "PDF_VIEWER_FALLBACK_SHOWN": {
> > +    "kind": "boolean",
> > +    "description": "How many times PDF Viewer fallback bar was shown"
> 
> What are you trying to do here?

We want to track how many times the the fallback bar was shown to the user. It can show once per PDF document. See also PDF_VIEWER_USED. 


> 
> @@ +3102,5 @@
> > +  },
> > +  "PDF_VIEWER_DOCUMENT_VERSION": {
> > +    "kind": "enumerated",
> > +    "n_values": 10,
> > +    "description": "The PDF document version (1.1, 1.2, etc.)"
> 
> I assume that 1.1 corresponds to 1, 1.2 to 2, etc.?  How many versions of
> PDF are there?  You may want to have some room to grow here (so 20 values or
> so).

That's correct (1->1.1, 2->2.2, ... 7->1.7) current version is 1.7 (see https://en.wikipedia.org/wiki/Portable_Document_Format). Versions above 1.7 are not interesting to us at this point.

> 
> @@ +3107,5 @@
> > +  },
> > +  "PDF_VIEWER_DOCUMENT_GENERATOR": {
> > +    "kind": "enumerated",
> > +    "n_values": 25,
> > +    "description": "The PDF document generator"
> 
> Are there really only 25 here?  Do they have specific IDs assigned?

There are more, but we choose to track only "top" 25. Other will be tracked as 0. The list will be published shortly.

> 
> @@ +3118,5 @@
> > +    "description": "The PDF document size (KB)"
> > +  },
> > +  "PDF_VIEWER_FORM": {
> > +    "kind": "boolean",
> > +    "description": "A PDF form expected: true for AcroForm and false for XFA"
> 
> Is this meant to be set on every document?

Not every document is/has a form. That will tracked only for documents that have one.

> 
> @@ +3123,5 @@
> > +  },
> > +  "PDF_VIEWER_STREAM_TYPES": {
> > +    "kind": "enumerated",
> > +    "n_values": 9,
> > +    "description": "The PDF document compression stream types used"
> 
> I assume the size chosen here comes from the PDF spec, is that correct?

Yes. We want to track compression methods used in the documents to optimize most common one. One document might use more than one compression. The list with enumeration ids will be published shortly.
Nathan, shall ask for review again or it's good to go?
Flags: needinfo?(nfroyd)
Thank you for the replies.  I think the only remaining bit is clarification on PDF_VIEWER_USED and bumping the maximum # of versions.

(In reply to Yury Delendik (:yury) from comment #13)
> > 
> > ::: toolkit/components/telemetry/Histograms.json
> > @@ +3093,5 @@
> > >      "description": "The time (in milliseconds) after showing a PopupNotification that the mainAction was first triggered"
> > >    },
> > > +  "PDF_VIEWER_USED": {
> > > +    "kind": "boolean",
> > > +    "description": "How many times PDF Viewer was used"
> > 
> > Either this should be "Whether the PDF Viewer was used" (in which case it
> > should be a flag histogram), or the description needs to be changed.
> 
> During session user can view pdf files more than one time, we want to know
> how many times they view the pdf documents.

OK, so what's the "false" value of that histogram useful for?

> > @@ +3102,5 @@
> > > +  },
> > > +  "PDF_VIEWER_DOCUMENT_VERSION": {
> > > +    "kind": "enumerated",
> > > +    "n_values": 10,
> > > +    "description": "The PDF document version (1.1, 1.2, etc.)"
> > 
> > I assume that 1.1 corresponds to 1, 1.2 to 2, etc.?  How many versions of
> > PDF are there?  You may want to have some room to grow here (so 20 values or
> > so).
> 
> That's correct (1->1.1, 2->2.2, ... 7->1.7) current version is 1.7 (see
> https://en.wikipedia.org/wiki/Portable_Document_Format). Versions above 1.7
> are not interesting to us at this point.

They are not interesting to you at *this* point, but they may well be interesting later.  Please bump the top end of this up to 20.

> > @@ +3118,5 @@
> > > +    "description": "The PDF document size (KB)"
> > > +  },
> > > +  "PDF_VIEWER_FORM": {
> > > +    "kind": "boolean",
> > > +    "description": "A PDF form expected: true for AcroForm and false for XFA"
> > 
> > Is this meant to be set on every document?
> 
> Not every document is/has a form. That will tracked only for documents that
> have one.

So this is only valid for documents with forms.  Fine.
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd (:froydnj) from comment #16)
> Thank you for the replies.  I think the only remaining bit is clarification
> on PDF_VIEWER_USED and bumping the maximum # of versions.
> 
> (In reply to Yury Delendik (:yury) from comment #13)
> > > 
> > > ::: toolkit/components/telemetry/Histograms.json
> > > @@ +3093,5 @@
> > > >      "description": "The time (in milliseconds) after showing a PopupNotification that the mainAction was first triggered"
> > > >    },
> > > > +  "PDF_VIEWER_USED": {
> > > > +    "kind": "boolean",
> > > > +    "description": "How many times PDF Viewer was used"
> > > 
> > > Either this should be "Whether the PDF Viewer was used" (in which case it
> > > should be a flag histogram), or the description needs to be changed.
> > 
> > During session user can view pdf files more than one time, we want to know
> > how many times they view the pdf documents.
> 
> OK, so what's the "false" value of that histogram useful for?

'false' will not be used. I could not find any other histogram type that will address this use case -- track total number of viewed documents. Closest I could find in Historgrams.json is "BAD_FALLBACK_FONT"; https://developer.mozilla.org/en-US/docs/Performance/Adding_a_new_Telemetry_probe does not provide any answers or examples for this.
Fly by comment without reading the entire bug:

Could we also add a counter how many times a document was printed on windows/linux from PDF.JS? That gives an idea how many people run into bug 811002.
(In reply to Julian Viereck [Away till end August] from comment #18)
> Fly by comment without reading the entire bug:
> 
> Could we also add a counter how many times a document was printed on
> windows/linux from PDF.JS? That gives an idea how many people run into bug
> 811002.

Yep, I will add that too. Good idea.

Nathan, how shall we track it: (the same way as PDF_VIEWER_USED) as boolean histogram or enumeration with 1 element/bucket?
Flags: needinfo?(nfroyd)
(In reply to Yury Delendik (:yury) from comment #19)
> (In reply to Julian Viereck [Away till end August] from comment #18)
> > Fly by comment without reading the entire bug:
> > 
> > Could we also add a counter how many times a document was printed on
> > windows/linux from PDF.JS? That gives an idea how many people run into bug
> > 811002.
> 
> Yep, I will add that too. Good idea.
> 
> Nathan, how shall we track it: (the same way as PDF_VIEWER_USED) as boolean
> histogram or enumeration with 1 element/bucket?

A boolean histogram is fine.
Flags: needinfo?(nfroyd)
Added PDF_VIEWER_PRINT and increased enumeration sizes for PDF_VIEWER_DOCUMENT_VERSION and PDF_VIEWER_DOCUMENT_GENERATOR
Attachment #784132 - Attachment is obsolete: true
Attachment #789877 - Flags: review?(nfroyd)
Comment on attachment 789877 [details] [diff] [review]
Histograms for pdf.js

Review of attachment 789877 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for letting this sit here!  I thought we were done here.
Attachment #789877 - Flags: review?(nfroyd) → review+
Keywords: checkin-needed
Assignee: gps → ydelendik
OS: Windows 7 → All
Hardware: x86_64 → All
Whiteboard: [pdfjs-c-integration] https://github.com/mozilla/pdf.js/pull/3532 → [pdfjs-c-integration] [leave-open] https://github.com/mozilla/pdf.js/pull/3532
https://hg.mozilla.org/integration/fx-team/rev/3e3000551e51
Keywords: checkin-needed
Whiteboard: [pdfjs-c-integration] [leave-open] https://github.com/mozilla/pdf.js/pull/3532 → [pdfjs-c-integration] [leave-open] https://github.com/mozilla/pdf.js/pull/3532[fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/3e3000551e51
Whiteboard: [pdfjs-c-integration] [leave-open] https://github.com/mozilla/pdf.js/pull/3532[fixed-in-fx-team] → [pdfjs-c-integration] [leave-open] https://github.com/mozilla/pdf.js/pull/3532
Fixed by bug 912006
Status: NEW → RESOLVED
Closed: 7 years ago
Depends on: 912006
Resolution: --- → FIXED
Whiteboard: [pdfjs-c-integration] [leave-open] https://github.com/mozilla/pdf.js/pull/3532 → [pdfjs-c-integration] https://github.com/mozilla/pdf.js/pull/3532
Target Milestone: --- → Firefox 26
Blocks: 1026256
You need to log in before you can comment on or make changes to this bug.