Correctly implement SpeechRecognitionAlternative::confidence

RESOLVED FIXED in Firefox 43

Status


RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: kdavis, Assigned: kdavis)

Tracking

unspecified
FxOS-S5 (21Aug)
Points:
---

Firefox Tracking Flags

(firefox43 fixed)

Details

(Whiteboard: [webspeechapi][vaani][systemsfe])

Attachments

(1 attachment)

(Assignee)

Description

3 years ago
The current implementation of SpeechRecognitionAlternative::confidence
always indicates a value of 100 for all final results. This does not
reflect the value returned from pocketsphinx.

The value of SpeechRecognitionAlternative::confidence should reflect
the value returned from the pocketsphinx method ps_get_hyp().
(Assignee)

Updated

3 years ago
Assignee: nobody → kdavis
Whiteboard: [webspeechapi][vaani][systemsfe]
(Assignee)

Comment 1

3 years ago
(In reply to kdavis from comment #0)
> The value of SpeechRecognitionAlternative::confidence should reflect
> the value returned from the pocketsphinx method ps_get_hyp().

It appears that the score returned here is not supported in pocketsphinx

http://cmusphinx.sourceforge.net/wiki/faq#qcan_pocketsphinx_reject_out-of-grammar_words_and_noises

Hence, it looks as if the best path is to use ps_get_prob().
(Assignee)

Comment 2

3 years ago
Created attachment 8646359
Part 1 of 1: Correctly implement SpeechRecognitionAlternative::confidence using ps_get_prob()

Part 1 of 1 for this bug.

Before this patch, SpeechRecognitionAlternative::confidence was always set
to 100 for all final SpeechEvents. This was incorrect for at least two
reasons:

1. According to the spec[1] confidence should lie in [0,1]
2. The value of confidence was not derived from pocketsphinx's decoding

This patch fixes this by obtaining from ps_get_prob() the log posterior
probability of the recognition. It then converts this to a probability
and uses this probability as the confidence.
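
As a rough illustration of the conversion described above (a standalone
sketch, not the code in the attached patch): the helper name
LogProbToConfidence is hypothetical, and the default log base of 1.0001 is an
assumption about how the decoder's log math is configured. The patch itself
gets the value from ps_get_prob(); this sketch only shows the arithmetic of
turning an integer log probability into a value in [0,1].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of pocketsphinx or of the patch.
// Converts an integer log probability (expressed in base aBase) into a
// linear probability and clamps it to [0, 1] so it can be used as a
// confidence value.
// ASSUMPTION: the default base of 1.0001 matches the decoder's log math.
double LogProbToConfidence(int32_t aLogProb, double aBase = 1.0001) {
  assert(aBase > 1.0);  // a base <= 1 would not describe a log probability
  const double prob = std::pow(aBase, static_cast<double>(aLogProb));
  return std::max(0.0, std::min(1.0, prob));
}
```

With this sketch, a log probability of 0 maps to a confidence of exactly 1,
and increasingly negative log probabilities map to values closer to 0.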

A slight technical detail is that, according to its documentation,
ps_get_prob() requires two conditions to be true in order to work:

1. The -bestpath option must be enabled
2. The result it's being used for must not be partial

To satisfy these two requirements this commit

1. Enables the -bestpath option
2. Only calls ps_get_prob() on final results

An additional bonus should follow from this commit.

Up until now we did not require our results to be final. Hence, we sometimes
incorrectly interpreted partial results as final results. After this commit,
we treat only final results as final. This should increase the accuracy of
our recognition, as we will ignore results we are uncertain about.
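
The final-only behaviour can be sketched roughly as follows. This is an
illustration under stated assumptions, not the patch's code: the
DecoderResult struct and ConfidenceForResult function are hypothetical names,
the log probability is assumed to have been obtained already, and the log
base of 1.0001 is assumed as well.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical container for one decoder hypothesis (not a pocketsphinx type).
struct DecoderResult {
  std::string text;   // recognized text
  int32_t logProb;    // log posterior probability, assumed base 1.0001
  bool isFinal;       // true only for final (non-partial) results
};

// Writes a confidence in [0, 1] to *aOut and returns true for final results.
// Partial results are ignored: returns false and leaves *aOut untouched.
bool ConfidenceForResult(const DecoderResult& aResult, double* aOut) {
  if (!aResult.isFinal) {
    return false;
  }
  const double prob = std::pow(1.0001, static_cast<double>(aResult.logProb));
  *aOut = std::max(0.0, std::min(1.0, prob));
  return true;
}
```

A caller would only dispatch a final event when the function returns true,
which is one way to keep partial hypotheses from being reported as final.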

The try run for this patch is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b7e6d89f064a


[1] https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html#dfn-confidence
Attachment #8646359 - Flags: review?(bugs)

Comment 3

3 years ago
Just a note: ps_get_prob() in pocketsphinx works only with language models, which is not what we are using in the Mozilla API so far.

For grammars, this part is not properly implemented yet, and the pocketsphinx codebase should be fixed to return this correctly.
(Assignee)

Comment 4

3 years ago
(In reply to Andre Natal from comment #3)
> Just a note: ps_get_prob() in pocketsphinx works only with language models,
> which is not what we are using in the Mozilla API so far.
> 
> For grammars, this part is not properly implemented yet, and the pocketsphinx
> codebase should be fixed to return this correctly.

I agree.

Even so, ps_get_prob() is the correct function from which to obtain the
logprob.

Thus, in the future, when we use a language model or when grammar confidence
is supported, the correct logprob will be returned from ps_get_prob() and
our code will work without change.

As for the current implementation of ps_get_prob() for grammar-based recognition,
it returns 0, which leads to a confidence of 1. So the confidence is within the
correct range [0,1], in contrast to the previously hard-coded 100.

So, this code is as "correct as possible" with the current pocketsphinx
implementation and is as "future proof as is possible" with the current
pocketsphinx implementation.
Target Milestone: --- → FxOS-S5 (21Aug)

Comment 5

3 years ago
I wonder what float64 is. Is there really such a type in C/C++?

Comment 6

3 years ago
Comment on attachment 8646359
Part 1 of 1: Correctly implement SpeechRecognitionAlternative::confidence using ps_get_prob()

Actually, is there any reason why Andre shouldn't review this?
(This is, after all, more about the pocketsphinx backend than the web-facing API.)
Attachment #8646359 - Flags: review?(bugs) → review?(anatal)

Updated

3 years ago
Attachment #8646359 - Flags: review?(anatal) → review+

Comment 7

3 years ago
This is perfect, we just need to check that comment from Olli about float64.
(Assignee)

Comment 8

3 years ago
(In reply to Andre Natal from comment #7)
> This is perfect, we just need to check that comment from Olli about float64.

The type actually comes from sphinxbase.

The file media/sphinxbase/sphinxbase/prim_type.h defines float64 to be a double.

Comment 9

3 years ago
Yeah, I think the float64 usage here is fine. It just isn't a real C++ type, which is why I wondered what it was.
(Assignee)

Updated

3 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/9734ce792065
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox43: --- → fixed
Resolution: --- → FIXED