Closed Bug 1032117 Opened 10 years ago Closed 7 years ago

[Stingray] TV Web API for Speech Recognition

Categories

(Firefox OS Graveyard :: General, defect)

Hardware: ARM
OS: Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: airpingu, Unassigned)

References

Details

(Whiteboard: [ft:conndevices][webspeechapi])

User Story

From the user's point of view, the following flow needs to be considered to achieve automatic speech recognition (ASR) on a TV (a rough sketch follows the list):

  1. The user presses a special key/button on the TV remote control or a connected smartphone.

  2. The System App is notified of the key event and calls the Web API to start ASR. The recognized result is later returned to the System App.

  3. If the recognized result is not confident enough (below a threshold), the System App has to pop up a dialog showing the result and ask the user to verify it, redoing the ASR until the result is confident/verified.

  4. The System App can then pass the recognized result to the corresponding app to handle it. For example, if the user wants to search for a word on the Internet, the System App has to deliver the recognized word to the Browser App. If the user wants to control the menu in the Homescreen App, the recognized command has to be passed to the Homescreen App.
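
The sketch below illustrates steps 1-4 from the System App side. It is illustrative only: CONFIDENCE_THRESHOLD, dispatchToApp() and the 'VoiceSearch' key name are assumptions, not part of any shipped API; only the SpeechRecognition interface itself comes from [1].

  // Sketch of steps 1-4 above; names marked below are hypothetical.
  const CONFIDENCE_THRESHOLD = 0.8;  // assumption, tune per engine

  window.addEventListener('keydown', (event) => {
    // Step 1: the user presses the voice key on the remote.
    if (event.key !== 'VoiceSearch') {  // hypothetical key name
      return;
    }
    startRecognition();
  });

  function startRecognition() {
    // Step 2: start ASR via the Speech Recognition API [1].
    const recognition = new SpeechRecognition();

    recognition.onresult = (event) => {
      const result = event.results[0][0];
      if (result.confidence < CONFIDENCE_THRESHOLD &&
          !confirm('Did you say "' + result.transcript + '"?')) {
        // Step 3: low confidence and not verified -> redo the ASR.
        startRecognition();
        return;
      }
      // Step 4: pass the verified result to the corresponding app.
      dispatchToApp(result.transcript);
    };

    recognition.start();
  }

  function dispatchToApp(transcript) {
    // Placeholder: forwarding to the Browser/Homescreen App would go
    // through a mozContentEvent, as sketched later in this report.
    console.log('recognized:', transcript);
  }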

-----
From the Web API point of view, we need to cover the following use cases:

  1. The API must let the user activate and deactivate speech recognition.

  2. The API must let the user select/switch the target device that receives the audio stream. For example, to control the TV by ASR, the user may choose the microphone on the TV remote control, on the TV itself, or on a connected smartphone.

  3. The API must return the recognized result to the app; the result can be a user-defined command or arbitrary words/sentences.

Mozilla has already defined a general Speech Recognition API [1], where the underlying speech recognition engine has to be implemented or ported by vendors. Also, the Media Capture and Streams API is designed for enumerating the media devices [2] and retrieving a MediaStream from a specific media device [3], which lets the user switch the media device that handles the audio input.
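
To make use case 2 concrete, the sketch below selects a microphone with the Media Capture and Streams API and hands the resulting stream to recognition. Two caveats: the entry points shown are the current shape of that API (the 2014 draft [2][3] differed in details), and passing a MediaStream to start() relies on a non-standard Gecko overload in [1], which is an assumption here.

  // Sketch only: choose a specific microphone [2], get its stream [3],
  // and feed it to the recognizer. The 'remote' label is illustrative.
  async function recognizeFrom(labelSubstring) {
    const devices = await navigator.mediaDevices.enumerateDevices();
    const mic = devices.find((d) =>
      d.kind === 'audioinput' && d.label.includes(labelSubstring));
    if (!mic) {
      throw new Error('no matching microphone');
    }

    // Retrieve a MediaStream from the chosen device.
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: { deviceId: { exact: mic.deviceId } },
    });

    const recognition = new SpeechRecognition();
    recognition.onresult = (event) => {
      console.log('heard:', event.results[0][0].transcript);
    };
    // Assumption: Gecko's SpeechRecognition.start() accepted an
    // optional MediaStream, unlike the standard Web Speech API.
    recognition.start(stream);
  }

  // E.g. prefer the microphone on the TV remote control:
  recognizeFrom('remote');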

Regarding how the System App passes the recognized result to the corresponding app (see the User Story), a proposed solution is to listen for a customized mozContentEvent in shell.js [4]; shell.js can then fire a System Message [5] to launch the target app to handle it.
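
A minimal sketch of that proposal follows, assuming a hypothetical 'speech-recognized' content event type and a hypothetical 'speech-command' system message name; the shell.js listener wiring and the target app URIs are likewise illustrative, hedged from the general shape of the B2G shell [4] and nsISystemMessagesInternal [5].

  // System App side (content): hand the result to chrome through a
  // customized mozContentEvent. 'speech-recognized' is a made-up type.
  function sendToShell(transcript) {
    const event = document.createEvent('CustomEvent');
    event.initCustomEvent('mozContentEvent', true, true, {
      type: 'speech-recognized',
      transcript: transcript,
    });
    window.dispatchEvent(event);
  }

  // Chrome side (shell.js [4]): listen for the event and fire a System
  // Message [5] so the target app is launched to handle it. The exact
  // listener wiring and newURI signature varied across Gecko versions.
  Components.utils.import('resource://gre/modules/Services.jsm');
  const Cc = Components.classes, Ci = Components.interfaces;

  shell.contentBrowser.addEventListener('mozContentEvent', (event) => {
    const detail = event.detail;
    if (detail.type !== 'speech-recognized') {
      return;
    }
    const messenger = Cc['@mozilla.org/system-message-internal;1']
                        .getService(Ci.nsISystemMessagesInternal);
    messenger.sendMessage(
      'speech-command',                    // hypothetical message name
      { transcript: detail.transcript },   // payload for the handler
      Services.io.newURI('app://browser.gaiamobile.org/index.html', null, null),
      Services.io.newURI('app://browser.gaiamobile.org/manifest.webapp', null, null));
  });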

[1] http://mxr.mozilla.org/mozilla-central/source/dom/webidl/SpeechRecognition.webidl
[2] http://dev.w3.org/2011/webrtc/editor/getusermedia.html#enumerating-devices
[3] http://dev.w3.org/2011/webrtc/editor/getusermedia.html#local-content
[4] http://mxr.mozilla.org/mozilla-central/source/b2g/chrome/content/shell.js
[5] http://mxr.mozilla.org/mozilla-central/source/dom/messages/interfaces/nsISystemMessagesInternal.idl
(In reply to Gene Lian [:gene] (business trip Jun. 16 ~ Jun. 20) from comment #0)
> Also, the Media Capture and Streams API is designed for enumerating
> the media devices [2] as well as retrieving the MediaStream via a specific
> media device [3], which can let the user be able to switch the media device
> to handle the audio input.

This part is covered by bug 1028777.
Depends on: 1028777
Depends on: 1038061
Whiteboard: [FT:Stream3] → [ft:conndevices]
Whiteboard: [ft:conndevices] → [ft:conndevices][webspeechapi]
Blocks: TV_FxOS2.5
No longer blocks: TV_FxOS2.5
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX