Closed Bug 1162507 Opened 5 years ago Closed 4 years ago

Carry out a security review of the most recent Pocketsphinx/Sphinxbase sources

Categories

(Core :: Security, defect)

RESOLVED FIXED

People

(Reporter: kdavis, Assigned: tedd)

Details

(Whiteboard: [webspeechapi])

Before introducing the newest Pocketsphinx/Sphinxbase sources into m-c, a security review of the code should be performed.

The diff between the old and new sources can be seen here: https://github.com/kdavis-mozilla/gecko-dev-speech/commit/0e288f3889b673135383daac949120cdd9abde02
Blocks: 1051146
Olli, has there been any progress on the security review?
No. Sorry, this wasn't in any needinfo/feedback/review queues.
Flags: needinfo?(bugs)
Whiteboard: [webspeechapi]
Depends on: 1169653
No longer depends on: 1169653
Blocks: Meta-Vaani
No longer blocks: 1051146
Summary: Carry out a security review of the most recent Pocketsphix/Sphinxbase sources → Carry out a security review of the most recent Pocketsphinx/Sphinxbase sources
Blocks: 1180668
No longer blocks: 1180668
Assignee: nobody → fbraun
I know :freddyb took this bug, but I also had a look at it. I hope you don't mind, :freddyb.

I first looked at how speech recognition is integrated in Gecko, and created a Wiki article [1], listing what components are involved.

I also audited the code manually, but the code base is large and a complete manual audit would take quite some time.

After looking at the code for a while, I tried to set up a fuzzing environment with AFL (American Fuzzy Lop). At first I tried to make it fuzz Firefox directly, but that turned out to be a little problematic.

So instead, I stripped the speech recognition code down to the parts involved and wrote a small wrapper around it, which AFL now uses as its target [2]. The fuzzer has been running for well over 20 hours straight now, with almost 200k executions, and nothing has been found so far.
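To illustrate the general shape of such a wrapper, here is a minimal sketch in Python. It is not the code from [2]; `iter_chunks`, `run_harness`, `stub_feed`, `main` and the 4096-byte chunk size are all hypothetical names and values chosen for illustration. The only AFL behavior assumed is that `afl-fuzz` replaces `@@` on the target's command line with the path of the current test case, so the wrapper just has to read one file and push its bytes into the recognizer.

```python
import sys
from typing import Callable, Iterator, List

CHUNK_BYTES = 4096  # arbitrary, even chunk size chosen for this sketch


def iter_chunks(data: bytes, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield successive slices of the input.

    A trailing odd byte is dropped so that, as long as chunk_bytes is even,
    every chunk contains only whole 16-bit samples.
    """
    usable = len(data) - (len(data) % 2)
    for start in range(0, usable, chunk_bytes):
        yield data[start:min(start + chunk_bytes, usable)]


def run_harness(path: str, feed: Callable[[bytes], None]) -> int:
    """Read one fuzzer-generated file and push it through `feed`.

    Returns the number of chunks delivered.
    """
    with open(path, "rb") as f:
        data = f.read()
    count = 0
    for chunk in iter_chunks(data):
        feed(chunk)
        count += 1
    return count


def stub_feed(chunk: bytes) -> None:
    """Hypothetical stand-in for the call that hands audio to the recognizer."""
    assert len(chunk) % 2 == 0


def main(argv: List[str]) -> int:
    """Entry point: expects exactly one argument, the test-case path."""
    if len(argv) != 2:
        print("usage: harness <input-file>", file=sys.stderr)
        return 2
    run_harness(argv[1], stub_feed)
    return 0
```

In a real target, `stub_feed` would be replaced by the library call that consumes audio, and `main(sys.argv)` would be wired up as the program entry point that AFL launches for each test case.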

Results:
- No bugs were found through manual auditing
- No bugs were found through fuzzing so far
- The speech recognition API is limited; it doesn't involve many Gecko components and mainly interacts with the library.

I mainly looked at the C/C++ part of the code and not much at the exposed API, but the API seems very limited: the only major inputs that can influence the library's behavior are the grammar file and the audio stream.
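Since the grammar file and the audio stream are the two attacker-controlled inputs, a fuzzing corpus would need at least one small seed of each kind. The following is only a sketch under stated assumptions: it assumes the grammar is supplied in JSGF and that the audio is raw 16-bit little-endian mono PCM. The function names, the rule name `<cmd>` and the sample count are hypothetical.

```python
import struct


def make_seed_grammar() -> bytes:
    """Return a tiny JSGF grammar with a single public rule (assumed format)."""
    text = (
        "#JSGF V1.0;\n"
        "grammar seed;\n"
        "public <cmd> = hello | goodbye;\n"
    )
    return text.encode("utf-8")


def make_seed_audio(num_samples: int = 1600) -> bytes:
    """Return `num_samples` of silence as 16-bit little-endian PCM.

    1600 samples would be 0.1 s at a 16 kHz sample rate, which is an
    assumption of this sketch rather than something stated in this bug.
    """
    return struct.pack("<%dh" % num_samples, *([0] * num_samples))
```

Small seeds like these keep each execution cheap and give the fuzzer a valid starting point to mutate from for each input type.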

[1] https://wiki.mozilla.org/User:Tedd/Speech_Recognition
[2] https://github.com/jhector/sphinxfuzz
Thanks for doing this, Julian!
Assignee: fbraun → julian.r.hector
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Flags: needinfo?(bugs)