Closed Bug 841949 Opened 9 years ago Closed 7 years ago
Improve metadata parsing in the Music app
I've noticed a couple of issues with the id3 parsing in the music app: * Unsynchronized frames aren't supported (these seem to account for about 1/3 of my music) * Multi-value frames aren't parsed correctly (we should read all N bytes of the data instead of stopping at the first null) Since these are both bugs in the same part of the parser, I'm filing them together. I have a fix in progress that should address this. The second part is also important if we ever want to support user-defined tags (like Album Artist or ReplayGain), since they use a null separator between the user-defined tag name and the value.
If anyone would like to follow along with this, my branch is here: <https://github.com/mozsquib/gaia/compare/music-id3-improvements>. Currently, it's not quite spec-compliant, especially for id3 2.2 and 2.3, but it's mostly there. I'll probably be putting this on the sidelines for a while (unless it becomes a v1.1 feature), since this works well enough that it doesn't barf on my music, and I have other priorities at the moment.
blocking-b2g: --- → leo?
Not blocking, but would uplift so tracking+. Thanks for looking at this!
Tested in you branch,I found some music file in my sdcard also parse error，such as Artist\Title. Maybe NEED convert characte before show.
I aslo tesed with attachment on the web http://web.ist.utl.pt/antonio.afonso/www.aadsm.net/libraries/id3/#demo. it does not parse correctly too. Need such as libiconv support?
I don't think that file has valid mp3 tags. The possible encodings for text are ISO-8859-1, UTF-16, UTF-16BE, and UTF-8. While the site you linked suggests that some countries use other encodings, this is in opposition so the spec. I'm only concerned with improving the standard-compliance of our id3 parser right now; if someone else wants to add support for non-standard encodings, I can help point them in the right direction, but I don't plan on fixing that here.
(In reply to Jim Porter (:squib) from comment #5) > I don't think that file has valid mp3 tags. The possible encodings for text > are ISO-8859-1, UTF-16, UTF-16BE, and UTF-8. While the site you linked > suggests that some countries use other encodings, this is in opposition so > the spec. As a developer,I agree with you. End-users maybe not think so.Half of music file we often used do not show correctly, but could be parse perfectly in Andorid media player. Because the encodings for text of tags maybe is GBK.If I want fix this, what your suggestion?
(In reply to cheng.luo from comment #6) > As a developer,I agree with you. End-users maybe not think so.Half of music > file we often used do not show correctly, but could be parse perfectly in > Andorid media player. Because the encodings for text of tags maybe is GBK.If > I want fix this, what your suggestion? Step one would be to create something capable of heuristically determining encodings (or find a license-compatible JS library that already exists), since the id3 tag in the sample file reports that its frames are latin-1, when this is obviously not the case. Once you do that, it should be a simple matter of updating the text decoding in the music app's id3 parser to use the heuristic tool you wrote (or found).
Let's move any discussion of fixing id3 encoding issues over to bug 850520. Then we can keep this bug about fixing issues with the id3 spec itself.
See Also: → 850520
I updated the PR with a fix for unsynchronization and added some extra tests to cover it.
Jim, This looks like an important, but not urgent patch. I didn't get to it today, and will be OOO on Monday and Tuesday. Sorry!
Well, it turns out that you not reviewing it yet was probably a good thing, since I still have one small issue with unsynchronization to work out. I really wish the id3 spec were better at documenting unsynchronization.
Ok, I fixed the last of the spec issues, I think. I also did a pass through the parser code and renamed all instances of "tag" that was really referring to a single metadata field. A "tag" in ID3 is the entire metadata block, not just one field, so it can be confusing otherwise. I tried to use the names the specs use ("frame" for ID3, "field" for Vorbis comments, and "atom" for MP4). This commit is purely optional though, and I don't really mind removing it.
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 Jim, Overall, I'm very excited about this patch. The r- is for the deunsync() function where you do strange things with BlobView internals. I think this may break the readPic() code and mess up album art parsing for unsynchronized stuff. I wonder if instead you can deunsync() a blobview in place, so you don't have to copy to a new array buffer and then try to get that array buffer to masquerade as a blobview. I've left a few comments on github. I've made them on the individual commits, however, so they might be harder to find.
I responded on gitub again.
I think my branch is mostly-correct now. The only thing missing is that I don't delete cached album art when the corresponding song is deleted. Granted, we didn't do that pre-patch either, so I don't think that's a strict necessity. I ran some quick timing tests and found that everything's about as fast (slow?) as it was before, provided you have embedded album art. If not, we take about 10% longer to parse. I'm sure this will vary somewhat based on how many songs you have in each album, though. David, I know you mentioned some concerns with storing the album art on the SD card. Here's my reasoning for it: Before the patch, we cached all the art in indexedDB (via asyncStorage). I didn't see anywhere that we removed the album art, and some tests confirm that the art stays in the indexedDB across restarts, as one would expect. With my patch, we do the same thing, except that we store the cached art on the SD card (indexedDB is on internal memory, right?). This means that when a user swaps SD cards, the album art cache is swapped too, so we're not filling up internal memory with currently-irrelevant data. I think this is a big improvement to the status quo, and it would also let us send album art data to other apps as a file path instead of a Blob (I'm not 100% sure we want to do it that way, but the possibility is there). Does this sound ok to you?
Jim, It does seem like an album art cache on the sdcard is the right way to go long term. I'm nervous about it, but agree that it is probably the right thing to do. Gallery uses the sdcard to store previews for large images that do not have EXIF previews (or do not have large enough previews) and we can do that here, too. Ideally, the album art cache would be smart enough to re-create the cached art dynamically if the files got deleted. It would be nice if we could delete art when it was no longer needed, but I'm not sure how to do that. You'll also want to make sure (you probably already have) that the album art doesn't start showing up in the gallery app. And you'll want to be sure (you may already have) that things are configurable at build time so that we get different album art sizes for different hardware screen resolutions. As for the blob-vs-path issue, I think blobs are always the way to go, because they work even for apps that don't have device storage permissions. Blobs are basically just opaque paths. Note that even when we store images in indexedDB, they end up as files in the sqlite directory structure somewhere, so when we retreive them they are nice efficient file-backed blobs and don't consume much memory. Let me know when (if) you want me to review this again.
(In reply to David Flanagan [:djf] from comment #17) > You'll also want to make sure (you probably already have) that the album art > doesn't start showing up in the gallery app. This works, with the caveat that external album art (e.g. folder.jpg files in the music file's folder) show up in the gallery. I'm not sure there's a way to avoid that. > And you'll want to be sure (you may already have) that things are configurable > at build time so that we get different album art sizes for different hardware > screen resolutions. This is a good idea. I'll look into how to hook this up to the build system.
Attachment mime type: text/plain → text/x-github-pull-request
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 Asking for review again. I haven't hooked up the thumbnail size to the build system, but the thumbnail dimensions are globals in metadata.js, and it should be easy to hook it up once I learn how.
David - please provide an ETA for this review, so we can set the target milestone appropriately (reviews for blocker bugs are higher in priority)
Target Milestone: 1.3 Sprint 3 - 10/25 → 1.3 Sprint 4 - 11/8
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 Jim, Once again, my apologies for taking so long to review this. The album art storage changes seemed scary and I was avoiding it, I think. But now that I've wrapped my head around it, I think this is the right way to go. I do think you should anticipate the case (Flatfish) where you want large album art and do not need to create a smaller thumbnail. In that case there is no need to duplicate the art in an external file and you can just extract it as needed from the song files. (If the album art filename is an audio file rather than an image file, use the offset and length in the metadata.picture object to slice() a blob from it, perhaps) Overall, the patch seems great, and the improved ID3 parsing looks really helpful. Two issues need to be addressed: 1) The database upgrade needs to be handled. Read the updateRecord() stuff in mediadb.js and see if you can write an appropriate upgrade function. This is what the r- is for. 2) When you store album art, I think you need to do something special on devices with both internal and external storage. Dave Hylands can provide details if you need. If you haven't already done so, please also file a followup bug to add some kind of periodic mark-and-sweep cover art cleanup function.
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 Dave: Could you take a look at this? Specifically in regards to comment 21: if I have a Blob (the audio file) from a particular storage area and want to add a new file (the thumbnail for the album art) to the *same* storage area, do I need to do anything special? Right now I'm just calling navigator.getDeviceStorage('pictures') and saving the file there.
(In reply to David Flanagan [:djf] from comment #21) > 1) The database upgrade needs to be handled. Read the updateRecord() stuff > in mediadb.js and see if you can write an appropriate upgrade function. This > is what the r- is for. It looks like I'll need to change mediadb to handle this case; I need to reparse the files, but updateRecord is synchronous (and doesn't let me get the actual file). Any ideas on what the API for this should look like?
Hi Jim, Yes, it is synchronous API. I had tried to run it as async before. It is limited by the lifetime of a transaction object. In onupgradeneeded event, we only have one transaction object which is versionchange transaction and that's the only one we have. We cannot create any other transactions. Once this transaction is committed, we cannot do any upgrading things. According to MDN: Transactions have a well-defined lifetime, so attempting to use a transaction after it has completed throws exceptions. Also, transactions auto-commit and cannot be committed manually. ... Transactions are expected to be short-lived, so the browser can terminate a transaction that takes too long, in order to free up storage resources that the long-running transaction has locked. You can abort the transaction, which rolls back the changes made to the database in the transaction. And you don't even have to wait for the transaction to start or be active to abort it. That's why I write it as sync API. There is another confusing thing that the enumerateOldFiles uses aysnc API to enumerate all files in the db. But we cannot use any async callback to call cursor.continue() in this case, like setTimeout. All of them are failed with inactive transaction error.  https://developer.mozilla.org/en-US/docs/IndexedDB/Basic_Concepts_Behind_IndexedDB
(In reply to Jim Porter (:squib) from comment #22) > Comment on attachment 789300 [details] [review] > https://github.com/mozilla-b2g/gaia/pull/11500 > > Dave: Could you take a look at this? Specifically in regards to comment 21: > if I have a Blob (the audio file) from a particular storage area and want to > add a new file (the thumbnail for the album art) to the *same* storage area, > do I need to do anything special? Right now I'm just calling > navigator.getDeviceStorage('pictures') and saving the file there. So navigator.getDeviceStorage(blah) will return the default storage area, and not necessarily the same one. The generic answer is that if you have the storage area associated with the music, then you can use storage.storageName to determine which storage area the file came from. You can then call navigator.getDeviceStorages('music') walk through them and find the one with the same storage name. If you know the fully qualified path the to picture (i.e. path starts with /) then you can use any storage area (so the navigator.getDeviceStorage('pictures') would be fine). Internally device storage will notice that you're providing a fully qualified name and redirect to the correct storage area.
(In reply to Dave Hylands [:dhylands] from comment #25) > If you know the fully qualified path the to picture (i.e. path starts with > /) then you can use any storage area (so the > navigator.getDeviceStorage('pictures') would be fine). Internally device > storage will notice that you're providing a fully qualified name and > redirect to the correct storage area. This is encouraging news. Can I guarantee that 1) on real devices, the filename for the audio file is an absolute path, and 2) that the storage area is always the first part of the path (e.g. /sdcard/)? If so, then I think I'm done, and I'll just need to put in a comment explaining why this works.
(In reply to Jim Porter (:squib) from comment #26) > (In reply to Dave Hylands [:dhylands] from comment #25) > > If you know the fully qualified path the to picture (i.e. path starts with > > /) then you can use any storage area (so the > > navigator.getDeviceStorage('pictures') would be fine). Internally device > > storage will notice that you're providing a fully qualified name and > > redirect to the correct storage area. > > This is encouraging news. Can I guarantee that 1) on real devices, the > filename for the audio file is an absolute path, and 2) that the storage > area is always the first part of the path (e.g. /sdcard/)? If so, then I > think I'm done, and I'll just need to put in a comment explaining why this > works. 1 - I guess it depends on where the path came from. All of the device storage functions return fully qualified paths. So if the path in question originated from enumerte, then you will have received a fully qualified path (on device). The only place you get non-fully qualified paths is when running on desktop, and there is only one storage area in that case. 2 - The name after the first / in a fully qualified path is the name of the storage area (and gets mapped to a real directory internally in device storage)
Excellent. I'll just update the comments then to be clear about what's happening.
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 I made a comment on the pull request that I think simplifies the logic a bit.
(In reply to Dave Hylands [:dhylands] from comment #29) > Comment on attachment 789300 [details] [review] > https://github.com/mozilla-b2g/gaia/pull/11500 > > I made a comment on the pull request that I think simplifies the logic a bit. Commented on Github.
Hi Steven, If you care about China market, I suggest you need to fix this issue.
blocking-b2g: - → ---
(In reply to James Zhang from comment #31) > Hi Steven, > If you care about China market, I suggest you need to fix this issue. This issue has nothing to do with the Chinese market; it affects everyone equally. I'm guessing the bug you want is bug 850520, which is specifically about working around the ID3 spec violations regarding character encodings that are common in China and other Asian countries.
(In reply to Jim Porter (:squib) from comment #32) > This issue has nothing to do with the Chinese market; it affects everyone > equally. > > I'm guessing the bug you want is bug 850520, which is specifically about > working around the ID3 spec violations regarding character encodings that > are common in China and other Asian countries. Agreed, it should be bug 850520 if we are targeting the markets that use non-unicode id3 tags.
(In reply to James Zhang from comment #31) > Hi Steven, > If you care about China market, I suggest you need to fix this issue. Let's track bug 850520.
Finally getting back to this needinfo. Not sure I have that much to add to what John has said. I've got various half-baked ideas about how to handle the db upgrade and what kind of changes might be necessary to mediadb. - First, IIRC, the album art is stored in asyncStorage, so copying that to files on the sdcard is a separate issue, I think, right? - During the synchronus updateRecord() calls you could add a new rescan_needed: true property to each record. And store the ratings and # of plays information. Then modify the fullScan() function inside of mediadb so that it would treat rescan_needed as if the file had been deleted and recreated. And then in your metadata parser you'd have to somehow restore the ratings, etc. Seems pretty messy, actually. - Once you figure out the upgrade path you want to take, consider storing user data (like ratings) in a separate database from the raw media data that can remain in MediaDB. MediaDB was originally designed with the assumption that it could be completely recreated just by rescanning the sdcard. It now seems like a mistake to allow data like ratings to be mixed with the media metadata. - Another, even more hardcore approach would be to abandon mediadb entirely and convert to a flat file stored on the sdcard, read and indexed at app start time. That is probably further than you want to go right now, however. - Back to the updateRecord() function. Maybe you can modify mediadb so that if updateRecord returns some special value, the record is deleted. Then the scan that follows will find the file again and rescan it. That might be simpler than a rescan_needed flag. - Ah ha! So you start off by defining a new db for user data like ratings, with the song filename as the key. If that db does not already exist, you know this is the upgrade, so you delete all the album art in the asyncStorage album art cache. Then when the mediadb upgrade kicks in your updateRecord() call copies the user data to the new user data database and marks the mediadb record for deletion or rescanning. (Or maybe updateRecord can actually delete the record itself somehow?). Anyway, at that point you end up with an empty mediadb and a populated user data db and album art deleted. Then rescanning repopulates the mediadb and creates the album art cache on the sdcard. And obviously you modify the rest of the app to get user data out of the new db instead of out of the mediadb. Does that sound like it could work?
Ok, I've managed to get the upgrade process to work. I have no idea how I'm going to write a test for this, though. Ideas welcome! (Note that you can't just do something like `make install-gaia APP=music` because we also need the permissions database to be updated so we have access to deviceStorage for pictures.) Other than that, the only thing remaining is to make the UI a bit nicer when upgrading.
I'm going to expand this to include some improvements for Ogg files as well.
Summary: Improve id3 parsing in the Music app → Improve metadata parsing in the Music app
Yes Ogg file's metadata are hardly parsed: only the Artist and the album, but not completely, see bug #958493. The track's names are not parsed and show the filename instead. Very anoying (80% of my collection being ogg's).
(In reply to Jean Cayron from comment #38) > Yes Ogg file's metadata are hardly parsed: only the Artist and the album, > but not completely, see bug #958493. The track's names are not parsed and > show the filename instead. Very anoying (80% of my collection being ogg's). Unless you have some bizarro Ogg files, this was bug 877474, which was fixed quite a while ago. I'm not sure what version it landed in, but I would guess it was in 1.2. You might have to reset your profile (or at least, clear out the music app's database) if you're upgrading from an earlier version. Once this bug is fixed, everything should update itself (part of this patch is to reload all the metadata, since none of it can actually be trusted).
Jim, you are right. I upgraded from 1.1 to 1.3. I reset my music collection today (removed the file from the SD and then copy them again) and everything is parsed correctly. You can thus forget my comment 38, sorry.
Dominic, I notice that we use asyncStorage to keep track of shuffle/repeat settings. In this PR, I was going to remove asyncStorage since, at the time I started it, we were only using it to hold album art. Which do you think would be better: 1) Keep using asyncStorage for shuffle/repeat settings 2) Switch to using cookies for shuffle/repeat settings Thanks!
(In reply to Jim Porter (:squib) from comment #41) > Dominic, I notice that we use asyncStorage to keep track of shuffle/repeat > settings. In this PR, I was going to remove asyncStorage since, at the time > I started it, we were only using it to hold album art. Which do you think > would be better: > > 1) Keep using asyncStorage for shuffle/repeat settings > 2) Switch to using cookies for shuffle/repeat settings > > Thanks! Jim, do you mean you want to remove asyncStorage entirely from music app so that we can have less module involved? if so, then I am fine with using cookies to store the shuffle/repeat settings, since we only need the simple data structure to keep it, and sounds reasonable if cookies is supported natively.
(In reply to Dominic Kuo [:dkuo] from comment #42) > Jim, do you mean you want to remove asyncStorage entirely from music app so > that we can have less module involved? if so, then I am fine with using > cookies to store the shuffle/repeat settings, since we only need the simple > data structure to keep it, and sounds reasonable if cookies is supported > natively. Correct. I'll work on switching over to cookies then. Thanks!
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 Ok, this is almost ready for review. The only remaining bit will be adding Hub's disc number/total discs patch when it's finished. David: could you review the metadata.js changes? Dominic: could you make sure everything else makes sense, especially the music.js changes? Here are the main changes since the last version: I reverted some of the changes to the album art cache. Now, the music DB stores the full-resolution image, and we use asyncStorage to hold the cached thumbnails. There are pretty thorough tests for ID3 now; the files are generated by a simple utility I made called id3gen. I've tested that they look right in other music players too (Banshee and foobar2000). I also improved the DB upgrade process. Now, if a file needs reparsed (for this upgrade, that's *every* file), it's not listed until the reparse happens. If you'd like to test this, you'll need to use the Taipei flashing tools to make a backup, and then push *just* the music data over. I think this command will suffice: adb push mozilla-profile/data-local/storage/persistent/1013+f+app+++music.gaiamobile.org /data/local/storage/persistent/1013+f+app+++music.gaiamobile.org In the future, it should be possible for reparses to only affect a subset of the files. (However, note that I intend to do a second round of parser upgrades later that will require a full reparse; see bug 1065858. Phase 2 should be the last big upgrade, though.)  Generally, we try to store a reference to the image, like the byte offsets into the MP3 file, or a filename of the full image, but for unsynced ID3 tags, we need to create an external image file. (Unsynced ID3 tags are just a way of escaping the MPEG sync flag, so we need to de-unsync it, and that makes it hard to grab the image from the file, so we just save a copy.)  https://github.com/jimporter/id3gen
Oh, and I'd also like to improve how we cache album art thumbnails, but we can do that in a separate bug, since it won't require reindexing all the music.
There are two patches of mine. Attached to two different bugs.
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 Jim, I have tested the patch and it works fine on my local branch, also the changes(except metadata.js) make sense to me, there is one minor issue and please read the github comments, thanks for working on this!
Attachment #789300 - Flags: review?(dkuo) → review+
Jim and Hub, I've finally started reviewing this. The changes to metadata.js all seem fine, though I've left a number of minor suggestions on github. Hub: I commented directly on the two commits you made, and I'm not sure if those comments will show up in the PR, or if you need to look at the commits. I didn't have time to finish this today: I still need to look at the dataview and mediadb changes. Jim: I vaguely recall discussing mediadb changes with you, but don't remember any details, and am surprised by the amount of changes there... Would you write up the high-level overview of what you've changed in MediaDB and why it is necessary? In general, I find changes to shared files scarier than app-specific changes, so more context here will help me with the review.
No problem. The changes in mediadb.js are there to add the concept of "reparsing" a file we've already parsed. This is important because we simply can't trust that the data we have in the DB is valid. However, we also want to be able to examine the old metadata in order to migrate over user-created fields (playcount and rating). Reparsing is triggered during a database upgrade, where the updateRecord function says "I don't have enough information, please let me reparse the whole file". updateRecord can't do it itself because the record updating has to happen in a single transaction. Once mediadb sees the needsReparse flag, it refuses to list the file in the initial phase and remembers that it needs to be reparsed later on when we're scanning for files. Once the reparse for the file is finished, mediadb calls reparsedRecord, which handles the migration of the old metadata fields (again, playcount and rating) to the newly-created metadata object. In this PR, *every* file is potentially-invalid, but I added the tag_type (henceforth "tag_format") property so that in the future, if we have a bug in Vorbis comment parsing, we don't need to reparse all the MP3 files as well. (However, bug 1065858 will necessitate another reparse of all files; hopefully that will be the last one for the foreseeable future.)
Jim/David, Lets wrap up reviews and get this into master for 2.2 (has been pending for a while now). Moving sprint milestone. Thanks Hema
Target Milestone: 2.1 S5 (26sep) → 2.1 S6 (10oct)
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 r- because the mediadb changes only update enumerateAll() and not the regular enumerate() method. Various other comments on github. Mostly nits, but at least one appears to be an actual bug. Also, I'd like you to consider doing the reparsing before sending the "ready" event, so that none of the scanning code needs to change. I think it would be a less risky change to the scanning code and arguably better UX as well. Here's how I described this on IRC: [2:32pm] djf: squib: Did you consider doing the reparsing as part of the upgrade process instead of integrating it into the scanning process? [2:33pm] djf: from a UX standpoint, if I see an "Upgrading…" notice, then when it goes away, I think I'd expect the upgrade to be complete. But with your patch, the user will see upgrading, then they'll see that their music is gone, and then it will come back with a scan. [2:34pm] djf: Could you do is to that MediaDB makes a list of all files that need reparsing, and then before sending the initial "ready" event it handles those?
Apologies again that the review took so long!
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 Ok, I updated the PR.
Comment on attachment 789300 [details] [review] https://github.com/mozilla-b2g/gaia/pull/11500 Conditional r+. Please fix the last couple of issues on Github and make sure that the addition of needsReparse is not breaking the use of advancedEnumerate() in the music app. We now have the situation where while there are pending reparses, the number returned by the count() method won't match the number of records returned by advancedEnumerate().
Attachment #789300 - Flags: review?(dflanagan) → review+
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.