Closed Bug 913877 Opened 11 years ago Closed 11 years ago

Implement attribution for files from Internet Archive's TV Archive

Categories

(Webmaker Graveyard :: Popcorn Maker, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: ben, Assigned: mjschranz)

References

Details

Attachments

(2 files)

Per #908370

Tracey from IA has a patch for Butter that displays correct attribution for files hosted from archive.org.

However, she was working from the Butter tree. Matt, can you help implement this on P.M.O?
Blocks: 908370
Of course we can, but I'd need a pull request to look at it we are going to go with the work Tracey has done already or a branch I can look at so I can file up a PR myself.

Tracey, do you have either of these things sitting around already?
Assignee: nobody → schranz.m
Status: NEW → ASSIGNED
Flags: needinfo?(tracey)
hey matt,
been a bit swamped with other work, but was planning to do the minor "port" and paring down of the current butter version to P.M.O.

I can try to revise that by EOD tuesday and you could take a look and suggest,edit,help what you see?  8-)
Flags: needinfo?(tracey)
That's fine with me. You can always post here or find my in irc whenever you have more time.
Attached patch archive.patchSplinter Review
Initial work from Tracey.
This is the first attempt at porting Tracey's work all over.

Two issues:

1) What is the expected video format? I'm trying a video like http://ia600208.us.archive.org/2/items/bb_minnie_the_moocher/bb_minnie_the_moocher_512kb.mp4 and it doesn't catch that as an archive.org video.
2) Do you have an icon somewhere that we can use for attribution?
Attachment #802583 - Flags: feedback?(tracey)
yah, trying to get the best video for non-TV IA items will be just a bit harder so I skipped that for now (we can most of the time offer *both* an .ogv and .mp4 for non-TV, etc. but it takes a bit more work sifting through available files).

so how about trying pasting this url in to the "Create new media clip" section for now?

http://archive.org/details/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart#start/460/end/490
(2) and i had made this for PR previously 

http://archive.org/~tracey/pwo-sprite.png
To clarify, sounds like we're only supporting attribution for clips that users find at http://archive.org/details/tv.

So quality control on this pull request is to make sure all clips from http://archive.org/details/tv work.

The challenge is that all clips in TV Archive are .mp4s, so we'll need to test this in Chrome.

Right?

Thanks!
yes, that's exactly right!   we can take on the easier/more ideal (since has .ogv) non-TV clips sometime later.  but we're all most psyched about *TV* right now... 8-D
(In reply to ben from comment #8)
> The challenge is that all clips in TV Archive are .mp4s, so we'll need to
> test this in Chrome.

For a short while longer yes. Soon enough we will have our flash wrapper fallback ready, which we will use to catch HTML5 video sources that the current browser in use doesn't support.
(In reply to tracey jaquith from comment #6)
> http://archive.org/details/
> COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart#start/460/end/490

This kind of url causes an error event to be thrown when set as the src to a video element, and will never properly trigger loadedmetadata. I recall this problem existing before, but I was under the impression we fixed it.

Although, it could be for the other kind of archive video I linked before myself.
Flags: needinfo?(tracey)
that *should* auto parse, per my suggested patch, to video:

https://archive.org/download/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=490

if the patch is working properly.
Flags: needinfo?(tracey)
No, it's parsing URLs correctly. The problem is the data sent back from that server is triggering an error event and never sending loadedmetadata when set to the source of a video element.
hmm.... seems to work fine in butter repo locally here...

you could try tacking on "&x=ignore.mp4" at the end of the url (from this line:
          var src = 'https://archive.org/download/'+id+'/'+id+'.mp4' + '?start='+start + (end ? '&end='+end : '');                                                       
)
and see if it's any happier?
doh!  hmm, i think we can't quite pull off "https://" yet for this actually.  or something is slightly messed up on our end -- can you switch https to http there?
ah, matt and i both simultaneously figured out it was appending the "&butterid=...." that was tripping the IA side up (and denying the request).  fix on IA side to allow this will finish in minutes.  8-)  sorry about that!
Depends on: 915225
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

Fixed up.

First, here's what the clip looks like inside the editor with the bolded text being what the "title" property:
https://dl-web.dropbox.com/spa/qxpeeqhhsjhcqcf/52ofojgg.png

Export, with the text show being the "title" property:
https://dl-web.dropbox.com/spa/qxpeeqhhsjhcqcf/m18wnadb.png

For testers, the links expected are one's like http://archive.org/details/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart#start/460/end/490. Of course, test in Chrome as Archive.org only has mp4 videos.
Attachment #802583 - Flags: feedback?(tracey) → review?(scott)
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

Also need Tracey to double check how we handle what's displayed for the title is correct, since there are specific needs there.
Attachment #802583 - Flags: feedback?(tracey)
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

Two things.

Pasting in a direct link to an mp4 from archive.org now fails, while only supporting the "pretty" link.

The opposite of that, the clip editor, shows the mp4, but does not support the pretty link.

This creates confusion in trying to paste in the editor url, into another project, only to have it keep failing.

Should support both in both spots.

Other than that, which is not obvious, this is looking good.
Attachment #802583 - Flags: review?(scott) → review-
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

Fixed up your problem.

Also, Tracey wound up implementing basically a "custom" endpoint for us to use that will give us the data we need rather than us doing the processing on our end. Also makes handling invalid clips much better. Also reduced code!
Attachment #802583 - Flags: review- → review?(scott)
wow, matt, that patch looks fantastic!!     

( we dont need to worry about "URI.makeUnique( respData.thumb ).toString()" like we did (at least one time) in butter repo, i gather, right?)   

ok, now that simple REST service is now picking the "best thumbnail" based upon the passed TV url, yay!  
(eg:  https://archive.org/services/maker.php?callback=caller&url=http%3A%2F%2Farchive.org%2Fdetails%2FCOMW_20130726_023000_The_Daily_Show_With_Jon_Stewart%23start%2F460%2Fend%2F490 )
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

Something weird with durations.

If I paste in https://ia601901.us.archive.org/9/items/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=485

I get a 19 second clip, but I asked for a 25.

If I paste in http://ia601901.us.archive.org/9/items/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=490

I get a 28 second clip, but I asked for 30.

Same if I ask for a 10 second, clip, I get a 6 second clip.

Now, if I start using the correct arcive tv link like http://archive.org/details/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart#start/460/end/490

Which should be a 30 second clip, given the params, I get a 30 second sequencer clip, but it actually only plays 28 seconds. This can cause issues. Sequencer thinks it is 30, but actually ends on 28. We handle this, because YouTube can also throw us these curve balls, but I would rather not rely on that hack stack in other media types if we can help it.

Also, not sure if we care, but:

If I paste in https://archive.org/download/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=490

I get an error, but it works in the browser. This probably should be valid and either hit up HTML5 for playback, or convert it to http://archive.org/details/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart#start/460/end/490

I know we're going to hit issues when we attempt to connect this to jwplayer, because jwplayer expects to filter into html5, so we're going to hit cases with 4 variables. Is it html5, is it html5 and jwplayer, is it html5 and archive.org and now what the heck do we send for linkback, but, imo, we should worry about that in the jwplayer patch. Maybe file a third ticket to connect the two once they both land, but just a heads up on this in case you have any thoughts to make it easier.

R- on the duration nonsense, the rest is just forward thinking.
Attachment #802583 - Flags: review?(scott) → review-
> If I paste in
> https://archive.org/download/
> COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/
> COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=490
> 
> I get an error, but it works in the browser. This probably should be valid
> and either hit up HTML5 for playback, or convert it to
> http://archive.org/details/
> COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart#start/460/end/490

Yeah, this is an unindented change because we aren't doing the metadata processing ourselves. I could just limit it to `details` urls, but I'll confirm with Tracey before landing.

> 
> I know we're going to hit issues when we attempt to connect this to
> jwplayer, because jwplayer expects to filter into html5, so we're going to
> hit cases with 4 variables. Is it html5, is it html5 and jwplayer, is it
> html5 and archive.org and now what the heck do we send for linkback, but,
> imo, we should worry about that in the jwplayer patch. Maybe file a third
> ticket to connect the two once they both land, but just a heads up on this
> in case you have any thoughts to make it easier.

The linkback stuff was just to prevent extra processing that needed to be done inside the embed system since it's just easier letting them handle this for us.

I'm going to see if I can get proper handling of this kind (TV) of archive video along with the "normal" ones as well.
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

I can't do much about the duration issues on their end. I'm assuming it's processing something wrong and I can't do anything there. What I can do is make sure that we aren't giving the wrong data to the sequencer itself, which is easy.

I'm changed the parsing a bit too so direct download links to archive videos will trigger this as well. One's like http://ia600208.us.archive.org/2/items/bb_minnie_the_moocher/bb_minnie_the_moocher_512kb.mp4 should probably be a new bug but https://archive.org/download/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=490 get's caught just fine.

I think it probably would be handy to get a better understanding of how everything is segmented with Archive though. I have no idea what the distinguishment's are between one kind of video to the next, where it comes from on the website, etc.
Attachment #802583 - Flags: review- → review?(scott)
will be able to comment later, but this should mostly be due to us using mod_h264_streaming on our server to custom cut the entire program .mp4 into a smaller .mp4, and remake the moov atom and headers to a smaller set of metadata and smaller file.

we have a generous amount of time, up to max ~10 seconds, between keyframes, so that can effect what you *actually* get out from the server.  we never use B-frames for our .mp4 (for the simplest mobile devices out there to work, low complexity, etc.) so the duration wanted-vs-actual should only be due to the first keyframe served up (ie: the end of the video should be fine/accurate).

we are trying to see if we can come up with a more accurate cutting technology, or update to mod_h264_streaming, or some other workaround...

i could update our metadata script with a bit of time to at least find the keyframe locations and give an accurate duration...   that should probably make popcorn's end happy, i think?  8-)
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

Almost there.

The way duration works now is fine. As long as the duration data we send to sequencer is exactly what is going to be played.

We should track this as a bug, and see what it would take to get it fixed, but for now, as long as sequencer isn't hitting it, we're not going to break or get out of sync.

The parsing of the three potential URLs now work, cept this one does not get a thumbnail: https://archive.org/download/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=490

Should it have a thumbnail?

R- if we can get a thumbnail, R+ if we cannot get a thumbnail.
Attachment #802583 - Flags: review?(scott) → review-
I see a thumbnail here:

http://archive.org/services/maker.php?url=https%3A%2F%2Farchive.org%2Fdetails%2FCOMW_20130726_023000_The_Daily_Show_With_Jon_Stewart%2Fstart%2F460%2Fend%2F490

(for the /details/ page where that clip lives)

I think that url is what it's going to use to get the thumb, right?
> If I paste in
> https://archive.org/download/
> COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/
> COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=490
> 
> I get an error, but it works in the browser. This probably should be valid
> and either hit up HTML5 for playback, or convert it to
> http://archive.org/details/
> COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart#start/460/end/490
> 
> I know we're going to hit issues when we attempt to connect this to
> jwplayer, because jwplayer expects to filter into html5, so we're going to
> hit cases with 4 variables. Is it html5, is it html5 and jwplayer, is it
> html5 and archive.org and now what the heck do we send for linkback, but,
> imo, we should worry about that in the jwplayer patch. Maybe file a third
> ticket to connect the two once they both land, but just a heads up on this
> in case you have any thoughts to make it easier.

I understand your concerns and thoughts, and don't dismiss or discount them...
However!  8-)
We do not want to expose the .mp4 urls explicitly (we certainly wont show them on our pages anywhere, and we'd prefer they not be used/promoted).  So the only types of things for now we expect to be pasted in to the media box are the /details/ oriented links.
So maybe we are OK w.r.t. that point.
(In reply to tracey_pooh from comment #28)
> I understand your concerns and thoughts, and don't dismiss or discount
> them...
> However!  8-)
> We do not want to expose the .mp4 urls explicitly (we certainly wont show
> them on our pages anywhere, and we'd prefer they not be used/promoted).  So
> the only types of things for now we expect to be pasted in to the media box
> are the /details/ oriented links.
> So maybe we are OK w.r.t. that point.

By and large that's the case, but we can't really stop someone from inspecting the embed page and grabbing the .mp4 link we put there.

As far as the missing thumbnails bit, I think I'll do some string manipulation to make the https://archive.org/download/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=490 type of sources still get a thumbnail.

http://archive.org/download/20030524-bbs-nasstrom/20030524-bbs-nasstrom1_512kb.mp4 Still will wind up being done as a separate ticket in my mind.
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

It can have a thumbnail.
Attachment #802583 - Flags: review- → review?(scott)
i can talk to our team, but i suspect we'd be more comfortable with going down the "not available yet" blocking for any "naked" TV .mp4 clip.  i realize that kinda sucks, if that's the case, but we're a bit concerned w/ making/encouraging our media directly obvious.  this gets into that political kind of thing...
I don't know what the correct answer is here. It's all up to how you want/expect people to be consuming Archive's URLs and of course, how people actually use them. We also are setting our own precedent here since people by and large don't use Archive with Popcorn Maker. They can technically, if they grab the direct .mp4 links but as you've pointed out it's not ideal to what you want.

We can simply not support the .mp4 links by considering those passed in to be erroneous and rejecting them. From there, we can prevent it from being visible to every except the people who inspect the Popcorn.js code we inject into our embeds. We have to have the .mp4 links there.
OK, i've tried to make most of our video items, including non-TV, work now, best I can tell from my POV...

eg: 

https://archive.org/services/maker.php?url=http://archive.org/details/20030524-bbs-nasstrom

for now, it's only looking for .mp4 files, but does try to group all logical files together when there are 2+ videos in an item, pick the 1st, and pick the best thumbnail, .mp4, and duration for them.

There's going to be cases and items where this won't quite work properly, but most of our items with videos *should* work now...
OK, i've updated to allow for dropping in "raw" .mp4 or .ogv files (and will try to set the "linkback" to the /details/ page version now.

examples:

TV:
https://archive.org/services/maker.php?url=https://archive.org/download/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart.mp4?start=460&end=475

non-TV:
https://archive.org/services/maker.php?url=https://archive.org/download/20030524-bbs-nasstrom/20030524-bbs-nasstrom1.ogv

so we should be in pretty good shape now I think/hope!
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

I'm fine with the way this is. Moved from a looking good, to a looks great :D

I think regarding if we should block direct raw mp4s from archive.org, I'm also fine with that, but make sure it is an explicit thing, so it doesn't get "patched" at some point. Just any url coming from archive.org needs to be in the supported format, or we return early with a comment in the code as to why we're returning. Then ensure we also inform the user what has happened, instead of timing out so the user knows why we cannot add a raw mp4.

We can also always file and do it as a follow up patch.
Attachment #802583 - Flags: review?(scott) → review+
Comment on attachment 802583 [details] [review]
https://github.com/mozilla/popcorn.webmaker.org/pull/201

With the concern about allowing direct mp4s, I've removed the ability. If a /download/ url is given, it warns the user that it's an unembedable link. I think this satisfies everyone.

http://archive.org/details/20030524-bbs-nasstrom
http://archive.org/details/COMW_20130726_023000_The_Daily_Show_With_Jon_Stewart#start/460/end/490

Are both examples of expected to function URLs.
Attachment #802583 - Flags: review+ → review?(scott)
Attachment #802583 - Flags: review?(scott) → review+
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Attachment #802583 - Flags: review+
Attachment #802583 - Flags: feedback?(tracey)
On production.
Status: RESOLVED → VERIFIED
Attachment mime type: text/plain → text/x-github-pull-request
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: