Re: [DIYbio] Re: Downloading JoVE videos?

3:47 AM

#!/usr/bin/env python3
import argparse
import html
import re
import sys

import requests

P = argparse.ArgumentParser(description="A video extractor/downloader for jove.com. Spits out the download link for use with wget/curl/other.")
P.add_argument("videoURL",
               help="URL of the video to download.")
P.add_argument("--user-agent", type=str, default="Mozilla/5.0 (X11; Linux i686; rv:20.0) Gecko/20100101 Firefox/20.0",
               help="Custom user-agent string to use. Default is a generic Firefox string.")
P.add_argument("-v", "--verbose", action="store_true", default=False,
               help="Be more chatty. May break piping of output to other tools.")
P.add_argument("-g", "--get-video", action="store_true", default=False,
               help="Don't print video link, just get the video and save locally!")
A = P.parse_args()

# Verbosity! Keep the real print as _print, and shadow print with a
# version that only produces output in verbose mode.
_print = print
def print(*args, **kwargs):
    if A.verbose:
        _print(*args, **kwargs)

# JoVE gets the video download link by requesting jove.com/video-chapters for
# the video's "videoid"; that request URL is embedded in-page, whereas the
# direct video download link is not.
print("Requesting page..")
try:
    videopage = requests.get(A.videoURL, headers={"User-Agent": A.user_agent})
except requests.exceptions.ConnectionError:
    _print("Error fetching page, are you connected?")
    sys.exit(1)

print("Parsing page for video-id..")
videolines = [x for x in videopage.text.splitlines() if "videoid" in x]
# Guard against pages with no "videoid" line instead of crashing with IndexError.
videoURL = re.search(r"data-url='(/video-chapters[^']+)'", videolines[0]) if videolines else None
if videoURL:
    videoURL = videoURL.group(1)
else:
    raise Exception("Could not parse the data-url link from the provided link. Page content was:\n\n" + videopage.text)
print("Got relative URL for requesting video download URL:", videoURL)

print("Making request for video download URL.")
# To get the download link, request the video-chapters URL, then extract the link with a regex.
try:
    vidlinkrequest = requests.get("http://www.jove.com{0}".format(videoURL),
                                  headers={"User-Agent": A.user_agent,
                                           "Referer": A.videoURL,
                                           # All the below is probably unnecessary but helps maintain the ruse.
                                           "X-Requested-With": "XMLHttpRequest",
                                           "Host": "www.jove.com",
                                           "Connection": "keep-alive",
                                           "Cache-Control": "max-age=0",
                                           "Accept-Language": "en-US,en;q=0.5",
                                           "Accept-Encoding": "gzip, deflate",
                                           "Accept": "*/*"})
except requests.exceptions.ConnectionError:
    _print("Error fetching page, are you connected?")
    sys.exit(1)

print("Received response, parsing for URL..")
# Decode HTML entities (e.g. an escaped ":") before searching for the link.
vidlink = re.search(r'video="([^"]+)"', html.unescape(vidlinkrequest.text))
if vidlink:
    vidlink = vidlink.group(1)
    print("Found final URL for video download:")
    if A.get_video:
        print(vidlink)
        print("Making request for full video download..")
        try:
            video = requests.get(vidlink,
                                 headers={"User-Agent": A.user_agent,
                                          "Referer": A.videoURL,
                                          "Accept-Encoding": "gzip, deflate"})
        except requests.exceptions.ConnectionError:
            _print("Error fetching video, connection may have cut or been terminated by server?")
            _print("The video link was:", vidlink)
            sys.exit(1)
        # The link already ends in a useful filename, so save under that name.
        with open(vidlink.rsplit("/", 1)[1], "wb") as OutputF:
            OutputF.write(video.content)
    else:
        # Default behaviour: just print the URL to stdout.
        # If verbose is not enabled, the output can be piped directly to wget or curl,
        # which is a *better option* than using the built-in get function for users
        # sensible enough to be using Linux, but isn't practical for nutty win/mac users.
        _print(vidlink)
else:
    # vidlink is None here, so report the raw response text instead.
    raise Exception("Could not parse video download link from returned data. Returned data was:\n\n" + vidlinkrequest.text)
Thanks!
Dug into the page behaviour to refine this into a Python script.

It's odd: they don't have the video download link in the page itself.
Instead, the page contains the "video-id", and this is used to make a
request to another page, which returns the video link. Thankfully, the
video link already has a useful filename.
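The two-step lookup described above can be tried offline against sample text. This is a minimal sketch, assuming the page and the video-chapters response look like the hypothetical SAMPLE_PAGE and SAMPLE_CHAPTERS strings below; the real JoVE markup may differ, and the colon escaped as an HTML entity is purely illustrative. The two regexes are the ones the script uses; the helper names are made up for this example.

```python
import html
import re

# Hypothetical sample markup, modelled only on the patterns the script searches for.
SAMPLE_PAGE = """<div class="player" videoid="1234" data-url='/video-chapters?id=1234'></div>"""
SAMPLE_CHAPTERS = '<player video="http&#58;//ecsource.jove.com/CDNSource/1234_Example_Web.mov"/>'


def find_chapters_url(page_text):
    """Step 1: return the relative video-chapters URL on the videoid line, or None."""
    for line in page_text.splitlines():
        if "videoid" in line:
            match = re.search(r"data-url='(/video-chapters[^']+)'", line)
            if match:
                return match.group(1)
    return None


def find_video_link(chapters_text):
    """Step 2: return the direct video link from the video-chapters response, or None."""
    match = re.search(r'video="([^"]+)"', html.unescape(chapters_text))
    return match.group(1) if match else None


def filename_for(video_link):
    """The link already ends in a useful filename, so reuse its last path segment."""
    return video_link.rsplit("/", 1)[1]


chapters_url = find_chapters_url(SAMPLE_PAGE)
video_link = find_video_link(SAMPLE_CHAPTERS)
print("http://www.jove.com" + chapters_url)  # the URL the script requests in step 2
print(video_link)
print(filename_for(video_link))
```

Keeping each step as a small function makes it easy to check the regexes against a saved copy of a real page before pointing them at the live site.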

Attached is a Python script to automate this: just give it the URL of
the page of the video you want, and it'll spit out the download link.
It has an optional "verbose mode" ("-v") to keep you up to date on
where it's at. If you have wget/curl, you can pipe the download link
straight into it, e.g.:

    JoveDL.py 'http://www.jove.com/video/myvideo' | wget -c -i -

If you don't, you can instead use the "-g" flag to "get" the video
using Python.
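For the "-g" route, the script reads the whole video into memory (`video.content`) before writing it out. Below is a sketch of a streaming alternative that writes the file in chunks. It swaps in the standard library (`urllib.request` plus `shutil.copyfileobj`) for requests so it has no dependencies; `download`, `save_stream`, `local_filename` and the `video.bin` fallback name are hypothetical helpers, not part of the attached script, and whether the CDN cares about the User-Agent or Referer headers is an assumption carried over from the script.

```python
import os
import shutil
import urllib.parse
import urllib.request

# Default user agent copied from the script; whether the server needs it is an assumption.
USER_AGENT = "Mozilla/5.0 (X11; Linux i686; rv:20.0) Gecko/20100101 Firefox/20.0"


def local_filename(video_link):
    """Use the last path segment of the link, ignoring any query string."""
    path = urllib.parse.urlsplit(video_link).path
    return os.path.basename(path) or "video.bin"  # hypothetical fallback name


def save_stream(source, dest_path, chunk_size=1 << 20):
    """Copy a binary file-like object to disk in chunks instead of all at once."""
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(source, out, chunk_size)


def download(video_link, referer):
    """Fetch video_link and stream it to a local file; returns the file name."""
    request = urllib.request.Request(
        video_link, headers={"User-Agent": USER_AGENT, "Referer": referer})
    dest = local_filename(video_link)
    with urllib.request.urlopen(request) as response:
        save_stream(response, dest)
    return dest
```

Usage would be something like `download(link, "http://www.jove.com/video/myvideo")`, with `link` taken from the script's printed output. Memory use then stays at roughly one chunk regardless of the video's size.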

This script requires the "requests" module:
http://docs.python-requests.org/en/latest/

best,
Cathal

On Tue, 24 Sep 2013 10:38:01 -0700 (PDT)
code elusive <code.elusive@gmail.com> wrote:

>
> hello :)
>
> A method to extract the full JoVE video files, using Firefox, is
> described below.
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Extraction of full JoVE video files
>
> 1. In Firefox, open a new tab (my FF version is 23.0.1)
>
> 2. Press Ctrl+Shift+K to open the Web Console window (or go to
> Firefox > Web Developer > Web Console).
>
> 3. In the Web Console, make sure the "Net" and "Logging" buttons are
> pressed (if they are not, press them).
>
> 4. Paste the URL of the webpage you're interested in and start
> playing the short video segment.
>
> From the lines that have appeared on the web console, we want the
> video file links, which most probably include a .mov or .mp4
> extension.
>
> 5. Once the short video segment has ended, filter the lines using
> the "filter output" box of the Web Console (next to the "Clear"
> button) and search for .mov or .mp4.
>
> The lines with the link start as
> GET "http://ecsource.jove.com/CDNSource/.. "
>
> 6. Right click the appropriate line, select "copy link location" and
> either paste in a new tab, or download with your favorite download
> manager.
>
> that's it :)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Of course it is possible that an even easier method exists.
> I hope the explanation is clear. Let me know if I can clarify
> anything.
>
> For those that are interested in the video files from the links that
> Patrik posted, the links are:
>
> http://ecsource.jove.com/CDNSource/3740_Mahoney_Perfusion_010512_P_Web.mov
> http://ecsource.jove.com/CDNSource/3940_Bueter_050112_F_Web.mov
> http://ecsource.jove.com/CDNSource/1138_Cowan_F2.mp4
>
> have a nice evening :)
>
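Step 5 of the quoted Firefox method (filtering the console output for .mov or .mp4) can also be done on lines copied out of the console. This is a minimal sketch, assuming each copied line has the form GET "http://ecsource.jove.com/CDNSource/.. " described in the quoted message; the `video_links` helper and the first sample line are hypothetical, and only the CDN links are taken from the message.

```python
import re

# Assumed line format: GET "<url>" with an optional space before the closing quote.
URL_PATTERN = re.compile(r'GET\s+"(https?://[^"\s]+\.(?:mov|mp4))\s*"', re.IGNORECASE)


def video_links(console_lines):
    """Return .mov/.mp4 links from copied console lines, in order, without duplicates."""
    found = []
    for line in console_lines:
        match = URL_PATTERN.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


# Hypothetical console lines; the CDN URLs are the ones listed in the message.
log = [
    'GET "http://www.jove.com/video/3740"',
    'GET "http://ecsource.jove.com/CDNSource/3740_Mahoney_Perfusion_010512_P_Web.mov"',
    'GET "http://ecsource.jove.com/CDNSource/1138_Cowan_F2.mp4"',
    'GET "http://ecsource.jove.com/CDNSource/1138_Cowan_F2.mp4"',
]
for link in video_links(log):
    print(link)
```

Duplicates are dropped because a player may request the same file more than once while a segment plays.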