Download Many Files In Parallel? (linux/python?)
Solution 1:
I normally use pscp to do things like this, and then call it via subprocess.Popen. For example:
import subprocess

# raw string so the backslashes in the Windows path survive
pscp_command = r'''"c:\program files\putty\pscp.exe" -pw <pwd> -p -scp -unsafe <file location on my linux machine including machine name and login, can use wildcards here> <where you want the files to go on a windows machine>'''
p = subprocess.Popen(pscp_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()  # waits for the process to finish, so a separate p.wait() is not needed
Of course, I'm assuming Linux --> Windows.
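Since the question is about downloading many files in parallel, here is a minimal sketch of launching several pscp transfers at once; the command strings are hypothetical, each built like the single-file example above:

import subprocess

# hypothetical: one pscp command string per file, built like the example above
commands = [pscp_command_for_file1, pscp_command_for_file2]

# Popen returns immediately, so all transfers start at once
procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
         for cmd in commands]
for p in procs:
    stdout, stderr = p.communicate()  # blocks until that transfer finishes

Each Popen call starts a transfer without waiting for it, so the downloads run concurrently; the communicate() loop just collects output and waits for them all to complete.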
Solution 2:
Try wget, a command-line utility installed on most Linux distros and also available via Cygwin on Windows.
You may also have a look at Scrapy, a crawling library/framework written in Python.
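A single wget process fetches URLs one at a time, so to parallelize you can start one wget process per URL from Python; a minimal sketch, with a hypothetical URL list:

import subprocess

# hypothetical list of URLs to fetch; one wget process per URL
urls = ['ftp://example.com/a.dat', 'ftp://example.com/b.dat']

# -q keeps wget quiet; all processes run concurrently
procs = [subprocess.Popen(['wget', '-q', url]) for url in urls]
exit_codes = [p.wait() for p in procs]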
Solution 3:
If you use a Pool object from the multiprocessing module, urllib2 should handle the FTP downloads:
import urllib2
from multiprocessing import Pool

def get_url(url):
    # url should start with 'ftp:'
    try:
        res = urllib2.urlopen(url)
        return url, res.read()
    except Exception:
        # add more meaningful exception handling if you need it, e.g. retry once
        return url, None

# num_processes and url_list are defined elsewhere.
# Worker processes can't write to a dict in the parent process,
# so collect the downloads through the pool's return values instead.
pool = Pool(processes=num_processes)
results = dict(pool.map(get_url, url_list))
pool.close()
pool.join()
Of course, spawning processes has some serious overhead. Non-blocking requests will almost certainly be faster if you can use a third-party module like Twisted.
Whether the overhead is a serious problem will depend on the relative magnitude of per-file download times and network latency.
You can try implementing this with Python threads rather than processes, but it gets a bit trickier; see the answer to this question on using urllib2 safely with threads. You would also need to use multiprocessing.pool.ThreadPool instead of the regular Pool.
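Starting from the sketch above, the thread-based variant is mostly a drop-in swap, reusing the same get_url, num_processes, and url_list:

from multiprocessing.pool import ThreadPool

# threads share memory, so a shared dict would also work here,
# but returning values keeps the two variants interchangeable
pool = ThreadPool(processes=num_processes)
results = dict(pool.map(get_url, url_list))
pool.close()
pool.join()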
Solution 4:
I know it's an old post, but there is a perfect Linux utility for this. If you are transferring files from a remote host, lftp is great! I mainly use it to quickly push stuff to my FTP server, but it works just as well for pulling files down, using the mirror command. It also has an option to copy a user-defined number of files in parallel, like you wanted. If you wanted to copy some files from a remote path to a local path, your session would look something like this:
lftp
open ftp://user:password@ftp.site.com
cd some/remote/path
lcd some/local/path
mirror --parallel=2
Be very careful with this command, though. As with other mirror commands, if you get it wrong (for example by running it with --delete, or with --reverse, which pushes local files to the remote instead of pulling them down), you WILL DELETE FILES.
For more options and documentation for lftp, see the man page: http://lftp.yar.ru/lftp-man.html