{"id":319,"date":"2010-01-14T21:08:51","date_gmt":"2010-01-15T02:08:51","guid":{"rendered":"http:\/\/nylinuxhelp.com\/blogs\/?p=319"},"modified":"2010-01-17T08:34:13","modified_gmt":"2010-01-17T13:34:13","slug":"download-several-files-part-2","status":"publish","type":"post","link":"https:\/\/nylinuxhelp.com\/blogs\/command-line\/download-several-files-part-2","title":{"rendered":"Download several files: part 2"},"content":{"rendered":"<p>In an earlier post we used <strong>wget<\/strong> to download a single image file, and then used it to get all of the &#8216;gif&#8217; and &#8216;jpg&#8217; files with a single command.\u00a0 Multi-download commands of this type are helpful when you know the URL and exact directory where the image files exist.\u00a0 Let&#8217;s now take it a step further, and get lazy too.\u00a0 Lazy? Yes, lazy.\u00a0 Since we&#8217;re looking to use Linux for time-saving shortcuts, the less work we have to do to accomplish our task, the better.<\/p>\n<p>As previously mentioned, I like podcasts. 
\u00a0 Podcasts are (usually) published as an RSS feed at a web URL.\u00a0 Programs such as iTunes, Amarok, and Rhythmbox use feed URLs to get info about the available audio files, and you can download them manually or set up preferences that do this for you.<\/p>\n<p>We&#8217;re going to take a &#8220;get me all the files, now&#8221; approach using the Linux command line.<\/p>\n<h2>To perform a multi-file &#8220;unattended&#8221; download&#8230;<\/h2>\n<ol>\n<li>Make sure that &#8220;lynx&#8221; (a terminal-based web browser) is installed.\u00a0 To check, type <strong>which lynx<\/strong> at the prompt.\u00a0 If the shell responds with nothing but the next prompt, then it&#8217;s not installed.\u00a0 To install lynx on a Debian-based OS such as Ubuntu, type &#8220;<strong>sudo aptitude install lynx<\/strong>&#8221; at the prompt.\u00a0 On a Red Hat-based system, type &#8220;<strong>yum install lynx<\/strong>&#8221; as root to accomplish the same.\u00a0 Once lynx is installed, typing &#8220;which lynx&#8221; returns the path of the lynx executable (it might appear as \/usr\/local\/bin\/lynx).<\/li>\n<li>Make sure you have wget installed.\u00a0 In the terminal, type &#8220;which wget&#8221; and see what the shell returns.\u00a0 If it&#8217;s not on your system, install it.\u00a0 Items one and two only have to be done once, if at all.\u00a0 wget will probably be there already, but lynx is probably not included out of the box at install time.<\/li>\n<li>Have a URL (or RSS feed URL) where the desired files exist.<\/li>\n<\/ol>\n<p>Here&#8217;s our practical example.\u00a0 Let&#8217;s download all the mp3 files at\u00a0<a title=\"Doctor Who audio files from Steven J Cohen\" href=\"http:\/\/www.stevenjaycohen.com\/audio\/drwho\/feed\" target=\"_blank\">Steven J. Cohen&#8217;s &#8220;Doctor Who&#8221; RSS Feed<\/a>. 
You should view this link in a web browser to make sure that the page\/feed is still there.<\/p>\n<p>Time for a &#8220;trial run&#8221; (this next command will not download anything, just list the <strong>mp3 files<\/strong> at the feed URL).<\/p>\n<pre>lynx -dump http:\/\/www.stevenjaycohen.com\/audio\/drwho\/feed | egrep -o \"http:.*mp3\"<\/pre>\n<p><strong>lynx -dump [URL] <\/strong>prints a web page as plain text, including a numbered list of the links it contains\u00a0(for the complete HTML source, use lynx -source [URL]). Since we only want the links (and not the numbering), we filter this output using the UNIX pipe character &#8220;|&#8221; and the search tool <strong>egrep -o [pattern]<\/strong>, which prints only the part of each line that matches the pattern.\u00a0 We put in &#8220;http:.*mp3&#8221; as our pattern, which captures any text that <em>starts with http:<\/em> and<em> ends with mp3<\/em> (note that .* is a wildcard meaning `any run of characters`).<em><strong> A word of caution.<\/strong><\/em> It&#8217;s ALWAYS a good idea to do a trial run so that you know what you will request for download and whether your command builds the list properly.\u00a0 This is a very important preliminary step.<\/p>\n<p>Now, let&#8217;s do this for real.\u00a0 The following command downloads files into the current directory of the shell.\u00a0 So if you execute the command from &#8220;\/home\/myUserName\/music&#8221; then the files get saved into &#8220;music&#8221;.<\/p>\n<pre>lynx -dump http:\/\/www.stevenjaycohen.com\/audio\/drwho\/feed | egrep -o \"http:.*mp3\" | xargs -n1 wget<\/pre>\n<p>And that&#8217;s it.\u00a0 The shell shows the progress of each file as it downloads.\u00a0 When it&#8217;s done with the first file, it downloads the next one, and so on.\u00a0 It runs unattended, allowing you to do other things with your time.<\/p>\n<p>To perform the &#8220;unattended&#8221; download of all the files specified in the list, we needed another pipe and another command, &#8220;xargs&#8221;.\u00a0 Why xargs?\u00a0 Sometimes 
the shell runs into a problem of having &#8220;too many arguments&#8221; in its list to act on.\u00a0 xargs is your friend should this happen.<\/p>\n<p><strong>xargs <\/strong>[options] [command].\u00a0 The option and the command work together as follows.\u00a0 xargs reads the URLs that arrive through the pipe, and option &#8220;-n1&#8221; tells it to run the command &#8220;wget&#8221; once for each URL in the list produced by the &#8220;<strong>lynx -dump<\/strong>&#8221; part of the command.\u00a0 As with many shell tasks, there&#8217;s usually more than one way to do it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In an earlier post we used wget to download a single image file, and then used it to get all of the &#8216;gif&#8217; and &#8216;jpg&#8217; files with a single command.\u00a0 Multi-download commands of this type are helpful when you know the URL and exact directory where the image files exist.\u00a0 Let&#8217;s now take it a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14,7,23],"tags":[26,34,31,33],"class_list":["post-319","post","type-post","status-publish","format-standard","hentry","category-linux-apps","category-command-line","category-use-linux","tag-commands","tag-pipe","tag-wget","tag-xargs"],"_links":{"self":[{"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/posts\/319","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/comments?post=319"}],"version-history":[{"count":31,"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/posts\/319\/revisions"}],"predecessor-version":[{"id":348,"
href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/posts\/319\/revisions\/348"}],"wp:attachment":[{"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/media?parent=319"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/categories?post=319"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nylinuxhelp.com\/blogs\/wp-json\/wp\/v2\/tags?post=319"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}