java -jar HTTPget.jar <main-pattern> [-tos]
Main-pattern is the pattern used to describe which files are going to be downloaded. If no switches are applied, the default behaviour is to download the files at all the URLs that are in the pattern.
[a..b] will create a list of integers running from a to b.
Weekday:[Monday, Wednesday, Friday]
will create 3 texts:
Weekday:Monday
Weekday:Wednesday
Weekday:Friday
Numbers:[1..5]
will create 5 texts:
Numbers:1
Numbers:2
Numbers:3
Numbers:4
Numbers:5
It's possible to combine several brackets in one line.
[1..12][Monday, Wednesday, Friday]
This will create 36 lines, with all possible combinations of the numbers from 1 through 12 and the three weekdays.
[001..245]
will result in a list where all numbers have 3 digits. Numbers from 1 to 99 will have leading zeros.
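The bracket rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not HTTPget's own code: the names `expand` and `expand_bracket` are made up for this example, the order of the combinations is this sketch's choice, and it assumes that a leading zero on the first number sets the width of the padding. It only handles ranges where a is not larger than b; how HTTPget itself treats a range such as [97..01] is not covered here.

```python
import itertools
import re

# A bracket group holds either a numeric range "a..b" or a comma-separated list.
_BRACKET = re.compile(r"\[([^\[\]]*)\]")
_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def expand_bracket(body):
    """Return the list of texts produced by the inside of one bracket."""
    m = _RANGE.fullmatch(body.strip())
    if m:
        start, end = m.group(1), m.group(2)
        if int(start) > int(end):
            raise ValueError("descending ranges are not handled in this sketch")
        # Assumption: a leading zero on the first number fixes the digit count.
        width = len(start) if start.startswith("0") else 0
        return [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
    return [item.strip() for item in body.split(",")]


def expand(pattern):
    """Expand every bracket in the pattern and return all combinations."""
    # split() with one capture group alternates literal text and bracket bodies,
    # so the odd positions are the bracket contents.
    parts = _BRACKET.split(pattern)
    choices = [expand_bracket(p) if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for text in expand("Weekday:[Monday, Wednesday, Friday]"):
        print(text)
    print(len(expand("[1..12][Monday, Wednesday, Friday]")))
```

Each bracket becomes a list of alternatives, and `itertools.product` builds every combination of them, which is why two brackets with 12 and 3 entries give 36 lines.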
|-t <N>||Sets how many files the program downloads simultaneously. The default value is one. If you have a high-speed connection and the target site is also fast, it is advisable to set N high.|
|-o <output-pattern>||Changes the name the file is saved under on the disk, which is convenient when files should be renamed. The output-pattern must have at least as many possible combinations as the main pattern.|
|-s <pattern>||This switch tells the program to search the URLs in the main-pattern
for files that match this pattern.
Wildcards may be used in this pattern:
* matches any number of characters.
_ matches exactly one character.|
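To make the two wildcards concrete, here is a small Python sketch that turns such a pattern into a regular expression and filters a list of file names with it. It is not taken from HTTPget. The function names `wildcard_to_regex` and `matching_names` are invented for the example, and it assumes the pattern has to match the whole file name.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern using * and _ into a compiled regular expression."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")           # any number of characters
        elif ch == "_":
            pieces.append(".")            # exactly one character
        else:
            pieces.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(pieces))


def matching_names(names, pattern):
    """Keep the names that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [name for name in names if rx.fullmatch(name)]


if __name__ == "__main__":
    found = ["bar1.gif", "bar542.gif", "banner.gif", "bar1.png"]
    print(matching_names(found, "bar*.gif"))
```

Escaping every other character matters, because a dot in a file name such as bar1.gif would otherwise match any character in the regular expression.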
To download all of these, simply type:
java -jar HTTPget.jar http://www.greatcomix.com/archive/gc[97..01][01..12][01..31].gif
Next we would like to rename all the downloaded files. To rename them so that gc is written in capitals, add the switch:
-o GC[97..01][01..12][01..31].gif
Another comic is hosted at Greatcomix. It is not published on a daily basis, so all the files are simply called bar1.gif, bar2.gif, bar3.gif ... bar542.gif. The annoying thing about downloading these is that when you view them in a sorted list, 10 comes immediately after 1, 100 comes after 10 and so on. To avoid this problem we add zeros in front of the numbers in the file names when they're downloaded.
java -jar HTTPget.jar http://www.greatcomix.com/bar/bar[1..542].gif -o bar[001..542].gif
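The sorting problem and the effect of the padded names can be checked with a short Python sketch. It is an illustration only: `rename_plan` is a made-up helper, and it assumes that the first name from the main pattern is paired with the first name from the output pattern, the second with the second, and so on.

```python
def rename_plan(first, last, width):
    """Pair each original name with a zero-padded name, in the same order."""
    originals = [f"bar{n}.gif" for n in range(first, last + 1)]
    padded = [f"bar{n:0{width}d}.gif" for n in range(first, last + 1)]
    return list(zip(originals, padded))


if __name__ == "__main__":
    plan = rename_plan(1, 542, 3)
    originals = [old for old, _ in plan]
    padded = [new for _, new in plan]
    # Plain string sorting puts bar10.gif and bar100.gif right after bar1.gif.
    print(sorted(originals)[:3])
    # The padded names are already in order when sorted as plain strings.
    print(sorted(padded) == padded)
```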
After a while Greatcomix decides that too many people are downloading
their comics without looking at their banner ads. They therefore give all
their files names that have no relation to their publishing date, which
makes it hard or impossible to define a pattern for them and download them
in the correct order. Usually these files will still be linked from HTML files
whose addresses are easy to derive from the date. We therefore ask HTTPget
to search through the HTML files for the picture files.
java -jar HTTPget.jar http://www.greatcomix.com/bar/[97..01][01..12][01..31]a.html -s bar*.gif -o bar[97..01][01..12][01..31].gif
Performance can be dramatically increased in all of the above examples by adding -t N, to specify that the program should download N files simultaneously.
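Why downloading N files at once helps can be sketched with a thread pool in Python. This is not how HTTPget is implemented, just one common way to get the same effect. The names `fetch` and `download_all` are hypothetical, and the `fetch_one` parameter exists only so the sketch can be tried without a network connection.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download one URL and return its content as bytes."""
    with urlopen(url) as response:
        return response.read()


def download_all(urls, n=1, fetch_one=fetch):
    """Fetch all URLs with at most n downloads running at the same time.

    The results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fetch_one, urls))


if __name__ == "__main__":
    # A stand-in for a real download, so nothing is fetched from the network.
    print(download_all(["a", "b", "c"], n=2, fetch_one=str.upper))
```

Most of the time spent on a download is waiting for the server, so several downloads in flight overlap that waiting time. With n=1 the sketch behaves like the default of one file at a time.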
Comments, suggestions and bug-reports to Henrik