2011-10-01

Mirror Websites with HTTrack

Monday, September 12th, 2011 from Hak5.org
GOAL: 
Download a copy of any website and host it locally with a one-line web server. You can also use this to back up a website or to do some prototyping. On the dark side, attackers might use this technique to build their phishing sites.


MATERIALS:
HTTrack: Available for Windows, Mac and Linux, this open source, multilingual mirroring tool supports multiple web targets, user-selectable recursion levels, resume features and more.


STEP-BY-STEP:
1. Begin by creating a directory to store the website mirror.
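Example: mkdir ~/websites, then cd !$ (in bash, !$ expands to the last argument of the previous command, so this is the same as cd ~/websites)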

2. Run httrack. Once you get familiar with the tool you can automate the process with flags and such, but the straightforward interactive wizard is much appreciated.

3. Start by naming the project, then provide a directory to save the files and the URL of your website, or several URLs separated by commas or spaces. Then choose how you'd like to download. I prefer option 2, mirroring the website with the wizard.

4. We can specify whether we're using a proxy, which file types we would like, and any additional options. HTTrack then shows the equivalent command line so that next time we can perform the same action without the prompts.
Example: httrack -W -O /path -%v
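If you want to script that saved command instead of retyping it, a rough sketch along these lines could work. The function name build_httrack_command, the example.com URL and the /tmp/mirror path are placeholders made up for illustration. The only httrack options used are -O (output path), -rN (mirror depth) and -%v (show filenames as they download). It leaves out -W, since wizard mode asks questions and this sketch is aimed at unattended runs.

```python
import shlex


def build_httrack_command(urls, output_dir, depth=None, verbose=True):
    """Build an argument list for an unattended httrack run.

    urls       -- list of site URLs to mirror (placeholders in this sketch)
    output_dir -- directory httrack should write the mirror into (-O)
    depth      -- optional maximum link depth (-rN); None keeps httrack's default
    verbose    -- if True, add -%v so downloaded filenames are shown
    """
    if not urls:
        raise ValueError("at least one URL is required")
    cmd = ["httrack", *urls, "-O", output_dir]
    if depth is not None:
        cmd.append(f"-r{int(depth)}")
    if verbose:
        cmd.append("-%v")
    return cmd


# Hypothetical target and path, for illustration only.
cmd = build_httrack_command(["http://www.example.com/"], "/tmp/mirror", depth=2)
print(shlex.join(cmd))
```

To actually start the mirror, hand the list to subprocess.run(cmd, check=True). That assumes httrack is installed and on your PATH.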

5. Hit Y for Yes and the process will complete in just a moment, or maybe longer depending on the size of the site and how recursive you want to get. Afterwards we can see our finished work with ls. You'll notice HTTrack creates a log and a cache directory, and all of the saved files will be found in your website directory.
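For a quicker sanity check than eyeballing ls, something like the following sketch could summarize what landed in the project directory. It assumes HTTrack's usual layout, with an hts-log.txt file and an hts-cache directory at the top of the project. summarize_mirror is a made-up helper name, not part of HTTrack.

```python
import os
from collections import Counter


def summarize_mirror(project_dir):
    """Report whether httrack's log and cache exist and count saved files by extension."""
    root = os.path.abspath(os.path.expanduser(project_dir))
    summary = {
        # Assumed HTTrack layout: hts-log.txt and hts-cache/ at the project root.
        "has_log": os.path.isfile(os.path.join(root, "hts-log.txt")),
        "has_cache": os.path.isdir(os.path.join(root, "hts-cache")),
        "files_by_ext": Counter(),
    }
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Skip httrack's own bookkeeping so only mirrored content is counted.
            if "hts-cache" in dirnames:
                dirnames.remove("hts-cache")
            filenames = [n for n in filenames if n != "hts-log.txt"]
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            summary["files_by_ext"][ext] += 1
    return summary
```

Calling summarize_mirror("~/websites") returns a dictionary, so you can print it or check that the HTML count is not zero before moving on to serving the files.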


6. With our site newly mirrored and the HTML files sitting happily in our directory, we can browse them through a local web server started with a single command.
Issue: python -m SimpleHTTPServer (on Python 3, the equivalent is python3 -m http.server)
A web server will be spawned serving up your current working directory on port 8000. Now we can head over to our web browser and check out http://localhost:8000 to see our finished product.
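If you would rather serve the mirror from a script than from the one-liner, for example to pick the directory or port without changing into the folder first, a minimal sketch using Python 3's standard http.server module might look like this. make_mirror_server is a hypothetical helper name, and it binds to 127.0.0.1 by default on the assumption that you only want to browse the copy from your own machine.

```python
import functools
import http.server
import os


def make_mirror_server(directory, port=8000, host="127.0.0.1"):
    """Return an HTTP server (not yet started) that serves files from `directory`."""
    root = os.path.abspath(os.path.expanduser(directory))
    if not os.path.isdir(root):
        raise NotADirectoryError(root)
    # Bind the handler to the mirror directory instead of the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root
    )
    return http.server.ThreadingHTTPServer((host, port), handler)
```

To use it, call make_mirror_server("~/websites").serve_forever() and stop it with Ctrl-C. The directory argument of SimpleHTTPRequestHandler and the ThreadingHTTPServer class both need Python 3.7 or newer.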