Flattening a Website with WGET

I like to use WordPress for my course websites because it’s easy to set up, customize, and edit without having to do significant “backend” work on the site. But what about when a course is over and I no longer need all that functionality, am running out of space on my server, and also don’t want to be hassled with ongoing updates? That’s when I flatten the website with WGET.

Flattening a website with WGET is pretty straightforward. First, open the Terminal (or other command line program) on your machine. Type “wget” to see if it is already installed: if it is, it will complain about a missing URL; if you get “command not found” instead, you will need to install it. There are a lot of online tutorials on how to install wget, so I won’t reinvent the wheel; I’ll just send you to https://phoenixnap.com/kb/wget-command-with-examples as an example of installation instructions.
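If you would rather have the Terminal tell you outright, here is a minimal sketch of a check. The helper name is_installed is my own invention for illustration, not something that comes with wget.

```shell
# Hypothetical helper: succeeds if the named program can be found on your PATH.
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether wget is available on this machine.
if is_installed wget; then
    echo "wget is installed"
else
    echo "wget is not installed yet"
fi
```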

Once you have wget installed, return to your Terminal/command line program. Terminal will automatically assume you want to work in your user account folder (which is referenced as ~), and wget saves whatever it downloads into the folder you are working in. If you want to switch to another folder, you can use a command like “cd ~/Desktop” to go to the Desktop folder associated with your user account on the machine.

Whatever folder you decide to work in, the next command will go something like

“wget --mirror --page-requisites --html-extension --convert-links --no-parent --wait=2 --random-wait --no-check-certificate https://yourURLhere.com”

At this point you may be thinking “ahhhh!” but we can break it down:

  • “wget” = use wget
  • “--” = the text that follows is an option that customizes what wget does (NOTE: type the two hyphens together, with no space between them or before the option name!)
  • “mirror” = copy or mirror a whole website
  • “page-requisites” = get all the bits that are required to display the pages
  • “html-extension” = add .html to the end of the filenames as needed (newer versions of wget call this option “adjust-extension”, but the old name still works)
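  • “convert-links” = rewrite the links in the downloaded pages so they point to your local copies and still work
  • “no-parent” = never climb above the folder you started in; if your site lives in a subfolder of a larger site (e.g. yourURLhere.com/course/), don’t get everything from the whole domain, just the files in that subfolder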
  • “wait=2” = wait 2 seconds before grabbing the next page, just to be polite and not crash the server
  • “random-wait” = well, okay, vary the wait a bit (somewhere between 1 and 3 seconds here) so you aren’t pinging the server exactly every 2 seconds like a bot
  • “no-check-certificate” = continue the download even if the site’s security certificate fails verification
  • “https://yourURLhere.com” = the thing I’m trying to save

There are some shortcuts you can learn (e.g. “-p” is the same as “--page-requisites”), which I’ve avoided to make the command look clearer. There are also numerous other options you could learn: type “man wget” to see them all.
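For the curious, here is roughly what the same command looks like with the short option names. This is only a sketch: flatten_site is a made-up helper name, and wrapping the command in a function is my own convenience so that nothing gets downloaded until you call it. As far as I know, “--random-wait” and “--no-check-certificate” have no short forms, so they stay as they are.

```shell
# Hypothetical helper: mirror one site using wget's short option names.
flatten_site() {
    # -m   = --mirror            -p  = --page-requisites
    # -E   = --adjust-extension  (the newer name for --html-extension)
    # -k   = --convert-links     -np = --no-parent
    # -w 2 = --wait=2
    wget -m -p -E -k -np -w 2 --random-wait --no-check-certificate "$1"
}
```

You would then run it as “flatten_site https://yourURLhere.com” from the folder where you want the copy saved.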

After you run this command, lots of text will scroll across your Terminal, giving you progress updates, and it will eventually end with a line that reads something like “Converted links in 22 files in 0.04 seconds.” Then it is done.

Navigate to the folder on your computer where you told it to save the website. There should now be a subfolder with the website name. Double-click to open and find the index.html file. Double-click on that and it should bring up your copy of the website in your browser.

If everything looks right and the links all work, great! Now all you need to do is delete your WordPress site and upload the files to the server directory where your WordPress site used to live. NOTE: you will not want to upload the folder that has the website name, just all the files and subfolders that were in it. You’ll probably have to zip all the files together to make it easier to upload them.
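Zipping the contents rather than the folder itself is easy to get wrong, so here is a minimal sketch of one way to do it from the Terminal. It assumes the zip program is available on your machine; zip_site and the example paths are hypothetical names of mine.

```shell
# Hypothetical helper: zip what is INSIDE the site folder, not the folder itself.
# $1 = the folder wget created, $2 = full (absolute) path of the zip file to write.
zip_site() {
    site_dir=$1
    archive=$2
    # Work from inside the folder so paths in the archive start at index.html,
    # wp-content/, and so on, rather than at yourURLhere.com/.
    ( cd "$site_dir" && zip -r -q "$archive" . )
}
```

For example, “zip_site ~/Desktop/yourURLhere.com ~/Desktop/site.zip” would leave a site.zip on the Desktop, ready to upload and unzip on the server.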

And that’s it!

Unless.

I know, I know, I said that was it. But when you’re clicking around in the local copy of your website, you may notice that your URLs don’t match the ones you originally assigned them. E.g. your about page is no longer at “www.yourURLhere.com/about” but rather at “www.yourURLhere.com/about/index.html”, and there is also some weird URL like “www.yourURLhere.com/index.html?p=14.html”.

If this bothers you, the way to fix it is manually editing the files on your desktop before uploading them. It’s a bit of a pain, but can tidy up the HTML so that your site works the way you wanted (and expected) it to.

Start by renaming each oddly named file (e.g. index.html?p=14.html) to what it ought to have been (e.g. about.html). Keep a list of what each file name was originally and what you updated it to.
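To get a starting point for that list, something like the following sketch can find the files for you. It assumes the oddly named files are the ones with a “?” in the name, as in my example; list_query_files is a hypothetical helper name.

```shell
# Hypothetical helper: list every file under a folder whose name contains "?".
list_query_files() {
    find "$1" -type f -name '*[?]*' | sort
}
```

Running “list_query_files ~/Desktop/yourURLhere.com > rename-list.txt” would save the list to a text file you can annotate with the new names.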

Then you will need to open each file with a text editor such as BBEdit and use its find & replace function to find all instances of the original filename (index.html?p=14.html) and replace them with the new filename (about.html). This step will take a while. If you have a main menu, you may want to update the menu in one file (e.g. index.html) and then copy and paste the whole menu into the other files.
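If you are comfortable in the Terminal, the rename and the find & replace can be done in one go. What follows is a sketch under a few assumptions of mine, not a polished tool: rename_page is a made-up helper, it only edits files ending in .html, and it also looks for the old name with the “?” written as “%3F” in case the links inside your pages are encoded that way. Try it on a copy of the folder first.

```shell
# Hypothetical helper: rename one page and fix every link that points to it.
# $1 = site folder, $2 = old file name, $3 = new file name
rename_page() {
    dir=$1 old=$2 new=$3
    mv "$dir/$old" "$dir/$new" || return 1
    # Escape characters that sed would otherwise treat as special.
    old_re=$(printf '%s' "$old" | sed 's/[][\.*^$|]/\\&/g')
    new_re=$(printf '%s' "$new" | sed 's/[\&|]/\\&/g')
    # Assumption: links may spell "?" as "%3F", so match that form too.
    enc_re=$(printf '%s' "$old_re" | sed 's/?/%3F/g')
    # Edit every .html file in place; sed leaves a .bak backup of each one.
    find "$dir" -type f -name '*.html' \
        -exec sed -i.bak -e "s|$old_re|$new_re|g" -e "s|$enc_re|$new_re|g" {} +
    # Remove the backups once the edits are done.
    find "$dir" -type f -name '*.html.bak' -delete
}
```

For my example that would be “rename_page ~/Desktop/yourURLhere.com 'index.html?p=14.html' about.html” (the single quotes keep the Terminal from misreading the “?”).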

Lastly, you will want to delete any extraneous files (e.g. an “about” folder with an “index” file in it) that have been superseded by fixing the randomly named index files. But don’t delete the wp-content, wp-includes, or wp-json folders, as those have content & stylesheets that you need for your site to work.

Once the HTML cleanup is complete, you can now proceed with deleting your WordPress site and uploading the files to the server.