Backup a Website on Shared Hosting

By Jimmy Bonney | July 14, 2012

Backup Strategy

And here we are: another article about backups. After setting up a Gmail backup on a Synology drive, here is another one to continue building our backup strategy.

In this article, we will back up the files and database(s) of a web server on a shared hosting plan (in this specific case, hosted on Hostgator). There are usually two possible approaches to backing up files:

  1. Archive the file (zip, tar, gzip, …) and download the archive,
  2. Mirror the folder containing the files

Considering that there are some files on my web server that I am not interested in backing up, I will go for the first solution: that way I can whitelist what I am interested in and create an archive containing only the files that I want to keep. It should be possible to achieve the same with the mirror strategy, but I didn't have much time to investigate the matter.

Once the archive is created, we will sync the folder containing the archive to a local drive. This procedure was originally inspired by an article from Lifehacker. The main differences are as follows:

  • For each archive that is created, we verify whether it is identical to the previous one. This gives us two benefits: 1. we do not store the exact same content multiple times, and 2. we do not transfer the same content multiple times (a minimal sketch of this check follows the list).
  • Since the same archive can remain in place for days, we do not delete old archives based on their timestamps but based on the number of existing archives.
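The check itself boils down to comparing the MD5 sum of the newly created archive with the sum recorded during the previous run. A minimal sketch, with hypothetical file names (the full script below derives them from variables):

NEWSUM=`md5sum new_backup.tar | awk '{print $1}'`
OLDSUM=`tail -1 backup_logs.txt | awk '{print $2}'`
if [ "$NEWSUM" = "$OLDSUM" ]
then
    # Identical content: delete the new archive and keep the previous one
    rm new_backup.tar
fi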

Backup the DB and the Files

The script below takes care of saving a couple of databases as well as some folders and files. Separate archives are created since multiple (sub)sites are hosted on the same account. In order to deploy it, a few variables should be updated based on your server setup. Whenever possible, the values to modify are indicated in [].

As mentioned in the introduction, an MD5 sum is calculated for each archive that is created and compared against the one from the previous backup. In the script below, we create a log file for each archive that contains:

  • the date at which the backup was run,
  • the checksum,
  • the path to the file,
  • a short note indicating whether a new backup was created or the previous one was kept.

If the checksum of the new archive is identical to the previous one, the new archive is deleted and only the old file is kept.
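As an example, with hypothetical account and database names, a log file could look as follows after two consecutive runs (the date format is %d%m%y%H%M, so 1407120000 means July 14, 2012 at 00:00):

1407120000 1b2cf535f27731c974343645a3985328 /home/myaccount/backups/files/dbbackup_mydb_1407120000.bak.gz => new backup file
1507120000 1b2cf535f27731c974343645a3985328 /home/myaccount/backups/files/dbbackup_mydb_1407120000.bak.gz => keep previous file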

#!/bin/sh

# Redirect the stdout and stderr to the log file
exec 1> /home/[YOURACCOUNT]/backups/files/exec_logs.txt 2>&1

# Global variables for the complete script
THEACCOUNT="[YOURACCOUNT]"
THEDBUSER="[YOURDBUSER]"
THEDBPW="[YOURDBPASSWORD]"
THEDATE=`date +%d%m%y%H%M`

# md5 function to verify if archive has changed since last backup
# the function expects the following arguments
# $1: a brief description of the file being checked
# $2: the path to the MD5 file -> currently not used
# $3: the path to the archive to verify against the checksum
# $4: the path to the log file
md5check()
{
    DESCRIPTION=$1
    MD5FILE=$2
    ARCHIVE=$3
    LOG=$4
    # Verify if previous backups were made
    if [ -f "$LOG" ];
    then
        echo "A previous archive exists for $DESCRIPTION"
        # Grab the date at which the previous script was run
        # Get the last line of the log file and take the first string (it contains the date)
        PREVIOUSDATE=`tail -1 $LOG | awk '{print $1}'`
        echo "Previous backup ran at: $PREVIOUSDATE"
        # Grab the previous md5sum
        SUM=`tail -1 $LOG | awk '{print $2}'`
        # echo "Sum: $SUM"
        # Grab the previous filename
        FILE=`tail -1 $LOG | awk '{print $3}'`
        # echo "File: $FILE"
        # Set the sum and filename in a tmp file
        # echo "${SUM}  ${FILE}" > previous_md5.txt
        # echo "Previous checksum: "; cat previous_md5.txt
        # Verify if the previous checksum applies to the current archive
        # The output of md5sum is as follows 'sum  file': notice the two (2) spaces between the sum and the filename
        echo "${SUM}  ${ARCHIVE}" > tmp_md5.txt
        # echo "Verify checksum on: "; cat tmp_md5.txt
        CHECKSUMRESULT=`md5sum -c tmp_md5.txt | awk '{print $2}'`
        # Remove the temporary checksum file
        rm tmp_md5.txt
        if [ "$CHECKSUMRESULT" = "OK" ]
        then
            echo "Checksums are the same: archive has not changed since last run"
            echo "${THEDATE} ${SUM} ${FILE} => keep previous file" >> "$LOG"
            echo "Delete newly created archive ($ARCHIVE)"
            echo "Keep previous archive ($FILE)"
            rm "$ARCHIVE"
        else
            echo "Checksums are not the same: archive has changed since last run"
            NEWSUM=`md5sum "$ARCHIVE"`
            # echo "Checksum: $NEWSUM"
            SUM=`echo ${NEWSUM} | awk '{print $1}'`
            # echo "Sum: $SUM"
            FILE=`echo ${NEWSUM} | awk '{print $2}'`
            # echo "File: $FILE"
            echo "${THEDATE} ${SUM} ${FILE} => new backup file" >> "$LOG"
            echo "Keep new archive ($ARCHIVE)"
        fi
    else
        echo "Create new log file for $DESCRIPTION"
        NEWSUM=`md5sum "$ARCHIVE"`
        # echo "Checksum: $NEWSUM"
        SUM=`echo ${NEWSUM} | awk '{print $1}'`
        # echo "Sum: $SUM"
        FILE=`echo ${NEWSUM} | awk '{print $2}'`
        # echo "File: $FILE"
        echo "${THEDATE} ${SUM} ${FILE} => new backup file" > "$LOG"
    fi
}

# Clean old backup files
# $1: file pattern
clean_old_backups()
{
    PATTERN=$1
    # Remember current directory
    CURRENTDIR=`pwd`
    cd /home/${THEACCOUNT}/backups/files
    # Remove old backups
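    # List the files matching the pattern twice: once restricted to the five
    # most recent, once in full. Entries appearing only once (uniq -u) are
    # older than the five most recent and are removed; xargs -r skips the
    # rm call entirely when there is nothing to delete.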
    (ls -t|grep $PATTERN|head -n 5;ls|grep $PATTERN)|sort|uniq -u|xargs -r rm
    # Get back to the directory
    cd $CURRENTDIR
}

# Backup of main website
# Hosted on Hostgator under the account [YOURACCOUNT]
THEDB="[NAMEOFTHEDATABASE]"
THEOTHERDB="[NAMEOFANOTHERDATABASE]"
THEDBUSER="[THEDATABASEUSER]"
THEDBPW="[THEPASSWORDFORTHEDBUSER]"

echo ''
echo '###################################################################'
echo '                      Backup hostgator server                      '
echo "                      Execution date: ${THEDATE}                   "
echo '###################################################################'
echo ''
# Dump the DB
echo ''
echo '-------------------------------------------------------------------'
echo "Dump the DB"
THEDBZIP="/home/${THEACCOUNT}/backups/files/dbbackup_${THEDB}_${THEDATE}.bak.gz"
mysqldump -u $THEDBUSER -p${THEDBPW} $THEDB | gzip > $THEDBZIP
# Verify the MD5 sum
md5check "The DB" unused.txt $THEDBZIP "/home/${THEACCOUNT}/backups/files/dbbackup_${THEDB}_logs.txt"
clean_old_backups "dbbackup_${THEDB}"
echo '-------------------------------------------------------------------'
echo ''

# Dump the other DB
echo ''
echo '-------------------------------------------------------------------'
echo "Dump the other DB"
THEOTHERZIP="/home/${THEACCOUNT}/backups/files/dbbackup_${THEOTHERDB}_${THEDATE}.bak.gz"
mysqldump -u $THEDBUSER -p${THEDBPW} $THEOTHERDB | gzip > $THEOTHERZIP
# Verify the MD5 sum
md5check "The other DB" unused.txt $THEOTHERZIP "/home/${THEACCOUNT}/backups/files/dbbackup_${THEOTHERDB}_logs.txt"
clean_old_backups "dbbackup_${THEOTHERDB}"
echo '-------------------------------------------------------------------'
echo ''

# Save the necessary files for the main website
echo ''
echo '-------------------------------------------------------------------'
THESITE="[SITENAME]"
THETARFILE="/home/${THEACCOUNT}/backups/files/sitebackup_${THESITE}_${THEDATE}.tar"
THEWEBFOLDER="/home/${THEACCOUNT}/www"
echo "Save files related to the main website"
echo "Create the archive"
echo "Add a first folder to the archive"
tar --create --exclude=.svn --file=$THETARFILE -C $THEWEBFOLDER folder1
echo "Add a second folder to the archive"
tar --append --exclude=.svn --file=$THETARFILE -C $THEWEBFOLDER folder2
echo "Add all .php file to the archive"
find $THEWEBFOLDER/* -maxdepth 0 -name '*.php' -exec basename "{}" \; | tar --append --exclude=.svn --file=$THETARFILE -C $THEWEBFOLDER -T -
echo "Add all .txt file to the archive"
find $THEWEBFOLDER/* -maxdepth 0 -name '*.txt' -exec basename "{}" \; | tar --append --exclude=.svn --file=$THETARFILE -C $THEWEBFOLDER -T -
echo "Add img folder to the archive"
tar --append --exclude=.svn --file=$THETARFILE -C $THEWEBFOLDER img
echo "Add js folder to the archive"
tar --append --exclude=.svn --file=$THETARFILE -C $THEWEBFOLDER js
# Verify the MD5 sum
md5check "d-sight.com" unused.txt $THETARFILE "/home/${THEACCOUNT}/backups/files/sitebackup_${THESITE}_logs.txt"
clean_old_backups "sitebackup_${THESITE}"
# Compressing the archive results in different md5sums even if the content of the archive hasn't changed
# echo "Compress the archive"
# gzip $THETARFILE
echo '-------------------------------------------------------------------'
echo ''

# Save the necessary files for a sub website
echo ''
echo '-------------------------------------------------------------------'
THESITE="another site"
THETARFILE="/home/${THEACCOUNT}/backups/files/sitebackup_${THESITE}_${THEDATE}.tar"
THEWEBFOLDER="/home/${THEACCOUNT}/www"
echo "Save files related to the other website"
echo "Create the archive"
tar --create --exclude=.svn --file=$THETARFILE -C $THEWEBFOLDER some-folder
# Verify the MD5 sum
md5check "the other site" unused.txt $THETARFILE "/home/${THEACCOUNT}/backups/files/sitebackup_${THESITE}_logs.txt"
clean_old_backups "sitebackup_${THESITE}"
# Compressing the archive results in different md5sums even if the content of the archive hasn't changed
# echo "Compress the archive"
# gzip $THETARFILE
echo '-------------------------------------------------------------------'
echo ''

The code is also available on Bitbucket.

Automate the Backup

Set up a cron task to run the script whenever you want. To run it every day at midnight, enter:

0    0    *    *    *    /home/[YOURACCOUNT]/backups/script/backup.sh
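On Hostgator, the entry can be added through the cPanel cron job interface, or from a shell by editing the crontab directly:

crontab -e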

Synchronize the Backup

Once the backup archives are available, they can be synced to a local drive. This is pretty straightforward and only requires rsync to be installed.

rsync -vaz -e ssh [user]@[domain].[com]:/home/[YOURACCOUNT]/backups/files/ /local/path
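Here, -v lists the files as they are transferred, -a enables archive mode (preserving permissions and timestamps), -z compresses data during the transfer, and -e ssh runs the transfer over SSH. The trailing slash on the remote path tells rsync to copy the contents of the folder rather than the folder itself.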

If the synchronization is run from another server or from a network drive, it might be necessary to set up key-based SSH authentication so that the password does not need to be entered every time.
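With OpenSSH, this is typically done by generating a key pair without a passphrase and installing the public key on the server. As a sketch, assuming the same user and host as in the rsync command above:

# Generate a key pair locally (press Enter at the passphrase prompt)
ssh-keygen -t rsa
# Install the public key on the server
ssh-copy-id [user]@[domain].[com]
# rsync should now connect without prompting for a password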


Image credit: Data Transferring

