How to Set Up a Cron Job for Automated Backups on DigitalOcean - Django

Oct. 16, 2024, 9:51 p.m.

In any web project, maintaining regular backups is crucial to safeguard your data. Automating these backups can save you time and effort, ensuring that your information is always safe. If you’re running a Django project on a DigitalOcean droplet, you can easily set up cron jobs to automate tasks like backups. This guide will walk you through how to configure a cron job to back up your database and media files every night at 2 AM.

What Is a Cron Job?

A cron job is a Linux utility that schedules commands or scripts to run automatically at specific times and dates. This is particularly useful for tasks like backups, maintenance, or regular updates.
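
A crontab entry consists of five scheduling fields, followed by the command to run:

minute hour day_of_month month day_of_week command

For example, the hypothetical entry 30 4 * * 1 /usr/bin/uptime would run the uptime command every Monday at 4:30 AM. An asterisk (*) means "every value" for that field.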

1. Write a Django Management Command for Backups

Before setting up the cron job, make sure you have a management command in Django that performs the backup. If you haven’t written one yet, here’s a simple version of a backup command:

Place the code in a file named create_backup.py inside a management/commands directory of one of your apps. Both the management and commands directories need an __init__.py file so Django can discover the command:

project/
├── my_app/
    ├── management/
        ├── __init__.py
        ├── commands/
            ├── __init__.py
            ├── create_backup.py  # This is where the provided code is stored


from django.core.management.base import BaseCommand
import os
import shutil
import datetime
from django.core.management import call_command
from decouple import config
from django.conf import settings

class Command(BaseCommand):
    help = 'Backup data from all apps'

    def handle(self, *args, **kwargs):
        # Get the current date in YYYY-MM-DD format to create a unique folder for today's backup
        today_date = datetime.date.today().strftime("%Y-%m-%d")

        # Determine the environment ('production' or 'development') and set the backup directory accordingly
        environment = config("ENVIRONMENT", default="development")
        
        # Set the backup directory based on the environment (use different paths for production and development)
        if environment == "production":
            backup_base_dir = f"/home/username/backup/my_project/"  # Path for production backups
        else:
            backup_base_dir = f"backup/"  # Path for development backups (relative to project folder)
        
        # Create the backup base directory if it doesn't already exist
        os.makedirs(backup_base_dir, exist_ok=True)

        # Manage old backups by deleting ones older than the retention period (15 days)
        self.manage_old_backups(backup_base_dir)

        # Create the specific backup directory for today's date
        backup_dir = f"{backup_base_dir}{today_date}/"

        # If today's backup directory already exists, delete it to ensure fresh data backup
        if os.path.exists(backup_dir):
            shutil.rmtree(backup_dir)

        # Create a new backup directory for today's date
        os.makedirs(backup_dir)

        # Backup the 'users.User' model (this is a dummy model name for demonstration)
        # The data will be saved as a JSON file named 'user_backup.json' in the backup directory
        user_backup_file = os.path.join(backup_dir, 'user_backup.json')
        with open(user_backup_file, 'w') as f:
            # Dump data from the 'users.User' model into the JSON file
            call_command('dumpdata', 'users.User', stdout=f)

        # Additional model backups can be added here, similar to the above code
        # For each model, the data can be dumped to separate files within the same backup directory

        # Backup the media files (files uploaded by users), stored in the MEDIA_ROOT directory
        media_dir = settings.MEDIA_ROOT
        if os.path.exists(media_dir):
            # Copy the entire media directory to the backup location
            media_backup_dir = os.path.join(backup_dir, 'media')
            shutil.copytree(media_dir, media_backup_dir)

    def manage_old_backups(self, backup_base_dir):
        # Set the retention period for backups (15 days)
        retention_days = 15
        
        # Calculate the cutoff date: backups older than this date will be deleted
        cutoff_date = datetime.date.today() - datetime.timedelta(days=retention_days)
        
        # Loop through each folder in the backup base directory
        for backup_folder in os.listdir(backup_base_dir):
            backup_folder_path = os.path.join(backup_base_dir, backup_folder)

            # Check if the folder is indeed a directory (and not a file)
            if os.path.isdir(backup_folder_path):
                # Extract the date from the folder name (expected in YYYY-MM-DD format);
                # skip anything that doesn't match so a stray folder can't crash the backup
                try:
                    folder_date = datetime.datetime.strptime(backup_folder, "%Y-%m-%d").date()
                except ValueError:
                    continue

                # If the folder date is older than the cutoff date, delete the backup folder
                if folder_date < cutoff_date:
                    shutil.rmtree(backup_folder_path)

In the above script:

  • users.User is a placeholder model name; add a similar dumpdata call for each model you actually want to back up.
  • The backup directories and paths are generic examples; adjust them to your own server and project layout.
  • The command also copies your media files and deletes backups older than 15 days.
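
Before scheduling anything, it is worth running the command once by hand from your project directory (with the virtual environment activated) to confirm that it completes and that the dated backup folder appears where you expect:

python manage.py create_backup

If you ever need to restore a dump, Django's built-in loaddata command can read the JSON file back in. For example, assuming the production path from the script and a backup taken on 2024-10-16:

python manage.py loaddata /home/username/backup/my_project/2024-10-16/user_backup.json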

2. Log In to Your DigitalOcean Droplet

You’ll need SSH access to your DigitalOcean droplet. To log in, use:

ssh root@your_droplet_ip

Replace your_droplet_ip with the IP address of your droplet.

3. Edit the Crontab

Now that you’re logged in, you need to set up the cron job. To do this, edit the crontab file:

crontab -e

If this is your first time using crontab, you’ll be asked to choose an editor. Selecting nano is a good option for simplicity.
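
Keep in mind that each user has a separate crontab. If your Django project runs under a non-root user, edit that user's crontab instead, either by logging in as that user or, as root, with the following (replace your_username with the account that owns the project):

crontab -u your_username -e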

4. Add the Cron Job

To run your backup script every night at 2 AM, add the following line to the crontab:

0 2 * * * /path/to/your/virtualenv/bin/python /path/to/your/project/manage.py create_backup >> /path/to/your/project/backup.log 2>&1
  • 0 2 * * *: The command runs at 2:00 AM every day (in the server's time zone).
  • /path/to/your/virtualenv/bin/python: The path to the Python executable inside your project's virtual environment.
  • /path/to/your/project/manage.py create_backup: The full path to your Django project’s manage.py file with the create_backup command.
  • >> /path/to/your/project/backup.log 2>&1: This logs the output (including any errors) to a file called backup.log.

5. Save the Cron Job

If you’re using nano, press CTRL + X, then press Y to save, and hit ENTER.

6. Verify the Cron Job

To confirm that your cron job has been added successfully, run:

crontab -l

You should see the cron job you just added.

7. Confirm the Backups Are Running

The next day, open the backup.log file to confirm that the backup script ran correctly:

cat /path/to/your/project/backup.log

This will display any output or errors that occurred when the cron job ran.
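
You can also list the backup directory itself (the production path from the example script) to confirm that a new dated folder was created and that folders older than 15 days are being removed:

ls -l /home/username/backup/my_project/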

Why You Should Automate Backups

Regular backups protect your project from data loss due to server failures, human error, or hacking incidents. Automating these backups with a cron job ensures they happen without manual intervention, so you can focus on other aspects of your project.

Conclusion

Setting up automated backups on DigitalOcean with cron jobs is simple but crucial for protecting your Django project. By following this guide, you can ensure that your data and media files are regularly backed up and older backups are automatically deleted. This provides a solid safety net for your web application, giving you peace of mind knowing that your project is always secure.

For more tips and guides on web development, stay tuned to our blog.
