Backing up to Amazon S3 using the s3cmd tool

Amazon S3 has been called the "hard disk of the internet". Short for Simple Storage Service, it provides a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web.

S3 has been attractive to developers because of its very affordable pricing model, scalability, high availability and reliability. It is a great service for serving static content for your website, such as videos, music and images, or for backing up your data offsite.

I mainly use it for offsite data backup.

Amazon S3 has a web interface, akin to the likes of Google Drive, Dropbox and OneDrive, for creating, editing and deleting your data. The web interface, accessible via https://console.aws.amazon.com, can also be used to create users and set permissions that determine how they access S3 buckets or objects.

However, you can use command-line tools such as s3cmd to perform the same tasks you would with the web interface.

S3 Buckets and Objects

S3 buckets are like folders, while objects are like files. Some of the bucket rules include:

  • Each user can have at most 100 buckets.
  • Bucket names must be unique among all users of S3.
  • Buckets cannot be nested into a deeper hierarchy.
  • A bucket name can only consist of basic alphanumeric characters plus dot (.) and dash (-). No spaces, no accented or UTF-8 letters, etc.

S3 objects have almost no restrictions.
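The naming rule can be double-checked locally before you try to create a bucket. A rough sketch — this regex is only an approximation of the rule above, not AWS's full validation logic:

```shell
# Rough local check of a bucket name: lowercase alphanumerics, dot and dash only.
valid_name() {
  echo "$1" | grep -Eq '^[a-z0-9.-]+$' && echo "valid" || echo "invalid"
}

valid_name "backup.davidokwii.com"   # -> valid
valid_name "My Backup Bucket"        # -> invalid (spaces and uppercase)
```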

Getting Amazon S3

Amazon S3 is a paid-for service. You can sign up at https://console.aws.amazon.com. First-time users get 5GB of storage, 20,000 GET requests, and 2,000 PUT requests free, after which you pay.

Amazon products are known for their notoriously intimidating pricing structures. With S3, you are generally charged for:

  • Storage space: $0.026 per GB.
  • Data download: free for the first 1GB/month downloaded, then $0.090 per GB up to 10TB/month.
  • Requests: $0.005 per 1,000 PUT, COPY or LIST requests; $0.004 per 10,000 GET and all other requests.

The important thing to note here is that you'll be charged for bandwidth every time you pull data from S3, but uploading is free. If you run a website where a lot of users download your S3 objects, be prepared for a hefty bill at the end of the month.
You can check out the whole pricing structure here.

Installing s3cmd

You can install the s3cmd tool straight from your distro's repository, which is really convenient.
Ubuntu/Debian
sudo apt-get install s3cmd

CentOS/Red Hat
yum install s3cmd

You can also clone it from the GitHub repo at https://github.com/s3tools/s3cmd.

IAM User policies

Before you go any further, you need to understand AWS Identity and Access Management (IAM). IAM allows you to securely control access to AWS services and resources for your users.

If S3 is a file system, then IAM provides its users and groups. Just as you can configure different users to access different resources and processes on a conventional filesystem, you do the same with IAM. You can read more about IAM here.

Once signed in, you have to create a user and add the necessary permissions that allow that user to create, list and delete S3 buckets/objects.

An IAM policy that allows a user access to one of your buckets looks like the one below:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "s3:ListAllMyBuckets"
         ],
         "Resource":"arn:aws:s3:::*"
      },
      {
         "Effect":"Allow",
         "Action":[
            "s3:ListBucket",
            "s3:GetBucketLocation"
         ],
         "Resource":"arn:aws:s3:::examplebucket"
      },
      {
         "Effect":"Allow",
         "Action":[
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject"
         ],
         "Resource":"arn:aws:s3:::examplebucket/*"
      }
   ]
}
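Before pasting a policy like this into the IAM console, it can save a round trip to check that the JSON at least parses. A minimal sketch — the file name is arbitrary, and json.tool only validates syntax, not IAM semantics:

```shell
# Save a (shortened) policy to a file, then verify it parses as JSON.
cat > examplebucket-policy.json <<'EOF'
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":["s3:ListAllMyBuckets"],
         "Resource":"arn:aws:s3:::*"
      }
   ]
}
EOF

python3 -m json.tool examplebucket-policy.json > /dev/null \
  && echo "policy JSON is valid"
```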

You can see more example policies here.

Configuring the s3cmd environment

Once you have installed the s3cmd tool and set the right IAM access policies, you can create, delete and list S3 buckets and objects right from the command line.

But before you do that, you have to set up the configuration that s3cmd needs to access your S3 account. Here we specify the user's access key and secret key, both of which are generated when you create an account on the Amazon console. You can also create an access key and secret key for an IAM user at any time by clicking on IAM under the Services menu.

To get started, run s3cmd --configure. You'll be prompted to enter your Access Key and Secret Key, as you can see from the dialogue below.

sudo s3cmd --configure  
Enter new values or accept defaults in brackets with Enter.  
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3  
Access Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Secret Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Encryption password is used to protect your files from reading  
by unauthorized persons while in transfer to S3  
Encryption password: xxxx  
Path to GPG program [/usr/bin/gpg]: 

When using secure HTTPS protocol all communication with Amazon S3  
servers is protected from 3rd party eavesdropping. This method is  
slower than plain HTTP and can't be used if you're behind a proxy  
Use HTTPS protocol [No]: 

On some networks all internet access must go through a HTTP proxy.  
Try setting it here if you can't conect to S3 directly  
HTTP Proxy server name: 

New settings:  
  Access Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Secret Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Encryption password: xxxx
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name: 
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y  
Please wait, attempting to list all buckets...  
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...  
Success. Encryption and decryption worked fine :-)

Save settings? [y/N] y  
Configuration saved to '/root/.s3cfg'  
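Note that the saved config holds your secret key in plain text, so it's worth locking down its permissions. A small sketch, assuming the default ~/.s3cfg path (the placeholder touch is only so the demo runs before you've configured anything):

```shell
CFG="$HOME/.s3cfg"

# Demo only: create a placeholder if the real file (written by
# `s3cmd --configure`) doesn't exist yet.
[ -f "$CFG" ] || touch "$CFG"

chmod 600 "$CFG"       # owner read/write only
stat -c '%a' "$CFG"    # -> 600
```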

Using the s3cmd tool

With everything set up, you can now create and delete S3 buckets and manipulate objects, or rather files.

Here are some of the s3cmd commands:

#list S3 buckets
~ s3cmd ls

#create a new S3 bucket
~ s3cmd mb s3://backup.davidokwii.com

#upload file
~ s3cmd put backup.davidokwii.com.tar.gz s3://backup.davidokwii.com/

#upload directory 
~ s3cmd -r put backup.davidokwii.com s3://backup.davidokwii.com/

#rsync-like copy
~ s3cmd sync --skip-existing $BACKUPDIR/weekly/ s3://MYBACKUP/backup/mysql/
#There is also a --delete-removed option which will remove files not existing locally.

#download/get file 
~ s3cmd get s3://backup.davidokwii.com/backup.davidokwii.com.tar.gz

#delete file/object
~ s3cmd del s3://backup.davidokwii.com/backup.davidokwii.com.tar.gz

#delete folder
~ s3cmd del s3://backup.davidokwii.com/backup.davidokwii.com

#delete bucket 
~ s3cmd rb s3://backup.davidokwii.com
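The put and sync commands above are easy to wire into a small backup script. A sketch with example paths and the example bucket name — the actual upload is commented out because it needs configured credentials:

```shell
#!/bin/sh
# Tar up a directory, then (optionally) push the archive to S3.
BACKUPDIR=/tmp/demo-backup
mkdir -p "$BACKUPDIR"
echo "some data" > "$BACKUPDIR/data.txt"   # stand-in for real files

ARCHIVE="$BACKUPDIR-$(date +%F).tar.gz"
tar -czf "$ARCHIVE" -C "$(dirname "$BACKUPDIR")" "$(basename "$BACKUPDIR")"
ls -lh "$ARCHIVE"

# Upload step -- uncomment once s3cmd is configured:
# s3cmd put "$ARCHIVE" s3://backup.davidokwii.com/
```

Dropped into cron, a script like this gives you the nightly offsite backup mentioned at the top.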

A few gotchas

s3cmd put my-dir s3://backup.davidokwii.com/backup is different from s3cmd put my-dir s3://backup.davidokwii.com/backup/. Note the trailing slash in the second instance.

The first command will result in the my-dir folder being uploaded to s3://backup.davidokwii.com/my-dir, while the second behaves in the way you would expect: the folder will be uploaded to s3://backup.davidokwii.com/backup/my-dir. That's really important to note.

Another thing is that s3cmd -r put my-dir/ s3://backup.davidokwii.com/backup will upload the contents of my-dir instead of the folder my-dir itself. To achieve the latter, omit the trailing slash: s3cmd -r put my-dir s3://backup.davidokwii.com/backup.
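This trailing-slash handling mirrors rsync's contents-vs-directory convention. You can see the same distinction locally without touching S3 — note that plain cp spells "contents of" as /. rather than a bare trailing slash:

```shell
# Local demo of the directory-vs-contents distinction.
SRC=$(mktemp -d); DST1=$(mktemp -d); DST2=$(mktemp -d)
mkdir "$SRC/my-dir"
echo hi > "$SRC/my-dir/file.txt"

cp -r "$SRC/my-dir"   "$DST1/"   # copies the directory itself
cp -r "$SRC/my-dir/." "$DST2/"   # copies only its contents

ls "$DST1"   # -> my-dir
ls "$DST2"   # -> file.txt
```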

Image: eltima.com

David Okwii

David Okwii is a Systems Engineer who currently works with Uganda's country code top-level domain registry.

Kampala Uganda http://www.davidokwii.com
