One of the things I try to do when building applications is keep servers stateless. This makes those servers easy to throw away — a piece of infrastructure failing (which it always will) is not a big deal with stateless servers. Just spin up a new one.
When the requirement came down the pipeline to add a user upload system to an application, I was less than thrilled. User uploads are the opposite of stateless.
The usual way to handle this is to support file uploads in the backend and server configuration: when the user sends a file, ingest it, then pass it on to some cloud storage backend like S3. This keeps the server stateless, but the server's configuration still ends up impacted by the upload system.
The good news, however, is that with S3, pre-signed URLs can be generated on the backend that client-side code can use to upload to S3 directly. The backend footprint of a file upload system is reduced to a single endpoint that generates pre-signed URLs for AWS.
S3 handles all the heavy lifting. And it can be heavy. The largest file S3 accepts in a single `PutObject` request is 5GB, though AWS recommends that files greater than 100MB use multipart uploads.
Three big parts to this:
- The infrastructure setup: how the S3 bucket should be created and configured.
- Backend code: how to generate pre-signed URLs.
- Client-side code: how to fetch and use those pre-signed URLs to upload files.
The backend examples here are JavaScript, but any language can be used. All AWS SDKs support pre-signed URL generation.
Where possible I’m going to link to AWS documentation rather than screencap or write my own version of what AWS has. AWS is more likely to keep its documentation up to date than I am this post.
Setting Up S3
The first step here is to create an S3 bucket. This bucket will need some special Cross-Origin Resource Sharing (CORS) configuration that will impact the entire bucket. I’d recommend creating a new, upload-specific bucket so those changes don’t impact other files.
Once the bucket is created, it’s time to modify the CORS permissions on that bucket. Without this, client-side code on the application page(s) won’t be able to make upload requests to S3.
This can be done in the AWS Console, with the AWS CLI, or with any SDK.
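For example, with the JavaScript SDK the same rules can be applied via `putBucketCors`. This is just a sketch: the bucket name and origin are placeholders, and `buildCorsParams` is a hypothetical helper split out for clarity.

```javascript
// Sketch: apply the CORS rules from code with the aws-sdk's putBucketCors.
// buildCorsParams is a hypothetical helper; bucket/origin are placeholders.
function buildCorsParams(bucket, origin) {
  return {
    Bucket: bucket,
    CORSConfiguration: {
      CORSRules: [{
        AllowedOrigins: [origin],  // where the app runs
        AllowedMethods: ['PUT'],   // uploads are PUT requests
        AllowedHeaders: ['*'],     // very permissive, see the note below
        ExposeHeaders: ['ETag', 'x-amz-request-id', 'x-amz-id-2'],
        MaxAgeSeconds: 3000,
      }],
    },
  };
}

// then, with an aws-sdk S3 client:
// const aws = require('aws-sdk');
// const s3 = new aws.S3();
// s3.putBucketCors(buildCorsParams('my-upload-bucket', 'http://localhost:8080'), console.log);
```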
The CORS configuration is a bunch of XML. The goal here is to allow `PUT` requests from the application domain. Our examples here are going to run on `http://localhost:8080`, but a production app might run on `https://somecoolething.app`. We want our S3 bucket to allow requests from whatever protocol, domain, and port combination the app is running on. That value will go in the `AllowedOrigin` XML element.
An example CORS configuration may look like the one below. Be aware that this is very permissive for `AllowedHeader` (used in preflight requests and `Access-Control-Request-Headers`).
```xml
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>http://localhost:8080</AllowedOrigin>
    <AllowedMethod>PUT</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <ExposeHeader>ETag</ExposeHeader>
    <ExposeHeader>x-amz-request-id</ExposeHeader>
    <ExposeHeader>x-amz-id-2</ExposeHeader>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
```
I like to use Terraform. The above configuration would look like this in HCL (HashiCorp Configuration Language).
```hcl
variable "cors_origins" {
  type        = "list"
  description = "the `origin` of cors requests"
  default     = ["http://localhost:8080"]
}

variable "bucket_name" {
  type        = "string"
  description = "The name of the uploads bucket"
  default     = "chrisguitarguy-s3-uploads-tutorial"
}

resource "aws_s3_bucket" "upload" {
  bucket = "${var.bucket_name}"

  cors_rule {
    allowed_origins = ["${var.cors_origins}"]
    allowed_headers = ["*"]
    allowed_methods = ["PUT"]

    expose_headers = [
      "ETag",
      # these are useful for debugging purposes
      "x-amz-request-id",
      "x-amz-id-2",
    ]

    max_age_seconds = 3000
  }
}
```
New to CORS? MDN is a good place to start.
The Overall Flow
Now that we have some infrastructure set up, it’s a good time to explain the overall flow of an upload through a system like this.
- A user adds a file to the UI — via an `<input type="file" />` or drag-and-drop or whatever
- A request is made to a backend endpoint to get a pre-signed URL
- Client-side code takes that pre-signed URL and makes a `PUT` request to it with the file object as the request data
- The client-side code then (probably) submits the `s3://` URL with the form in place of the file itself — so whatever is on the backend can track the upload
Creating Pre-Signed URLs
A pre-signed URL lets a client perform an action on S3 without having to mess with any other authentication. In our case, we’ll make a pre-signed `PutObject` URL that will let our JS frontend upload the file directly to S3.
The backend here is JavaScript, but use what you like. All AWS SDKs have support for pre-signed URLs (there are some examples here).
The example app has a `/presign` endpoint that only generates the pre-signed URL and a filename. A real app may do more things: take the incoming filename or MIME type to guess a file extension, create a whole “upload” object in a database somewhere, or anything else.
The biggest thing to note here is that the S3 client must be configured to use the AWS v4 signature, as it’s currently the only version that supports streaming unsigned content.
We’ll use the `uuid` package to generate a unique filename.
```javascript
const express = require('express');
const aws = require('aws-sdk');
const uuid4 = require('uuid/v4');

// S3 Bucket Name
const bucket = 'chrisguitarguy-s3-uploads-tutorial';

// URL expires in 5 minutes
const expires = 60 * 5;

const app = express();
const s3 = new aws.S3({
  signatureVersion: 'v4',
});

// other express middleware and such here

app.post('/presign', function (req, res) {
  const filename = `${uuid4()}.txt`;

  s3.getSignedUrl('putObject', {
    Bucket: bucket,
    Key: filename,
    Expires: expires,
  }, function (err, url) {
    if (err) {
      return res.status(500).json({error: err});
    }

    res.status(201).json({
      url,
      filename: `s3://${bucket}/${filename}`,
    });
  });
});
```
Perform the Client-Side Upload
Our JavaScript “app” is just some plain HTML with a form and an `onsubmit` handler that overrides the default submit.
Here’s the HTML:
```html
<form method="post" action="/submit" onSubmit="return onSubmit(this)">
  <div class="form-group">
    <input name="file" type="file" accept="text/plain" />
  </div>
  <button type="submit" class="btn btn-primary">Submit</button>
</form>
```
The `onSubmit` function comes from an external JS file. Remember our flow from above? When the user submits the form, we want to prevent the default and instead…
- Request a pre-signed URL
- Upload the file to the pre-signed URL
- Submit the form with the uploaded file’s URL instead of the file itself
```javascript
// 1. create pre-signed URL
function createPresignedUrl() {
  // ...
}

// 2. Upload the file to the pre-signed URL
function uploadFile(url, file) {
  // ...
}

// 3. submit the form with the uploaded file's name
function submitForm(filename) {
  // ...
}

exports.onSubmit = function onSubmit(form) {
  if (form.file.files.length < 1) {
    alert('select a file please');
    return false;
  }

  var file = form.file.files[0];

  createPresignedUrl().then(res => {
    return uploadFile(res.url, file).then(() => res.filename);
  }).then(filename => {
    return submitForm(filename).then(() => {
      alert(`Hooray you uploaded ${filename}`);
      form.reset();
    });
  });

  return false;
};
```
Let’s fill in each bit. All the HTTP interactions here are going to use the fetch API.
Creating a Presigned URL
This is making a request to our `/presign` endpoint created above and decoding its JSON response.
```javascript
function createPresignedUrl() {
  return fetch('/presign', {method: 'POST'}).then(r => r.json());
}
```
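One thing worth knowing: fetch only rejects its promise on network failures, not on 4XX/5XX statuses. If the presign endpoint can fail (auth, validation, etc.), a variant like the sketch below — `createPresignedUrlChecked` is a hypothetical name — rejects on non-2xx responses as well.

```javascript
// A variant of createPresignedUrl that also rejects on non-2xx responses;
// fetch itself only rejects on network failures.
function createPresignedUrlChecked() {
  return fetch('/presign', {method: 'POST'}).then(resp => {
    if (!resp.ok) {
      // surface the status so calling code can show an error
      return Promise.reject(new Error('presign failed: ' + resp.status));
    }
    return resp.json();
  });
}
```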
Uploading the File
This one is a bit trickier, as we need to do some work to parse the response. Fetch doesn’t reject its promises on 4XX statuses; that’s up to application code to check.
AWS sends back XML in its responses, so this uses `response.text()` to get the entire response body. The body doesn’t really matter for a successful upload, but it’s helpful to see error messages if something does go wrong.
```javascript
function uploadFile(url, file) {
  return fetch(url, {
    method: 'PUT',
    body: file,
  }).then(resp => {
    return resp.text().then(body => {
      const result = {
        status: resp.status,
        body,
      };

      if (!resp.ok) {
        return Promise.reject(result);
      }

      return result;
    });
  });
}
```
Need upload progress? Use XMLHttpRequest to monitor upload progress.
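A sketch of what that might look like, assuming an XHR-based replacement for `uploadFile` above — `uploadFileWithProgress` and `percentComplete` are hypothetical names, and `onProgress` is a callback that receives a 0–100 number:

```javascript
// Small helper: turn loaded/total bytes into a whole-number percentage.
function percentComplete(loaded, total) {
  return total > 0 ? Math.round((loaded / total) * 100) : 0;
}

// Sketch of the same PUT upload via XMLHttpRequest so a progress
// callback can fire as bytes go out (fetch can't report upload progress).
function uploadFileWithProgress(url, file, onProgress) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('PUT', url);

    xhr.upload.addEventListener('progress', e => {
      if (e.lengthComputable) {
        onProgress(percentComplete(e.loaded, e.total));
      }
    });

    xhr.addEventListener('load', () => {
      // like the fetch version, treat non-2xx statuses as failures
      const result = {status: xhr.status, body: xhr.responseText};
      xhr.status >= 200 && xhr.status < 300 ? resolve(result) : reject(result);
    });

    xhr.addEventListener('error', () => reject(new Error('network error')));

    xhr.send(file);
  });
}
```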
Submitting the Form
This one is making a request to a `/submit` endpoint that just echoes back the filename.
```javascript
function submitForm(filename) {
  return fetch('/submit', {
    method: 'POST',
    body: JSON.stringify({filename}),
    headers: {
      'Content-Type': 'application/json',
    },
  });
}
```
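For completeness, the backend side of `/submit` could be as small as a handler that echoes the filename back. This is a sketch, not the real app’s code: `submitHandler` is a hypothetical name, written as a plain `(req, res)` function (assuming JSON bodies are already parsed) so it could be mounted with `app.post('/submit', submitHandler)` in an express app like the one above.

```javascript
// Sketch of a /submit handler that just echoes the filename back.
// Assumes JSON bodies are parsed (e.g. express.json() middleware).
function submitHandler(req, res) {
  const filename = req.body && req.body.filename;

  if (!filename) {
    return res.status(400).json({error: 'filename is required'});
  }

  // a real app might mark an upload record as complete here
  res.status(200).json({filename});
}
```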
What’s Missing
Lots of stuff is missing from this example:
- Showing the user some sort of loading indicator to show the form is working
- Maybe some upload progress indicators
- Any sort of error handling
Those are all things that are going to be pretty application-specific, so they’ve been skipped here.
Troubleshooting
Preflight CORS Request Failures
Before a browser makes a non-simple cross-origin request, it will preflight an `OPTIONS` request to the endpoint with some special headers to make sure the actual request can be made.
If this fails, check the bucket’s CORS configuration. Something in there is not allowing the request to go through. See Setting Up S3 above.
Invalid Signature on Upload Requests
Sometimes the preflight request will succeed but the actual upload request will 403 with an invalid signature error message. Make sure that the AWS signature version is set to `v4`. Many of AWS’s SDKs default to this version, but some may not.
See Creating Pre-Signed URLs above.
Example Code
The whole app can be found on GitHub.