This week I got a chance to work on implementing AWS Transfer as an SFTP server backed by a set of S3 buckets. Authentication in this new system is handled by another, self-serve SFTP application. Users can create and manage SFTP users there, and AWS Transfer uses their usernames, passwords, and public keys to authenticate them.
These are some notes on things I discovered along the way.
Server Config is Thin, User Configuration does the Work
The actual AWS Transfer server configuration is very thin:
- Desired protocols (SFTP, FTP, etc.)
- Storage backend (S3 or Elastic File System)
- Where the server should run (VPC with an elastic IP, VPC endpoint, public, etc.), including things like security groups
- How the users should be authenticated (Lambda, API Gateway, system managed, Active Directory)
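That thin configuration maps fairly directly onto the API. Here’s a minimal sketch with boto3, assuming a custom lambda identity provider; the function ARN is a placeholder, not a real resource:

import boto3

transfer = boto3.client('transfer')

# minimal SFTP server backed by S3, authenticating via a custom lambda IDP
response = transfer.create_server(
    Protocols=['SFTP'],
    Domain='S3',
    EndpointType='PUBLIC',
    IdentityProviderType='AWS_LAMBDA',
    IdentityProviderDetails={
        # placeholder ARN for the identity provider lambda described below
        'Function': 'arn:aws:lambda:us-east-1:111122223333:function:transfer-idp',
    },
)
print(response['ServerId'])  # e.g. s-abcd123456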
Notably absent from the configuration: which S3 bucket should be used and how the user acquires permissions to the storage backend (S3 bucket). Which brings me to my next discovery.
Multiple AWS S3 Buckets Can be Used
S3 buckets can be assigned per user or per group of users: all of the “where do the files land” configuration goes into user configuration. The easy way to see that is to check the system managed user docs. I’ll explain more about this configuration below.
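As a rough sketch of what that can look like in practice, the identity provider can pick a bucket per user or per group and feed it into the home directory configuration covered below. The bucket names and the team lookup here are made up for illustration:

# hypothetical mapping of user groups to their own buckets
BUCKETS_BY_TEAM = {
    'accounting': 'accounting-sftp-bucket',
    'reporting': 'reporting-sftp-bucket',
}

def bucket_for(user):
    # fall back to a shared bucket if the group doesn't have its own
    return BUCKETS_BY_TEAM.get(user.get('team'), 'default-sftp-bucket')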
Custom Lambda Identity Providers
Using a custom lambda identity provider means the AWS Transfer server does a lambda:InvokeFunction API call to the lambda with an event that looks like this:
{
    'username': 'sftp.username',
    'password': 'shhh',
    'protocol': 'SFTP',
    'serverId': 's-abcd123456',
    'sourceIp': '192.168.0.100'
}
If the user is attempting public key authentication, the password field will be absent.
Transfer expects the lambda to return a set of values as its result. This just means doing…
return {
    'Role': 'iamrolehere',
    'Policy': '{}',
    # etc
}
in the lambda’s handler function. The fields are explained in the docs, but I still ended up fighting with them a little bit, so there are some details below.
User doesn’t authenticate? Return an empty object or let the lambda error and authentication will be denied.
This lambda can do anything it needs to do. In my case, it called out to another API to get SFTP user details then authenticated them.
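Putting the pieces together, a successful response ends up looking roughly like this (the ARN and bucket name are placeholders; each field is covered in more detail below):

import json

def handler(event, context):
    # pretend we called out to the SFTP user service and authenticated here
    return {
        'Role': 'arn:aws:iam::111122223333:role/transfer-user-role',  # placeholder role ARN
        'Policy': json.dumps({'Version': '2012-10-17', 'Statement': []}),  # scope-down policy as a JSON string
        'HomeDirectoryType': 'LOGICAL',
        'HomeDirectoryDetails': json.dumps([
            {'Entry': '/', 'Target': '/s3-bucket-name/username'},
        ]),
        'PublicKeys': ['ssh-rsa abc123'],  # only needed for public key auth
    }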
Home Directory and Home Directory Type
The lambda has to return details about where the user’s files should end up:
- HomeDirectory
- HomeDirectoryType
- HomeDirectoryDetails
HomeDirectoryType determines how the user sees their path on the SFTP server. LOGICAL is the most useful here as it provides a way to “chroot” a user to a single directory. If HomeDirectoryType is set to LOGICAL, then HomeDirectoryDetails has to be provided.
HomeDirectory and HomeDirectoryDetails are exclusive; both can’t be sent.
What’s in a Home Directory Path?
Looking through those docs and thinking, "wait, how do I actually define which S3 bucket this user’s files will land in?" That was me too. Paths in HomeDirectory or as the target in HomeDirectoryDetails take the format:
/s3-bucket-name-here/path/to/user
where s3-bucket-name-here is the actual name of the S3 bucket and the path/to/user following it is where the user’s home directory is located.
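In other words, building that value is just string concatenation. A tiny, hypothetical helper:

def home_directory(bucket, username):
    # bucket name first, then the per-user prefix inside it
    return f'/{bucket}/{username}'

home_directory('s3-bucket-name-here', 'sftp.username')  # '/s3-bucket-name-here/sftp.username'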
Home Directory Examples
Here’s a PATH home directory type with HomeDirectory. When the user connects to this transfer server, they’ll see /s3-bucket-name/username as their path and be able to move around the entire bucket if permissions allow them to.
def handler(event, context):
    # pretend we authed the user here
    return {
        'HomeDirectoryType': 'PATH',
        'HomeDirectory': '/s3-bucket-name/username',
    }
Now here’s an example with a logical home directory. Of note here is that HomeDirectoryDetails is meant to be a JSON blob (a dictionary/object encoded to a JSON string) and not an actual dictionary/object. This will chroot the user to / and they will not see the s3-bucket-name in the path.
import json

def handler(event, context):
    # pretend we authed the user here
    return {
        'HomeDirectoryType': 'LOGICAL',
        'HomeDirectoryDetails': json.dumps([
            {
                'Entry': '/',
                'Target': '/s3-bucket-name/username',
            }
        ]),
    }
Alternatively, a single directory could be specified. I’m migrating to transfer from an SFTP server that has a writeable directory to facilitate chrooting users, so to mimic that:
def handler(event, context):
    # pretend we authed the user here
    return {
        'HomeDirectoryType': 'LOGICAL',
        'HomeDirectoryDetails': json.dumps([
            {
                'Entry': '/writeable',  # <- !!
                'Target': '/s3-bucket-name/username',
            }
        ]),
    }
When the user connects to the transfer server they will only see the writeable directory. This could likely also be used to send files to different buckets based on the target value.
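I haven’t needed that myself, but a sketch of the idea: multiple Entry/Target mappings pointing at different buckets. The bucket names here are made up, and the user’s IAM role would need access to both buckets:

import json

def handler(event, context):
    # pretend we authed the user here
    return {
        'HomeDirectoryType': 'LOGICAL',
        'HomeDirectoryDetails': json.dumps([
            # hypothetical layout: uploads land in one bucket, reports live in another
            {'Entry': '/uploads', 'Target': '/upload-bucket-name/username'},
            {'Entry': '/reports', 'Target': '/report-bucket-name/username'},
        ]),
    }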
User IAM Roles and Policies
The other thing returned from the handler is an IAM Role that transfer will assume on behalf of the user as well as a session policy that scopes down the user’s access.
At a minimum the IAM role has to allow access to the S3 bucket where the files will land, and if the bucket is using KMS encryption, it must allow access to the key used to encrypt files.
The gist here is that the IAM role will probably have too much access for a single user in SFTP:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowListBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::S3_BUCKET_NAME_HERE"
        },
        {
            "Sid": "AllowObjectAccess",
            "Effect": "Allow",
            "Action": [
                "s3:PutObjectACL",
                "s3:PutObject",
                "s3:GetObjectVersion",
                "s3:GetObjectACL",
                "s3:GetObject",
                "s3:DeleteObjectVersion",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::S3_BUCKET_NAME_HERE/*"
        },
        {
            "Sid": "AllowKmsAccess",
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKey*",
                "kms:DescribeKey",
                "kms:Decrypt"
            ],
            "Resource": "arn:aws:kms:*:002096479106:key/KEY_ID_HERE"
        }
    ]
}
This would give a user logging in full access to all objects in the bucket, which is not ideal. But the policy can narrow the scope of permissions: it can’t grant anything beyond what the role allows, but it can scope down permissions, and transfer provides a set of policy variables to help with that.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowListHomeDirectory",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::S3_BUCKET_NAME_HERE",
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "${transfer:UserName}",
                        "${transfer:UserName}/*"
                    ]
                }
            }
        },
        {
            "Sid": "AllowHomeDirObjectAccess",
            "Effect": "Allow",
            "Action": [
                "s3:PutObjectACL",
                "s3:PutObject",
                "s3:GetObjectVersion",
                "s3:GetObjectACL",
                "s3:GetObject",
                "s3:DeleteObjectVersion",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::S3_BUCKET_NAME_HERE/${transfer:UserName}/*"
        },
        {
            "Sid": "AllowKmsAccess",
            "Effect": "Allow",
            "Action": ["kms:GenerateDataKey*", "kms:DescribeKey", "kms:Decrypt"],
            "Resource": "arn:aws:kms:*:002096479106:key/KEY_ID_HERE"
        }
    ]
}
The ${transfer:UserName} will be replaced with the username of the authenticated user in the policy above. Note that it has pretty much all the same permissions, but uses IAM conditions to limit listing the bucket to the username directory. Similarly, objects can only be manipulated in the username directory.
Using object tagging? IAM permissions must include s3:GetObjectTagging and s3:PutObjectTagging.
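If that applies, an extra statement along these lines (scoped the same way as the object access above) could be merged into the policy before it’s serialized. This is a sketch, not something I’ve needed:

# additional statement for object tagging, scoped to the user's directory
tagging_statement = {
    'Sid': 'AllowObjectTagging',
    'Effect': 'Allow',
    'Action': ['s3:GetObjectTagging', 's3:PutObjectTagging'],
    'Resource': 'arn:aws:s3:::S3_BUCKET_NAME_HERE/${transfer:UserName}/*',
}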
Role and Policy are returned from the authentication handler. Policy is a JSON blob again, not an object/dictionary.
def handler(event, context):
    return {
        'Role': 'arn:aws:iam::0001112223333:role/aws-transfer-role-here',
        'Policy': json.dumps(iam_policy),
        # home directory, etc
    }
Very important note: if using a LOGICAL home directory type, only the ${transfer:UserName} policy variable is available.
Public Key Authentication
If the SFTP client attempts public key authentication, a request will be sent to the lambda IDP without a password. If that happens, return all the same stuff as above (role, policy, home directory stuff) as well as a set of public keys the user is allowed to authenticate with:
def handler(event, context):
    return {
        # same stuff as above
        'PublicKeys': [
            'ssh-rsa abc123',
            'ssh-rsa etc',
        ]
    }
Transfer will take care of validating the public keys. If none match, transfer will deny authentication.
If the user does not have public keys or the implementation doesn’t allow them, then return an empty object or have the lambda error.
Encrypted S3 Buckets
If using S3 encryption with KMS, the IAM role and policy for each user must have access to at least kms:GenerateDataKey, kms:Decrypt, and kms:DescribeKey for the encryption key associated with the bucket.
The S3 bucket must also have default encryption set up. AWS Transfer will not send additional encryption parameters with requests to S3.
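Setting default encryption is a one-time, bucket-level change. A sketch with boto3, where the bucket name and key ID are placeholders:

import boto3

s3 = boto3.client('s3')

# make the bucket encrypt new objects with the KMS key by default,
# since Transfer won't pass encryption parameters itself
s3.put_bucket_encryption(
    Bucket='s3-bucket-name-here',
    ServerSideEncryptionConfiguration={
        'Rules': [
            {
                'ApplyServerSideEncryptionByDefault': {
                    'SSEAlgorithm': 'aws:kms',
                    'KMSMasterKeyID': 'KEY_ID_HERE',
                },
            },
        ],
    },
)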
But What About Actual Authentication?
I’ve glossed over this so far, but basically the lambda can do whatever it wants to authenticate the user. It could accept any user, it could call out to a third party service. It could pull user data from DynamoDB or a database — it’s completely flexible.
If the user fails to authenticate, the lambda can error or it can return an empty dictionary/object to deny authentication:
import requests

def handler(event, context):
    if 'password' not in event:
        return {}  # don't support public key auth

    username = event['username']
    password = event['password']

    response = requests.post(f'https://sftpserviceexample.com/user/{username}/authenticate', json={'password': password})
    response.raise_for_status()  # 403 if bad password?

    return {
        # role, policy, home directory here
    }
Conclusion
Overall I’ve been pretty impressed this week with AWS Transfer. When I originally looked into it at its release, the only custom IDP support was via API Gateway, which seemed very heavy. The custom lambda identity provider was fairly easy to get set up; it just took a little bit of experimentation.
The info here can be found in the docs, but it’s scattered and I thought I would collect what I’d learned along the way to a working implementation.