Cross-Account S3 Access
This document describes the AWS setup needed to allow Matillion to load data from an S3 file in a different AWS account. Amazon’s authentication model makes this possible without the S3 owner having to make the data world-readable.
This configuration can be used whenever one AWS account needs to securely share S3 files with another.
Overview of Authorization
Resources such as S3 buckets and files are private to the owner by default. Attempting to use a Matillion S3 Load component to access a bucket belonging to another account will fail with an access-denied error message.
To enable cross-account access, three steps are necessary:
- The S3-bucket-owning account grants S3 privileges to the “root” identity of the Matillion account. This minimises the coupling between the accounts.
- Within the Matillion account, an EC2 instance role delegates the access on to Matillion, and then on again to Redshift.
- As part of the Matillion Environment setup, instance credentials are used by Redshift and allow it to inherit AWS permissions.
Configuring the S3-bucket-owning account
To allow selected others to access the data, without actually making it fully public, the owner of this bucket must add an authorization policy.
This account doesn’t need to know anything about the IAM users in the Matillion account, but every AWS account has a “root” identity, so access can be granted to the root user of the Matillion AWS account. This is done by setting up a bucket policy from the Properties window of the S3 management console.
Three statements are needed in the bucket policy:
- Allow s3:ListBucket on the bucket itself
- Allow s3:GetObject on file(s) in the bucket
- Allow s3:GetBucketLocation on the bucket
These are included in the attached example policy file (owner-policy.txt) found at the bottom of this page. (Note that this is just an example and should be used to structure your own policy file rather than used as-is.)
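To illustrate the shape of such a bucket policy, the sketch below builds the three statements as JSON. The bucket name and account number are placeholders, not values from the attached owner-policy.txt; substitute your own.

```python
import json

# Placeholder values: substitute your own bucket name and the
# Matillion/Redshift AWS account number.
BUCKET = "shared-bucket"
MATILLION_ACCOUNT = "123456789012"

principal = {"AWS": f"arn:aws:iam::{MATILLION_ACCOUNT}:root"}

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Allow listing the bucket (name validation and prefix matching)
            "Effect": "Allow",
            "Principal": principal,
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {   # Allow reading objects; "/*" allows any file in the bucket,
            # or narrow it to a specific name pattern
            "Effect": "Allow",
            "Principal": principal,
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {   # Allow discovering which region the bucket is in
            "Effect": "Allow",
            "Principal": principal,
            "Action": "s3:GetBucketLocation",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
    ],
}

print(json.dumps(bucket_policy, indent=2))
```

Note that the Principal in each statement is the root identity of the Matillion account, which is what keeps the coupling between the two accounts minimal.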
ListBucket privilege is used by Matillion to validate the bucket name, and also by the Redshift bulk loader during filename prefix matching. The "Resource" of the GetObject statement can be an asterisk (allowing Matillion to read any file in the bucket) or can be a specific name pattern.
Use the Matillion/Redshift AWS account number in the policy editor, and substitute your own bucket name in the example policy.
At this point, the root user of the Matillion account is able to list this bucket and read files. However, a second round of authorization is required to delegate this access to Matillion itself and Redshift.
Configuring the Matillion Account
If your Matillion instance has been set up with an EC2 instance role as described in the “Managing Credentials” documentation, then it is likely that no extra configuration is required.
If you’re using coarse-grained access control, the two S3 privileges required by Matillion are included in the AmazonS3FullAccess policy. If you’re using fine-grained access control, the two privileges are both among those in the “Recommended Actions” list.
To recap: if you have a different setup, you must first use the IAM console to create an EC2 service role:
The new role must be given the required privileges as described in the “Managing Credentials” documentation, and at a minimum to include those described in the attached policy file (user-policy.txt).
Finally, use this EC2 service role as the “IAM role” when you launch your Matillion instance:
Configuring the Matillion S3 Load component
Now that the EC2 service role is delegating the S3 access to Matillion, you should be able to configure the S3 Load component in an orchestration job.
The dropdown list of the “S3 URL Location” property won’t contain the other account’s bucket, since it only lists your own buckets. You simply need to type the bucket name into the field instead:
It is possible that the other account’s bucket is in a different AWS Region to Matillion. In this case the load will fail, with an error such as:
S3ServiceException: The bucket you are attempting to access must be addressed using the specified endpoint.
This error condition is deliberate: it would be possible for Matillion to automatically find the region of the bucket, but moving data between regions has an associated cost. If this happens, you will also need to change the “Region” property from its default of None to the actual region of the source S3 bucket.
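You can find the bucket’s region yourself with the S3 GetBucketLocation API (for example via the AWS CLI: aws s3api get-bucket-location --bucket your-bucket). One quirk worth knowing: the returned LocationConstraint is null for buckets in us-east-1, and older buckets may report the legacy value "EU" for eu-west-1. The helper below, a small sketch of that mapping, shows how to turn the raw value into a usable region name:

```python
def region_from_location_constraint(constraint):
    """Map the LocationConstraint returned by S3 GetBucketLocation
    to a region name suitable for the "Region" property."""
    if not constraint:
        # A null/empty constraint means the bucket is in us-east-1.
        return "us-east-1"
    if constraint == "EU":
        # Legacy value used by some older buckets in eu-west-1.
        return "eu-west-1"
    return constraint

print(region_from_location_constraint(None))        # us-east-1
print(region_from_location_constraint("eu-west-2"))  # eu-west-2
```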
Matillion makes use of the Redshift bulk loader and never sees any of the S3 data itself: it is Redshift that accesses S3 and performs the load. Matillion was using an EC2 instance role, so how did Redshift get permission to access the file?
The answer is in the Environment Settings within Matillion. Setting the “Credentials” property to “Instance Credentials” allows Redshift to inherit the permissions granted to Matillion.
The permission-delegation technique works equally well with a Redshift service role. In this case the service role must be manually associated with the Redshift cluster (under “Manage IAM Roles” in the Redshift console) and supplied as the credentials in the S3 Load component, for example like this:
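Under the hood, the S3 Load in this setup amounts to a Redshift COPY statement whose CREDENTIALS come from the service role’s ARN. The sketch below builds such a statement; the table, S3 URL, role ARN, and region are all placeholder values, and the real statement Matillion generates will include additional options.

```python
def build_copy_statement(table, s3_url, role_arn, region):
    """Sketch of a Redshift COPY statement using IAM role credentials.
    All arguments are caller-supplied placeholders."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_url}'\n"
        f"IAM_ROLE '{role_arn}'\n"
        f"REGION '{region}';"  # must match the source bucket's region
    )

print(build_copy_statement(
    "public.sales",                        # placeholder target table
    "s3://shared-bucket/data/sales.csv",   # placeholder source file
    "arn:aws:iam::123456789012:role/my-redshift-role",  # placeholder role
    "eu-west-1",                           # placeholder region
))
```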