Snowflake GCP Storage Integration Setup Guide
This topic covers how to set up a storage integration for use in Snowflake instances on the Google Cloud Platform (GCP). Matillion ETL for Snowflake on GCP requires users to select a storage integration when configuring data staging components.
A storage integration is a named Snowflake object that removes the need to pass explicit cloud provider credentials such as secret keys or access tokens. An integration object references a Google Cloud Storage service account.
- Completing the instructions below requires access to your Google Cloud Storage project as a project editor. If you aren't a project editor, please contact your Cloud Storage administrator for assistance.
- Only users with the Snowflake role ACCOUNTADMIN, or a role with the global CREATE INTEGRATION privilege, can execute the SQL commands mentioned in this topic.
- Any identifiers not enclosed in double quotes ( " " ) aren't case sensitive. The example below demonstrates this: the integration is created with the name DocsTeam, but because the name isn't quoted, Snowflake stores it as DOCSTEAM.
Create a storage integration in Snowflake
- Log in to your Snowflake account.
- In either an existing Snowflake worksheet or a new worksheet, use the CREATE STORAGE INTEGRATION command. The form of the command is as follows:
CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ]
This uses the following variables:
- <integration_name>: The name of the new integration.
- <bucket>: The name of a Google Cloud Storage bucket that stores your data files.
- <path>: An optional file path used to provide granular control over objects in the Cloud Storage bucket.
The STORAGE_ALLOWED_LOCATIONS and STORAGE_BLOCKED_LOCATIONS parameters let you specify any buckets that you wish to permit access to, or block access from, respectively. As shown in the above template, locations are enclosed in single quotation marks, and multiple locations are separated by commas. The STORAGE_BLOCKED_LOCATIONS parameter is optional.
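As a concrete illustration, the DocsTeam integration from this topic could be created as follows. The bucket and paths here are placeholders, not values from a real account:

```sql
-- Illustrative sketch: bucket name and paths are placeholders.
-- Because DocsTeam is not quoted, Snowflake stores the name as DOCSTEAM.
CREATE STORAGE INTEGRATION DocsTeam
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://my-data-bucket/landing/')
  STORAGE_BLOCKED_LOCATIONS = ('gcs://my-data-bucket/restricted/');
```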
The following example shows that the command for the integration DocsTeam has run successfully and the storage integration DOCSTEAM has been created.
This command will create a storage integration and explicitly limit external stages that use this integration to reference the specified bucket (the bucket name is blurred for security purposes).
Additional stages that also use this integration can reference the permitted bucket.
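An external stage that uses the integration must reference a location covered by STORAGE_ALLOWED_LOCATIONS. A minimal sketch, with placeholder names:

```sql
-- Placeholder names: DOCSTEAM is the integration created earlier;
-- the URL must fall within that integration's STORAGE_ALLOWED_LOCATIONS.
CREATE STAGE my_gcs_stage
  URL = 'gcs://my-data-bucket/landing/'
  STORAGE_INTEGRATION = DOCSTEAM;
```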
- In your Snowflake worksheet, execute the GRANT command to grant usage of this particular integration to the PUBLIC role:
GRANT USAGE ON INTEGRATION <integration_name> TO ROLE PUBLIC;
- In your Snowflake worksheet, execute the Describe Integration command to retrieve the ID for the Cloud Storage service account that was automatically created for your Snowflake account.
You can abbreviate DESCRIBE to DESC when writing this command, and the term STORAGE is optional:
DESC [STORAGE] INTEGRATION <integration_name>
After executing the command, the results will look like this:
- After you run the DESC command, navigate to the row labelled STORAGE_GCP_SERVICE_ACCOUNT in the property column. From the property_value column of this row, retrieve the ID (the string the red arrow in our image points at) for this Cloud Storage service account. Copy this ID string, as you will require it later.
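For example, the service account ID can be retrieved as shown below. The value is illustrative only; yours will differ:

```sql
DESC STORAGE INTEGRATION DocsTeam;

-- Copy the value from the STORAGE_GCP_SERVICE_ACCOUNT row, e.g.:
-- property                     | property_value
-- STORAGE_GCP_SERVICE_ACCOUNT  | service-account-id@project.iam.gserviceaccount.com
```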
As Snowflake clarifies in their documentation, they provision a single Cloud Storage service account across your entire Snowflake account. All Cloud Storage integrations use that service account.
The next section focuses on granting permissions within the GCP console.
Granting service account permissions to access bucket objects
Follow the steps below to configure IAM access permissions in the GCP console so that a Cloud Storage bucket can be used to load and unload data:
- Log in to the Google Cloud Platform console.
- Click IAM & admin.
- Click Roles from the drop-down menu.
- Click + CREATE ROLE.
- Complete the Title, Description, and ID fields.
- Click + ADD PERMISSIONS.
- Add the permissions required for your use case. For loading and unloading data, Snowflake's documentation lists storage.buckets.get, storage.objects.create, storage.objects.delete, storage.objects.get, and storage.objects.list.
The image below provides an example.
- Click Create (or Update if you are editing a Role).
The next section focuses on assigning a custom role to the Cloud Storage service account.
Assigning the custom role to the Cloud Storage service account
- Click the menu button in the top left of the GCP Console.
- Click Storage.
- Click Browser.
- Navigate to your Cloud Storage bucket or create a new bucket.
- Inside your bucket's details page, click the Permissions tab.
- Click the blue Add members button.
- In the Add members form for your bucket, enter the STORAGE_GCP_SERVICE_ACCOUNT ID you obtained during Create a storage integration in Snowflake, earlier.
- Click the Role drop-down and select Storage > Storage Admin.
- Click Save.
Using the integration in Matillion ETL
- In Matillion ETL, create a new orchestration job, and drag your chosen data staging component onto the canvas.
- Click on the data staging component, then click its Properties tab and set the following properties:
- Staging: Select Existing Google Cloud Storage Location. The alternative, Snowflake Managed, is the default setting and will configure sensible defaults automatically.
- GCS Staging Area: Select your Cloud Storage bucket.
- Storage Integration: Select your newly created integration.
If you require assistance creating a storage integration in Snowflake, or with any other aspect of Matillion ETL, please visit our Getting Support page.
Please consult the Snowflake documentation to explore these topics in more depth, for example, to learn how to modify an existing stage to use a storage integration.
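As a brief sketch of that last point, an existing external stage can be switched over to a storage integration with ALTER STAGE. The names below are placeholders; check the Snowflake ALTER STAGE documentation for the full syntax:

```sql
-- Placeholder names: my_gcs_stage is an existing external stage,
-- DOCSTEAM is the integration created earlier in this topic.
ALTER STAGE my_gcs_stage
  SET STORAGE_INTEGRATION = DOCSTEAM;
```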