Cloud Storage Integration Setup Guide

Cloud Storage Integration Setup Guide


Overview

This topic covers how to set up an integration for use in Snowflake instances on the Google Cloud Platform (GCP). Matillion ETL for Snowflake instances on GCP require users to select a storage integration when configuring data staging components.

A storage integration is a named Snowflake object that removes the need to pass explicit cloud provider credentials, such as secret keys or access tokens. An integration object references a Google Cloud Storage service account.

 

Important Information

1. Completing the instructions below requires access to your Cloud Storage project as a project editor. If you are not a project editor, please contact your Cloud Storage administrator for assistance.

2. Only users with the Snowflake role ACCOUNTADMIN, or a role with the global CREATE INTEGRATION privilege, can execute the SQL commands mentioned in this topic.

3. Any identifiers not enclosed in double quotes [" "] are not case sensitive. The example below includes this for demonstrative purposes—the created integration DocsTeam is passed by Snowflake as DOCSTEAM.

 

Step 1: How to Create a Storage Integration

1. Log in to your Snowflake account.

2. In either an existing Snowflake worksheet or a new worksheet, use the CREATE STORAGE INTEGRATION command.

Here is a template image of the command:

 

And here is a breakdown of the variable information:

  • <integration_name> - the name of the new integration.
  • <bucket> - the name of a Google Cloud Storage bucket that stores your data files.
  • <path> - an optional filepath used to provide granular control over objects in the Cloud Storage bucket.

The parameters STORAGE_ALLOWED_LOCATIONS and STORAGE_BLOCKED_LOCATIONS are optional. These parameters let users specify any buckets that users wish to restrict access to, or block access from, respectively. As in the above template, locations should be specified in single quotation marks ['], and each location separated by a comma [,].

In this example, the SQL command looks like this:

 

In the next image, the command has run successfully. The storage integration, DOCSTEAM, has been created.

 

This command will create a storage integration and explicitly limit external stages that use this integration to reference the specified bucket (the bucket name is blurred for security purposes).

Additional stages that also use this integration can reference the permitted bucket.

3. Ensure you execute the GRANT command to set the usage of this particular integration to PUBLIC.

 

4. In your Snowflake worksheet, execute the Describe Integration command to retrieve the ID for the Cloud Storage service account that was automatically created for your Snowflake account.

Note: DESCRIBE can be abbreviated to DESC when writing this command.

 

After executing the command, the results will look like this:

 

After you run the DESC command, navigate to the row labelled STORAGE_GCP_SERVICE_ACCOUNT in the "property" column. Navigate along this row to the "property_value" column to retrieve the ID (the string the red arrow in our image points at) for this Cloud Storage service. Copy this ID string—it will be required later.

As Snowflake clarifies in their documentation, they provision a single Cloud Storage service account across your entire Snowflake account. All Cloud Storage integrations use that service account.

The next section of this topic focuses on granting permissions within the GCP console.

 

Step 2: Granting Service Account Permissions to Access Bucket Objects

Please follow the steps below to configure IAM access permissions in your Google GCP Console to use a Cloud Storage bucket to load and unload data:

  1. Log in to the in the top left of the Google Cloud Platform Console.
  2. Click IAM & admin.
  3. Click Roles from the drop-down menu.
  4. Click the + CREATE ROLE button.
  5. Complete the Title, Description, and ID fields.
  6. Click the + ADD PERMISSIONS button.
  7. Add the following permissions:
    • storage.buckets.get
    • storage.objects.create
    • storage.objects.delete
    • storage.objects.get
    • storage.objects.list
 

The image below provides an example.

 

Click Create (or Update if you are editing a Role).

The next section of this topic focuses on assigning a custom role to the Cloud Storage service account.

 

Step 3: Assigning the Custom Role to the Cloud Storage Service Account

  1. Click the menu button in the top left of the GCP Console.
  2. Click Storage.
  3. Click Browser.
  4. Navigate to your Cloud Storage bucket or create a new bucket.
  5. Inside your bucket's details page, click Permissions as in the image below.

 

Next, click the blue Add members button.

In the Add members to <bucket-name> form, enter your STORAGE_GCP_SERVICE_ACCOUNT ID from earlier. Then, click the Role drop-down and select Storage > Storage Admin.

Click Save.

 

Matillion ETL Usage

  1. In Matillion ETL, create a new Orchestration Job and drag your chosen data staging component onto the canvas.
  2. Click on the chosen data staging component, and click its Properties tab.
  3. Click into the Staging property and select "Existing Google Cloud Storage Location". The alternative, "Snowflake Managed", is the default setting and will configure sensible defaults automatically.
  4. For the GCS Staging Area property, select your Cloud Storage bucket.
  5. For the Integration property, select your newly created integration.


 

Contact Support

If you require assistance creating a storage integration in Snowflake, or with any other aspect of Matillion ETL, please visit our Getting Support page.

Please consult the Snowflake documentation to read deeper into topic—for example, to learn how to modify an existing stage to use a storage integration.