Create Project (Delta Lake on Databricks)

Important Information

These instructions assume you have already successfully launched a Matillion ETL instance.


Overview

A Matillion ETL project is a logical grouping of configuration settings and resources, such as jobs, required to use Matillion ETL. When users first log in to their Matillion ETL instance, they must click Confirm in the Product Improvement Metrics dialog. Then, if no existing projects are available to select, they must create one.

There are two routes to creating a new project:

  1. The Join Project dialog, which appears automatically the first time an instance loads.
  2. The Project menu: click Project, then click Switch Project.

Whichever route you use, click Create Project to continue.

Please Note

There are no practical limits to the number of projects you can create. However, only one project is used by the client session at a time, and each project must have a unique name.



Creating a Delta Lake on Databricks project on AWS

The following section describes how to create a project in Matillion ETL for Delta Lake on Databricks (AWS).

1. Project Details

  • Project Group: Use the drop-down menu to select an existing project group. Projects are logically arranged in project groups.
  • Project Name: Enter a suitable name for your new project.
  • Project Description: Describe your project. This setting is optional.
  • Private Project: Check this box to make this new project private. Only users granted access can view and work in this project.
  • Include Samples: Untick this box if you do not want to include sample jobs in this project.

2. AWS Connection

  • Environment Name: Enter a name for your new Matillion ETL environment.
  • AWS Credentials: Use the drop-down menu to choose credentials for the AWS cloud platform. Instance Credentials is selected by default. Click Manage to add a new set of credentials. Read Manage Credentials for more information.

3. Delta Lake Connection

Please Note

Before completing the following steps in the Create Project dialog, you must create a Databricks account on AWS. This account lets you deploy a Databricks workspace, which is explained in more detail on the linked page.


  • Workspace ID: Enter your existing Databricks workspace ID. This can be found as part of the URL for your Databricks workspace portal. Do not include "cloud.databricks.com".
  • Username: Enter the username for your Databricks workspace account. Alternatively, you can enter the word "token". For more information, read How to Generate a New Databricks Token.
  • Password: Enter the password for your Databricks workspace account. Alternatively, provide the Token Value.
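The Workspace ID rule above (take it from the workspace URL, without "cloud.databricks.com") can be sketched as a small helper. This is a minimal illustration assuming a workspace URL of the form https://<workspace-id>.cloud.databricks.com; the function name and the example URL are hypothetical, not part of Matillion ETL.

```python
from urllib.parse import urlparse

# Assumed AWS Databricks domain suffix, per the Workspace ID field description.
AWS_SUFFIX = ".cloud.databricks.com"


def workspace_id_from_url(url: str) -> str:
    """Hypothetical helper: strip the '.cloud.databricks.com' domain from a workspace URL."""
    host = urlparse(url).hostname or ""  # hostname drops scheme, path and query string
    if not host.endswith(AWS_SUFFIX):
        raise ValueError(f"not an AWS Databricks workspace URL: {url!r}")
    return host[: -len(AWS_SUFFIX)]


# Example with a made-up workspace URL.
print(workspace_id_from_url("https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=123"))  # dbc-a1b2c3d4-e5f6
```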

Please Note

The following combinations are available in the Username and Password fields:

  1. Set Username as the account email. Set Password as the account password.
  2. Set Username as the account email. Set Password as the token value.
  3. Set Username as "token". Set Password as the token value.
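The three permitted Username/Password combinations can be expressed as a validation sketch. It assumes that Databricks personal access tokens begin with "dapi" and that anything else in the Password field is an account password; it also assumes the pair is sent as HTTP Basic credentials, which this page does not confirm for Matillion ETL. Both function names are hypothetical.

```python
import base64


def credential_combination(username: str, password: str) -> int:
    """Return which combination (1, 2 or 3) a Username/Password pair uses.

    Assumption: a personal access token starts with "dapi"; any other
    Password value is treated as the account password.
    """
    password_is_token = password.startswith("dapi")
    if username == "token":
        # Combination 3 only: the literal username "token" needs a token value.
        if not password_is_token:
            raise ValueError('Username "token" must be paired with a token value')
        return 3
    if "@" not in username:
        raise ValueError('Username must be the account email or the word "token"')
    # Combinations 1 and 2 both use the account email as the Username.
    return 2 if password_is_token else 1


def basic_auth_header(username: str, password: str) -> str:
    """Encode the pair as an HTTP Basic Authorization header value (assumed transport)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

A check like this rejects, for example, the username "token" paired with an account password, which is not one of the listed combinations.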

Please Note

Before testing the connection, make sure every field in the Delta Lake Connection dialog is populated, then click Test.
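The "all fields populated before Test" requirement can be modelled as a simple pre-check. This is a sketch only: the `DeltaLakeConnection` class and both functions are hypothetical stand-ins for the dialog's three fields, not Matillion ETL code.

```python
from dataclasses import dataclass, fields


@dataclass
class DeltaLakeConnection:
    """Hypothetical model of the Delta Lake Connection fields."""
    workspace_id: str
    username: str
    password: str


def missing_fields(conn: DeltaLakeConnection) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(conn) if not getattr(conn, f.name).strip()]


def ready_to_test(conn: DeltaLakeConnection) -> bool:
    """Allow a connection test only once every field is populated."""
    return not missing_fields(conn)


draft = DeltaLakeConnection(workspace_id="dbc-a1b2c3d4-e5f6", username="token", password="")
print(missing_fields(draft))  # ['password']
```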


4. Delta Lake Defaults

Click Finish to create your project and environment.



Creating a Delta Lake on Databricks project on Azure

The following section describes how to create a project in Matillion ETL for Delta Lake on Databricks (Azure).

1. Project Details

  • Project Group: Use the drop-down menu to select an existing project group. Projects are logically arranged in project groups.
  • Project Name: Enter a suitable name for your new project.
  • Project Description: Describe your project. This setting is optional.
  • Private Project: Check this box to make this new project private. Only users granted access can view and work in this project.
  • Include Samples: Untick this box if you do not want to include sample jobs in this project.

2. Cloud Connection

  • Environment Name: Enter a name for your new Matillion ETL environment.
  • Azure Credentials: Use the drop-down menu to choose credentials for the Azure cloud platform. Instance Credentials is selected by default. Click Manage to add a new set of credentials. Read Manage Credentials for more information.

Please Note

Ensure your Instance Credentials are correctly configured for the required cloud platform. For example, the Azure Blob Storage Load component relies on credentials with access to Blob Storage.


3. Delta Lake Connection

Please Note

Before completing the following steps in the Create Project dialog, you must create a Microsoft account to sign in to the Microsoft Azure portal. This lets you deploy a Databricks workspace, which is explained in more detail on the linked page.


  • Workspace ID: Enter your existing Databricks workspace ID. This can be found as part of the URL for your Azure Databricks Workspace portal. Do not include "azuredatabricks.net".
  • Username: The word "token" will appear as default. You do not need to change this. For more information about tokens and setting up this type of authentication for your Databricks workspace account, read Authentication using Azure Databricks personal access tokens.
  • Password: Enter the Token Value for your Databricks workspace account.
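To illustrate how a personal access token is typically used against an Azure Databricks workspace, the sketch below builds (but does not send) a request to the Databricks Clusters API with a Bearer token. It assumes the Workspace ID is the workspace hostname without "azuredatabricks.net" (for example, a made-up "adb-1234567890123456.7"); how Matillion ETL's Test button actually contacts the workspace is not described on this page, and the helper names are hypothetical.

```python
import urllib.request

# Domain the Workspace ID field tells you to leave out.
AZURE_DOMAIN = "azuredatabricks.net"


def azure_workspace_url(workspace_id: str) -> str:
    """Hypothetical helper: rebuild the workspace URL from the Workspace ID."""
    if workspace_id.endswith(AZURE_DOMAIN):
        raise ValueError("enter the Workspace ID without 'azuredatabricks.net'")
    return f"https://{workspace_id}.{AZURE_DOMAIN}"


def clusters_list_request(workspace_id: str, token: str) -> urllib.request.Request:
    """Build (without sending) a GET to the Clusters API, authenticated with a PAT."""
    url = azure_workspace_url(workspace_id) + "/api/2.0/clusters/list"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


req = clusters_list_request("adb-1234567890123456.7", "dapi0123")
print(req.full_url)  # https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/clusters/list
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a real workspace or token.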

Please Note

Before testing the connection, make sure every field in the Delta Lake Connection dialog is populated, then click Test.


4. Delta Lake Defaults

Click Finish to create your project and environment.



Cluster States

In Matillion ETL, each cluster in the Cluster drop-down menu is assigned a state, with a Databricks equivalent. See the table below for more information:

Matillion ETL    Databricks
STOPPED          Terminated
STARTING         Pending
RUNNING          Running
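The state table above amounts to a lookup from Databricks cluster states to Matillion ETL states. The sketch assumes the Databricks states arrive as strings such as "TERMINATED" (case is normalised); states the table does not list, such as resizing or error states, map to `None` here because this page does not say how Matillion ETL displays them. The names are hypothetical.

```python
from typing import Optional

# Databricks state -> Matillion ETL state, taken from the table above.
DATABRICKS_TO_MATILLION = {
    "TERMINATED": "STOPPED",
    "PENDING": "STARTING",
    "RUNNING": "RUNNING",
}


def matillion_state(databricks_state: str) -> Optional[str]:
    """Return the Matillion ETL state, or None for states the table does not cover."""
    return DATABRICKS_TO_MATILLION.get(databricks_state.upper())


print(matillion_state("Pending"))  # STARTING
```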

When a cluster is not running, its databases are not retrieved, and the Database drop-down menu offers no selections. Attempting to select a database on a cluster in the STOPPED state automatically triggers the cluster to start. It can take a few minutes for the cluster to move from STOPPED to RUNNING, and it shows the STARTING state during this time.

Clicking Back and then returning to Delta Lake Defaults refreshes the state of the clusters. You must do this to see when a cluster has moved from STOPPED to STARTING or RUNNING. Refreshing the cluster states also reloads the Database drop-down menu.
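The refresh behaviour described above (re-reading cluster state until it reaches RUNNING) can be sketched as a polling loop. This is an assumption-laden illustration: `fetch_state` is a hypothetical callable standing in for "click Back and return to Delta Lake Defaults", and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_until_running(
    fetch_state: Callable[[], str],
    timeout_s: float = 600,
    interval_s: float = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a hypothetical state source until the cluster reports RUNNING.

    Returns True once RUNNING, False if the timeout expires first.
    Raises on any state outside STOPPED -> STARTING -> RUNNING.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state()  # one "refresh" of the cluster state
        if state == "RUNNING":
            return True
        if state not in ("STOPPED", "STARTING"):
            raise RuntimeError(f"unexpected cluster state: {state}")
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Simulated refreshes: the cluster starts, then becomes available.
states = iter(["STOPPED", "STARTING", "STARTING", "RUNNING"])
print(wait_until_running(lambda: next(states), sleep=lambda s: None))  # True
```

Injecting `sleep` keeps the loop testable without real waits; only once it returns True would the Database drop-down menu be expected to list databases.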



Next Steps

When you first log in to Matillion ETL, we recommend replacing the default username and password with your own secure login credentials. For more information about changing these credentials, read User Configuration in the Admin Menu.