Git Integration with Matillion ETL

Overview

This article explains the architecture of Matillion ETL's Git integration feature and takes a detailed look at the available actions, including commit, create branch, merge, push, fetch, and more.


This article is part of a series of technical documentation covering the Git integration feature within Matillion ETL.


Getting Started

When using the Git version control feature in Matillion ETL, it helps to understand the underlying architecture and its associated terminology. There are six components involved:

  1. A Matillion ETL project - the project is the top-level structure, containing jobs and other collateral within Matillion ETL. Each project is isolated, and user access can be granted or denied on a per-project basis.
  2. A Matillion ETL version - a project can contain more than one version. When used with Git, think of a version as an independent working area. Each version points to a single Git commit in the local Git repository.
  3. The local Git repository (repo) - the local repository stores files on the Matillion ETL instance's filesystem. It is created automatically when a project is Git-enabled.
  4. The remote Git repository (repo) - a self-hosted or cloud-hosted Git repository that is external to Matillion ETL, and which was set up by the user prior to Git-enablement in Matillion ETL. Users can push local repository commits to their remote repository; users can also fetch newer commits from the remote repository into the local repository.
  5. Commit - a commit is a point-in-time copy of a Matillion ETL version, typically with collateral stored in the version, such as orchestration and transformation jobs.
  6. Branch - a branch is a collection of one or more commits in Git. Typically, a Git project will have a master branch, from which other branches are created to develop and test code. A branch model allows users to develop new code without adding untested code to the master branch; once the code has been tested, it can be merged safely into the master branch.

The above diagram includes a project (project_Dev), and within this project are three separate versions. Each of these versions could, as an example, belong to an individual developer in a development team.

On the right of the project is the Local Repo, which contains two branches. There is the master branch (Branch Master), and an additional branch (Branch Feature_1). Within both of these branches are three commits, and the diagram shows via the shorter white arrows which version is pointing at which commit.

On the right of the Local Repo is the Remote Repo. In this diagram, the Remote Repo contains a backup copy of Local Repo. However, readers should note that the Remote Repo is missing Commit 3 from Branch Feature_1. This simply means that the Local Repo's changes require a push to the Remote Repo, at which point Commit 3 of Branch Feature_1, which is being developed in Version ver_z, will be backed up in the cloud-hosted remote repository.
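
The relationship described above can be sketched in a few lines of Python. This is only an illustrative model and not how Matillion ETL stores anything internally: the Repo class and the unpushed_commits and versions_needing_push functions are hypothetical names, and commits are reduced to plain labels. The data mirrors the diagram, in which the Remote Repo lacks Commit 3 of Branch Feature_1 and Version ver_z points at that commit.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """A repository reduced to: branch name -> ordered commit labels."""
    branches: dict[str, list[str]] = field(default_factory=dict)


def unpushed_commits(local: Repo, remote: Repo) -> dict[str, list[str]]:
    """Return, per branch, the local commits that the remote does not hold."""
    missing = {}
    for name, commits in local.branches.items():
        remote_commits = set(remote.branches.get(name, []))
        ahead = [c for c in commits if c not in remote_commits]
        if ahead:
            missing[name] = ahead
    return missing


def versions_needing_push(pointers, missing):
    """Return the versions whose current commit is not yet on the remote."""
    return [
        version
        for version, (branch, commit) in pointers.items()
        if commit in missing.get(branch, [])
    ]


# Both local branches hold three commits; the remote lacks Commit 3
# on Branch Feature_1, as in the diagram.
local_repo = Repo({
    "Master": ["Commit 1", "Commit 2", "Commit 3"],
    "Feature_1": ["Commit 1", "Commit 2", "Commit 3"],
})
remote_repo = Repo({
    "Master": ["Commit 1", "Commit 2", "Commit 3"],
    "Feature_1": ["Commit 1", "Commit 2"],
})

# Each version points at exactly one commit in the local repository.
version_pointers = {"ver_z": ("Feature_1", "Commit 3")}

missing = unpushed_commits(local_repo, remote_repo)
print(missing)                                           # {'Feature_1': ['Commit 3']}
print(versions_needing_push(version_pointers, missing))  # ['ver_z']
```

In this model, a push is simply the step that copies the missing commits to the remote so that both results become empty.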

Please Note

Matillion ETL's Git integration feature does not support multi-factor authentication (MFA) at this time.


User Actions

The rest of this article clarifies what actions a user can take when using Git in Matillion ETL.

For many of the sections below, the screenshots reference a fictitious development team's workflow, built around a master [default] branch plus one branch each for two developers, Alice and Bob. Default, Alice, and Bob each have their own Matillion ETL version; Alice's and Bob's versions serve as independent working areas for development work, which is merged once it has been tested and approved.


Initializing Git

In the top-left of the Matillion ETL user interface, click the Project button, then navigate down and click Git.

When performing this action for the first time, users will have two options:

  1. Init Local Repository: select this option to initialize a local Git repository and connect a Matillion ETL project to Git for the first time.
  2. Clone Remote Repository: select this option to connect a new Matillion ETL project to an existing remote Git repository, copying the commits and branches from the remote repository into a local repository.

In this instance, we select Init Local Repository. We click OK to confirm this action and commit the current state of this Matillion ETL project to what will become our master branch.

After this, the Git Integration screen loads. Because this is our first interaction with this screen since we initialized the local repository, we currently only have one commit, labelled in the Git Integration screen: "Initial commit", and this first commit belongs to our master branch, which is currently our only branch. This interface also provides Author and Date details, along with numerous clickable action buttons, all of which are covered in this article.


Clone Remote Repository

To clone a remote repository for use with Matillion ETL, follow these steps:

SSH

1. Click Project and then click Git.

2. Click Clone Remote Repository.

3. In the Clone Repository dialog, paste the SSH URL of your remote repository into the Remote URI field. Matillion ETL will automatically update the dialog upon reading the remote URI and adjust the input fields accordingly.

4. Paste a valid SSH private key into the Private Key field.

5. Input the passphrase associated with the private key into the Passphrase field.

6. Select an encryption type from the Encryption Type field.

7. If you selected KMS as the encryption type, you must also select a KMS master key.

8. Click OK.

9. The Remove existing jobs? pop-up will ask whether you are ready to confirm the cloning of your remote repository. If you click Yes, all existing jobs on the Matillion ETL instance will be deleted.

HTTPS

1. Click Project and then click Git.

2. Click Clone Remote Repository.

3. In the Clone Repository dialog, paste the HTTPS URL of your remote repository into the Remote URI field. Matillion ETL will automatically update the dialog upon reading the remote URI and adjust the input fields accordingly.

4. Input the username and password of your repository host account (for example, Bitbucket).

Important Information

If you are using GitHub, enter a GitHub personal access token in the Password text field, as Matillion ETL does not support GitHub passwords. To learn how to create a new token, read Creating a personal access token.

5. Select an encryption type from the Encryption Type field.

6. If you selected KMS as the encryption type, you must also select a KMS master key.

7. Click OK.

8. The Remove existing jobs? pop-up will ask whether you are ready to confirm the cloning of your remote repository. If you click Yes, all existing jobs on the Matillion ETL instance will be deleted.


Git Action: Commit

We are going to add a commit featuring work added to this project by another team member. Accordingly, we create a new version by going to Project → Manage Versions and clicking the + button.

We assign the new version a name and unlock it. Then, we switch to this version.

To perform another commit, begin by clicking Project → Git. Then, in the Git Integration UI, click the bottom-left button (designated by the arrow in the next image); this is the commit button.

Upon pressing the commit button, the Commit window opens. From here, users can select their branch. Currently, the only option available in this example remains the master branch, so we type a new branch into this field, thus creating a new branch. This forthcoming commit will sit under this new branch. Beneath the Branch Name field, users can tick and un-tick the checkbox next to each "change". In this instance, all three changes have been ticked to commit. Finally, a Commit Message has been left by the user making this commit. A Commit Message is required when making a commit.

To confirm, click OK.

Caution

You cannot check out a commit while a related Orchestration or Transformation Job is running in the same version. You must switch to a different version to run the Job.

Upon confirmation, the user is returned to the Git Integration UI, and we can see in the image below that we now have a second commit in our structure, this time on the "Alice_Branch" branch. The hollowed-out commit circle, which in this case belongs to our newest commit, highlights the currently active commit.

To switch commit, click the button at which the arrow points in the image below, and confirm or cancel the commit switch. This action will switch your current Matillion ETL version to the chosen commit.

Please Note

When you switch commit, Matillion ETL will change the current version's Description value to whatever the value was at the time of the Git commit.


Git Action: Create Branch

To create a new branch from the Git Integration UI, click the button at which the arrow is pointing in the below image.

Name the new branch, and click OK.

The below image illustrates all three Git branches in our Git project (Note: a commit has been made on "Bob's_Branch").


Git Action: Merge

Matillion ETL allows users to merge branches. When performing a merge, the latest commit of one branch is merged into a target branch, which is typically the current branch. Performing this action creates a new commit and, by default, switches the current Matillion ETL version to the new commit.

To begin performing a merge, click the merge symbol, pointed to in the below image.

This will load the Merge user interface. The four fields in this UI are delineated in the list below.

  1. Merge to - select which branch to merge into. Users can choose to merge to a branch that is not the currently selected branch.
  2. Ours - this field denotes the latest commit of the branch to merge into.
  3. Theirs - this field denotes the latest commit of the branch to merge from.
  4. Commit Information - A required message for the new commit.

Users can also tick or un-tick the "Checkout After Merge" checkbox. When ticked, the "switch commit" action is performed after the merge, so the current version points at the new merge commit. By default, this box is ticked.

In the next image, the merged commit shows that the corresponding branch has been joined back to the master branch.

Please Note

Remember, a Matillion ETL version points at a specific Git commit. The currently selected branch is determined by which commit the current version is pointing at.
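
The merge behaviour described in this section can be modelled with a short Python sketch. It is a simplified illustration under stated assumptions, not Matillion ETL's implementation: the Commit class, the new_commit and merge functions, and the version names "default" and "alice" are all hypothetical. Each branch is reduced to its latest commit, and a merge always creates a new commit whose parents are the "Ours" and "Theirs" commits.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)


@dataclass(frozen=True)
class Commit:
    commit_id: int
    message: str
    parents: tuple = ()


def new_commit(message, *parents):
    """Create a commit that records its parent commits."""
    return Commit(next(_next_id), message, tuple(parents))


def merge(branches, versions, current_version, merge_to, merge_from,
          commit_information, checkout_after_merge=True):
    """Merge the tip of merge_from into merge_to and return the new commit."""
    if not commit_information:
        raise ValueError("Commit Information is required")
    ours = branches[merge_to]      # latest commit of the branch merged into
    theirs = branches[merge_from]  # latest commit of the branch merged from
    merged = new_commit(commit_information, ours, theirs)
    branches[merge_to] = merged    # the target branch now ends at the merge
    if checkout_after_merge:
        # "Switch commit": the current version now points at the new commit.
        versions[current_version] = merged
    return merged


initial = new_commit("Initial commit")
alice_work = new_commit("Alice's jobs", initial)
branches = {"master": initial, "Alice_Branch": alice_work}
versions = {"default": initial, "alice": alice_work}

merged = merge(branches, versions, "default", "master", "Alice_Branch",
               "Merge Alice_Branch into master")
print(merged.parents == (initial, alice_work))  # True
print(versions["default"] is merged)            # True
```

Passing checkout_after_merge=False in this sketch leaves the current version on its previous commit, which corresponds to un-ticking the checkbox.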


Git Action: Configure Remote

For these steps, users will need to set up a remote repository, such as GitHub, AWS CodeCommit, or Bitbucket.

1. To configure a remote repository, click the button in the Git Integration UI, as in the image below.

2. Enter the remote repository's address in the Remote URI field. The URI can begin with https://, ssh://, or git@.

3. Click OK.

4. Click the button to open Configure Default Credentials.

SSH

1. Paste a valid SSH private key into the Private Key field.

2. Input the passphrase associated with the private key into the Passphrase field.

3. Select an encryption type from the Encryption Type field.

4. If you selected KMS as the encryption type, you must also select a KMS master key.

5. Click OK.

HTTPS

1. Input the username and password of your repository host account.

Important Information

If you are using GitHub, enter a GitHub personal access token in the Password text field, as Matillion ETL does not support GitHub passwords. To learn how to create a new token, read Creating a personal access token.

2. Select an encryption type from the Encryption Type field.

3. If you selected KMS as the encryption type, you must also select a KMS master key.

4. Click OK.
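
As a rough illustration of how the Remote URI form relates to the SSH and HTTPS credential steps above, the following Python sketch maps a URI to the fields a user would expect to fill in. The function name credential_fields and the example.com addresses are hypothetical, and the mapping is inferred from the steps in this article rather than taken from Matillion ETL's code.

```python
def credential_fields(remote_uri: str) -> list[str]:
    """Return the credential fields that apply to a given Remote URI."""
    uri = remote_uri.strip()
    if uri.startswith("https://"):
        # HTTPS remotes authenticate with a username and password
        # (for GitHub, a personal access token goes in the Password field).
        return ["Username", "Password", "Encryption Type"]
    if uri.startswith("ssh://") or uri.startswith("git@"):
        # Both forms are SSH remotes and authenticate with a private key.
        return ["Private Key", "Passphrase", "Encryption Type"]
    raise ValueError(f"unsupported Remote URI: {remote_uri!r}")


# Hypothetical repository addresses, for illustration only.
print(credential_fields("https://example.com/team/etl-jobs.git"))
print(credential_fields("git@example.com:team/etl-jobs.git"))
```

A check like this can be handy when scripting validation of repository addresses before they are pasted into the Remote URI field.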


Git Action: Fetch

Performing a fetch pulls branches into the local repository from another repository, in this case the remote repository. A remote repository is an effective way to keep a backup "master copy" of code.

To fetch from a remote repository, click the middle action button on the right of the Git UI, as in the image below.

Next, provide the Username and Password of your remote repository account.


Git Action: Push

Use the push action to send the branches of a local repository to a remote repository.

Provide your remote repository account's Username and Password. Users can select a type of push to perform:

  • Atomic Push - Guarantees that either all references will be pushed to the remote, or none of them will; this option avoids partial pushes.
  • Force Push - Forces the local revision to be pushed into the remote repository. This action can cause the remote repository to lose commits, and should be used with caution.
  • Thin Push - Reduces the data sent when the sender and receiver share many of the objects.

Click OK.
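
The difference between a plain, atomic, and force push can be illustrated with a small Python simulation. This is a sketch under simplifying assumptions, not Git's or Matillion ETL's actual implementation: the push function is hypothetical, each branch is a list of commit labels, and a branch update is accepted only when the remote history is a prefix of the local history (a fast-forward). Thin push is not modelled, because it affects only how much data is transferred.

```python
def push(local, remote, atomic=False, force=False):
    """Return (new_remote_state, rejected_branches) for a simulated push."""
    accepted, rejected = {}, []
    for name, commits in local.items():
        current = remote.get(name, [])
        # Fast-forward: the remote history is a prefix of the local history.
        fast_forward = commits[:len(current)] == current
        if fast_forward or force:
            accepted[name] = list(commits)
        else:
            rejected.append(name)
    if atomic and rejected:
        # Atomic: one rejected branch means no branch is updated at all.
        return dict(remote), rejected
    return {**remote, **accepted}, rejected


local = {"master": ["a", "b", "c"], "Alice_Branch": ["a", "x"]}
remote = {"master": ["a", "b"], "Alice_Branch": ["a", "y"]}

# Plain push: master is updated, Alice_Branch is rejected (partial push).
print(push(local, remote))
# Atomic push: nothing is updated because one branch was rejected.
print(push(local, remote, atomic=True))
# Force push: Alice_Branch is overwritten and remote commit "y" is lost.
print(push(local, remote, force=True))
```

The force case shows why the option should be used with caution: the remote commit "y" no longer appears in any branch after the push.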


Local and Remote Branch Divergence

A divergence occurs when the remote branch has commits that the local branch does not have, and the local branch has commits that the remote branch does not have.

This can happen in Matillion ETL's Git integration when merges are performed outside Matillion ETL in the remote repository (which creates a commit that exists remotely but not locally) while work continues on the local branch.

A divergence will look similar to the below image.

In the above image, the blue master branch (remote) has a commit that the green master branch (local) does not have, and vice versa.

To fix this divergence, merge the blue branch into the green branch; this will create the pink branch. A pink branch indicates that the remote and local branches point at the same commit(s).

As a best practice, Matillion advises users to ensure that only pink branches exist, meaning that remote and local are synchronized.

Caution

If the blue branch (remote) contains commits that the green branch (local) does not, you stand to lose work when pushing the green branch, as work may be overwritten.
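
The definition of divergence given in this section can be expressed as a simple check. The Python sketch below is illustrative only: branch_state is a hypothetical helper, commits are plain labels, and real Git compares commit ancestry rather than lists. It classifies a local branch against its remote counterpart using the same two conditions described above.

```python
def branch_state(local_commits, remote_commits):
    """Classify a local branch relative to its remote counterpart."""
    local_only = [c for c in local_commits if c not in remote_commits]
    remote_only = [c for c in remote_commits if c not in local_commits]
    if local_only and remote_only:
        return "diverged"      # each side has commits the other lacks
    if local_only:
        return "local ahead"   # a push would bring the remote up to date
    if remote_only:
        return "local behind"  # the remote has commits not yet merged locally
    return "in sync"           # both point at the same commits ("pink")


local_master = ["c1", "c2", "local-change"]
remote_master = ["c1", "c2", "remote-merge"]
print(branch_state(local_master, remote_master))  # diverged

# After merging the remote branch into the local branch, the local branch
# holds every remote commit plus a merge commit, so it can be pushed safely.
merged_local = ["c1", "c2", "local-change", "remote-merge", "merge"]
print(branch_state(merged_local, remote_master))  # local ahead
print(branch_state(merged_local, merged_local))   # in sync
```

Only the "local ahead" and "in sync" states are safe starting points for a push in this model; the "diverged" and "local behind" states are the ones in which remote work could be overwritten.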


Resolving a Merge Conflict

Read about MergeManager to learn how to efficiently resolve merge conflicts in Matillion ETL.


SSH Authentication

Matillion ETL's Git source control management feature currently supports the following private key formats:

  • DSA
  • RSA
  • ECDSA

Private keys in OpenSSH format are not currently supported, and will produce an error message when used as a private key for performing a push to a remote repository.

However, you can convert your OpenSSH format private key to a supported (PEM) key format using the command below. Note that this command rewrites the key file in place, so keep a backup copy of the original key:

ssh-keygen -p -f YOUR_PRIVATE_KEY -m pem
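
Before pasting a key into the Private Key field, it can help to check which format the key file uses. The Python sketch below inspects the first line of the key text. The classify_private_key function is a hypothetical helper; the header strings are the standard PEM and OpenSSH markers, while the assumption that each PEM header corresponds to one of the supported key types listed above is inferred from this article and not confirmed against Matillion ETL itself.

```python
# Traditional PEM headers, mapped to the key types listed in this article.
PEM_HEADERS = {
    "-----BEGIN RSA PRIVATE KEY-----": "RSA",
    "-----BEGIN DSA PRIVATE KEY-----": "DSA",
    "-----BEGIN EC PRIVATE KEY-----": "ECDSA",
}
OPENSSH_HEADER = "-----BEGIN OPENSSH PRIVATE KEY-----"


def classify_private_key(key_text: str) -> str:
    """Return 'RSA', 'DSA', 'ECDSA', 'OPENSSH', or 'UNKNOWN'."""
    lines = key_text.strip().splitlines()
    header = lines[0].strip() if lines else ""
    if header == OPENSSH_HEADER:
        return "OPENSSH"  # needs converting with ssh-keygen -m pem first
    return PEM_HEADERS.get(header, "UNKNOWN")


# Placeholder key bodies, for illustration only.
openssh_key = (
    "-----BEGIN OPENSSH PRIVATE KEY-----\n"
    "placeholder\n"
    "-----END OPENSSH PRIVATE KEY-----\n"
)
pem_key = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "placeholder\n"
    "-----END RSA PRIVATE KEY-----\n"
)
print(classify_private_key(openssh_key))  # OPENSSH
print(classify_private_key(pem_key))      # RSA
```

A result of OPENSSH indicates that the conversion command above should be run before the key is used.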


Disconnecting Git from a Matillion ETL project

1. Connect to your Matillion ETL instance over SSH, using your private key, the centos username, and the Matillion ETL instance's IP address or DNS name.

  1. From a Mac, you can do this using the following command from the Terminal: ssh -i /path/to/private.key centos@matillion-ip-or-dns
  2. From a PC, follow the instructions here: Connecting to your Linux instance from Windows using PuTTY.

2. Stop Tomcat using the command sudo service tomcat8 stop

3. Remove records from the SCM-related tables in the PostgreSQL repo using the following commands:

  1. sudo su - postgres
  2. psql
  3. SELECT * FROM projects WHERE rec::text like '%project_name%';
    • This will return the project_id to use in the following steps.
  4. DELETE FROM scm_filesync where namespace = project_id;
  5. Ctrl-d to exit psql
  6. Ctrl-d to go back to the CentOS user

4. Remove any subdirectories under the /usr/share/tomcat8/scm directory using the following command: sudo rm -rf /usr/share/tomcat8/scm/project_id

5. Restart Tomcat: sudo service tomcat8 start

6. Wait approximately 60 seconds and then log back in to Matillion ETL.

You should now have the option to re-initialize your local repository and connect to a remote repository when you click Project → Git.
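
Because step 4 above uses rm -rf, an empty or malformed project_id would target the whole scm directory or a path outside it. The Python sketch below shows one way to build the directory path defensively before running the command by hand. The scm_directory function is a hypothetical helper, the example ID 1234 is made up, and the assumption that a project_id contains only letters, digits, underscores, and hyphens is not confirmed by this article.

```python
import posixpath
import re

SCM_ROOT = "/usr/share/tomcat8/scm"


def scm_directory(project_id: str) -> str:
    """Return the SCM directory for one project, refusing unsafe IDs."""
    # Assumption: IDs contain only letters, digits, underscores, and hyphens.
    # This rejects empty strings, slashes, and ".." path segments.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", project_id):
        raise ValueError(f"refusing unsafe project_id: {project_id!r}")
    return posixpath.join(SCM_ROOT, project_id)


# Print the command for review instead of executing it.
print("sudo rm -rf " + scm_directory("1234"))
```

The sketch only prints the command; running it remains a manual step, carried out after Tomcat has been stopped as described above.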


Contact Support

If you have any questions about using the Matillion ETL Git feature, please refer to our Git integration FAQs documentation, or Contact Support.