Grid variables are a special type of job variable that can be declared in Matillion ETL. Grid variables are 2D arrays that hold scalar values in named columns.
Grid variables can be used in many components (usually via the Use Grid Variable checkbox in component property dialogs) where lists of data need to be passed around. For example, a grid variable can be used to easily populate table metadata, or to pass arrays of data for use in Python scripts. In this article we give just a few examples of their utility.
The following Matillion ETL components can be used when working with grid variables.
- Table Metadata To Grid: Takes metadata from a table and loads it into a grid variable.
- Remove From Grid: Removes rows from a preexisting grid variable.
- Append To Grid: Adds rows to a preexisting grid variable.
- Query Result To Grid: Queries a table and loads the resulting data into a grid variable.
Creating grid variables
1. Right-click a job and click Variables → Manage Grid Variables to open a dialog listing all grid variables for that particular job. You can also right-click on the job canvas and then click Manage Grid Variables.
2. In the Manage Grid Variables dialog you can perform the following actions:
- Click at the bottom-left of the dialog to create a new grid variable.
- Click to the left of a variable name to display the variable's Description property.
- Click under the Columns heading to the right of the variable to edit the variable's properties and columns.
- Click under the Values heading to the right of the variable to edit the variable's default values.
- Click to the right of the variable to delete the variable. This cannot be undone, so be sure you want to do this before clicking Yes in the confirmation dialog.
3. Clicking to add a grid variable will open the Create Grid Variable wizard. (Note: editing an existing grid variable will open the same wizard but with the title Update Grid Variable.) Provide the following details:
- Name: A name to identify the variable in other dialogs and lists.
- Behaviour: Determines the variable's branch behavior inside a job. That is, how the variable is updated when more than a single job branch is making use of it. For more information on variable behavior, read this article.
- Visibility: Select Public or Private. If Private, this variable cannot be discovered and overwritten when this job is called from a Run Orchestration or Run Transformation component.
- Description: A description of the grid variable. This description has no functionality beyond reminding you what the variable is for.
4. Below the grid variable properties is a list of columns contained by the grid. Each column has a Name and Type. To create a new column, click and enter the following details on the blank line created:
- Column Name: A name to identify the column. This must be unique within the grid variable but can be the same as column names defined in other grid variables. The name can only contain letters, numbers, underscores (_) and dollar symbols ($).
- Column Type: The data type that the column will contain. Select Text, Numeric, or DateTime. For more information on variable types, read this article.
To edit an existing column name or type, simply click on the value you want to change. To delete a column, click the column name to select it and then click .
Click Next to continue to the second page of the wizard, where you can enter default values for the columns.
5. On the second page of the Create Grid Variable wizard, you can enter a default value for each column you have defined in the grid variable. This is not always required, as the grid variable can be populated in a variety of ways that do not require default values to be defined first (for example, using the Table Metadata to Grid component.
When you have finished entering default values, click OK.
Using to include columns from data sources.
Grid variables can be used to populate choices in multiple properties within Data Staging components. The Data Selection property, for example, which is used to select which columns are returned from a query:
If you select Use Grid Variable in this dialog, you can use a pre-defined grid variable to select the columns. You would do this as follows:
1. Create a new grid variable, as described above, which contains a single column named "Columns".
2. Populate the Default Values of the grid variable with the names of the columns you want to include in the Data Selection property.
3. In the Data Selection property of your job component, specify which grid variable and which column from the variable to use. The property will then be populated with the default values you entered in the grid variable. You can use the same grid variable across all components in the job where you need to specify the same data selection, making it easier to consistently set the property and allow you to make global changes to all such properties in the future.
Using to populate Metadata
In the Create Table and Create External Table components, table metadata can be assigned from a grid variable by selecting Use Grid Variable. You must first create a new grid variable, as described above, with columns that contain the data you want to populate the metadata with. You then specify which grid variable the property will use, and which columns in the grid variable map to each piece of metadata:
Using to populate variables
If you have a large number of variables to pass into a job along your production line, it can be convenient to make a grid variable that can then populate the variables for you.
In the Run Transformation and Run Orchestration components, the Set Scalar Variables property is used to set variables used by the component. You can use a grid variable to assign these properties.
1. First create a new grid variable, as described above, with one column that contains variable names and another that contains the values you want to put into each variable.
2. In the Set Scalar Variables dialog of your Run Transformation or Run Orchestration component, select Use Grid Variable.
3. Select the grid variable you have defined and specify which of its columns contains the variable names and which contains the values.
Using to pass grids
Similar to passing regular variables, grid variables can also be passed on to jobs that use the Run Orchestration and Run Transformation components, using the Set Grid Variables property. The orchestration or transformation job that you are running with the component must have a grid variable defined in it for this operation to succeed.
To pass the grid variable, in the Set Grid Variables dialog, select Grid from the Set Values drop-down. This will expand the dialog to give you options to select the grid variable you want to use and the grid variable columns you will use. In this way, grid columns can be mapped from the calling job to the called job.
Using grid variables in Python
Variables become first class variables in Python and Bash scripts, and care should be taken to avoid naming them in a manner that clashes with key words in either language. We recommend using a prefix (for example, v_) to ensure no such conflicts occur.
It's possible to access and update grid variables in Python scripts. To get grid variable data and load it into a Python array, use:
To place Python array data into a grid variable use:
The following is an example Python 3 script that takes data from one grid variable, "people", puts the data into an array, and it the variable before updating a different grid variable, "names", using that array.
array = context.getGridVariable('people') for data in array: print(data) context.updateGridVariable('names', array)