Create Table Component


Create or replace a table.

Warning: Depending on the chosen settings, this component is destructive. Take care when running this component as it may remove existing data.


Redshift Properties

Property Setting Description
Name String Input the descriptive name for the component.
Schema Select Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support.
New Table Name Text The name of the table to create or replace.
Create/Replace Select Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job if:
  1. Users did not expect a table to already exist.
  2. Users did not expect to have a different schema to the one defined in this component.
Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the component succeeds, the table matches the schema defined in this component. However, any data from the existing table (to be replaced) will be lost.
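The three options correspond broadly to the following DDL (an illustrative sketch only; the table and column names are invented, and the exact statements Matillion ETL generates may differ):

```sql
-- Create: errors if the table already exists; existing data is untouched
CREATE TABLE public.stage_orders (order_id INTEGER, amount NUMERIC(10, 2));

-- Create if not exists: silently keeps an existing table, even if its schema differs
CREATE TABLE IF NOT EXISTS public.stage_orders (order_id INTEGER, amount NUMERIC(10, 2));

-- Replace: drops the old table (and its data), then recreates it
DROP TABLE IF EXISTS public.stage_orders;
CREATE TABLE public.stage_orders (order_id INTEGER, amount NUMERIC(10, 2));
```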
Table Metadata Column Name The name of the new column.
Data Type Text: this type can hold any kind of data, subject to a maximum size. More...
Integer: this type is suitable for whole-number types (no decimals). More...
Numeric: this type is suitable for numeric types, with or without decimals. More...
Real: this type is suitable for data of a single precision floating-point number. More...
Double Precision: this type is suitable for data of a double precision floating-point number. More...
Boolean: this type is suitable for data that is either 'true' or 'false'. More...
Date: this type is suitable for dates without times. More...
DateTime: this type is suitable for dates, times, or timestamps (both date and time). More...
Size For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. With Redshift, since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes.
For Numeric types, this is the total number of digits allowed, whether before or after the decimal point.
Decimal Places Relevant only for 'Numeric' data, it is the maximum number of digits that may appear to the right of the decimal point.
Encoding Type The Redshift compression encoding. See the Redshift documentation for details of the various available encodings.
Note: Although users can manually specify the encoding type here, it is advised to use automatic compression analysis to select the optimal compression. When loading into empty tables, this is performed by the S3 Load (unless you disable COMPUPDATE). It can also be performed by the Table Output component, provided you are truncating the table.
Allow Nullable When 'True', Matillion ETL specifies that the column accepts null values. When 'False', Matillion ETL specifies that the column is not allowed to contain null values. Default is 'True'.
Distribution Style Select All: copy rows to all nodes in the Redshift Cluster.
Auto: Redshift assigns an optimal distribution style based on the size of the table data.
Even: the leader node distributes the rows across the slices in a round-robin fashion, regardless of the values in any particular column.
Key: the rows are distributed according to the values in one column. The leader node places matching values on the same node slice. If you distribute a pair of tables on the joining keys, the leader node collocates the rows on the slices according to the values in the joining columns, so that matching values from the common columns are physically stored together.
For more information, please refer to the Redshift documentation.
Distribution Key Select This property is only activated if the Distribution Style is set to 'Key'. Users must select which column is used to determine which cluster node the row is stored on.
Sort Key Select This setting is optional, and allows users to specify the columns from the input that should be set as the table's sort key. Sort keys are critical for efficient performance. For more information, we advise users refer to the Redshift documentation.
Sort Key Options Select Choose whether the sort key is of a 'compound' or 'interleaved' variety. For more information, please refer to the Redshift documentation.
Primary Key Select This setting is optional, and specifies the column from the input that should be set as the table's primary key.
Identity Columns Column Name The name of the column. It must match the name of a column defined in the Table Metadata property, which will be set as an identity column. An identity column contains automatically generated values. The specified column must be of Data Type 'Integer' or 'Numeric' with zero (0) decimal places.
Seed Specifies the starting value.
Step Specifies the increment between values.
Backup Table Select Specify whether the created table is to be included in automated and manual cluster snapshots. 'No' has no effect on automatic replication of data to other nodes within the cluster, meaning that tables set to 'No' are still restored in the event of a node failure. The default setting is 'Yes'.
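Taken together, the properties above translate into a single Redshift CREATE TABLE statement. The following is a hypothetical sketch (table, column, and encoding choices are invented for illustration; the exact DDL Matillion ETL generates may differ):

```sql
CREATE TABLE public.web_events (
    event_id   INTEGER IDENTITY(1, 1) NOT NULL, -- Identity Columns: seed 1, step 1
    user_id    INTEGER NOT NULL,                -- Allow Nullable: False
    event_type VARCHAR(64) ENCODE zstd,         -- Encoding Type
    created_at TIMESTAMP,
    PRIMARY KEY (event_id)                      -- Primary Key
)
BACKUP YES                                      -- Backup Table
DISTSTYLE KEY DISTKEY (user_id)                 -- Distribution Style / Distribution Key
COMPOUND SORTKEY (created_at);                  -- Sort Key / Sort Key Options
```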

Snowflake Properties

Property Setting Description
Name String Input the descriptive name for the component.
Create/Replace Select Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job if:
  1. Users did not expect a table to already exist.
  2. Users did not expect to have a different schema to the one defined in this component.
Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the component succeeds, the table matches the schema defined in this component. However, any data from the existing table (to be replaced) will be lost. Note: Since other database objects might depend upon this table, drop ... cascade is used, which may remove many other database objects in addition to the table itself.
Database Select Choose the database that the newly created table will belong to.
Schema Select Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support.
New Table Name Text The name of the table to create or replace. Note: This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. To change this behaviour, please consult Snowflake documentation on case sensitivity.
Table Type Select The type of Snowflake table to create:
Permanent: a table that holds data indefinitely and that can be restored using Snowflake's Time Travel.
Temporary: a table that is automatically destroyed at the end of the Snowflake session.
Transient: a table that holds data indefinitely, but that cannot be restored.
Columns Column Name The name of each new column.
Data Type VARCHAR: this type can hold any kind of data, subject to a maximum size. More...
NUMBER: this type is suitable for whole-number types (no decimals). More...
Note: You can enter AUTOINCREMENT or IDENTITY as the default value for any numeric-type column and the column will automatically give an incremented value on each new row starting from 1. This feature is useful when creating a unique key on the table.
FLOAT: this type is suitable for numeric types, with or without decimals. More...
BOOLEAN: this type is suitable for data that is either 'true' or 'false'. More...
DATE: this type is suitable for dates without times. More...
TIMESTAMP: this type is suitable for timestamps. More...
TIME: this type is suitable for times only. More...
VARIANT: a flexible type that can be used for any purpose. More...
Size For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. Since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes.
Precision Relevant only for Numeric, it is the maximum number of digits that may appear to the right of the decimal point.
Default Value The default value under this column for any row.
Not Null True if this column does not accept null values.
Unique Mark this column as unique, declaring that all values in it are unique with respect to one another. You may have multiple unique columns per table.
Primary Keys Select Declare one column to be a primary key. Primary keys are, by default, unique. Users can only have one primary key per table.
Clustering Keys Select Specify clustering key(s) to use. Defining clustering keys can allow for more efficient sorting and scanning of large tables. More...
Data Retention Time in Days Integer Set a number of days for which data is retained after deletion. More...
Comment Text Attach a comment to the table. More...
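On Snowflake, the same properties map onto a single CREATE TABLE statement. A hypothetical sketch follows (all names invented; note the quoted identifiers, which are why the New Table Name property is case-sensitive):

```sql
CREATE TABLE "ANALYTICS"."PUBLIC"."STAGE_ORDERS" (  -- Database / Schema / New Table Name
    "ORDER_ID" NUMBER AUTOINCREMENT,                -- Default Value: AUTOINCREMENT
    "SKU"      VARCHAR(64) NOT NULL UNIQUE,         -- Not Null / Unique
    "AMOUNT"   NUMBER(10, 2),
    "PLACED"   DATE,
    PRIMARY KEY ("ORDER_ID")                        -- Primary Keys
)
CLUSTER BY ("PLACED")                               -- Clustering Keys
DATA_RETENTION_TIME_IN_DAYS = 1                     -- Data Retention Time in Days
COMMENT = 'Staging table for orders';               -- Comment
```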

BigQuery Properties

Property Setting Description
Name String Input the descriptive name for the component.
Project Select Select the project that the newly created table will belong to.
Dataset Select Select the table dataset. For more information on using datasets, please refer to the BigQuery documentation.
New Table Name Text The name of the table to create or replace.
Table Metadata Field Name The name of the new field.
Data Type For more information on available BigQuery data types please refer to the GCP documentation.
String: this type can hold any kind of data, subject to a maximum size.
Integer: this type is suitable for whole-number types (no decimals).
Float: this type is suitable for numeric types, with or without decimals.
Numeric: this data type is suitable for data of an exact numeric value, allowing 38 digits of precision and 9 decimal digits of scale.
Boolean: this type is suitable for data that is either 'true' or 'false'.
Date: a formatted date object without time. See the GCP documentation.
Time: a formatted time object without date. See the GCP documentation.
DateTime: a formatted timestamp containing both date and time that is easily readable by the user. See the GCP documentation.
Timestamp: this type is a timestamp left unformatted (exists as Unix/Epoch Time).
Mode The field mode. Default is 'NULLABLE'.
NULLABLE: Field allows null values.
REQUIRED: Field does not accept null values.
REPEATED: Field can accept multiple values.
Define Nested Metadata When the Define Nested Metadata checkbox is ticked inside the 'Table Metadata' property, a tree-structure can be defined for metadata.
Create/Replace Select Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job if:
  1. Users did not expect a table to already exist.
  2. Users did not expect to have a different schema to the one defined in this component.
Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the component succeeds, the table matches the schema defined in this component. However, any data from the existing table (to be replaced) will be lost.
Partitioning Select Choose whether to create a date-partitioned table or not. Selecting 'Day' will create the table as a partition table, which will load data into separate date-partitions and can be queried with the '_PARTITIONTIME' pseudo-column. This allows large data sets to be split into multiple parts with each part corresponding to a single day of data. For more information on date-partitioned tables, consult the GCP documentation.
Enabling partitioning on a table will reveal additional options in the Table Input component, which allows the partition column to be included in the input.
Partitioning Field Select This parameter appears when 'Column' is selected in the Partitioning property. Partitioning Field has a drop-down, which should be populated only by Date or Timestamp column types defined in the Table Metadata. For more information, see here.
Partition Expiration Time Text Specify the number of milliseconds before the partition expires (field can be left blank for no expiration).
KMS Encryption Select Support for customer-managed encryption. By default, this is set to 'No'. Your encryption keys are stored within Cloud KMS. For more information, see here.
Location Select This parameter appears when KMS Encryption is set to 'Yes'. Choose a location that matches the location of the BigQuery dataset.
Key Ring Select This parameter appears when KMS Encryption is set to 'Yes'. Choose from a group of keys. On the GCP console in KMS, a Key Ring or multiple Key Rings will have been created.
Key Select This parameter appears when KMS Encryption is set to 'Yes'. Choose the Encryption Key from the Key Ring.
Cluster Columns Select Creates a clustered table. Allows the optional selection of up to four columns. Columns of types Integer, String, Boolean, Date, and Timestamp are supported. For more information, see here.
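The BigQuery properties above correspond roughly to DDL of the following shape (a sketch with invented names; note that the component's Partition Expiration Time is given in milliseconds, whereas the DDL option shown below is expressed in days):

```sql
CREATE OR REPLACE TABLE `my-project.my_dataset.stage_orders` (
    order_id INT64,
    amount   NUMERIC,
    placed   TIMESTAMP,
    items    ARRAY<STRUCT<sku STRING, qty INT64>>  -- REPEATED mode / Define Nested Metadata
)
PARTITION BY DATE(placed)                          -- Partitioning
CLUSTER BY order_id                                -- Cluster Columns
OPTIONS (partition_expiration_days = 30);          -- Partition Expiration Time
```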

Synapse Properties

Property Setting Description
Name String Input the descriptive name for the component.
Create/Replace Select Select the function of the component from the following options: Create: (default setting) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
Create If Not Exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data.
Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the Orchestration Job has run, the table matches the schema defined in this component. However, any data from the existing table (that is, the table to be replaced upon the new job run) will be lost.
Schema Select Select the table schema. The special value, [Environment Default], will use the schema defined in the environment.
For more information on schemas, please see the Azure Synapse documentation.
New Table Name String Specify the name of the table to create or replace.
Note: This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. For more information, please refer to the Azure Synapse documentation.
Table Metadata Column Name Provide the name of each new column.
Data Type Select the data type. The available data types are: DATE: this data type is suitable for dates without times. More...
DATETIME: this data type is suitable for timestamps. More...
TIME: this data type is suitable for times only. More...
INTEGER: this data type is suitable for whole number types (no decimals). More...
NUMERIC: this data type is suitable for numeric types, with or without decimals. More...
TEXT: this data type is suitable for text types. More...
FLOAT: this data type is suitable for approximate number data types for use with floating point numeric data. More...
BOOLEAN: this data type is suitable for data whose values are either 'true' or 'false'. More...
Size Define the size. For T-SQL, this is denoted as Precision. More...
Precision Define the precision. For T-SQL, this is denoted as Scale. More...
Allow Nullable Select whether or not to allow nullable values. More...
Distribution Style Select Select the distribution style:
Hash: This setting assigns each row to one distribution by hashing the value stored in the distribution_column_name. The algorithm is deterministic, meaning it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL, because all rows that have NULL are assigned to the same distribution.
Replicate: This setting stores one copy of the table on each Compute node. For SQL Data Warehouse, the table is stored on a distribution database on each Compute node. For Parallel Data Warehouse, the table is stored in an SQL Server filegroup that spans the Compute node. This behavior is the default for Parallel Data Warehouse.
Round Robin: Distributes the rows evenly in a round-robin fashion. This is the default behaviour.
For more information, please read this article.
Distribution Column Select Select the column to act as the distribution column. This property is only available when the Distribution Style property is set to "Hash".
Partition Key Select Select the table's partition key. Table partitions determine how rows are grouped and stored within a distribution.
For more information on table partitions, please refer to this article.
Index Type Select Select the table indexing type. Options include:
Clustered: A clustered index may outperform a clustered columnstore table when a single row needs to be retrieved quickly. The disadvantage of using a clustered index is that the only queries that benefit are those that use a highly selective filter on the clustered index column. Choosing this option prompts the Index Column Grid property.
Clustered Column Store: This is the default setting. Clustered columnstore tables offer both the highest level of data compression and the best overall query performance, especially for large tables. Choosing this option prompts the Index Column Order property.
Heap: Users may find that using a heap table is faster for temporarily landing data in Synapse SQL pool. This is because loads to heaps are faster than to index tables, and in some cases, the subsequent read can be done from cache. When a user is loading data only to stage it before running additional transformations, loading the table to a heap table is much faster than loading the data to a clustered columnstore table.
For more information, please consult the Azure Synapse documentation.
Index Column Grid Name The name of each column.
Sort Assign a sort orientation of either ascending (Asc) or descending (Desc).
Index Column Order Multiple Select Select the columns in the order to be indexed.
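In Synapse, these properties combine into a CREATE TABLE statement whose WITH clause carries the distribution, index, and partition settings. A hypothetical sketch (names and partition boundaries invented):

```sql
CREATE TABLE dbo.stage_orders (
    order_id INT NOT NULL,       -- Allow Nullable: False; also the hash column
    amount   DECIMAL(10, 2),
    placed   DATE
)
WITH (
    DISTRIBUTION = HASH (order_id),   -- Distribution Style / Distribution Column
    CLUSTERED COLUMNSTORE INDEX,      -- Index Type (default)
    PARTITION (placed RANGE RIGHT FOR VALUES ('2023-01-01', '2023-07-01'))  -- Partition Key
);
```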

Variable Exports

This component makes the following values available to export into variables:

Source Description
Table Recreated Whether or not the table was (re)created. This is useful when "Create/replace" is set to 'Create if not exists' so users can tell whether the table needed to be created or not.

Strategy

Generates any required DDL to manage an existing table of the same name, followed by a CREATE TABLE statement.

Example

In this Redshift example, we deploy the Create Table component onto the Matillion ETL canvas to create a table that will hold airport data.

When configuring the component's properties, we give the 'New Table Name' property the value of 'table_airports'. Since this is a small lookup table, it is distributed to all nodes in the cluster ('All'), and the Primary Key is 'iata'.

A suitable schema is defined to hold the table's data. Eight decimal places are permitted for data in the 'latitude' and 'longitude' columns.
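The configuration described above corresponds roughly to DDL along these lines (a sketch only; apart from 'iata', 'latitude', and 'longitude', the column names and sizes are invented):

```sql
CREATE TABLE public.table_airports (
    iata      VARCHAR(8) NOT NULL,
    name      VARCHAR(256),
    latitude  NUMERIC(12, 8),   -- eight decimal places
    longitude NUMERIC(12, 8),
    PRIMARY KEY (iata)
)
DISTSTYLE ALL;                  -- small lookup table, copied to every node
```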

In this Snowflake example, we deploy the Create Table component onto the Matillion ETL canvas to create a table that will hold airport data.

When configuring the component's properties, we give the 'New Table Name' property the value of 'table_airports'. The table is sorted by its Primary Key: 'iata'.

A suitable schema is defined, with the 'Precision' for 'latitude' and 'longitude' set to 8. The 'Data Retention Time in Days' property is set to the maximum value of 90 days.

In this BigQuery example, we deploy the Create Table component onto the Matillion ETL canvas to create a table that will hold airport data.

When configuring the component's properties, we give the 'New Table Name' property the value of 'table_airports'.

A suitable set of table metadata is defined, and users can check the 'Define Nested Metadata' checkbox to define a tree structure for their metadata.