Calculator Component

Calculator Component



Calculator Component

The Calculator Component adds new columns by performing calculations. Each input row produces one output row. Any pre-existing columns that share a name with a created column will be overwritten.

By default, all input columns are mapped into output columns of the same name. You can then use the Expression editor to add new columns with a user-defined calculation.

It is possible to include Matillion ETL Variables when writing Expressions in the Calculator component using the syntax ${<VariableName>}. Additionally, if you click the Manage Variables button, you can manage both Environment Variables and Job Variables.



Redshift Properties

Property Setting Description
Name Text The descriptive name for the component.
Include Input Column Yes/No When Yes, all input columns are mapped without modification. When No, the component will output only columns that have been mapped manually via the Expression editor. The default is Yes.
Calculations List of Expression A list of expressions calculated per row of data. The name of the expression becomes the output Column. Each expression must be valid SQL and use the built-in Redshift Functions.

Expressions can refer to an earlier expression by enclosing the existing expression name in double quotes.
Note: data types of columns may be expressed as icons, as explained in the Variables documentation.

Snowflake Properties

Property Setting Description
Name Text The descriptive name for the component.
Include Input Column Yes/No When Yes, all input columns are mapped without modification. When No, the component will output only columns that have been mapped manually via the Expression editor. The default is Yes.
Calculations List of Expression

A list of expressions calculated per row of data. The name of the expression becomes the output Column. Each expression must be valid SQL and use the built-in Snowflake Functions.


Expressions can refer to an earlier expression by enclosing the existing expression name in double quotes.


Note: data types of columns may be expressed as icons, as explained in the Variables documentation.

BigQuery Properties

Property Setting Description
Name Text The descriptive name for the component.
Include Input Column Yes/No When Yes, all input columns are mapped without modification. When No, the component will output only columns that have been mapped manually via the Expression editor. The default is Yes.
Calculations List of Expression A list of expressions calculated per row of data. The name of the expression becomes the output Column. Each expression must be valid SQL and use the built-in BigQuery Functions.

Expressions can refer to an earlier expression by enclosing the existing expression name in double quotes.



Note: data types of columns may be expressed as icons, as explained in the Variables documentation.

Synapse Properties

Property Setting Description
Name Text The descriptive name for the component.
Include Input Column Yes/No When Yes, all input columns are mapped without modification. When No, the component will output only columns that have been mapped manually via the Expression editor. The default is Yes.
Calculations List of Expression
Note: data types of columns may be expressed as icons, as explained in the Variables documentation.

Strategy

Generates a select query where the calculated expressions are in-line after the select statement.

Example

In this example, we are going to use the Calculator component to analyse flight delay times and then flag delays as 'long' accordingly. The job layout is shown below.


In the Table Input component, which we have named 'flights', we have loaded a table of flight data and included columns for the flight number, the actual elapsed time of the flight, and both the arrival and the departure delays. If we click the Sample tab of the Table Input component, and then click Data, we can view a sample of this data.


Although the columns and the data are loaded correctly, there are undesirable NULL values in the data. To remove these, we use the Filter component.


The Filter component has a single filter that is set up as below, only allowing rows where the 'actualelapsedtime' column value is not NULL.


Now turning our attention toward the Calculator component, we choose to include all columns mentioned earlier.


Additionally, the Calculator component has two 'calculations' set up. The first Expression, Long Delay Ratio, adds together the departure delay and the arrival delay, and then divides by the flight time. Thus, we judge delays as relative to the flight's total time. The flight time is cast to a decimal to ensure the output is decimal (all input columns are integer only, however).


The second Expression uses a CASE function to add a flag. Flights with a delay of more than 0.2 (20%) of the flight time are flagged as "long" delays, which is either True or False for each flight.


Note that each Expression in the Calculator component constitutes a unique output column. In this case, we have two Expressions and therefore, along with our original data, we will have two additional output columns containing the results of the calculations in each Expression.


We can see that each row is now correctly flagged when the delay time is 0.2x (or more) of the actual flight time. Also, note that there are no empty fields left in our data due to the filter that we established at the start of this transformation job.

Functions

Please consult the Redshift documentation for further information about SQL Functions.

(AWS only) Please consult the Snowflake documentation for further information about SQL Functions.

(Azure only) Please consult the Azure documentation for further information about SQL Functions.

Please consult the BigQuery documentation for further information about SQL Functions.

Video