
Distinct Component
Pass on only the distinct (or unique) input records to the next component.
Properties
Snowflake Properties

Property | Setting | Description |
---|---|---|
Name | Text | A human-readable name for the component. |
Columns | Selection | Only the selected columns are passed to the next component. Duplicate records (rows with identical values in every selected column) are removed, leaving only distinct rows. |

Redshift Properties

Property | Setting | Description |
---|---|---|
Name | Text | A human-readable name for the component. |
Columns | Selection | Only the selected columns are passed to the next component. Duplicate records (rows with identical values in every selected column) are removed, leaving only distinct rows. Note: On Redshift, records may be distributed across many nodes, so determining uniqueness can be expensive. |

BigQuery Properties

Property | Setting | Description |
---|---|---|
Name | Text | A human-readable name for the component. |
Columns | Selection | Only the selected columns are passed to the next component. Duplicate records (rows with identical values in every selected column) are removed, leaving only distinct rows. |

Synapse Properties

Property | Setting | Description |
---|---|---|
Name | Text | A human-readable name for the component. |
Columns | Selection | Only the selected columns are passed to the next component. Duplicate records (rows with identical values in every selected column) are removed, leaving only distinct rows. |

Property | Setting | Description |
---|---|---|
Name | String | A human-readable name for the component. |
Columns | Column Select | Only the selected columns are passed to the next component. Duplicate records (rows with identical values in every selected column) are removed, leaving only distinct rows. |
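
To make the Columns setting concrete, here is a minimal Python sketch of the same semantics, assuming rows are plain dictionaries. The function `distinct_rows` and the sample data are hypothetical illustrations, not part of Matillion. Note that selecting fewer columns can merge rows that differ only in the columns you left out.

```python
from typing import Any, Iterable, Sequence

Row = dict[str, Any]

def distinct_rows(rows: Iterable[Row], columns: Sequence[str]) -> list[Row]:
    """Keep only the selected columns, then drop repeated combinations.

    Hypothetical helper mirroring SELECT DISTINCT semantics. Output order
    here is first-seen; SQL itself makes no ordering guarantee.
    """
    seen: set[tuple[Any, ...]] = set()
    result: list[Row] = []
    for row in rows:
        # Project the row onto the selected columns only.
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

# Hypothetical sample data: the first two rows differ only in "owner".
tasks = [
    {"task": "Design", "month": "Jan", "owner": "Ana"},
    {"task": "Design", "month": "Jan", "owner": "Ben"},
    {"task": "Build", "month": "Feb", "owner": "Ana"},
]
print(distinct_rows(tasks, ["task", "month"]))
# [{'task': 'Design', 'month': 'Jan'}, {'task': 'Build', 'month': 'Feb'}]
```

With all three columns selected, none of the rows are duplicates, so all three would be kept.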
Strategy
Generates a SELECT DISTINCT query over the selected columns.
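
The shape of such a query can be sketched in Python. This is an assumption-laden illustration, not the SQL Matillion actually emits: `build_distinct_query` and `quote_ident` are hypothetical, the upstream component's SQL is assumed to be wrapped as a subquery, and ANSI double-quoted identifiers are assumed (BigQuery, for example, quotes identifiers with backticks instead).

```python
from typing import Sequence

def quote_ident(name: str) -> str:
    # ANSI-style double-quoted identifier; embedded quotes are doubled.
    return '"' + name.replace('"', '""') + '"'

def build_distinct_query(source: str, columns: Sequence[str]) -> str:
    """Hypothetical: SELECT DISTINCT the chosen columns from an upstream query."""
    if not columns:
        raise ValueError("select at least one column")
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT DISTINCT {cols} FROM ({source}) AS src"

print(build_distinct_query('SELECT * FROM "tasks"', ["task", "month"]))
# SELECT DISTINCT "task", "month" FROM (SELECT * FROM "tasks") AS src
```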
Example
In this example, we have two tables of data containing project tasks from past months. We want to combine these tables using a Unite component so that we have a single complete table of data. Unfortunately, the date ranges of the two tables overlap, so uniting them produces duplicate rows. To solve this problem, we use a Distinct component to filter out the duplicate rows. The Transformation job is shown below.
Below, we note the number of rows in each input table; the third row count (on the Unite component) is the sum of these two counts. A quick way to check that the Distinct component has worked is to confirm that its output row count is lower than the Unite row count, since the duplicates have been removed.
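
The row-count check can be sketched in Python, assuming each input table is a list of tuples and that Unite behaves like UNION ALL (consistent with the Unite count being the sum of the input counts). The task IDs and dates below are hypothetical stand-ins, not the data used in this example.

```python
# Hypothetical sample data: two monthly extracts whose date ranges overlap.
march = [("T1", "2023-03-30"), ("T2", "2023-03-31")]
april = [("T2", "2023-03-31"), ("T3", "2023-04-01")]

# Unite keeps every row from both inputs (like UNION ALL), so counts add up.
united = march + april
assert len(united) == len(march) + len(april)

# Distinct with all columns selected keeps one copy of each whole row;
# dict.fromkeys preserves first-seen order.
distinct = list(dict.fromkeys(united))

removed = len(united) - len(distinct)
print(f"united={len(united)} distinct={len(distinct)} removed={removed}")
# united=4 distinct=3 removed=1
```

A `removed` value greater than zero is the reduction in row count described above.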
In the Distinct component's properties, all columns are selected so that they all appear in the output; a row is therefore removed only if every one of its values matches another row.

When run, the component reads these columns from the input, keeps only the unique rows, and outputs them. The resulting sample and row count are shown below.
