Intersect Component

Intersect Component



Intersect Component

The Intersect component compares two input flows, and then outputs any rows that are in common (identical) in both tables.

An entire row is compared, but only overlapping columns are considered, otherwise no rows would look unique. Also, because this component uses the SQL INTERSECT operator, only unique rows are passed to the next component if there are duplicate rows in the source components.



Redshift Properties

Property Setting Description
Name String Input the descriptive name for the component.
Cast Types Select Select whether to cast types.
Yes: If the same-named column from both inputs have differing input types, attempt to cast them to a common type. This is not guaranteed to work, so users should check their data carefully.
No: If the same-named column from both inputs have differing input types, report an error and does not continue.

Snowflake Properties

Property Setting Description
Name String Input the descriptive name for the component.
Cast Types Select Select whether to cast types.
Yes: If the same-named column from both inputs have differing input types, attempt to cast them to a common type. This is not guaranteed to work, so users should check their data carefully.
No: If the same-named column from both inputs have differing input types, report an error and does not continue.

BigQuery Properties

Property Setting Description
Name String Input the descriptive name for the component.
Cast Types Select Select whether to cast types.
Yes: If the same-named column from both inputs have differing input types, attempt to cast them to a common type. This is not guaranteed to work, so users should check their data carefully.
No: If the same-named column from both inputs have differing input types, report an error and does not continue.
Note: Care should be taken when casting float types into string types as this may introduce numerical inaccuracies.

Strategy

Generates multiple select queries separated by INTERSECT.



Example

This example includes two tables of user data. One table is an updated version of the other. When viewing the table with updated data, it is apparent that some users have changed their details since the details were last recorded. For this example, we wish to find any data relating to users who have not changed their details. To do this, we will use the Intersect component.

The image below shows the data and its updated counterpart.

As can be seen, several users have changed their plan. If attempting to analyse this data, especially on a large scale, it might be useful to find only users that have not recently changed their data to ensure a more stable dataset.

When setting up the component parameters, we choose to cast types because it is preferred to try to reconcile the data than to skip the record; however, it is unlikely to be important since we know these data are of common types.

Finally, we can take a sample of the data output from Intersect and find the common rows.