Extract Nested Data
  • Dark
    Light

Extract Nested Data

  • Dark
    Light

This article is specific to the following platforms - Snowflake - Redshift - BigQuery - Delta Lake.


Extract Nested Data

The Extract Nested Data component flattens nested data into rows. This is generally done by taking nested data in the form of key:value pairs (such as a JSON dictionary) and using those keys as column names.

The flow into this component should include a single variant-type column that is to be unpacked.

In Amazon Redshift, the flow into this component should include a single, SUPER type column to be unpacked.

In Google BigQuery, when flattening data, the number of rows output will be the product of the number of items in every array field selected, per input row. If any are empty, no output rows will be returned.


Properties

Snowflake Properties

Property Setting Description
Name String A human-readable name for the component.
Include Input Columns Select Choose whether to include input columns.
Columns Tree Structure Define the variant's structure here. Unique endpoints in this structure will be unpacked into columns. The variant from the input can usually be automatically detected, and its structure determined, using the Autofill button. Elements of the structure can be manually edited by right-clicking them on the tree and selecting "Delete" or "Edit" as appropriate.
Outer Join True/False If False, any input rows that can't be expanded—either because they can't be accessed in the path or because they have zero fields or entries—are completely omitted from the output. If True, a single row is generated for expansions with zero rows.
Input Alias String If two properties have identical names, one will be given this prefix to differentiate them. More than two identically named properties will result in an error. This does not need changing in the vast majority of use cases.
Array Prefix String If two array structures have identical names, one will be given this prefix to differentiate them. More than two identically named structures will result in an error. This does not need changing in the vast majority of use cases.
Casting Method Select Select a casting method. Options include Fail on invalid data (default), Replace all unparseable values with null, or Replace unparseable dates and timestamps with null.
Case Column Alias Names Select Set the casing for alias columns names. Settings include Lower, No, or Upper. The default is No.

Redshift Properties

Property Setting Description
Name String A human-readable name for the component.
Columns Select Select the columns to be written to the table.
Column Alias String An alias for the input component reference (for example, Table Input or Fixed Flow). The default alias is i.
Include Input Columns Select Choose whether or not to include input columns. The default setting is No.
Array Prefix String A prefix for the input's array. This property will increment inputs with one-based indexing (for example, f1, f2, f3, ...) where differentiation is required. The default value is f.

BigQuery Properties

Property Setting Description
Name String A human-readable name for the component.
Columns Select Select the columns to be written to the table.
Column Aliases String Give an alternative name for the column(s) you select.
Include Input Columns Select Choose whether or not to include input columns.
Input Column Prefix String Give a prefix name for the input columns.

Delta Lake Properties

Property Setting Description
Name String A human-readable name for the component.
Columns Select Select the columns to be written to the table.
Column Aliases Source Column Select the source column that you wish to provide an alternative name (an alias) for.
Target Column Provide an alternative name for the column.
This property is optional and both columns can be left empty if the user does not wish to provide alternative names for columns.
Include Input Columns Select Choose whether or not to include input columns in the data extraction.
Input Column Prefix String Give a prefix name for the input columns. The default is input_

What's Next