DynamoDB Load Component

DynamoDB Load Component



DynamoDB Load Component

Load data into an existing table from objects stored in Amazon DynamoDB. This uses an inate ability of DynamoDB to push to Redshift, unlike the similar DynamoDB Query component that goes via a 3rd party driver. It is generally advised that for pushing full data sets, this component should be used whereas DynamoDB Query should be used for more selective data loads.

Note: This component requires working AWS Credentials with read access to the DynamoDB tables you want to read from. This is easily achieved by attaching an IAM role to the instance when launching Matillion ETL for Redshift, however it can also be managed manually by editing an Environment.


Properties

Property Setting Description
For more information on all the settings in this component, see the Amazon Redshift COPY from Amazon DynamoDB syntax.
Name Text The descriptive name for the component.
Schema Select Select the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
Target Table Select Select an existing table to load data into.
Load Columns Select Multiple One or more columns that exist in the target table. Since Redshift will attempt to match the fields found in the DynamoDB records to existing columns names, this parameter exists so you can ignore target fields, even if the incoming data would have populated them.
DynamoDB Table Text Enter the name of the DynamoDB table to read from.
Read Ratio Number The percentage of the DynamoDB table's provisioned throughput to use for the data load. To understand the trade-offs in changing this number, please see the Amazon Redshift documentation.
Explicit IDs Select Whether or not to load data from the S3 Objects into an IDENTITY column. See the Redshift documentation for more information.
Region Select The Amazon S3 region hosting the S3 bucket. This is not normally required and can be left as "None" if the bucket is in the same region as your Redshift cluster.
Maximum Errors Text The maximum number of individual parsing errors that cause the whole load to fail. Values up to this will be substituted as null values.This value defaults to 0, but the Amazon default is 1000.
Date Format Text Defaults to 'auto' - this can be used to manually specify a date format.
Time Format Text Defaults to 'auto' - this can be used to manually specify a time format.
Accept Any Date Select If this is enabled, invalid dates such as '45-65-2018' are not considered an error, but will be loaded as the null value.
Truncate Columns Select If this is set, any instance of data in the input file that is too long to fit into the specified target column width will be truncated to fit instead of causing an error.
Trim Blanks Select Removes trailing and leading whitespace from the input data.
Empty As Null Select If this is set, empty columns in the input file will become NULL.
Blanks As Null Select If this is set, blank columns in the input file will become NULL.
Comp Update Select Controls whether compression encodings are automatically applied during a COPY. This is usually a good idea to optimise the compression used when storing the data.
Stat Update Select Governs automatic computation and refresh of optimizer statistics at the end of a successful COPY command.
Round Decimals Select If this option is set, round any decimals to fit into the column when the number of decimal places in the input data is larger than defined for the target column.

Example

This example loads the DynamoDB Airports data into a Redshift table.

The default options work fine here. We simply need to specify the source and target tables.

On completion, the number of rows loaded is shown in the status.