Create File Format Component
Create File Format Component
Creates a named file format that can be used for bulk loading data into (and unloading data out of) Snowflake tables. This format can then be used in other components to simplify the component's options and use the custom file format.
|Name||Text||The descriptive name for the component.|
|Create/Replace||Select||Create: the default option, creates a new file format. This will generate an error if a file format with the same name already exists, but will never destroy existing data.
Create if not exists: This will only create a new format if one of the same name does not already exist.
Drop: Drop the file format of the given name - useful for deleting file formats. This will fail if a file format of the given name does not exist.
Drop if exists: Drop the file format of the given name - useful for deleting file formats. This will succeed even if a file format of the given name does not exist.
Replace: Creates a new format and replaces any format of the same name, overwriting it.
|Database||Select||Select the database that the newly-created file format will be stored in.|
|Schema||Select||Select the schema that the new file format will be stored in. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support.|
|File Format Name||Text||The name of the file format being created.|
|Compression||Select||Whether the input file is compressed in GZIP format, BROTLI, BZ2, DEFLATE, RAW_DEFLATE, ZSTD or not compressed at all.|
|Record Delimiter||Text||The delimiter to be used that separates records (rows) in the file. Defaults to newline. \n can also signify a newline. \r can signify a carriage return.|
|Field Delimiter||Text||The delimiter to be used that separates fields (columns) in the file.|
|Skip Header||Text||The number of rows at the top of the file to ignore - defaults to 0.|
|Date Format||Text||Defaults to 'auto' - this can be used to manually specify a date format.|
|Time Format||Text||Defaults to 'auto' - this can be used to manually specify a time format.|
|Timestamp Format||Text||Defaults to 'auto' - this can be used to manually specify a time format.|
|Binary Format||Select||Choose between the expected format of binary data: BASE64, HEX, UTF-8.|
|Escape||Text||Single character string used as the escape character for any field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default).|
|Escape Unenclosed Field||Text||Single character string used as the escape character for unenclosed field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). If a character is specified in the 'Escape' field, it will override this field.|
|Trim Space||Select||Removes trailing and leading whitespace from the input data (TRUE or FALSE).|
|Field Optionally Enclosed||Text||A character that is used to enclose strings. Can be single quote (') or double quote (") or NONE (default). Note that the character chosen can be escaped by that same character.|
|Null If||Multiple Text||Enter one or more strings that will be replaced with NULL if found in the source. Only applies to columns that are nullable.|
|Error On Column Count Mismatch||Select||Generate an error if the number of delimited columns in the input does not match that of the table. If false, extra columns are not loaded into the table and missing columns are recorded as NULL in the table.|
|Empty Field as Null||Select||If TRUE, cast empty fields as NULL. If FALSE, cast empty fields as the corresponding column type with blank data.|
In this example, we take a file and load the data into a table. However, a defined format must exist for the data to be transcribed into rows and columns. We use the Create File Format component to create a new format that can be used when loading data. Below shows the canvas of a typical job of this type:
To explain briefly; the Data Transfer Component takes a file from an HTTP server and puts it into an S3 Bucket while Create Table makes a table for the data to be loaded into. After this, Create File Format is used to create a predefined format that Azure Blob Load (or S3 Load if on AWS) can use. Below a snippet of the raw data is shown.
From this data we learn that there is a header row we must ignore, which also provides us with column names. Note that these columns must be defined in the Create Table component and then selected in the Azure Blob Load component. The data is field delimited using commas and record delimited only by a carriage return. We choose not to define any time or date formats since our file does not contain any such data. There is also no need to define escape characters.
These realizations are reflected in our choices for parameterization of the Create File Format component, shown below:
We have named the new file format 'CUSTOMEXAMPLE' and so this is the format we choose in the Azure Blob Load Component's Format property. When selecting a format, many properties in Azure Blob Load will disappear, since the predefined format overrides those properties.
Note that the Create File Format component must be run before its file format becomes available in the dropdown options for available formats. Otherwise, the expected name can be entered, instead.
This Orchestration job is now ready to be run. A simple Transformation job with a Table Input component can then be used to sample the resulting table, a small snippet is shown below:
We can also see that this file format has been created and stored in our Snowflake account. To view file formats, visit your Snowflake account and browse to Databases → <DatabaseName> → File Formats.