File Iterator Component

The File Iterator component lets users loop over matching files in a remote file system.

The component searches one of several supported remote file systems for matching files and runs its attached component once for each file found. Filenames and pathnames are mapped into environment variables, which the attached component(s) can then reference.
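The following Java sketch illustrates the behavior described above: one run of the attached component per matched file, with the file's path exposed to that run through a variable. It is a simplified model, not Matillion's implementation. The class name `FileIteratorLoop`, the variable name `file_path`, and the use of a `Consumer` to stand in for the attached component are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of the File Iterator loop; not Matillion's implementation.
final class FileIteratorLoop {

    // Runs the attached component once per matched file and returns the run count.
    static int iterate(List<String> matchedFiles, String variableName,
                       Consumer<Map<String, String>> attachedComponent) {
        int runs = 0;
        for (String file : matchedFiles) {
            // Each run receives its own variable map holding the current file's path.
            Map<String, String> variables = new HashMap<>();
            variables.put(variableName, file);
            attachedComponent.accept(variables);
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("exports/sales_jan.gz", "exports/sales_feb.gz");
        int runs = iterate(files, "file_path",
                vars -> System.out.println("Processing " + vars.get("file_path")));
        System.out.println(runs + " runs");
    }
}
```

Giving each run a fresh variable map mirrors the idea that every iteration sees only the values for its own file.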

To attach the File Iterator component to another component, drag from the blue output connector to the desired component. To detach, right-click the attached component and select Disconnect from Iterator.

To iterate over more than one component, place them in a separate Orchestration or Transformation Job and attach a Run Orchestration or Run Transformation component to the iterator. This lets you run an entire ETL flow multiple times, once for each file found.

Note: All iterator components are limited to a maximum of 5000 iterations.


Properties

The table below lists the File Iterator component's configuration properties and any actions required by the user.

Some configuration properties depend on the settings chosen for earlier properties, so not all properties in this table will be required for your Orchestration Job.

Property Setting Description
Name String Input the descriptive name for the component.
Input Data Type Select Select the remote file system to search. Available data types include: Azure Blob Storage, Cloud Storage, FTP, HDFS, S3, SFTP, and Windows Fileshare.
Input Data URL String / Select Select or input the URL, including the full path, of the location to search for files. Once you have selected the Input Data Type, Matillion ETL will provide a template URL string.
Note: Special characters used in this field (e.g. in usernames and passwords) must be URL-safe. For more information, please refer to our Safe Characters documentation.
Domain String Input your connection domain.
SFTP Key String Input your SFTP private key. This property will only be used if the data source requests it. This property is only available when the Input Data Type is set to SFTP.
Username String Input your URL connection username. This property will only be used if the data source requests it.
Password String Input your URL connection password. This property will only be used if the data source requests it. Users can store passwords in the component itself, or use the secure Password Manager feature (recommended).
Set Home Directory as Root Select No: Designates that the URL path is from the server root.
Yes: Designates that the URL path is relative to the user's home directory. Default setting is Yes.
This property is only available when the Input Data Type is set to either FTP or SFTP.
Recursive Select No: Only search for files within the folder identified by the Input Data URL.
Yes: Consider files in subdirectories when searching for files.
Max Recursion Depth Integer Set the maximum recursion depth into subdirectories. This property is only available when Recursive is set to Yes.
Ignore Hidden Select No: Include "hidden" files.
Yes: Ignore "hidden" files, even if they otherwise match the Filter Regex. Default setting is Yes.
Max Iterations Integer Set the total number of iterations to perform. As mentioned earlier, the maximum cannot exceed 5000.
Filter Regex String The Java-standard regular expression tested against each candidate file's full path. To match all files, specify .*
Concurrency Select Concurrent: Iterations are performed concurrently. This setting requires all "Variables to Iterate" to be defined as local variables, so that each iteration has its own copy of the variable, isolated from the same variable used by other concurrent executions. The maximum concurrency is limited by the number of available threads (twice the number of processors on your cloud instance).
Sequential: Iterations are performed in sequence, and Matillion ETL waits for each iteration to complete before starting the next. Default setting is Sequential.
Variables Variable An existing environment variable to hold the value chosen in Path Selection.
Path Selection For each matched file, the target variable can be populated with the Base Path, the Subfolder (useful when recursing), or the Filename. You can export any or all of these into variables used by each iteration.
See the Example section for the difference between these.
Break on Failure Select No: Matillion ETL will attempt to run the attached component for each iteration, regardless of success or failure.
Yes: Matillion ETL will fail the job run if a single iteration failure occurs.
Note: If a failure occurs during any iteration, the failure link is followed. This property controls whether the job fails immediately or only after all iterations have been attempted.
This property is only available when the Concurrency property is set to Sequential.
Record Values In Task History Select Choose whether to record iteration values in the Matillion ETL Task History.
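The sketch below models how the Recursive, Max Recursion Depth, Ignore Hidden, and Filter Regex properties might combine when selecting candidate files. It rests on assumptions this page does not confirm: the regex must match the entire path (Java `Matcher.matches()` semantics, as the `.*` hint for matching all files suggests), a "hidden" file is one whose name starts with a dot, and depth counts subfolder levels below the Input Data URL. `CandidateFilter` and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of candidate-file selection; not Matillion's implementation.
final class CandidateFilter {

    static List<String> select(String basePath, List<String> fullPaths, String filterRegex,
                               boolean recursive, int maxDepth, boolean ignoreHidden) {
        Pattern pattern = Pattern.compile(filterRegex);
        String prefix = basePath.endsWith("/") ? basePath : basePath + "/";
        List<String> selected = new ArrayList<>();
        for (String path : fullPaths) {
            if (!path.startsWith(prefix)) {
                continue; // outside the Input Data URL
            }
            String[] parts = path.substring(prefix.length()).split("/");
            int depth = parts.length - 1; // 0 = file sits directly in the base folder
            if (!recursive && depth > 0) {
                continue; // Recursive = No: ignore subfolders
            }
            if (recursive && depth > maxDepth) {
                continue; // deeper than Max Recursion Depth
            }
            String fileName = parts[parts.length - 1];
            // Assumption: "hidden" means a dot-prefixed file name (Unix convention).
            if (ignoreHidden && fileName.startsWith(".")) {
                continue;
            }
            // Assumption: the regex must match the whole path, not just part of it.
            if (pattern.matcher(path).matches()) {
                selected.add(path);
            }
        }
        return selected;
    }
}
```

Under the full-match assumption, a pattern such as `sales_.*\.gz` on its own would not match a full path like `s3://bucket/data/sales_1.gz`; prefixing it with `.*` would. An unescaped `.` in a Java regex matches any character, so `\.gz` is stricter than `.gz`.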

Variable Exports

This component makes the following values available to export into variables:

Source Description
Iteration Attempted The number of iterations this component attempts to reach (the Max Iterations value).
Iteration Generated The number of iterations that have been initiated. Because iterators terminate after a failure, this number equals the successful iterations plus any failures.
Iteration Successful The number of iterations successfully performed. This is the Max Iterations value minus any failed iterations and any iterations never attempted (because the component terminates after a failure).
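To make the relationship between the three exported values concrete, the following hypothetical model simulates a Sequential run with the Break on Failure property. It assumes Iteration Attempted reports the Max Iterations value capped at the documented 5000 limit, and it uses a `Predicate` returning `false` to stand in for a failed iteration. `IterationCounters` is an illustrative name, not a Matillion API.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the exported iteration counters for a Sequential run.
final class IterationCounters {

    static final int ITERATION_LIMIT = 5000; // documented cap for all iterator components

    static final class Result {
        final int attempted;  // Iteration Attempted
        final int generated;  // Iteration Generated
        final int successful; // Iteration Successful

        Result(int attempted, int generated, int successful) {
            this.attempted = attempted;
            this.generated = generated;
            this.successful = successful;
        }
    }

    // attachedComponent returns true when an iteration succeeds.
    static Result run(List<String> files, int maxIterations, boolean breakOnFailure,
                      Predicate<String> attachedComponent) {
        int attempted = Math.min(maxIterations, ITERATION_LIMIT);
        int generated = 0;
        int successful = 0;
        for (String file : files) {
            if (generated >= attempted) {
                break; // Max Iterations reached
            }
            generated++;
            if (attachedComponent.test(file)) {
                successful++;
            } else if (breakOnFailure) {
                break; // Break on Failure = Yes: stop at the first failed iteration
            }
        }
        return new Result(attempted, generated, successful);
    }
}
```

For four files where the second one fails, this model gives 1 successful out of 2 generated iterations with Break on Failure set to Yes, and 3 out of 4 with it set to No.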


Example

This example shows how specific files can be transferred from an S3 bucket to a Google Cloud Storage bucket, using the File Iterator component in conjunction with the Data Transfer component.

The File Iterator component is set up to point to an Input Data URL (this is the Base Path). It recurses into any folders it finds (this is the Subfolder) and matches files like "sales_.*.gz" (this is the Filename).
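The split into Base Path, Subfolder, and Filename can be sketched as simple string handling on each matched path. The exact formatting Matillion applies (for example, whether Subfolder carries leading or trailing slashes) is not stated here, so this version's choices, along with the class name `PathSelection` and the sample bucket, are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the three Path Selection values for one matched file.
final class PathSelection {

    static Map<String, String> split(String basePath, String fullPath) {
        // Normalize the base path so it has no trailing slash.
        String base = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1) : basePath;
        if (!fullPath.startsWith(base + "/")) {
            throw new IllegalArgumentException(fullPath + " is not under " + base);
        }
        String relative = fullPath.substring(base.length() + 1);
        int lastSlash = relative.lastIndexOf('/');
        // Assumption: files directly in the base folder have an empty Subfolder.
        String subfolder = lastSlash < 0 ? "" : relative.substring(0, lastSlash);
        String filename = relative.substring(lastSlash + 1);

        Map<String, String> values = new LinkedHashMap<>();
        values.put("Base Path", base);
        values.put("Subfolder", subfolder);
        values.put("Filename", filename);
        return values;
    }
}
```

For example, splitting `s3://my-bucket/exports/2023/06/sales_01.gz` against the base `s3://my-bucket/exports` gives a Subfolder of `2023/06` and a Filename of `sales_01.gz`.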

In this example, the variable mapping is set up to map both the Subfolder and the Filename into environment variables.

Those variables can then be referenced from the attached Data Transfer component in both its Input Data URL and Target Object Name properties.
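At runtime, each iteration resolves the variable references in the Data Transfer properties. The sketch below assumes the `${variable_name}` reference syntax and uses hypothetical bucket paths and variable names (`subfolder`, `filename`); it illustrates the substitution idea, not Matillion's own resolver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of resolving ${name} references for one iteration.
final class VariableSubstitution {

    private static final Pattern REF = Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String resolve(String template, Map<String, String> variables) {
        Matcher m = REF.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = variables.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined variable: " + m.group(1));
            }
            // quoteReplacement stops '$' or '\' in a value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("subfolder", "2023/06");
        vars.put("filename", "sales_01.gz");
        System.out.println(resolve("s3://source-bucket/exports/${subfolder}/${filename}", vars));
        System.out.println(resolve("${filename}", vars));
    }
}
```

With `subfolder` set to `2023/06` and `filename` set to `sales_01.gz`, the Input Data URL template above resolves to `s3://source-bucket/exports/2023/06/sales_01.gz`, and a Target Object Name of `${filename}` resolves to `sales_01.gz`.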

At runtime, any matching files are uploaded to the Google Cloud Storage bucket.



Contact Support

If you require further help with the File Iterator component, please read our Getting Support page.