S3 Put Object Component

Note: This feature is only available for instances hosted on AWS.


Transfer a file from a remote host onto Amazon S3.

This component can use a number of common network protocols to transfer data to an S3 bucket. It copies the target file rather than moving it, and in all cases the source data is specified with a URL. A minimal sketch of this copy pattern follows the protocol list below.

Currently supported protocols are:

  • FTP
  • HDFS
  • HTTP
  • HTTPS
  • SFTP
  • Windows Fileshare
  • S3 Bucket
  • Google Cloud Storage
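
The copy pattern is straightforward to picture in code. Below is a minimal Python sketch of the HTTP/HTTPS case only, using the requests and boto3 libraries; it is an illustration of the idea, not the component's actual implementation, and the source URL, bucket, and object key are all hypothetical.

    import boto3
    import requests

    SOURCE_URL = "https://example.com/data/Vehicles_2015.zip"  # hypothetical source URL
    BUCKET = "my-target-bucket"                                # hypothetical bucket
    KEY = "incoming/Vehicles_2015.zip"                         # hypothetical object key

    s3 = boto3.client("s3")

    # Stream the source so large files are never held fully in memory.
    # The source file is only read, never deleted: a copy, not a move.
    with requests.get(SOURCE_URL, stream=True) as response:
        response.raise_for_status()
        response.raw.decode_content = True
        s3.upload_fileobj(response.raw, BUCKET, KEY)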

Example URLs

Each protocol, when entered for the first time, will have a sample URL associated with it, detailing the structure of the URL format for that protocol.

Protocol               Sample URL
FTP                    ftp://[username[:password]@]hostname[:port][path]
HDFS                   hdfs://host:port/filePath
HTTP                   http://[username[:password]@]hostname[:port][absolute-path]
HTTPS                  https://[username[:password]@]hostname[:port][absolute-path]
SFTP                   sftp://[username[:password]@]hostname[:port][path]
Windows Fileshare      smb://[[[authdomain;]user@]host[:port][/share[/dirpath][/name]]][?context]
S3 Bucket              s3://[bucketname][/path]
Google Cloud Storage   gs://[bucketURL][/path]


Square brackets indicate that part of the URL is optional. In particular, entering the username and password within the URL is discouraged: it can be done, but it poses a potential security risk and may not work. Entering the username and password in the parameters provided is the preferred approach.

Note: Special characters used in the URL field must be URL-safe. See the documentation on URL Safe Characters for more information. Where possible, use the Username and Password fields to avoid special characters interfering with the URL (this also means passwords are not stored as plain text).
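
If credentials must be embedded in the URL anyway, each part should be percent-encoded first. A small Python illustration of this, where the hostname and credentials are invented:

    from urllib.parse import quote

    username = "svc_user"
    password = "p@ss/word:2015"  # contains characters that are not URL-safe

    # quote() percent-encodes unsafe characters; safe="" also encodes '/'.
    encoded_user = quote(username, safe="")
    encoded_pass = quote(password, safe="")

    url = f"ftp://{encoded_user}:{encoded_pass}@ftp.example.com:21/data/file.zip"
    print(url)
    # ftp://svc_user:p%40ss%2Fword%3A2015@ftp.example.com:21/data/file.zip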


Variable Exports

This component makes the following values available to export into variables:

Source          Description
Bytes written   The number of bytes read from the source and written to S3. If you have selected the Gzip option, this byte count is after compression.


Example

In this example, S3 Put Object is used in an Orchestration job to take a file from a government website and load it into an S3 bucket. The data from this file can then be loaded into a newly created table.

S3 Put Object requires a link to the file as well as several properties. The Input Data Type is HTTP to match the Input Data URL. Since our data is zipped, we choose to Unpack the ZIP file, and since it is a public file, we need no Username or Password. The S3 Path is set to a bucket we own and have write permissions to. Finally, we choose not to Gzip the data, as we'll be using it immediately.

When run (to run the component alone, right-click it and select 'Run Component'), S3 Put Object will unzip the file given by the URL and deposit it into the specified S3 bucket. Note that the copied file takes the name of the file that was zipped, not the name of the ZIP file itself; in this case, the resulting file is 'Vehicles_2015.csv'. The CSV file is now present in the S3 bucket and can, among other things, be used as a data source for a table.
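
The unpack behaviour can be pictured as follows. This Python sketch is illustrative only (not the component's implementation) and assumes the archive has already been downloaded to a local file; the bucket and path names are hypothetical.

    import zipfile

    import boto3

    BUCKET = "my-target-bucket"   # hypothetical bucket
    PREFIX = "incoming/"          # hypothetical path within the bucket

    s3 = boto3.client("s3")

    with open("download.zip", "rb") as fh:           # the fetched archive
        with zipfile.ZipFile(fh) as archive:
            inner_name = archive.namelist()[0]       # e.g. 'Vehicles_2015.csv'
            with archive.open(inner_name) as member:
                # The object is uploaded under the inner file's name,
                # not the ZIP file's name, mirroring the behaviour above.
                s3.upload_fileobj(member, BUCKET, PREFIX + inner_name)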

A table named 'Vehicles_2015' is created using the Create/Replace Table Component. It is often a good idea to run this component independently by right-clicking it and selecting 'Run Component', ensuring the table already exists when the S3 Load component attempts to verify it.

The new table can then be loaded with data from the CSV file we imported, using the S3 Load component. The S3 Load component is pointed toward the correct S3 bucket using the 'S3 URL Location' property and then to the imported file using the 'S3 Object Prefix' property. It is important that the component has the Data File Type set to CSV, since we are working with a CSV file. The 'Target Table Name' property lets us choose which table the data will populate. For more information, see the documentation on S3 Load and S3 Load Generator.
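
On a Redshift-backed instance, loading a CSV file from S3 into a table ultimately comes down to a COPY statement, which is roughly what a load of this shape performs. The sketch below is a hedged illustration rather than the product's actual implementation; it assumes the psycopg2 driver, and the connection details, IAM role, and all names are hypothetical.

    import psycopg2

    conn = psycopg2.connect(
        host="my-cluster.example.redshift.amazonaws.com",  # hypothetical cluster
        port=5439,
        dbname="analytics",
        user="loader",
        password="...",  # placeholder
    )

    # A Redshift COPY pulls the CSV from S3 into the target table.
    copy_sql = """
        COPY "Vehicles_2015"
        FROM 's3://my-target-bucket/incoming/Vehicles_2015.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """

    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)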


When run, the S3 Load component will take data from the file and place it into the table 'Vehicles_2015'. Finally, we can use a single Table Input component in a Transformation job to check that the data is correct. The component is simply pointed to our newly created table 'Vehicles_2015', and we retrieve a sample from the Sample tab. For more information, see the Table Input Component documentation.

