1.2x Matillion ETL for Redshift Notes

Matillion ETL for Redshift 1.29.9

  • Matillion ETL for Redshift introduces the ability to configure Matillion ETL in a highly available topology with a fully active-active cluster. This feature is only available on large and xlarge instance types.
    • Jobs run from SQS, the API or the built-in Scheduler will now fail-over in the event of an instance failure.
    • Scheduled runs missed because a server is offline will be run when it becomes available again.
    • Once two or more members are in the cluster, a Cluster Info tab shows membership status and activity.
    • OAuth tokens, Database Drivers and RSD Profiles are replicated via the persistence database (PostgreSQL).
    • Logging from each node is sent to CloudWatch.
    • CloudFormation Templates help you get started with a clustered Matillion.
  • New Jira Query component loads data from Atlassian's popular Software Development Platform.
  • New PayPal Query component can load payment and other data from PayPal Business accounts.
  • New ServiceNow Query component loads data from ServiceNow’s IT Service Management (ITSM) platform.
  • New Stripe Query component loads data from Stripe’s payment platform.
  • New Email Query component can query an IMAP based email system.
  • New YouTube Analytics component can query data from the YouTube Analytics API.
  • Excel Query can now load files from Google Cloud Storage, as well as Amazon S3.
    • S3 and GCS are each shown only when the environment has credentials for them; otherwise they are hidden.
  • New option to drop a schema from the Environment Tree.
  • Specify a region in S3 Unload (to allow writing to buckets outside of the Redshift cluster's region)
  • S3 / Google Cloud Storage file browser enhancements.
  • Set advanced connection options during OAuth flow (e.g. to connect to a Salesforce Sandbox)
  • Note: Manage Backups and View Audit have not been removed; they have moved to the Admin menu.
  • New External Table Output component - similar to Table Output but creates an Amazon Redshift Spectrum table over S3 data.
  • New "Add Partition" component (Amazon Redshift Spectrum only).
  • New "Delete Partition" component (Amazon Redshift Spectrum only).
  • Other Amazon Redshift Spectrum components have new Table Partitioning parameters.


IMPORTANT Upgrade Notes: All data-staging components now create a target table with a wider range of target data types. Mostly this will be transparent; however, if your source data contains Boolean values, these will now be Boolean in Redshift too (previously, they were varchar true/false strings). This may affect downstream logic, so please test jobs after upgrading.
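
As an illustration of the kind of downstream logic that may need attention, the sketch below normalises a value that could arrive either as the older varchar true/false string or as a native Boolean. The function name `to_bool` is hypothetical and is not part of Matillion ETL; it only shows one way a script could tolerate both representations while jobs are being retested.

```python
def to_bool(value):
    """Normalise a staged value to True, False or None.

    Accepts native booleans (new behaviour) as well as the
    'true'/'false' strings produced before the upgrade.
    """
    if value is None:
        return None
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    # Anything else is treated as bad data rather than silently coerced.
    raise ValueError(f"not a recognised boolean value: {value!r}")
```

Logic that previously compared against the literal string 'true' would be the first place to check after upgrading.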


Matillion ETL for Redshift 1.28.7

  • Amazon Redshift Spectrum support. You can now run SQL queries in Redshift directly against data sets in your S3 data lake in Text, Parquet, SequenceFile and other formats. Matillion ETL 1.28 introduces first-class support for all key Redshift Spectrum features and will allow users to combine Amazon Redshift Spectrum data with regular Redshift data in transformations. These include:
    • Components for creating External Tables over S3 data.
    • Rewrite External Table writes Redshift data into S3 and defines an external table to reference it.
    • All data-staging orchestration components (all components ending in "query") can write data to S3 and generate a compatible External Table to reference it. This will allow users to keep both small data sets and very large data sets in S3.
    • Amazon Redshift Spectrum schemas and tables are displayed in the Environment Tree.
  • Matillion no longer relies on creating views in Redshift to represent components.
    • Post upgrade we recommend using the “Delete Views” function on the Environment to remove existing views generated by Matillion. These will not be recreated and any other v_xxxxxxxxxx_xxxxxxxxxx views can be safely, manually removed.
  • A new Admin Menu allows administrators to:
    • Get the server log.
    • Update the Matillion server version.
    • Configure users (using either an internal user database or external directory server).
    • Configure SSL.
    • Note: This will replace the existing /admin application. This is currently retained on upgrade but will be removed in a future update. Please use the new Admin Menu where possible.
  • All transformation components support multiple outputs.
    • Separate Replicate component no longer mandatory.
  • Enterprise Features (these features are only enabled for users running m4.large or m4.xlarge instance types).
    • Automatic Job Documentation. Matillion ETL can automatically generate documentation for your ETL process. This tool will recursively search your jobs and include all job detail including linked notes and descriptions.
    • Auditing of User Actions with a searchable Audit Log provides a fine-grained audit of every change to an ETL process.
    • Ability to use Matillion ETL with an external PostgreSQL repository on RDS. Allows you to externalise all your Matillion Job and configuration data to RDS and take advantage of RDS features such as backups and point-in-time recovery. Please contact Matillion Support if you wish to take advantage of this.
  • Database Query now supports IBM Netezza data warehouses via JDBC.
  • S3 Server Side Encryption. Data written to S3 from any Query component, the S3 Put component or the S3 Unload component can now apply Server Side Encryption (SSE-S3 or SSE-KMS).
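
For readers who also write to S3 from their own scripts, the two encryption modes named above correspond to standard S3 request parameters. The sketch below builds the extra arguments that boto3's `put_object` accepts (`ServerSideEncryption` and `SSEKMSKeyId`). The helper name `sse_args` and the bucket, key and KMS values in the usage comment are hypothetical, and this is not a description of how the Matillion components work internally.

```python
def sse_args(mode, kms_key_id=None):
    """Return extra put_object arguments for the given encryption mode."""
    if mode == "SSE-S3":
        # S3-managed keys.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        # KMS-managed keys; the key id is optional (S3 then uses the default key).
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id
        return args
    raise ValueError(f"unknown encryption mode: {mode!r}")


# Usage (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   boto3.client("s3").put_object(
#       Bucket="example-bucket", Key="out/data.csv", Body=b"a,b\n",
#       **sse_args("SSE-KMS", kms_key_id="example-key-id"))
```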

Matillion ETL for Redshift 1.27.4

  • The Python Script component has been upgraded and now supports Jython (the default), Python 2 and Python 3. This is useful for customers who wish to use pip modules that are not pure Python.
  • New Search system. Find jobs, notes and component properties anywhere in a project via a new Search tab.
  • An upgraded UI toolkit delivers a smoother, faster user experience.
  • Upgrades to the Task Panel add the ability for the user to:-
    • Multi-select tasks to cancel in the task panel.
    • Collapse all expanded items.
    • Remove all completed tasks.
  • Export Jobs now allows you to multi-select a choice of jobs in the job tree and export them.
  • The S3 Put component now allows the user to grab data from a self-signed HTTPS endpoint.
  • Google Cloud users - Also check out the new Matillion ETL for BigQuery.

Matillion ETL for Redshift 1.26.9-2

Matillion ETL for Redshift 1.26.9

  • A new Connection Manager allows you to see and control connected sessions. This will also prevent users from being locked out when they hit their connection limit.
  • All data-loading components now support a "Load Options" parameter to control:-
    • Keeping the objects in S3 after the load completes, for archive purposes.
    • Turning off automatic compression analysis.
    • Turning off automatic statistics gathering.
  • Users (using internal security) can be added/removed without requiring a restart.
  • The If component now logs the decision taken to the task panel. This will help users diagnose decision logic problems.
  • Matillion now runs on Java 8.

Matillion ETL for Redshift 1.25.3


New Data Connectors

New Components
  • New Text Output orchestration component simplifies export to CSV and other Text based formats.
    • Similar to S3 Unload, with support for headers
Other Changes
  • File Iterator now supports S3, so you can loop over a list of files in an S3 bucket.
  • KMS Encryption option in password manager allows you to use AWS managed encryption keys to encrypt passwords in Matillion ETL.
  • Run Transformation / Run Orchestration components now support variable overrides to make it easier to run jobs in a reusable manner.
  • Added support for the boolean data type.
  • The scheduling test will check your maintenance window and warn of possible overlaps.
  • Some orchestration components such as Create Table have an SQL Tab so it is easy to understand the generated SQL.
  • Additional methods available on Matillion date variables will simplify using dates in variables.
  • New cleaned up and simplified sample tab.
  • Hundreds of other tweaks, minor improvements and bug fixes.

Matillion ETL for Redshift 1.24.6

New Data Connectors

New Components

  • Delete Tables - Remove tables, such as temp tables, as part of an Orchestration.
  • S3 Get Object - Get S3 Objects and push them to SFTP, HDFS and Windows File Shares.
  • File Iterator - Iterate over a list of pattern-matched objects in an FTP, SFTP, HDFS or Windows File Share.

S3 Load Generator

  • This tool helps generate compatible "Create Table" and "S3 Load" components by sampling delimited data files on S3 and guessing the layout.
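
The general idea of sampling a delimited file and guessing its layout can be sketched as follows. This is only an illustration under simple assumptions (a header row, a handful of candidate delimiters, three coarse column types); the function names and type labels are hypothetical and do not describe the generator's actual algorithm.

```python
import csv
import io

CANDIDATE_DELIMITERS = [",", "|", "\t", ";"]


def guess_delimiter(header_line):
    """Pick the candidate delimiter that occurs most often in the header."""
    return max(CANDIDATE_DELIMITERS, key=header_line.count)


def guess_type(values):
    """Return a coarse type name for a column's sampled values."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "INTEGER"
    if values and all_parse(float):
        return "FLOAT"
    return "VARCHAR"


def guess_layout(sample_text):
    """Guess the delimiter and (column name, type) pairs from a text sample."""
    delimiter = guess_delimiter(sample_text.splitlines()[0])
    rows = list(csv.reader(io.StringIO(sample_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        # Ignore empty cells so that missing values do not force VARCHAR.
        values = [row[i] for row in data if i < len(row) and row[i] != ""]
        columns.append((name, guess_type(values)))
    return delimiter, columns
```

A guess of this kind is only a starting point, which is why the generated components remain editable before the load is run.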

Project Sharing

  • Private projects can be created
  • Projects have an owner who controls which other users can collaborate

Automated Backups

  • You may enable automated daily backups of the Matillion ETL for Redshift instance root volume

New Chat and Presence Features

  • You can see who else is collaborating with you, and chat to them. Chats are persisted to provide context on your project

Other Improvements

  • Create Table and Fixed Flow components support additional data types (Integer, Date). More to follow.
  • S3 Put Object now supports S3 as a source (in case you have ZIP files on S3 that need unpacking before loading to Redshift).
  • The SQL component can now be used at the beginning of a flow.
  • We now include an API profile for Matillion's API to copy the run history to Redshift. The API Query component can be used to query this data and import to Redshift.

Plus hundreds of minor improvements and bug fixes.

Matillion ETL for Redshift 1.23.6

  • Updated Google AdWords driver to support latest API versions.
  • Fixed an issue with environment explorer showing tables and views.
  • Fixed an issue with join component validation.

Matillion ETL for Redshift 1.23.5

  • The Sample tab now allows filtering to assist debugging complex transformations.
  • Real-time validation of expressions in the Expression Editor
    • Your syntax is checked by Redshift as you type.
  • Jobs and folders in the explorer can be moved and copied in bulk.
  • Improved editor windows. You can see available variables and test your code without leaving the editor when writing Python and SQL Scripts.
  • Notes can now include bold, underlined and italic text, as well as hyperlinks.
  • The Task History is now searchable, and opens in a separate tab.
  • In the environment navigation, browse the available tables, views and columns within each environment, and drag and drop them into a Transformation.
  • On the Table Output Component, "Analyze Compression" now supports an "If not compressed already" setting.
  • Python modules can now be installed with 'pip', and the latest boto3 API is now included by default for interaction with AWS services.
  • The S3 Load Component can specify an IAM Role ARN that is attached to your Redshift cluster.
  • The RDS Bulk Output Component now supports output to PostgreSQL databases.
  • The S3 Put Component can now read directly from HDFS.

Matillion ETL for Redshift 1.22.5-2

  • Minor fixes for some customer issues
    • Socket timeouts removed and login timeouts increased for all drivers on the Database Query component.
    • Fixed a problem where database upgrades for older versions fail during the upgrade process.
    • SFTP port number is no longer ignored for non-standard ports in the S3 Put Object component.

Matillion ETL for Redshift 1.22.5

  • Upgrades
  • Non-blocking task queue allows users to collaborate more seamlessly without being blocked by each other's requests.
    • Multiple runs of the same job will queue.
    • All other runs may happen concurrently, regardless of the environment.
  • New Components:-
    • Load data from HubSpot with the HubSpot Query Component.
    • Load OData Sources with the OData Query Component.
    • Load Microsoft Excel Spreadsheets with the Excel Query component.
    • Load Google AdWords data with the Google AdWords Query component.
    • SFTP Put Object component will allow you to write transformed data from Redshift back to an SFTP server.
    • Retry Component allows automatic retrying and backoff, which is most useful for third-party APIs that are not 100% reliable.
  • Enhancements
    • S3 Put Object now supports copying a file from a Windows File Share.
    • You can now run an orchestration job from part way through.
    • Profile editor for building data profiles to describe how APIs map to tables and columns that can then be queried from the API Query Component.
    • Import/Export can now include details of Variables and Environments.
    • Notices/warnings/errors are now displayed on a new "Notices" tab.
  • Preview API to import/export entire projects, run jobs, monitor running jobs.
    • Can be used for integration to 3rd party source control management systems.
    • Ask support for more details on how to get started with this
  • Plus hundreds of performance improvements and minor features.
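
The retry-and-backoff behaviour mentioned for the Retry Component above follows a common pattern that can be sketched in a few lines of Python. This is an illustration only, not the component's implementation; `retry_with_backoff` and its parameters are hypothetical names, and the doubling delay is an assumed policy.

```python
import time


def retry_with_backoff(action, max_attempts=5, initial_delay=1.0,
                       factor=2.0, sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is reached.

    The wait between attempts starts at initial_delay seconds and is
    multiplied by factor after every failure. The sleep function is
    injectable so the policy can be exercised without real waiting.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            sleep(delay)
            delay *= factor
```

Wrapping only the unreliable call, rather than a whole job, keeps the retries from repeating work that already succeeded.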

Matillion ETL for Redshift 1.21.5

  • Upgrades
    • Please backup the instance before upgrading! https://matillion.lightning.force.com/lightning/r/Knowledge__kav/ka04G000000kCw2QAE/view
    • If you are upgrading Matillion to 1.21.5 from a previous release, once you apply the updates and restart the application server (tomcat), the internal job repository will be re-written in a new format. Please be patient and do not restart the server (or tomcat) during this period.
    • If for any reason the upgrade fails, restore from backup and please contact support.
  • New Components
    • Google Spreadsheets Query
    • Marketo Query
    • RDS Bulk Output (for Aurora, MySQL and MariaDB)
    • Bash Script
    • CloudWatch Publish
  • Project Selection and Organisation
    • Open a job directly from the project chooser
    • Recently opened jobs are tracked
    • Project Selection is searchable
    • Create your own folder structure in the project tree
  • Other Enhancements
    • Centrally Managed Passwords
    • Manage Users, Software Upgrades and more through a new Admin screen
    • Copy/Paste settings between spreadsheets/text files and the Grid Editor
    • SNS/SQS/RDS Components will offer Topics, Queues and Endpoints to choose from (if the given credentials allow it)
  • Plus dozens of other minor improvements and fixes

Matillion ETL for Redshift 1.20.4

  • Upgrades
    • If you upgrade an existing instance, you may get errors running Python scripts that worked previously. See here.
    • If you previously committed changes made in the Python component (e.g. via cursor.connection.commit()) this is no longer necessary and may actually fail - please remove any such commits from scripts.
  • Concurrent execution of Orchestration Tasks
    • If your existing jobs make any assumptions about the order that components run in, be careful! Use the And component to ensure all of the components before it complete before the orchestration continues.
  • Scoped Environment Variables.
    • Variables can now be local or global (default). Concurrently running jobs see copies of local variables, but share global variables. This is useful if you re-use the same variables concurrently. For more information, see Using Variable
  • Connectors for:
    • Salesforce.com
    • Dynamics CRM
    • MongoDB
    • Google BigQuery
    • LDAP
    • NetSuite
    • DynamoDB
  • Create View component
    • End a transformation job with a view definition instead of a table
  • Database Query now supports Teradata
  • S3 Put can unpack a zip file, and place the contents onto S3
    • This will use local storage on the instance temporarily
  • Table Input, S3 Unload and Table Iterator can all use a View as well as a table

Matillion ETL for Redshift 1.19.4

Major new features:

  • Ingest data from Twitter, Facebook and Google Analytics into Redshift 
    • Twitter, Facebook and Google Analytics all support OAuth authentication to keep your data secure
  • New components to provide full transaction control - Begin, Commit and Rollback
    • These will allow you to guarantee the consistency of your ETL job's output.
  • Improved task cancellation will now also cancel RDS Load, Database Query and S3 Put, as well as give you a way of checking for cancellation within custom Python scripts.
  • New Detect Changes component will compare two input flows and detect whether rows are identical, new, deleted or changed. This component supports a number of Data Warehousing use cases, such as simplifying development of slowly changing dimensions.
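
The comparison described for Detect Changes can be illustrated with a small Python sketch. It assumes each row is a dict, that rows are matched on a single key column, and that the first input is treated as the existing data; the function name, labels and sample columns are hypothetical and this is not the component's implementation.

```python
def detect_changes(master_rows, second_rows, key):
    """Classify each key as identical, changed, new or deleted.

    Assumption: 'new' means present only in second_rows and
    'deleted' means present only in master_rows.
    """
    master = {row[key]: row for row in master_rows}
    second = {row[key]: row for row in second_rows}
    result = {}
    for k in master.keys() | second.keys():
        if k not in second:
            result[k] = "deleted"
        elif k not in master:
            result[k] = "new"
        elif master[k] == second[k]:
            result[k] = "identical"
        else:
            result[k] = "changed"
    return result
```

For a slowly changing dimension, the "changed" and "new" keys are the ones that would typically drive new dimension rows.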

Minor features and fixes:

  • Support for EXPLICIT_IDS, which allows you to override ID fields when ingesting data from S3.
  • ALL incoming SQS messages get a response to the success or fail queue, even if the project/version/job was not found

Plus over a hundred other tweaks and improvements.

Matillion ETL for Redshift 1.18.5

New Components:-

  • S3 Manifest File Writer - This component adds all S3 objects matching a regular expression into a manifest file, ready to use in the S3 Load component.
  • Transpose Rows - Aggregate data into delimited lists, a.k.a. List Aggregate.
  • If Component - In an Orchestration job, evaluates variables to conditionally execute parts of the flow.
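
To illustrate what a manifest of the kind written by the S3 Manifest File Writer contains, the sketch below filters a list of S3 object keys with a regular expression and builds a manifest in the JSON layout that the Redshift COPY command reads (an `entries` list of `url` and `mandatory` pairs). The function name, bucket and keys are hypothetical, and listing the bucket is left out so that the example runs without AWS access.

```python
import json
import re


def build_manifest(bucket, keys, pattern, mandatory=True):
    """Return manifest JSON for the keys that match the regular expression."""
    regex = re.compile(pattern)
    entries = [
        {"url": f"s3://{bucket}/{key}", "mandatory": mandatory}
        for key in keys
        if regex.search(key)
    ]
    return json.dumps({"entries": entries}, indent=2)
```

Marking entries as mandatory makes the load fail if a listed file is missing, which is usually the safer default for scheduled loads.
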
Improvements to Existing Components:-
  • Create Table now allows specification of identity (auto-increment) columns, primary keys and sort-key style
  • Components that create tables (Create Table, Rewrite Table, RDS Load, Database Load) can now specify whether the sort key is Compound or Interleaved.
  • S3 Load now supports the AVRO file format.
  • S3 Load and S3 Unload now support a master symmetric key for client-side encryption.
  • Aggregate component now supports an Approximate Count
Other new features:-
  • Many editors now support syntax-highlighting and auto-completion, including the Expression Editor.
  • User Defined Functions are now listed in the Expression Editor along with all the built in Redshift functions
  • Enable SSL between your AMI and the Redshift Cluster
Variable Exports:-
  • Each component can export runtime information into a user-defined variable. That variable can then be used in other components, including the new If component, to direct the flow of a job.
  • Support for DB2 in the Database Query component (subject to uploading your own driver).
  • Dozens of minor enhancements and bug fixes to components, memory management, performance and documentation.
  • Revalidation may be required before new component properties appear.
  • A forced refresh of the page may be required; this is usually [Ctrl-F5] in modern browsers.
  • You will not be able to 'undo' changes made before the time of upgrade.
  • After doing the upgrade, restarting tomcat and force-reloading your browser, please contact support if you have any issues.
New Orchestration Components:
  • SQS Message -  post a message to an SQS Queue.
  • SNS Message - post a message to an SNS Topic.
  • Schema Copy - copy a number of tables from one schema to another in a transaction.
  • Database Query - Read Data from JDBC sources.

Matillion ETL for Redshift 1.15.2

  • Job scheduler. Launch orchestration jobs on a regular basis and configure scheduled data transformations.
  • UI Improvements including Snap-to-Grid, Zoom
New orchestration components to integrate with other AWS Services:
  • RDS Query - Read data from RDS sources.
  • SNS Message - post a message to an SNS Topic.
New orchestration components:-
  • Analyze - Initiate a Redshift Analyze on a table
  • Vacuum - Initiate a Redshift Vacuum on a table
  • Truncate - Truncate a table

Matillion ETL for Redshift 1.14.5

  • New S3 Load/Unload components to help get data into and out of Redshift
  • Ability to manually define AWS credentials after AMI Launch
  • Internal caching makes running the same jobs repeatedly much faster.
  • New components can (i) update table rows, (ii) delete table rows, (iii) split a field on a delimiter
  • 72 other bug fixes and minor improvements