1.3x Matillion ETL for Redshift Notes
Matillion ETL for Redshift 1.39 (Features Video)
- S3 Load Generator tool interface has been reworked for a smoother experience.
- New Data Staging components to quickly and easily bring your data into Redshift:
- Redis Query Component pulls data from a target Redis database
- Splunk Query Component connects to your Splunk account.
- Salesforce Marketing Cloud Query component connects to a Salesforce Marketing Cloud account.
- New streamlined connectors are now available to easily pull data from your services into Amazon Redshift Spectrum external tables:
- Snapchat component
- Mandrill component
- Recurly component
- Lead/Lag Component now has the option of ignoring Null values.
- The creating Versions.
- Database Query and Zuora Query component.
- “Auto” Distribution style option added to many components to make use of Redshift’s automatic table distribution.
- A Default S3 Bucket may now be specified in when managing an Environment, allowing users to skip configuring a staging area in many components.
Matillion ETL for Redshift 1.38.8
- Updated authentication for Stripe Query.
- Improved handling of cluster synchronisation
- Reworked logged diagnostic information.
- Fixed an error where jobs could hang indefinitely due to a problem with the handling of temporary files.
Matillion ETL for Redshift 1.38 (Features Video)
- Configuring Environments now uses a multi-step wizard configuration complete with tooltips to help guide you through creating and editing Environments.
- CDC task creation is simplified with a new wizard UI and can pull data from two new source databases - PSQL and MSSQL.
- Reintroducing a dialog option that allows “Text Mode” configuration of Variables (namely, Manage Environment Variables) and Calculator Component has been updated to include additional JSON and Regex functionality.
- Product Improvement Metrics can now be gathered within Matillion ETL. When you update your instance of Matillion ETL for Amazon Redshift you will be prompted to share anonymized Product Improvement Metrics data. Matillion will use this information to build you a better product. Absolutely no personal data is collected through this service.
- Create Project now has a multi-step wizard configuration complete with tooltips to help guide you through creating a Matillion ETL for Amazon Redshift Project.
- A new Sharepoint Query component allows users to connect to their Sharepoint account to bring data into Amazon Redshift and transform it using Matillion.
- New DynamoDB Query component loads data from Amazon’s DynamoDB.
Matillion ETL for Redshift 1.37
Note: The Data Transfer component is now the preferred way to move files/objects between storage providers. Existing components such as S3 Get/Put, GCS Get will continue to work in existing jobs, but new jobs should use Data Transfer.
- New Data Loading orchestration components:
- DoubleClick for Publishers Query connects to Google’s DoubleClick for Publishers.
- Twilio Query retrieves data from the Twilio cloud communications platform.
- Spark Query allows users to bring in data from an Apache Spark database.
- Azure Cosmos DB Query connects and queries an Azure Cosmos database.
- All Data Loading orchestration components can now display data directly sampled from their source, allowing users to easily check their current configuration.
- All Data Loading orchestration components can now display the SQL generated by the component.
- CDC based on DMS gives users the power to automatically keep their Redshift tables in-sync with their source database tables using Matillion ETL.
- Support for creating External Tables over semi-structured external data hosted on S3.
- A new Nested Data Load component can flatten incoming nested data according to a user-defined structure. Unlike nested data, the flattened data can be fully-functional on Redshift.
- A new JDBC Table Metadata to Grid Component connects to many types of JDBC database and can export the metadata from a source table into a Matillion ETL Grid Variable.
- A new Migration Tool can help move any number of Matillion ETL assets directly from one Matillion ETL instance to another.
Matillion ETL for Redshift 1.36
- A selection of new components to connect to various services:
- Shopify Queryconnects to the user’s Shopify account.
- Survey Monkey Query loads data from a SurveyMonkey database.
- Zoho CRM Query retrieves Zoho CRM data.
- Dynamics 365 Sales Query connects to the Sales service in Dynamics 365.
- Dynamics 365 Business Central Query connects to the Business Central services in Dynamics 365.
- Many UX improvements including automatically connecting components on the canvas, improved variables workflow and new keyboard shortcuts.
- ORC and PARQUET file formats now supported in S3 Load.
- Selected Environments are now user-specific. Users can now specify their environments independently of one another.
- Users can now freely copy, cut, and paste jobs within a project.
- Autocompletion prompts now appear in many places when using Matillion ETL variables in code.
- Users can now search their environment tree to quickly find tables and views.
Matillion ETL for Redshift 1.35
- New “Data Transfer” Component that boasts all the functionality of the existing S3 Get, S3 Put and Cloud Storage Put components, plus additional source and target destinations (Azure Blob Storage).
- In addition to AWS and GCP credentials, environments can now reference Azure credentials to interact with Azure services such as Blob Storage.
- A new “Apache Hive Query” component connects to your Apache Hive data warehouse.
- A new “LinkedIn Query” component connects to your company’s LinkedIn apps.
- A new “Bing Search Query” component connects to the Bing Search API.
- A new “Bing Ads Query” component connects to the Bing Ads service.
- A new “Dynamics 365 Sales Query” component connects to the Dynamics 365 API.
- Allow uploading the native Microsoft SQL Server JDBC Driver (the bundled jTDS driver is often the fastest in scenarios where it works)
- New ‘Extract To New Job’ function available by left-clicking a selection of multiple components on the canvas. Allows users to instantly create new jobs from a group of components, tidying up workflows and helping to create reusable jobs.
- A new Table Properties panel (accessible via a right-click on a table in the Environment tree) shows table metadata. For key distributed tables, a chart showing data skew is shown. For external (Spectrum) tables, file locations and partition information are shown.
- The new “Salesforce Incremental Load ” Wizard allows users to quickly and easily set up incremental loads from Salesforce.
- The new “JDBC Incremental Load ” Wizard allows users to quickly and easily create incremental loads from a variety of popular database types.
Matillion ETL for Redshift 1.34.5
- Updated drivers for Cassandra Queryand Dynamics CRM Query.
Matillion ETL for Redshift 1.34
Important Notice: On upgrade, a background task will restructure the task history. During this time not all historic tasks will be available to view in the UI or API. The process only takes a few minutes in the general case but can take several hours if you have millions of run history items. This will require additional disk space (either on the instance or on RDS depending on your setup) so ensure you have at least 50% free space before attempting the upgrade.
- Shared Jobs:
- You can now turn your reusable orchestration jobs into their own components with their own parameters, help and Icon.
- Shared jobs can be packaged and distributed across multiple ETL instances with Import and Export.
- Historic Task Viewer:
- Previously completed tasks can be viewed on the canvas along with any parameter errors.
- You can understand the canvas state of a job and also see the jobs contained in a Shared Job.
- An “Unconditional” connector:
- Its now simpler to build orchestrations where the next orchestration step is run regardless of the success or failure of the prior step. This avoids use of extra “and” and “or” components to achieve the same thing.
- “Auto Debug” for all Data Loading components:
- Data Loaders come with the Auto Debug property. When switched on, allows users to choose between 5 levels of Debug Logging verbosity.
- Makes it easier to retrieve logging information without console access to the Matillion ETL Instance. Include these logs in your support requests for much faster turnaround!
- Warning: Can potentially consume large amounts of disk space. Do not leave this switched on unless directly in need of it!
- It is now possible to import, export and modify permissions via the API.
- Window Calculation component now supports “Standard Deviation” and “Standard Deviation Population”.
- Data Lineage:
- Allows you to understand the effect that your complex transformation jobs will have on your data. Track a column backwards to its source to determine where and how calculations are applied.
- This is an Enterprise Feature and thus is available to customers using large and xlarge instance types.
- OpenID Connect support for third party login providers:
- You can now configure Matillion ETL to authenticate with any Open ID Connect provider.
- Default support for Google, Microsoft and Okta plus a “Generic” option.
Matillion ETL for Redshift 1.33.10
- Hot fixes for Salesforce OOM
Matillion ETL for Redshift 1.33
Important: Queries using the Advanced Mode of the Google BigQuery Query Component will pass the SQL directly to BigQuery without any interpretation. If this causes any problem, please set the Connection Option 'Query Passthrough' to FALSE
Important: Users with the API role bypass some permissions checks (reads) during API calls.
- Open Exchange Rates Query component connects to the Grid Iterator allow iterating the values of a Grid Variable, similarly to iterating through a table of values.
- SQL Editor (in all Query components) now shows available Tables/Columns and Variables to help you author and test SQL queries from source systems.
- A new “Notices” V1 API endpoint allows you to query the current system notifications and post new messages which notify all users.
- New “User Configuration” and "Permission" V1 API endpoints allow user management via the Matillion API.
- Matillion no longer requires “listAllBuckets” permission (although this is still recommended)
- Job Variables (scalar and grid) now have a “Visibility” that determines how they are used elsewhere.
- All variables now have a description.
- 100+ bug fixes across all areas of Matillion ETL.
- When browsing recent STL Load Errors, more details are provided on the exact cause of parsing problems.
Matillion ETL for Redshift1.32.8
- Change: Allow Tomcat to start even if there is a corrupt schedule attached to a job. A bug fix to prevent the corrupt schedules attached to jobs in Matillion ETL 1.32 is currently in progress.
- Bug Fix: Prevented inappropriate authentication errors when using basic OAuth in the Zuora Query component.
Matillion ETL for Redshift 1.32
IMPORTANT : Ensure you have a backup before you upgrade. Security configuration changes are applied on upgrade. These changes cannot be reversed, so do not use “yum downgrade” (or similar) to attempt to get back to versions prior to 1.32.
Enterprise Only: This version of Matillion introduces a new Permissions system that allows users to:
- Setup users with fine grained permission sets that can limit the 100+ core functions of the tool
- Provides default permission groups:
- Reader - Read only user who can’t modify a project
- Reader with Comments - Reader with ability to add notes to jobs
- Runner - A user who can execute but not modify jobs
- Scheduler - A user who can execute, schedule and change related config
- Writer - A user who can create ETL jobs but not delete projects
- Additional permission groups can be added at any time and are organised hierarchically making them easy to set up.
- A new suite of Grid Variable components are now included to make populating and manipulating them simpler - often without requiring any scripting:
- Table Metadata to Grid - populate a grid variable from a table or view metadata (columns names, sizes etc)
- Query Result to Grid - populate a grid variable from the result of a query
- Append To Grid - Append values from one grid into another
- Remove From Grid - Remove values from a grid variable
- A new “SendGrid Query” component connects to the ElasticSearch Query” component to connect to the Magento Query” component to connect to the Zuora Query” component to connect to the GMail Query” component to connect to Google’s Alter WLM Slots Component
- This allows queries run by Matillion to make use of multiple WLM slots to give very complex queries additional memory without having to spill intermediate results to disk. You can determine which queries spill to disk with SVL_QUERY_SUMMARY and similar system views.
- This can be increased and decreased during a flow, so only those parts of the ETL that require it consume additional resources
- External (Domain-based) Login: You can now encrypt your Realm Password with the AWS Key Management Service (KMS)
Matillion ETL for Redshift 1.31.8
The "Google AdWords Query" component has been updated to support the latest Google AdWords API's.
Matillion ETL for Redshift 1.31.7
Important (possible breaking change): API Profiles ("RSD’s") that handle paging may need to be tweaked to disable “auto” paging. Please see here for more details.
Important (possible breaking change): API profile limits are now applied. Where the default of 100 is set it will now be applied. This could affect API Query Components which previously ignored that limit.
- Zendesk Query orchestration component for loading data from the Zendesk customer relationship system.
- Mixpanel Query orchestration component for loading data from Mixpanel product analytics system.
- Xero Query orchestration component for loading data from the Xero accounting system.
- Dynamics 365 Query orchestration component for loading data from Microsoft Dynamics CRM/ERP.
- API Profile RSD Generato
- Accelerate the development of API Profiles using a new tool that automatically generates a basic XML “RSD descriptor” for any API endpoint, based on a sample of data returned.
- New REST API Version 1 for provides full coverage of Matillion ETL functions:
- You can now read/write more assets (JDBC Drivers, credentials, SQS configuration) as well as allowing finer-control of which resources to include.
- The “v0” api is still available and unchanged but may become unsupported in future.
- Grid Variables System
- In addition to “scalar” (single-valued) variables, you can now define grid variables to hold lists and grids of values; use them wherever a compatible list or grid of values is required.
- Grid variables can be manipulated/modified in Python.
- You can pass values for grid variables when starting a job via SQS and/or the V1 API.
- You can now disable parts of an Orchestration job.
- Improved Matching in column mappings - Many transformation component “Column Mapping” parameters can now be automatically mapped, even when the input and output column names are similar but not identical.
- You can now delete tables/views directly from the Environment tree.
- External Tables based on Amazon Redshift Spectrum now support skipping header rows.
Matillion ETL for Redshift 1.30.6
- Redshift now supports Real/Double data types
- The External Table Output component for Redshift Spectrum now has partitioning support.
- New "Copy Table to External Schema" GUI tool generates new "Table Input/External Table Output" components with full partitioning support.
- Perfect for customers who want to try out Redshift Spectrum with their existing data.
- Redshift now supports "late binding" views on input components and view creation.
- The Google BigQuery Query component now supports standard SQL.
- New Server Migration Tool makes it easy to migrate all configuration including Oauth, API Profiles and Drivers in addition to projects to a new Matillion Cluster.
- Redesigned "Scheduler" user interface to simplify the management of scheduled orchestration jobs.
- New "Task Info" panel and "Task" panel make it much easier to understand complex tasks both at run time and after job execution.
- Matillion variables can be defined and scoped at job level making jobs much more reusable. Variables can now be passed to and returned from jobs.
- New Quickbooks Online Query component to connect to the popular online accounting system.
- New Square Query component to connect to the payment system.
- New Google Custom Search component allows google search data to be ingested.
- All data-staging components can append rows to an existing table as well as creating new tables.