Matillion Data Loader Overview
Matillion Data Loader is a SaaS (Software as a Service) data loading platform that extracts raw data from popular sources and loads it into cloud data platform destinations.
Some of the key features of Matillion Data Loader are:
- No-code pipeline creation to empower more users and speed up data pipeline creation and loading.
- A single platform for batch and change data capture pipelines, eliminating the need for multiple tools from different vendors.
- Integration with Matillion ETL for pre-built data transformations.
Matillion Data Loader provides two methods to load data:
Batch Loading:
- Provides a SaaS-based, incremental data loading experience to extract data from popular sources and load the data into cloud data platform destinations at user-specified time intervals.
Change Data Capture (CDC) Loading:
- Provides a Hybrid SaaS, log-based data loading experience to capture real-time change events from enabled source databases, and load the change data into cloud data platform destinations in near real time.
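To make the batch method's "incremental" loading concrete, here is a minimal sketch of one common approach, a high-water mark on a change-tracking column. This is an illustration only: the source does not say how Matillion Data Loader detects changed rows, and the `updated_at` column, the `SOURCE_ROWS` data, and the `extract_incremental` function are all hypothetical.

```python
from datetime import datetime

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
SOURCE_ROWS = [
    {"id": 1, "name": "alice", "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "name": "bob", "updated_at": datetime(2024, 1, 1, 10, 0)},
    {"id": 3, "name": "carol", "updated_at": datetime(2024, 1, 1, 11, 0)},
]


def extract_incremental(rows, high_water_mark):
    """Return the rows changed after the mark, plus the updated mark."""
    changed = [
        r for r in rows
        if high_water_mark is None or r["updated_at"] > high_water_mark
    ]
    # Keep the old mark when nothing changed in this interval.
    new_mark = max((r["updated_at"] for r in changed), default=high_water_mark)
    return changed, new_mark


# First run: no mark exists yet, so every row is loaded.
batch, mark = extract_incremental(SOURCE_ROWS, None)
first_run_ids = [r["id"] for r in batch]

# Second run at the next interval: nothing changed, so nothing is reloaded.
batch, mark = extract_incremental(SOURCE_ROWS, mark)
second_run_ids = [r["id"] for r in batch]
```

Each scheduled run only moves rows newer than the stored mark, which is what keeps repeated runs at a fixed interval cheap.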
Matillion Data Loader's change data capture capabilities go beyond point-in-time replication to capture and replicate all change events in near real time. It stores an immutable history of all change events to recreate an exact view of data at any point in time. Integration with Matillion ETL transforms raw change data into analytics-ready data, and opens up opportunities for fraud detection, marketing personalization, ecommerce recommendations, compliance auditing, AI/ML modeling based on current and historical data, and many other use cases.
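The idea of recreating a view of the data at any point in time from an immutable history of change events can be sketched as a replay over an append-only log. This is a simplified illustration, not Matillion's implementation: the `ChangeEvent` shape, the `sequence` field (a stand-in for a timestamp or log position), and the `state_as_of` function are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: events are never modified once recorded
class ChangeEvent:
    sequence: int          # position in the log (stand-in for a timestamp)
    op: str                # "insert", "update", or "delete"
    key: int               # primary key of the affected row
    row: Optional[dict]    # new row image; None for deletes


def state_as_of(events, sequence):
    """Rebuild the table as it looked after the given log position."""
    table = {}
    for event in sorted(events, key=lambda e: e.sequence):
        if event.sequence > sequence:
            break
        if event.op == "delete":
            table.pop(event.key, None)
        else:
            table[event.key] = event.row
    return table


# A hypothetical history for one order row.
LOG = (
    ChangeEvent(1, "insert", 10, {"status": "new"}),
    ChangeEvent(2, "update", 10, {"status": "paid"}),
    ChangeEvent(3, "delete", 10, None),
)
```

Calling `state_as_of(LOG, 2)` yields the row in its "paid" state, while `state_as_of(LOG, 3)` yields an empty table. Point-in-time replication alone would only ever show the latest of these states.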
Matillion Data Loader process flow
Matillion Data Loader provides a streamlined process to create data pipelines. Setting up a new data pipeline takes three steps. First, identify the source and the tables to extract. Second, identify the destination where you want to load the data. Third, set the frequency of load job execution.
- A source can be a database, a SaaS-based application (an API endpoint), or file storage containing the data that you want to load into a destination cloud data platform.
- Matillion Data Loader provides pre-built connectors to popular data sources like Salesforce, Facebook, Google Analytics, PostgreSQL, Oracle, MySQL, Excel, Google Sheets, and more.
- Please refer to Sources to learn about supported sources.
- A destination is a cloud data platform where data is loaded and stored.
- Matillion Data Loader supports popular cloud data lake, lakehouse, and data warehouse destinations, like Snowflake, Amazon Redshift, Google BigQuery, Delta Lake on Databricks, Amazon S3, and Azure Blob.
- Please refer to Destinations to learn about supported destinations.
- Setting the frequency is the final step in creating a data pipeline.
- You can choose how often a batch pipeline runs and schedule its runs.
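The three setup steps can be pictured as three parts of a single pipeline definition. Matillion Data Loader is a no-code tool, so the sketch below is only a mental model: the `PipelineSpec` class, its field names, and the choice of minutes as the frequency unit are hypothetical and do not correspond to any Matillion API.

```python
from dataclasses import dataclass


@dataclass
class PipelineSpec:
    source: str              # step 1: where the data comes from
    tables: list             # step 1: which tables to extract
    destination: str         # step 2: where the data is loaded
    frequency_minutes: int   # step 3: how often the load job runs

    def validate(self):
        """Return a list of problems; an empty list means the spec is complete."""
        problems = []
        if not self.source:
            problems.append("no source selected")
        if not self.tables:
            problems.append("no tables selected")
        if not self.destination:
            problems.append("no destination selected")
        if self.frequency_minutes <= 0:
            problems.append("frequency must be a positive number of minutes")
        return problems


# A complete definition, and one that is still missing steps 1 and 2 details.
spec = PipelineSpec("PostgreSQL", ["orders", "customers"], "Snowflake", 60)
incomplete = PipelineSpec("Salesforce", [], "", 15)
```

Here `spec.validate()` returns an empty list, while `incomplete.validate()` reports the missing tables and destination.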
For a more detailed view of Matillion Data Loader's UI, read the Matillion Data Loader Dashboard Overview.
To get started, you will need:
- An active Matillion Hub account.
- A source application or database where your data resides.
- A destination where you want to load the data.
- Credentials to allow Matillion Data Loader to connect with the source and destination systems.
CDC Agent Setup:
Every CDC pipeline requires a Matillion Data Loader CDC Agent to orchestrate the data loading tasks.
- Agents must be installed within the cloud service provider of your choice before a CDC pipeline can be created.
- The CDC Agent requires access to the source database, the target cloud data platform, and a secrets management application, either AWS Secrets Manager or Azure Key Vault. A Platform Key is generated (if one has not already been generated for this account) and stored in AWS Secrets Manager or Azure Key Vault.
- Cloud resources, such as storage and logs, are created for use by the Agent. See the specific installation guides for more information.
- The Agent must be deployed to the appropriate environment with the correct environment variables.
- Agent status must be Connected for a CDC pipeline to succeed.
- Once the CDC Agent is connected, you can create CDC pipelines.
- Please refer to the agent installation documentation for detailed information, and consult your cloud administrator for help and permissions where required.
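The agent requirements above amount to a checklist that must pass before a CDC pipeline can be created. A minimal sketch of that checklist follows; the dictionary layout, the resource names in `REQUIRED_ACCESS`, the `cdc_blockers` function, and the "Pending" status value are all hypothetical, and only the "Connected" status comes from the text.

```python
# Resources the agent must reach, per the requirements above (names are illustrative).
REQUIRED_ACCESS = ("source_database", "target_platform", "secrets_manager")


def cdc_blockers(agent):
    """List the reasons a CDC pipeline cannot be created yet."""
    blockers = []
    if not agent.get("installed"):
        blockers.append("agent not installed")
    for resource in REQUIRED_ACCESS:
        if resource not in agent.get("access", ()):
            blockers.append(f"no access to {resource}")
    if agent.get("status") != "Connected":
        blockers.append("agent status is not Connected")
    return blockers


ready_agent = {
    "installed": True,
    "access": set(REQUIRED_ACCESS),
    "status": "Connected",
}
pending_agent = {
    "installed": True,
    "access": {"source_database"},
    "status": "Pending",  # hypothetical status value
}
```

An empty result from `cdc_blockers(ready_agent)` corresponds to the point at which CDC pipelines can be created; `cdc_blockers(pending_agent)` lists what is still missing.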
- A source application or database where your data resides.
- A destination database or data warehouse to which the data must be replicated.
- Access for Matillion Data Loader to connect to the source and destination systems.
- A warehouse or cloud storage location to store your data. This can also be the destination warehouse that holds the data from your CDC and batch pipelines.
- The data warehouse can be an existing destination or an external repository that you've set up for your pipelines. To use an existing destination as the active data warehouse, you must grant Matillion additional permissions.
- Matillion currently supports the following cloud data warehouses:
- To understand the technical requirements for data warehouse setup, please read Technical Requirement.
- Access for Matillion Data Loader to connect to the warehouse systems.
For more information about signing up for Matillion Data Loader, please refer to the document Signing up for MDL.