Optimise
Optimise the layout of Delta Lake data with the Optimise component. You can optionally optimise a subset of data or colocate data by column. If you don't specify colocation, bin-packing optimisation is performed.
Bin-packing optimisation is idempotent. This means that if the operation is run twice on the same dataset, the second run has no effect. Bin-packing aims to produce evenly balanced data files with respect to their size on disk, but not necessarily the number of tuples per file. In practice, the two measures are usually correlated.
Z-Ordering, which colocates data by column, is not idempotent. It does aim to be an incremental operation, but the time taken isn't guaranteed to reduce over multiple runs. Z-Ordering aims to produce evenly balanced data files with respect to the number of tuples, but not necessarily data size on disk. While the two measures are often correlated, they can diverge, which skews the time taken by individual optimisation tasks.
Properties
Delta Lake Properties

Property | Setting | Description |
---|---|---|
Name | String | A human-readable name for the component. |
Catalog | Select | Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter. |
Database | Select | Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup. |
Table | Select | The Delta Lake table to be optimised. Only one table can be selected. |
Partition | Editor | The partition column(s), each with a related condition, that limit the optimisation to a subset of the data. The default is none. |
Z-Order | Editor | The column(s) by which to colocate data (Z-Ordering). This list should exclude any partition columns. The default is none, in which case bin-packing optimisation is performed. |
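
To show how these properties relate to each other, here is a minimal Python sketch that assembles a Delta Lake `OPTIMIZE` statement from them. The `OPTIMIZE ... WHERE ... ZORDER BY (...)` syntax is standard Delta Lake SQL, but the function `build_optimize_statement`, its parameter names, and the sample table and column names are hypothetical. It is also an assumption that the component issues a statement of this shape; the exact SQL it generates isn't documented here.

```python
from typing import Optional, Sequence


def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_optimize_statement(
    table: str,
    database: Optional[str] = None,
    catalog: Optional[str] = None,
    partition_predicate: Optional[str] = None,
    zorder_columns: Sequence[str] = (),
) -> str:
    """Assemble an OPTIMIZE statement (hypothetical helper, not part of any product API).

    table               -> the Table property
    database, catalog   -> the Database and Catalog properties (omitted when None)
    partition_predicate -> the Partition property, as a SQL condition on partition columns
    zorder_columns      -> the Z-Order property; empty means bin-packing only
    """
    name_parts = [part for part in (catalog, database, table) if part]
    statement = "OPTIMIZE " + ".".join(quote_identifier(p) for p in name_parts)

    # Restrict the operation to a subset of the data.
    if partition_predicate:
        statement += f" WHERE {partition_predicate}"

    # Colocate data by column; without this clause only bin-packing applies.
    if zorder_columns:
        columns = ", ".join(quote_identifier(c) for c in zorder_columns)
        statement += f" ZORDER BY ({columns})"

    return statement


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(build_optimize_statement("events", database="sales"))
    print(
        build_optimize_statement(
            "events",
            database="sales",
            partition_predicate="event_date >= '2024-01-01'",
            zorder_columns=["customer_id"],
        )
    )
```

The first call leaves Partition and Z-Order empty, so it yields a plain bin-packing statement. The second restricts the run to matching partitions and Z-Orders by one column that is not a partition column.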