Optimise

This article is specific to the following platforms - Delta Lake.

Optimise the layout of Delta Lake data with the Optimise component. You can optionally optimise a subset of the data, or colocate data by column (Z-Ordering). If you don't specify any columns for colocation, the component performs bin-packing optimisation.
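As a rough illustration of how these options combine, the sketch below assembles a Delta Lake `OPTIMIZE` statement from a table name, an optional partition predicate, and an optional list of Z-Order columns. The helper `build_optimize_sql`, the table `sales.events`, and the column names are hypothetical, and the SQL that the component actually issues may differ. Only the general `OPTIMIZE ... WHERE ... ZORDER BY (...)` form is taken from Delta Lake.

```python
from typing import Optional, Sequence


def build_optimize_sql(
    table: str,
    partition_predicate: Optional[str] = None,
    zorder_columns: Sequence[str] = (),
) -> str:
    """Assemble a Delta Lake OPTIMIZE statement (hypothetical helper)."""
    parts = [f"OPTIMIZE {table}"]
    if partition_predicate:
        # Restrict the operation to a subset of the data.
        parts.append(f"WHERE {partition_predicate}")
    if zorder_columns:
        # Colocate related rows by the listed columns.
        parts.append(f"ZORDER BY ({', '.join(zorder_columns)})")
    return " ".join(parts)


# No colocation columns: plain bin-packing over the whole table.
bin_packing_sql = build_optimize_sql("sales.events")

# A subset of partitions, colocated by two (assumed) columns.
zorder_sql = build_optimize_sql(
    "sales.events",
    partition_predicate="event_date >= '2024-01-01'",
    zorder_columns=["customer_id", "event_type"],
)

print(bin_packing_sql)
print(zorder_sql)
```

Leaving both optional arguments empty corresponds to the default behaviour described above, where only bin-packing is performed.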

Bin-packing optimisation is idempotent: if the operation is run twice on the same dataset, the second run has no effect. Bin-packing aims to produce data files that are evenly balanced by size on disk, but not necessarily by the number of tuples per file. However, the two measures are usually correlated.
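The idempotence property can be seen in a toy model. The sketch below is not Delta Lake's implementation: it is a simplified first-fit-decreasing packer, with a hypothetical `bin_pack` function and made-up file sizes, that merges small files up to an assumed target size. Once packed, no two output files fit together under the target, so a second pass leaves the layout unchanged.

```python
from typing import List


def bin_pack(file_sizes_mb: List[int], target_mb: int) -> List[int]:
    """Merge files with first-fit decreasing so merged files stay within target_mb."""
    bins: List[int] = []
    for size in sorted(file_sizes_mb, reverse=True):
        for i, used in enumerate(bins):
            if used + size <= target_mb:
                bins[i] = used + size  # merge into the first file with room
                break
        else:
            bins.append(size)  # no existing file has room: start a new one
    return sorted(bins, reverse=True)


first_run = bin_pack([60, 60, 30, 20, 10, 10], target_mb=100)
second_run = bin_pack(first_run, target_mb=100)

print(first_run)   # [100, 90]
print(second_run)  # [100, 90]
```

The packer balances files by size only. It knows nothing about how many tuples each file holds, which mirrors the distinction drawn above.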

Z-Ordering is not idempotent, although it aims to be an incremental operation. The time taken for Z-Ordering isn't guaranteed to reduce over multiple runs. Z-Ordering aims to produce data files that are evenly balanced by number of tuples, but not necessarily by size on disk. The two measures are usually correlated, but when they are not, some optimisation tasks can take noticeably longer than others.
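For readers unfamiliar with the idea behind Z-Ordering, the sketch below shows the general technique of interleaving the bits of two column values into a single sort key (a Morton code), so that rows close together in both columns tend to sort near each other. It illustrates the concept only. The function `z_value` and the sample data are assumptions, and Delta Lake's own implementation may compute its ordering differently.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


# Sort a small 4x4 grid of (x, y) rows by their interleaved key.
rows = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(rows, key=lambda r: z_value(*r))

print(z_value(3, 5))  # 39
print(ordered[:4])    # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The first four rows in the sorted output form a 2x2 block of the grid. This locality in both columns at once is what colocation aims for.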

Properties

Delta Lake Properties

| Property | Setting | Description |
| --- | --- | --- |
| Name | String | A human-readable name for the component. |
| Catalog | Select | Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter. |
| Database | Select | Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup. |
| Table | Select | The Delta Lake table to be optimised. Only one table can be selected. |
| Partition | Editor | The partition column(s) to include in the optimisation process with the related condition. The default is none. |
| Z-Order | Editor | The column(s) to include in the optimisation process. This list should exclude any partition columns. The default is none. |
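Because the Z-Order list should exclude partition columns, it can help to check the two lists against each other before running the component. The following is a minimal sketch of such a check. The helper `overlapping_columns` and the column names are hypothetical, and the case-insensitive comparison is an assumption about how column names are matched.

```python
from typing import Iterable, List


def overlapping_columns(
    zorder_columns: Iterable[str], partition_columns: Iterable[str]
) -> List[str]:
    """Return Z-Order columns that are also partition columns (case-insensitive)."""
    partitions = {c.lower() for c in partition_columns}
    return [c for c in zorder_columns if c.lower() in partitions]


# "event_date" appears in both lists, so it should be dropped from Z-Order.
clashes = overlapping_columns(["customer_id", "event_date"], ["event_date"])
print(clashes)  # ['event_date']
```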
