Vacuum Tables
  • Dark

Vacuum Tables

  • Dark

This article is specific to the following platforms - Redshift - Delta Lake.


Perform a vacuum operation on a list of tables. Vacuum is a housekeeping task that physically reorganizes table data according to its sort-key, and reclaims space left over from deleted rows.

Vacuum can be a very expensive operation. Depending on your use-case, vacuum may be unnecessary. For more information on the situations where a vacuum may be necessary, please refer to the Amazon Redshift Documentation.

For more information about the vacuum process, read Vacuum.

When in doubt, vacuum is almost always used at the end of an Orchestration Job.


Redshift Properties

Property Setting Description
Name Text A human-readable name for the component.
Schema Select Select the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
Tables to Vacuum Select One or more tables to Vacuum.
Note: Only one vacuum may be running at any one time across an entire Amazon Redshift cluster. Therefore, vacuums may fail due to concurrent workloads. This is usually harmless if the same tables will be vacuumed again on the next run of the job. If this is the case, consider joining the "Failure" link of the component to an "End Success" to prevent vacuum failure from failing the whole job. See the example for more details.
Vacuum Options Select Reclaims disk space occupied by deleted rows in a table. Additional options are available to the user (see AWS Documentation for detailed information).
DELETE ONLY: Will not sort tables and is consequently quicker than other methods.
FULL: Is equivalent to DELETE ONLY if the target table is >95% sorted, otherwise will perform a full sort.
None: A default Vacuum operation. This is analogous to "FULL" in the current AWS implementation.
REINDEX: Analyzes interleaved sort keys and performs a FULL sort.
SORT ONLY: Sorts the table but does not reclaim disk space. Is quick at the expense of unclaimed memory.

Delta Lake Properties

Property Setting Description
Name String A human-readable name for the component.
Catalog Select Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.
Database Select Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.
Tables to Vacuum Table Select Select which tables to vacuum.
Retention Period Integer The retention threshold. The default is 7.
Retention Unit Select Select the unit of retention. Choose from Day, Hour, Week. The default is Day.