Vacuum Tables Component

Vacuum Tables Component

This article is specific to the following platforms - Redshift.

Vacuum Tables Component

Perform a vacuum operation on a list of tables. Vacuum is a housekeeping task that physically reorganizes table data according to its sort-key, and reclaims space leftover from deleted rows.

Vacuum can be a very expensive operation. Depending on your use-case, vacuum may be unnecessary. For more information on the situations where a vacuum may be necessary, please refer to the Amazon Redshift Documentation. When in doubt; vacuum is almost always used at the end of an orchestration job.


Property Setting Description
Name Text The descriptive name for the component.
Schema Select Select the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
Tables to Vacuum Select One or more tables to Vacuum.
Note: Only one vacuum may be running at any one time across an entire Amazon Redshift cluster. Therefore vacuums may fail due to concurrent workloads. This is usually harmless if the same tables will be vacuumed again on the next run of the job. If this is the case, consider joining the "Failure" link of the component to an "End Success" to prevent vacuum failure from failing the whole job. See the example for more details.
Vacuum Options Select Reclaims disk space occupied by deleted rows in a table. Additional options are available to the user (see AWS Documentation for detailed information).
DELETE ONLY: Will not sort tables and is consequently quicker than other methods.
FULL: Is equivalent to DELETE ONLY if the target table is >95% sorted, otherwise will perform a full sort.
None: A default Vacuum operation. This is analogous to "FULL" in the current AWS implementation.
REINDEX: Analyzes interleaved sort keys and performs a FULL sort.
SORT ONLY: Sorts the table but does not reclaim disk space. Is quick at the expense of unclaimed memory.


In this example we have a long ETL job that is ended by cleaning up our data. This clean up involves deleting excess table data and then vacuuming whatever remains. This vacuum operation frees up space on the Redshift cluster.

In the Vacuum Tables component properties, shown below, we ensure the schema is chosen that contains our data. We also set Vacuum Options to FULL so that tables are sorted as well as deleted rows being removed.

In the 'Tables to Vacuum' property, you can select tables by moving them into the right-hand column, as shown below.

When this job is run, it will end by vacuuming up any rows marked for deletion in the requested tables, potentially saving a lot of space on the Redshift cluster.