What are deletion vectors?

Deletion vectors are a storage optimization feature that can be enabled on Delta Lake tables. By default, deleting even a single row from a data file requires rewriting the entire Parquet file that contains it. With deletion vectors enabled for the table, some Delta operations instead use deletion vectors to mark existing rows as removed without rewriting the Parquet file. Subsequent reads resolve the current table state by applying the deletions recorded in deletion vectors to the most recent table version.
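As a minimal sketch of this behavior, assume a hypothetical Delta table `events` with an `event_id` column on which `delta.enableDeletionVectors` is already set to true (and a Delta Lake version whose DELETE uses deletion vectors, per the version table on this page). The row is marked as removed rather than rewritten, and later reads no longer return it.

```sql
-- Assumes a hypothetical Delta table `events` (with an `event_id` column)
-- on which 'delta.enableDeletionVectors' = true has already been set.

-- Soft-delete a single row: it is marked as removed in a deletion vector
-- instead of rewriting the Parquet file that contains it.
DELETE FROM events WHERE event_id = 42;

-- Reads apply the deletion vector, so the deleted row is no longer returned.
SELECT count(*) FROM events WHERE event_id = 42;

-- Inspect the most recent commit; its operationMetrics column reports
-- what the DELETE did.
DESCRIBE HISTORY events LIMIT 1;
```

Whether a given statement actually writes a deletion vector depends on the operation and the Delta Lake version, so the history entry is a reasonable place to check rather than assuming it.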

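Support for deletion vectors was added incrementally across Delta Lake versions. The following table lists, for each operation, the Delta Lake version in which deletion vector support first became available and the version since which it has been enabled by default.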

| Operation | First available Delta Lake version | Enabled by default since Delta Lake version |
| --- | --- | --- |
| SCAN | 2.3.0 | 2.3.0 |
| DELETE | 2.4.0 | 2.4.0 |
| UPDATE | 3.0.0 | 3.1.0 |
| MERGE | 3.1.0 | 3.1.0 |

You enable support for deletion vectors on a Delta Lake table by setting a Delta Lake table property:

```sql
ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
```
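The same property can also be set when a table is created. The sketch below assumes a hypothetical `events` table partitioned by a `date` column (the columns are illustrative, chosen to match the REORG TABLE examples on this page) and then reads the property back with SHOW TBLPROPERTIES.

```sql
-- Hypothetical table: columns are illustrative. Partitioning by `date`
-- lets later REORG TABLE ... WHERE clauses filter on a partition column.
CREATE TABLE events (
  event_id BIGINT,
  payload  STRING,
  date     DATE
)
USING DELTA
PARTITIONED BY (date)
TBLPROPERTIES ('delta.enableDeletionVectors' = true);

-- Confirm the property is set on the table.
SHOW TBLPROPERTIES events ('delta.enableDeletionVectors');
```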

Deletion vectors record changes to rows as soft deletes that logically modify existing Parquet data files in a Delta Lake table. These changes are applied physically only when the affected data files are rewritten, which is triggered by one of the following events:

  • A DML command with deletion vectors disabled (by a command flag or a table property) is run on the table.
  • An OPTIMIZE command is run on the table.
  • REORG TABLE ... APPLY (PURGE) is run against the table.
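The three triggers listed above can be sketched against the hypothetical `events` table. The `event_id = 7` predicate is illustrative, and only the table-property route for disabling deletion vectors is shown.

```sql
-- 1. DML with deletion vectors disabled via the table property: data files
--    this statement rewrites have their pending soft deletes applied.
ALTER TABLE events SET TBLPROPERTIES ('delta.enableDeletionVectors' = false);
DELETE FROM events WHERE event_id = 7;
-- Optionally re-enable deletion vectors afterwards.
ALTER TABLE events SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);

-- 2. Compaction: files selected for compaction are rewritten.
OPTIMIZE events;

-- 3. Purge: rewrites every data file that has a deletion vector.
REORG TABLE events APPLY (PURGE);
```

The three options differ in scope: the DML statement rewrites only the files it touches, OPTIMIZE rewrites only compaction candidates, and REORG TABLE ... APPLY (PURGE) targets every file with recorded deletions.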

UPDATE, MERGE, and OPTIMIZE make no strict guarantee that changes recorded in deletion vectors are resolved: some of these changes might not be applied if the target data files contain no updated records or would not otherwise be candidates for file compaction. REORG TABLE ... APPLY (PURGE), by contrast, rewrites every data file that contains records whose modifications were recorded using deletion vectors. See Apply changes with REORG TABLE.

Apply changes with REORG TABLE

Use REORG TABLE to reorganize a Delta Lake table by rewriting files to purge soft-deleted data, such as rows marked as deleted by deletion vectors:

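```sql
-- Rewrite every data file that has rows soft-deleted by deletion vectors.
REORG TABLE events APPLY (PURGE);

-- If you have a large amount of data and only want to purge a subset of it,
-- you can specify an optional partition predicate using `WHERE`:
REORG TABLE events WHERE date >= '2022-01-01' APPLY (PURGE);

-- The predicate can also be relative, e.g. to purge only recent partitions:
REORG TABLE events
WHERE date >= current_timestamp() - INTERVAL '1' DAY
APPLY (PURGE);
```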
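One way to combine these commands, sketched under the assumption that `events` is partitioned by `date`, is to follow a targeted DELETE with a REORG TABLE scoped to the same partition, so that only files in that partition are rewritten. The date literal and `event_id` value are hypothetical.

```sql
-- Soft-delete rows in a single partition; with deletion vectors enabled,
-- the deletion is typically recorded in a deletion vector rather than
-- by rewriting the affected Parquet files.
DELETE FROM events
WHERE date = '2024-01-15' AND event_id = 42;

-- Physically apply the deletion by rewriting only the files in that
-- partition that carry deletion vectors.
REORG TABLE events WHERE date = '2024-01-15' APPLY (PURGE);
```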