This post provides insights into Delta Live Tables, distinguishing between Streaming Tables and Materialized Views, highlighting their features, differences, and best practices for Delta Live Pipeline Triggering Modes.

What is a streaming table (Streaming Live Tables)?

A delta table that has structured streaming writing to it.

Streaming table reads from append-only data sources.
Streaming table allow you to reduce cost and latency by reading each input record only once.
Streaming table supports DML (UPDATE, DELETE, MERGE) for ad-hoc data manipulation
Best used for data ingestion with clould_files
Stateful: each input row is processed only once, even if query changes. A change to a streaming table’s definition does not reread already processed data.
You can do Full Refresh to recompute the data, this only works when your streaming source has full retentions.
When stream-stream job, time bound and watermark is a must-have. Otherwise, the join will buffer all data forever.

What is a Materialized View (Live tables)?

It is a result of a query, precomputed and stored in Delta

It always return the result of the defining query, at the moment it was last updated.
You cannot modify data in the materialized view, you can change its query instead.

Differences between Streaming Table vs. Materialized View

Streaming table handles update and deletes to their inputs
Support any aggregation / windows functions
Joins work the same as in ad-hoc queries
Automatically handle changes to the query definitions

Unlike streaming, MVs are allowed to read the input more than once, and recompute from scratch when necessary

Enzyme optimize the MV using incremental refresh, to recompute data only when necessary:

Appending new data as it arrives [only works for select/project/inner join]
Partition Re-computation: only recompute affected partitions. [help aggregation & updates, but only works when input & output have the same partition]
Merge updates to specific rows [handle complicated queries updates/inserts/deletes]

Delta Live Pipeline Triggering Mode

Delta Live Pipeline allows users to choose how often to update the results.

Triggered Manually (Lowest on Cost, Highest on Latency)
Triggered on a schedule using Databricks Jobs
Continuously (Highest on Cost, Lowest on Latency)

Best Practice

No data manipulation on the bronze layer to keep source intact for re-calculation
Updates to a streaming table will break downstream computation; Updates includes 1. Non-insert DML performed on a streaming table. 2. Streaming tables that compute aggregations without a watermark. 3. Streaming table used with APPLY CHANGES INTO
skipChangeCommit helps to skip changes rather than fail, but DML must manually used to propagate changes downstream

In summary, understanding the characteristics of Streaming Tables and Materialized Views, along with optimal Delta Live Pipeline Triggering Modes, is crucial for effective data processing, manipulation, and maintaining the integrity of downstream computations in Delta Live Tables.

Reference

https://www.youtube.com/watch?v=PIFL7W3DmaY&t=4249s

Breaking

Streaming Tables vs. Materialized View

What is a streaming table (Streaming Live Tables)?

What is a Materialized View (Live tables)?

Differences between Streaming Table vs. Materialized View

Delta Live Pipeline Triggering Mode

Best Practice

Reference

By Hang

Related Post

Leave a Reply Cancel reply

You Missed

Snail Sort – {kata} – [arrays, matrix, traversal] – (medium)

All Balanced Parentheses – {Kata} – [Recursions, Backtracking, Strings] – (Medium)

Next Small Number with the Same Digits – {Kata} – [Math, TwoPointers] -(Medium)

Evolution of Data Architectures

Streaming Tables vs. Materialized View

What is a streaming table (Streaming Live Tables)?

What is a Materialized View (Live tables)?

Differences between Streaming Table vs. Materialized View

Delta Live Pipeline Triggering Mode

Best Practice

Reference

By Hang

Related Post

Understanding Delta Lake: Underlying Data Format, ACID Transaction Support and Scaling Metadata

Delta Lake DDL Cheat Sheet

Delta Live Table Change Data Capture

Leave a Reply Cancel reply

You Missed

Snail Sort – {kata} – [arrays, matrix, traversal] – (medium)

All Balanced Parentheses – {Kata} – [Recursions, Backtracking, Strings] – (Medium)

Next Small Number with the Same Digits – {Kata} – [Math, TwoPointers] -(Medium)

Evolution of Data Architectures