Skip to content

Latest commit

 

History

History
22 lines (16 loc) · 1.02 KB

README.md

File metadata and controls

22 lines (16 loc) · 1.02 KB

Delta Lake

Some example notebooks on how to use the delta lake

Introduction: https://docs.delta.io/latest/delta-intro.html

Most relevant (quite subjective):

  • Upserts and deletes: update / delete rows that change.
  • Time travel: being able to go back before updates / deletes (for example if updated data are faulty)

When time travel does make sense:

  • We have a table with updates / appends / deletions of rows.
  • We are only interested in the most recent data, i.e. if a row is updated, the old value is not of interest any more.
  • However, we are afraid to just overwrite the old data because new data might be faulty and we want to be able to recover the old data in that case.
  • However, we need to be able to reproduce the old state, for example to debug potential inconsistencies reported by users.

When time travel does not make sense:

  • We scrape data daily and are interested in the "history" of the data
    • bad idea: upsert data and then use time travel to loop through history
    • better: append data as new events.