Eyes Wide Shut Data Lineage #1

  • Anton Revyako
    Anton Revyako

    Founder of dwh.dev

Eyes Wide Shut Data Lineage #1

Data people, I have some bad news for you.

The entire concept of column-level data lineage as you know it is a lie. It's broken, and no one is willing to admit it.

This applies across all databases, open-source libraries, commercial products, and dynamic and static analysis tools. Column-level data lineage is fundamentally flawed from start to finish.

The issue isn't just about the lack of support for specific database vendor syntax, or the inability of most products to detect relationships in basic constructs like PIVOT. That's a given :)

The problem lies at the core mathematical level.

Remember what we learned in school? If we know that x = y, we can substitute one for the other anywhere. Right? Now, take a look at this query:

SELECT T1.C1 FROM
  T1 JOIN T2 ON T1.C1 = T2.C3;

We get T1.C1 as the lineage result, correct?

But T1.C1 = T2.C3, which means this query is equivalent to:

SELECT T2.C3 FROM
  T1 JOIN T2 ON T1.C1 = T2.C3;

See what's happening here? The lineage of the upstream column T2.C3 is hidden from your view!

Have you ever encountered a tool that reveals this to you? Sure, you'll see that there's a dependency on table T2 at the object level. But no details. Good luck debugging that!

It gets even worse! If x = y and y = z, then x = z. Right?

SELECT C1 FROM
  T1 
    JOIN T2 ON C1 = C3
    JOIN T3 ON C3 = C5;

You get the point…

All of this becomes even more complicated when you add mathematical operations, function calls, type conversions, unions…

At dwh.dev, we've long had a tool for visualizing JOINs both at the column-level data lineage between objects and in our unique in-query lineage mode.

But today, we're introducing a new and unique feature: EQUALS COLUMN LINEAGE. Now, you won't lose those crucial pieces of data lineage that arise from equality in JOIN and WHERE conditions!

Here's what the full data lineage would look like for the examples above:

dwh.dev in-query equals data linage

dwh.dev in-query equals data linage

Come give it a try! We value your feedback!

We provide all the data lineage information for other data startups through our API. We'd be happy to share!

This feature also opens the door to other impressive and unique functionalities that you'll soon see at dwh.dev