I’ve been trying to build a proper data lineage map for our team’s reporting layer, and honestly, I’m a bit stuck. I started manually tracing everything from our raw ingestion tables through all the transformations, but it feels like I’m constantly chasing changes and missing dependencies. I’m curious if anyone else has hit this wall and how you approached it without it becoming a full-time maintenance job.
I get the grind. I spent days chasing a downstream dependency and still missed one. Data lineage work can feel endless until you lock in a few non-negotiables, like who owns each step and what you measure.
One route is to treat the map as a product and automate as much as possible. Tag sources in a catalog, capture lineage with automated scanners, and keep a lightweight change log. Show the critical paths first and keep the viewer simple.
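To make "critical paths first" concrete, here is a rough sketch of how I would pull the upstream chain for a single dashboard once a scanner has produced edges. The table names and the `UPSTREAM` mapping are made up for illustration, and `critical_path` is a hypothetical helper, not something from a particular catalog tool.

```python
from collections import deque

# Hypothetical scanner output: each key is a table, view, or dashboard,
# and each value lists the objects it reads from directly.
UPSTREAM = {
    "dash_revenue": ["rpt_revenue_daily"],
    "rpt_revenue_daily": ["stg_orders", "stg_refunds"],
    "stg_orders": ["raw_orders"],
    "stg_refunds": ["raw_refunds"],
    "dash_signups": ["stg_users"],
    "stg_users": ["raw_users"],
}


def critical_path(target, upstream):
    """Return everything the target depends on, nearest dependencies first."""
    seen = {target}
    order = []
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in seen:  # guard against cycles and shared parents
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    for name in critical_path("dash_revenue", UPSTREAM):
        print(name)
```

A simple viewer can start as nothing more than this list per dashboard; anything not reachable from a critical dashboard can wait.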
It is easy to think you can map every link, but in reality the map will drift. Perhaps focus on the most-used dashboards and the top five data producers, and accept some opacity in the rest.
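If it helps, this is roughly how I would pick that scope, assuming you can export view counts per dashboard and already know which raw sources each one ultimately reads from. All names and numbers here are invented, and `scope` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical view counts per dashboard over some recent window.
DASHBOARD_VIEWS = {
    "dash_revenue": 420,
    "dash_signups": 310,
    "dash_churn": 35,
    "dash_adhoc": 4,
}

# Hypothetical mapping from each dashboard to the raw sources behind it.
DASHBOARD_SOURCES = {
    "dash_revenue": ["raw_orders", "raw_refunds"],
    "dash_signups": ["raw_users"],
    "dash_churn": ["raw_users", "raw_orders"],
    "dash_adhoc": ["raw_clicks"],
}


def scope(views, sources, top_dashboards=2, top_producers=5):
    """Pick the most-viewed dashboards, then rank the producers feeding them."""
    ranked = [name for name, _ in Counter(views).most_common(top_dashboards)]
    weight = Counter()
    for dash in ranked:
        for src in sources.get(dash, []):
            weight[src] += views[dash]  # weight a producer by the views it supports
    producers = [name for name, _ in weight.most_common(top_producers)]
    return ranked, producers


if __name__ == "__main__":
    dashboards, producers = scope(DASHBOARD_VIEWS, DASHBOARD_SOURCES)
    print("map these dashboards:", dashboards)
    print("map these producers:", producers)
```

Everything outside the two returned lists is the part you consciously leave opaque for now.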
Maybe the question to ask is not how to map every step but which business questions the map should answer. If your aim is to prove trust in the numbers, you might frame it around outcomes rather than every transform.
As a writer, you might frame the lineage as a story about data products and owners. Let the map evolve as teams change, and use a living-doc approach.
I tried a hybrid of automation and human notes. Start with core tables, then add transformed views and document each handoff. Keep it lightweight and review quarterly.
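A minimal sketch of that hybrid, assuming the automated side gives you (source, target) pairs and the human side is a small dictionary of notes keyed by the same pairs. The `Handoff` fields, the 90-day window, and the sample data are my own placeholders.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Handoff:
    source: str
    target: str
    owner: str = "unassigned"             # filled in by a human
    note: str = ""                        # filled in by a human
    last_reviewed: Optional[date] = None  # None means never reviewed


def merge(scanned_edges, notes):
    """Attach human notes to scanned edges; edges without notes keep the defaults."""
    return [Handoff(s, t, **notes.get((s, t), {})) for s, t in scanned_edges]


def due_for_review(handoffs, today, max_age_days=90):
    """Return handoffs never reviewed or last reviewed more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [h for h in handoffs
            if h.last_reviewed is None or h.last_reviewed < cutoff]


if __name__ == "__main__":
    # Hypothetical scanner output and hand-written notes.
    edges = [("raw_orders", "stg_orders"), ("stg_orders", "rpt_revenue_daily")]
    notes = {
        ("raw_orders", "stg_orders"): {
            "owner": "ingest-team",
            "note": "dedupes on order_id",
            "last_reviewed": date(2024, 1, 15),
        }
    }
    for h in due_for_review(merge(edges, notes), today=date(2024, 2, 1)):
        print(f"{h.source} -> {h.target} (owner: {h.owner})")
```

The quarterly review then becomes working through whatever `due_for_review` returns, which keeps the human part small.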
If you want, I can sketch a minimal template for a data lineage map and a one-page guide to ownership and gates.