Why do data teams struggle to add value? Why do so many data warehouses and data initiatives fail to deliver?
- The concept of raw data when there’s nothing raw about the data. Data coming from external systems via an API, distributed logs like Kafka, message buses, FTP or any other route is structured. There’s a schema, there’s metadata, there’s code that describes the data. Don’t ignore that information, use it! Collaborate with upstream and downstream teams to get the metadata instead of guessing what a JSON field might mean.
- All data stored in a central database. Do you have to? Cloud database or not, the consensus seems to be that you first have to copy all data to a central place, often in an elaborate way, before you can do anything. And then the central database becomes an end in itself. But all computers are connected, and all software should have some API to access the data, so just use that directly.
- No regard for software engineering best practices. Separation of concerns? Type systems? Component design? Unit tests? CI/CD? Observability? Rollbacks?
Decades of experience are deemed too complicated for data teams. Data engineering is software engineering. Automated tests, unit tests, CI/CD and type checks apply to your data product as well. Dealing with large amounts of data doesn’t mean you can only test in production.
- Related to the above: connecting to the underlying database directly using ODBC or JDBC, or using a CDC (Change Data Capture) system like Debezium. You just bypass the whole application and domain layer. Changes in the database schema bother you? That’s because you shouldn’t use it in the first place! Read data in a way that can handle multiple versions. Use the API to access the data.
- Data teams as completely isolated teams are also a fallacy. Handling data is a capability that should be part of any team. The data is part of the product. A separate data team is just as much an anti-pattern as a separate database team, front-end team or any team that can’t deliver value without costly handovers.
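To make a few of these points concrete, here is a minimal sketch in Python of reading data through a typed, version-aware parser that can be unit tested without touching production. Everything in it is an assumption for illustration: the `Order` type, the field names, the `schema_version` key and the two payload versions are hypothetical, not taken from any real API.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Mapping


@dataclass(frozen=True)
class Order:
    """Typed view of an order event (hypothetical fields)."""
    order_id: str
    amount_cents: int
    currency: str


def parse_order(payload: Mapping[str, Any]) -> Order:
    """Parse an order payload, handling two assumed schema versions."""
    # Assumption: payloads without a version marker are version 1.
    version = payload.get("schema_version", 1)
    if version == 1:
        # Assumed v1: amount is a decimal string, currency is always EUR.
        return Order(
            order_id=str(payload["id"]),
            amount_cents=int(Decimal(payload["amount"]) * 100),
            currency="EUR",
        )
    if version == 2:
        # Assumed v2: amount already in cents, currency is explicit.
        return Order(
            order_id=str(payload["order_id"]),
            amount_cents=int(payload["amount_cents"]),
            currency=str(payload["currency"]),
        )
    # Fail loudly on unknown versions instead of guessing.
    raise ValueError(f"unsupported schema_version: {version!r}")


def test_both_versions_parse_to_the_same_order() -> None:
    v1 = {"id": 42, "amount": "12.50"}
    v2 = {
        "schema_version": 2,
        "order_id": "42",
        "amount_cents": 1250,
        "currency": "EUR",
    }
    assert parse_order(v1) == parse_order(v2)


if __name__ == "__main__":
    test_both_versions_parse_to_the_same_order()
```

The test runs in milliseconds on a laptop or in CI, with no warehouse involved. In a real setup the payload shapes would come from the upstream team’s published schema rather than from guesses like the ones above.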
Want to know more? Or a more nuanced view? Use the comments or see the contact field on my blog. I am open for business.