Instrumenting collaboration tools used in data projects: Built-in audit trails can be useful for reproducing and debugging complex data analysis projects by Ben Lorica.
From the post:
As I noted in a previous post, model building is just one component of the analytic lifecycle. Many analytic projects result in models that get deployed in production environments. Moreover, companies are beginning to treat analytics as mission-critical software and have real-time dashboards to track model performance.
Once a model is deemed to be underperforming or misbehaving, diagnostic tools are needed to help determine appropriate fixes. It could well be models need to be revisited and updated, but there are instances when underlying data sources and data pipelines are what need to be fixed. Beyond the formal systems put in place specifically for monitoring analytic products, tools for reproducing data science workflows could come in handy.
…
Ben goes on to suggest that an “activity log” is a great idea for capturing a workflow for later analysis/debugging. And so it is, but I would go one step further and capture some of the semantics of the workflow: what each step means, not just which commands were entered.
I knew a manager who had a “cheat sheet” of report writer jobs to run every month. They would pull the cheat sheet, enter the commands and produce the report. That manager was a roadblock to ever changing the system, because after any change the “cheat sheet” would no longer work.
I am sure none of you have ever encountered the same situation. But I have seen it in at least one case.
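As a rough illustration of what capturing semantics alongside an activity log might look like, here is a minimal Python sketch. Everything in it is my own assumption for the sake of the example, not anything from Ben's post or any particular tool: the `LogEntry` and `ActivityLog` names, the `purpose`/`inputs`/`outputs` fields, and the report command and dataset names are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class LogEntry:
    """One workflow step: the command plus what it means (hypothetical schema)."""
    command: str   # what was typed, as on the "cheat sheet"
    purpose: str   # why the step exists, in plain language
    inputs: list   # names of the data sources the step reads
    outputs: list  # names of the artifacts the step produces
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ActivityLog:
    """In-memory activity log that keeps semantics next to each command."""

    def __init__(self):
        self.entries = []

    def record(self, command, purpose, inputs=(), outputs=()):
        entry = LogEntry(command, purpose, list(inputs), list(outputs))
        self.entries.append(entry)
        return entry

    def producers_of(self, artifact):
        # Which recorded steps claim to produce this artifact?
        return [e for e in self.entries if artifact in e.outputs]

    def to_json(self):
        return json.dumps([asdict(e) for e in self.entries], indent=2)


# Hypothetical monthly report job, recorded with its meaning attached.
log = ActivityLog()
log.record(
    command="run_report sales_monthly",
    purpose="Summarize monthly sales by region",
    inputs=["orders_table"],
    outputs=["monthly_sales_report"],
)

for step in log.producers_of("monthly_sales_report"):
    print(step.command, "->", step.purpose)
```

With the purpose, inputs and outputs recorded next to each command, someone changing the system could at least see what a cheat-sheet command was meant to accomplish and find a replacement for it, rather than treating the command string itself as untouchable.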