Machine Learning: The High-Interest Credit Card of Technical Debt by D. Sculley, et al.
Abstract:
Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.
Under “entanglement” (referring to inputs) the authors announce the CACE principle:
Changing Anything Changes Everything
The net result of such changes is that prediction behavior may alter, either subtly or dramatically, on various slices of the distribution. The same principle applies to hyper-parameters. Changes in regularization strength, learning settings, sampling methods in training, convergence thresholds, and essentially every other possible tweak can have similarly wide ranging effects.
Entanglement is a native condition in topic maps as a result of the merging process. Yet, I don’t recall there being much discussion of how to evaluate the potential for unwanted entanglement or how to avoid entanglement (if desired).
You may have topics in a topic map where merging with later additions to the topic map is to be avoided. Perhaps to avoid the merging of spam topics that would otherwise overwhelm your content.
One way to avoid that and yet allow users to use links reported as subjectIdentifiers and subjectLocators under the TMDM would be to not report those properties for some set of topics to the topic map engine. The only property they could merge on would be their topicID, which hopefully you have concealed from public users.
Not unlike the traditions of Unix where some X ports are unavailable to any users other than root. Topics with IDs below N are skipped by the topic map engine for merging purposes, unless the merging is invoked by the equivalent of root.
No change in current syntax or modeling required, although a filter on topic IDs would need to be implemented to add this to current topic map applications.
I am sure there are other ways to prevent merging of some topics but this seems like a simple way to achieve that end.
Unfortunately it does not address the larger question of the “technical debt” incurred to maintain a topic map of any degree of sophistication.
Thoughts?
I first saw this in a tweet by Elias Ponvert.