In part 1 of this series, we looked at Machine Learning (ML) and how it can revolutionize network analytics, but that isn’t to say it doesn’t have its issues. Although ML is an exciting technology with many useful applications, ML-based solutions come with a variety of new challenges.

The single largest problem with ML in network analysis is that each network is unique and, more importantly, constantly changing. This means that once you’ve successfully trained a set of modules to recognize various conditions, you’ll probably need to do it again as the network changes. This meta-level training will improve as retraining itself can be automated, but it is easy to get lulled into a false sense of security. The classic ML error is failing to keep a validation set separate from the training data; as a result, the system appears to be nearly perfect, but has really just memorized the answers to the test. When presented with novel examples, the result is effectively random, because the model has not actually generalized anything from its inputs.
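To make the validation-set point concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than a real tool: the synthetic "flows", the `Memorizer` class, and the 200/50 split are invented for the example. The model is deliberately bad: it stores every training example verbatim and falls back to the most common label for anything it has not seen.

```python
import random


def make_flows(n, rng):
    """Synthetic (bytes_in, bytes_out) pairs; label is 1 when inbound exceeds outbound."""
    flows = []
    for _ in range(n):
        bytes_in = rng.randint(0, 10**6)
        bytes_out = rng.randint(0, 10**6)
        flows.append(((bytes_in, bytes_out), int(bytes_in > bytes_out)))
    return flows


class Memorizer:
    """Hypothetical worst-case model: a lookup table of the training data."""

    def fit(self, samples):
        self.table = dict(samples)
        labels = [label for _, label in samples]
        # Fallback for unseen inputs: the most common training label.
        self.default = max(set(labels), key=labels.count)
        return self

    def predict(self, features):
        return self.table.get(features, self.default)


def accuracy(model, samples):
    return sum(model.predict(x) == y for x, y in samples) / len(samples)


rng = random.Random(0)
flows = make_flows(250, rng)
rng.shuffle(flows)

# Hold out 50 examples that the model never sees during training.
train, validation = flows[:200], flows[200:]

model = Memorizer().fit(train)
train_acc = accuracy(model, train)
val_acc = accuracy(model, validation)

print(f"accuracy on training data:   {train_acc:.2f}")
print(f"accuracy on held-out data:   {val_acc:.2f}")
```

Scored against its own training data, the memorizer looks perfect. Scored against the held-out examples, it can only guess the majority label, which is the gap that an independent validation set is meant to expose.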

Tuning ML algorithms is still very much an art. Even simple mechanisms like logistic regression have a few hyperparameters that must be adjusted to get valid results. Large, deep neural networks can have hundreds of them, and the choice of algorithm can be considered yet another. This means that you need skilled, knowledgeable staff trained in data science to develop new modules. Even if you’re using off-the-shelf tools, there’s a good chance you can trick yourself into thinking the system is working when it’s just spitting out pretty trash.
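As a rough illustration of what "a few hyperparameters" means even for logistic regression, the sketch below trains a small hand-written model with two of them, a learning rate and an L2 regularization strength, and compares every combination on held-out data. The synthetic data, the grid values, and the function names are all assumptions chosen for the example, not recommendations.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(samples, learning_rate, l2, epochs=200):
    """Batch gradient descent on the L2-regularized log loss."""
    n = len(samples)
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in samples:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - learning_rate * (g / n + l2 * w)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(model, samples):
    weights, bias = model
    hits = 0
    for x, y in samples:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        hits += int((p >= 0.5) == bool(y))
    return hits / len(samples)


# Synthetic two-feature data with a simple linear decision boundary.
rng = random.Random(0)
data = []
for _ in range(300):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    data.append((x, int(x[0] + 0.5 * x[1] > 0)))
train, validation = data[:200], data[200:]

# Try every combination and score each one on data it was not trained on.
results = {}
for learning_rate in (0.01, 0.1, 1.0):
    for l2 in (0.0, 0.01, 1.0):
        model = train_logreg(train, learning_rate, l2)
        results[(learning_rate, l2)] = accuracy(model, validation)

best = max(results, key=results.get)
for config, score in sorted(results.items()):
    print(f"learning_rate={config[0]:<5} l2={config[1]:<5} validation accuracy={score:.2f}")
print("best configuration:", best)
```

One caveat: once the validation set has been used to pick hyperparameters, its score is no longer an unbiased estimate. A further untouched test set is the usual way to check the final choice.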

The fact is, machine learning is not a magic bullet that can solve any problem. Machine learning rarely provides new insights, but what it does do well is pay attention, relentlessly. This means that if you can correctly label the training data, you can probably create a module that will do as well as the people who did the static analysis of it. This is the breakthrough difference with machine learning: humans can spend time on what they are good at, discovering and recognizing novelty, while the ML tools perform the rote tasks.