Machine connectivity, no doubt, allows a host of new functions and features to be implemented across our industrial shop floors. However, the one that will feature highest on the wish list of owners and operators alike will be the possibility of assessing the health of a machine to understand fully its current and future operating state, needs for repair, maintenance, or any limitations that might affect operations in the near or distant future.
As Niels Bohr is famously quoted as saying: “Prediction is very difficult, especially about the future.”
Arthur K. Ellis – Teaching and Learning Elementary Social Studies (1970), p. 431
First, failures can only be predicted where they are expected. In your car, nobody expects the steering system to fail; it is simply too important and thus suitably simple and over-engineered. This has not always been the case: there were failures in the early days of the automobile era, and arguably things might get dicey again in the future as we motorize steering systems for the benefit of self-driving cars. But as it stands, there are no sensors embedded in the drive shafts, gears, and other parts that make up a conventional steering system, because no failure is expected there. Should one ever happen, it would truly be “unpredicted” and therefore unpredictable.
Yet, in the same car, we do expect that the oil level might run low and that the tires may deflate. We can guard against these failures through maintenance guidelines (also known as “kicking the tires”) and/or instrumentation (sensors), and based on that instrumentation, prediction of failure becomes a possibility.
Over the last few years sensors have become cheap, and the entire premise of the Industry of Things is the emergence of a simple and equally affordable platform that allows measurements from these sensors to be collected and aggregated. Yet the fundamental truth holds that engineers are unlikely to put sensors on components where no failures are expected. Even if they were willing to do so, nobody would know what measurement to take: where no failure is expected, nobody knows what to measure.
This brings us to the second point: how can we interpret the collected information to understand future risks and failures? It again boils down to how much we know, or believe we know, about the behavior of the equipment in question. A slowly dropping oil level in our car, for example, is a straightforward indication that a top-up will be required, within a time frame that we can probably predict with good accuracy. On the other hand, an unusual vibration on the rear axle may have many different root causes, so no useful prediction will be possible without pulling other factors, or sensors, into the equation.
Observing a dropping oil level in the car is the first of five fundamental methods by which we can anticipate the need for future operation, maintenance, service, or repair:
1 – Trend Analysis
Machines are, in general, straightforward in that a trend, once established, rarely changes course abruptly. A declining oil level will most likely continue to decline, and will reach “minimum fill level” within a predictable time period when a top-off is in order.
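As a minimal sketch of this idea, the snippet below fits a straight line to a handful of oil-level readings and extrapolates to the time at which the trend reaches the minimum fill level. The function names, the weekly sampling, the readings, and the 3.5-litre threshold are all made up for illustration, and a straight-line fit is only one of many possible trend models.

```python
def fit_line(times, values):
    """Ordinary least-squares straight-line fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_threshold(times, values, threshold):
    """Time at which the fitted trend reaches the threshold.

    Returns None if the trend is flat, or if the crossing point does not
    lie after the last observation (the trend is moving away from the
    threshold, or has already passed it).
    """
    slope, intercept = fit_line(times, values)
    if slope == 0:
        return None
    t_cross = (threshold - intercept) / slope
    return t_cross if t_cross > times[-1] else None


# Hypothetical readings: oil level in litres, sampled once a week.
weeks = [0, 1, 2, 3, 4]
oil_level = [4.5, 4.4, 4.3, 4.2, 4.1]

# With a loss of 0.1 litres per week, the fit reaches 3.5 litres around week 10.
print(time_to_threshold(weeks, oil_level, threshold=3.5))
```

In practice the readings will be noisy, so the predicted date should be treated as an estimate that is refreshed every time a new measurement comes in.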
Engineers, operators, and designers are usually very good at “knowing” what to look out for, and what action to take in operating the equipment to remedy (that is, arrest or reverse) an observed trend and keep a system in working order. In other words, there is a very direct and usually immediate link between the observed trend and that action, a link that follows directly from the mechanical workings of the machinery.
Sometimes these links are not that immediate, which gives rise to the notion of a digital twin.
2 – Analysis against a “Digital Twin”
The “digital twin” is a relatively new concept: a complete mathematical description of the inner workings of an actual, real-world machine or system. As we compare measurements from the real world (sensor readings, outputs, inputs, etc.) with the mathematical model, any deviation would, in principle, indicate a problem either with the real-world machine (for example, an actual or impending failure) or with the digital twin (an inaccurate or incorrect description). Such gaps will inevitably occur, as machines in the real world rarely operate according to their theoretical descriptions; there are efficiency losses that are hard to quantify on the drawing board, for example. It will therefore often be more beneficial to observe changes in those gaps rather than the gaps themselves. In other words, we are likely to learn more from the first derivative of the deviations than from the deviations themselves.
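To illustrate watching the change in the gap rather than the gap itself, here is a small sketch. It assumes that some twin model has already produced a predicted value for every measured sample; the function names, the temperature figures, and the alarm threshold are hypothetical, and the first difference between consecutive samples stands in for the first derivative.

```python
def residuals(measured, predicted):
    """Gap between the real machine and its digital twin, sample by sample."""
    return [m - p for m, p in zip(measured, predicted)]


def rate_of_change(series, dt=1.0):
    """First difference, a discrete approximation of the first derivative."""
    return [(b - a) / dt for a, b in zip(series, series[1:])]


def drift_alarms(measured, predicted, max_rate, dt=1.0):
    """Indices of samples at which the gap changes faster than max_rate."""
    rates = rate_of_change(residuals(measured, predicted), dt)
    return [i + 1 for i, r in enumerate(rates) if abs(r) > max_rate]


# Hypothetical data: the twin predicts a steady 60 degrees, the real motor
# runs a constant 2 degrees warmer (a harmless modelling gap) until the
# gap starts to widen.
predicted = [60.0] * 7
measured = [62.0, 62.0, 62.0, 62.0, 63.0, 65.0, 68.0]

print(residuals(measured, predicted))                   # [2.0, 2.0, 2.0, 2.0, 3.0, 5.0, 8.0]
print(drift_alarms(measured, predicted, max_rate=1.5))  # [5, 6]
```

The constant two-degree offset never raises an alarm; only the samples at which the offset grows by more than 1.5 degrees per step are flagged.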
As compelling as this strategy appears, it is fundamentally limited by the sheer complexity of creating accurate and all-encompassing digital twins for anything but the most basic machines or machine components. This gives rise to the third model.
3 – Machine Learning approaches
To begin with, I admit that I dislike the term “machine learning”, as I consider learning a human activity that requires creativity and actual insight, neither of which I associate with machines. And machines that actually learn scare me to death, since they would inevitably end up cleverer than us. Alas, the general public seems to be more comfortable with this than I am, so here we go.
The approach taken here is based on the observation that machines fresh out of the wrapping (they still smell new) probably work great. After all, they have gone through quality control and been shipped by a reputable vendor, so there is every reason to believe that all is well as long as they continue to work exactly as they did when first switched on. Hence the strategy: observe this initial machine behavior using all the sensor technology available to the machine, and continue to observe those values as the machine goes through its life.
Any deviation, such as a vibration that has never been there before or a change in motor current, would be an indication of an issue.
Determining what the actual issue is would then, as so often, be left as an exercise for the operator, but no doubt rules could be established based on that insight to determine the right course of (maintenance) action.
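In its very simplest form, this strategy can be sketched as follows: record a sensor while the machine is new, summarise that healthy behavior, and flag later readings that stray too far from it. This is a plain statistical threshold rather than a trained model, and the function names, the motor-current values, and the three-sigma limit are assumptions chosen only for illustration.

```python
from statistics import mean, stdev


def learn_baseline(samples):
    """Summarise healthy behaviour as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def deviations(readings, baseline, n_sigma=3.0):
    """(index, value) of readings further than n_sigma deviations from the baseline mean."""
    mu, sigma = baseline
    return [(i, x) for i, x in enumerate(readings) if abs(x - mu) > n_sigma * sigma]


# Hypothetical motor current in amperes, recorded while the machine was new.
new_machine = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0]
baseline = learn_baseline(new_machine)

# Hypothetical readings taken later in the machine's life.
later = [10.1, 9.9, 10.3, 11.2]
print(deviations(later, baseline))  # [(3, 11.2)]
```

The flagged reading says only that something has changed; working out what changed, and what to do about it, remains the operator's job.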
4 – “Dumb” Numerical Analysis
As systems get more complex, integrated, and distributed, it will often be impossible to model their inner workings with a full engineering approach. In such cases, we can hope to learn by observation, or as I prefer to call it, Dumb Numerical Analysis: statistical methods such as multi-variable correlation analysis.
Beware: your model will only be as good as your observations. Many of the correlations found will be obvious and useless, some will be wrong, and on occasion you will hit the jackpot and learn something genuinely new about your machinery.
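A bare-bones version of such a correlation analysis might look like the sketch below, which computes the Pearson correlation coefficient for every pair of recorded channels and reports the strongly related ones. The channel names, the sample values, and the 0.8 cut-off are invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant channel carries no correlation information
    return cov / sqrt(vx * vy)


def strong_pairs(channels, min_abs_r=0.8):
    """All channel pairs with |r| >= min_abs_r, strongest first."""
    found = []
    for (name_a, a), (name_b, b) in combinations(channels.items(), 2):
        r = pearson(a, b)
        if abs(r) >= min_abs_r:
            found.append((name_a, name_b, r))
    return sorted(found, key=lambda item: -abs(item[2]))


# Hypothetical recordings from three channels of the same machine.
channels = {
    "load": [1.0, 2.0, 3.0, 4.0, 5.0],
    "motor_current": [2.1, 3.9, 6.2, 8.0, 9.9],
    "ambient_temp": [21.0, 19.0, 22.0, 20.0, 21.0],
}

for name_a, name_b, r in strong_pairs(channels):
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Here the only pair reported is load against motor current, which is exactly the kind of obvious and useless finding mentioned above. Note also that a correlation coefficient says nothing about which quantity drives the other.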
5 – Crash and Learn
Finally, despite our best efforts, failures will happen. And when they do, they are golden opportunities to learn: post-failure analysis of recorded data can reveal trends and correlations that might not have caught our attention before. We can then assess, learn, and watch out for similar trends and occurrences in the future, because the same conditions will almost inevitably lead to the same result. Catching them early might avoid another crash.