Predictive maintenance using a digital twin

When industrial equipment breaks, the real problem is often not the cost of replacing that equipment but the forced downtime that follows. A production line standing still may mean thousands of dollars lost every minute. Performing regular maintenance can help avoid unplanned downtime but does not guarantee that equipment will not fail. Sam Oliver, senior application engineer at MathWorks, explains how predictive maintenance can help to avoid such failures.

What if the machine could indicate when one of its parts was about to fail? What if the machine could even tell you which part needs to be replaced? Unplanned downtime would be reduced considerably. Planned maintenance would be more efficient, performed only when necessary rather than at fixed intervals. This is the goal of predictive maintenance: avoiding downtime by using sensor data to predict when maintenance is necessary. Predictive maintenance is now being applied to many types of machinery, thanks to the growing prevalence of smart sensors and powerful controllers.

At the heart of developing any predictive maintenance algorithm is sensor data, which can be used to train a classification algorithm for fault detection. From this data, meaningful features are extracted in a preprocessing step, and then used to develop a predictive model by training a suitable machine learning algorithm. This algorithm is exported to simulation software such as Simulink for verification and finally deployed as code to the control unit of the machine.
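The train-then-classify step of this workflow can be illustrated with a deliberately simple example. The sketch below uses a nearest-centroid classifier in plain Python, which is a technique chosen here for brevity and not the algorithm used in the article. The feature vectors and the labels `healthy` and `seal_leakage` are made-up placeholders.

```python
def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=distance)

# Hypothetical two-feature training data; values are illustrative only.
training = [
    ([1.0, 0.2], "healthy"), ([1.1, 0.3], "healthy"),
    ([2.0, 1.5], "seal_leakage"), ([2.2, 1.4], "seal_leakage"),
]
model = train_centroids(training)
```

Calling `classify(model, [1.05, 0.25])` returns the label of the nearest class centre. A production workflow would use a richer feature set and a properly validated machine learning model.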

However, it is often not possible to acquire data from physical equipment in the field under typical fault conditions. Permitting faults to occur in the field may lead to catastrophic failure and result in destroyed equipment. Generating faults intentionally under more controlled circumstances may be time-consuming, costly, and often even unfeasible.

A solution to this challenge is to create a digital twin of the equipment and generate sensor data for various fault conditions through simulation. This approach enables engineers to generate all sensor data needed for a predictive maintenance workflow, including tests with all possible fault combinations and faults in varying severity.

This article discusses the design of a predictive maintenance algorithm for a triplex pump using MATLAB, Simulink, and Simscape (Figure 1). A digital twin of the actual pump is created in Simscape and tuned to match measured data, and machine learning is used to create the predictive maintenance algorithm. The algorithm needs only the outlet pump pressure to recognise which components or combinations of components are about to fail.

Figure 1. Predictive maintenance workflow.

Building the digital twin
A triplex pump has three plungers driven by a crankshaft (Figure 2). The plungers are laid out so that one chamber is always discharging, making the flow smoother, reducing pressure variation, and thereby lowering material strain compared with a single-piston pump. Typical failure conditions of such a pump are worn crankshaft bearings, leaking plunger seals, and blocked inlets.

Figure 2. Triplex pump schematic and plot showing volumetric flow rate.

CAD data for pumps, which is often available from the manufacturer, can be imported into Simulink. This gives us a mechanical model of the pump for 3D multibody simulation. To model the dynamic behaviour of the system, the pump now needs to be complemented by the hydraulic and electric elements.

Some of the parameters needed for creating a digital twin can be found in the manufacturer’s data sheet (bore, stroke, shaft diameter, etc.), but others may be missing or are specified only in terms of ranges. In this example, we need the upper and lower pressures at which the three check valves feeding the outlet will open and close. We do not have exact values for these pressures as they depend on temperature or the fluid transported.

The plot in Figure 3 shows that simulating the pump with rough estimates (blue line) does not sufficiently match the field data (black line). The blue line resembles the measured curve to some extent, but the differences are obviously large.

Figure 3. Estimating parameters using measured data.

Simulink Design Optimization is used to automatically tune the parameter values so that the model generates results that match the measured data. The parameters selected for optimisation are found in the Check Valve Outlet block in Simscape (Figure 4). The optimisation tool selects parameter values, runs a simulation, and calculates the difference between the simulated and measured curves. Based on this result, new parameter values are selected, and a new simulation is run. The gradient of this difference with respect to each parameter is calculated to determine the direction in which that parameter should be adjusted. Convergence is achieved quickly in this example since only two parameters are tuned. For more complex scenarios with more parameters, it is important to use capabilities that will accelerate the tuning process.
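The loop of simulating, comparing, and adjusting can be sketched in a few lines. The example below is not Simulink Design Optimization. It is a minimal gradient-descent illustration in Python, in which the hypothetical `simulate` function stands in for the pump model and the parameters `a` and `b` stand in for the two tuned valve pressures. Approximating the gradient by finite differences is an assumption of this sketch.

```python
def simulate(a, b, times):
    # Hypothetical stand-in for the pump model: a simple linear response.
    return [a + b * t for t in times]

def cost(params, times, measured):
    # Sum of squared differences between simulated and measured curves.
    a, b = params
    simulated = simulate(a, b, times)
    return sum((s - m) ** 2 for s, m in zip(simulated, measured))

def tune(params, times, measured, rate=0.05, steps=500, eps=1e-6):
    """Adjust the parameters step by step against the cost gradient."""
    params = list(params)
    for _ in range(steps):
        base = cost(params, times, measured)
        grad = []
        for i in range(len(params)):
            shifted = list(params)
            shifted[i] += eps  # nudge one parameter at a time
            grad.append((cost(shifted, times, measured) - base) / eps)
        params = [p - rate * g for p, g in zip(params, grad)]
    return params

# "Measured" data generated from known values so the result can be checked.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
measured = simulate(2.0, 3.0, times)
a, b = tune([0.0, 0.0], times, measured)
```

Starting from rough estimates of zero, the loop recovers values close to the ones used to generate the data. The step size `rate` was picked by hand for this toy problem; a real optimiser chooses it adaptively.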

Figure 4. Tuning parameter values in Simscape.

Creating the predictive model

We now have a digital twin of our pump; the next step is to add the behaviour of failed components to the model.

There are various ways to add fault behaviour. Many Simulink blocks have dropdown menus for typical faults such as short or open circuits. Simply changing parameter values can model effects such as friction or fading. Here, three fault types will be considered: increased friction from a worn bearing, a reduced passage area caused by a blocked inlet, and seal leakage at the plungers. The first two faults require the adjustment of block parameters. To model leakage, a path needs to be added to the hydraulic system.

As shown in Figure 5, the selected fault conditions can be switched on and off either from a user interface (top) or from the command line in MATLAB (bottom). In the model presented here, all fault conditions are toggled using MATLAB commands. This way, the whole process can be automated using scripts.

Figure 5. Modelling leakage in the triplex pump. Parameters can be modified using the Pump block dialog box (top) or the command line (bottom).

In the simulation of the pump shown at the top of Figure 6, two faults have been enabled: a blocked inlet and a seal leakage at plunger 3. These faults are indicated by the red circles. The plot in Figure 6 shows the simulation results for outlet pressure as a continuous line (blue) and also sampled with noise (yellow). This is important because the pressure sensor of the real system also adds quantisation noise to the signal. This noise must be part of the data we generate, because the fault detection algorithm needs to be trained with data that is as realistic as possible.
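As an illustration of what adding sensor effects to a clean simulated signal can look like, the sketch below adds Gaussian noise to each sample and then rounds it to a fixed sensor resolution. The noise model, the step size of 0.5, and the sinusoidal pressure trace are assumptions made for this example and are not taken from the pump model.

```python
import math
import random

def sample_with_sensor_effects(signal, step, noise_std, seed=0):
    """Add Gaussian noise, then round each sample to the sensor resolution."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    sampled = []
    for value in signal:
        noisy = value + rng.gauss(0.0, noise_std)
        sampled.append(round(noisy / step) * step)
    return sampled

# Hypothetical pressure trace: a periodic ripple around a mean value.
clean = [100.0 + 5.0 * math.sin(2 * math.pi * k / 50) for k in range(200)]
measured = sample_with_sensor_effects(clean, step=0.5, noise_std=0.2)
```

Changing `seed` gives a different noise realisation for the same underlying signal, which is why each scenario has to be simulated more than once.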

Figure 6. Top: Pump schematic showing the blocked inlet and seal leakage. Bottom: Plot of the outlet pressure simulation (blue line) and sampled with noise (yellow line).

The green box in Figure 6 indicates the normal value range for outlet pressure. There are spikes clearly leaving the normal range, indicating some fault. This plot alone would tell an engineer or operator that something is wrong with the pump. It is, however, still impossible to judge what exactly the fault is.

This simulation is now used to generate pressure data for the pump under all possible combinations of fault conditions. Approximately 200 scenarios have been created for the digital twin, each of which must be simulated numerous times to account for the quantisation effects in the sensor. Since this approach requires several thousand simulations, it is of great help for engineers to be able to speed up the data generation process.
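Enumerating the scenarios is a simple combinatorial exercise. The sketch below builds one entry per combination of fault severities and per noise repetition. The fault names, the three severity levels, and the ten repetitions are placeholders chosen for illustration; they are not the grid used to produce the scenarios described above.

```python
from itertools import product

# Hypothetical severity levels for each fault; 0.0 means the fault is absent.
FAULT_LEVELS = {
    "bearing_friction": [0.0, 0.5, 1.0],
    "blocked_inlet": [0.0, 0.5, 1.0],
    "seal_leakage": [0.0, 0.5, 1.0],
}

def build_scenarios(fault_levels, repeats):
    """Return one scenario per fault combination and noise repetition."""
    names = list(fault_levels)
    scenarios = []
    for levels in product(*(fault_levels[n] for n in names)):
        for run in range(repeats):
            scenario = dict(zip(names, levels))
            scenario["noise_seed"] = run  # different sensor noise per repeat
            scenarios.append(scenario)
    return scenarios

scenarios = build_scenarios(FAULT_LEVELS, repeats=10)
```

With three levels for each of three faults there are 27 combinations, so ten repetitions already give 270 simulations. The count grows quickly as levels or faults are added.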

One typical approach is to distribute simulations across the cores of a multicore machine, or across several machines or a computer cluster. Depending on the complexity of the problem, time constraints, and available resources, this can be done with Parallel Computing Toolbox or MATLAB Distributed Computing Server.
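The dispatch pattern behind this approach can be shown independently of any particular toolbox. The sketch below uses Python's standard `concurrent.futures` module and is not the MATLAB workflow. The `run_simulation` function is a hypothetical stand-in that returns a made-up value. A thread pool is used to keep the example short, whereas CPU-bound work in Python would normally call for a process pool or a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

def run_simulation(scenario):
    # Hypothetical stand-in for one pump simulation: returns a fake
    # "peak pressure" that grows with the leakage severity.
    return 100.0 + 10.0 * scenario["seal_leakage"]

def run_all(scenarios, workers=4):
    """Run every scenario on a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_simulation, scenarios))

scenarios = [{"seal_leakage": level} for level in (0.0, 0.5, 1.0)]
results = run_all(scenarios)
```

Because the scenarios are independent of one another, they can be handed to any number of workers without coordination, which is what makes this kind of data generation easy to scale.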

Another approach is to use the Fast Restart feature in Simulink, which takes advantage of the fact that many systems require a certain settling time until a steady state is achieved. With Fast Restart, this portion of the test needs to be simulated only once. All subsequent simulations will start from the point where the system has reached steady state. In the current example, the settling time would make up about 70 per cent of the simulation time required for a single test (Figure 7). Consequently, about two-thirds of the simulation time can be saved using Fast Restart. Since it can be configured from the MATLAB command line and scripts, Fast Restart is perfectly suited for automating the training process.
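The relationship between the settling share and the overall saving can be checked with a little arithmetic. The sketch below assumes that every test takes the same time and that the settling phase is simulated exactly once. Both are simplifications made for this illustration.

```python
def fraction_saved(settling_share, runs):
    """Fraction of total simulation time saved when the settling phase
    is simulated once instead of in every run."""
    without_reuse = runs * 1.0
    with_reuse = settling_share + runs * (1.0 - settling_share)
    return 1.0 - with_reuse / without_reuse

saving_20 = fraction_saved(0.7, runs=20)
saving_1000 = fraction_saved(0.7, runs=1000)
```

Under these assumptions the saving always stays slightly below the settling share, because the settling phase still has to be simulated once. It approaches that share as the number of runs grows.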

Figure 7. Using the Fast Restart feature in Simulink to reduce simulation time.

The next step is to use the simulation results to extract suitable training data for the machine learning algorithm. Release 2018b of Predictive Maintenance Toolbox provides various options for extracting training data. Since the signal we are looking at here is a periodic one, an FFT appears most promising. As shown in Figure 8, the result is a small number of clearly separated spikes of different magnitude for individual faults as well as for fault combinations. This is the kind of data that a machine learning algorithm can handle very well.
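To show why a frequency-domain view suits a periodic signal, the sketch below computes the magnitude spectrum of a made-up pressure ripple and keeps the largest spectral bins as features. It uses a naive discrete Fourier transform in plain Python, which gives the same result as an FFT but more slowly, and it does not use Predictive Maintenance Toolbox. The signal and its two frequency components are assumptions for the example.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Single-sided DFT magnitudes, scaled so a sine of amplitude A shows as A."""
    n = len(signal)
    mags = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        scale = 1.0 if k == 0 else 2.0  # the DC bin has no mirror image
        mags.append(scale * abs(coeff) / n)
    return mags

def peak_features(signal, count):
    """Return the `count` largest non-DC bins as (bin, magnitude) pairs."""
    mags = magnitude_spectrum(signal)
    bins = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)
    return [(k, mags[k]) for k in sorted(bins[:count])]

# Hypothetical pressure ripple with components at bins 3 and 9.
n = 64
signal = [100.0 + 4.0 * math.sin(2 * math.pi * 3 * i / n)
          + 1.5 * math.sin(2 * math.pi * 9 * i / n) for i in range(n)]
features = peak_features(signal, count=2)
```

The two returned pairs identify the ripple components and their amplitudes. A handful of numbers like these is a far more compact input to a classifier than the raw time series.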