By Jean-Baptiste Lanfrey, Manager – Application Engineering and Training Services at MathWorks Australia
To ensure the successful deployment and adoption of a real-time streaming platform, system architects, data engineers, and security architects must address numerous challenges. This article highlights five such challenges that can undermine the usability, operation, maintenance, and security of the platform, as well as the data and devices connected to it.
- Choosing Data Formats, Schemas, and Development Frameworks
A streaming platform is really a big data platform in disguise, and stream data processing is just one aspect of its overall functionality. Streaming data will typically find its way to a large-scale data repository, either on-premise or cloud-based. Analysts and domain experts use this data to extract insights, understand trends, and develop algorithms that are then applied in real time to the data streams to enhance operations and extract further business value. The structure of the data and the tools that analysts, data scientists, and domain experts have at their disposal will greatly affect the adoption and application of the entire platform.
Because streaming data is typically composed of a series of time-stamped data packets, IT teams and engineers should look for tools with native data types for processing time-series data. Such data types make it easier to clean the data, visualise it, and explore patterns. The tools must also provide a development framework architected for the micro-batch/windowed processing that stream processing requires. This framework should include provisions for storing the interim state of processing results, along with infrastructure to operationalise the resulting algorithms at scale. These capabilities help make the development and deployment of algorithms to your operational system more efficient and less error-prone.
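As an illustration of windowed processing with interim state, the following minimal Python sketch folds micro-batches of time-stamped samples into fixed 10-second windows. The names `WindowState`, `process_batch`, and `window_means` are hypothetical and do not come from any particular streaming framework; a production framework would also persist and expire this state.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class WindowState:
    """Interim state carried between micro-batches (hypothetical structure)."""
    sums: Dict[int, float] = field(default_factory=dict)
    counts: Dict[int, int] = field(default_factory=dict)


def process_batch(batch: Iterable[Tuple[float, float]],
                  state: WindowState,
                  window_s: float = 10.0) -> WindowState:
    """Fold one micro-batch of (timestamp_s, value) samples into per-window totals."""
    for ts, value in batch:
        key = int(ts // window_s)  # index of the fixed window this sample falls in
        state.sums[key] = state.sums.get(key, 0.0) + value
        state.counts[key] = state.counts.get(key, 0) + 1
    return state


def window_means(state: WindowState, window_s: float = 10.0) -> Dict[float, float]:
    """Return the mean value per window, keyed by window start time in seconds."""
    return {key * window_s: state.sums[key] / state.counts[key]
            for key in sorted(state.sums)}


# Two micro-batches arrive separately; the state lets the second batch
# complete a window that the first batch only partially filled.
state = WindowState()
process_batch([(0.0, 1.0), (4.0, 3.0)], state)
process_batch([(9.0, 5.0), (12.0, 10.0)], state)
means = window_means(state)
```

Keeping only running sums and counts, rather than the raw samples, is one way to keep the interim state small when windows span many batches.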
When developing predictive maintenance algorithms, such as remaining useful life (RUL) estimators, along with condition monitoring and asset performance management (APM) applications, failure data for the assets being monitored is typically required. For high-value assets or safety-critical applications, it may prove too costly or impractical to run these assets to failure. In such cases, it’s best to use system simulation tools to generate synthetic data that supplements real data for use in developing algorithms.
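To make the idea of synthetic failure data concrete, here is a deliberately simplified Python sketch. It assumes a toy wear model in which degradation accelerates until it crosses a failure threshold; the model, the parameter values, and the function name `simulate_run_to_failure` are illustrative assumptions, not a substitute for a physics-based system simulation.

```python
import random
from typing import List, Tuple


def simulate_run_to_failure(seed: int,
                            failure_threshold: float = 1.0,
                            wear_rate: float = 0.01,
                            noise: float = 0.005,
                            max_steps: int = 10_000) -> List[Tuple[int, float, int]]:
    """Simulate one asset until failure under a toy accelerating-wear model.

    Returns (time_step, noisy_sensor_reading, remaining_useful_life) tuples,
    where remaining useful life counts the steps left until the failure step.
    """
    rng = random.Random(seed)  # seeded so every synthetic run is reproducible
    wear = 0.0
    readings: List[float] = []
    for _ in range(max_steps):
        wear += wear_rate * (1.0 + wear)  # assumed: wear speeds up as it accumulates
        readings.append(wear + rng.gauss(0.0, noise))
        if wear >= failure_threshold:
            break
    n = len(readings)
    return [(t, readings[t], n - 1 - t) for t in range(n)]


# One labelled run-to-failure history that could supplement real data.
history = simulate_run_to_failure(seed=42)
```

Each tuple carries its own RUL label, which is the kind of ground truth that is hard to obtain from real assets that are never run to failure.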
- Algorithm Testing, Validation, Deployment, and Life-Cycle Management
Developing predictive algorithms using static data is one thing, but engineers need to consider how algorithms are verified within the operational system to ensure that the operating conditions of the assets are being properly detected and reported. Algorithm life-cycle management must also be addressed to ensure the integrity of the models in production.
To address these challenges, the streaming platform should be able to replay archived streaming data for algorithm testing and validation. This step would typically be done on a test/validation system that is a smaller-scale replica of the production streaming system. It should leverage the debugging capabilities of the algorithm development environment, including the ability to set breakpoints, monitor variables, and generally understand how the algorithm behaves while processing the streamed data in the same way the production system operates.
Just as simulated data is used in the algorithm development phase, system simulation tools should be used to generate and stream synthetic data to validate adverse or edge cases. Simulation can also be used to pump data into the full production system for acceptance tests and to provide an online stream that serves as a benchmark.
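Replaying archived data can be sketched in a few lines of Python. The example below assumes the archive is an in-memory list of `(timestamp_s, value)` records and that the algorithm under test is a simple threshold alarm; both, along with the names `replay` and `threshold_alarm`, are hypothetical stand-ins for a real archive and a real algorithm.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[float, float]  # (timestamp_s, value)


def replay(archive: Iterable[Record],
           speedup: float = 0.0,
           sleep: Callable[[float], None] = time.sleep) -> Iterator[Record]:
    """Yield archived records in timestamp order.

    speedup == 0 replays as fast as possible; speedup > 0 preserves the
    original spacing between records, divided by the speedup factor.
    """
    previous_ts = None
    for ts, value in sorted(archive):
        if speedup > 0 and previous_ts is not None:
            sleep((ts - previous_ts) / speedup)
        previous_ts = ts
        yield ts, value


def threshold_alarm(stream: Iterable[Record], limit: float) -> List[float]:
    """Toy algorithm under test: timestamps at which the value exceeds the limit."""
    return [ts for ts, value in stream if value > limit]


archive = [(2.0, 0.4), (0.0, 0.1), (1.0, 0.9)]
alarms = threshold_alarm(replay(archive), limit=0.5)
```

Because the pacing function is injectable, a test harness can replay at full speed under a debugger, or at the original rate when timing behaviour is what needs validating.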
As these systems become critical to the operation of an organisation, the algorithms deployed into them must be managed across their entire life cycle to ensure their integrity and proper use, much as enterprise application software is managed. In the development phase, this includes the ability to evaluate the accuracy and efficacy of an algorithm, manage the data used to develop the algorithm together with the evaluation results, and automatically document the results along with information on where the algorithm can be applied. In operation, it is important to manage versions of your algorithms and compare their performance with the baseline results captured in the development phase, in order to determine whether an algorithm needs tuning or updating, or should be retired and replaced altogether to ensure proper operation of the system. Because these platforms typically operate 24/7, the system should support hot deployment of algorithms, online updates of algorithms, and roll-back in the event an algorithm update encounters an issue.
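The version management and roll-back described above can be illustrated with a small in-process registry. This is a sketch under simplifying assumptions: the class `AlgorithmRegistry` is hypothetical, algorithms are plain Python callables, and nothing here addresses persistence, approval workflows, or distributed deployment.

```python
from typing import Callable, Dict, List


class AlgorithmRegistry:
    """Keeps every registered version so the active one can be swapped at runtime."""

    def __init__(self) -> None:
        self._versions: Dict[str, Callable[[float], bool]] = {}
        self._activations: List[str] = []  # activation order, newest last

    def register(self, version: str, algorithm: Callable[[float], bool]) -> None:
        if version in self._versions:
            raise ValueError(f"version {version!r} is already registered")
        self._versions[version] = algorithm

    def activate(self, version: str) -> None:
        """Switch to a registered version without stopping the caller."""
        if version not in self._versions:
            raise KeyError(version)
        self._activations.append(version)

    def rollback(self) -> str:
        """Return to the previously active version and report which one it is."""
        if len(self._activations) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._activations.pop()
        return self._activations[-1]

    @property
    def active(self) -> str:
        return self._activations[-1]

    def run(self, reading: float) -> bool:
        return self._versions[self.active](reading)


registry = AlgorithmRegistry()
registry.register("1.0", lambda reading: reading > 0.8)
registry.register("1.1", lambda reading: reading > 0.5)
registry.activate("1.0")
registry.activate("1.1")
flag_new = registry.run(0.6)  # evaluated by version 1.1
registry.rollback()
flag_old = registry.run(0.6)  # evaluated by version 1.0 again
```

Retaining older versions rather than overwriting them is what makes the roll-back a cheap pointer change instead of a redeployment.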
- Unreliable Network Connectivity and/or Nondeterministic Delivery of Data from Remote Devices
How does the system handle packets that arrive out of order or are dropped? How do you architect a system that may not always be connected to the end devices? IT teams and engineers should use a platform that can reorder out-of-sequence packets before they are analysed. In many cases where machine data is being processed, techniques such as signal processing and time- and frequency-domain analysis are used for feature extraction. For these techniques to be applied, data packets must be ordered by the time the original event took place rather than the time the packet was ingested into the streaming system. The processing environment should also be designed to clean up time-series data, including gaps left by dropped packets, using methods such as retiming, interpolating, and smoothing.
System architects should look for a platform that can support the deployment of algorithms to a variety of systems including cloud, on-premise, and edge/embedded devices. This capability enables portions of the processing to be localised to edge devices when there is intermittent connectivity. It also enables data reduction in cases of limited bandwidth between the edge devices and the centralised stream processing infrastructure.
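As a simple illustration of data reduction at the edge, the sketch below condenses a window of raw samples into a handful of summary features before transmission. The chosen features (mean, RMS, peak) and the function name `summarise_window` are illustrative assumptions; the right features depend on the asset and the downstream algorithm.

```python
import math
from typing import Dict, Sequence


def summarise_window(samples: Sequence[float]) -> Dict[str, float]:
    """Reduce a window of raw sensor samples to a few features worth sending upstream."""
    if not samples:
        raise ValueError("cannot summarise an empty window")
    n = len(samples)
    return {
        "count": float(n),
        "mean": sum(samples) / n,
        "rms": math.sqrt(sum(x * x for x in samples) / n),
        "peak": max(abs(x) for x in samples),
    }


# Four raw samples become one small record; in practice the window could
# hold thousands of samples per transmission.
summary = summarise_window([3.0, -4.0, 3.0, -4.0])
```

The same function could run unchanged on the central platform, which is the benefit of being able to deploy one algorithm to several targets.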
- Scaling and Performance
Real-time streaming systems must react to incoming data within set time periods. They run continuously and should adapt as new devices come online. The system should scale according to the partitioning that common streaming infrastructure typically employs: as new topics or streams are created, new stream processing contexts should be created as well. Look for the ability to leverage the elasticity of the cloud and container technologies. The system should also manage the batching of data so that processing is not I/O bound (stalled waiting on input/output operations) and fully utilises the platform’s compute resources.
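One way to picture the batching trade-off is a batcher that closes a batch when it is either full or spans too much time, so that large batches amortise I/O overhead while the time limit bounds latency. This is a minimal sketch; `micro_batches`, `max_size`, and `max_span_s` are hypothetical names, and the limits are driven by record timestamps rather than a wall clock to keep the example deterministic.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, float]  # (timestamp_s, value)


def micro_batches(records: Iterable[Record],
                  max_size: int,
                  max_span_s: float) -> Iterator[List[Record]]:
    """Group time-ordered records into batches bounded by size and time span."""
    batch: List[Record] = []
    for ts, value in records:
        # Close the current batch if it is full or the new record falls
        # max_span_s or more after the batch's first record.
        if batch and (len(batch) >= max_size or ts - batch[0][0] >= max_span_s):
            yield batch
            batch = []
        batch.append((ts, value))
    if batch:
        yield batch  # flush the final partial batch


stream = [(0.0, 1.0), (1.0, 1.1), (2.0, 0.9), (3.0, 1.2), (10.0, 1.0), (11.0, 1.3)]
batches = list(micro_batches(stream, max_size=3, max_span_s=5.0))
batch_times = [[ts for ts, _ in batch] for batch in batches]
```

Running one such batcher per partition or topic mirrors the article's point that processing contexts should follow the partitioning of the streaming infrastructure.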
- Security
System architects must consider the security aspects of the entire workflow up front, from development through deployment of the operational system. A streaming platform will need to integrate with existing security layers within the organisation to support features such as single sign-on (SSO) and to control access to data and systems based on a user’s role. This includes controlling access to edge devices, especially for systems where the streaming platform provides some form of supervisory control over the devices. The platform must also protect data, including encryption of data at rest and data in motion, and protect the intellectual property of the algorithms, especially if the algorithms are accessible on a cloud system or run on edge devices or systems. This can be a challenge because many analytics platforms operationalise algorithms as scripts or unpackaged code.
The Time Value of Streaming Data
Real-time analytics provide crucial insights, but there is a time value to insights gleaned from data. For example, a factory running machine health monitoring on its equipment needs to be alerted to a potential equipment failure early enough to give the operator time to intervene. It is therefore extremely valuable for system architects, data engineers, and security architects to avoid potential barriers and deploy real-time streaming platforms as efficiently as possible. By addressing the challenges above, IT teams will be able to use the system and begin extracting insights without delay.