Part 5 : Event Processing
When messaging systems were introduced, the systems started working closer in real-time. The staggered approach based on a scheduler was replaced with real-time systems that leveraged messaging platforms. As messages were received, in a typical process flow, data was parsed, enriched, transformed, and/or aggregated as needed. The result was stored in an Operational Data Store (ODS), and then messages were published out to the next system. The change in approach was huge considering the messages were independently processed, regardless of any schedules.
As messages (events) started flowing into systems in real-time, doors opened-up for newer possibilities. System owners understood that potential indicators of market insights were missed if systems were not able to correlate real-time events. The old way of processing messages independently lacked this correlation. It was not equipped to utilize the full potential of the knowledge that could be inferred through correlating a series of events. A new paradigm emerged, event-based architecture, that provided a way of correlating events that were seemingly unconnected, but when combined, gave valuable actionable insights to the systems. Event-based systems started solving interesting use-cases as they started aggregating data and looking for event-patterns – they could perform predictive analysis; they could build self-adjusting systems based on the events that are coming in. Event-based systems started pushing the throughput barriers. An event-based system along with an in-memory data grid formed a powerful combination as there were almost no I/O operations during processing.
So far, systems were still dealing with bounded data. They were able to optimize the usage of resources based on expected loads and ensured that the systems were ready for peak-loads through vertical and horizontal scaling. Over time, the number of events shot up exponentially, and systems started searching for solutions for further optimization. A closer look into the systems revealed that the multiple hops the same message makes from one component to another, within the same system, was increasing the latency on processing.
There were specific use-cases that showed massive performance improvements when individual events were processed (transformed/aggregated) and discarded, and only the aggregated data was stored. Also, more importantly, the data was aggregated in-flight and the aggregated data stayed in-memory for quick access. The individual events were treated as transient data and discarded after the aggregation. For instance, an event that carried data of an updated flight schedule is only valid until the next update is received. An aggregated data with the latest update and the total number of updates can be more useful than all the individual events leading up to the last update.
In our earlier post, we have discussed on how treating each message as an event, in a complex event processing (CEP) system, can impact the response of subsequent events. We briefly discussed on “state” of a system, and how each event can potentially change it. With Streaming, an essential part is processing these events while still they are in-flight! The output of one stream could be passed on as an input to the next one, thus forming an entire topology that runs in-memory with zero I/O operations involved during the process. The state would be updated as a result of the last stream processor in the topology. As it turns out, that is a BIG deal!
As event streams form a topology from the source processor to the sink processor, a rich set of features would be needed to perform stream operations. A stream operation could be a simple join of two streams based on a key, or it could be series of operations with a combination of functions like aggregation, reassignment of key values, merging streams and producing the resultant system state. Depending on the use-case, the streaming data aggregations could be windowed. For instance, a use-case may only be interested in knowing the number of vehicles that drove through a toll gate for the past four hours. A well-built streaming platform would provide a utility that would flush any data outside of the time-window, and then aggregate active data to update current system “state”. Streaming systems also need support for data structures that can store and update the system’s state in real-time. Any streaming platform would ensure the operations needed to update the “state” are built-in and highly efficient. It would also need a robust mechanism to connect to external systems that produce and consume events.
Thus, a Streaming Platform would essentially provide support for the following
- Data structures that store the events, carry the events through stream processors, and data structures that hold the “state”
- A support for building a stream processor topology consisting of source stream processors, intermediate stream processors, and sink stream processor
- Rich set of features to support operations to be performed on the data in-flight
- Support for connecting to event sources and event sinks
An essential part of an Event Streaming platform is to provide a robust set of adapters that can connect with a variety of data sources. In today’s omnichannel touchpoints for an application, any lack of connectivity is equivalent to loss of data. A central theme of the event streaming system is the processing of data while it is in motion. A streaming platform would have a rich set of features to create in-memory aggregation windows through which events could be correlated and the system state is stored. You would also be able to create a dashboards or other utility applications using the data available from the system state. The System’s cache, where the state is stored, acts as a repository of real-time data, and a Streaming Platform may provide various mechanisms through which the data can be subscribed to. In the Event Stream Processing figure above, the REST API of the streaming platform is leveraged to create a real-time dashboard.
While the Event Streaming is evolving, it has lately also become an overloaded term. One could be referring to the messaging ability of a streaming platform as a messaging solution, while others could be considering the potential of streaming platforms to design a complex application. One could also be specifically talking about the platform’s ability to leverage it as a ETL tool using a cluster of nodes connected to Data Sinks. At Prowess, we have experience in building and maintaining Streaming Systems built on various Streaming Platforms available in market, both open source and proprietary software. Connect with us to know more.
Head, Integration-CoE, Prowess Software Services