Introduction to Event-Driven Architecture
- The discussion begins with an assessment of the audience's experience in building event-driven architectures, with a focus on those in highly regulated industries such as banking and healthcare 10s.
- Event-driven architectures are explained, starting with the definition of an event, which is a change in state somewhere in the system, potentially caused by user actions, background tasks, or external entities, and may carry data or be a simple notification 2m6s.
- The concept of events is further clarified, with a distinction made between "fat events" that carry data and "thin events" that are simply notifications, and a recommendation to aim for "lean" events that include only relevant data 4m30s.
- A key differentiation is made between commands and events, with commands being explicit requests for action and events being notifications of something that has happened, without expectation of a response or result 6m20s.
- Event-driven architectures are defined as systems that combine multiple components reacting to events, typically consisting of producers that publish events and consumers that receive them 8m40s.
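
The command/event distinction above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the class names `MakePayment` and `PaymentInitiated`, their fields, and the `handle` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MakePayment:
    """Command: an explicit request for action, named in the imperative."""
    account_id: str
    amount_minor: int  # amount in minor units such as cents (assumed field)


@dataclass(frozen=True)
class PaymentInitiated:
    """Event: a fact that already happened, named in the past tense."""
    event_id: str
    account_id: str
    amount_minor: int
    occurred_at: datetime


def handle(command: MakePayment, event_id: str) -> PaymentInitiated:
    # Handling the command changes state; the event records that change.
    # The publisher of the event expects no response from consumers.
    return PaymentInitiated(
        event_id=event_id,
        account_id=command.account_id,
        amount_minor=command.amount_minor,
        occurred_at=datetime.now(timezone.utc),
    )
```

A producer would publish the returned `PaymentInitiated` to whatever eventing technology is in use; consumers subscribe to it without the producer knowing they exist.
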
Event Types and Commands
- A brief mention is made of event sourcing, which is often confused with event-driven architectures, but is a distinct concept that should be understood and communicated to teams 10m50s.
- Event sourcing is not a requirement for event-driven architecture, but rather a way to represent the state of an application as an immutable sequence of events, which can be complex to apply and takes time to learn 10s.
- Event-driven architecture and event sourcing often come hand in hand because if event sourcing is done, adding event subscription becomes easier, but it is essential to understand that event sourcing is not necessary for event-driven architecture 2m6s.
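
To make the difference concrete, event sourcing can be reduced to one idea: current state is a fold over an immutable sequence of events. The sketch below assumes a toy account with hypothetical `Deposited` and `Withdrawn` events; it says nothing about publishing or subscribing, which is the concern of event-driven architecture.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply(balance: int, event) -> int:
    """Apply one event to the current state and return the new state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    return balance  # unknown event types leave the state unchanged


def current_balance(events) -> int:
    # State is never stored directly; it is rebuilt by replaying events.
    return reduce(apply, events, 0)
```
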
Cloud Native and Banking Context
- Cloud native refers to designing, constructing, and operating workloads in the cloud using modern engineering practices, such as microservices and modular monoliths, and deploying them using modern DevOps principles and CI/CD practices 4m6s.
- Banking refers to large, slow, and highly regulated organizations that are responsible for keeping customers' cash safe, and they often tend to be hesitant to adopt modern principles, but some banks, like Investec, are more modern and agile 6m6s.
- Event-driven architecture is desired in banking for various reasons, including decoupling, which is essential in use cases like transaction monitoring, where the system needs to monitor client accounts for suspicious activity without being tightly coupled to the payment system 8m6s.
- Payments at a bank are highly regulated, with regulations such as PSD2, and must be built with reliability at their core, with transaction monitoring happening behind the scenes, and this can be achieved by decoupling payments and transaction monitoring systems 10s.
Benefits of Event-Driven Architecture in Banking
- By moving to an event-driven architecture, payments and transaction monitoring can be split, allowing payments to focus on its flow and publish events, such as a payment being initiated or processed, without knowing that transaction monitoring exists 42s.
- The decoupling of these two systems is a very important benefit, as it allows transaction monitoring to consume events from the payments system independently, so that a failure in transaction monitoring does not take payments down 2m6s.
- The use of an event-driven architecture also provides an immutable activity log of the events powering a payment, allowing the bank to see the flow of a payment and track its progress, with events such as a payment being initiated, a fraud check being completed, or a payment being processed 2m6s.
- The event-driven model also enables fan-out, where a single event can trigger multiple actions, such as updating payment limits and sending communications, allowing the bank to perform multiple tasks off the back of a single payment 2m6s.
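
The fan-out described above can be sketched with an in-memory bus. This is an assumption-laden illustration: `InMemoryBus`, the handler names, and the `PaymentProcessed` event type are hypothetical, and a real system would use a managed eventing service rather than a Python dictionary.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBus:
    """Toy publish/subscribe bus that delivers one event to many handlers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, event: dict) -> list[str]:
        """Deliver to every subscriber; return the names of handlers that failed."""
        failed = []
        for handler in self._subscribers[event_type]:
            try:
                handler(event)
            except Exception:
                # One consumer failing must not stop the others.
                failed.append(handler.__name__)
        return failed


bus = InMemoryBus()
limits_updated: list[str] = []


def update_payment_limits(event: dict) -> None:
    limits_updated.append(event["payment_id"])


def send_communication(event: dict) -> None:
    raise RuntimeError("SMS gateway unavailable")  # simulated fault


bus.subscribe("PaymentProcessed", send_communication)
bus.subscribe("PaymentProcessed", update_payment_limits)
failed = bus.publish("PaymentProcessed", {"payment_id": "p-1"})
```

Here the communications handler fails, but the payment limit handler still runs, and the publisher never needs to know either of them exists.
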
Efficient Task Handling with Event-Driven Architecture
- Event-driven architecture can solve the problem of handling multiple tasks, such as updating payment limits and sending communications, in a more efficient and independent manner, allowing each process to handle its own faults and retries without affecting others 10s.
- A simple event fan-out can be used to trigger multiple independent processes, such as client communications and payment limit services, which can work independently without needing to know about each other's operations 42s.
- Fault tolerance is a huge benefit of event-driven architecture, especially in highly regulated industries where systems must be tolerant to all faults, and it allows for handling faults in three places: transient retries, eventing technology, and dead lettering 2m6s.
Fault Tolerance in Event-Driven Systems
- Transient retries can be customized to retry failed operations a certain number of times with a bit of jitter, and the asynchronous nature of event-driven architecture allows for extending these retries longer than usual 2m6s.
- If transient retries fail, the system can fall back to the eventing technology, such as Kinesis, Azure Event Hubs, or a managed Kafka instance, to retry the operation with a longer backoff period 2m6s.
- Dead lettering is used to handle poisonous messages or events that cannot be processed, and it allows for alerting a human to replay the event if necessary, providing a third level of fault tolerance 2m6s.
- Event-driven architecture has been beneficial in highly regulated use cases, providing a way to handle faults and retries independently, and allowing for customization of fault tolerance based on the domain and use case 2m6s.
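
Two of the three fault-handling layers above, transient retries with jitter and dead lettering, can be sketched as follows. The function name, parameters, and the list used as a dead-letter store are hypothetical. The middle layer (redelivery by the eventing technology with a longer backoff) depends on the specific service and is deliberately not modeled here.

```python
import random
import time
from typing import Callable


def process_with_retries(
    event: dict,
    handler: Callable[[dict], None],
    dead_letters: list,
    max_attempts: int = 3,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the handler up to max_attempts times, then dead-letter the event."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(event)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                # Out of retries: park the event so a human can inspect and replay it.
                dead_letters.append({"event": event, "error": repr(exc)})
                return False
            # Exponential backoff with full jitter to avoid synchronized retries.
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)
    return False
```

Because consumers are asynchronous, `max_attempts` and `base_delay` can usually be set more generously than in a synchronous request path; the right values depend on the domain.
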
Plug and Play and Developer Enablement
- The concept of "plug and play" in event-driven systems allows for the integration of new capabilities, such as a rewards system, without disrupting existing platforms like payments and accounts, as long as well-defined events are published 10s.
- Event-driven architectures can be challenging for people who have not worked with them before, requiring architects and engineers to learn new concepts and patterns, which can take around 6 months for new joiners to get up to speed 2m6s.
- The difficulties faced by people working with event-driven architectures can be mitigated with the help of a developer platform, which can provide service templates, application modules, and other artifacts to make it easier for engineers to get started 2m6s.
- Having a developer platform with event-driven artifacts is not enough, and it is also essential to train people on how to use these tools and understand the underlying architecture to avoid problems when the system is in production 2m6s.
- To address the challenges of event-driven architectures, it is recommended to create a developer platform, provide training and enablement, and use application modules to take away common problems, allowing multiple teams to build out these architectures more easily 2m6s.
Event-Driven System Implementation and Challenges
- As a form of hands-on enablement, an event-driven system was designed and built together with a team over a 5-day period, ending with a working system; this approach is considered a shift from traditional training methods where teams are simply given documentation or videos to learn from 10s.
- Aligning on standards and principles across the organization is crucial, and it is recommended to define event contracts, permissions models, and technology drivers as early as possible to avoid inconsistencies and ensure pace 2m6s.
- In highly regulated industries such as banking, duplicating or losing events can have severe consequences, and it is essential to design and build systems that prevent these issues from occurring 4m42s.
Patterns for Event Reliability
- To address the problem of duplicating or losing events, two patterns can be used: inbox patterns and outbox patterns, which should be built into the developer platform and frameworks to provide a safe and reliable way of handling events 6m15s.
- The outbox pattern protects against losing events when publishing them, and it involves saving the event to an outbox table within the same transactional boundary as the state update, and then using a dispatcher pattern to publish the event 8m30s.
- The outbox pattern ensures that the event is not lost in case of a failure, and it provides a way to recover and retry the publication of the event, which is critical in industries where event loss can have significant consequences 10m50s.
- Event-driven systems in banking can fail due to issues such as duplicating events, and to handle this, an inbox can be used on the consumer side to store event IDs and data, allowing the system to check for duplicate events and avoid processing them multiple times 10s.
- The inbox protects the system from duplicate events by storing the event ID and data and checking whether the event has been seen before; if it has, the event is not processed again, which avoids issues with at-least-once delivery in the eventing technology 42s.
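
The outbox half of this pair can be sketched with SQLite from the Python standard library. The table names, column names, and function names are assumptions made for illustration. The key point is that the state change and the outbox row are written in one transaction, and a separate dispatcher publishes unpublished rows.

```python
import json
import sqlite3

outbox_db = sqlite3.connect(":memory:")
outbox_db.executescript(
    """
    CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        event_id  TEXT PRIMARY KEY,
        payload   TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    """
)


def create_payment(payment_id: str, event_id: str) -> None:
    # `with outbox_db` commits both inserts together or rolls both back,
    # so a payment can never exist without its event (or the reverse).
    with outbox_db:
        outbox_db.execute(
            "INSERT INTO payments (id, status) VALUES (?, 'INITIATED')",
            (payment_id,),
        )
        payload = json.dumps({"type": "PaymentInitiated", "payment_id": payment_id})
        outbox_db.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (event_id, payload),
        )


def dispatch_outbox(publish) -> int:
    """Publish every unpublished row; return how many rows were attempted."""
    rows = outbox_db.execute(
        "SELECT event_id, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays unpublished
        with outbox_db:
            outbox_db.execute(
                "UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,)
            )
    return len(rows)
```

If the dispatcher crashes after publishing but before marking the row, the event is published again on the next run. That is why the outbox gives at-least-once publishing and is paired with an inbox on the consumer side.
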
Event Contract Management and Versioning
- Breaking event contracts is a painful issue to deal with, as events are a contract that has been promised to the world and cannot be taken back, and any changes to the event data can cause consumers to fail, making remediation a difficult process 2m6s.
- To avoid breaking event contracts, it is essential to design events carefully, considering them like API contracts, and being aware that any property added to the contract is permanent and cannot be removed without causing breaking changes 2m6s.
- Versioning events, like APIs, can help to avoid breaking changes, by adding a data version property to the event, allowing consumers to handle different versions of the event, and safely replay events from the beginning of time 4m30s.
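
One way a consumer might use a data version property is to upgrade old events to the latest shape before handling them. The sketch below is hypothetical throughout: the `data_version` and `data` keys, the v1 and v2 shapes, and the default currency are invented for illustration and are not taken from the talk.

```python
def upcast_v1_to_v2(data: dict) -> dict:
    # Hypothetical change: v2 replaces a decimal "amount" with minor units
    # plus an explicit currency.
    return {
        "payment_id": data["payment_id"],
        "amount_minor": round(data["amount"] * 100),
        "currency": "GBP",  # assumed default for events that predate v2
    }


# Maps a version number to the function that upgrades it by one version.
UPCASTERS = {1: upcast_v1_to_v2}
LATEST_VERSION = 2


def to_latest(event: dict) -> dict:
    """Upgrade an event step by step until it reaches the latest version."""
    version, data = event["data_version"], event["data"]
    while version < LATEST_VERSION:
        data = UPCASTERS[version](data)
        version += 1
    return {**event, "data_version": version, "data": data}
```

With this in place, handlers only deal with the latest shape, and replaying old events from the start of the stream goes through the same upgrade path.
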
Domain and Integration Event Modeling
- Separating domain and integration events can also help, by drawing bounded contexts and domains, and identifying internal events within the domain, such as payments, which can have their own event-driven architecture 6m40s.
- Modeling integration events is crucial, as it allows protection from bleeding domain concepts and enables changes to be made to domains without being contractually tied to specific concepts 10s.
- Event ordering is not guaranteed by default in cloud native eventing technology, which prioritizes scale over order, and retries do not preserve the order of events either; processing events out of order carries risk, such as allowing a client to make two $1 million payments because the balance had not been updated 2m6s.
Event Ordering and Scalability Trade-offs
- There are two approaches to event ordering: explicit ordering, where events are stamped with a version property that is enforced within an inbox pattern, and implicit ordering, where the domain decides which types of events it can process in its current state without necessarily processing them one after another 4m30s.
- The first approach, using version stamps, can enforce ordering but makes the system less scalable, as it essentially builds a queue into the event-driven architecture, and this approach is used in some bank implementations where ordering is necessary 6m20s.
- The second approach, implicit ordering, relies on domain validation to handle the types of events that can be processed, and this approach is also used in some bank platforms, where ordering version stamps on events have not been necessary 8m10s.
- Bringing all the concepts together, including domain and integration events, into a banking use case with payments and communications, requires considering the trade-offs between scalability and ordering, and using the appropriate approach depending on the specific requirements of the system 10m40s.
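
The first approach, version stamps enforced in an inbox, might look like the sketch below. The class name, the `aggregate_id` and `version` keys, and the assumption that versions start at 1 and increase by 1 are all hypothetical. Events that arrive early are parked until the gap is filled, which is exactly the queue-like behavior that limits scalability.

```python
class OrderedInbox:
    """Processes events per aggregate strictly in version order."""

    def __init__(self) -> None:
        self._last_version: dict[str, int] = {}
        self._parked: dict[tuple[str, int], dict] = {}

    def receive(self, event: dict, handle) -> None:
        key, version = event["aggregate_id"], event["version"]
        if version <= self._last_version.get(key, 0):
            return  # duplicate or stale event: already processed
        # Park the event, then drain everything that is now contiguous.
        self._parked[(key, version)] = event
        expected = self._last_version.get(key, 0) + 1
        while (key, expected) in self._parked:
            handle(self._parked.pop((key, expected)))
            self._last_version[key] = expected
            expected += 1
```

The second approach needs no such structure: each handler validates against current domain state (for example, rejecting a "processed" event for a payment that is not yet known) and relies on retries to try again later.
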
Event Flow and System Design in Banking
- The event-driven system flow starts with an API, where a payment is created, and the event is saved to the payments database and the outbox to avoid losing the event, with the outbox also handling internal domain event handling 10s.
- The system has an event handler with an inbox that prevents processing the same event multiple times, and because the domain owns its internal events, they can be named in a verbose, domain-specific manner, such as "Swift FPS payment processed suite" 42s.
- The integration event publisher is responsible for filtering, aggregating, and transforming domain events into integration events, and it protects the boundary of the domain by only publishing specific properties of the domain event 1m6s.
- The publisher ensures that not all domain events become integration events, and it can also aggregate multiple domain events into a single integration event, with the transformation process removing domain-specific language 1m30s.
- The integration event handler receives the published event, such as "payment processed", and handles it, with another inbox protecting against multiple processing, and the system then proceeds to perform its work after the inbox has protected it 2m6s.
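
The filtering and transformation done by the integration event publisher can be sketched as a pure function. The domain event names, the mapping table, and the list of public fields are hypothetical. Aggregating several domain events into one integration event is omitted to keep the sketch short.

```python
from typing import Optional

# Hypothetical mapping from verbose internal names to public names.
# Domain events missing from this table never leave the domain.
PUBLIC_MAPPING = {
    "SwiftPaymentProcessed": "PaymentProcessed",
    "FasterPaymentProcessed": "PaymentProcessed",
}

# Only these properties cross the domain boundary (assumed field names).
PUBLIC_FIELDS = ("payment_id", "account_id", "amount_minor")


def to_integration_event(domain_event: dict) -> Optional[dict]:
    """Return the public form of a domain event, or None if it stays internal."""
    public_type = PUBLIC_MAPPING.get(domain_event["type"])
    if public_type is None:
        return None  # filtered out: internal to the domain
    return {
        "type": public_type,
        "event_id": domain_event["event_id"],
        "data": {name: domain_event["data"][name] for name in PUBLIC_FIELDS},
    }
```

Because consumers only ever see the public name and fields, the domain can rename or restructure its internal events without breaking a contract.
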
Event Transformation and Protection
- The system can also transform domain events, such as "SMS delivered", into integration events, like "communication sent", and this flow is deliberately designed to provide protections for event-driven architectures in highly regulated industries 2m30s.
- By building these protections into the developer platform, teams have been able to avoid common problems and build platforms in the cloud without running into issues, and the system uses a versioning approach, such as version 1, 2, 3, etc., to stamp events, with the understanding that this approach may introduce issues with competing updates 4m0s.
Challenges and Risk Management in Event Streams
- Event-driven systems in banking face challenges with version stamping at scale, such as multiple events ending up with the same version number, for example from competing updates 10s.
- Proving completeness on an event stream is a requirement for auditing purposes, but it can be difficult to grasp philosophically, especially when dealing with an infinite stream of events, and having an immutable log of events can be sufficient in some cases 1m4s.
- In industries where privacy compliance and auditing are a priority, avoiding event duplication is crucial, and trade-offs must be made, such as using an inbox pattern or relying on idempotency logic in downstream systems 2m6s.
- To manage the risk of duplicate events, either idempotency logic can be implemented in downstream systems or an inbox pattern can be used, and ideally, both approaches should be combined 2m6s.
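
A consumer-side inbox can be sketched with SQLite from the Python standard library. The table and column names and the payment-limit example are assumptions for illustration. The event ID is recorded in the same transaction as the work it triggers, so a redelivered event hits the primary key constraint and is skipped.

```python
import sqlite3

inbox_db = sqlite3.connect(":memory:")
inbox_db.executescript(
    """
    CREATE TABLE inbox (event_id TEXT PRIMARY KEY);
    CREATE TABLE limits (account_id TEXT PRIMARY KEY, used_minor INTEGER NOT NULL);
    """
)


def handle_once(event: dict) -> bool:
    """Apply the event exactly once; return False if it was a duplicate."""
    try:
        # One transaction: if the inbox insert fails, no work is applied.
        with inbox_db:
            inbox_db.execute(
                "INSERT INTO inbox (event_id) VALUES (?)", (event["event_id"],)
            )
            inbox_db.execute(
                "INSERT OR IGNORE INTO limits (account_id, used_minor) VALUES (?, 0)",
                (event["account_id"],),
            )
            inbox_db.execute(
                "UPDATE limits SET used_minor = used_minor + ? WHERE account_id = ?",
                (event["amount_minor"], event["account_id"]),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # event_id already in the inbox: skip
```

Idempotency logic in the downstream operation itself, such as setting a value rather than incrementing it, is a separate safeguard that can be layered on top of the inbox.
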
Event Stream Management and Filtering
- In complex systems with multiple integration events, managing who can create these events is essential to avoid chaos, and having layers of filtering, such as domain-level events, integration events, and platform-level events, can help reduce noise 4m30s.
- Having multiple levels of events, including public events within an organization, can help filter and aggregate events, making it easier to manage and reduce noise in the system 5m40s.
- To further manage event streams, topics can be exposed from a platform or domain level, allowing for more targeted and less noisy event streams 6m50s.
- The concept of using specific topics for events is discussed to avoid having to ignore a large percentage of events that are not relevant, and various approaches are considered 10s.
Event Design Principles and Best Practices
- Lean events are recommended as a middle ground between fat events, which carry the entire entity state, and thin events, which are just notifications, to minimize the need for additional data retrieval and reduce coupling 42s.
- The use of lean events is preferred because they include all the necessary data for a particular event, making it less likely that additional data will need to be retrieved from an API, and this approach is considered a good design principle 2m6s.
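
The three event shapes can be compared side by side. The `Payment` entity, its fields, and the choice of which fields count as relevant are hypothetical; in practice that choice is a design decision made per event.

```python
from dataclasses import asdict, dataclass


@dataclass
class Payment:
    """Hypothetical entity holding both public and internal state."""
    payment_id: str
    account_id: str
    amount_minor: int
    currency: str
    internal_risk_score: float
    ledger_batch_ref: str


def thin_event(p: Payment) -> dict:
    # Notification only: consumers must call back to an API for details.
    return {"type": "PaymentProcessed", "payment_id": p.payment_id}


def fat_event(p: Payment) -> dict:
    # Entire entity state: couples consumers to internal fields.
    return {"type": "PaymentProcessed", **asdict(p)}


def lean_event(p: Payment) -> dict:
    # Only the data relevant to what happened.
    return {
        "type": "PaymentProcessed",
        "payment_id": p.payment_id,
        "account_id": p.account_id,
        "amount_minor": p.amount_minor,
        "currency": p.currency,
    }
```
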








