Monolith vs. Microservice vs. Serverless
- The question of whether to build a monolith, microservice, or serverless architecture for the next service is complex and depends on the technology and stack chosen, as each has its own strengths and weaknesses 27s.
- Monoliths are efficient and performant because everything runs in-process, while microservices allow for ownership and scalability, and serverless provides easy deployment and scalability 1m29s.
- The growing complexity in modern software systems is a major challenge for software developers, and finding a way to regain control without losing the benefits of different technologies is crucial 1m33s.
- When choosing a technology, developers must trade off between three pillars: how to code, how to deploy code, and how to run or operate code 2m6s.
Wix's Scale and Engineering Challenges
- Arian is the VP of Engineering at Wix, a leading website builder platform that provides a range of solutions, including e-commerce, events, and booking, and allows users to write code and use serverless technology 3m10s.
- Wix has over 250 million users building websites on its platform, 7% of the internet's websites run on it, and a billion human users visit those websites, with 4,000 microservice clusters across three data centers 4m5s.
- Engineers are responsible for delivering business value, and successful companies need to ship high-quality code fast to beat their competition 4m49s.
Monolithic Architecture
- A monolith is built as a single, large application, which can become complicated over time 5m21s.
- A monolithic service is simple to start with: everything runs in the same process and there is a single service to manage, which makes life easier for developers, especially at startups 5m23s.
- The pros of a monolithic service include easy coding, accessibility, and testing, as well as simple topology, but as it scales, it can lead to mixing domains, spaghetti code, and difficulties in synchronizing between teams 5m45s.
Microservices and Serverless Architectures
- Microservices and serverless systems are more complex, with distributed systems and indirect dependencies, making it harder to manage and test, but allowing for single responsibility and easier deployment 6m26s.
- The trade-offs of microservices and serverless systems include complexity, cross-cutting concerns, and difficulties in refactoring and breaking APIs, but they also offer clear ownership and scalability 7m38s.
The Ideal System and the Task Management Example
- The goal is to find a system that solves most of the issues across the code, deploy, and run pillars, and to build something new that tackles each pillar 8m39s.
- Dana, a new developer, is tasked with writing a simple task management system, which requires domain modeling, API design, request flow, authentication, authorization, input validation, and object mapping 9m12s.
- The task management system is a simple example, but it still requires consideration of various aspects, such as request flow, authentication, and input validation, making it a complex task 9m35s.
- A developer, Daniel, is tasked with building a simple task management system but faces numerous challenges, including figuring out APIs, database connections, RPC calls, secret fetching, data access, domain events, error handling, GDPR, PII, caching, logging, and testing 10m49s.
- Daniel becomes frustrated with the complexity of the task and the numerous considerations required to complete it 11m27s.
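To make the list of concerns above concrete, here is a hypothetical TypeScript sketch of a single "create task" endpoint written by hand, with no platform. The header names (`x-user-id`, `x-tenant-id`, `x-role`), the types, and the in-memory store are illustrative assumptions, not anything shown in the talk; the point is how little of the handler is business logic.

```typescript
// Hypothetical sketch: one "create task" endpoint written without a platform.
// Header names, types, and the in-memory store are illustrative assumptions.

interface Task {
  id: string;
  tenantId: string;
  title: string;
  assigneeId: string;
}

interface HttpRequest {
  headers: Record<string, string | undefined>;
  body: unknown;
}

interface HttpResponse {
  status: number;
  body: unknown;
}

// In-memory stand-ins for a real database and audit sink.
const db = new Map<string, Task>();
const auditLog: string[] = [];
let nextId = 1;

function handleCreateTask(req: HttpRequest): HttpResponse {
  // Authentication: identify the caller and tenant.
  const userId = req.headers["x-user-id"];
  const tenantId = req.headers["x-tenant-id"];
  if (!userId || !tenantId) {
    return { status: 401, body: { error: "unauthenticated" } };
  }

  // Authorization: read-only callers may not create tasks.
  if (req.headers["x-role"] === "viewer") {
    return { status: 403, body: { error: "forbidden" } };
  }

  // Input validation.
  const input = (req.body ?? {}) as { title?: unknown; assigneeId?: unknown };
  if (typeof input.title !== "string" || input.title.trim() === "") {
    return { status: 400, body: { error: "title is required" } };
  }
  if (typeof input.assigneeId !== "string") {
    return { status: 400, body: { error: "assigneeId is required" } };
  }

  // Object mapping: wire format -> domain model.
  const task: Task = {
    id: String(nextId++),
    tenantId,
    title: input.title.trim(),
    assigneeId: input.assigneeId,
  };

  // Data access with error handling.
  try {
    db.set(task.id, task); // the only line of actual business logic
  } catch {
    return { status: 500, body: { error: "storage failure" } };
  }

  // Audit logging.
  auditLog.push(`user=${userId} created task=${task.id}`);
  return { status: 201, body: task };
}
```

A real service would add still more on top of this (secrets, RPC clients, caching, GDPR/PII handling, domain events), which is exactly the overhead the rest of the talk sets out to remove.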
Frameworks and Best Practices
- The solution to Daniel's problem is to use best practices for each of the challenges, which are documented in a framework 12m19s.
- The framework provides recommendations and documentation for using identity, understanding GDPR laws, sending webhooks, and communicating with databases 12m41s.
- Despite the framework's help, Daniel is still overwhelmed, leading to the realization that increasing programmer productivity is not about writing more lines of code but about eliminating unnecessary code 13m39s.
The Journey to Code Faster
- The goal is to eliminate 80% of the code that needs to be written for an app, and the fastest way to write code is to not write it at all 14m2s.
- A journey to code faster in a complex environment began four years ago, involving weekly meetings with the CEO to review thousands of lines of code and identify unnecessary lines 14m29s.
- The focus is on removing lines of code that do not contribute to business logic, as that is what developers are paid to build 15m5s.
Platform as a Runtime (PaaR)
- The solution to removing unnecessary code is to build a platform as a runtime (PaaR) 15m14s.
- A platform called Nile was built to codify guidelines and best practices, so developers work within the platform instead of handling cross-cutting concerns themselves; its features include an application framework and search integration 15m17s.
- The Nile platform automates tasks such as indexing and updating documents in Elasticsearch, eliminating the need for developers to annotate domain objects 15m55s.
- The platform was built to help with coding and is one of the pillars of the development process, with the goal of making development faster and more efficient 16m25s.
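The talk does not show Nile's API, so the following is only a hypothetical TypeScript sketch of the idea: the repository the platform hands to developers keeps a search index in sync on every write, so application code never touches the indexer. The in-memory `SearchIndex` stands in for Elasticsearch, and `Repository`, `save`, and `search` are invented names.

```typescript
// Hypothetical sketch of a Nile-style repository: the platform, not the
// developer, keeps a search index in sync with the primary store.
// The in-memory SearchIndex stands in for Elasticsearch.

class SearchIndex<T> {
  private docs = new Map<string, T>();

  upsert(id: string, doc: T): void {
    this.docs.set(id, doc);
  }

  remove(id: string): void {
    this.docs.delete(id);
  }

  // Naive case-insensitive substring match over the configured fields.
  search(term: string, fields: (keyof T)[]): T[] {
    const needle = term.toLowerCase();
    return Array.from(this.docs.values()).filter((doc) =>
      fields.some((f) => String(doc[f]).toLowerCase().includes(needle)),
    );
  }
}

class Repository<T extends { id: string }> {
  private store = new Map<string, T>();
  private index = new SearchIndex<T>();

  constructor(private readonly searchableFields: (keyof T)[]) {}

  save(entity: T): void {
    this.store.set(entity.id, entity);
    this.index.upsert(entity.id, entity); // indexing happens automatically
  }

  delete(id: string): void {
    this.store.delete(id);
    this.index.remove(id);
  }

  get(id: string): T | undefined {
    return this.store.get(id);
  }

  search(term: string): T[] {
    return this.index.search(term, this.searchableFields);
  }
}

// Application code: declare the entity and its searchable fields, nothing else.
interface Task {
  id: string;
  title: string;
  status: string;
}

const tasks = new Repository<Task>(["title"]);
tasks.save({ id: "1", title: "Write release notes", status: "open" });
tasks.save({ id: "2", title: "Fix login bug", status: "open" });
tasks.save({ id: "1", title: "Write release notes v2", status: "done" }); // update re-indexes
```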
Serverless Platform Development
- In parallel with building the Nile platform, a serverless platform was also developed to provide a customized solution for the company's needs 16m36s.
- The serverless platform was built to address the issue of having a large footprint in production environments, where most of the code running is not written by the company, but rather consists of frameworks and libraries 18m21s.
- The company's software stack typically consists of virtual machines, containers, microservice frameworks, internal frameworks, and business logic, layered like a pyramid of software development 17m17s.
- Microservices are packaged with frameworks, libraries, and business logic, but most of the code running is not written by the company, which can become a problem when running thousands of microservices 18m19s.
- The life cycle of frameworks and libraries is tied to the product life cycle, making it difficult to update and manage multiple versions in production 19m7s.
- The company faces challenges in updating and patching common libraries and frameworks, which can be a long and complicated process, especially when dealing with thousands of microservices 19m38s.
- Many companies have services without owners, and supporting multiple languages can be an issue, requiring duplication of platforms and frameworks, which is costly and time-consuming 20m10s.
The Ideal Platform and Wix's Approach
- The ideal platform would offer easy, fast coding with minimal integration tests and no boilerplate, plus fast deployment, scalability, and low cost 20m49s.
- Every system and stack has pros and cons, and the goal is to take the best of microservices, serverless, monoliths, and managed platforms 21m27s.
- Wix is a managed platform for users, allowing them to write code on top of their website, with Wix handling deployment, Kubernetes, and database provisioning 21m45s.
First Attempt at PaaR with Node.js
- The concept of "Platform as a Runtime" (PaaR) was developed to address these issues, with the goal of building a platform that can run multiple languages and frameworks 21m55s.
- The first attempt at PaaR used Node.js, with an application framework, service integration layer, and data services layer, allowing users to build their business logic on top 23m3s.
- Node.js was chosen for its lightweight nature, dynamic code loading, and ease of learning, and the application framework handles HTTP headers, authentication, and monitoring 23m39s.
- The service integration layer provides RPC clients and libraries, allowing users to integrate their applications without doing lookups or integration 24m5s.
- The data services layer uses DynamoDB as a key-value store; the platform is packaged and deployed to the cloud, and user code is deployed on top of it 24m33s.
- User code is loaded dynamically into the platform and needs only the platform's interfaces, not the entire platform; because the environment is trusted, multiple functions or small services can be added to the same container 25m5s.
- This approach differs from AWS Lambda, an untrusted environment where everything must be packaged together, which makes it impossible to share the platform between different services 25m19s.
- Functions contain no integration code, because integrations are provided by the platform itself; the result is less testing, faster deployment, and zero boilerplate code 26m16s.
- Developers can focus on small functions without the overhead of large packages and frameworks, making deployment very fast 26m27s.
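As a rough illustration of the trusted, shared-container model described above, here is a hypothetical TypeScript sketch: functions are written only against platform interfaces, and one host loads several of them and supplies the implementations (a key-value store standing in for DynamoDB, plus logging). All names are assumptions. The per-tenant key namespacing is likewise an assumption added for illustration, and in a real Node.js host the `load` step could use dynamic `import()` of a deployed bundle rather than receiving the module object directly.

```typescript
// Hypothetical sketch: functions compile only against these interfaces;
// the shared, trusted host container supplies the implementations.

interface KeyValueStore {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

interface PlatformContext {
  tenantId: string;
  kv: KeyValueStore;
  log(message: string): void;
}

interface FunctionModule {
  name: string;
  handler(ctx: PlatformContext, input: unknown): unknown;
}

// The host implements framework concerns once for every function it loads.
class Host {
  private functions = new Map<string, FunctionModule>();
  private data = new Map<string, string>(); // stand-in for a key-value database
  readonly logs: string[] = [];

  // A Node.js host could obtain `fn` via dynamic import(); here it is passed in.
  load(fn: FunctionModule): void {
    this.functions.set(fn.name, fn);
  }

  invoke(name: string, tenantId: string, input: unknown): unknown {
    const fn = this.functions.get(name);
    if (!fn) throw new Error(`function not loaded: ${name}`);
    const ctx: PlatformContext = {
      tenantId,
      kv: {
        // Keys are namespaced per tenant by the host (assumption for this sketch).
        get: (key) => this.data.get(`${tenantId}:${key}`),
        put: (key, value) => {
          this.data.set(`${tenantId}:${key}`, value);
        },
      },
      log: (message) => {
        this.logs.push(`[${name}] ${message}`);
      },
    };
    return fn.handler(ctx, input);
  }
}

// Two tiny "services" sharing one container, each a few lines of business logic.
const setGreeting: FunctionModule = {
  name: "setGreeting",
  handler(ctx, input) {
    ctx.kv.put("greeting", String(input));
    ctx.log("greeting updated");
    return "ok";
  },
};

const greet: FunctionModule = {
  name: "greet",
  handler(ctx, input) {
    return `${ctx.kv.get("greeting") ?? "Hello"}, ${String(input)}!`;
  },
};

const host = new Host();
host.load(setGreeting);
host.load(greet);
```

Because both functions share one host, neither ships its own framework, client libraries, or data-access code; that is the property the bullets contrast with Lambda-style packaging.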
Improved Developer Experience
- The developer experience is improved by taking concepts from a previous project, Nile, and re-evaluating every line of code to determine if it should be provided by the platform 26m49s.
- A developer, Dana, has a change request: add an API that retrieves a task's details and its assigned person, and write an audit log to a database; in a serverless-function world this can be done as a separate service 27m9s.
- Dana only needs to write a small amount of code to import the task server, expose a new API endpoint, call the task server, extract the contact ID, call the contact server, and return the task and contact details 28m35s.
- Adding a Kafka consumer is also simplified, as Dana can get the Kafka consumer and data source from the context without writing any connection handling code 29m20s.
- Dana can write code without worrying about boilerplate, and once she pushes it, it is running in production within one to two minutes, thanks to the platform's minimal code and small deployable size 29m52s.
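Dana's change request can be sketched roughly as below. This is a hypothetical TypeScript illustration: `FunctionContext`, `TaskService`, `ContactService`, `DataSource`, and the `audit_log` table are invented names, not Wix's SDK. What it is meant to show is the shape described in the talk: the API and the Kafka consumer are plain functions that receive ready-made clients from the context, with no connection handling, lookups, or integration code.

```typescript
// Hypothetical sketch of Dana's change request. Service and context names
// are illustrative assumptions, not Wix's actual SDK.

interface Task { id: string; title: string; assigneeContactId: string; }
interface Contact { id: string; name: string; email: string; }

interface TaskService { getTask(id: string): Task | undefined; }
interface ContactService { getContact(id: string): Contact | undefined; }
interface DataSource { insert(table: string, row: Record<string, string>): void; }

// Everything the function needs arrives pre-wired from the platform.
interface FunctionContext {
  services: { tasks: TaskService; contacts: ContactService };
  dataSource: DataSource;
}

// 1) New API: task details plus the assigned contact.
function getTaskWithAssignee(ctx: FunctionContext, taskId: string) {
  const task = ctx.services.tasks.getTask(taskId);
  if (!task) return undefined;
  const contact = ctx.services.contacts.getContact(task.assigneeContactId);
  return { task, contact };
}

// 2) Kafka consumer: the platform owns the consumer connection and calls this
//    once per message; the function only writes the audit row.
function onTaskEvent(ctx: FunctionContext, event: { taskId: string; type: string }): void {
  ctx.dataSource.insert("audit_log", { taskId: event.taskId, type: event.type });
}

// A stub context, like one a local runtime might provide for testing.
const rows: Array<{ table: string; row: Record<string, string> }> = [];
const ctx: FunctionContext = {
  services: {
    tasks: {
      getTask: (id) =>
        id === "t1" ? { id: "t1", title: "Ship v2", assigneeContactId: "c1" } : undefined,
    },
    contacts: {
      getContact: (id) =>
        id === "c1" ? { id: "c1", name: "Dana", email: "dana@example.com" } : undefined,
    },
  },
  dataSource: {
    insert: (table, row) => {
      rows.push({ table, row });
    },
  },
};
```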
Optimizing the Run Pillar
- The deployment pillar is built as a platform as a service, and the concept is being applied to the Run pillar to optimize it 30m32s.
- The Run pillar uses a serverless Node.js runtime that still allows ownership and control under a serverless strategy by giving containers to each team or business unit 31m3s.
- The platform can handle functions running in-process and dynamically loaded into the platform, making it easy for developers like Dana to write code without worrying about the underlying infrastructure 31m57s.
- The platform can scale by putting functions on another container and load balancing between them 32m19s.
- The platform can optimize the function affinity by deploying frequently called functions in-process, replacing network calls with in-process calls, and reducing latency 32m43s.
- The platform's deployment strategy lets small functions be deployed without the framework, keeps a single version of frameworks and libraries in production, and decouples the life cycle of frameworks and libraries from the life cycle of products 33m28s.
- The platform's deployment strategy also allows for easy deployment of the platform, without requiring a cross-company effort, and enables all teams to get the updated platform at once 33m55s.
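Function affinity, as described above, can be pictured with the following hypothetical TypeScript sketch: the same call site is served by a plain in-process call when the callee has been co-located, and by a remote transport otherwise. The `Router`, `RemoteTransport`, and the call-count threshold used to pick "hot" functions are assumptions for illustration; the talk does not say how Wix decides which functions to co-locate.

```typescript
// Hypothetical sketch of function affinity: callers always use router.call();
// whether that is an in-process call or a network hop is a deployment decision.

type Handler = (input: string) => string;

interface RemoteTransport {
  call(functionName: string, input: string): string;
}

class Router {
  private local = new Map<string, Handler>();
  private remoteCalls = new Map<string, number>();
  readonly stats = { inProcess: 0, remote: 0 };

  constructor(private readonly remote: RemoteTransport) {}

  // Used when the scheduler decides to co-locate a function in this process.
  colocate(name: string, handler: Handler): void {
    this.local.set(name, handler);
  }

  evict(name: string): void {
    this.local.delete(name);
  }

  call(name: string, input: string): string {
    const handler = this.local.get(name);
    if (handler) {
      this.stats.inProcess++;
      return handler(input); // plain function call: no serialization, no network
    }
    this.stats.remote++;
    this.remoteCalls.set(name, (this.remoteCalls.get(name) ?? 0) + 1);
    return this.remote.call(name, input);
  }

  // Co-location candidates: functions called remotely at least `threshold` times.
  hotFunctions(threshold: number): string[] {
    return Array.from(this.remoteCalls.entries())
      .filter(([, count]) => count >= threshold)
      .map(([name]) => name);
  }
}

// A fake transport standing in for a network call to another container.
const fakeRemote: RemoteTransport = {
  call: (name, input) => `remote:${name}:${input}`,
};
const router = new Router(fakeRemote);
```

The caller's code is identical in both cases, which is what lets the platform replace network calls with in-process calls without asking developers to change anything.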
Adding Multi-Language Support
- The platform is still missing the ability to add another language; version two is being worked on to support languages other than TypeScript, such as Scala 34m27s.
- A lot of progress has been made on the platform, with thousands of functions running for two years, and developers love it, with the next step being the Wix single runtime 34m51s.
Wix Single Runtime
- The Wix single runtime involves tradeoffs, including switching from in-process to out-of-process calls, with the product teams building business logic using an SDK that communicates with the host platform 35m11s.
- The architecture consists of a DaemonSet with one container that handles incoming and outgoing calls, while every function or microservice runs in its own pod on the same machine; localhost communication is faster than network communication 35m56s.
- The tradeoff of using out-of-process calls is offset by gaining the Kubernetes ecosystem for deployment and autoscaling, which would have had to be reinvented otherwise 36m51s.
- The solution has a larger footprint than the previous TypeScript solution, but it is still 50% lower than packaging the entire framework, with its JVM footprint, inside each microservice 37m12s.
- Benchmarks showed a performance loss of about 2 milliseconds compared to an in-process microservice, which is tolerable, and it's believed that this loss can be gained back in a distributed system 37m39s.
- The deployment strategy relies on localhost communication, which can potentially offset the 2-millisecond overhead, and performance tests showed that the overhead stays at about 2 milliseconds up to 15,000 RPMs (requests per minute) 38m40s.
- Services with millions of RPMs may not be suitable for this solution and can be packaged as standalone microservices instead 39m3s.
- The build setup has been changed so the platform can be packaged inside a regular microservice with just a flag on the build system, and developers still don't need to change their code 39m26s.
- A high-scale system can be packaged together with a standard framework, while the shared runtime suits systems with relatively low RPMs, up to about 30k-50k RPMs, which is still a lot; either way, the approach provides cost savings and a single framework for fast deployment of security changes or legal constraints 39m41s.
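The trade-off between the two packaging options can be sketched as follows. This is a hypothetical TypeScript illustration, not Wix's build system: business code depends only on an SDK interface, and a mode value (standing in for the build flag mentioned above) selects either an embedded platform reached by direct calls or an out-of-process platform. The process boundary is simulated with a JSON round-trip, which is one source of the extra per-call cost; no timing figures are implied.

```typescript
// Hypothetical sketch: the same business code runs against either packaging.
// `mode` stands in for the build-system flag; all names are assumptions.

type PackagingMode = "embedded" | "shared-runtime";

interface PlatformSdk {
  call(method: string, payload: unknown): unknown;
}

const TASKS: Record<string, { id: string; title: string }> = {
  t1: { id: "t1", title: "Ship v2" },
};

// The platform's own implementation of its capabilities.
function platformDispatch(method: string, payload: unknown): unknown {
  if (method === "tasks.get") {
    const { id } = payload as { id: string };
    return TASKS[id] ?? null;
  }
  throw new Error(`unknown platform method: ${method}`);
}

// Embedded: the platform lives inside the microservice; calls are direct.
class EmbeddedSdk implements PlatformSdk {
  call(method: string, payload: unknown): unknown {
    return platformDispatch(method, payload);
  }
}

// Shared runtime: each call crosses a process boundary (e.g. to a host
// container on localhost). The wire hop is simulated with JSON encoding.
class OutOfProcessSdk implements PlatformSdk {
  call(method: string, payload: unknown): unknown {
    const request = JSON.stringify({ method, payload });
    const decoded = JSON.parse(request) as { method: string; payload: unknown };
    const response = JSON.stringify(platformDispatch(decoded.method, decoded.payload));
    return JSON.parse(response);
  }
}

function createSdk(mode: PackagingMode): PlatformSdk {
  return mode === "embedded" ? new EmbeddedSdk() : new OutOfProcessSdk();
}

// Business logic is identical in both packaging modes.
function loadTaskTitle(sdk: PlatformSdk, id: string): string | null {
  const task = sdk.call("tasks.get", { id }) as { title: string } | null;
  return task ? task.title : null;
}
```

Because only the SDK crosses the boundary, supporting another language in this model means writing another thin SDK rather than another full framework.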
Benefits of the Platform
- The platform takes care of GDPR for users and supports multiple languages because it switches from an in-process to an out-of-process model; supporting a different language requires only writing an SDK, which is much cheaper than keeping feature parity between frameworks 40m17s.
- The platform is coded, deployed as a service, and run as a PaaR, or virtual monolith, taking the best of all worlds to create a platform as a runtime that brings business value fast 40m55s.
- Developer velocity improved by 50 to 80% due to writing less code, and compute costs can be reduced by about 50% because more density can be pushed into a single node, with the footprint of a single service being about 50% smaller 41m31s.
- The platform allows for faster deployment of security changes or legal constraints, such as GDPR, and provides support for multiple languages, making it a cost-effective solution 40m11s.
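The link between the roughly 50% smaller footprint and the roughly 50% compute saving is simple arithmetic. As a back-of-the-envelope sketch, assuming per-service memory footprint is the binding constraint on node density and that compute cost scales with node count (both assumptions, not stated in the talk), with node memory $M$, per-service footprint $f$, and $S$ services:

```latex
\begin{aligned}
N &= \frac{M}{f} && \text{services per node}\\
N' &= \frac{M}{f/2} = 2N && \text{footprint halved}\\
\text{nodes}'(S) &= \frac{S}{N'} = \frac{S}{2N} = \tfrac{1}{2}\,\text{nodes}(S) && \text{half the nodes for the same } S
\end{aligned}
```

In practice CPU limits, headroom, and rounding to whole nodes make the real saving approximate, which matches the "about 50%" phrasing.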
Backward Compatibility and Platform Evolution
- When a breaking change occurs, backward compatibility is kept, and proxies are created to change the previous API to the new API, ensuring a smooth transition 43m41s.
- The platform's framework is designed to keep up with architectural requirements, and changes can be made to accommodate different kinds of requirements, such as dedicated services of different types 44m25s.
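The proxy approach to breaking changes can be sketched as follows. This hypothetical TypeScript example assumes a v2 API that replaces an assignee name with a contact ID and adds a `priority` field; the proxy keeps the v1 signature for existing callers and translates each call to v2. The interfaces, the default priority, and the name-to-ID resolver are all invented for illustration.

```typescript
// Hypothetical sketch of keeping a breaking change backward compatible:
// v1 callers keep their old API, and a proxy translates each call to v2.

// Old API: the assignee is passed as a display name.
interface TasksApiV1 {
  createTask(title: string, assigneeName: string): { id: string };
}

// New API: a request object with a contact ID and a new required field.
interface CreateTaskRequestV2 {
  title: string;
  assigneeContactId: string;
  priority: "low" | "normal" | "high";
}

interface TasksApiV2 {
  createTask(req: CreateTaskRequestV2): { taskId: string };
}

class TasksServiceV2 implements TasksApiV2 {
  readonly created: CreateTaskRequestV2[] = [];

  createTask(req: CreateTaskRequestV2): { taskId: string } {
    this.created.push(req);
    return { taskId: `task-${this.created.length}` };
  }
}

// Proxy: exposes the v1 shape, delegates to v2, and maps inputs and outputs.
class TasksApiV1Proxy implements TasksApiV1 {
  constructor(
    private readonly v2: TasksApiV2,
    private readonly resolveContactId: (name: string) => string,
  ) {}

  createTask(title: string, assigneeName: string): { id: string } {
    const { taskId } = this.v2.createTask({
      title,
      assigneeContactId: this.resolveContactId(assigneeName),
      priority: "normal", // default for a field v1 callers never sent
    });
    return { id: taskId };
  }
}

const v2 = new TasksServiceV2();
const v1 = new TasksApiV1Proxy(v2, (name) => `contact-${name.toLowerCase()}`);
```

Old callers keep compiling and running against `TasksApiV1`, while only one real implementation (v2) has to be maintained.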
Opinionated System and Developer Freedom
- A highly opinionated system is necessary for scaling, which means limiting the freedoms of developers to ensure consistency across different stacks of technologies 44m46s.
- The platform provides a set of tools and technologies that developers are expected to use, similar to how AWS or Google Cloud provide their own set of tools and technologies 45m7s.
- While the platform provides a lot of value to developers, there are cases where developers need to opt out of the platform, and in such cases, they should first try to solve their issues within the platform 45m34s.
- The platform is built with layers, allowing developers to take the lower levels of the platform and use the same core libraries, but without the syntactic sugar and automations 45m41s.
- In rare cases, developers may need to use a different technology stack, such as the media platform at Wix, which requires video encoding and image manipulations and uses a different platform 46m1s.
- For most cases (80-90%), developers should stick with the platform, as it provides a lot of value and makes development easier 46m41s.
- Developers tend to try to stay within the platform and avoid opting out because they realize the value it provides 46m54s.
Testing and Local Runtime
- The concept of using a boring architecture is similar to the idea of using a platform, as it provides a set of proven and reliable technologies 47m6s.
- Testing functions and APIs can be done without having the whole environment, and Wix provides a local runtime for testing, as well as the option to test on production using a test tenant 48m3s.
- Wix's multi-tenant system allows developers to create their own test tenant and test end-to-end flows without corrupting other tenants 48m30s.
- The deploy preview feature, part of Wix's CI/CD system, allows developers to deploy and test their code in a production-like environment 48m43s.
- In a deploy preview, a specific artifact is deployed but receives no traffic unless a test request carries a special header, which routes the test calls to the newly deployed artifact, allowing it to interact with other artifacts and APIs without corrupting data 48m52s.
- The platform prevents users from making tenancy mistakes: it takes the tenant ID from the authentication headers and injects it into queries, where it cannot be tampered with 49m27s.
- The system is designed to be fairly safe, preventing users from corrupting any data, thanks to the platform's handling of tenancy and injection of tenant IDs into queries 49m44s.
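The two safety mechanisms above, tenant-ID injection and header-based preview routing, can be sketched in a few lines. This is a hypothetical TypeScript illustration: the header names `x-authenticated-tenant` and `x-deploy-preview`, and the `TenantScopedStore`/`route` names, are assumptions, not Wix's actual implementation. The store derives the tenant only from authenticated headers and rejects any caller-supplied tenant filter; the router sends a request to the preview artifact only when the test header names that artifact's version.

```typescript
// Hypothetical sketch: (1) tenant ID injected from auth headers into every query;
// (2) a test header routes a request to a preview artifact with no regular traffic.

interface Row {
  tenantId: string;
  id: string;
  title: string;
}

// Callers may filter on business fields only; tenantId is not part of the type.
type Filter = Partial<Omit<Row, "tenantId">>;

class TenantScopedStore {
  constructor(private readonly rows: Row[]) {}

  query(authHeaders: Record<string, string>, filter: Filter): Row[] {
    const tenantId = authHeaders["x-authenticated-tenant"];
    if (!tenantId) throw new Error("missing authenticated tenant");
    // Runtime guard as well, in case an untyped caller sneaks the field in.
    if ("tenantId" in filter) throw new Error("tenantId cannot be set by callers");
    return this.rows.filter(
      (r) =>
        r.tenantId === tenantId && // injected by the platform, never by the caller
        Object.entries(filter).every(([k, v]) => r[k as keyof Row] === v),
    );
  }
}

interface Artifact {
  version: string;
  handle(path: string): string;
}

// Regular traffic always reaches the live artifact.
function route(
  headers: Record<string, string>,
  live: Artifact,
  preview: Artifact | undefined,
): Artifact {
  return preview && headers["x-deploy-preview"] === preview.version ? preview : live;
}

const store = new TenantScopedStore([
  { tenantId: "t1", id: "1", title: "a" },
  { tenantId: "t2", id: "2", title: "a" },
]);
const live: Artifact = { version: "v1", handle: () => "live" };
const preview: Artifact = { version: "v2", handle: () => "preview" };
```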








