Deepthi Sigireddi and Vitess
- Deepthi Sigireddi is the technical lead for Vitess, a cloud-native, open-source, distributed database project, and also the Vitess engineering lead at PlanetScale, a database-as-a-service company built on Vitess 1m11s.
- Deepthi's career started as an application developer working with databases, including Oracle, DB2, and MySQL, and later transitioned to working on supply chain planning solutions for the retail industry, which required handling massive amounts of data 1m36s.
- To handle large amounts of data, Deepthi and her team developed parallelizable computing solutions to work with monolithic databases, and later worked on cloud security at a startup that was acquired by IBM, where they implemented custom sharding for a multi-tenant system 1m57s.
- Deepthi joined PlanetScale and started working on Vitess, a massively scalable distributed database system built around MySQL, which solves the limitations of monolithic MySQL and provides a distributed data management capability 3m14s.
Vitess Origins and Capabilities
- Vitess was created at YouTube in 2010 to handle the large amount of traffic and video metadata stored in MySQL, which was causing the site to go down daily 4m6s.
- Vitess brings a distributed data management capability to MySQL, allowing it to scale beyond the limits of a single server and providing a high level of performance, availability, scalability, and resilience 3m52s.
- YouTube's MySQL database was unable to handle the increasing load, leading to the development of Vitess, a distributed database architecture, to solve this problem in a fundamental way 4m30s.
- Vitess allows for vertical sharding, where different sets of tables are placed on different MySQL servers, and horizontal sharding, where a single table is distributed across multiple servers; both are transparent to the application 5m22s.
- In the Vitess system, every MySQL instance has a sidecar process called VTTablet, which manages the MySQL instance, and applications interact with the system through a gateway called VTGate, which accepts requests and routes queries to the backing VTTablets 6m1s.
- The VTGate gateway can accept MySQL protocol or gRPC calls, decide how to route queries, and aggregate results if necessary before sending them back to the client application 6m18s.
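The gateway-and-sidecar layout described above can be sketched in a few lines of Python. This is only an illustration of the idea of routing by table (vertical sharding): the `Tablet` and `Gateway` classes and their methods are hypothetical stand-ins for VTTablet and VTGate, not Vitess APIs.

```python
# Hypothetical illustration only: these classes are not Vitess APIs.
from dataclasses import dataclass, field


@dataclass
class Tablet:
    """Stands in for a VTTablet sidecar managing one MySQL instance."""
    name: str
    tables: dict = field(default_factory=dict)  # table name -> list of rows

    def select_all(self, table):
        return list(self.tables.get(table, []))


class Gateway:
    """Stands in for VTGate: the single entry point for the application."""

    def __init__(self, table_to_tablet):
        # Vertical sharding: each table lives on exactly one backing server.
        self.table_to_tablet = table_to_tablet

    def select_all(self, table):
        # The application never needs to know which server holds the table.
        tablet = self.table_to_tablet[table]
        return tablet.select_all(table)


users_db = Tablet("users-db", {"users": [{"id": 1, "name": "ada"}]})
videos_db = Tablet("videos-db", {"videos": [{"id": 10, "title": "intro"}]})
gateway = Gateway({"users": users_db, "videos": videos_db})

print(gateway.select_all("videos"))
```

The application only ever talks to `gateway`, so moving a table to a different server changes the routing map and nothing else.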
Choosing a Cloud-Native Database
- When looking for a cloud-native database, developers should consider usability, compatibility with their application needs, and uptime, as these factors are crucial for a cloud database 7m12s.
- Almost every commercially available database has a cloud offering, and developers should evaluate these options based on their specific needs, considering factors such as configuration, tuning, and support for their application 7m30s.
- Compatibility is essential, as there are many cloud databases available, including Amazon RDS, Google Cloud SQL, and Oracle Cloud database offerings, and developers should choose a database that supports their application needs 8m11s.
- Uptime is also critical, as the database is often critical to operations, and developers should evaluate the historical uptime of the database before making a decision 8m36s.
- Cloud providers handle baseline requirements such as data safety, database availability, and data loss prevention, allowing users to focus on other aspects of their applications 8m51s.
- Not every application requires a massively parallel distributed database, and users should consider their specific needs before choosing a database solution 9m21s.
- In general, the only reason not to use a cloud database is a legal restriction on storing highly sensitive data with a third-party provider 9m52s.
- Cloud databases are a good choice for most applications, but not all applications require a distributed database, and a standalone database in the cloud can be sufficient 10m27s.
Vitess Sharding
- When choosing a database, users should consider factors such as usability, compatibility with existing applications and frameworks, and reliability 10m44s.
- Distributed databases like Vitess use sharding to provide scalability and replication to provide high availability 11m12s.
- In Vitess, sharding is customizable, allowing users to choose the column and function used for sharding, and also provides a public interface for building custom sharding functions 11m36s.
- Vitess's sharding process involves computing which shard to write data to when inserting a row and computing which shard to read data from when querying a row 12m0s.
- For queries that cannot be computed to a specific shard, Vitess uses a scatter query to send the query to all shards, gather the results, and bring them back 12m44s.
- As a database grows, it eventually reaches a point where everything starts slowing down, and managing things becomes difficult, prompting the need to shard the database 13m9s.
- Horizontal sharding is a method of sharding where data is split across multiple servers, and tools can be used to define the sharding scheme, choose the tables to shard, and copy data onto new shards 13m22s.
- The sharding process can be done in the background while the system is still running, and when ready, the database can be switched to the new sharded configuration, with the original database being stopped for a short period, typically less than 30 seconds 14m2s.
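The read and write paths described above (compute the shard from the sharding column, fall back to a scatter query otherwise) can be sketched as follows. This is a simplified model, not Vitess's implementation: the shard count, the MD5-plus-modulo mapping, and the `ShardedTable` class are all assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 4  # assumption for this example


def shard_for(key: int) -> int:
    """Map a sharding-column value to a shard index (simplified: hash, then modulo)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class ShardedTable:
    """Hypothetical in-memory model of one horizontally sharded table."""

    def __init__(self):
        self.shards = [dict() for _ in range(NUM_SHARDS)]

    def insert(self, user_id, row):
        # Write path: compute the target shard from the sharding column.
        self.shards[shard_for(user_id)][user_id] = row

    def get(self, user_id):
        # Read path when the query filters on the sharding column: one shard.
        return self.shards[shard_for(user_id)].get(user_id)

    def find_by_name(self, name):
        # No sharding column in the filter: scatter to all shards, gather results.
        results = []
        for shard in self.shards:
            results.extend(r for r in shard.values() if r["name"] == name)
        return results


table = ShardedTable()
for i in range(1, 21):
    table.insert(i, {"id": i, "name": "ada" if i % 2 else "bob"})

print(table.get(7))
print(len(table.find_by_name("ada")))
```

A lookup by `id` touches one shard, while `find_by_name` has to visit every shard, which is why the choice of sharding column matters.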
Replication and High Availability in Vitess
- An optional reverse replication can be set up to keep the old databases in sync with the new shards, allowing for rollbacks in case something goes wrong 14m32s.
- Replication is a crucial aspect of modern software applications, as downtime is not acceptable, and services need to be available 24/7 15m14s.
- Traditionally, high availability is achieved through replication, where a primary database is written to, and replicas follow and apply changes to their local databases, providing an always-available copy of the data 16m17s.
- Vitess uses replication to provide high availability and has tooling to guarantee very high availability, allowing for planned maintenance without downtime by transitioning leadership from the primary to a replica 16m44s.
- This process enables maintenance to be performed on the primary database without affecting the availability of the service 17m6s.
- Vitess uses replication for higher availability, and around that replication, features have been built to handle planned maintenance and unplanned failures, such as a primary MySQL instance going down due to memory issues, disk errors, or network problems 17m56s.
- In Vitess, a monitoring component called VTOrc (Vitess Orchestrator) monitors the cluster and elects a new primary instance if the current one is unreachable, ensuring that data is not lost 18m10s.
- VTOrc also monitors replication on replica instances and ensures they are replicating correctly from the cluster primary, as well as monitoring for other error conditions and fixing them 18m33s.
- When a node in the cluster is not available or needs to be down for maintenance, a replica node will take over leadership, but the data needs to be properly sharded, partitioned, and replicated 18m59s.
- The process of switching between nodes for planned maintenance is not fully automated by Vitess, but users can automate it using VTOrc and the command-line interface, typically during a rolling upgrade of the cluster 19m39s.
- For unplanned failures, VTOrc handles the monitoring and failover without human intervention, and the process can complete in as little as 5-10 seconds 20m20s.
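The failover decision described above can be illustrated with a minimal sketch. It assumes one simple policy, promoting the reachable replica that has applied the most changes; the real orchestrator's logic is more involved, and the `Instance` type, the `applied_position` field, and `elect_primary` are hypothetical names for this example only.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    reachable: bool
    applied_position: int  # hypothetical: how far this instance has applied changes


def elect_primary(primary, replicas):
    """Return the instance that should lead the cluster."""
    if primary.reachable:
        return primary  # nothing to do, the current primary is healthy
    candidates = [r for r in replicas if r.reachable]
    if not candidates:
        raise RuntimeError("no reachable replica to promote")
    # Promote the most caught-up replica to minimise the chance of losing writes.
    return max(candidates, key=lambda r: r.applied_position)


primary = Instance("primary", reachable=False, applied_position=120)
replicas = [
    Instance("replica-1", reachable=True, applied_position=118),
    Instance("replica-2", reachable=True, applied_position=120),
    Instance("replica-3", reachable=False, applied_position=120),
]
print(elect_primary(primary, replicas).name)  # replica-2
```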
Distributed Transactions in Vitess
- During planned failovers, applications may see a slight delay in response time, but not errors, because requests are buffered; unplanned failures may result in errors, but typically only within a 30-second timeframe 20m38s.
- Distributed databases come with distributed transactions, which involve balancing data consistency against data availability; the CAP theorem says that when the network partitions, a system cannot guarantee both 21m52s.
- In a distributed system, consistency issues arise when reading data from replicas, as they may be caught up to different extents, resulting in an inconsistent view of the data 22m57s.
- Historically, Vitess has dealt with consistency issues by reading from the primary instance if consistency is important, and from replicas if it's not, but people have used tricks like always reading from the primary in user sessions that involve writing 23m24s.
- A desired feature in Vitess is to provide a way to specify read-after-write consistency, which would guarantee up-to-date data whether reading from the primary or a replica 24m13s.
- In a distributed system, there is always a possibility of a distributed transaction, but most transactions go to one shard, in which case MySQL itself provides transactional guarantees 24m49s.
- Distributed transaction problems arise when updating rows in different shards, such as in a bank transaction where both the sender and recipient's balances need to be updated 25m2s.
- To ensure data consistency, distributed transactions are necessary, but true distributed transaction support can be challenging to implement, so creative solutions like writing to a ledger and reconciling after the fact can be used to solve this problem 25m20s.
- Vitess uses a best-effort distributed transaction approach, where writes are executed one at a time, and if a failure occurs, all previous writes are rolled back, reducing the universe of possible failures 25m55s.
- Once all writes are successful, the commits are issued in parallel, and the probability of a commit being rejected is low, making this approach effective for most use cases 26m37s.
- Truly atomic distributed transactions are on the roadmap for Vitess, but the current best-effort approach has proven to be reliable 27m2s.
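The best-effort approach described above (apply writes one at a time, roll everything back if one fails, commit only once all have succeeded) can be sketched with two in-memory SQLite databases standing in for shards. This is an illustration of the idea, not Vitess code: the table layout and the helper names `make_shard`, `best_effort_commit`, `transfer`, and `balance` are assumptions, and the commits here are issued one after another rather than in parallel.

```python
import sqlite3


def make_shard(accounts):
    """Create an in-memory database that stands in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", accounts)
    conn.commit()
    return conn


def best_effort_commit(writes):
    """Apply (shard, sql, params) writes in order; commit only if all succeed."""
    touched = []
    try:
        for shard, sql, params in writes:
            touched.append(shard)
            shard.execute(sql, params)  # sqlite3 opens a transaction implicitly
    except sqlite3.Error:
        for shard in touched:
            shard.rollback()  # undo every write made so far
        return False
    # Every write succeeded. A commit can still fail at this point, which is
    # why the approach is "best effort" rather than truly atomic.
    for shard in touched:
        shard.commit()
    return True


def transfer(src_shard, src_id, dst_shard, dst_id, amount):
    return best_effort_commit([
        (src_shard, "UPDATE accounts SET balance = balance - ? WHERE id = ?",
         (amount, src_id)),
        (dst_shard, "UPDATE accounts SET balance = balance + ? WHERE id = ?",
         (amount, dst_id)),
    ])


def balance(shard, account_id):
    row = shard.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


shard_a = make_shard([("alice", 100)])
shard_b = make_shard([("bob", 50)])
print(transfer(shard_a, "alice", shard_b, "bob", 30))
print(balance(shard_a, "alice"), balance(shard_b, "bob"))
```

Because failures are most likely while the writes are being applied, deferring every commit to the end leaves only a narrow window in which the shards can disagree.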
Schema Management in Vitess
- Schema management and data contracts are gaining attention, and Vitess has a tool called VTAdmin, which provides a structured gRPC API and a UI for managing the cluster 27m21s.
- VTAdmin allows users to view schemas, but schema management is not yet available through the UI, although the APIs are available for building custom tools 28m58s.
- Best practices for managing schemas and schema versions include using tools and frameworks that provide versioning and change management capabilities, and Vitess plans to improve its schema management capabilities in the future 29m20s.
- Storing schema in a version control system is recommended, as schema changes are a crucial topic in the MySQL community due to historical performance issues with large tables under load 29m38s.
- MySQL has improved its schema change capabilities, making certain changes instant to prevent locking and blocking, but the community had to develop its own tools due to the slow pace of progress 30m20s.
- The principles for online schema changes include being non-blocking, happening in the background, and allowing for reversion without data loss 30m49s.
- Vitess has an online schema change system that follows these principles, allowing for non-blocking changes that can be initiated and cut over manually or automatically 30m56s.
- Vitess's online schema changes are designed to be non-blocking, allowing the system to continue running with minimal additional load 31m17s.
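One common way to meet the principles listed above is the shadow-table technique: build a copy of the table with the new schema, fill it in small batches in the background, then swap names at cut-over while keeping the old table for reversion. The sketch below shows that general technique using SQLite; it is not Vitess's implementation, the `users` table and the `online_add_email_column` function are hypothetical, and it deliberately omits the hard part, which is capturing writes that arrive while the copy is running.

```python
import sqlite3


def online_add_email_column(conn, batch_size=2):
    """Add an 'email' column to 'users' via a shadow table (simplified sketch)."""
    # 1. Create a shadow table that already has the new schema.
    conn.execute(
        "CREATE TABLE users_shadow "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT DEFAULT '')"
    )
    # 2. Copy rows in small batches so no single step holds the table for long.
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "INSERT INTO users_shadow (id, name) VALUES (?, ?)", rows
        )
        conn.commit()
        last_id = rows[-1][0]
    # 3. Cut over by swapping names; the old table is kept so the change
    #    can be reverted without losing data.
    conn.execute("ALTER TABLE users RENAME TO users_old")
    conn.execute("ALTER TABLE users_shadow RENAME TO users")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [("ada",), ("bob",), ("cy",)]
)
conn.commit()

online_add_email_column(conn)
print([col[1] for col in conn.execute("PRAGMA table_info(users)")])
```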
Vitess Roadmap and Future Improvements
- New features on the Vitess roadmap include improved MySQL compatibility, with recent additions such as Common Table Expressions and window functions 32m16s.
- Performance improvements are also ongoing, including a new connection pooling implementation that reduces latency and memory utilization 33m8s.
- The new connection pooling implementation makes the connection pool more efficient, reducing the need to cycle through the entire pool and minimizing the number of open connections required 33m19s.
- Query performance benchmarks run every night and the results are published on a dedicated website, benchmark.vitess.io, with an ongoing effort to continuously improve the benchmarks 33m42s.
- Recent functionality improvements include the second iteration of point-in-time recovery in Vitess, with plans to make further improvements based on user feedback 34m4s.
Learning More about Vitess
- To learn more about distributed databases in general or Vitess in particular, online resources include the Vitess website, which has documentation, examples, quick start guides, and links to videos 34m31s.
- Vitess is a Cloud Native Computing Foundation project; it was donated by Google in 2018 and reached graduated status within the foundation in 2019 34m35s.
- The Vitess website provides resources for running Vitess on a laptop, within Kubernetes, and has links to videos from maintainers, community members, and users 34m49s.
- Searching for "Vitess" on YouTube yields plenty of talks and introductions to Vitess features, architecture, and diagrams 35m26s.
- Working on open-source projects like Vitess has been a positive experience, allowing for new interactions, experiences, and personal growth as a software developer 35m54s.
- Listeners can learn more about data engineering topics by checking out the AI, ML, and data engineering community pages on infoq.com, and by listening to recent podcasts 36m41s.







