YouTube video summary

Monorepos: beyond the Technicalities

Technology · 16 Dec 2024 · 14 min summary · From InfoQ (YouTube)

Introduction to Monorepos and Polyrepos

  • Monorepos and polyrepos are two approaches to organizing code, with monorepos being a single repository that produces multiple artifacts, and polyrepos being multiple repositories that produce a single artifact or multiple artifacts with complex dependency relationships 10s.
  • A polyrepo setting can be identified by multiple repositories with clearly defined dependency relationships that work together to produce a single artifact, or multiple repositories with complex dependency relationships and code sharing that produce multiple artifacts 1m18s.
  • A monorepo setting can be identified by a single repository that produces multiple artifacts, with code sharing between internal modules 2m18s.
  • The key factor in determining whether a setting is a polyrepo or monorepo is the way code is shared between repositories, with polyrepos having code sharing between multiple repositories and monorepos having code sharing within a single repository 3m36s.

Characteristics of Monorepos and Polyrepos

  • Monolithic applications are not always the result of monorepos, and simply putting code together in the same repository does not necessarily make it a monorepo 4m20s.
  • A monorepo requires well-defined relationships between internal modules, allowing for a clear diagram of dependency relationships to be drawn 4m45s.
  • There are also cases where it is not clear whether a setting is a polyrepo or monorepo, such as when multiple repositories produce a single artifact without code sharing, or when a single repository has internal modules that do not share code 2m33s.
  • Most real-world setups are a mix of polyrepo and monorepo characteristics, making it difficult to categorize them as one or the other 3m11s.
  • Modules in a monorepo are part of the same build system, much like the modules of a multi-module Maven build in Java or of a Go project, and they produce either a single artifact or multiple smaller artifacts that may be important for publication or consumption 5m4s.
  • A monorepo is not necessarily big or messy, but rather a repository that contains code producing more than one interesting artifact after the build, such as publishing multiple Docker images from one single repo 5m57s.
  • Code sharing is also a key aspect of monorepos, allowing a single module to be reused by final artifacts 6m18s.
  • It's rare to see a company with everything inside the same repository, but many companies have monorepos in the sense that they have a single repository publishing multiple artifacts 6m44s.
  • Using the given definition, it can be said that everyone has a monorepo, as many repositories have multiple libraries that get published 7m21s.

Discussion on Monorepo vs Polyrepo

  • This session aims to answer whether one should have a monorepo by discussing codebase structure, team operations, software development, and code reuse 7m52s.
  • The discussion will be based on general terms and oversimplified examples, as it's hard to capture the uniqueness of each company or organization 8m39s.
  • The conversation will also touch on how people interact to produce software, which can easily become about management and separation 9m31s.
  • The structure of a company's codebase may closely mirror the structure of its teams and management, so it's essential to consider this when discussing monorepos or polyrepos 9m42s.
  • A typical example of a codebase consists of multiple microservices, backend-for-frontend repos, frontend applications, common code repos, and an end-to-end tests repository, which many people can relate to, although it's oversimplified 9m53s.
  • When presenting a simple example, people often have different opinions on what should be done, which can hinder the conversation about whether to pursue a monorepo or polyrepo setting 10m41s.
  • To have a productive conversation, it's crucial to get everyone moving in the same direction, exposing them to the same problems and understanding the implications of library building, Ops Team deployment, and other factors 11m20s.
  • The conversation about monorepos or polyrepos is slow, lengthy, and complex, and cannot be decided in a single meeting with stakeholders or architects 11m53s.
  • The uniqueness of a company's operating dynamics and team alignment come before choosing a monorepo or polyrepo approach, and it's essential to document the decision-making process and ensure everyone understands why a particular approach is chosen 12m26s.

Team Dynamics and Codebase Structure

  • Breaking down a codebase into teams can reveal affinities between repos and help understand how people should work together, but it's also important to consider the unique operating dynamics of each organization 13m7s.
  • Different teams may have different names, operating dynamics, and collaboration styles, which can affect how they work on repos and tests 13m34s.
  • It's common for multiple teams to collaborate on the same repo or end-to-end tests, and everyone may own the tests, but it's essential to consider the unique aspects of each organization's operations 14m3s.
  • Different teams within an organization may have varying needs and priorities, such as a focused team for security and another for tests, which can inform decisions about monorepos and polyrepos 14m35s.
  • As decision-makers, individuals have a better understanding of their team's operations and can evaluate the benefits of monorepos and polyrepos for specific parts of their organization 14m50s.
  • Teams can choose to use monorepos for certain parts of their organization and polyrepos for others, rather than having to choose one approach exclusively 15m20s.

Dependency Management and Complexity

  • The dependency relationships between repositories can be complex and may involve multiple layers of dependencies, including common libraries and microservices 15m39s.
  • There are different ways to understand and visualize these dependency relationships, and teams may have varying approaches to managing them 15m51s.
  • In addition to internal dependencies, teams must also consider third-party dependencies and versioning, which can add complexity to the development process 16m42s.
  • Some teams may choose to version and publish their own libraries, which can be consumed by their services, but this can also create challenges in terms of dependency management 16m55s.
  • To address these challenges, teams may adopt strategies such as updating all services to use the same version of a library whenever a new release is published, or automating the process of publishing changes to services 17m22s.
  • Ultimately, the complexities of software development and dependency management can be addressed in various ways, and teams must find the approach that works best for their specific needs 17m55s.
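
One of the strategies above, keeping every service on the latest published version of an internal library, depends on knowing which services have fallen behind. The following is a minimal sketch of that check, assuming simple `MAJOR.MINOR.PATCH` version strings. The service names, library names, and the `find_lagging_services` helper are hypothetical and do not come from the talk.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical data: the version of each internal library that every service pins.
SERVICE_PINS = {
    "orders-service": {"common-auth": "2.1.0", "common-logging": "1.4.0"},
    "billing-service": {"common-auth": "2.0.0", "common-logging": "1.4.0"},
    "search-service": {"common-auth": "2.1.0"},
}

# Hypothetical latest published version of each internal library.
LATEST = {"common-auth": "2.1.0", "common-logging": "1.4.0"}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_lagging_services(pins, latest):
    """Return {library: [(service, pinned_version), ...]} for out-of-date pins."""
    lagging = defaultdict(list)
    for service, libraries in sorted(pins.items()):
        for library, pinned in sorted(libraries.items()):
            if parse_version(pinned) < parse_version(latest[library]):
                lagging[library].append((service, pinned))
    return dict(lagging)


if __name__ == "__main__":
    for library, services in find_lagging_services(SERVICE_PINS, LATEST).items():
        for service, pinned in services:
            print(f"{service} pins {library} {pinned}, latest is {LATEST[library]}")
```

A report like this could feed either strategy: a reminder for teams that upgrade by hand, or the input to automation that opens the upgrade changes.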

Managing Dependencies in Polyrepos

  • Managing dependencies in a polyrepo setting can be difficult, especially when dealing with third-party dependencies, as it requires managing complexity, upgrading, tracing, and ensuring reliability 18m58s.
  • Making software reusable by others adds another layer of complexity, as it requires considering who is using the software and managing the impact of changes 19m7s.
  • A polyrepo is typically understood as a small repository containing software for a specific purpose, producing a single artifact or a few, and the responsibility of upgrading dependencies lies with the users 19m37s.

Advantages of Monorepos

  • A monorepo, on the other hand, contains multiple modules that may or may not be related, and code is reused internally by pointing to local artifacts 20m3s.
  • Monorepos can have fast builds if set up correctly, as only the necessary parts need to be built, without requiring a full rebuild of the entire repository 20m22s.
  • The concepts of upstream and downstream are important in understanding the relationships between repositories and artifacts, with upstream referring to libraries and services used, and downstream referring to libraries, Docker images, and services that depend on the repository 21m11s.
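
The upstream and downstream notions described above can be made concrete with a small dependency graph. This is only an illustrative sketch: the module names and the `upstream` and `downstream` helpers are hypothetical, and real build tools compute these sets in their own ways.

```python
from collections import defaultdict

# Hypothetical modules: each key depends on the modules listed as its value.
DEPENDS_ON = {
    "common-lib": [],
    "orders-service": ["common-lib"],
    "billing-service": ["common-lib"],
    "web-bff": ["orders-service", "billing-service"],
    "docs-site": [],
}


def transitive(start, edges):
    """Collect every node reachable from `start` by following `edges`."""
    seen = set()
    stack = list(edges.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen


def upstream(module, depends_on):
    """Everything `module` uses, directly or indirectly."""
    return transitive(module, depends_on)


def downstream(module, depends_on):
    """Everything that depends on `module`, directly or indirectly."""
    reverse = defaultdict(list)
    for dependent, dependencies in depends_on.items():
        for dependency in dependencies:
            reverse[dependency].append(dependent)
    return transitive(module, reverse)


if __name__ == "__main__":
    print("upstream of web-bff:", sorted(upstream("web-bff", DEPENDS_ON)))
    print("downstream of common-lib:", sorted(downstream("common-lib", DEPENDS_ON)))
```

The downstream set is what makes fast builds possible: after a change to one module, only that module and its downstream set need to be rebuilt.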

Upstream and Downstream Dependencies

  • In a polyrepo setting, changes to a repository can have implications for multiple downstream artifacts, making it essential to manage these dependencies carefully 21m57s.
  • Publishing libraries using semantic versioning is a common setup for sharing code, but it can lead to a chain of updates and version publications, which can be wasteful if not managed efficiently 22m11s.
  • Using a monorepo can potentially simplify this process by eliminating the need for multiple version publications and allowing for more efficient management of dependencies 23m16s.
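
To illustrate the chain of version publications mentioned above, the sketch below lists which repositories would need a new release, and in what order, after a change to a single library in a polyrepo setup. The repository names and the `release_chain` helper are hypothetical. It also assumes every dependent always upgrades to the new version, which real teams may choose not to do.

```python
from graphlib import TopologicalSorter

# Hypothetical polyrepo: each repository lists the repositories whose
# published libraries it consumes.
DEPENDS_ON = {
    "core-utils": set(),
    "http-client": {"core-utils"},
    "auth-lib": {"core-utils", "http-client"},
    "orders-service": {"auth-lib"},
    "reports-service": {"core-utils"},
    "landing-page": set(),
}


def release_chain(changed, depends_on):
    """Repositories that must publish a new version after `changed` changes,
    ordered so that every repository comes after the ones it consumes."""
    affected = {changed}
    grew = True
    while grew:  # keep adding repositories that consume an affected one
        grew = False
        for repo, dependencies in depends_on.items():
            if repo not in affected and dependencies & affected:
                affected.add(repo)
                grew = True
    # Restrict the graph to affected repositories and sort it topologically.
    subgraph = {repo: depends_on[repo] & affected for repo in affected}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    chain = release_chain("core-utils", DEPENDS_ON)
    print(f"{len(chain)} publications needed:", " -> ".join(chain))
```

In a monorepo the same change would be made and released together under one build, so the intermediate publish-and-bump steps are not needed.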

Making Changes in a Monorepo

  • In a monorepo, every component is in a module inside the same repository under the same build system, allowing for changes to be made and released together 23m19s.
  • This approach can be beneficial, but it also means that making big changes can take weeks, especially for major version bumps, as all changes must be made at once 23m52s.
  • The responsibility for updating downstream code in a monorepo depends on the team's dynamics and preferences, and this decision has an impact on whether a monorepo should be used 24m26s.
  • In a monorepo, introducing changes can be more complex, as the team making the changes may need to ask for help updating downstream code, but once done, the new artifacts are deployed and aligned for everyone 24m57s.

Monorepos and Polyrepos as Complementary Techniques

  • Monorepos and polyrepos are not mutually exclusive, but rather complementary techniques that can coexist in the same codebase, depending on the organization's needs and structure 25m21s.
  • The decision to use a monorepo or polyrepo depends on the specific needs of each part of the organization, with some teams requiring independence and others needing alignment and synchronicity 25m58s.
  • Some parts of an organization may be better suited to a monorepo, where alignment and synchronicity are crucial, while others may be better suited to a polyrepo, where independence is necessary 26m28s.
  • The choice between monorepo and polyrepo is not a one-size-fits-all solution, but rather a decision that depends on the specific context and needs of each organization 26m43s.

Example: Apache KIE Tools Project

  • The Apache KIE Tools project is an example of a large monorepo, with 200 packages, a custom build system, and a package.json file for each package, which defines how to build it and its relationships with other modules 27m11s.
  • The build can target a section of the monorepo by selecting the exact part of the dependency tree to build, and almost 50 artifacts come out of the monorepo, including Docker images, VS Code extensions, and Maven modules and applications 27m56s.
  • Standardized script names put everyone under the same build system: each package has "build:dev" and "build:prod" commands, and packages that can be developed standalone also have a "start" command 28m22s.
  • Configuration is done through environment variables, an idea borrowed from the Twelve-Factor App manifesto, and an internal tool manages a large number of environment variables to configure things like logo pass, optimizer, minifier, and tests 28m40s.
  • Environment variables are used to make references to other packages, and symbolic links and definitions in package.json are used to safely reference dependencies 29m7s.
  • A system is in place to prevent mistakes during builds by only referencing declared dependencies, and partial builds of the monorepo are possible in PR checks, depending on the files changed 29m34s.
  • The ability to partially build the monorepo in PR checks is helpful, with a script figuring out what packages need to be rebuilt and retested, and the slowest part of a run taking 16 minutes 29m59s.
  • The monorepo can scale well, with the ability to split builds into partitions and sections of the tree, and it supports multiple languages, including Java, TypeScript, Go, and container images 30m36s.
  • Sparse checkout ability is available, allowing users to clone the repo and select only a portion of it, even if the repo gets very big 31m20s.
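
The partial builds in PR checks described above rest on mapping changed files to packages and then adding everything downstream of them. The sketch below shows that idea under stated assumptions: a hypothetical `packages/<name>/` layout, made-up package names, and a `packages_to_rebuild` helper that is not the project's actual script. Any file outside a known package is treated as a root-level change that triggers a full build.

```python
from pathlib import PurePosixPath

# Hypothetical packages, each living in packages/<name>/, mapped to the
# internal packages they declare as dependencies.
DEPENDS_ON = {
    "editor-core": set(),
    "i18n": set(),
    "online-editor": {"editor-core", "i18n"},
    "vscode-extension": {"editor-core"},
    "docs": set(),
}


def owning_package(path, known_packages):
    """Return the package that owns `path`, or None for root-level files."""
    parts = PurePosixPath(path).parts
    if len(parts) >= 3 and parts[0] == "packages" and parts[1] in known_packages:
        return parts[1]
    return None


def packages_to_rebuild(changed_files, depends_on):
    """Packages to rebuild and retest for a set of changed files."""
    directly_changed = set()
    for path in changed_files:
        package = owning_package(path, depends_on)
        if package is None:
            return set(depends_on)  # root-level change: rebuild everything
        directly_changed.add(package)

    affected = set(directly_changed)
    grew = True
    while grew:  # add packages that depend on an already affected package
        grew = False
        for package, dependencies in depends_on.items():
            if package not in affected and dependencies & affected:
                affected.add(package)
                grew = True
    return affected


if __name__ == "__main__":
    print(sorted(packages_to_rebuild(["packages/i18n/src/en.ts"], DEPENDS_ON)))
```

Because only declared dependencies are followed, this kind of calculation is only as trustworthy as the rule that packages may reference nothing they have not declared.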

Challenges and Improvements in the Apache KIE Tools Project

  • Challenges exist, including the lack of a user manual, with knowledge currently residing in people's heads and private messages, but a user manual is being written 31m46s.
  • Improvements are being made to the development experience for Maven-based packages, including better importation in IDEs and accurate reference picking, with issues being highlighted when they are incorrect 32m12s.
  • A problem exists where changes to the top-level lock file are not understood by the partitioning system, so it cannot tell which modules are impacted, but a solution is being researched and implemented 32m30s.
  • A full build will not be required for every code change, unless it's a root-level file, and a merge queue is being considered to simulate merges and prevent semantic conflicts 32m50s.
  • The merge queue will allow code to be merged automatically after passing checks, preventing breaks to the main branch and reducing conflicts 33m21s.
  • Multiple cores will be made available for each package to build, enabling parallel builds, with environment variables likely being used to control core allocation 33m42s.
  • Research is being conducted on using environment variables to configure parameters that distinguish between production and development builds, reducing duplication in build commands 34m10s.
  • The use of Turborepo is being explored, which includes a test runner that understands packages and files, and has caching capabilities that can speed up development and onboarding 34m26s.

Recommendations for Building a Monorepo

  • When building a monorepo, it's recommended to start small, choosing a few languages or one, and a single build tool, rather than trying to incorporate the entire codebase at once 35m15s.
  • It's also recommended to establish defaults and conventions from the beginning, even if they may not be perfect, to ensure everyone is working in the same environment and can provide feedback 35m52s.
  • When implementing a monorepo, it's essential to make the relationships between modules easy to visualize, because a repository with many small modules is hard to navigate and its dependencies easily become confusing 36m20s.
  • Be prepared to write custom tools for unique build necessities, such as dealing with network issues, old platforms, or special build requirements 36m42s.
  • Be prepared to discuss the monorepo extensively, as it's a controversial topic that may require explaining the reasoning behind it multiple times 37m18s.
  • Optimize the monorepo for development by making it easy for people to clone and start working immediately, with minimal configuration steps 37m35s.
  • A monorepo should have everything turned off by default: the default configuration targets development and localhost, with no production references or dependencies 37m52s.
  • When organizing a monorepo, do not group by technology, but instead group by operating dynamics, team affinity, and how teams interact with each other 38m12s.
  • Do not compromise on quality, be thorough about decision-making, and avoid adding low-quality code to the monorepo 39m3s.
  • Avoid doing too much at once when implementing a monorepo, and prioritize the most essential features 39m43s.
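
The recommendation to default everything to development can be sketched as build configuration read from environment variables, where an unset variable always yields the safe, local value. The variable names (`MYREPO_BUILD__MINIFY` and so on), the `BuildConfig` fields, and the `load_build_config` helper are all hypothetical. This shows the general pattern only, not the configuration tool of any particular project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildConfig:
    minify: bool
    run_tests: bool
    api_base_url: str


def _flag(env, name, default):
    """Read a boolean flag; unset variables fall back to the default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_build_config(env=None):
    """Build configuration where every default targets local development."""
    env = os.environ if env is None else env
    return BuildConfig(
        # Hypothetical variable names; everything is off unless asked for.
        minify=_flag(env, "MYREPO_BUILD__MINIFY", False),
        run_tests=_flag(env, "MYREPO_BUILD__RUN_TESTS", False),
        api_base_url=env.get("MYREPO_API__BASE_URL", "http://localhost:8080"),
    )


if __name__ == "__main__":
    print("fresh clone:", load_build_config({}))
    print("CI override:", load_build_config({"MYREPO_BUILD__MINIFY": "true"}))
```

With this shape, a fresh clone builds without any setup, and production values exist only where a pipeline sets them explicitly.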

Re-evaluating and Adjusting the Monorepo

  • Be open to reevaluating and adjusting the monorepo if it's not working out, incorporating feedback, learning from mistakes, and prioritizing the well-being of the development team 40m14s.

Q&A and Discussion

  • The talk concludes, and the speaker opens the floor to audience questions, with a preference for topics related to build tools and coding styles in monorepos 40m50s.
  • The choice of build tool, such as Maven or Gradle, depends on the team's preference and skill set, and there is no one-size-fits-all solution 41m16s.
  • The speaker's team has struggled to transition from a Maven-based approach to a less structured one using tools like npm and JavaScript 41m42s.
  • In a polyglot monorepo, the speaker recommends using a flat structure to organize code, with a prefix for package names and minimal nesting to facilitate visualization of relationships between internal modules 42m44s.
  • The speaker suggests that coding styles should be specific to each language and can be found in the user manual for each language 43m38s.
  • A member of the audience shares their experience with using a monorepo for their entire codebase and warns about the potential drawbacks of polyglot repos, including dependency hell 43m50s.
  • The speaker acknowledges the concerns about polyglot monorepos but notes that their use case requires cross-language dependencies, and their structure allows them to navigate complex dependency trees 44m40s.
  • The speaker clarifies that their previous statement about polyglot monorepos was referring to individual repositories having multiple languages, rather than a single monorepo with multiple languages 45m31s.
  • The company has a coarse-grained dependency structure, with a large codebase, and this structure is not working out for the organization, leading to a decision to move out of it, but some modules will remain in the same repository 45m38s.
  • The company's problem was that changes to the package lock file or pnpm lock file would cause downstream things to build, and this was an issue due to the scripting system's behavior when the lock file in the root folder changed 46m15s.
  • The company's goal is to build only downstream things when a dependency changes, but the previous issue was that everything would get built when the lock file changed, due to the scripting system's understanding that a root file changed 46m35s.
  • The company has implemented a solution using the Turborepo diffing algorithm to understand which packages are affected by the dependencies that changed inside the lock file, allowing for more targeted building 46m54s.
  • The new solution enables building only downstream things when a dependency change occurs, which is a desirable outcome 47m4s.
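
The lock-file idea discussed above can be illustrated by comparing the resolved dependencies of each package before and after a change, instead of treating any edit to the root lock file as a reason to rebuild everything. The sketch below uses a deliberately simplified, hypothetical lock structure (package name mapped to resolved versions). It does not reproduce the real pnpm lock file format or Turborepo's actual algorithm.

```python
# Hypothetical, simplified lock data: package -> {dependency: resolved version}.
OLD_LOCK = {
    "editor-core": {"lodash": "4.17.20", "react": "18.2.0"},
    "online-editor": {"react": "18.2.0"},
    "docs": {"marked": "9.0.0"},
}

NEW_LOCK = {
    "editor-core": {"lodash": "4.17.21", "react": "18.2.0"},
    "online-editor": {"react": "18.2.0"},
    "docs": {"marked": "9.0.0"},
}


def packages_with_changed_dependencies(old_lock, new_lock):
    """Packages whose resolved third-party dependencies differ between locks."""
    changed = set()
    for package in old_lock.keys() | new_lock.keys():
        if old_lock.get(package, {}) != new_lock.get(package, {}):
            changed.add(package)
    return changed


if __name__ == "__main__":
    print(sorted(packages_with_changed_dependencies(OLD_LOCK, NEW_LOCK)))
```

The resulting set plays the same role as a set of directly changed packages: only those packages and whatever sits downstream of them would need to be built.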