A risk-based approach to open source strategy

19 Apr 2023

On its face, using open source code is an inherently risky endeavor. We are trusting external developers to write code that we'll eventually deliver to our users, and we have no recourse if the code is buggy or malicious in some way. In practice, people are generally good and the forces that govern popularity of open source projects help reduce this risk to the point where nearly all folks in the industry are prolific consumers of open source code.

One problem with the current approach is that the evaluation of risk is something that happens once up front, but rarely thereafter. When selecting a library, folks look at alternatives for the same functionality in hopes of selecting a robust solution. They may also look at the issue trackers to validate that bugs are addressed in a timely manner. Once the selection is made, these metrics aren't revisited. This presents an issue because projects, ecosystems and communities aren't static. What was once a well-maintained project may have fallen into disrepair as its primary author found other priorities.

That level of evaluation and re-evaluation is something that happens on a project-by-project basis. When dealing with companies with a large open-source usage footprint, this is a problem that needs to be addressed at scale. Today, we don't have a term for "the pile of open source code I rely on" beyond perhaps "dependencies". I've been thinking about it in terms of a "consumption portfolio". We consume a great deal of code. Much like a stock portfolio, there is a mix of risk profiles and expected change trajectories. Innovation and stability are qualities that we can monitor and shape over time depending on the needs of our business, just like we might adjust our risk vs reward balance in a financial portfolio.

Moving forward, organizations interested in tracking the risk of their consumption portfolio should begin by gathering an inventory. Thanks to the push by various national governments, Software Bills of Materials (SBOMs) are becoming more mainstream. SBOMs allow organizations that depend on open source to capture their dependencies (and transitive dependencies, depending on how they're set up). The result is a rich dataset of software composition, which we can use to drive our understanding of our consumption portfolio. There are a variety of additional tools which can increase the utility of this data, such that it benefits those interested in software composition as well as security vulnerability management, license compliance, and a host of other regulatory/compliance interests.
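As a rough illustration of what that inventory could look like, here is a minimal Python sketch that turns a set of SBOMs into a "who uses what" map. It assumes each SBOM has already been parsed into a dictionary with a top-level "components" list whose entries carry a "name" key, which is how CycloneDX JSON lays things out; other formats would need a different lookup. The application and package names, and the helper functions `build_inventory` and `usage_share`, are made up for the example.

```python
from collections import defaultdict


def build_inventory(sboms):
    """Map each dependency name to the set of applications that use it.

    `sboms` maps an application name to a parsed SBOM document. The layout
    assumed here is a top-level "components" list whose entries have a
    "name" key (as in CycloneDX JSON); adapt this for other formats.
    """
    usage = defaultdict(set)
    for app, sbom in sboms.items():
        for component in sbom.get("components", []):
            usage[component["name"]].add(app)
    return dict(usage)


def usage_share(inventory, total_apps):
    """Return the fraction of applications that use each dependency."""
    return {name: len(apps) / total_apps for name, apps in inventory.items()}


# Hypothetical sample data standing in for real SBOM files.
sample_sboms = {
    "billing": {"components": [{"name": "django"}, {"name": "leftpad"}]},
    "search": {"components": [{"name": "django"}]},
    "reports": {"components": [{"name": "django"}, {"name": "tinycsv"}]},
}

inventory = build_inventory(sample_sboms)
shares = usage_share(inventory, len(sample_sboms))

# List dependencies from most widely used to least widely used.
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: used by {share:.0%} of applications")
```

The usage share computed here is one possible input for deciding how widely a dependency is relied upon; a real pipeline would also want versions and package URLs, which are left out to keep the sketch short.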

Once we have the data store with the relevant source data, we can begin to conduct data analysis of each of these dependencies. Each dependency sits on two axes: strategic or not, and popular or not. Strategic, in this context, means that it is used within a large portion of our organization's applications. Because of this ubiquitous usage, any risk in these projects will have a disproportionate impact on our organization. "Popular" in this case is a placeholder for the community stability of a project. It's certainly an imperfect word for this, but the general sense is that "popular" projects have the time/people/resources they need to be healthy.

strategic/popular    |  one-off / popular
(safe/boring)        |  (legacy replacement)
                     |
                     |
---------------------+-------------------
                     |
                     |
(risky/legacy)       |  (necessary, uninteresting)
strategic/unpopular  |  one-off / unpopular

Tools which are strategic and popular may be foundational things like the Java language or the Django web framework. These have broad use within the industry and are probably not at existential risk due to lack of involvement from their users. Projects which are strategic but not well funded are risky. These are likely to be future legacy dependencies in the organization. Tools which are popular but aren't in wide use in our data set may be upcoming bets on the next "strategic" tool. Regardless, given their popularity, they are generally safe dependencies to have. Dependencies which are neither popular nor strategic are just dependencies which are in use by a few teams and don't represent any sort of broader pattern within the organization. These are safe to ignore for now, though we may look to move towards a more popular replacement if they start to gain traction internally.
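To make the quadrants concrete, here is a small Python sketch that sorts dependencies into the four buckets from the diagram. Everything in it is an assumption made for illustration: the `Dependency` fields, the idea of a single 0-to-1 `health_score` standing in for "popular", the 0.5 thresholds, and the sample numbers. Where to draw the lines is a judgment call each organization would make for itself.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    name: str
    usage_share: float   # fraction of internal applications using it (0 to 1)
    health_score: float  # hypothetical stand-in for "popular" (0 to 1)


# Hypothetical cut-offs; tune these to your own portfolio.
STRATEGIC_THRESHOLD = 0.5
POPULAR_THRESHOLD = 0.5


def classify(dep):
    """Return the quadrant label for a dependency."""
    strategic = dep.usage_share >= STRATEGIC_THRESHOLD
    popular = dep.health_score >= POPULAR_THRESHOLD
    if strategic and popular:
        return "strategic/popular"    # safe/boring
    if strategic:
        return "strategic/unpopular"  # risky/legacy
    if popular:
        return "one-off/popular"      # possible future strategic tool
    return "one-off/unpopular"        # necessary, uninteresting


# Made-up portfolio entries for illustration only.
portfolio = [
    Dependency("web-framework", usage_share=0.9, health_score=0.95),
    Dependency("in-house-favorite-parser", usage_share=0.7, health_score=0.2),
    Dependency("shiny-new-runtime", usage_share=0.05, health_score=0.8),
    Dependency("obscure-helper", usage_share=0.02, health_score=0.1),
]

for dep in portfolio:
    print(f"{dep.name}: {classify(dep)}")
```

Re-running a classification like this on a schedule, rather than once at selection time, is one way to notice when a dependency drifts from one quadrant to another.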

Strategic and unpopular dependencies, because of their ubiquity in the organization, will be expensive to move away from. We should actively be working to reduce risk in this area. We can do this by shedding the internal use-case that this project supports, migrating to more popular alternatives, or putting in abstraction layers to reduce future switching costs. Alternatively, we could contribute resources (money, developers) to ensure the long-term health of that project's ecosystem.

If you are working on this problem, let me know. I'm looking to deepen my involvement in these areas (and I'm currently for hire).

Thank you to Alex Scammon, Van Lindberg, and John Benninghoff for the underlying ideas that triggered this post. Thank you to Julia Ferraioli and Vijay Samuel for their reviews.