Feature Flags, Dynamic Config and Experimentation (oh my!)

15 Nov 2022

As part of the OpenFeature project, I've been thinking a bunch about feature flags. There's been ambiguity about how feature flags differ from dynamic config, and whether either is the same thing as experimentation (e.g. A/B testing).

At their core, feature flags are really fancy if statements. Those if statements control the behavior we want to manage for our application.

// example OpenFeature Java code
if (client.getBooleanValue("should-show-redesign", /* default */ false)) {
  // ...
} else {
  // ...
}

That flag may be backed by a file, or it may be backed by some giant distributed system. That's a serialization concern, separate from the act of asking "gimme this config value". This raises the question of how flags relate to config in general. Should we use something like this to fetch environment variables? System properties?

From my perspective, an abstraction layer like OpenFeature can be used to hold all config. We should be able to generate a backing provider that checks locally for a config before falling back to slower, remote sources. This allows the serialization of your config to migrate to the places it makes sense at the time. For local development, environment variables may be fine. As you move into productionization and need more fine-tuning, you can migrate towards a proper feature management platform and do things like "show the redesign to users who have purchased more than $30 in the past 10 days" or other powerful targeting. This puts OpenFeature squarely into being a potential client of dynamic config systems.
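To make the "check locally, then fall back" idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the OpenFeature SDK: ConfigSource, MapSource and LayeredConfig are hypothetical names invented for illustration, and the "remote" source is just another in-memory map standing in for a slower system.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class LayeredConfigSketch {

    /** Hypothetical abstraction: anything that may know a boolean config value. */
    interface ConfigSource {
        Optional<Boolean> lookupBoolean(String key);
    }

    /** Reads from an in-memory map; a stand-in for env vars, a file, or a remote service. */
    static final class MapSource implements ConfigSource {
        private final Map<String, String> values;

        MapSource(Map<String, String> values) {
            this.values = values;
        }

        @Override
        public Optional<Boolean> lookupBoolean(String key) {
            return Optional.ofNullable(values.get(key)).map(Boolean::parseBoolean);
        }
    }

    /** Asks each source in order and returns the caller's default if none has the key. */
    static final class LayeredConfig {
        private final List<ConfigSource> sources;

        LayeredConfig(List<ConfigSource> sources) {
            this.sources = sources;
        }

        boolean getBoolean(String key, boolean defaultValue) {
            for (ConfigSource source : sources) {
                Optional<Boolean> value = source.lookupBoolean(key);
                if (value.isPresent()) {
                    return value.get();
                }
            }
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        // Cheap local source first, slower "remote" source second.
        ConfigSource local = new MapSource(Map.of("should-show-redesign", "true"));
        ConfigSource remote = new MapSource(
                Map.of("should-show-redesign", "false", "new-checkout", "true"));
        LayeredConfig config = new LayeredConfig(List.of(local, remote));

        System.out.println(config.getBoolean("should-show-redesign", false)); // local wins
        System.out.println(config.getBoolean("new-checkout", false));         // found remotely
        System.out.println(config.getBoolean("missing-flag", false));         // caller default
    }
}
```

The calling code only ever sees getBoolean, so swapping the local map for real environment variables, or the remote map for a feature management platform, would not change any call sites.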

When feature flags are mentioned, experimentation (and A/B testing in particular) isn't far behind. I believe two things here. First, experimentation requires feature flagging. Rephrased, being able to control a user's experience on a per-session/request basis is a necessary precondition, even if you don't expose that control to users directly. Second, I think that the main difference between a "feature flag" and an "experiment" is statistics and tracking. To determine if variant A is better than variant B, we need to record whether each user saw variant A or B. Then, we can correlate that with whether they achieved the outcome that we wanted. If we combine that with statistical rigor, we get an experiment result.
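The tracking half of that can be sketched in a few lines of Java. This is an illustration under my own assumptions, not anything OpenFeature ships: ExposureTrackingSketch, recordExposure, recordConversion and conversionRate are hypothetical names, everything lives in memory, and the "statistical rigor" part (sample sizes, significance tests) is intentionally left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ExposureTrackingSketch {
    // variant name -> ids of users who were shown that variant
    private final Map<String, Set<String>> exposedUsersByVariant = new HashMap<>();
    // ids of users who achieved the outcome we care about
    private final Set<String> convertedUsers = new HashSet<>();

    /** Record that a user saw a given variant ("user saw variant A"). */
    void recordExposure(String userId, String variant) {
        exposedUsersByVariant.computeIfAbsent(variant, v -> new HashSet<>()).add(userId);
    }

    /** Record that a user achieved the desired outcome. */
    void recordConversion(String userId) {
        convertedUsers.add(userId);
    }

    /** Fraction of users exposed to the variant who converted; 0.0 if nobody was exposed. */
    double conversionRate(String variant) {
        Set<String> exposed = exposedUsersByVariant.getOrDefault(variant, Set.of());
        if (exposed.isEmpty()) {
            return 0.0;
        }
        long converted = exposed.stream().filter(convertedUsers::contains).count();
        return (double) converted / exposed.size();
    }

    public static void main(String[] args) {
        ExposureTrackingSketch tracker = new ExposureTrackingSketch();
        tracker.recordExposure("user-1", "A");
        tracker.recordExposure("user-2", "A");
        tracker.recordExposure("user-3", "B");
        tracker.recordExposure("user-4", "B");

        tracker.recordConversion("user-1");
        tracker.recordConversion("user-3");
        tracker.recordConversion("user-4");

        System.out.println("A: " + tracker.conversionRate("A"));
        System.out.println("B: " + tracker.conversionRate("B"));
    }
}
```

Raw conversion rates like these are only the correlation step. Deciding whether B is actually better than A still needs a proper statistical test over enough users.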

OpenFeature is decidedly not trying to solve that experimentation side. We are hoping to provide relevant extension points so that you can emit the "user saw variant A" type events. From there, you're on your own with the statistics!