At work, I rolled out Backstage as a service catalog in 2025 across a few hundred Git repositories. It replaced a hand-maintained spreadsheet describing what each repo or service was, who owned it, what it depended on, and so on. When our regular compliance reviews came around, it made me wonder: could this help us here too? It turns out it could.
We are subject to multiple compliance frameworks, SOX and PCI among them. Each
repository has a catalog-info.yaml file that records its status under each
framework and the reasoning behind it. It looks something like this:
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  labels:
    sox: "true"
    pci: "false"
    data-classification: "restricted"
  annotations:
    compliance/data-classification-justification: "Stores payroll and employee PII, including SSNs and salary data"
    compliance/sox-justification: "Handles payroll data and expense reports that feed into financial statements"
    compliance/pci-justification: "No payment card processing - only employee reimbursements via ACH"
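To make the expected shape concrete, here is a small checker in the spirit of the validation we run against parsed catalog-info.yaml files. It is a sketch, not our actual tooling: check_compliance_metadata and REQUIRED_LABELS are names I'm inventing for illustration.

```python
REQUIRED_LABELS = ("sox", "pci", "data-classification")


def check_compliance_metadata(entity: dict) -> list[str]:
    """Return a list of problems found in a parsed catalog-info.yaml entity."""
    metadata = entity.get("metadata") or {}
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}

    # Every repo must declare a status for each compliance label
    problems = [
        f"missing label: {name}" for name in REQUIRED_LABELS if name not in labels
    ]
    # Every declared label should be backed by a justification annotation
    for name in REQUIRED_LABELS:
        if name in labels and f"compliance/{name}-justification" not in annotations:
            problems.append(f"missing annotation: compliance/{name}-justification")
    return problems
```

A check like this can run in CI so that a repo with a bare sox: "true" and no written justification fails review before it ever reaches an auditor.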
We then take these labels and use them to power automation. A set of GitHub
Actions workflows queries Backstage's API daily to find all services and repos.
While the implementation is specific to your service catalog, this is the main
method we use:
import logging
import os
import sys

logger = logging.getLogger(__name__)

# BackstageClient and PropertySyncer are defined elsewhere in the script.


def main():
    """Main entry point."""
    # Get configuration from environment
    backstage_url = os.environ.get("BACKSTAGE_URL")
    backstage_token = os.environ.get("BACKSTAGE_TOKEN")
    github_token = os.environ.get("GITHUB_TOKEN")
    github_org = os.environ.get("GITHUB_ORG")
    backstage_filter = os.environ.get("BACKSTAGE_FILTER", "")
    dry_run = os.environ.get("DRY_RUN", "false").lower() == "true"

    # Validate required environment variables
    if not all([backstage_url, backstage_token, github_token, github_org]):
        logger.error("Missing required environment variables")
        logger.error(
            "Required: BACKSTAGE_URL, BACKSTAGE_TOKEN, GITHUB_TOKEN, GITHUB_ORG"
        )
        sys.exit(1)

    if dry_run:
        logger.info("Running in DRY RUN mode - no changes will be made")

    try:
        # Initialize clients
        backstage = BackstageClient(backstage_url, backstage_token)
        syncer = PropertySyncer(github_token, github_org, dry_run)

        # Fetch entities from Backstage
        logger.info(
            f"Fetching entities from Backstage with filter: '{backstage_filter or 'none'}'"
        )
        entities = backstage.fetch_entities(
            backstage_filter if backstage_filter else None
        )
        if not entities:
            logger.warning("No entities found in Backstage")
            return

        # Aggregate compliance by repository
        logger.info("Aggregating compliance status by repository")
        repo_compliance = syncer.aggregate_compliance_by_repo(entities)
        if not repo_compliance:
            logger.warning("No repositories found to update")
            return

        logger.info(f"Found {len(repo_compliance)} repositories to process")

        # Sync properties to GitHub
        logger.info("Syncing properties to GitHub repositories")
        success_count, failure_count = syncer.sync_all(repo_compliance)

        # Report results
        logger.info(
            f"Sync completed: {success_count} successful, {failure_count} failed"
        )
        if failure_count > 0:
            sys.exit(1)
    except Exception as e:
        logger.error(f"Fatal error: {e}")
        sys.exit(1)
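The aggregation step isn't shown above, but here is a minimal sketch of what aggregate_compliance_by_repo might look like. It assumes each entity carries Backstage's standard github.com/project-slug annotation, and it encodes one policy choice worth making explicit: a repo is considered in scope for a framework if any of its components is.

```python
def aggregate_compliance_by_repo(entities: list[dict]) -> dict[str, dict[str, str]]:
    """Collapse Backstage entities into one set of compliance properties per repo."""
    repo_props: dict[str, dict[str, str]] = {}
    for entity in entities:
        metadata = entity.get("metadata") or {}
        slug = (metadata.get("annotations") or {}).get("github.com/project-slug")
        if not slug:
            continue  # entity isn't linked to a GitHub repo
        repo = slug.split("/")[-1]
        labels = metadata.get("labels") or {}
        props = repo_props.setdefault(repo, {"sox": "false", "pci": "false"})
        for name in ("sox", "pci"):
            # Any in-scope component flags the whole repository
            if labels.get(name) == "true":
                props[name] = "true"
    return repo_props
```

The "any component flags the repo" rule matters for monorepos: one SOX-scoped service in a repo of twenty means the whole repo inherits the stricter controls.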
If a service is in scope for SOX, we set the custom property sox: true on its
repository. That lets us define org-wide rulesets such that any repository
tagged with that custom property must adhere to a ruleset enforcing
requirements like "every change must be peer reviewed."
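As a rough illustration, an org ruleset targeting that custom property might look like the following. This is a sketch of the payload shape for GitHub's organization ruleset REST API as I understand it; verify the exact field names against GitHub's documentation before relying on it.

```json
{
  "name": "sox-controls",
  "target": "branch",
  "enforcement": "active",
  "conditions": {
    "repository_property": {
      "include": [{ "name": "sox", "property_values": ["true"] }],
      "exclude": []
    },
    "ref_name": { "include": ["~DEFAULT_BRANCH"], "exclude": [] }
  },
  "rules": [
    {
      "type": "pull_request",
      "parameters": {
        "required_approving_review_count": 1,
        "dismiss_stale_reviews_on_push": true,
        "required_review_thread_resolution": true
      }
    }
  ]
}
```

The nice part is that the ruleset itself never changes: scope moves with the catalog. When a repo's catalog-info.yaml flips sox to "true", the next sync applies the property and the controls follow automatically.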
There are layers beyond this, like how we assert that every repository has a
catalog-info.yaml file and that those files carry the required properties.
That's a governance problem more than a technical one and might warrant a
future post.