At work, I rolled out Backstage as a service catalog in 2025 across a few hundred Git repositories. It replaced a hand-maintained spreadsheet describing what each repo or service was, who owned it, what it depended on, and so on. When our regular compliance reviews came around, it made me wonder: could this help us here too? It turns out it could.
We are subject to multiple compliance frameworks, SOX and PCI among them. Each
repository has a catalog-info.yaml file that records its status under each
framework and the reasoning behind it. It looks something like this:
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  labels:
    sox: "true"
    pci: "false"
    data-classification: "restricted"
  annotations:
    compliance/data-classification-justification: "Stores payroll and employee PII, including SSNs and salary data"
    compliance/sox-justification: "Handles payroll data and expense reports that feed into financial statements"
    compliance/pci-justification: "No payment card processing - only employee reimbursements via ACH"
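To make the expected shape concrete, here is a small checker in the spirit of the validation we run against parsed catalog-info.yaml files. It is a sketch, not our actual tooling: check_compliance_metadata and REQUIRED_LABELS are names I'm inventing for illustration.

```python
REQUIRED_LABELS = ("sox", "pci", "data-classification")


def check_compliance_metadata(entity: dict) -> list[str]:
    """Return a list of problems found in a parsed catalog-info.yaml entity."""
    metadata = entity.get("metadata") or {}
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}

    # Every repo must declare a status for each compliance label
    problems = [
        f"missing label: {name}" for name in REQUIRED_LABELS if name not in labels
    ]
    # Every declared label should be backed by a justification annotation
    for name in REQUIRED_LABELS:
        if name in labels and f"compliance/{name}-justification" not in annotations:
            problems.append(f"missing annotation: compliance/{name}-justification")
    return problems
```

A check like this can run in CI so that a repo with a bare sox: "true" and no written justification fails review before it ever reaches an auditor.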
We then take these labels and use them to power automation. A set of GitHub
Actions workflows queries Backstage's API daily to find all services and repos.
While the implementation is specific to your service catalog, this is the main
method we use:
import logging
import os
import sys

logger = logging.getLogger(__name__)

# BackstageClient and PropertySyncer are defined elsewhere in the script.


def main():
    """Main entry point."""
    # Get configuration from environment
    backstage_url = os.environ.get("BACKSTAGE_URL")
    backstage_token = os.environ.get("BACKSTAGE_TOKEN")
    github_token = os.environ.get("GITHUB_TOKEN")
    github_org = os.environ.get("GITHUB_ORG")
    backstage_filter = os.environ.get("BACKSTAGE_FILTER", "")
    dry_run = os.environ.get("DRY_RUN", "false").lower() == "true"

    # Validate required environment variables
    if not all([backstage_url, backstage_token, github_token, github_org]):
        logger.error("Missing required environment variables")
        logger.error(
            "Required: BACKSTAGE_URL, BACKSTAGE_TOKEN, GITHUB_TOKEN, GITHUB_ORG"
        )
        sys.exit(1)

    if dry_run:
        logger.info("Running in DRY RUN mode - no changes will be made")

    try:
        # Initialize clients
        backstage = BackstageClient(backstage_url, backstage_token)
        syncer = PropertySyncer(github_token, github_org, dry_run)

        # Fetch entities from Backstage
        logger.info(
            f"Fetching entities from Backstage with filter: '{backstage_filter or 'none'}'"
        )
        entities = backstage.fetch_entities(
            backstage_filter if backstage_filter else None
        )
        if not entities:
            logger.warning("No entities found in Backstage")
            return

        # Aggregate compliance by repository
        logger.info("Aggregating compliance status by repository")
        repo_compliance = syncer.aggregate_compliance_by_repo(entities)
        if not repo_compliance:
            logger.warning("No repositories found to update")
            return

        logger.info(f"Found {len(repo_compliance)} repositories to process")

        # Sync properties to GitHub
        logger.info("Syncing properties to GitHub repositories")
        success_count, failure_count = syncer.sync_all(repo_compliance)

        # Report results
        logger.info(
            f"Sync completed: {success_count} successful, {failure_count} failed"
        )
        if failure_count > 0:
            sys.exit(1)
    except Exception as e:
        logger.error(f"Fatal error: {e}")
        sys.exit(1)
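The aggregation step isn't shown above, but here is a minimal sketch of what aggregate_compliance_by_repo might look like. It assumes each entity carries Backstage's standard github.com/project-slug annotation, and it encodes one policy choice worth making explicit: a repo is considered in scope for a framework if any of its components is.

```python
def aggregate_compliance_by_repo(entities: list[dict]) -> dict[str, dict[str, str]]:
    """Collapse Backstage entities into one set of compliance properties per repo."""
    repo_props: dict[str, dict[str, str]] = {}
    for entity in entities:
        metadata = entity.get("metadata") or {}
        slug = (metadata.get("annotations") or {}).get("github.com/project-slug")
        if not slug:
            continue  # entity isn't linked to a GitHub repo
        repo = slug.split("/")[-1]
        labels = metadata.get("labels") or {}
        props = repo_props.setdefault(repo, {"sox": "false", "pci": "false"})
        for name in ("sox", "pci"):
            # Any in-scope component flags the whole repository
            if labels.get(name) == "true":
                props[name] = "true"
    return repo_props
```

The "any component flags the repo" rule matters for monorepos: one SOX-scoped service in a repo of twenty means the whole repo inherits the stricter controls.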
If a service is in scope for SOX, we set the custom property sox: true on its
repository. That lets us define org-wide rulesets such that any repository
tagged with that custom property must adhere to a ruleset enforcing
requirements like "every change must be peer reviewed."
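As a rough illustration, an org ruleset targeting that custom property might look like the following. This is a sketch of the payload shape for GitHub's organization ruleset REST API as I understand it; verify the exact field names against GitHub's documentation before relying on it.

```json
{
  "name": "sox-controls",
  "target": "branch",
  "enforcement": "active",
  "conditions": {
    "repository_property": {
      "include": [{ "name": "sox", "property_values": ["true"] }],
      "exclude": []
    },
    "ref_name": { "include": ["~DEFAULT_BRANCH"], "exclude": [] }
  },
  "rules": [
    {
      "type": "pull_request",
      "parameters": {
        "required_approving_review_count": 1,
        "dismiss_stale_reviews_on_push": true,
        "required_review_thread_resolution": true
      }
    }
  ]
}
```

The nice part is that the ruleset itself never changes: scope moves with the catalog. When a repo's catalog-info.yaml flips sox to "true", the next sync applies the property and the controls follow automatically.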
There are layers beyond this, like how we assert that every repository has a
catalog-info.yaml file and that those files carry the required properties.
That's a governance problem more than a technical one and might warrant a
future post.