Understanding the Backstage System Model
The Backstage Internal Developer Portal is, at its heart, a software catalog. As a catalog, Backstage relies on a structured System Model to represent and organize individual items, in order to make it easier to find the information development teams need. When you are setting up or running Backstage you’ll often want to tweak this Model (or make wholesale changes to it) to make it fit your organization.
In this blog post we’ll explore the Backstage System Model and how you can extend it if you need to.
# Why do we need a system model?
Catalogs require at least some structure. Without a common taxonomy for describing each element inside it, a catalog lacks coherence, like a library with no labels on the shelves (or, worse, contradictory labels). You could pour all of your repositories, components, gateways, resources and clusters into a catalog, but it would closely resemble a giant blob of nothing.
In a Catalog, information needs to be sorted to have value. Decisions need to be made about what gets included and what does not, and you need an idea of what goes where: how things are categorized now and how they should be categorized in the future.
# The Basics
The Backstage data model is made up of nodes ("entities") and edges ("relationships").
## Entities
The Backstage data model is built around "entities." Entities are the core units within the Backstage catalog that represent various elements of your software ecosystem.
Each entity is defined via metadata (name, description, labels, etc.), spec (kind-specific properties), and relations (connections with other entities). In OSS Backstage this information is often piped into Backstage via YAML files that adhere to Backstage's entity specification. Entities can also come from "Providers", which supply entities from some source of truth (e.g. User and Group entities from Okta).
This model allows teams to maintain a structured, discoverable Catalog by distributing the load across every team who owns part of the Catalog.
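To make that envelope concrete, here is a minimal TypeScript sketch of the shape an entity takes. The interface names and the `entityRef` helper are hypothetical (they are not the types exported by `@backstage/catalog-model`); the helper simply builds the `kind:namespace/name` reference format used in the YAML examples in this post, defaulting the namespace to `default`.

```typescript
// Hypothetical types mirroring the entity envelope described above:
// metadata (name, description, labels...), spec (kind-specific fields)
// and relations (edges to other entities).
interface EntityMetadata {
  name: string;
  namespace?: string;
  description?: string;
  labels?: Record<string, string>;
  annotations?: Record<string, string>;
}

interface EntityRelation {
  type: string; // e.g. "ownedBy", "dependsOn"
  targetRef: string; // e.g. "group:default/artist-relations-team"
}

interface EntityEnvelope {
  apiVersion: string;
  kind: string;
  metadata: EntityMetadata;
  spec?: Record<string, unknown>;
  relations?: EntityRelation[];
}

// Build a "kind:namespace/name" reference, lower-casing the kind and
// falling back to the "default" namespace when none is given.
function entityRef(entity: EntityEnvelope): string {
  const namespace = entity.metadata.namespace ?? "default";
  return `${entity.kind.toLowerCase()}:${namespace}/${entity.metadata.name}`;
}

const artistWeb: EntityEnvelope = {
  apiVersion: "backstage.io/v1alpha1",
  kind: "Component",
  metadata: { name: "artist-web", description: "The place to be, for great artists" },
  spec: { type: "website", lifecycle: "production", owner: "artist-relations-team" },
};

console.log(entityRef(artistWeb)); // component:default/artist-web
```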
**Friction Warning:**
- Backstage advocates for distributed ownership (i.e. each team owns the information in the Catalog that represents the software it owns), so it can be tricky to update your model and change it over time. For example, if you wanted to replace a Kind, all of the various teams would need to update their catalog files. To get around this, a lot of self-hosted Backstage users have built API-based methods for mass updates.
## Kinds
Entities are grouped into Kinds. Kinds are like an aisle at a supermarket: everything within it is broadly cohesive and organised around similar principles.
Kinds have a schema and they require a processor to correctly ingest them into the Catalog.
You get some core Kinds out-of-the-box with Backstage, like:
- Domain: Defines larger business domains, organizing systems and components
- System: Higher-level abstraction representing a collection of components working together
- Component: Represents deployable units like services, websites, or libraries
**Friction Warning:**
- In OSS Backstage you can extend existing Kinds or write new Kinds to include whatever you’d like, but you need to build or modify a processor each time. That means writing code.
- You will also need to consider the long-term impact of a new Kind. You’ll likely be supporting that Kind for a long time, unless you want to deprecate it and force entities that use that Kind to fail.
## Types
Kinds have Types, allowing grouping within these larger buckets.
Types can be defined on the fly. Nothing special is needed to make Types work: any team can create a new Type just by declaring it in the `spec.type` field of their catalog-info.yaml file.
**Friction Warning:**
- This can lead to a Cambrian explosion of Types, so you may want to introduce some constraints. Validating Types is common.
- Annoying errors can creep into Types (e.g. `Website` and `Wesbite`) unless you’re validating them in some way.
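One way to validate Types is to keep an allow-list per Kind and suggest the nearest allowed value when something doesn't match. The TypeScript below is a sketch under assumptions: the allow-list contents are hypothetical, and the typo suggestion uses plain Levenshtein edit distance, one possible technique rather than anything Backstage ships. Case is normalised first, so `Website` collapses to `website` while `Wesbite` is flagged.

```typescript
// Hypothetical allow-list of Types per Kind (keys are lower-cased Kinds).
// Kinds without an entry are left unconstrained.
const allowedTypes: Record<string, string[]> = {
  component: ["service", "website", "library"],
  domain: ["value-stream"],
};

// Levenshtein edit distance via dynamic programming: the minimum number of
// single-character insertions, deletions or substitutions turning a into b.
function editDistance(a: string, b: string): number {
  const dp: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    dp.push([]);
    for (let j = 0; j <= b.length; j++) {
      if (i === 0) dp[i].push(j);
      else if (j === 0) dp[i].push(i);
      else {
        const cost = a[i - 1] === b[j - 1] ? 0 : 1;
        dp[i].push(Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost));
      }
    }
  }
  return dp[a.length][b.length];
}

// Returns null if the Type is acceptable, otherwise an error message that
// suggests the closest allowed Type.
function validateType(kind: string, type: string): string | null {
  const allowed = allowedTypes[kind.toLowerCase()];
  if (!allowed || allowed.length === 0) return null;
  const normalized = type.toLowerCase();
  if (allowed.includes(normalized)) return null;
  const closest = allowed.reduce((best, candidate) =>
    editDistance(normalized, candidate) < editDistance(normalized, best) ? candidate : best,
  );
  return `Unknown type "${type}" for kind ${kind}; did you mean "${closest}"?`;
}

console.log(validateType("Component", "Website")); // null
console.log(validateType("Component", "Wesbite"));
// Unknown type "Wesbite" for kind Component; did you mean "website"?
```

Whether to normalise case is a policy choice. If your organisation treats `Website` and `website` as distinct Types, drop the `toLowerCase()` calls.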
## Relationships
Relationships exist between entities to provide the connective tissue of the Backstage Catalog.
Each Kind has a preset series of permissible relationships that are built when the processor runs for that Kind.
For example, a simple Component might have some API relationships and dependencies defined:
```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: artist-web
  description: The place to be, for great artists
spec:
  type: website
  lifecycle: production
  owner: artist-relations-team
  system: artist-engagement-portal
  dependsOn:
    - resource:default/artists-db
  dependencyOf:
    - component:default/artist-web-lookup
  providesApis:
    - artist-api
```
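When the processor runs, spec fields like these become relationship edges, recorded in both directions. The TypeScript below is a simplified sketch of that expansion. The relation names follow Backstage's built-in ones (`ownedBy`/`ownerOf`, `partOf`/`hasPart`, `dependsOn`/`dependencyOf`, `providesApi`/`apiProvidedBy`). The field table, the kind assumed for shorthand references, and the helper names are assumptions for illustration, not the actual processor code.

```typescript
interface Relation {
  source: string;
  type: string;
  target: string;
}

// spec field -> [relation type, inverse relation type, kind assumed when the
// reference omits one]. A simplified, hypothetical subset of the built-ins.
const specRelations: Record<string, [string, string, string]> = {
  owner: ["ownedBy", "ownerOf", "group"],
  system: ["partOf", "hasPart", "system"],
  dependsOn: ["dependsOn", "dependencyOf", "component"],
  dependencyOf: ["dependencyOf", "dependsOn", "component"],
  providesApis: ["providesApi", "apiProvidedBy", "api"],
};

// Expand shorthand like "artist-api" to a full "kind:namespace/name" ref.
function normalizeRef(ref: string, defaultKind: string): string {
  const colon = ref.indexOf(":");
  const kind = colon >= 0 ? ref.slice(0, colon) : defaultKind;
  const rest = colon >= 0 ? ref.slice(colon + 1) : ref;
  const qualified = rest.includes("/") ? rest : `default/${rest}`;
  return `${kind.toLowerCase()}:${qualified}`;
}

// Emit each relation together with its inverse.
function buildRelations(sourceRef: string, spec: Record<string, unknown>): Relation[] {
  const relations: Relation[] = [];
  for (const [field, [type, inverse, defaultKind]] of Object.entries(specRelations)) {
    const value = spec[field];
    if (value === undefined) continue;
    const refs: unknown[] = Array.isArray(value) ? value : [value];
    for (const ref of refs) {
      const target = normalizeRef(String(ref), defaultKind);
      relations.push({ source: sourceRef, type, target });
      relations.push({ source: target, type: inverse, target: sourceRef });
    }
  }
  return relations;
}

const relations = buildRelations("component:default/artist-web", {
  type: "website",
  lifecycle: "production",
  owner: "artist-relations-team",
  system: "artist-engagement-portal",
  dependsOn: ["resource:default/artists-db"],
  dependencyOf: ["component:default/artist-web-lookup"],
  providesApis: ["artist-api"],
});
console.log(relations.length); // 10
```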
## The Core System model
Out of the box, Backstage comes with a lot of built-in Kinds with attendant relationships so you can get started as quickly as possible.
Some Kinds, like software templates and Locations, are effectively atomic and compartmentalised away from other Kinds. The remainder are tied to how the Catalog is built and are used to represent entities.
![Backstage System Model](//images.ctfassets.net/hcqpbvoqhwhm/2c89CI4rKHDNmWvCM2IHhM/37022568f4168c497956da8d9615511a/software-model-entities.drawio-3ce7f43dd236c3934209fde8f21a4d9e.svg)
These Kinds in effect represent "The Spotify Way" to model software. That’s not for everyone and won’t necessarily work perfectly for you.
If that’s the case, you have two options:
- `Force it a little`: aka shoehorn your existing concepts into Spotify’s version. This works in a lot of cases, but is necessarily a compromise.
- `Re-model`: if that doesn’t do the trick, you need to get to work remodeling Backstage entity Kinds and Types to fit your needs. Some of this can be done without code changes, but some of it needs you to get your hands dirty.
# Going beyond the basics and extending the Backstage System Model
The Backstage framework is designed to be highly extensible, allowing you to modify or add new Kinds, Types, and Relationships based on the requirements of your organisation.
That said, there are a few things you need to think about when extending the model:
### 1. No-code extensibility
Backstage has a lot of flexibility baked in for defining software. Using Types or built-in relationships handles most situations where you want to model your software inside the Backstage System Model. 80-90% of the time this will do the trick, but it will often come with some degree of compromise. For example, let’s say you want to articulate `Value Streams` as a top-level concept, but have to make do with `Value Stream` being a Type of the `Domain` Kind. It’s imperfect, but it’ll do in a pinch.
At Roadie, we evaluate and extend the System Model for our customers regularly. That works a lot of the time, but sometimes customers have niche requests that we don’t feel would benefit all our users. This is not ideal. We want to give customers the freedom to extend the model without talking to us or writing code. To achieve that, we’re building a fully self-serve, no-code UI for dynamically generating Kinds and defining a system model that can be as arbitrary as you’d like: if you want a Kind called `purple-monkey-dishwasher`, you should be able to have one.
### 2. Extending the framework using code
Backstage is built around [providers](https://backstage.io/docs/features/software-catalog/external-integrations/) and [processors](https://backstage.io/docs/features/software-catalog/external-integrations/#custom-processors). Providers pull data in, processors manipulate and validate that data to build the Catalog entities and relationships.
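The division of labour can be sketched in a few lines of TypeScript. This is a deliberately simplified, hypothetical model: `Provider` and `Processor` here are plain function types, not Backstage's own provider and processor interfaces. The sample data stands in for something like an Okta sync. Entities that fail a processor are reported as errors instead of entering the Catalog.

```typescript
interface RawEntity {
  kind: string;
  metadata: { name: string };
  spec: Record<string, unknown>;
}

// Hypothetical provider: pulls entities from some source of truth.
type Provider = () => RawEntity[];
// Hypothetical processor: validates or enriches an entity, throwing on bad data.
type Processor = (entity: RawEntity) => RawEntity;

const identityProvider: Provider = () => [
  { kind: "User", metadata: { name: "jdoe" }, spec: { memberOf: ["artist-relations-team"] } },
  { kind: "Group", metadata: { name: "artist-relations-team" }, spec: { type: "team" } },
  { kind: "Group", metadata: { name: "orphans" }, spec: {} },
];

// Example validation rule: every Group must declare a spec.type.
const requireGroupType: Processor = (entity) => {
  if (entity.kind === "Group" && typeof entity.spec.type !== "string") {
    throw new Error(`Group "${entity.metadata.name}" is missing spec.type`);
  }
  return entity;
};

// Run every provider's output through the processors, in order.
function runPipeline(providers: Provider[], processors: Processor[]) {
  const accepted: RawEntity[] = [];
  const errors: string[] = [];
  for (const provide of providers) {
    for (const raw of provide()) {
      try {
        accepted.push(processors.reduce((entity, step) => step(entity), raw));
      } catch (err) {
        errors.push(err instanceof Error ? err.message : String(err));
      }
    }
  }
  return { accepted, errors };
}

const result = runPipeline([identityProvider], [requireGroupType]);
console.log(result.accepted.length); // 2
console.log(result.errors[0]); // Group "orphans" is missing spec.type
```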
You can create wholly new providers to handle the ingestion of data from sources not currently handled by Backstage. The Backstage community has built a lot of Providers over the years, but they may require tweaks to fit your specific use case. For example, Roadie has rebuilt the GitHub provider to use webhook-based ingestion, because the Catalogs we habitually deal with are large enough to break GitHub's rate limits.
You can also modify processors for existing Kinds. For example, to extend the list of allowed relationships between Kinds, you need to tweak those processors.
You can also create wholly new processors to define new business logic or processes for manipulating and validating that data when you create a new Kind. Going back to the Value Stream example, now you can differentiate `Value Stream` from `Domain` and allow the Kinds to deviate usefully from one another. Maybe they each need different allowed relationships, or they’ll build their entities differently: the choice is yours.
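Continuing the Value Stream example, here is a hypothetical TypeScript sketch of the core logic such a processor might contain. It first checks that an entity really is a well-formed `ValueStream`, then emits only the relationships that Kind is meant to have. The `ValueStream` Kind, its `spec.domains` field, the `contains` relation name and the `example.com/v1alpha1` apiVersion are all assumptions for illustration. A real implementation would wire this logic into Backstage's `CatalogProcessor` interface (for example its `validateEntityKind` and `postProcessEntity` hooks) rather than calling it directly.

```typescript
// Hypothetical custom Kind and the relation shape this sketch emits.
interface ValueStreamEntity {
  apiVersion: string;
  kind: "ValueStream";
  metadata: { name: string; namespace?: string };
  spec: { owner: string; domains?: string[] };
}

interface EmittedRelation {
  type: string;
  sourceRef: string;
  targetRef: string;
}

// Type guard standing in for schema validation of the new Kind.
function isValueStream(entity: unknown): entity is ValueStreamEntity {
  if (typeof entity !== "object" || entity === null) return false;
  const e = entity as Partial<ValueStreamEntity>;
  const domains = e.spec?.domains;
  return (
    e.kind === "ValueStream" &&
    typeof e.metadata?.name === "string" &&
    typeof e.spec?.owner === "string" &&
    (domains === undefined || (Array.isArray(domains) && domains.every((d) => typeof d === "string")))
  );
}

// A ValueStream is owned by a Group and contains Domains; nothing else is emitted.
function relationsFor(entity: ValueStreamEntity): EmittedRelation[] {
  const self = `valuestream:${entity.metadata.namespace ?? "default"}/${entity.metadata.name}`;
  return [
    { type: "ownedBy", sourceRef: self, targetRef: `group:default/${entity.spec.owner}` },
    ...(entity.spec.domains ?? []).map((domain) => ({
      type: "contains",
      sourceRef: self,
      targetRef: `domain:default/${domain}`,
    })),
  ];
}

const candidate: unknown = {
  apiVersion: "example.com/v1alpha1",
  kind: "ValueStream",
  metadata: { name: "artist-growth" },
  spec: { owner: "artist-relations-team", domains: ["artists", "payments"] },
};

if (isValueStream(candidate)) {
  for (const rel of relationsFor(candidate)) {
    console.log(`${rel.sourceRef} ${rel.type} ${rel.targetRef}`);
  }
}
```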
### 3. Data
In the [out-of-the-box OSS Backstage model](https://backstage.io/docs/features/software-catalog/system-model/) the data for the system model comes from YAML files. This follows the GitOps model, where changes are made in git-tracked repositories and then ingested by other systems (in this case, the Backstage Catalog).
That means if you want to change or update your model, you need to change all of those files, which in turn means opening PRs against every repo that contains a relevant YAML file. This is often a large undertaking, adding significant friction. That’s why most high-volume users of OSS Backstage have built API- and database-based mechanisms to do mass updates. Roadie has two: the Decorator UI, and APIs supporting a variety of update patterns (idempotent updates to sync data from a source of truth into Backstage, or simply pushing in whole entities via the Roadie Entities API).
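The idempotent-update pattern can be sketched as a pure function. It compares each Catalog entity with a source of truth and returns only the entities that would actually change, so re-running the sync is a no-op. This TypeScript is hypothetical and is not Roadie's Entities API or the Decorator UI. The `CatalogEntity` shape, the `security-level` key and the sample data are assumptions.

```typescript
interface CatalogEntity {
  ref: string; // e.g. "component:default/artist-web"
  annotations: Record<string, string>;
}

// Plan the updates needed to make annotation `key` match the source of truth.
// Entities absent from the source of truth are left untouched.
function planSync(
  entities: CatalogEntity[],
  sourceOfTruth: Map<string, string>,
  key: string,
): CatalogEntity[] {
  const changed: CatalogEntity[] = [];
  for (const entity of entities) {
    const desired = sourceOfTruth.get(entity.ref);
    if (desired === undefined || entity.annotations[key] === desired) continue;
    changed.push({ ...entity, annotations: { ...entity.annotations, [key]: desired } });
  }
  return changed;
}

// Apply a plan by replacing changed entities, matched by ref.
function applyPlan(entities: CatalogEntity[], plan: CatalogEntity[]): CatalogEntity[] {
  const updates = new Map(plan.map((e) => [e.ref, e] as const));
  return entities.map((e) => updates.get(e.ref) ?? e);
}

let catalog: CatalogEntity[] = [
  { ref: "component:default/artist-web", annotations: {} },
  { ref: "component:default/fraud-detection-model", annotations: { "security-level": "high" } },
];
const truth = new Map([
  ["component:default/artist-web", "low"],
  ["component:default/fraud-detection-model", "high"],
]);

const firstPlan = planSync(catalog, truth, "security-level");
catalog = applyPlan(catalog, firstPlan);
const secondPlan = planSync(catalog, truth, "security-level");
console.log(firstPlan.length, secondPlan.length); // 1 0
```

Returning a plan, rather than writing immediately, also makes it easy to preview or rate-limit a mass update before pushing it anywhere.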
# Levers to pull when extending the model
Below are some common methods for extending the Backstage data model:
### 1. **Custom Annotations**
Difficulty: Trivial
- **Why**: If you need to add metadata specific to your organization (like security labels, compliance levels, etc.), you can define custom annotations.
- **How**: Annotations are added as key-value pairs within the `metadata.annotations` field in your YAML definitions. These annotations can be used to enhance search functionality, create custom views, or provide additional context.
- **Example**: Adding `security-level: high` as an annotation for services that handle sensitive data allows you to quickly filter and prioritize compliance and monitoring for these services.
**References**:
- [Backstage Annotations Documentation](https://backstage.io/docs/features/software-catalog/well-known-annotations/#annotations): Documentation on creating custom annotations to extend metadata.
```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: fraud-detection-model
  description: "AI model for fraud detection"
  annotations:
    security-level: high
...
```
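Once the annotation is in place, filtering on it is straightforward. Below is a small TypeScript sketch, assuming the entities have already been fetched from the Catalog into memory. The `Annotated` shape and the two extra sample entities are hypothetical.

```typescript
interface Annotated {
  metadata: { name: string; annotations?: Record<string, string> };
}

// Names of entities whose annotation `key` equals `value`.
// Entities without any annotations are simply skipped.
function namesWithAnnotation(entities: Annotated[], key: string, value: string): string[] {
  return entities
    .filter((e) => e.metadata.annotations?.[key] === value)
    .map((e) => e.metadata.name);
}

const entities: Annotated[] = [
  { metadata: { name: "fraud-detection-model", annotations: { "security-level": "high" } } },
  { metadata: { name: "artist-web", annotations: { "security-level": "low" } } },
  { metadata: { name: "artist-web-lookup" } },
];

console.log(namesWithAnnotation(entities, "security-level", "high").join(", "));
// fraud-detection-model
```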
### 2. **Custom Types**
Difficulty: Easy
- **Why**: In cases where the existing Types for a Kind (such as `service`, `website` or `library` for Components) do not fit your specific resources, you can create custom Types.
- **How**: Define a new Type in any valid catalog-info.yaml. This simply involves setting a new value in the `spec.type` field of the YAML file.
- **Example**: Suppose you have machine learning models as a core resource in your project. You could define a new `model` type.
**References**:
- [Roadie Kinds and Types documentation](https://roadie.io/blog/kinds-and-types-in-backstage/) talks a lot about how to use Types without introducing problems
```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: fraud-detection-model
  description: "Machine learning model for fraud detection"
  annotations:
    security-level: high
spec:
  type: model
  lifecycle: production # required for Components; illustrative value
  owner: ml-team # required for Components; hypothetical team name
  version: "1.0"
  trainingDataset: "transactions-v1"
  accuracy: "95%"
```
### 3. **Modifying Existing Kinds to Add Custom Relations**
Difficulty: Normal
- **Why**: Relationships between entities help you capture dependencies, ownership, and team structures within your catalog. If your use case involves additional relationship types, custom relations can improve representation.
- **How**: Modify the relevant processor for a given Kind to enable new types of relationships to be built for that kind. Then define relations within the `spec.relations` section of the YAML file.
- **Example**: Suppose you want to track models associated with data sources. You could create a custom relation `usesDataFrom`, linking ML models to the Resource entities that document data sources they rely on.
**References**:
- [Roadie Kinds and Types Documentation](https://roadie.io/blog/kinds-and-types-in-backstage/): Provides practical examples of defining and extending Kinds.
```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: fraud-detection-model
  description: "Machine learning model for fraud detection"
  annotations:
    security-level: high
spec:
  type: model
  lifecycle: production # required for Components; illustrative value
  owner: ml-team # required for Components; hypothetical team name
  version: "1.0"
  trainingDataset: "transactions-v1"
  accuracy: "95%"
  relations:
    - type: usesDataFrom
      targetRef: resource:exampleorg/some-data-source
      target:
        kind: resource
        namespace: exampleorg
        name: some-data-source
```
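On the processor side, the change amounts to accepting the new relation type and emitting it, optionally alongside an inverse. Below is a hypothetical TypeScript sketch. The allowed-type set, the `providesDataTo` inverse and the rule that `usesDataFrom` must target a Resource are all assumptions layered on the example above. A real processor would emit these through Backstage's processing hooks rather than returning an array.

```typescript
interface SpecRelation {
  type: string;
  targetRef: string; // "kind:namespace/name"
}

interface Edge {
  source: string;
  type: string;
  target: string;
}

// Relation types this (hypothetical) Component processor accepts from YAML.
const allowedForComponent = new Set(["dependsOn", "dependencyOf", "usesDataFrom"]);
// Inverse names used to record each edge in the other direction.
const inverseOf: Record<string, string> = {
  dependsOn: "dependencyOf",
  dependencyOf: "dependsOn",
  usesDataFrom: "providesDataTo",
};

function edgesFromSpec(sourceRef: string, relations: SpecRelation[]): Edge[] {
  const edges: Edge[] = [];
  for (const rel of relations) {
    if (!allowedForComponent.has(rel.type)) {
      throw new Error(`Relation type "${rel.type}" is not allowed on Component`);
    }
    if (rel.type === "usesDataFrom" && !rel.targetRef.toLowerCase().startsWith("resource:")) {
      throw new Error(`usesDataFrom must target a Resource, got ${rel.targetRef}`);
    }
    edges.push({ source: sourceRef, type: rel.type, target: rel.targetRef });
    edges.push({ source: rel.targetRef, type: inverseOf[rel.type], target: sourceRef });
  }
  return edges;
}

const edges = edgesFromSpec("component:default/fraud-detection-model", [
  { type: "usesDataFrom", targetRef: "resource:exampleorg/some-data-source" },
]);
console.log(edges.map((e) => `${e.source} ${e.type} ${e.target}`).join("\n"));
// component:default/fraud-detection-model usesDataFrom resource:exampleorg/some-data-source
// resource:exampleorg/some-data-source providesDataTo component:default/fraud-detection-model
```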
### 4. **Creating Entirely New Custom Kinds**
Difficulty: Normal / Hard
- **Why**: When the System Model cannot adequately encapsulate how you build software or the relationships between various parts of your organisation, you will need to build a custom Kind.
- **How**: Write a new processor for that Kind and define a custom schema for it. The schema ensures all entities adhere to required fields, valid types, and constraints, providing an additional layer of validation. Then add new catalog-info.yaml files for the new Kind to relevant resources, or modify existing catalog-info.yaml files.
- **Example**: For the `MLModel` entity, you could create a new Kind to represent it in your System Model. Using that new Kind, you could then model fields such as `version`, `trainingDate`, and `accuracy`.

**References**:
- Backstage JSON Schema Documentation: Explains how to define and enforce custom schemas.
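A schema for the new Kind is what lets the Catalog reject malformed entities. To keep this sketch dependency-free, the TypeScript below hand-rolls the equivalent checks instead of using a JSON Schema validator. The field list (`owner`, `version`, `trainingDate`, `accuracy`), the `YYYY-MM-DD` date format and the 0 to 1 range for `accuracy` are assumptions for a hypothetical `MLModel` Kind. The earlier YAML stored accuracy as the string `"95%"`, and a schema is where you would settle on a single representation.

```typescript
// One rule per spec field: expected JS type, whether it is required,
// and an optional extra check with a human-readable hint.
interface FieldRule {
  type: "string" | "number";
  required: boolean;
  check?: (value: unknown) => boolean;
  hint?: string;
}

// Hypothetical schema for the spec of an MLModel Kind.
const mlModelSpecSchema: Record<string, FieldRule> = {
  owner: { type: "string", required: true },
  version: { type: "string", required: true },
  trainingDate: {
    type: "string",
    required: true,
    check: (v) => /^\d{4}-\d{2}-\d{2}$/.test(String(v)),
    hint: "YYYY-MM-DD",
  },
  accuracy: {
    type: "number",
    required: false,
    check: (v) => typeof v === "number" && v >= 0 && v <= 1,
    hint: "a number between 0 and 1",
  },
};

// Collect every violation rather than stopping at the first one.
function validateMLModelSpec(spec: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const [field, rule] of Object.entries(mlModelSpecSchema)) {
    const value = spec[field];
    if (value === undefined) {
      if (rule.required) errors.push(`spec.${field} is required`);
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push(`spec.${field} must be a ${rule.type}`);
      continue;
    }
    if (rule.check && !rule.check(value)) {
      errors.push(`spec.${field} must be ${rule.hint ?? "valid"}`);
    }
  }
  return errors;
}

const good = { owner: "ml-team", version: "1.0", trainingDate: "2024-01-15", accuracy: 0.95 };
console.log(validateMLModelSpec(good).length); // 0
```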
# Conclusion
Backstage is an extremely flexible framework for modelling software, and once the building blocks and options are understood, it’s simple enough to fully customise the model.
Useful links:
- Backstage System Model: official docs and a good starter diagram for how entities in the Catalog interact.
- Backstage Entities: official docs on the lifecycle of entities
- Backstage Relationships: official docs on how relationships work inside Backstage
- Modelling software in Backstage: Roadie blog from 2021 about how to model software in Backstage using the core system model. This still represents a great primer on the out-of-the-box system model and how you could use it.