Engineering Strategy is a Fractal
logical unit engineering strategy = organization strategy + local context
An engineering strategy is composed of two phases that feed back into each other: design and execution.
The important aspect is that both phases are performed by the same group of people to improve effectiveness.
When one group of people designs and another executes, things tend not to go as expected.
Same group of people != Team
For a long time I had the misconception that the team, as the minimum unit of delivery, should be the one that decides and executes its own strategy. In the end, we are the ones facing the customer challenges, aren’t we?
Then I moved into management positions and understood that each role has its own area of focus within the organization.
One focuses on building/growing the product.
One focuses on building/growing the organization.
Both are needed, and depending on the business context, the engineering strategy will be different.
That’s why the group of people designing and executing the engineering strategy will be different, and so will their area of influence.
So, how does engineering strategy scale?
What happens when the C-level is part of the decision? Does it need to execute at the product team level as well? Isn’t that micromanagement?
I consider it micromanagement when a strategy is designed at a level that does not own that area of decision impact.
I would expect different levels to design and execute decisions with different areas of impact, scope, and risk, depending on the business stage (startup or big enterprise).
C-level impacting things like: hiring, department structure, career ladder, …
Middle management: Shared capabilities and cross-team alignment, …
Product teams: Testing approach, reliable event-driven architecture, delivery, …
The lines between these logical groupings are blurry. It is not that one topic belongs to one logical unit; what matters is the collaboration between them.
I think a better way to structure an engineering strategy working group is as a mix of people from different logical structures, depending on the context at hand. Layers rarely work in isolation, and when they do it can even be a red flag 🚩.
Recap of what an engineering strategy looks like
An engineering strategy is composed of three main components:
Analysis/diagnosis: The context analysis and the areas we took into account to design our engineering strategy. It contains as much relevant context as possible, not only about the state of engineering but also about other parts of the organization that impact our decision making, such as business, product, …
Direction: A guiding direction on how we will address our challenge. It’s a decision that has to help people to know what to do, but more importantly, what not to do.
Coherent actions: The initial set of actions, coherent with the direction we set, that we assume will help us overcome our high-stakes challenge.
You: But… Aleix, are you telling me that the coherent actions are deliverables that need to be produced by teams?
Me: Not at all! They need to be actionable but they don’t need to be at a low level.
Oversimplified engineering strategy example
Business context
We are facing a customer credibility issue because our platform breaks often. We need to keep and regain customer trust by making our product less buggy 🐛.
Diagnosis
Our software has many defects due to:
Very synchronous communication between microservices, which causes a chain of failures when one microservice is overloaded and unable to serve more requests.
People aren’t familiar with event-driven architecture.
We don’t have space to learn and run safe experiments to improve our system.
Delivery pressure is high, which makes it hard to improve our reliability because engineering effort goes into new features instead of maintaining/improving existing software.
Approving new cloud solutions is a very long process with multiple approvals.
Direction
Invest in training and in creating a safe space to learn about more resilient architectures, such as Event-Driven Architecture, and tooling like Kafka, so we can move from synchronous to asynchronous microservice communication.
We allocate 30% of engineering effort to improving resiliency and making space for tech huddles and training.
We remove the blockers on requesting the tooling or providers needed to improve resiliency and, instead, notify leadership about the cost.
Coherent Actions
Create an Enabling Team to upskill the teams on resilient microservices.
Allocate a budget of 5,000 EUR/month for this initiative, covering tooling and cloud services related to improving the resiliency of our systems.
Remove leadership approvals for team expenses on this initiative for experiments of less than 1,000 EUR/month.
Each experiment needs to be shared as a learning with the rest of the organization, whether it failed or succeeded.
Experiments need to be timeboxed to 2 weeks max per experiment. We prefer many short experiments over a few long-lived ones. Apply this as a guideline and not as a strong requirement.
This can be an engineering strategy that comes from the leadership. As you can see, it provides a good guiding direction and it allows teams to adapt the engineering strategy to their context.
The strategy could have been created with multiple inputs from different people within the organization, from C-level to a team member.
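To make the guardrails in the coherent actions above more concrete, here is a minimal Python sketch of how a team might check an experiment against them. The names (`Experiment`, `Review`, `review_experiment`) are hypothetical, and it assumes the 5,000 EUR/month budget is shared across all running experiments, which the strategy does not state explicitly. The timebox only produces a note, since it is a guideline rather than a hard rule.

```python
from dataclasses import dataclass
from typing import Tuple

# Values taken from the example strategy; how they are enforced is an assumption.
MONTHLY_BUDGET_EUR = 5_000
AUTO_APPROVE_BELOW_EUR = 1_000
TIMEBOX_GUIDELINE_DAYS = 14


@dataclass(frozen=True)
class Experiment:
    name: str
    monthly_cost_eur: float
    planned_days: int


@dataclass(frozen=True)
class Review:
    needs_leadership_approval: bool
    within_budget: bool
    notes: Tuple[str, ...]


def review_experiment(exp: Experiment, already_committed_eur: float = 0.0) -> Review:
    """Check one experiment against the strategy's guardrails."""
    notes = []
    # Experiments below the threshold skip leadership approval.
    needs_approval = exp.monthly_cost_eur >= AUTO_APPROVE_BELOW_EUR
    # Assumption: the monthly budget is shared by all running experiments.
    within_budget = already_committed_eur + exp.monthly_cost_eur <= MONTHLY_BUDGET_EUR
    # The timebox is a guideline, so it only adds a note.
    if exp.planned_days > TIMEBOX_GUIDELINE_DAYS:
        notes.append("Longer than the 2-week guideline; consider splitting it.")
    notes.append("Share the outcome with the organization, whether it fails or succeeds.")
    return Review(needs_approval, within_budget, tuple(notes))
```

For example, `review_experiment(Experiment("kafka-poc", 400, 10))` reports that no leadership approval is needed and that the experiment fits the budget.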
Making the engineering strategy a fractal
Now, you can see how the engineering strategy implementation will depend on each department/team context.
The previous engineering strategy is very high-level and it will not apply evenly to all teams. Instead, you will need to adapt it to your local context.
Local engineering strategy
Context
Our services have an outage at least twice a day during peak traffic hours, but it happens because another microservice we rely on fails. Our microservice needs to be synchronous because we need to show the most up-to-date data.
Analysis
Our services rely heavily on the `Item Storage Service` to enrich the API responses because they need real-time information.
We have an important delivery that won’t be possible if we focus on improving the resiliency.
We cannot introduce event-driven architecture because all our code is designed in a synchronous way.
We don’t have experience with queues or streaming technology like Kafka in the team.
Direction
Dedicate effort to learning, in safe environments like Architectural Katas, how to build more resilient systems such as event-driven architecture, but also explore intermediate solutions. Ask for organization support and communicate our current state often.
We share each learning in the tech huddle or in an async video with our findings.
Coherent Actions
Identify the use cases that fail the most, so we can focus on them instead of improving everything at once.
Bring in business experts/product to understand if we can lower the real-time requirements in some parts of the system.
Explore if adding circuit-breakers, rate limits, or retries would help.
Explore if we can introduce a local cache so that we don’t need to call the other service every time.
Pick a use case that has low risk and experiment to make it event-driven instead of synchronous.
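As a rough illustration of the circuit-breaker and local-cache ideas in the list above, here is a minimal Python sketch. It assumes a hypothetical `fetch` callable that stands in for a call to the Item Storage Service; the class name, thresholds, and error type are invented for the example and do not come from any library. Serving a cached value is only acceptable where the business agrees that slightly stale data is fine.

```python
import time
from typing import Callable, Dict, Optional


class ItemUnavailableError(Exception):
    """Raised when the upstream call fails or is skipped and nothing is cached."""


class CachedCircuitBreaker:
    """Wraps an upstream call with a simple circuit breaker and a local cache."""

    def __init__(
        self,
        fetch: Callable[[str], dict],  # hypothetical call to the upstream service
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._cache: Dict[str, dict] = {}

    def _is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the cool-down, let one trial call through (half-open state).
        return self._clock() - self._opened_at < self._reset_after_s

    def get(self, item_id: str) -> dict:
        if not self._is_open():
            try:
                value = self._fetch(item_id)
            except Exception:
                self._failures += 1
                if self._failures >= self._failure_threshold:
                    # Stop calling the struggling service for a while.
                    self._opened_at = self._clock()
            else:
                self._failures = 0
                self._opened_at = None
                self._cache[item_id] = value
                return value
        # Breaker open or call failed: fall back to the last known value.
        if item_id in self._cache:
            return self._cache[item_id]
        raise ItemUnavailableError(item_id)
```

The design choice here is to degrade to stale data instead of failing the whole request, which is exactly the kind of trade-off to validate with business experts before adopting it.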
The importance of the fractal engineering strategy
An engineering strategy at the organization level won’t be able to capture all the particularities of your local context, nor should it aim to.
It should give a good guiding direction so each team knows how to adapt it to their context.
We can say that each logical unit, such as a team or a department, needs to build its own engineering strategy for its local context:
logical unit engineering strategy = organization strategy + local context
It is important to note that it is the sum of organization context + local context. They need to be aligned and work together so the organization as a whole can face its high-stakes business challenge.
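One way to picture this fractal shape is as a recursive data structure, where every logical unit holds the same three components and a link to the strategy it adapts. The Python sketch below is only an illustration; the `Strategy` class and its field names are hypothetical, and treating the effective direction as the parent direction followed by the local one is an assumption about how "organization strategy + local context" combines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Strategy:
    unit: str                    # e.g. "organization", "department", "team"
    diagnosis: List[str]
    direction: List[str]
    coherent_actions: List[str]
    parent: Optional["Strategy"] = None  # the strategy this one adapts

    def lineage(self) -> List["Strategy"]:
        """Strategies from the outermost unit down to this one."""
        chain = []
        node: Optional["Strategy"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))

    def effective_direction(self) -> List[str]:
        """Organization direction first, then each more local refinement."""
        return [item for strategy in self.lineage() for item in strategy.direction]


org = Strategy(
    unit="organization",
    diagnosis=["Synchronous calls cause chains of failure"],
    direction=["Allocate 30% of engineering effort to resiliency"],
    coherent_actions=["Create an Enabling Team"],
)
team = Strategy(
    unit="team",
    diagnosis=["We depend on the Item Storage Service in real time"],
    direction=["Explore intermediate solutions before going event-driven"],
    coherent_actions=["Identify the use cases that fail the most"],
    parent=org,
)
```

Calling `team.effective_direction()` returns the organization direction followed by the team's own, which mirrors how the local strategy adapts the broader one instead of replacing it.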