Engineering Ideas #9
Variety vs uniformity in the tech stack, why NoSQL, documenting architecture decisions, architecture decision drivers, effective reading, team motivation
The following statement caught my attention in this post by Jesse Howarth:
Discord has never been afraid of embracing new technologies that look promising. For example, we were early adopters of Elixir, React, React Native, and Scylla. If a piece of technology is promising and gives us an advantage, we do not mind dealing with the inherent difficulties and instability of the bleeding edge. This is one of the ways we’ve quickly reached 250+ million users with less than 50 engineers.
This sounds like almost the direct opposite of the experience at Instagram (quoted from “The Effective Engineer” by Edmond Lau):
Whenever they could, the Instagram team picked proven and solid technologies instead of shiny or sexy new ones. “Every single, additional [technology] you add,” Krieger cautions, “is guaranteed mathematically over time to go wrong, and at some point, you’ll have consumed your entire team with operations.” And so, whereas many other startup teams adopted trendy NoSQL data stores and then struggled to manage and operate them, the Instagram team stuck with tried and true options like PostgreSQL, Memcache, and Redis that were stable, easy to manage, and simple to understand. They avoided re-inventing the wheel and writing unnecessary custom software that they would have to maintain. These decisions made it significantly easier for the small team to operate and scale their popular app.
The conclusion I draw from this contradiction is that the choice between sticking to proven tech and embracing new tech is not really that important for the scalability of engineering resources. Perhaps, instead of trying to explain our successes, we should focus on understanding what helped us avoid failure. Nassim Taleb’s Via negativa and Will Larson’s Iterative Elimination Tournaments come to mind.
These are the points from Pramod Sadalage and Martin Fowler’s book “NoSQL Distilled” that I found most insightful:
The rise of NoSQL appears to be connected to (or enabled by) the rise of service-oriented architectures:
There is a movement away from using databases as integration points towards encapsulating databases within applications and integrating through services.
One of the main factors to consider when choosing between a relational and a NoSQL database:
Aggregate-oriented databases work best when most data interaction is done with the same aggregate; aggregate-ignorant databases are better when interactions use data organized in many different formations.
On the tradeoff between technology suitability and the toolbox size (connects to the discussion above about variety vs uniformity in the tech stack):
Adding more data storage technologies increases complexity in programming and operations, so the advantages of a good data storage fit need to be weighed against this complexity.
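To make the aggregate-oriented vs. aggregate-ignorant distinction above more concrete, here is a minimal sketch in plain Python. It is not taken from the book: the order data, the names orders_by_id, line_rows, order_total, and revenue_by_product are all hypothetical, and in-memory dicts and tuples merely stand in for a document store and relational tables.

```python
# Hypothetical order data, modeled two ways (all names are illustrative).

# Aggregate-oriented: each order is stored and fetched as one unit.
orders_by_id = {
    "o1": {
        "customer": "alice",
        "lines": [
            {"product": "book", "qty": 2, "price": 10.0},
            {"product": "pen", "qty": 1, "price": 1.5},
        ],
    },
    "o2": {
        "customer": "bob",
        "lines": [{"product": "book", "qty": 1, "price": 10.0}],
    },
}


def order_total(order_id: str) -> float:
    """Works on one aggregate: a single key lookup returns everything needed."""
    order = orders_by_id[order_id]
    return sum(line["qty"] * line["price"] for line in order["lines"])


# Aggregate-ignorant (relational style): flat rows with no preferred grouping.
line_rows = [
    ("o1", "book", 2, 10.0),
    ("o1", "pen", 1, 1.5),
    ("o2", "book", 1, 10.0),
]


def revenue_by_product(rows) -> dict:
    """Slices the data across orders; flat rows can be regrouped in any way."""
    totals = {}
    for _order_id, product, qty, price in rows:
        totals[product] = totals.get(product, 0.0) + qty * price
    return totals
```

Computing an order total touches exactly one aggregate, which is the access pattern where the first model fits well. Revenue per product cuts across all orders, so with the first model every aggregate would have to be read and unpacked, while the flat rows can be grouped by any column.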
Michael Nygard presents a simple framework for documenting the changes and additions to the design of the system. The benefits:
The motivation behind previous decisions is visible for everyone, present and future. Nobody is left scratching their heads to understand, "What were they thinking?" and the time to change old decisions will be clear from changes in the project's context.
The architecture decision records (ADRs) could be committed to the project repository under a separate directory, like doc/adr/. The structure of an ADR:
Title. These documents have names that are short noun phrases. For example, "ADR 1: Deployment on Ruby on Rails 3.0.10" or "ADR 9: LDAP for Multitenant Integration"
Context. This section describes the forces at play, including technological, political, social, and project local. These forces are probably in tension, and should be called out as such. The language in this section is value-neutral. It is simply describing facts.
Decision. This section describes our response to these forces.
Status. "Proposed", "accepted", or "superseded" (in the last case, with a reference to the replacement).
Consequences. This section describes the resulting context, after applying the decision. All consequences should be listed here, not just the "positive" ones. A particular decision may have positive, negative, and neutral consequences, but all of them affect the team and project in the future.
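The five-part structure above could be sketched as a small Python helper that renders an ADR as plain text. This is only an illustration under assumptions: the ADR class, the zero-padded file naming scheme, the .md extension, and the placeholder section texts are hypothetical, not something Nygard prescribes. Only the section names, the doc/adr/ directory, and the example title come from the text above.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    """One architecture decision record with the sections listed above."""
    number: int
    title: str
    context: str
    decision: str
    status: str
    consequences: str

    def filename(self) -> str:
        # Assumed naming scheme: zero-padded number plus a slug of the title.
        slug = "-".join(self.title.lower().split())
        return f"doc/adr/{self.number:04d}-{slug}.md"

    def render(self) -> str:
        sections = [
            ("Context", self.context),
            ("Decision", self.decision),
            ("Status", self.status),
            ("Consequences", self.consequences),
        ]
        lines = [f"ADR {self.number}: {self.title}", ""]
        for heading, body in sections:
            lines += [heading, body, ""]
        return "\n".join(lines)


# Usage with placeholder content (the section texts are made up).
adr = ADR(
    number=9,
    title="LDAP for Multitenant Integration",
    context="(forces at play, stated in value-neutral language)",
    decision="(our response to these forces)",
    status="proposed",
    consequences="(positive, negative, and neutral consequences)",
)
print(adr.filename())
print(adr.render())
```

Keeping the record as plain text next to the code means a decision can be reviewed and superseded through the same workflow as any other change.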
For foundational decisions, I think it makes sense to use a more elaborate structure, for example, the one described in the following article:
Jeff Tyree and Art Akerman enumerate five types of objectives that drive architecture decisions: business needs (aka functional requirements), risks, system issues, change cases (which can be addressed by creating options), and non-functional requirements.
When does a decision require heavyweight documentation?
To test a decision’s architectural significance, an architect should ask the following question: does this decision affect one or more system qualities (performance, availability, modifiability, security, and so on)? If so, an architect should make this decision and document it completely.
The authors propose the following sections for a decision document:
Issue (problem, subject matter). Explain the urgency of making the decision now (rather than deferring it, which should be the default strategy): “Describe the architectural design issue you’re addressing, leaving no questions about why you’re addressing this issue now.”
Decision (short statement)
Constraints (boundary conditions): organizational, human, schedule, cost, risk constraints, non-functional requirements, alignment with the business and technology strategy.
Positions (alternatives, options). “This section also helps ensure that you heard others’ opinions; explicitly stating other opinions helps enroll their advocates in your decision.”
Decision: a longer description
Implications of the decision
Related requirements. Show how the decision is driven by the objectives: “Decisions should be business-driven. To show accountability, explicitly map your decisions to the objectives or requirements. […] If a decision doesn’t contribute to meeting a requirement, don’t make that decision.”
Related principles. “If the enterprise has an agreed-upon set of principles, make sure the decision is consistent with one or more of them. This helps ensure alignment along domains or systems.” I think that industry best practices or prior experience (within the team, personal, or found online) could also be referenced here.
What hard data (evidence) supports the decision?
What conflict is baked into the decision (every important decision has one)? This could be a separate section, “Drawbacks”, or a sub-section within “Implications”. How is the conflict addressed (how are the drawbacks mitigated)?
What metric or feedback loop will help to verify in the future that the decision is effective?
Who is responsible for carrying out the decision? What is the implementation plan? The specific next step?
Is there anything else that you think is important to check in architecture documents? Please share it in the comments!
This repository curated by Joel Parker Henderson has many links for further reading and alternative architecture decision record templates.
Steve McConnell defines two main types of reading:
Inspectional: get the most out of a book or article within a given amount of time
Analytical: get the most out of a book or article with an unlimited amount of time
The process of inspectional reading:
Study the table of contents. What subject areas are covered? What is the book’s emphasis? Can you infer the author’s main points from the table of contents?
Read the book’s preface.
Study the book’s index.
Read the introductory text of each chapter.
Dip into the first and last chapters of the book.
McConnell advises reading inspectionally even when we plan to read analytically anyway:
In case you do need to perform Analytical reading, Inspectional reading prepares you for that. In reading a book or article for understanding rather than just for information, you need to both acquire an understanding of the way the author has organized the subject matter and an understanding of the subject matter details. By jumping into the details first, starting at page 1 and reading through to the end, you have to acquire both of these kinds of knowledge simultaneously, which is very difficult. By performing Inspectional reading first, you acquire an understanding of the organizational framework quickly, and you can then fit the details into the framework during your more detailed reading of pages 1 to n.
Philipp Hauer writes from his experience about nurturing a motivating atmosphere in the team. Below are some ideas that jumped out at me. The first two echo Daniel Coyle’s “The Culture Code”:
We usually hold a meeting when we have to come up with a database schema, architecture, design decisions or bigger refactorings in our software. After my experience, the outcome of those meetings is great and much better than the solution of a single developer. Swarm intelligence impresses me every time!
The above also reminds me of Earl Beede’s take on the three best decision-making strategies, two of which are communal:
Delegation to those who will be implementing the decision
One person decides after a discussion. Everybody feels heard and understood.
Consensus. Check often that everybody is still on board, to avoid small disagreements.
Noncontrolling language. The phrasing is crucial. This applies not only for feedback in code reviews but also for the daily communication within a team. Try to avoid phrases that contain “must” or “should”. Instead, use “think about …” or “have you considered …”. Again, ask questions.
At Spreadshirt, a developer can make an “internship” in other teams for one or two weeks. I can’t highlight enough the value of those internships: It’s so interesting to see how other teams work, what are their processes, what are their technologies and solutions and what are the social dynamics. And the knowledge exchange works in both directions.
The following reminds me of Edmond Lau’s idea about picking the right metric:
A developer put a lot of effort into optimizing the performance of a remote call or an algorithm? It’s so rewarding if there are metrics (e.g. in Grafana) showing the improvement in terms of shorter response times and higher throughput.
Thank you for reading thus far!