Crafting Products Customers Love: The Science of Engineering
Mastering engineering design practices & building a culture of operational excellence
In my previous post, I explored the art of crafting customer-centric products through product definition & reviews, and the significance of collecting valuable customer feedback.
Now, we'll continue our journey toward excellence in product development by shifting focus to the science of engineering. In this companion piece, I'll delve into two critical pillars that form the bedrock of successful product delivery: engineering design and operational readiness. These pillars ensure that your products not only meet customer expectations but consistently deliver exceptional results.
I'll also discuss the crucial role of cultivating a culture of operational excellence — which is the glue that sustains the benefits of mastering all four pillars and prevents your customers from experiencing the frustrations of outages, slow-downs, and sub-par product & service performance.
1. Engineering & Design Reviews
Similar to product reviews, engineering & design reviews play a crucial role in ensuring the quality and feasibility of your technical solutions. However, the format of these reviews can vary significantly, as it depends on factors such as the programming languages used, the development environment, the specific nature of the service being developed (front-end or back-end, control-plane or data-plane), and the maturity of your company. Given this variability, I recommend being less prescriptive about the types of artifacts used for engineering & design reviews. Instead, encourage teams to use whatever format makes sense and is reusable but not overly burdensome.
Typically, engineering & design reviews involve a combination of short narratives and diagrams, which together form a design document for review. To facilitate this process, consider establishing standing meetings with an “engineering council” composed of engineering leadership and principal engineers (PEs). If your company is large enough, it's beneficial to include representation from outside your organization for diverse insights.
During these reviews, teams schedule their sessions and present their documentation. PEs lead discussions to ensure that teams are considering critical factors such as security, scalability, data management, and system design. They also assess how well teams comply with cross-company standards in areas like CI/CD, test coverage, API guidelines, and developer efficiency. While this process may be seen as a form of "gating," it's important to emphasize that teams don't necessarily need to return for a second engineering review if they agree to make recommended changes; they can proceed with execution. The Operational Readiness Review, which we'll discuss shortly, serves as the second, final gate before products go live.
In cases where you have engineering initiatives that span across all your products, consider encouraging the creation of a technical version of a product document. This approach involves working backward from internal development teams as the customers, and running the document through the Product/Feature definition process. For example, this may be applicable when selecting common tooling or implementing a unified system for development, source code management, CI/CD, and code review across your company.
2. Operational Readiness Reviews
Operational readiness reviews (ORRs) play a vital role in ensuring that major services or significant new features are fully prepared for launch. Think of ORRs as a comprehensive checklist that outlines the expectations for any new service, covering essential aspects such as security, enterprise readiness (including SSO/Auth, compliance standards, auditability/logging), scalability, code quality and coverage, deployment automation, documentation, and the initial user experience. To guide this process effectively, it's crucial to have operationally-minded principal engineers (PEs) and VPs present to provide valuable feedback and action items before a product can be launched. Consider ORRs as a "hard gate" that must be successfully crossed before anything can go live for customers.
Creating a standard template for ORRs and establishing clear standards and goals for each team to work toward is essential. The exact criteria for being "launch ready" can vary based on the size and complexity of your organization. Smaller companies may have lighter-weight ORRs with fewer rigid requirements compared to larger enterprises. Regardless of size, setting and adhering to these standards keeps launches consistent and helps you meet customer expectations. Missing ORR requirements can unintentionally disappoint customers and hinder the adoption of the features or services being launched.
In the short term, you may need to make tough decisions about what to block and what to allow through with follow-up actions. However, over time, the bar for blocking should rise. Your products, especially if they are cloud-native software-as-a-service (SaaS) solutions, should continually raise the standard in this area. More of these elements should be integrated into the engineering design review and planning processes. If you observe that teams struggle with specific compliance areas, consider developing "shared services" aimed at improving developer efficiency and standardization across your organization. This approach centralizes the burden, preventing duplicate efforts across multiple teams.
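One way to picture the difference between blocking items and items allowed through with follow-up actions is a small checklist structure. The sketch below is purely illustrative: the item names are borrowed from the checklist areas listed earlier, while `OrrItem`, the `blocking` flag, and `evaluate_orr` are hypothetical names invented for this example, not part of any established tool or template.

```python
from dataclasses import dataclass


@dataclass
class OrrItem:
    name: str
    passed: bool
    blocking: bool  # hypothetical flag: a failed blocking item stops the launch


def evaluate_orr(items):
    """Return (launch_ready, blockers, follow_ups) for a list of OrrItem."""
    blockers = [i.name for i in items if not i.passed and i.blocking]
    follow_ups = [i.name for i in items if not i.passed and not i.blocking]
    # The launch is ready only when no blocking item has failed.
    return (not blockers, blockers, follow_ups)


# Example checklist with made-up results for illustration.
checklist = [
    OrrItem("security", passed=True, blocking=True),
    OrrItem("deployment automation", passed=False, blocking=True),
    OrrItem("documentation", passed=False, blocking=False),
]
ready, blockers, follow_ups = evaluate_orr(checklist)
print(ready, blockers, follow_ups)
```

Raising the bar over time would then amount to flipping more items from follow-up to blocking as your organization matures.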
3. Building a Culture of Operational Excellence
Picture a company where every product and service operates with impeccable security, unwavering reliability, and seamless scalability. While achieving this level of excellence may initially appear as an unattainable dream, I've personally witnessed the transformation that occurs when an organization dedicates itself to this vision. Cultivating a culture where operational excellence becomes the norm can be a driving force behind the success and reputation of your products among customers.
To make operational excellence, security, reliability, and scalability integral aspects of both your products and your delivery process, you need a mechanism to continuously assess and reinforce these dimensions. This isn't a one-time effort confined to engineering design and operational readiness before a product's launch; it's an ongoing commitment that permeates every service and operation.
Enter the "Cross-Organization Ops Meetings." These meetings play a pivotal role in instilling a culture of operational excellence across your entire company. By assembling a diverse group of participants, including engineering VPs, Principal Engineers, senior staff, software engineering managers, and even junior engineers, these gatherings establish a shared understanding of the company's dedication to service quality and operational excellence. During these sessions, attendees collaboratively discuss, review, and align their efforts toward the common objective of elevating the standards for every product and service the company delivers. This inclusive approach ensures that operational excellence isn't merely a directive from the top but a collective aspiration that propels the entire organization forward.
Now, let's dive into what a typical agenda for an Ops meeting like this might look like. Whether you choose to hold these meetings weekly or bi-weekly, having a structured agenda ensures that they consistently reinforce your organization's commitment to operational excellence.
Good News - Kick off the meeting on a positive note by sharing recent engineering and quality successes. Highlight achievements, such as improved system performance due to the adoption of new technologies, successful project completions, or notable team accomplishments. Starting with good news sets a constructive tone for the meeting, especially when addressing more challenging topics later on.
Company-wide Initiative Updates - Provide updates on significant company-wide initiatives that impact engineering and operational aspects. These initiatives may involve the introduction of new shared services, compliance requirements that necessitate changes across all teams, or strategic projects that require collective attention. Sharing these updates ensures that all teams stay informed and aligned with organizational goals.
Outage Reviews (Blameless Postmortems) - Conduct in-depth reviews of recent service outages, focusing on a "Blameless Postmortem" approach. The team responsible for the affected service should come prepared to present a thorough analysis of what transpired during the outage, including its impact, the actions taken to resolve it, and the preventive measures being implemented to avoid a recurrence. These reviews serve the dual purpose of upholding high operational standards within the team and facilitating cross-team learning. By sharing outage experiences and lessons learned, teams can collectively work towards enhancing system reliability and resilience.
Dashboard Reviews - Rotate through a comprehensive review of various dashboards and metrics used by different teams to monitor and manage their services. The scope of these reviews will depend on the size of the organization. In smaller teams, it's feasible to cover every service at each meeting, while medium-sized teams may adopt a rotating calendar to address specific services. For larger organizations, consider a dynamic approach like "spin-the-wheel," where any service could be chosen for review, adding a “fun” element of unpredictability to the operational reviews to keep everyone on their toes. Evaluating dashboards allows teams to share insights, identify trends, and collaboratively address challenges related to system performance, monitoring, and management.
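The rotating-calendar and "spin-the-wheel" options for dashboard reviews can be sketched in a few lines. This is a minimal illustration under assumptions: the service names are made up, and `rotating_pick` and `spin_the_wheel` are hypothetical helper names, not a prescribed mechanism.

```python
import random


def rotating_pick(services, meeting_index, per_meeting=1):
    """Rotating calendar: walk the service list in order, wrapping around."""
    start = (meeting_index * per_meeting) % len(services)
    return [services[(start + k) % len(services)] for k in range(per_meeting)]


def spin_the_wheel(services, rng=random):
    """Spin-the-wheel: any service may be chosen at any meeting."""
    return rng.choice(services)


services = ["billing", "search", "auth"]  # hypothetical service names
print(rotating_pick(services, meeting_index=4))    # start index is 4 % 3 = 1
print(spin_the_wheel(services, random.Random(0)))  # seeded only for repeatability
```

With the rotating calendar, teams know when their turn is coming; with the wheel, every team needs its dashboards ready at every meeting.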
While this meeting primarily serves as a valuable learning opportunity, it also carries an element of leadership guidance. It differs from the typical business/product/project update, except for the dedicated initiatives section. Ideally, your most senior engineering VPs, along with the PEs and other experts, actively engage in a dynamic dialogue by asking insightful and challenging questions during team presentations. These questions are aimed at upholding a culture of high standards and expectations. It's not a personal performance review, but it should emphasize the company's collective commitment to objectivity and excellence.
Conclusion
Just as in the first part of our journey, where we focused on product definition and customer feedback, the pillars of engineering design, operational readiness & operational excellence are fundamental for crafting products that captivate and inspire trust among your customers.
Engineering design ensures that your product features achieve levels of security, scalability, and reliability that inspire confidence. Operational readiness reviews offer a final checkpoint before your products take flight.
Just as in our exploration of product definition and customer feedback, cultivating a culture steeped in innovation and high standards is paramount in the realm of engineering and operational excellence. This culture evolves team processes into company-wide standards, promotes shared knowledge, and drives continuous product improvement.
To make all this advice work, you'll need to wholeheartedly make excellence across the four pillars an inseparable part of your organization's identity. Collectively, your goal is to create products that not only meet or surpass customer expectations but set entirely new benchmarks. Best of luck in your journey towards excellence, and please feel free to share your progress with me!