Scaling Up To a High Reliability Organization

Randy Cadieux, founder of V-Speed LLC, started to post some interesting articles in the Lean Startup Circle Group on LinkedIn in June of this year, in particular his “Working on the Edge of Failure.” The high reliability organization has a lot to teach startups, so I decided to reach out to him to compare notes. This led to some great conversations and a recorded session that we have turned into this edited transcript, with some hyperlinks added for context.

Q: Could you talk a little bit about your background?

A: My background is in US Marine Corps Aviation. After graduating from US Navy flight school I started out flying the KC-130 Hercules aircraft and then had a couple tours as a flight instructor flying the T-34C Turbo Mentor and a tour flying the UC-12B, which is a military version of the King Air B200. Along the way I gained experience in the areas of operations and safety leadership and management, and Crew Resource Management. As the founder of V-Speed I am now a consultant, coach, and trainer and I work mainly with industries that face hazards and risks related to personnel and operations.

Q: You say you did inflight refueling missions. I pump my own gas so I have a pretty good idea what’s involved: except for a few minor differences like your car and the pump are both traveling at around 275 miles per hour, you are up a couple of miles in the air, and there is the small but nagging possibility you may find yourself in the middle of a drive-by shooting. What was it about these missions that got you interested in the high reliability organization?

A: Interesting analogy! I didn’t actually find out about the term High-Reliability Organizations (HRO) until 2006. I was serving as a Standardization Officer for one of the USMC Air Wings and attended a Safety Stand-Down sponsored by Bombardier aircraft. Kathleen Sutcliffe (co-author of Managing The Unexpected: Resilient Performance in an Age of Uncertainty) was one of the presenters. I found her talk very interesting.

Fast forward to 2010. I was the Director of Safety and Standardization for a US Navy Flight Training Squadron and I was looking for an interesting speaker for our next Safety Stand-down. A quick search for “High-Reliability Organizations” led me to a website called High-Reliability Organizing (high-reliability.org). I submitted a web form contact request and was called by one of the group leaders. From there I was invited to join in on bi-weekly thought leadership teleconferences on the subject of High-Reliability Organizing.

I quickly realized that what we were talking about in the teleconferences (including Weick and Sutcliffe’s 5 Principles of HRO) was really what we as Marine and Navy Aviators practiced every day during administrative, routine, and combat aviation operations, even though we did not describe it using the HRO lexicon. This isn’t surprising, considering that much of HRO theory was developed after observing aircraft carrier operations and leadership. Sometimes talking about what you already do helps you gain clarity on it, which helps refine your understanding of it.

The Five Principles of a High Reliability Organization

Q: How can you tell when you are working in a high reliability organization (HRO)?

A: I like to use the 5 Principles of HROs as codified by Karl Weick and Kathleen Sutcliffe in their book Managing The Unexpected: Resilient Performance in an Age of Uncertainty.

These are:

Preoccupation with Failure: We were continuously on the lookout for what could go wrong and ways to spot weak signals and small failures early so we could act on them and take corrective actions.
Reluctance to Simplify: Complex problems don’t always have simple solutions, yet it can be comfortable to jump on the last solution that worked well or the one that most readily comes to mind. We created an environment where diverse opinions were sought out to help balance biased decisions.
Sensitivity to Operations: Plans are necessary, but operations will vary, so crews need to pay attention to operations as they unfold and make dynamic adjustments based on the actual (not planned) conditions.
Deference to Expertise: Marine Aviation units are complex organizations, with multiple different job duties and roles. For example, there are mechanics, administrative personnel, intelligence personnel, and aircrew. Everyone has his or her job to do. Deferring to the experts means ensuring there are competent Marines who can do their jobs and then trusting them. This is often referred to as “Special Trust and Confidence” and in many cases this can be seen between Officers and Staff Non-Commissioned Officers, who are trusted advisors.
Commitment to Resilience: Adaptability and Flexibility is a Critical Skill for aviators. In fact it is one of the Crew Resource Management Critical Skills which aviators and aircrew are evaluated on during annual evaluation flights. Resilience requires the ability to react appropriately to dynamic conditions as they unfold so the mission can be continued while actively managing risk using judgment and decision-making.

I like using these 5 Principles because they really do embody many of the principles that are actualized in U.S. Marine Corps aviation operations.

Accept No Unnecessary Risks

Q: Could you give an example of where you applied these principles?

A: Several years ago I was part of a KC-130 Hercules unit that was supporting Operation Enduring Freedom (OEF) while transitioning to a brand new version of the KC-130 Hercules. The new version (the KC-130J, or the “KJ”) was a highly automated version of the older aircraft. It looked similar on the outside, but that is largely where the similarities stopped.

The legacy version of the Hercules (we operated the F and R versions) had analog displays (the old round gauges, often referred to as “steam gauges”), whereas the KJ was considered a glass cockpit, with flat panel displays. It also had a Heads-Up Display and digitally controlled motors.

This was all new technology to most of us who had “grown up” on the steam gauges. When you go from steam to glass you have to change your scan patterns and the way you take in information. Rather than searching over a lot of real estate on the instrument display, taking in information from big, round analog dials you have to develop a tighter scan pattern around a smaller area where information is displayed electronically and in many cases, in tape or digit format as opposed to dials. It was a very complex machine and the learning curve was pretty steep.

My unit was tasked with being the first squadron to transition to the KJ and had to do so while simultaneously supporting OEF. This was no easy task, but we had strong leadership and excellent Marines. As a unit, we embodied the 5 Principles, which were woven into the fabric of the organization. These principles were most readily observed in the way we planned and executed operations and the way we used Crew Resource Management (a human performance and safety system) to achieve both safety and production goals.

During the transition to the new aircraft we had to get rid of a lot of old habits that wouldn’t work in the new aircraft, so there was a cultural transformation at the same time. Because we were all new to this aircraft, collectively our knowledge was limited. We learned to rely on each other and build a relationship of trust and support with each other so that we were a learning organization. We needed even the youngest pilots with the least experience to tell us if we were doing something wrong with the technology, like the mission computers. Some of the younger pilots were like “whiz kids.” They may have lacked the total flight experience, but they were amazing at operating the technology. The more seasoned pilots learned a great deal from them, but we had to break down cultural barriers to an extent to foster this learning environment.

While we transitioned to the KC-130J aircraft we were all learning. We had aircraft operating manuals, but the aircraft had so many capabilities that were not designed into all the procedures. We had some great instructors and folks who used trial and error to develop and refine the tactics. We also continued to refine our crew concepts. We tried things out and if they worked we would use them as techniques and teach others. The cost of failure was low because we calculated the risk taking and cost-benefit ratio. One of the Marine Corps Risk Management Principles is “Accept No Unnecessary Risk.”

We balanced risk taking with asset preservation. In other words, sometimes we took risks, but they were calculated and we acted our way into learning. Therefore, the cost of learning was often low.

The Marine Corps also has a process for capturing lessons learned, so that long-term learning is solidified not only within a single squadron but across all USMC units.

The unit I was in embodied the 5 Principles of HRO and our unit was able to continue to support Operation Enduring Freedom while transitioning to the new aircraft. We were then the first unit to take this new aircraft to support Operation Iraqi Freedom. This entire transition was not an easy task and by using attitudes and actions that embodied the 5 Principles of HRO we were able to successfully accomplish our mission.

Toxic Leadership

Q: Have you worked on teams that required an HRO approach but failed to perform? What were some key practices they needed to change?

A: When dealing with high-risk operations all teams should apply the 5 Principles of HRO even if they call them by different names. For example, in the USMC there are often programs and policies that (if used correctly) will help teams, crews, and units achieve the 5 Principles, such as Risk Management (RM) and Crew Resource Management (CRM). RM is a process for identifying hazards, assessing risks, and reducing risks to acceptable levels. CRM is a crew performance system designed to integrate non-technical skills (also known as soft skills), such as communications and leadership, into system operations.

There are other programs and tools that help units apply the principles of HRO, although the HRO principles are not used in USMC vernacular. Some units are better than others. I was once part of a unit that had poor leadership. This was the kind of leadership where the top leaders felt it was okay to bark out orders and yell at their Marines as a standard style of leadership. I am not saying that there isn’t a time and place for yelling in the military, but to use it as a standard everyday leadership style along with intimidation is toxic.

In fact, you can find many articles on “Toxic Leadership” if you do a quick search engine query. In this unit the HRO Principle “Deference to Expertise” was routinely ignored. Deference to Expertise requires a degree of humility and admitting that we don’t know everything. We seek out information from experts to help make decisions.

When an organization has overbearing leaders who create an environment where expertise from lower levels is squashed, it is often only a matter of time before things get to a point that is beyond leadership’s ability to control. What needed to change was the overall attitude towards followers: viewing experts as those who possess expertise and who can help, regardless of rank, and trusting those who are put in their positions to do their work as technical and functional experts. Competency and trust are key elements in facilitating Deference to Expertise.

Don’t Let Heroes Become a Single Point of Failure

Q: Does HRO offer startup founders insights on managing high performance teams?

A: I was in another unit that was more administrative in nature, but we had what I refer to as a Human Single Point of Failure (or what I call H-SPF). SPFs are often thought of in mechanical or engineering terms, such as a single generator supplying power to multiple different types of equipment. If the generator fails, all the equipment receiving power from that single generator also fails (unless there are backup systems in place).

Organizational staffing isn’t much different, though. Organizations often place single individuals in critical positions, but without building in a depth of capacity to fill that position if something goes wrong, such as when that person can’t come to work, or transfers. Organizations may get too comfortable relying on “heroes” to get the job done rather than creating reliable and redundant systems that are more resistant to failure. This is somewhat understandable. People like to feel needed and often gain a sense of security through this, and the organization has their “go-to-guy” to get the job done. This often works until it doesn’t.

In this particular unit there was one individual who performed key duties and was the only person who knew how to perform those duties. This ignores the HRO principle of “Preoccupation with Failure.” When we are preoccupied with the potential failure points we should begin getting uncomfortable with things like H-SPFs and should start looking at ways to build in a depth of capacity with critical resources. In that particular instance I began a process of setting up additional personnel who could assist to reduce the likelihood of failure associated with an H-SPF.
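One rough way a small team might look for H-SPFs is to list each critical duty alongside the people who can currently perform it, then flag any duty that only one person covers. The sketch below is only an illustration of that idea; the function name, the duties, and the people are all hypothetical and do not come from the interview.

```python
def find_human_spofs(duties):
    """Return (duty, person) pairs for duties only one person can perform.

    `duties` maps a duty name to the set of people able to perform it.
    All names used with this function are illustrative placeholders.
    """
    return sorted(
        (duty, next(iter(people)))
        for duty, people in duties.items()
        if len(people) == 1
    )


# Hypothetical coverage map for a small startup.
coverage = {
    "payroll": {"Alex"},
    "deployments": {"Alex", "Sam"},
    "customer support": {"Sam"},
}

spofs = find_human_spofs(coverage)
for duty, person in spofs:
    print(f"{duty}: only {person} can do this; consider cross-training")
```

A list like this does not fix anything by itself, but it makes the gaps visible so that cross-training or documentation can be prioritized.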

Developing Resilience Is Key to Managing and Surviving Failure

Q: What can startups learn from HROs? Much of what startups do is to search for an effective business model, somewhat akin to a research organization looking for a new drug compound or chemical process or material. Research organizations tolerate a high rate of failure–how does this square with the HRO model?

A: If startup founders understand the 5 Principles of HRO and the components of Crew Resource Management (including communications, leadership, decision-making, assertiveness, situational awareness, mission analysis, and adaptability/flexibility) they can start building the structure to support a more resilient organization.

To explain resilience I will adapt the description from my book Team Leadership in High-Hazard Environments: Performance, Safety and Risk Management Strategies for Operational Teams: “A highly resilient organization should have the ability to accomplish operational goals with consistent performance and adequate levels of safety (managed risk), despite the existence of hazards, uncertainty, and risk, and while being exposed to constant external and internal pressure and unexpected disruptions and threats.”

While it might seem contradictory to think that a startup organization can be an HRO, I think they can be preoccupied with failure. Just like any other organization, startups don’t have unlimited resources. Trial and error are important and acting into learning is important, and sometimes there will be failure, but I think an HRO will fail early, minimize losses, capture the lessons from the failure, and move forward. One of the hallmark skills of a Marine Corps aviator is the ability to be adaptable and flexible, and this is often associated with the HRO principle “Sensitivity to Operations.”

We know that some plans are not perfect and that failure may occur, but we adjust as soon as we recognize that things aren’t working anymore. I see startups being able to do the same thing. It isn’t blind risk-taking, but calculated risk taking. Additionally, I understand that startups may often function at lower tiers of resilience.

From Surviving To Planning to Sustaining High Performance

A: In my book “Team Leadership in High Hazard Environments” I refer to levels of resilience: Surviving, Planning, and Sustaining.

At the Surviving level organizations (particularly bootstrapped companies) often rely on heroes and hope and work very hard to build out first iteration products and services in hopes of gaining customers. They may have several H-SPFs. That is understandable.

As they gain cash flow from operations they may be able to reinvest portions of that to build in a depth of capacity to reduce the number of H-SPFs and design systems to help avoid critical failures in high-consequence categories.

Additionally, for many non-venture-backed startups, and possibly even for some early growth stage companies, founders and employees cannot put all their chips on the table, because if they are wrong on their bet they risk losing everything. In high-risk operations, like combat aviation operations, you typically can’t afford to lose that big, so you take calculated risks and you shape the approach through planning, execution analysis, and adaptation.

Additionally, as some authors explain, such as Nassim Taleb (The Black Swan, Fooled By Randomness, and Antifragile) and Charles Yoe (Principles of Risk Analysis: Decision-making Under Uncertainty), in general people are not great at predicting risk. For some startups there is not much data to draw on to see how others failed and to shore up defenses against those failures. In essence, when startups are pathfinders with new technology they are like test pilots. This isn’t doom and gloom, though.

One technique I teach in my courses involves Event Tree Analysis, which is basically a “what if” technique to look at how a process may unfold and barriers that may be put in place to either halt the progression of failure or minimize the losses. It is a way to visualize how events may unfold under certain conditions.
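To make the “what if” idea concrete, here is a minimal sketch of a very simple event tree: an initiating failure passes through a sequence of barriers, and each barrier either halts the progression or lets it through to the next one. The barrier names and probabilities are made-up placeholders chosen for illustration, not figures from the interview or from any real analysis, and a real Event Tree Analysis may branch in more ways than this.

```python
def event_tree(barriers):
    """Walk a simple sequential event tree.

    `barriers` is an ordered list of (name, probability_barrier_holds).
    Returns a list of (outcome, probability) pairs: one outcome per
    barrier that stops the failure, plus one where every barrier fails.
    """
    outcomes = []
    p_reaching = 1.0  # chance the failure has progressed this far
    for name, p_holds in barriers:
        outcomes.append((f"stopped by {name}", p_reaching * p_holds))
        p_reaching *= 1.0 - p_holds
    outcomes.append(("all barriers failed", p_reaching))
    return outcomes


# Hypothetical barriers with placeholder probabilities.
barriers = [
    ("peer review", 0.5),
    ("staging test", 0.5),
    ("rollback plan", 0.5),
]

outcomes = event_tree(barriers)
for outcome, probability in outcomes:
    print(f"{outcome}: {probability:.3f}")
```

Even with rough guesses for the probabilities, laying the paths out this way shows which barrier carries the most weight and how likely it is that a failure slips past all of them.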

Q: Can you give an example where you introduced a change in methodology or process improvement in a high reliability organization? How does a high reliability crew balance the learning curve cost and higher risk of failure in the short term vs. the long-term benefits?

A: Years ago, when I went through US Navy flight school I was yelled at and treated in a demeaning manner. That was the way it was done back then and was sort of a rite of passage. I knew it wasn’t effective in many cases, but it was the way it was. When I went back as a flight instructor I made a point to treat my students differently and to see each one as a valuable crew member, even if they lacked skill and experience. I told them that I was relying on them to help keep us safe because there could be situations where I might not see something outside of the aircraft due to the limited visibility from the back seat of the aircraft where instructors often sit. I explained to them that I needed their eyes and ears to help keep us safe. I wasn’t the only Instructor Pilot to start treating flight students differently.

Over time this new style evolved and a culture of respect developed, yet this did not degrade levels of authority in the organizations. People may fear change and they may fear the outcomes if they give up a little bit of power and control, but sometimes the perception of control isn’t as effective as leaders or managers may think it is.

Empowering team members can pay off and I think it helps to build more effective leadership and teamwork in the long run.

For Further Reading on the High Reliability Organization

Q: Randy, thanks very much for your time. Any suggestions for further reading on the High Reliability Organization?

A: Sure, here are a few books and articles worth reading.

From the High-Reliability Organizing website:

The Circle of Fear: Cultivating A Healthy Respect for Unexpected Failures [PDF] by Randy Cadieux
Changing a Pediatric Sub-Acute Facility to Increase Safety and Reliability [PDF] by Daved W. van Stralen, Raquel M. Calderon, Jeff F. Lewis and Karlene H. Roberts
Transferring Wildland Fire Knowledge Using a National Lessons Learned Center [PDF] by David Christenson (tells the story of the formation and early evolution of the Wildland Fire Lessons Learned Center)

Books

Managing The Unexpected: Resilient Performance in an Age of Uncertainty by Karl Weick and Kathleen Sutcliffe
The Black Swan by Nassim Taleb
Fooled By Randomness by Nassim Taleb
Antifragile by Nassim Taleb
Principles of Risk Analysis: Decision-making Under Uncertainty by Charles Yoe
Team Leadership in High Hazard Environments by Randy Cadieux
