Waiting … to … find … out … something … breaks … everything.
If you found yourself wanting to skip over that sentence, you’re not alone.
For engineers, and knowledge workers in general, milliseconds can mark the difference between a person’s willingness to wait for information and their need to take action. If they wait, they risk falling behind. If they act on incomplete information, they make suboptimal decisions.
As business trends—and the release cycles they drive—speed up and companies struggle to fill engineering roles, this tradeoff becomes even more important. If your teams are chronically understaffed by 10-20%, can you afford to have existing staff executing at anything less than 100% efficiency?
Rapid information flow is key to ensuring that employees have maximum visibility into the information they need, when they need it. In an ideal world teams use that visibility to move with speed AND accuracy; even Facebook realized that a maturing company can’t just move fast and break things. But because the faster you move, the more likely you are to break something, navigating the speed vs. accuracy conundrum becomes paramount. Giving employees a complete view of the environment and the results of their actions is the single biggest thing you can do to enable success.
Maximum visibility depends on knowing four key things:
- What to do
- When to do it
- The starting state of the system
- What actually happened/is happening
Effective information flow for the first two is a core tenet of the Agile movement. Done right, Agile makes it clear to both engineers and product managers what needs to be done, and when. Engineers no longer need to wait to learn (or guess at) what a product manager was intending, and product managers no longer have to guess how far along a project is, or if it can be built as desired. This visibility increase between product and engineering forms the basis of many of Agile’s advantages.
The third and fourth items might lack their own manifesto, but seasoned developers and ops engineers instinctively understand how critical they are. The methods and tools deployed to gain visibility into an environment fall broadly into five categories:
- Application Performance Monitoring (APM)
- Systems and Network Monitoring
- Metrics Dashboards
- Log Aggregation
- Configuration Management
Collectively these categories represent a market worth more than $15 billion, and that’s not accounting for dominant open-source players in the space like Nagios, Grafana, ELK, and Ansible (among many, many others).
Why are so many resources aimed at solving this visibility issue?
The Benefits of Increased Visibility
Let’s use two fictitious organizations, Acme Corp and Nadir Corp, to explore how visibility impacts behavior and execution speed. In both companies any employee can access any piece of information, but the method and speed of access differ greatly.
Acme Corp has built a culture of radical transparency where every employee has immediate access to every piece of company information through a lightning-fast application accessible from anywhere in the world on any device. Employees have a top-level view of key information and can do ad-hoc data exploration, for near-perfect visibility into the operation of the system at all times.
At Nadir Corp, every request for information goes through a rigorous process, occasionally with hard-copy sign-offs, before being granted. Employees must find out where the data is stored, who to request it from, justify their request, and wait for approval. Once all of that work is complete they can finally try to answer their question using the data they received.
In practice, of course, no company is as open as Acme (for very good security reasons!) and very few are as convoluted as Nadir. But from this example it’s brutally apparent which company will be able to investigate, reach decisions, and execute faster.
Employees at Nadir either 1) won’t bother trying to get data unless they absolutely have to, or 2) will look for shortcuts that allow quicker access to a slice of the data. Both of these factors lead to a continuation of the speed vs. accuracy conundrum mentioned above. Employees at Nadir are forced to either wait for key information to act, or act with limited information.
Teams or individuals who take the first option get left behind; those who take the second option make more than their share of errors.
Every company has elements of Nadir Corp in it. Sometimes for good reasons (HR records), sometimes for no good reason (lack of priority/time), and sometimes for bad ones (silo building).
Companies that aspire to be more like Acme Corp and invest in finding and eliminating silos and legacy barriers to data will quickly realize the gains of increased visibility:
- Increased visibility drives use of optimal data sources
- Fast access to optimal data leads to more efficient work
- More efficient work equals faster execution
In the age-old debate of good vs. fast vs. cheap, what should you do if you want good and fast but don’t have an unlimited budget? Invest in tools that allow employees to quickly get to key information, rapidly assess the results of their work, and continually refine their actions. Do that and those chronically overworked engineers and operations staff will be able to operate faster and with fewer errors. And isn’t that what we’re all building toward?
In my next posts, I’ll delve into the practical implications of increased visibility and common tools of the trade that promote visibility.