Nov 15, 2013

Understanding the Intricacies of Hosting

Outages & Mitigation

Hosting providers have been making headlines and coming under scrutiny of late. With the U.S. government publicly implicating Terremark, a Verizon hosting subsidiary, as the point of failure for the troubled Affordable Care Act website, and yet another AWS outage, interest in the industry has greatly increased. I’d like to shed some light on the intricacies of our business in the hope that customers gain a better understanding of what hosting providers do and what the best practices are.

To most, the hosting business seems like a relatively easy and straightforward endeavor: it should just work all the time. We agree, but the business of providing hosting services is more nuanced than one might think. The aforementioned headline-making outages, and the one suffered by Netflix on Christmas Eve 2012, illustrate the reality of major failures that can occur in this business. Sometimes outages are brief, sometimes longer than comfortable, and at other times they cause major disruptions that leave customers and companies alike deeply frustrated. We understand the toll an outage can take on a business of any size, as well as the impact it has on that business’s customers. No matter whom it affects, we’re sympathetic to their plight and to the plight of the hosting service provider as well.

The hosting companies at the center of these storms have certainly suffered, but at the end of the day, their customers feel the deepest impact—they’ve endured revenue loss, damaged reputations, and severe operational disruptions. Not an enviable spot for any organization, but one that can be mitigated by following best practices.

Solid Infrastructure, Sound Practices

We’re on the outside looking in, so we have little insight into the specifics of what really happened in these incidents, but I want to share how they look to an industry veteran.

The reality is that the severity of extended outages can be mitigated through the following five tenets:

  1. Build a solid infrastructure
  2. Establish robust processes
  3. Provide superior customer support before, during, and after the event
  4. Be empathetic with customers throughout
  5. Engage in impeccable communication during and after the event

During a recent incident at HostGator, the official statement attributed the outage to the network, citing the failure of two core network switches at a central facility during routine data center network maintenance. The outage occurred in the middle of a business day, raising questions about the timing of the maintenance as well as the failure of both switches. While such a double failure is certainly possible, no single action or flaw should have such a profound effect. As data centers scale, their redundancies and dependencies become far more complex, and even small changes to routing rules can have massive consequences.

Communication and Empathy Key to Customer Satisfaction

Stepping out of the role of armchair quarterback, it’s important to look at the technical lessons to be learned, and there’s always a lesson to be learned. Beyond a sound technical reaction, how you execute escalation and contingency plans and the support you provide customers through a rough patch make all the difference in the world! When faced with an outage, customers are rightfully quick to vent their frustration, so frequent and timely Twitter updates and other platforms for customer updates are all great tools. However, if customers have difficulty getting the support they need, whether because of missing status updates, techs unable to research accounts, or unacceptable response times, that’s troublesome. It’s when things get rough that great customer relationships and exceptional support count the most. It’s never fun or easy, but you win and lose in this industry based on the quality of your reaction in tough situations.

Let me share a sobering truth about the hosting industry: there are hosting and cloud providers out there that don’t offer true 24/7/365 support and network monitoring as part of their business model. They rely on low price points to entice customers to choose them over a better provider. Some also oversell their own or their data center’s technical capacity and skip industry best practices to cut costs, until they suffer a devastating breakdown like those mentioned above. When a business model incorporates overselling and system overload, the customer is bound to run into serious problems that ultimately affect their uptime and business continuity.

The best rule is to make sure that a hosting provider runs its business with a customer-first mentality and longevity in mind. In addition, here are a few items to look for when choosing a hosting provider:

  • All businesses are different. Make sure the hosting company understands your business needs and is flexible in finding the best solution. Otherwise, failures may result from incompatibilities and weak implementations.
  • Identify a hosting partner that not only has experience handling major technical issues but also offers significant, financially backed SLAs that match the uptime your operation requires.
  • Make sure they have well-trained, responsive, courteous staff focused on delivering the ultimate in customer satisfaction. Downtime will happen; what matters most in that situation is the quality of customer service and how quickly services and functionality are restored.
  • Not all support is the same. Find a provider that offers 24/7/365 support in an organized and efficient manner with qualified technicians.
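
When comparing SLAs, it can help to translate an uptime percentage into the amount of downtime it actually permits. The short Python sketch below is only an illustration: the function name, the 30-day billing period, and the three uptime targets are assumptions chosen for the example, not terms from any particular provider’s SLA.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime permitted in a period at a given uptime target.

    Both the 30-day default period and the targets used below are
    illustrative assumptions, not figures from a real SLA.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# Compare a few hypothetical uptime targets over a 30-day period.
for target in (99.0, 99.9, 99.99):
    minutes = downtime_budget_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per 30 days")
```

Under these assumptions, each additional "nine" cuts the permitted downtime by a factor of ten, which is worth weighing against the financial remedies an SLA offers when the target is missed.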

The bottom line is that at the peak of the crisis, communication needs to be ongoing and available through multiple channels, including Twitter, status blogs, phone calls, and all other established platforms. Immediately following an event, a Root Cause Analysis should be performed, resulting in specific action plans that remedy the situation permanently, with a full report provided to the customer.

With anything less, you are probably gambling with your customers and your business.
