I'm a big fan of designing systems to deal with component failures. But let's be honest, doing that perfectly is pretty darn hard.
The research paper "Fundamental Concepts of Dependability" classifies all possible sources of fault conditions into 16 different categories. Another paper, "Software Architecture Reliability Analysis using Failure Scenarios", proposes an 8-step failure analysis process for understanding a system's potential failure conditions. And all of this is only about identifying and classifying fault conditions, not about actually providing designs to resolve them.
I'm going to go out on a limb and declare that nobody is doing that type of full, formal analysis for their cloud applications. (OK, perhaps somebody, but certainly not many.)
So that's the problem in a nutshell. How can you really say that you have fully designed for failure, given all of the possible failure conditions? And for the 90% of the cloud platform population that just want to get their apps built, how much time should they really be spending on solving this problem?