Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 6, 2014

Myth Busting Doubts About Formal Methods

Filed under: Formal Methods,Modeling,Programming — Patrick Durusau @ 6:38 pm

Use of Formal Methods at Amazon Web Services by Chris Newcombe, Tim Rath, Fan Zhang, Bogdan Munteanu, Marc Brooker, and, Michael Deardeuff. (PDF)

From the paper:

Since 2011, engineers at Amazon Web Services (AWS) have been using formal specification and model-checking to help solve difficult design problems in critical systems. This paper describes our motivation and experience, what has worked well in our problem domain, and what has not. When discussing personal experiences we refer to authors by their initials.

At AWS we strive to build services that are simple for customers to use. That external simplicity is built on a hidden substrate of complex distributed systems. Such complex internals are required to achieve high-availability while running on cost-efficient infrastructure, and also to cope with relentless rapid business-growth. As an example of this growth; in 2006 we launched S3, our Simple Storage Service. In the 6 years after launch, S3 grew to store 1 trillion objects [1]. Less than a year later it had grown to 2 trillion objects, and was regularly handling 1.1 million requests per second [2].

S3 is just one of tens of AWS services that store and process data that our customers have entrusted to us. To safeguard that data, the core of each service relies on fault-tolerant distributed algorithms for replication, consistency, concurrency control, auto-scaling, load-balancing, and other coordination tasks. There are many such algorithms in the literature, but combining them into a cohesive system is a major challenge, as the algorithms must usually be modified in order to interact properly in a real-world system. In addition, we have found it necessary to invent
algorithms of our own. We work hard to avoid unnecessary complexity, but the essential complexity of the task remains high.

The authors are not shy about arguing for the value of formal methods for complex systems:

In industry, formal methods have a reputation of requiring a huge amount of training and effort to verify a tiny piece of relatively straightforward code, so the return on investment is only justified in safety-critical domains such as medical systems and avionics. Our experience with TLA+ has shown that perception to be quite wrong. So far we have used TLA+ on 6 large complex real-world systems. In every case TLA+ has added significant value, either finding subtle bugs that we are sure we would not have found by other means, or giving us enough understanding and confidence to make aggressive performance optimizations without sacrificing correctness. We now have 7 teams using TLA+, with encouragement from senior management and technical leadership. (emphasis added)

Hard to argue with “real-world” success. Yes?

Well, or if you want your system to be successful. Say compare Amazon’s S3 with the ill-fated healthcare site.

The paper also covers what formal methods cannot do and recounts how this was sold to programmers within Amazon.

I suggest reading the paper more than once and following all the links in the bibliography, but if you are in a hurry, at least see these two:

Lamport, L. The TLA Home Page; http://research.microsoft.com/en-us/um/people/lamport/tla/tla.html

Lamport, L. The Wildfire Challenge Problem ; http://research.microsoft.com/en-us/um/people/lamport/tla/wildfire-challenge.html

Public forum of the TLA+ user community; https://groups.google.com/forum/?hl=en&fromgroups#!forum/tlaplus

Which leaves me with the question: How do you create a reliability guarantee for a topic map? Manual inspection doesn’t scale.

I first saw this in a tweet by Marc Brooker.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress