Making reliable distributed systems in the presence of software errors

How can we program systems which behave in a reasonable manner in the presence of software errors? This is the central question that I hope to answer in this thesis. Large systems will probably always be delivered containing a number of errors in the software; nevertheless, such systems are expected to behave in a reasonable manner. Making a reliable system from faulty components places certain requirements on the system. These requirements can be satisfied either in the programming language used to solve the problem, or in the standard libraries which the application programs call.

In this thesis I identify the essential characteristics which I believe are necessary to build fault-tolerant software systems. I also show how these characteristics are satisfied in our system.

Some of the essential characteristics are satisfied in our programming language (Erlang), others are satisfied in library modules written in Erlang. Together the language and libraries form a basis for building reliable software systems which function in an adequate manner even in the presence of programming errors.

Having said what my thesis is about, I should also say what it is not about. The thesis does not cover in detail many of the algorithms used as building blocks for constructing fault-tolerant systems: it is not the algorithms themselves which are the concern of this thesis, but rather the programming language in which such algorithms are expressed. I am also not concerned with hardware aspects of building fault-tolerant systems, nor with the software engineering aspects of fault-tolerance.

The concern is with the language, libraries and operating system requirements for software fault-tolerance. Erlang belongs to the family of pure message passing languages: it is a concurrent process-based language having strong isolation between concurrent processes. Our programming model makes extensive use of fail-fast processes. Such techniques are common in hardware platforms for building fault-tolerant systems but are not commonly used in software solutions. This is mainly because conventional languages do not permit different software modules to co-exist in such a way that there is no interference between modules. The commonly used threads model of programming, where resources are shared, makes it extremely difficult to isolate components from each other: errors in one component can propagate to another component and damage the internal consistency of the system.
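The combination of isolated processes and fail-fast behaviour can be illustrated with a minimal Erlang sketch. It is not taken from the thesis: the module name `isolation_demo`, the function `run/0` and the exit reason `bad_input` are hypothetical names chosen for illustration. A worker process exits as soon as it hits a problem, and a separate process learns of the failure only through a message, so its own state cannot be corrupted.

```erlang
-module(isolation_demo).
-export([run/0]).

%% Hypothetical example: start a fail-fast worker and observe its
%% failure from a separate, isolated process.
run() ->
    %% spawn_monitor/1 starts the worker and monitors it in one step.
    {Pid, Ref} = spawn_monitor(fun() ->
        %% Fail fast: the worker does not try to repair itself,
        %% it simply exits with a reason.
        exit(bad_input)
    end),
    receive
        %% The failure arrives as an ordinary message. No memory is
        %% shared between the worker and the observing process.
        {'DOWN', Ref, process, Pid, Reason} ->
            {worker_failed, Reason}
    after 1000 ->
        %% Defensive timeout so the caller never blocks forever.
        timeout
    end.
```

Calling `isolation_demo:run()` from the Erlang shell should return `{worker_failed, bad_input}`. The observing process keeps running after the worker has died, which is the property that a shared-memory threads model makes difficult to guarantee.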

Attribution

Joe Armstrong. Making reliable distributed systems in the presence of software errors. http://www.erlang.org/download/armstrong_thesis_2003.pdf