Byzantine Fault Tolerance and The Byzantine Generals’ Problem

Byzantine Fault Tolerance

Last Updated: 1st November 2018

Byzantine Fault Tolerance is the characteristic of a system e.g. a distributed system, that is able to tolerate the class of failures belonging to the Byzantine Generals’ Problem, with these class of failures being known as Byzantine failures. Byzantine failures are characterized as being the arbitrary ways in which components of a system can fail, which prevents various components of a system from reaching consensus. For example, the incorrect processing of requests, or producing incorrect or inconsistent outputs can be classified as being Byzantine failures. The aim of a Byzantine fault tolerant system is to be able to defend against Byzantine failures.

The Byzantine Generals' Problem

The Byzantine Generals’ Problem is an agreement problem first described by Leslie Lamport, Robert Shostak and Marshall Pease in their 1982 paper ‘The Byzantine Generals Problem'. The problem is a thought experiment that is intended to illustrate the difficulty of reaching consensus in a distributed system.

In the problem, a group of generals, each commanding a portion of the Byzantine army, encircle a city. These generals must now develop a plan to either attack or to retreat from the city. Some generals may want to attack, whilst others may want to retreat, what is necessary is that every general reaches a collective decision. If a unified decision is not reached, and some generals decide to attack, whilst others choose to retreat, then the uncoordinated attack or retreat will fail. The generals are also physically separated from each other, and must communicate their decision to either attack or retreat via messengers. There is a possibility that these messengers may fail to deliver votes, or worse, may forge votes, which would result in an uncoordinated decision taking place. The problem is further complicated by traitorous generals who may send mixed messages about their preferences, leading some generals to believe that they will attack, and others to believe that they will retreat.

The issues presented in the Byzantine Generals’ Problem can be applied to the case of distributed computing systems. There is a risk that a distributed system will contain bad actors, that can cause the ecosystem to fall apart. These bad actors may fail to pass on data, or even send inaccurate data to other participants in the computing ecosystem.

The Byzantine Generals’ Problem is particularly challenging to a public blockchain, because there is no central authority to remedy any wrongs in the event of a Byzantine failure. If some members within the community send inaccurate information to others about transactions, then the reliability of the whole environment will break down. Proof-of-work (PoW), the consensus algorithm first found in Bitcoin, provides a solution to the Byzantine Generals’ Problem, by overcoming Byzantine failures to allow public blockchains to reach a clear and global view of the system.

Under PoW any individual who wishes to add a block to the blockchain must perform work-intensive computations using existing information on the blockchain. PoW requires a significant time investment by the individual wishing to add a block to the blockchain. This serves as practical protection against any malicious party who wants to manipulate the blockchain in order to undermine group consensus. This is because the malicious party would have to invest a considerable amount of time and money producing the PoW that is necessary to exert any meaningful influence on the blockchain.

More on proof-of-work.