A REPLY TO THE CRITICISMS OF THE KNIGHT & LEVESON EXPERIMENT

John C. Knight
University of Virginia
Charlottesville, VA 22903

Nancy G. Leveson
University of California
Irvine, CA 92717

1. Introduction

In July 1985, we presented a paper at the Fifteenth International Symposium on Fault-Tolerant Computing [KNI85] describing the results of an experiment that we performed examining a hypothesis about one aspect of N-version programming, i.e., the statistical independence of version failure. A longer journal paper on that research appeared in the IEEE Transactions on Software Engineering in January 1986 [KNI86].

Since our original paper appeared, some proponents of N-version programming have criticized us and our papers, making inaccurate statements about what we have done and what we have concluded. We have spoken and written to them privately attempting to explain their misunderstandings about our work. Unfortunately, subsequent papers and public pronouncements by these individuals have contained the same misrepresentations.

We have not previously responded publicly to this criticism because we feel that our papers stand up for themselves, and we did not want to fan the flames. However, it has now been nearly 5 years since our work first appeared in print and, in our opinion, the attacks are getting more frequent, more outrageous, and farther from the truth. We have decided that it is now necessary to respond publicly to ensure that those hearing these statements do not think we are being silent because we agree with the things being said about us and our work. None of the papers criticizing our experiment have appeared in a refereed journal or been presented in a forum in which we could respond. Therefore, we are using this forum.

Nearly all of the criticism of our work has come from Professor Algirdas Avizienis of UCLA or his former students John Kelly, Michael Lyu, and Mark Joseph.
We reply to their criticism by quoting some of the erroneous statements from their papers and addressing each of these statements in turn. The quotations from these papers appear as indented passages in this paper.

For those who are unfamiliar with the controversy, we provide some background. An N-version system attempts to incorporate fault tolerance into software by executing multiple versions of a program that have been prepared independently. Their outputs are collected and examined by a decision function that chooses the output to be used by the system. For example, if the outputs are not identical but should be, the decision function might choose the majority value if there is one.

It has been suggested that the use of this technique may result in highly reliable software, even if the software versions have not been subjected to extensive testing. For example, Avizienis states:

    By combining software versions that have not been subjected to V&V [verification and validation] testing to produce highly reliable multiversion software, we may be able to decrease cost while increasing reliability. [AVI84]

    The higher initial cost may be balanced by significant gains, such as faster release of trustworthy software, less investment and criticality in verification and validation, ... [AVI89]

The primary argument for the attainment of ultra-high reliability using this technique is given by Avizienis:

    It is the fundamental conjecture† of the NVP [N-Version Programming] approach that the independence of programming efforts will assure a low probability that residual software design faults will lead to an erroneous decision by causing similar errors to occur at the same c[ross]c[heck]-points in two or more versions... The effectiveness of the NVP approach depends on the validity of this conjecture... [AVI85b]

    Since the versions are written independently, it is hypothesized that they are not likely to contain the same errors, i.e., that errors in their results are uncorrelated. [AVI85a]

† "Conjecture" is defined by Webster's New World Dictionary to be "an inference, theory, or prediction based on guesswork".

As Avizienis notes, this hypothesis is important because the degree to which it holds will determine the amount of reliability improvement that is realized. Eckhardt and Lee [ECK85] have shown that even small probabilities of correlated failures, i.e., deviation from statistically independent failures, cause a substantial reduction in potential reliability improvement.

The phrases "low probability" and "not likely" used by Avizienis are not quantified, and his conjecture and hypothesis are, therefore, not testable. However, statistical independence, i.e., uncorrelated failures, is well defined, implied by much of the discussion about the technique, and assumed by some practitioners [MAR83, YOU85]. In order to test for statistical independence, we designed and executed an experiment in which 27 versions of a program were prepared independently from the same requirements specification by graduate and senior undergraduate students at two universities. The students tested the programs themselves, but each program was subjected to an acceptance procedure for the experiment consisting of 200 typical inputs. Operational usage of the programs was simulated by executing them on one million inputs that were generated according to a realistic operational profile for the application. Using a statistical hypothesis test, we concluded that the assumption of independence of failures did not hold for our programs and, therefore, that reliability improvement predictions using models based on this assumption may be unrealistically optimistic. This was all that we concluded (see below).

The basic argument made by our critics is that if we had just used a different methodology for producing the programs, our results would have been different. For example, Avizienis states:

    It is our conjecture that a rigorous application of the design paradigm, as described in this paper would have led to the elimination of most faults described in [KNI86] before acceptance of the programs. [AVI87, AVI88]

It is easy to assert that changes in experimental procedures would yield different results. Such conjectures, however, need to be supported with scientific proof before they can be accepted. Professor Avizienis and his former students use two arguments to support their conjecture. Their first argument is that, in their experiments, they did not get the same results that we did:

    An important conjecture of the design diversity approach is that the independence of the development process will minimize the probability that software design faults will cause similar errors to occur. Although early experiments caused concern that this assumption may not hold [KNI86], additional empirical work has shown that this assumption can hold if a 'best practice' development paradigm is used [BIS85, AVI87]. [KEL89]

    These results [of the UCLA/H experiment] are different from previously published results by Knight and Leveson. [AVI87, AVI88]

Their second argument is that the claimed difference in results is accounted for by the significant differences between their experiments and ours and the inadequacies of our software development method:

    [The Knight/Leveson and Scott, et al. studies] fail to recognize that NVP is a rigorous process of software development. The papers do not document the rules of isolation, and the C[ommunication]&D[ocumentation] protocol ... that are indicators of NVP quality. The V-specs of [KNI86] do not show the essential NVS [N-Version Software] attributes. It must be concluded that the authors are assessing their own ad hoc processes for writing multiple programs, rather than the NVP process as developed at UCLA, and that their numerical results uniquely take the measure of the quality of their casual programming process and their classroom programmers. The claims that the NVP process was investigated are not supported by the documentation of the software development process. [AVI89]

    In summary, we have reviewed the V/UCI experiment and found reasons that may account for this outcome: the small scale and limited diversity potential of the experiment, lack of MVS [Multi-Version Software] software development disciplines, and apparently inadequate testing and processing of MVS systems. [AVI88]

Both of these arguments are unfounded. We examine each of these in turn.

2. Different Results

The first claim by our critics is that their results are different from ours. For the benefit of the reader, we repeat the conclusion we drew in our 1986 paper [KNI86]:

    "For the particular problem that was programmed for this experiment, we conclude that the assumption of independence of errors that is fundamental to some analyses of N-version programming does not hold. Using a probabilistic model based on independence, our results indicate that the model has to be rejected at the 99% confidence level."

This was our only conclusion.

Table 1 shows data from other relevant studies that have been conducted. Chen generated 16 programs, chose 4 of these to consider further, and added 3 programs written by "the authors" for a total of 7 programs written in PL/I [CHE78]. Kelly used three different specifications, written in OBJ, PDL, and English, to generate 18 programs which he executed on 100 input cases [KEL83, AVI84]. Kelly was also involved in a team effort headed by NASA involving 4 universities (UCSB, Virginia, Illinois, and NCSU) in which 20 programs were generated from a single specification [ECK89]. We refer to this experiment here as the NASA experiment. Avizienis, Lyu, and Schutz generated 6 programs written in 6 languages [AVI87] from a single specification. We refer to this experiment as the UCLA/H experiment.

                                     Chen          Kelly         NASA          UCLA/H        Knight/Leveson
Number of versions                   7             18            20            6             27
Average no. of faults per            not reported  not reported  not reported  1.8           1.6
  post-development version
No. of simulated use input cases     32            100           921,000       1000          1,000,000
Average individual failure rate      not reported  .27           .006          not reported  .0007
Average 3-version failure rate       .10           .20           .0002         not reported  .00004
  reported
Statistically independent            not tested    not tested    no            not tested    no
  failure behavior

Table 1 - Data From Relevant Experiments
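
To make the role of the independence assumption concrete, the following minimal sketch computes the 3-version majority failure probability that an independence-based model would predict from a single average individual failure rate, and sets it beside the observed average 3-version failure rate. It is an illustration only: it assumes every version fails with the same probability, which is cruder than the per-version analysis and hypothesis test reported in [KNI86], and the function name is hypothetical. The two numeric inputs are the values shown for our experiment in Table 1.

```python
def majority_failure_prob_independent(p: float) -> float:
    """Probability that at least 2 of 3 versions fail on the same input,
    assuming each version fails independently with the same probability p."""
    return 3 * p**2 * (1 - p) + p**3


# Values for the Knight/Leveson experiment as given in Table 1.
avg_individual_rate = 0.0007      # average individual failure rate
observed_3version_rate = 0.00004  # average 3-version failure rate

# Simplifying assumption: all versions share the average failure rate.
predicted = majority_failure_prob_independent(avg_individual_rate)
ratio = observed_3version_rate / predicted

print(f"predicted under independence: {predicted:.3e}")
print(f"observed:                     {observed_3version_rate:.3e}")
print(f"observed / predicted:         {ratio:.1f}")
```

A ratio well above 1 is the kind of discrepancy that motivates a formal test of independence; the sketch itself is not such a test.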

Joseph makes the following claim:

    Several experiments on NVP performed at UCLA [AVI84, AVI87] have not discovered the high rates of failure as reported in Knight/Leveson. [JOS88]

There are two types of failure rates to which Joseph may be referring: individual version failure rates and 3-version failure rates. Failure rate is calculated by dividing the number of failures by the number of input cases. As shown in the table, both the individual and the 3-version failure rates are much lower for our experiment than for any of the UCLA experiments for which the data was collected and published.

The comparison with the Kelly experiment may be unfair because his data was not generated randomly and may have been generated to check for difficult cases. However, at best, the rates are not comparable. They certainly are not better than ours.

Avizienis and Lyu do not report failure rates in any of the papers published by them. They report only the number of faults found. It is not possible to hypothesize failure rates during execution from the number of faults removed during testing. However, it might be noted that the average number of faults per program that were detected in simulated use of their programs was greater than ours. Without collecting data on failure rates, it is not possible to draw any conclusions about the effectiveness of N-version programming.

There have been many statements that the UCLA/H study got different results than we did. For example:

    Although early experiments caused concern that this assumption may not hold [KNI86], additional empirical work has shown that this assumption can hold if a 'best practice' development paradigm is used [BIS85, AVI87]. [KEL89]

    The comparison of the results in the V/UCI and UCLA/H studies thus shows major disagreements [AVI88]

From the published papers, we can find no evidence that the independence assumption was tested in the UCLA/H study.

The following statement may explain some of the confusion about the UCLA/H study results:

    A major observation that relates to the effectiveness of MVS is that similar and time-coincident errors (due to identical faults in two versions) were rare. Only one identical pair existed in the 82 faults removed from the six versions before acceptance [in the UCLA/H study]. During post-acceptance testing and inspection, five faults were uncovered by testing. One pair again was identical. Six more faults were discovered by code inspection, all unrelated and different ... These results are different from previously published results by Knight and Leveson. [AVI88]

The relevant factor is not whether the faults are identical, but whether failures are coincident. When attempting to provide fault tolerance through voting, it is the intermediate results or outputs that are compared, not the faults; looking at the faults in a program does not provide information about the failure behavior of the executing program. The UCLA/H results cannot be compared with ours because the relevant analysis is not reported; independence is a statistical property and must be tested statistically.

In fact, faults do not need to be identical to produce statistically-dependent coincident failures. Many of the faults in our programs that produced such failures were seemingly unrelated on the surface, sometimes occurring in totally unrelated parts of the programs. This puzzled us until we realized that the relationship is in the functions computed by the paths taken for that input rather than in any particular "faulty" statements on that path. In a paper to be published in the February 1990 issue of IEEE Transactions on Software Engineering, we describe the faults in our programs and provide a model that explains why programs fail coincidentally due to faults that are not identical.
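
The distinction between identical faults and coincident failures can be sketched in code. The example below looks only at per-input failure records of two versions, never at their source code, and compares the number of inputs on which both failed with the number expected if the failures were independent. The function name and the ten-input records are hypothetical, chosen for illustration; a real analysis would apply a statistical hypothesis test to a large sample, as in [KNI86], rather than compare two numbers by eye.

```python
from typing import Sequence


def coincidence_summary(fail_a: Sequence[bool], fail_b: Sequence[bool]) -> dict:
    """Compare observed coincident failures of two versions with the
    count expected if their failures were statistically independent."""
    if len(fail_a) != len(fail_b) or not fail_a:
        raise ValueError("need two equal-length, non-empty failure records")
    n = len(fail_a)
    p_a = sum(fail_a) / n  # estimated failure probability of version A
    p_b = sum(fail_b) / n  # estimated failure probability of version B
    observed = sum(1 for a, b in zip(fail_a, fail_b) if a and b)
    expected = n * p_a * p_b  # independence implies P(A and B) = P(A) * P(B)
    return {
        "n": n,
        "p_a": p_a,
        "p_b": p_b,
        "observed_coincident": observed,
        "expected_if_independent": expected,
    }


# Hypothetical records over 10 inputs (True = the version failed on that input).
version_a = [False, True, False, False, True, False, False, False, True, False]
version_b = [False, True, False, False, True, False, False, False, False, False]

summary = coincidence_summary(version_a, version_b)
print(summary)
```

Nothing in this computation depends on whether the underlying faults are textually identical; only the failure behavior on each input matters.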

Avizienis refers to our study and another one that got similar results [SCO87] and states:

    These efforts serve to illustrate the pitfalls of premature preoccupation with numerical results. [AVI89]

Without looking at numerical results, one cannot do the necessary statistical analysis to determine whether the independence hypothesis holds or determine what type of benefits can be expected from using N-version programming. It is surprising that Professor Avizienis believes that it is too early to look at numerical results but from his statements does not seem to believe that it is too early to use this technique in safety-critical applications such as commercial aircraft [AVI87, AVI88, AVI89, etc.].

Avizienis makes the following truly outrageous statement when describing our work:

    The use of the term 'experiment' is misleading, since it implies repeatability of the experimental procedure that is taken for granted in science. [AVI89]

Our experimental procedure is completely repeatable. We published precisely what we did and the requirements specification we used. Any researcher could have repeated our experiment. In fact, the result has been confirmed. Using the NASA programs, which were developed using a method closely related to that used in the UCLA/H study, a group (including Dr. Kelly) led by Dr. David Eckhardt of NASA's Langley Research Center collected the same type of data and came to the same conclusion that we did [ECK89].

In summary, the claims that our critics did not get our results are unsupported and appear to be based more on wishful thinking than scientific analysis.

3. Methodological Comparisons

The second set of criticisms of our experiment has to do with the software development method that we used. For example, Joseph states:

    It would be a mistake to accept the Knight and Leveson work at face value without considering its many weaknesses. It is proposed that the study did not use NVP due to inadequacies in proper system development methods. [JOS88]

and Avizienis states:

    The claims that the NVP process was investigated are not supported by the documentation of the software development process in [Knight/Leveson]. [AVI89]

We used the methodology used by Chen [CHE78] and Kelly [KEL83, AVI84] and followed exactly what was stated by Avizienis in [AVI77, AVI85b]. According to these definitions, N-version programming means writing N versions independently. This is what we did. To claim that we did not use N-version programming is ridiculous.

The only significant difference between the version development methodology we used and that used in the UCLA/H experiment (which occurred 3 years after ours) is that our methodology is more likely to result in design diversity. The current "paradigm" promoted by Avizienis and Lyu [AVI88] involves overspecifying the design and thus limiting potential diversity.

The specific criticisms in the area of methodology that have been leveled at us have to do with quality of versions, testing, voting procedure, diversity and scale, specification, communication and isolation of programmers, and programming effort involved. We examine each in turn.

3.1. Quality of Versions

Avizienis and Joseph state:

    [The Knight and Leveson] numerical results uniquely take the measure of the quality of their casual programming process. [AVI89]

    NVP does not mean low quality versions. In [KNI86], no software development standards or methods were required of the programmers. This is an essential requirement for all development and raises doubts about the quality of the generated versions. [JOS88]

There is a misunderstanding here about what we said in our paper. First of all, we did not say that no software development methods were used. We said that no one particular method was imposed on all the programmers. All of our participants were graduate students in software engineering or related fields or were undergraduates taking a senior-level, advanced software engineering course.

We are particularly concerned about the claim that the programs are low quality despite the evidence to the contrary in our papers. Six of the twenty-seven programs did not fail on any of the million input cases. The average failure probability was 0.0007 and the worst was 0.009. This should be compared to the published UCLA experiments [CHE78], [KEL83], and [AVI84], which all had much poorer quality versions than we did in our experiment. No failure probabilities have been published for the UCLA/H experiment but the reported number of faults per version is comparable with ours.

Our reliability is of the same order as that achieved in industrial settings and is better than that of the studies by our critics. We can see no possible basis for an argument that our versions are low quality.

3.2. Testing

Avizienis, Lyu, and Joseph state:

    In summary, we have reviewed the V/UCI experiment and found reasons that may account for this outcome: ... apparently inadequate testing and processing of MVS versions. [AVI88]

    Acceptance tests for each version were too small (i.e., only 200 test cases). Also, operational testing used randomly generated test cases. For critical and life-critical computer systems this is completely unacceptable. [JOS88]

This has been one of the frequent criticisms about our experiment. The critics have refused to believe our statements (both public and private, oral and written) that they are misinformed. The programs were tested by their authors before they were submitted to the acceptance procedure. The acceptance procedure was an experimental artifact, not part of any software development process. The purpose was merely to ensure that the versions were all suitable for the experiment before the programmers became unavailable. We purposely did not want to put too many input cases into the acceptance procedure in order not to bias the experiment by finding and eliminating faults outside the experimental domain. Different inputs were used in the acceptance procedure for each version to avoid filtering common faults. As it turned out, very few of the programs failed the acceptance test (and if they did, it was usually only one fault involved).

As we said above, the programs were all of very high quality so they were obviously tested by the developers. What Joseph calls "operational testing" was simulated use of the programs. We said in our papers that it was not testing. We were trying to simulate the lifetime production use of the software. That is why we used inputs that were randomly generated. Note, however, that they were randomly generated according to what the people at Boeing felt was a realistic operational profile for this application [NAG82].

We agree, of course, that the procedures we followed were not sufficiently complete for life-critical software. We never claimed that they were.
If in their statements Avizienis, Lyu, and Joseph are implying that the performance of N-version programming would be better using a different development methodology, then this needs to be shown using controlled experiments to demonstrate that there is a statistically significant difference in the number of correlated failures under different development procedures for the same application. This cannot be assumed without experimental data of which there is currently none.

But the proof is really "in the pudding," as they say. Our higher quality programs seem to imply that our testing was as adequate as that in any of the experiments of our critics.

Kelly also claims that our testing was inadequate and that this explains our results:

    Key results from empirical studies addressing the similar error problem include: ... (c) Versions which have not undergone systematic testing contain related faults [KNI86, SCO87]. [KEL89]

The implication in this statement is that versions that have undergone systematic testing will not contain related faults. However, we note that all experiments on N-version programming that we know about have found related faults, no matter what type of testing they have undergone. Shimeall and Leveson [SHI88], in a study that compared fault elimination and fault tolerance methods using eight software versions, found that extensive testing was not likely to find the faults that resulted in correlated failures. This makes some sense intuitively since there is reason to believe that if these common errors are likely to be made by the programmers, they are also likely to be made by those constructing test data. There is no scientific data to show that testing of any kind has an effect on related faults. We note that in the UCLA/H study a much higher percentage of the faults found after testing were "identical" than those found during testing.

3.3. Voting Procedure

In discussing our definition of failure, Joseph states:

    The definition of complete NVP failure is incorrect. In [KNI86] an NVP system fails if a majority of versions fail at the same time, regardless of whether the errors produced were similar. An NVP system will produce the wrong result only if similar, coincident errors are generated. [JOS88]

The author's statement implies that a program that does not produce an output when required has not failed; a program only fails when it produces incorrect output. This differs from every definition of failure we have seen. For example, ANSI/IEEE Standard 729-1983 defines a failure as the inability of a system or system component to perform a required function within specified limits. If we are prepared to accept that producing no result is satisfactory performance, then any program can be made arbitrarily reliable by making it abort on every execution. The purpose of fault masking in N-version systems is to continue to provide service in the presence of faults. This will not be possible if a majority of versions fail no matter how they fail.

In [KEL86] it was stated that we took an extreme position by using vector voting. We have revoted the programs using element-by-element voting, as was suggested, and the results are identical to the ones we published originally [MAR87].

Avizienis and Lyu state (the emphasis is theirs):

    In the testing and processing of the MVS systems, the failure detection and granularity for the V/UCI experiment was coarse, since it used one Boolean variable to represent 241 Boolean conditions for a "missile launching decision." The UCLA/H experiment employed real number comparisons for inexact matching with specified tolerances. [AVI88]

We used all 241 boolean conditions in the voting as is clearly stated in our paper [KNI86, page 99].
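
The two voting schemes and the failure definition discussed in this section can be sketched as follows. This is a minimal illustration, not the harness used in our experiment: the function names are hypothetical, outputs are modeled as tuples of Booleans, and a strict majority is assumed as the decision rule. The three-element example is constructed to show that the two schemes can differ in principle; for our programs the revoted results were identical to the published ones [MAR87].

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Output = Tuple[bool, ...]  # one version's complete output vector


def vector_vote(outputs: Sequence[Output]) -> Optional[Output]:
    """Return the whole output vector produced by a strict majority of
    versions, or None when no single vector has a majority."""
    if not outputs:
        raise ValueError("no outputs to vote on")
    vector, count = Counter(outputs).most_common(1)[0]
    return vector if count > len(outputs) / 2 else None


def element_vote(outputs: Sequence[Output]) -> Optional[Output]:
    """Vote on each element separately; return None if any element
    lacks a strict majority."""
    if not outputs:
        raise ValueError("no outputs to vote on")
    result = []
    for column in zip(*outputs):
        value, count = Counter(column).most_common(1)[0]
        if count <= len(outputs) / 2:
            return None
        result.append(value)
    return tuple(result)


def system_failed(voted: Optional[Output], correct: Output) -> bool:
    """The system fails if it delivers no output or a wrong output."""
    return voted is None or voted != correct


# Hypothetical case: two versions are wrong, each in a different element.
correct = (True, False, True)
outputs = [(True, False, True), (False, False, True), (True, True, True)]

print(vector_vote(outputs))   # no vector has a majority
print(element_vote(outputs))  # each element has a majority
```

Note that `system_failed` treats the absence of a voted output as a failure, in line with the definition of failure cited above: a system that delivers no result has not performed its required function.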

3.4. Diversity and Scale

Avizienis states:

    The scale of the problem (and the potential for diversity) [in Knight and Leveson's experiment] is smaller [than the UCLA/H study]. [AVI88]

and:

    We can see that the V/UCI experiment was of rather small scale, since the specification was 6 pages long and could be programmed in 327 lines. The scale of the UCLA/H experiment with 64 pages of specification and at least 1250 lines of code was significantly larger. [AVI88]

Our Pascal programs ranged in size from 310 to 781 lines of code with an average of 554 lines (they did not include any I/O statements). The UCLA/H Pascal program was 1288 lines of code with 491 executable statements (including I/O statements). In software engineering, a medium size program is 50,000-100,000 lines of code and a large program is 500,000 to 1,000,000 lines of code. By these realistic standards, there was no difference in the scale of the two applications; both were small-scale.

The argument about the size of the specification is more interesting, however, because it addresses the more important claim that our application had limited potential for diversity because our specification was short. Note that Avizienis acknowledges in the following the potential problem of overspecification when attempting to get diverse programs from a single specification:

    The specification of the member versions, to be called V-spec, represents the starting point of the NVP process. As such, the V-spec needs to state the functional requirements completely and unambiguously, while leaving the widest possible choice of implementations to the N programming efforts... Such specific suggestions of 'how' reduce the chances for diversity among the versions and should be systematically eliminated from the V-spec. [AVI89]

The introduction of a new term, V-spec, clouds the issues. Usual software engineering terminology includes requirements specifications, high-level design specifications, detailed design specifications, etc.; each contains different amounts of information about "how." The specification we gave our programmers was short because we purposely eliminated all information about algorithms and implementation in order to provide the maximum opportunity for diversity, i.e., it was a pure requirements specification. Since there is no precise definition of diversity, it is impossible to determine whether diversity is present or not in any set of program versions. We note, however, that informally and in our opinion, our versions were very different: they differed in program structure, algorithms, variables and data structures, length, and layout.

The reason that the specifications for the UCLA experiments have been so long is that they contained design information. For example, one of the specifications for the Kelly programs, which averaged 300 lines of PL/I code, was a 73-page specification written in PDL (Program Design Language). In the UCLA/H study, the specification was 64 pages long and included detailed information about the algorithms, variables, and data structures to be used. Here are some quotes from their description of the results of the UCLA/H study (emphasis is ours):

    Primitive operations are integrators, linear filters, magnitude limiters, and rate limiters. The algorithms for these operations were exactly specified, however, different choices of which primitive operations to implement as subprograms have been made, mainly whether the integrators include limits on the magnitude of the output value (as is required in most cases), or not. [AVI87]

    Two factors that limit actual diversity have been observed in the course of this assessment... algorithms specified by figures were generally implemented by following the corresponding figure from top to bottom... In retrospect, a second reason for this lack of diversity is that we have concluded that the logic part of the Logic Mode was overspecified. [AVI87]

    The H[oneywell]/S[perry] concept of 'test points' is the second factor that tends to limit diversity. Their purpose is to output and compare not only the final result of the major subfunctions, but also some intermediate results. However, that restricted the programmers on their choices of which primitive operations to combine (efficiently!) into one programming language statement. In effect, the intermediate values to be computed were chosen for them. [AVI87]

In their post-assessment of the diversity of the versions [AVI87] obtained from this detailed design specification, the major differences seemed to be syntactic rather than semantic, e.g., the use of parameter passing vs. global variables, differences in the calling structure of the procedures, the use of subprograms vs. functions. None of these seem to us to be very significant in terms of providing fault tolerance of design errors.

Another source of their lack of diversity stems from the use of cross-check points to vote on intermediate results and what they call Community Error Recovery [TSO87]:

    [The cross-check points] have to be executed in a certain predetermined order, but again great care was taken not to overly restrict the possible choices of computation sequence. [AVI87]

    One recovery point is used to recover a failed version by supplying it with a set of new internal state variables that are obtained from the other versions by the Community Error Recovery Technique. [AVI87]

In order to effect this type of recovery, the internal states of the versions must be identical. In fact, the versions had a disagreement caused by:

    the introduction of new, unspecified state variables which we call 'underground variables', since they are neither checked nor corrected in any cross-check or recovery point... A new design rule for multi-version software must be stated as 'Do not introduce any 'underground' variables.' [AVI87]

It is not surprising that few identical faults were found by a test procedure that involved checking the results of these programs against each other since the designs were, for all practical purposes, identical.

On a related issue, Avizienis states in a recent paper:

    The specification for simplex software tend to contain guidance not only 'what' needs to be done, but also 'how' the solution ought to be approached. [AVI89]

Specifications for simplex (single-version) software are no different in this respect than for N-version software. In fact, the same specifications can be used for both types of software development, and requirements specifications for simplex software can just as easily contain only "what" as requirements specifications for multi-version software. To the contrary, we have found that in practice (and in the UCLA experiments) specifications for N-version systems almost always require specification of the "how" (i.e., detailed design) in order to provide the kinds of voting and comparison required for N-version programming. Avizienis describes these difficulties with respect to the versions in his UCLA/H experiment:

    It was decided that different algorithms were not suitable for the scope of FCCs [flight control computers] due to potential timing problems and difficulties in proving their correctness (guaranteed matching among them). [AVI87]

In real-life systems that use this technique, differences between the supposedly "diverse" modules are often minor. In fact, our experiment was unique and unrealistic in that we allowed more diversity than is usual for these systems in practice (or in the other studies that have been done), and thus our programs are more likely to provide design fault tolerance. This is rather frightening since this real-life software often is depended on for safety-critical activities.
3.5. Specification

Avizienis states:

    The V-specs [of Knight and Leveson] do not show the essential NVS attributes. [AVI89]

No explanation of what attributes are missing is given, and we are at a loss to figure out what they could be. Avizienis’ only specific criticism of our specification seems to be one of length. The primary difference between our specification and those of his latest UCLA/H study is that our specifications are closer to what he describes as necessary for N-version programming, i.e., they specify only “what” without “how.”

3.6. Communication and Isolation

Avizienis states:

    There was no required communication protocol for the programmers in the V/UCI experiment, while the communication protocol for the UCLA/H experiment was well-defined, and rigorously enforced. [AVI88]

    The papers [describing Knight and Leveson’s experiment] do not document the rules of isolation, and the C&D protocol that are indicators of NVP quality. [AVI89]

The communication protocols for the two experiments are identical. Our protocol is documented in our paper [KNI86, page 98]. Isolation, which is also documented in the same paper [KNI86], was enforced in the same way as in the UCLA experiment with the added factor that some of the programmers were separated by 3000 miles and were unknown to each other. All correlated failures involved versions from both schools.

3.7. Programming Time

Avizienis states:

    The V/UCI experiment was a class project during one Quarter term, and each student produced the program alone. On the contrary, programmers in the UCLA/H experiment worked in two-member teams and were paid a full-time research assistant (RA) salary during a class-free 12 week period during the summer. [AVI88]

The real problem is that students were used as opposed to professional programmers in all of these experiments, including ours. Avizienis argues that his students are somehow less likely to make similar errors because they were paid, because they worked during the summer and not during the academic year, and because they worked two weeks longer. We cannot see how any of these things could make a difference. There is likely to be some difference in results if professional programmers are involved. But it is not intuitively obvious that fewer coincident failures would be the result. It seems reasonable to hypothesize that student programs would be prone to more randomness in errors and designs than those written by professionals. It is interesting to note, however, that one UCI participant in our study, who had more than ten years of professional scientific programming experience, had faults that correlated in failure characteristics with a UVA student who had no professional programming experience.

4. Conclusions

Joseph states:

    Thus, [the Knight and Leveson] results are misleading and should not be used by themselves as a basis for a decision about the effectiveness of NVP. Real world experience, not a class room assignment as in [KNI86], is needed. Currently, several systems in Europe are using NVP (e.g., the European designed Airbus 320 [ROU86] [AVW87]). [JOS88]

Our results are not misleading. Our conclusion is simple and clearly stated. We repeat again part of the conclusion section from our paper [KNI86] (italics are from the original paper):

    “For the particular problem that was programmed for this experiment, we conclude that the assumption of independence of errors that is fundamental to some analyses of N-version programming does not hold. Using a probabilistic model based on independence, our results indicate that the model has to be rejected at the 99% confidence level.

    “It is important to understand the meaning of this statement. First, it is conditional on the application that we used. The result may or may not extend to other programs, we do not know. Other experiments must be carried out to gather data similar to ours in order to be able to draw general conclusions. However, the result does suggest that the use of N-version programming in crucial systems should be deferred until further evidence is available if the reliability analysis of the system has been performed assuming independence of failures.”

We never suggested that our result should be used by itself as a basis for a decision about the effectiveness of N-version programming. We merely suggested that caution would be appropriate. We feel strongly that careful, controlled experimentation in a realistic environment is required. However, advocating careful laboratory experimentation is different from advocating the accumulation of experience by using the technique in real, safety-critical applications where loss of life is possible, such as commercial aircraft [JOS88].

Attacking one experiment by John Knight and Nancy Leveson does not make N-version programming a reasonable way to ensure the safety of software. All of the university experiments, including ours, have been limited in one way or another. None (except Shimeall and Leveson [SHI88]) has attempted to compare N-version programming with the alternatives to find out whether the money and resources could have been more effectively spent on other techniques such as sophisticated testing or formal verification.

We did not say that N-version programming should not be used although it would not be our first choice of techniques for increasing reliability and safety. We just do not believe that it should be relied upon to provide ultra-high reliability. The FAA requires that failures of critical systems in commercial air transports be “extremely improbable”. The phrase “extremely improbable” is defined by the FAA as “not expected to occur within the total life span of the whole fleet of the model [WAT78].” In practice, where such reliability can be analyzed, the phrase is taken to mean no more than 10^-9 failures per hour of operation or per event for activities such as landing. No one has demonstrated that N-version programming can guarantee achievement of this level of reliability. In fact, we know only of counterexamples.

Our research is not “flawed” as it has been described in public by our critics at professional meetings. Our conclusion is modest and follows from the experimental data. Our critics should refrain from attributing conclusions to us that we have not drawn, from making statements about the quality or diversity of our programs without the data to substantiate these judgements, and from making unsupported comparisons of our results with theirs.

Until N-version programming has been shown to achieve ultra-high reliability and/or has been shown to achieve higher reliability than alternative ways of building software, the claims that it does so should be considered unproven hypotheses.
Until these hypotheses are shown to hold for controlled experiments, depending on N-version programming in real systems to achieve ultra-high reliability where people’s lives are at risk seems to us to raise important ethical and moral questions. Attacking us or our papers will not change this.

REFERENCES
[AVI77] A. Avizienis and L. Chen, “On the Implementation of N-Version Programming for Software Fault-Tolerance During Program Execution”, Proc. of Compsac ’77, November 1977, pp. 149-155.

[AVI84] A. Avizienis and J.P.J. Kelly, “Fault Tolerance by Design Diversity: Concepts and Experiments”, IEEE Computer, Vol. 17, No. 8, August 1984, pp. 67-80.

[AVI85a] A. Avizienis, et al., “The UCLA Dedix System: A Distributed Testbed for Multiple-Version Software”, 15th Int. Symposium on Fault-Tolerant Computing, Michigan, June 1985, pp. 126-134.

[AVI85b] A. Avizienis, “The N-Version Approach to Fault-Tolerant Software”, IEEE Trans. on Software Engineering, Vol. SE-11, No. 12, December 1985, pp. 1491-1501.

[AVI87] A. Avizienis, M.R. Lyu, and W. Schutz, “In Search of Effective Diversity: A Six-Language Study of Fault-Tolerant Control Software”, Tech. Report CSD-870060, UCLA, November 1987.

[AVI88] A. Avizienis and M.R. Lyu, “On the Effectiveness of Multiversion Software in Digital Avionics”, AIAA/IEEE 8th Digital Avionics Systems Conference, San Jose, October 1988, pp. 422-427.

[AVI89] A. Avizienis, “Software Fault Tolerance”, IFIP XI World Computer Congress ’89, San Francisco, August 1989.

[AVW87] “Airbus 320, the New Generation Aircraft”, Aviation Week & Space Technology, February 2, 1987, pp. 45-66.

[BIS85] P.G. Bishop, et al., “Project on Diverse Software — An Experiment in Software Reliability”, Proceedings IFAC Workshop Safecomp ’85, Como, Italy 1985.

[CHE78] L. Chen and A. Avizienis, “N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation”, Digest FTCS-8: Eighth International Symposium on Fault-Tolerant Computing, Toulouse, France, June 1978, pp. 3-9.

[ECK85] D.E. Eckhardt and L.D. Lee, “A Theoretical Basis for the Analysis of Multiversion Software Subject to Coincident Errors”, IEEE Trans. on Software Engineering, Vol. SE-11, No. 12, December 1985, pp. 1511-1516.

[ECK89] D.E. Eckhardt, et al., “An Experimental Evaluation of Software Redundancy as a Strategy for Improving Reliability”, Technical Report, NASA/Langley Research Center, submitted for publication.

[JOS88] M.K. Joseph, “Architectural Issues in Fault-Tolerant, Secure Computing Systems”, Ph.D. Dissertation, Dept. of Computer Science, UCLA, 1988.

[KEL83] J.P.J. Kelly and A. Avizienis, “A Specification-Oriented Multi-Version Software Experiment”, Proc. 13th International Symposium on Fault-Tolerant Computing, Milan, Italy, June 1983, pp. 120-126.

[KEL86] J.P.J. Kelly, et al., “Multi-Version Software Development”, Proc. Safecomp ’86, Sarlat, France, October 1986, pp. 43-49.

[KEL89] J.P.J. Kelly, “Current Experiences with Fault Tolerant Software Design: Dependability Through Diverse Formal Specifications”, Conference on Fault-Tolerant Computing Systems, Germany, September 1989, pp. 134-149.

[KNI85] J.C. Knight and N.G. Leveson, “A Large Scale Experiment In N-Version Programming”, Digest of Papers FTCS-15: Fifteenth International Symposium on Fault-Tolerant Computing, June 1985, Ann Arbor, MI, pp. 135-139.

[KNI86] J.C. Knight and N.G. Leveson, “An Experimental Evaluation of the Assumption of Independence in Multi-version Programming”, IEEE Transactions on Software Engineering, Vol. SE-12, No. 1 (January 1986), pp. 96-109.

[MAR87] A.J. Margosis, “Empirical Studies of Multi-Version System Performance”, Master’s Thesis, University of Virginia, January 1988.

[MAR83] D.J. Martin, “Dissimilar software in high integrity applications in flight controls”, Software for Avionics, AGARD Conference Proceedings, No. 330, pp. 36-1 to 36-9, January 1983.

[NAG82] P.M. Nagel and J.A. Skrivan, “Software Reliability: Repetitive Run Experimentation and Modelling”, Technical Report NASA CR-165836, NASA/Langley Research Center, February 1982.
[ROU86] J.C. Rouquet and P.J. Traverse, “Safe and Reliable Computing on Board the Airbus and ATR Aircraft”, Proc. SAFECOMP ’86, Sarlat, France, October 1986, pp. 93-97.

[SCO87] R.K. Scott, J.W. Gault, and D.F. McAllister, “Fault-Tolerant Software Reliability Modeling”, IEEE Transactions on Software Engineering, Vol. SE-13, No. 5, May 1987, pp. 582-592.

[SHI88] T.J. Shimeall and N.G. Leveson, “An Empirical Comparison of Software Fault Tolerance and Fault Elimination”, Proc. 2nd Workshop on Software Testing, Verification, and Analysis, Banff, July 1988. (A more complete description is available as Tech. Report NPS52-89-047, Naval Postgraduate School, July 1989.)

[TSO87] K.S. Tso and A. Avizienis, “Community Error Recovery in N-Version Software: A Design Study with Experimentation”, Digest 17th Int. Symposium on Fault-Tolerant Computing, Pittsburgh, July 1987, pp. 127-133.

[WAT78] H.E. Waterman, “FAA’s Certification Position on Advanced Avionics”, AIAA Astronautics and Aeronautics, May 1978, pp. 49-51.

[YOU85] L.J. Yount, et al., “Fault Effect Protection and Partitioning for Fly-by-Wire/Fly-by-Light Avionics Systems”, AIAA Computer in Aerospace V Conference, Long Beach, August 1985.