1 Local Reasoning about Programs Alter that Structures Data 1 2 3 Peter , and Hongseok Yang , John Reynolds O'Hearn 1 Mary , Univ Queen ersity of London 2 Univ Mellon ersity Carnegie 3 ersity of Illinois at Urbana-Champaign ersity of Birmingham Univ and Univ e an extension of Hoare's logic for reasoning about Abstract. We describ that alter structures. We consider a low-lev el storage programs data with allocation lookup, update, on a heap and model based associated cation and unrestricted address arithmetic. The asser- deallo operations, language tion on a possible worlds model of the logic of bunc hed is based implications, includes spatial conjunction and implication connec- and those of classical Heap operations are axiomatized tives alongside logic. what axioms", each of which mentions only those using we call the \small accessed by a particular command. Through these and a number of cells we show that the formalism supp orts local reasoning: A speci- examples and proof can concen that on only those cells in memory Øcation a trate program accesses. This paper builds on earlier work by Burstall, Reynolds, Ishtiaq and O'Hearn on reasoning about data structures. 1 Introduction Pointers have been a persisten area in program proving. The main di±- t trouble an in-principle adequate of pointer op- culty is not one of Ønding axiomatization there about the way that h between simple intuitions rather erations; is a mismatc y of their axiomatic treatmen ts. For pointer operations work and the complexit pointer assignmen t is operationally simple, but when there is aliasing, example, from several pointers to a given cell, then an alteration to that cell may arising the values unrelated expressions. (See [20, 2, 4, 6] aÆect of many syntactically and references to the literature on reasoning for discussion about pointers.) We suggest that the source of this mismatc h is the global view of state taken in most formalisms for reasoning about pointers. In contrast, programmers reason informally structure algorithms typically work by in a local way. Data such as local surgeries rearrange small applying of a data structure, that parts rotating a small part of a tree or inserting a node into a list. Informal reasoning usually concen trates on the eÆects of these surgeries, without picturing the entire memory of a system. this local reasoning viewp oint as follows. We summarize To understand works, it should be possible for reasoning how a program and speciØcation to be conØned to the cells that the program actually ac- cesses. The value of any other cell will automatically remain unchanged.

2 Local reasoning is intimately to the complexit y of speciØcations. Often, a tied works with ed collection of resources, and it stands to program a circumscrib a speciØcation reason trate on just those resources that a should that concen a program inserts an elemen t into a linked For example, that program accesses. in that list; there is no need list need ely) to know only about the cells (intuitiv in memory when reasoning about the program. cells keep track of all other of the approac h studied The con- central idea in this paper is of a \spatial P Q , that asserts that P and Q hold for separate parts of a data junction" § conjunction provides a way to comp ose assertions that refer to structure. The t areas , while retaining disjoin tness information for each of diÆeren of memory The locality that this provides can be seen both on the level of the conjuncts. heap assignmen ts and the level of comp ound operations or procedures. atomic an alteration to a single cell aÆects P in P § Q , then we know that it When heap ; this gives us a way to short-circuit to check for poten- Q the need will not aÆect Q . On a larger scale, a speciØcation f tial aliases g C f Q g of a heap surgery in P using a rule that lets us infer f P § R can be extended C f Q § R g , which expresses g that heap cells remain unaltered. This enables the initial speciØcation additional footprin P C f Q f to concen trate on only the cells in the program's g t. g The basic idea of the spatial conjunction is implicit in early work of Burstall [3]. It was explicitly describ ed by Reynolds in lectures in the fall of 1999; then an intuitionistic logic on this idea was disco vered indep enden tly by Reynolds based and O'Hearn [7] (who introduced a spatial implication [20] and by Ishtiaq also Q on the logic BI of bunc hed implications [11, 17]). In addition, °§ , based P a classical version of the logic that Ishtiaq and O'Hearn expressiv e devised is more the intuitionistic version. In particular, it can express storage deallo cation. than tly, Reynolds extended the classical version by adding Subsequen pointer arithmetic. extension results in a model that is simpler and more gen- This our previous and opens up the possibilit y of verifying a wider eral than models, of low-lev el programs, including many whose prop erties are di±cult to range out the theme using Mean while, O'Hearn ∞eshed type systems. of local capture reasoning sketched in [7], and he and Yang developed a streamlined presen tation of the logic based on what we call the \small axioms". In this joint paper we presen model and assertion t the pointer arithmetic with the streamlined logic. We illustrate the formalism using language, Hoare that a space-sa ving represen tation of doubly-link ed lists, programs work with that copies and a program a tree. Two points are worth before continuing. First, by local we do not stressing merely comp ositional reasoning: It is perfectly possible to be comp ositional mean and global (in the state) at the same time, as was the case in early denotational models of imperativ e languages. some aspects of this work bear a strong Second, y to seman [19, 15, 16, 13, 12]. In particular, of local state similarit tic models the conjunction is related to interpretations of syntactic control of interference § [18, 10, 12], and the Frame Rule describ ed in Section 3 was inspired by the idea of the expansion from [19, 15]. Nevertheless, local reasoning about of a command osing is not the same thing as reasoning about local state: We are prop state

3 here that and reasoning themselv es be kept conØned, and this is speciØcations whether an issue facilities for hiding state. or not we consider programming Language 2 The Model and Assertion ts, the store the heap. The store is a Ønite model has two comp and The onen from variables to integers. The heap partial by a function mapping is indexed of the integers, using indirect addressing and is accessed E ] Locations subset [ is an arithmetic expression. where E ¢ ¢ g ° 1 ; 0 ; 1 ; ::: g Variables = = f x; y; ::: f :::; Ints 2 μ Ints Locations \ Atoms Locations fg ; nil ; Atoms Atoms = ¢ ¢ Variables Ints Heaps = = Locations * Stores Ints * fin fin ¢ = £ Heaps States Stores to always succeed, a requiremen t on the set for allocation we place In order e integer n , there are inØnitely many sequences of Locations : For any positiv of consecutiv e integers in Locations . This requiremen t is satisØed if we n length Locations to be the non-negativ e integers. (In several example form ulae, we take rely on this choice.) Atoms we could take will implicitly to be the negativ e Then and ° 1. to be nil integers, expressions are determined by valuations Integer and boolean 2 2 s Ints [[ [[ B ]] E ]] f true; false g s 2 of s where Stores includes the domain variables of E or B . The the free grammars for expressions are as follows. E; F; G ::= j 0 j 1 j E + F j E £ F j E ° F x; y; ::: j ::= B ) B j E = F j E < F j isatom ?( E ) j isloc ?( E ) B false expressions isatom ?( E ) and isloc ?( E ) test whether E The or loca- is an atom tion. assertions all of the boolean expressions, the points-to relation The include 7! F , all of classical logic, and the spatial connectiv es emp E § and °§ . , P; Q; R ::= B j E 7! F Atomic Formulae j false j P ) Q j 8 x:P Classical Logic j emp P § Q j P °§ Q Spatial Connectiv es j P Various es are deØned as usual: : P = connectiv ) false ; true = other : ( false ); P _ Q = ( : P ) ) Q ; P ^ Q = : ( : P _: Q ); 9 x: P = :8 x: : P . We use the following in the seman tics of assertions. notations 2 , and dom h ) denotes the domain of deØnition of a heap h 1. Heaps ( dom ( s ) is 2 the domain of s Stores ;

4 0 0 2. indicates that the domains of h and h h are disjoin t; # h 0 § denotes the union of disjoin t heaps (i.e., the union of functions with 3. h h t domains); disjoin except 4. ( j ) is the partial function like f j that i goes to j . This notation f i 7! i is and is not in the domain of f . is used both when judgemen t s; h j = P which says that an assertion We deØne a satisfaction and heap. ), where assumes that Free( P ) μ dom ( s for a given store holds (This ) is the set of variables freely P in P .) Free( occurring = j iÆ [[ B ]] s = true s; h B j = E 7! F iÆ f [[ E ]] s g = dom ( h ) and h ([[ E ]] s ) = [[ F ]] s s; h j s; h never = false then = Q iÆ if s; h j = P ) s; h j = Q j P s; h 2 8 x:P s; h 8 v j Ints : [ s j x 7! v ] ; h j = P = iÆ j = emp iÆ h = [ ] is the empt y heap s; h h j P § Q iÆ 9 h s; h ; h Q : h = # h j ; h s; h § = and = h; s; h P j = 1 0 0 1 1 0 1 0 0 0 0 0 °§ Q iÆ 8 h : s; h if h j # h and s; h = j = P then s; h § h P j = Q Notice that tics of E 7! F is \exact", where it is required that E is the seman activ e address t heap. Using § we can build up descriptions the only in the curren For example, 10) describ 7! 3) § (11 7! heaps. es two adjacen t cells of larger (10 contents are 3 and 10. whose E = F is completely hand, indep enden t (like all boolean On the other heap expressions). As a consequence, a conjunction ( E = and integer ) § P is true F just when = F holds in the curren t store and when P holds for the same store E heap in the curren t one. and some contained t to have syntactic sugar for describing adjacen t cells, It will be convenien E and for an exact y. We also have sugar for when of equalit is an activ e form address. ¢ 7! F ) ; :::; F F ( = E E 7! F 7! ) §¢¢¢§ ( E + n 0 n 0 n : ¢ F = ( E = F ) ^ emp = E ¢ 2 9 = E y:E 7! y ( y 6 { Free( E )) 7! : A characteristic prop erty of = is the way it interacts with § : : E = ( F ) § P , ( E = F ) ^ P: As an example of adjacency an \oÆset list", where the next node in , consider by adding an oÆset to the position of the curren t node. a linked list is obtained Then the form ula x ( 7! a; o ) § ( x + o 7! b; ° o )

5 describ es a two-elemen oÆset list that contains a and b in its head t, circular, and Øelds For example, in a store where x = 17 and oÆsets in its link Øelds. of a heap ula is true o = 25, the form b 17 42 a 18 -25 43 25 is a model of (the Boolean version of) the logic The seman tics in this section [11, 17]. This that the model validates all the hed implications of bunc means comm utativ e monoid laws for emp and § , and the \parallel laws of classical logic, § rule" for °§ . for and \adjunction rules" R ) S ) P Q R ) Q § S P § ) R S Q °§ P ) R ) S P § R S R ) P P § Q ) S °§ facts, in the speciØc model, include Other true ° ¢ 0 0 0 § ( E = 7! F ( ) § true E ) E 6 7! E F emp , 8 ) : ( x 7! { § true ) x: See [21] for a fuller list. Core 3 The System we presen t the core system, which consists of axioms In this section for commands that alter the state as well as a number of inference rules. We will describ e the meanings for the various commands informally , as each axiom is discussed. There is one axiom atomic commands. We emphasize that for each of four side of := is an expression occurring in the forms x := [ E ] the right-hand not ¢ := x E and ; :::; E Only ); [ cons ] and cons do not appear within expressions. ( k 1 := E is a traditional assignmen x atomic command that can t, and it is the only be describ assignmen t axiom. In the axioms x; m; n are assumed ed by Hoare's variables. to be distinct Small Axioms The E 7! { g f E ] := F f E 7! F g [ f E 7! { g dispose ( E ) f emp g : x = m g x := cons ( E x ; :::; E g ) f f 7! E ] [ m=x ] ; :::; E m=x [ k 1 k 1 : : := = n g x f E f x x = ( E [ n=x ]) g ] f 7! n ^ x = m g x := [ E E f x = n ^ E [ m=x ] 7! n g

6 The Str Rules uctural Rule Frame g C f Q g f P fg ModiØes( C ) \ Free( R ) = § C f Q § R g g R P f Elimination Auxiliary Variable C f f g g P Q 2 Free( C ) 6 x g f9 f9 x:Q g x:P C Variable Substitution g ∂ ), and P; C; Q Free( ; :::; x f x 1 k f P Q f C g g 2 ModiØes( C ) implies x i =x ; :::; E =x E )[ ( f P g ] f Q g C 1 1 k k not free in any other is a variable E E i j of Consequence Rule 0 0 f P g C f Q g Q ) P ) P Q 0 0 P g C f Q g f Ørst small axiom just says that if E points to something beforehand The (so it is activ it points to F afterw ards, and it says this for a small portion of e), then in which onds is the only activ e cell. This corresp the state to the operational E the value of [ idea F as a command that stores E of F at address E in ] := the heap. The axiom also implicitly says that the command does not alter any variables; this is covered of its ModiØes set below. by our deØnition dispose ( ) instruction deallo cates the cell at address E . In the post- The E dispose emp is a form ula which says that the heap is for the axiom condition are activ e). So, the axiom states that if empt is the sole activ e y (no addresses E and it is disposed, then in the resulting state there will be no activ e address Here, addresses. points-to relation is necessary , in order to be able to the exact conclude on termination. emp cells, x cons ( E := ; :::; E k ) command allocates a contiguous segmen The t of k 1 , and places to the values E cell ; :::; E initialized of in x the address of the Ørst 1 k from the segmen t. The precondition of the axiom uses the exact equalit y, which implies that is empt y. The axiom says that if we begin with the empt y the heap and a store contiguous x = m , we will obtain k heap cells with appropriate where of The m in this axiom is used to record the value values. x before the variable command is executed. We only get Øxed-length allocation from x := cons ( E ). It is also ; :::; E k 1 alloc to form an axiom for a command x possible ulate ( E ) that allocates a := segmen t of length E ; see [21]. We have also included axioms for the other two commands, but they are small less importan t. These commands are not traditionally as problematic, because they do not involve heap alteration.

7 The small are so named because each mentions only the area of heap axioms by the corresp onding For [ E ] := F and x := [ E ] this is accessed command. for dispose cons precisely those cells allocated or one cell, in the axioms or and in E := are mentioned, no heap cells are accessed. deallo x cated referred to in the structural rules is the standard of free variable The notion C ) is the set of variables that are assigned to within C . The Mod- one. ModiØes( g set of each of cons ( E iØes ; :::; E for ), x := E and x := [ E ] is f x := , while x k 1 y. Note E [ E ] := dispose it is empt ) and that the ModiØes set only tracks ( F to the store, and says nothing potential alterations cells that about the heap migh t be modiØed. the Rule of Consequence tically . That is, when In this paper we treat seman 0 0 P and Q ) Q store/heap are true in the model for arbitrary P the premisses ) we will use the rule without proving the premisses. pairs, formally codiØes Rule of local behaviour. The idea is that the pre- The Frame a notion in f P g C f Q g speciØes an area condition as well as a logical prop erty, of storage, that t for C to run and (if it terminates) establish postcondition Q . is su±cien execution a state that has additional heap cells, beyond those If we start with ed by P , then the values of the additional cells describ unaltered. We will remain use to separate out these additional cells. § invarian t assertion R is what The McCarth y and Hayes called a \frame axiom" [9]. It describ es cells that are not accessed, and hence by C . not changed, example, using we can prove that assigning As a warming-up the Frame Rule cons t of a binary the second comp onen t. onen to the Ørst comp cell does not aÆect 7! a g [ x ] := b f x 7! b g f x Frame x 7! a ) § ( x + 1 7! c ) g [ x ] := b f ( x 7! b ) § ( x + 1 7! c ) g ( f Syntactic Sugar a; c g [ x ] := b f x 7! b; c g f 7! x c overlap variables between x + 1 7! of free and [ x ] := b is allowed here The because ModiØes([ x ] := b ) = fg . 4 Deriv ed Laws small axioms but not practical. Rather, they represen t a kind The are simple t, an extreme take on the idea that a speciØcation can of though t experimen trate on just those cells that a program concen accesses. In this we show how the structural rules can be used to obtain a section convenien ed laws (most number of more of which were taken as primitiv e t deriv in [20, 7]). Although state a completeness result, along we will not explicitly the way we will observ e that weakest preconditions or strongest postconditions are deriv able commands. This shows a sense in which for each of the individual each small is missing system, and nothing the claim that in the core justiØes axiom gives enough information to understand how its command works. We begin with [ E ] := F . If we consider an arbitrary invarian t R then we obtain the following deriv ed axiom using the Frame Rule with the small axiom

8 as its premise. f 7! {) § R g [ E ] := F f ( E 7! F ) § R g ( E expresses a kind t to [ E ] aÆects the heap cell This axiom of locality: Assignmen aÆect there R . In particular, only, and so cannot is at position E the assertion alias R . With several more steps of Auxiliary to generate no need checks within we can obtain an axiom that is essen tially the one from Variable Elimination [20]: ; x ; ¢¢¢ ; x f9 : ( E 7! {) § R g [ E ] := F f9 x x ; ¢¢¢ g R : ( E 7! F ) § n n 1 1 2 Free( where x 6 ; :::; x E; F ). n 1 2 ose x 6 For allocation, Free( E supp ; :::; E of the ). Then a simpler version k 1 is small axiom g x := cons ( E g ; :::; E f ) f x 7! E emp ; :::; E k k 1 1 ed using rules for auxiliary variables and Consequence. If, This can be deriv 2 is an assertion where x 6 Free( R ) then R further, emp g x := cons ( E f ; :::; E g ) f x 7! E ; :::; E 1 k 1 k Frame § R g x := cons ( E ) ; :::; E f emp f ( x 7! E g ; :::; E R ) § k 1 k 1 Consequence R g x := cons ( E f ; :::; E g ) f ( x 7! E R ; :::; E § ) k 1 1 k conclusion is the strongest and a varian t involving auxiliary The postcondition, 2 ; :::; E the case Free( R; E when x handles ). variables k 1 laws, recall As an example ( x 7! a; o ) § ( x + o 7! of the use of these the assertion ° o ) that describ es a circular oÆset-list. Here is a proof outline for a sequence b; that of commands such a structure. creates emp g f cons a; a ) x ( := 7! a; a g f x := cons ( b; b ) t b; b ( 7! a; a ) § ( t 7! x ) g f [ x + 1] := t ° x f ( x 7! a; t ° x ) § ( t 7! b; b ) g [ t x ° t + 1] := ( ) 7! a; t ° x ) § ( t 7! b; x ° t f g x + f9 x 7! a; o ) § ( x ( o 7! b; ° o ) g o: The last step, which is an instance of the Rule of Consequence, uses t ° x as the witness for . Notice how the alterations in the last two commands are done o locally. For example, of the placemen t of § we know that x + 1 must be because diÆeren t from t and t + 1, so the assignmen t [ x + 1] := t ° x cannot aÆect the t 7! b; b conjunct.

9 If we wish to reason then °§ can be used to express weakest backwards, Given an arbitrary postcondition , choosing ( E 7! F ) °§ Q as preconditions. Q for [ the invarian F E t gives a valid precondition ] := E { [ E 7! F f g 7! F g E f ] := Frame 7! {) § (( E 7! F ) °§ Q ) g [ E ] := F f ( f 7! F ) § (( E 7! F ) °§ Q ) g ( E E Consequence 7! {) § (( E 7! F ) °§ Q E g [ E ] := F f Q g ( f ) Consequence step uses an adjunction rule for § and °§ . The precondition The is in fact the weakest: it expresses as deletion obtained followed by the \update idea extension" for allocation can also explained in [7]. The weakest precondition °§ be expressed . with can be computed The weakest precondition , because the dispose for directly dispose ( E ) is empt ModiØes set of y. E { g dispose ( 7! ) f emp g f E Frame E § {) § R g dispose ( E ) f emp ( R g f 7! Consequence E {) § R g dispose ( 7! ) f R g ( f E conclusion is (a unary version of) the axiom for dispose from [7]. The The weakest precondition for x := E is the usual one of Hoare. For axiom 2 (where x using 9 to form a \let binder" E n 6 := [ Free( E; P; x ). ] is it similar, f P [ E=x ] g x := E f P g f9 n: true § E 7! n ) ^ P [ n=x ] g x := [ E ] f P g ( formal deriv of these laws from the small axioms make heavy use of The ations and Variable Substitution Elimination; the details are con- Auxiliary Variable thesis [24]. in Yang's tained 2 deriv ed law for x := [ E ] is for the case when x 6 Free( Another E; R ), useful 2 6 Free( E ), and when the precondition is of the form y E 7! y ) § R . Then, ( ) ( 7! y ) § R g x f E ] f ( E 7! x E § R [ x=y ] g : := [ 5 Beyond the Core In the next few sections we give some examples of the formalism at work. In these examples we use sequencing, if - then - else , and a construct newvar for declaring a local variable. the core system with their usual Hoare We can extend rules. logic 0 ^ B g C f Q g f f ^: B g C P f Q g P g R f C g Q g f Q f C g P f 1 2 0 f C else C then B if g g P f f P g C g ; C R f Q 2 1 f P g C f Q g 2 P; Q x 6 Free( ) Q g f g newvar x: C P f We will also use simple Ørst-order procedures. The procedure deØnitions we need will have the form

10 procedure p x ( ; :::; x ) ; y n 1 B x are variables not changed in the body B and y is a variable ; :::; x where 1 n headers all of the variables to. Procedure that is assigned will always contain body. Accordingly , we deØne occurring freely in a procedure y ( ; :::; x ModiØes( ; p )) = f y g x 1 n p x Free( ; :::; x . ; y )) = ( x g ; :::; x ; y f n 1 n 1 these clauses when applying the structural rules. In the examples We will need mechanism by-name the calling for all the parameters, can be taken to be either on the x 's and by-reference on y . or by-value i are used 7 mainly to help structure the presen tation, Procedures in Section use recursiv e calls. There we appeal to the standard but in Section 6 we also rule which allows us to use the speciØcation we are trying to partial correctness when reasoning about the body [5]. prove as an assumption t in what follows will not be completely formal. We will continue Our treatmen of Consequence tic way, and we will make inductiv e to use the Rule in a seman without formally deØning their seman tics. Also, as is common, we will deØnitions rather t program annotated with intermediate assertions, speciØcations presen than give step-b y-step proofs. 6 Tree Copy In this section we consider a procedure for copying a tree. The purp ose of the example is to show the Frame in action. Rule oses a tree be an atom a or a pair ( ø For our purp ; ø will either ) of trees. 2 1 Here of a predicate tree ø i which says when a number is an inductiv e deØnition ts a tree ø . represen i ¢ ^ () i = a tree isatom a i a ) ^ emp ?( ¢ ø ) ; ø y ) i tree () 9 x; y: ( i 7! ( ) § ( tree ø ø x § tree x; y 1 2 2 1 These two cases are exclusiv e. For the Ørst to be true i must be an atom, where in the second it must be a location. when tree is \exact", in the sense that predicate it is true the curren t The ø i heap those heap cells used to represen t the tree. If ø has must have all and only n pairs in it and s; h j = tree ø i then the domain of h has size 2 n . The of the CopyTree procedure is speciØcation © ™ © ™ CopyTree ( p ; q tree ø p ( tree ø p ) § ( tree ø q ) ) : and here is the code.

11 procedure CopyTree p ; q ) ( 0 0 i; j; i : newvar ; j ø p g tree f ) p ?( then if isatom emp ^ ?( f ) ^ isatom g p = ø p ø p ) § ( tree f ) g ( tree ø p p q := ) ø p § ( f ø q ) g ( tree tree else : ; ø g ; x; y: ø f9 = ( ø ) ; ø y ) § ( p 7! x; y ø § ( ) ø ø x ) § ( tree tree 1 2 1 2 1 2 p j := [ p + 1]; i := [ ]; : ; ø g : ø f9 = ( ø ) ; ø j ) § ( p ø i; j ) 7! ( tree ø ø i ) § ( tree § 1 2 1 2 1 2 0 i ( i ); CopyTree ; : 0 ; ø ) : ø ) = ( ø f9 ; ø i ) § ( p 7! i; j ø § ( tree g ø i ) § ( tree ø tree j ) § ( ø 2 1 2 1 2 1 1 0 ( j j ); CopyTree ; : 0 tree : ø i = ( ø ) ; ø f9 ) § ( p 7! i; j ) § ( tree ø i ø ) § ( ; ø ø ø j ) § ( tree 2 1 1 2 1 2 1 0 ) ( j tree ø g § 2 0 0 ( i ) ; j q := cons : 0 ø ) : ø ø = ( ø f9 ; ø i ) § ( p 7! i; j ) § ( tree ø tree i ) § ( tree ø ( j ) § ; ø 2 1 1 2 1 2 1 0 0 0 g j ( ) § ( q § i ø ; j tree ) 7! 2 ( ( ø p ) f tree tree ø q ) g § Most of the steps are straigh tforw ard, but the two recursiv e calls deserv e special commen the body of the procedure we get to use the speci- t. In proving of Øcation at Ørst sight the speciØcation does as an assumption. CopyTree But 0 ; since CopyTree ( i to be sure i we need ) enough, not appear to be strong that p 7! i; j and tree ø . Similarly j does not aÆect , we need that the assertions 2 0 0 ( ) does not aÆect tree ø ; i j . j CopyTree 1 prop erties are obtained from two instances of the These \does not aÆect" Rule: Frame 0 0 g i g f ( i ; i tree ) f ( tree ø ) i ) § ( tree ø ø i CopyTree 1 1 1 : i = ( ø g ; ø ) ) § ( p 7! i; j ) § ( tree f j ø ) § ( tree ø ø 2 2 1 1 0 ( i CopyTree i ) ; : 0 ø = ( ø g ; ø ) ) § ( p 7! i; j ) § ( tree ø f i ) § ( tree ø i j ) § ( tree ø 1 2 1 1 2 and 0 0 j CopyTree ( j ; j tree ) f ( tree ø j ø ) § ( tree ø f j g ) g 2 2 2 : 0 ø ; ø f ) § ( p 7! i; j ) § ( tree ø i ) § ( tree ø j ) § ( tree ø i ø ) g = ( 1 2 1 2 1 0 ( j ; j ) CopyTree : 0 0 = ( ø : ; ø g ) § ( p 7! i; j ) § ( tree ø ) i ) § ( tree ø ø j ) § ( tree ø j i f ) § ( tree ø 2 1 1 2 1 2 the required Then, for the calls are obtained using Auxiliary Variable triples Elimination to introduce 9 ø the ; ø to strip . (It would also have been possible 2 1 existen tial at the beginning of the proof of the else part, and then reintroduce it after Ønishing instead of carrying it through the proof.)

12 This section two main points. First, if one does not have some illustrates ting or inferring axioms, then the proofs of even simple way of represen frame procedure programs In particular, for recursiv e calls with will not go through. tion is essen tial if one is to obtain strong enough atten programs to framing The CopyTree procedure could not be veriØed without induction hypotheses. the Rule, we were to complicate the initial speciØcation by including Frame unless represen of frame explicit axioms. some tation the speciØcation CopyTree illustrates the idea of a speciØcation Second, of concen trates only that cells that a program accesses. And of course on those these some way to infer frame axioms, or else two points are linked; we need would be too weak. such a speciØcation 7 DiÆerence-link ed Lists The purp ose of this section the treatmen t of address arithmetic, is to illustrate also disposal. We do this by considering a space-sa ving represen tation and of doubly-link ed lists. , a node in a doubly-link a data Øeld, together Conventionally ed list contains a Øeld storing with n to the next node and another storing a pointer p a pointer to the previous node. In the diÆerence represen tation we store n ° p in a single Øeld rather have separate Øelds for n and p . In a conventional doubly-link ed than to move either or backwards from a given node. In list it is possible forwards ed list given the curren t node c we can lookup the diÆerence a diÆerence-link by itself, = ° p d and previous pointers. This diÆerence does not, n between next give us enough information to determine either n or p . However, if we also know p we can calculate n as d + p , and similarly given n we can obtain p as n ° d . So, using the diÆerence it is possible to traverse the list in either represen tation, as we keep track of the previous or next node as we go along. direction as long more time-e±cien t, represen tation is sometimes given using the A similar, than diÆerence. their xor of pointers rather of a predicate dl . If we were working with conven- We now give a deØnition 0 0 ed lists then dl a ond to ¢¢¢ a corresp ( i; i tional ; j; j doubly-link ) would n 1 0 i j a a n 1 . . . j 0 i . . .

13 0 Typically i and back j , a doubly-link would satisfy the predicate ed list with front 0 0 ( nil ; j i; ). The reason for the internal nodes i ; and j is to allow us to dl nil Æ not terminated partial nil . consider lists, by dl doubly-link ed lists was given in [20]. The of A deØnition for conventional we must make is to use main alteration a a n - p n instead of p to represen t a node. Here is the deØnition. ¢ 0 0 0 0 ^ ( () emp ^ i ≤ j ; j; j i i; i = j ) dl = ¢ 0 0 0 0 dl ) aÆ () 9 j: ( i 7! a; j ( i i; i ) § dl Æ ( j; i; k; k ° ) ; k; k juxtap We are using t the consing of an elemen t a onto the osition to represen , and to represen t the empt y sequence. As a small example, Æ ≤ front of a sequence (5 ; 1 ; 3 ; 8) is true of dl ab 8 a b 5 9 6 8-1 3-5 works It is instructiv consisting e to look at how this deØnition for a sequence 0 0 a . For dl a ( i; i of a single ; j; j elemen ) to hold we must have 9 x: ( i 7! a; x ° t, 0 0 § dl ≤ ( x; i; j; j to be ); we can pick x ) j , as suggested by the i = j part of the i 0 for ≤ . We are still left, however, with the requiremen t that i = j case , and this 0 0 0 0 in fact leads i us to the characterization ^ i = j 7! of dl a ( i; i a; j ; j; j ° ). i t list exempliØes how the case is arranged to be compat- Thus, a single-lemen ≤ the operation an elemen t onto the front of a sequence. The ible with of consing 0 0 = j and i requiremen = j roles of the ts are essen tially reversed for the dual i of adding operation elemen t onto the end of a sequence. This operation, a single as follows. is characterized 0 0 0 0 0 0 0 0 j ( i; i dl Æ ( dl : ; k ; k; k ; j Æa ) § k ) 7! a; k ° j , 9 i; i In the examples to come we will also use the following prop erties. : 0 0 ^ dl Æ ( i; nil ; j; j j = ) 9 Ø; a; k: Æ nil = Øa § 6 ) 0 0 0 dl ; j Ø ; k ) § j ( 7! a; j ° k i; i 0 0 ( i; i nil ; j; dl ) ) emp ^ Æ = ≤ ^ i Æ = nil ^ i = j 0 0 0 0 ; i dl ; j; j ( ) ) emp ^ Æ = ≤ ^ j = nil ^ i nil = j Æ

14 Doubly-link ed lists used to implemen t queues, because they make are often to work at either end. an enqueue operation. it easy We axiomatize give the code all at once, Rather it will be helpful than to use a procedure setting pointer . Supp ose we are in the of a right to encapsulate the operation 0 ts pointing , whose diÆerence Øeld represen position on the of having a pointer j the right pointer so that right to, say, k instead. j . We want to swing it points to of the procedure speciØcation The is 0 0 0 nil ; j; j g ) f setrptr ( dl Æ ; k ; i ) f dl Æ ( i; nil ; k; j ( ) g : i; j; j 0 handles the Æ = ≤ case, when j that does not point to Notice this speciØcation e cell. an activ and proof of setrptr for a momen t, we can use it Postponing the deØnition t for putting an value a on the end of a queue. to verify a code fragmen dl Æ ( front; nil ; nil ; back ) g f t := ; back dl Æ front; nil ; nil ; t ) g f ( cons a; nil ° t ); := ( back Æ ( front; nil ; nil ; t ) f back 7! a; nil ° t g dl § ( nil ; t; back ; front ) setrptr dl Æ ( front; nil ; back ; t f § back 7! a; nil ° t g ) f Æa ( front; nil ; nil ; back ) g dl code creates . node containing the value a and the diÆerence nil ° t The a new the procedure nil setrptr ( Then, ; t; back ; front ) swings the right pointer call with t so that the next node becomes back . In the assertions, the eÆect associated of back := cons ( a; nil ° t ) is axiomatized by tacking § ( back 7! a; nil ° t ) onto its precondition. This ; because of the placemen t sets us up for the call to setrptr t we know that back 7! a; nil ° ( ). More precisely , § the call will not aÆect of using Variable Substitution to instan the triple the for the call is obtained tiate and the Frame with ( back 7! a; Rule ° t ) as the invarian t. speciØcation, nil 0 is an implemen tation of setrptr ( j; j ; ; k Finally i ). , here 0 dl i; nil ; j; j f ) g ( Æ 0 = nil then j if 0 = ≤ ^ emp ^ j nil = f g Æ := i k 0 nil = ^ emp ^ j Æ = ≤ ^ i = k g f else : 0 0 0 0 0 p Æ ( b ) § dl Æ ; b; p: ( i; nil ; j Æ ; p ) § ( j f9 7! b; j ° Æ ) g = 0 0 d d: d + 1]; [ j j + 1] := k + := [ ° j newvar : 0 0 0 0 0 f9 = Æ Æ b ) § dl Æ ( ; b; p: i; nil ; j ( ; p ) § ( j Æ 7! b; k ° p ) g 0 f ( i; nil ; k; j ) g Æ dl tricky part in the veriØcation is the else branc h of the conditional, where The 0 the diÆerence Øeld of j the code has to update appropriately so that k becomes 0 by adding j the next . It updates the Øeld node of k and subtracting j ; since

15 the Øeld initially j ° p , where p is the address of the previous node, such stores results calculation ° p . in the value k orary variable in the else branc h is a minor irritation. The use of the temp d 0 0 +1] j +1] := k +[ j [ simply ° j if we were to allow nesting more We could write ]. An unresolv ed question is whether, in our formalism, such nesting could of [ ¢ in a way simpler with compiling it out using temp orary variables. be dealt than t for code that ts a dequeue developmen implemen Now we sketch a similar 0 0 ( i; i operation. ; k ; In this case, we use a procedure ), which is similar setlptr j setrptr except that it swings a pointer to the left instead of to the right. to 0 0 0 0 0 ) ; i; i ) g setlptr ( i; i ( ; k ; j Æ ; j f dl Æ ( i; k; nil ; j dl ) g f nil dequeue operation remo ves the Ørst elemen t of a queue and places its The in x data . ) aÆ front; nil ; dl ; back ( g f nil 0 0 0 7! a; n f9 ° nil § dl Æ ( n n ; front; nil ; back ) g : front := := [ d := [ front + 1]; n ]; d + nil ; x front : f = a § front 7! a; n ° nil § dl Æ ( n; front; nil ; back ) x g dispose front ); dispose ( front + 1); ( : ) = a § dl f ( n; front; nil ; back x g Æ setlptr ( n; front; nil ; back ) : x dl = a § f Æ ( n; nil ; nil ; back ) g This code stores of the Ørst node in the variable x and obtains the the data pointer n arithmetic with the diÆerence Øeld. The placemen t of § next using front two front + 1: The precondition to these us up for disposing sets and § alent to an assertion ( is equiv 7! a ) of the form ( front + 1 7! commands front 0 ° nil ) § R , which is compatible with what is given by two applications n of rule . After dispose the weakest precondition the disposals have been done, for of the call n; front; nil ; back ) resets the diÆerence Øeld ( the procedure setlptr n so that its previous node becomes nil . node 0 0 ( code for setlptr ; k ; j The ) is as follows. i; i 0 0 ( f dl ; nil ; j Æ ) g i; i if i = nil then f Æ = ≤ ^ emp ^ i = nil g 0 j k := 0 = ≤ ^ emp ^ i = nil f k = j Æ g ^ else : 0 0 0 0 0 f9 = aÆ Æ ) § dl Æ ) ( n; i; nil ; j ; a; n: ) § ( i 7! a; n ° i ( Æ g 0 k i + 1] + i [ ° + 1] := [ i : 0 0 0 0 Æ f9 = aÆ Æ ) § dl Æ ( ; a; n: n; i; nil ; j ( ) § ( i 7! a; n ° k ) g 0 f dl Æ ( i; k; nil ; j ) g

16 8 Memory Faults t SpeciØcations and Tigh a seman tics of commands interpreta- In this paper we will not include or precise but in this section tion of the seman tic we give an informal of triples, discussion that system relies on. of triples the axiom prop erties f P g C f Q Usually is interpreted \loosely", in the sense , the speciØcation form g migh t cause state changes not describ ed by the pre and postcondition. C that to the need for explicit frame axioms. An old idea is to instead This leads con- a \tigh of f P g C f Q g , which should guaran tee that C only sider t" interpretation resources P in those and Q ; unfortunately , a precise deØni- alters mentioned of the meaning of tight speciØcations e [1]. However, the tion has proven elusiv from of local reasoning where a speciØcation and description the Introduction, trate on a circumscrib ed area of memory , requires something proof concen like tightness. need for a tight interpretation is also clear from the small axioms, The of or the speciØcations and CopyTree . setlptr , setrptr calls of memory fault. This can be the model here To begin, for a notion that there is an \access bit" associated with each location, pictured by imagining of the heap. to read Any attempt is in the domain which is on iÆ the location whose access bit is oÆ causes a memory fault, so if E is not or write a location 0 then [ an activ ] := E e address or x := [ E ] results in a fault. A simple way E to interpret ( E ) is so that it faults if E is not an activ e address, and dispose turns bit oÆ. otherwise the access a speciØcation f P g C Then, Q g holds iÆ, whenev er C is run in a state satisfy- f ing P : (i) it does not generate a fault; and (ii) if it terminates then the Ønal state satisØes Q is a partial correctness interpretation; the total correctness vari- . (This (ii) by requiring that are no inØnite reductions.) For example, ant alters there f voiding f 17 to the fault-a { g [17] := 4 interpretation, 17 7! 4 g holds according 7! f true g [17] := 4 f 17 7! 4 g does not. The latter triple fails because the empt y but heap satisØes but [17] := 4 generates a memory fault when executed in the true y heap. empt faults are precluded by the assumptions E 7! { and E 7! n in In the logic, 0 x the preconditions E ] := E of the small , axioms := [ E ] and dispose ( E ). for [ The main point of this section is that this fault-a voiding interpretation of f P C f Q g gives us a precise form ulation of the intuitiv e notion of tightness. (We g that of enabled faults, or a notion emphasize action, and we do not this requires that analysis a general claim of the notion of tight speciØcation.) it constitutes The of memory faults in speciØcations ensures that a well- avoidance speciØed program can only dereference (or dispose) those heap cells guar- anteed to exist or those which are allocated during by the precondition, execution. C a program proved to satisfy Concretely P g , if one executes f Q g , starting in a f state satisfying P , then memory access bits are unnecessary . A consequence is don't that to explicitly describ e all the heap cells that it is not necessary change, because those not mentioned automatically stay the same.

17 Fault avoidance f P g C f Q g ensures that if C is run in a state strictly in than one satisfying , then any additional cells must stay unchanged; larger P any of the additional an attempt falsify the speciØcation, cells to write would satisfying a fault generate to a smaller heap when P . it would because applied f 17 7! { g C f 17 7! 4 g For example, then f (17 7! {) § (19 7! 3) g C f (17 7! if holds § 7! 3) g should (19 by the Frame Rule, because any 4) as well, as mandated to dereference address 19 would falsify f 17 7! { g C f 17 attempt 4 g if we give 7! C where the access bit for 19 is turned oÆ. (This last step is delicate, a state one could entertain such as to test whether an access bit is in that operations, is generally for it is a notion which can be it; what on, which contradict needed but not the programming language.) detected in the logic 9 Conclusion the paper by suggesting that the main challenge facing veriØcation We began for pointer programs is to capture the informal local reasoning used formalisms or in textb yle argumen by programmers, ts about data structures. Part of ook-st y is that exacerbate the frame problem [9, 1]. (It is only the di±cult pointers of the di±cult y because the frame problem does not, by itself, part say anything about aliasing.) e programs the problem is to Ønd a way, preferably For imperativ succinct e, to describ e or imply the frame axioms, which say what and intuitiv memory cells are not altered by a program or procedure. Standard metho ds, such as listing the variables migh t be modiØed, do not work easily for pointer that because there many cells not directly named by variables in programs, are often fragmen t be accessed cells migh or program by a program by a program t. These pointer chains , or they migh t not be accessed even when following in memory they are reachable. h taken here is based on two ideas. The Ørst, describ ed in Section The approac voiding interpretation of triples to ensure that additional cells, 8, to use a fault-a e but not describ are not altered during execution. activ ed by a precondition, second is to use the § connectiv e to infer invarian t prop erties The by implied these tight speciØcations. problem for programs is perhaps The frame approac hable than the gen- more eral frame problem. Programs come with a clear operational seman tics, and one can appeal to concrete notions footprin t. But the metho ds such as a program's also appear to be more applicable. It would be interesting to give here generally comparison with ideas from the AI literature [22], as well as with vari- a precise on ModiØes ations [1, 8]. We hope to report further on these matters { clauses in particular outlined in Section 8 { in the future. (Several relev ant on the ideas developmen ts can be found in Yang's thesis [24].) There are several immediate for further work. First, the interaction directions reasoning and we do not mean di±cult, between local and global to is in general imply things always go as smoothly as in the example programs we chose. that They Øt our formalism nicely because their data structures break naturally into that disjoin and data structures t parts, use more sharing are more di±cult to

18 handle. This tree represen tations that allow sharing of subtrees, and includes structures. Yang has treated example, the Shorr-W aite graph graph a nontrivial using the spatial °§ is used to deal with the marking algorithm, implication [23]. More experience is needed in this direction. Again, the found sharing there is not to Ønd a system that is adequate in principle, but challenging problem is to Ønd rules idioms that cover common cases simply and rather or reasoning . naturally Second, in examples in this paper is only semi-formal, be- done the reasoning tically when applying the Rule of Consequence. We cause we have worked seman axioms to supp ort a number of examples, but a comprehensiv e know of enough of the proof theory of the assertion language is needed. Pym has worked study of the underlying logic we can draw on. But here out a proof theory BI [17] that an analysis erties special model of BI and thus require of prop we use a speciØc needed is a thorough treatmen t of recursiv e deØnitions to that model. Also of predicates. Finally , the examples involving address arithmetic with diÆerence-link ed lists It would be interesting to try to verify are simplistic. substan tial programs more that tially on address arithmetic, such as memory allocators or garbage rely essen collectors. Ackno . wledgements would like to thank Richard Bornat, Cristiano O'Hearn and David Calcagno Pym for discussions about local reasoning and bunc hed logic. He was supp orted by the EPSR C, under the \VeriØed Byteco de" and \Local Reasoning about State" grants. Reynolds orted by NSF grant CCR-9804014. Yang was was supp orted by the NSF grant INT-9813854. supp under References J. Mylop and R. Reiter. On the frame problem [1] A. Borgida, in procedure oulos, Transactions of Softwar e Engine ering , 21:809{838, 1995. speciØcations. IEEE Proving pointer programs in Hoare logic. Mathematics of Program [2] R. Bornat. , 2000. Construction Burstall. Some techniques for proving correctness of programs which alter [3] R.M. structures. 1972. Intel ligenc e , 7:23{50, data Machine S. Isthiaq, tic analysis Seman [4] C. Calcagno, of pointer aliasing, and P. W. O'Hearn. and allocation in Hoare logic. Proceedings of the Second International disposal ACM SIGPLAN Conference on Principles and Practice of Declarativ e Program- ming, 2000. Metho for proving programs. In J. van Leeu wen, editor, [5] P. Cousot. ds and logics ook of Theoretical Computer Scienc e , volume B, pages 843{993. Handb Elsevier, Amsterdam, MIT Press, Cam bridge, and The 1990. Mass., [6] C. A. R. Hoare and J. He. A trace model for pointers and objects. In Rachid Guer- raoui, editor, ECCOP'99 - Object-Oriente d Programming, 13th European Confer- ence 1{17, 1999. Lecture Notes in Computer Science, Vol. 1628, Springer. , pages [7] S. Isthiaq and P.W. O'Hearn. BI as an assertion language for mutable data struc- tures. Confer ence Record of the Twenty-Eighth Annual ACM Symp osium on In 2001. of Programming Languages , pages 39{46, London, January Principles

19 [8] K. R. M. Leino and G. Nelson. abstraction and information hiding. Technical Data h Report 160, Compaq Researc h Center, Palo Alto, CA, Report Reearc Systems November 2000. y and P. Hayes. Some from the standp oint of philosophical [9] J. McCarth problems Machine ligenc e , 4:463{502, 1969. intelligence. artiØcial Intel Resource interpretations, bunc hed implications and the Æ∏ - [10] P. W. O'Hearn. In Typed -calculus and Applic ations , J-Y Girard editor, L'Aquila, Italy, calculus. ∏ Lecture Science in Computer 1999. 1581. April Notes and P. W. O'Hearn The logic of bunc hed implications. Bulletin of [11] D. J. Pym. olic Logic , 5(2):215{244, June 99. Symb P. W. O'Hearn [12] J. C. Reynolds. From Algol to polymorphic linear lambda- and J. ACM , 47(1):267{223, 2000. calculus. January and R. D. Tennen y and local variables. J. ACM , P. W. O'Hearn t. Parametricit [13] Also in [14], vol 2, pages 42(3):658{709, May 1995. 109{164. P. W. O'Hearn and R. D. Tennen t, editors. Algol-like Languages . Two volumes, [14] Boston, 1997. Birkhauser, F. J. Oles. A Category-The oretic Appr oach to the Semantics of Programming [15] . Ph.D. Syracuse Univ ersity, Syracuse, N.Y., 1982. Languages thesis, F. J. Oles. Functor categories [16] store shap es. In O'Hearn and Tennen t [14], and pages 3{12. Vol. 2. [17] D. J. Pym. The Semantics and Proof Theory of the Logic of Bunche d Implic ations . Monograph to appear, 2001. J. C. Reynolds. Syntactic In Confer ence Record of the [18] control of interference. ACM Symp on Principles of Programming Languages , pages Annual osium Fifth Arizona, January 1978. ACM, New 39{46, Also in [14], vol 1. Tucson, York. J. C. Reynolds. The essence of Algol. In J. W. de Bakk er and J. C. van Vliet, [19] editors, Languages , pages 345{372, Amsterdam, Octob er 1981. North- Algorithmic Holland, Also in [14], vol 1, pages 67-88. Amsterdam. J. C. Reynolds. structure. reasoning about shared mutable data [20] In Intuitionistic Jim Bill Rosco e, and Jim Woodcock, editors, Millennial Persp ectives in Davies, Computer Scienc e , pages 303{321, Houndsmill, Hampshire, 2000. Palgra ve. [21] J. C. Reynolds. on reasoning about shared mutable data structure. IFIP Lectures Group 2.3 School/Seminar rt Program Design Using Working on State-of-the-A Tandil, Argen tina, Septem Logic. ber 2000. [22] M. Shanahan. Solving the Frame Problem: A Mathematic al Investigation of the Common Sense Law of Inertia . MIT Press, 1997. [23] H. Yang. of local reasoning in BI pointer logic: the Schorr-W aite An example marking algorithm. Manuscript, Octob er 2000. graph [24] H. Yang. Local Reasoning for Stateful Programs . Ph.D. thesis, Univ ersity of Illinois, Urbana-Champaign, Illinois, USA, 2001 (expected).