Click Here Return to Posts List # Interprocedural analysis ## Motivation of Interprocedural analysis for **intra**procedural analysis, we make conservative assumption for method call, leaving most of variable imprecise. e.g. ```bash void foo(){ int n =ten(); addOne(42); } int ten(){ return 10; } int addOne(int x){ int y = x + 1; return y; } ``` for **Constant Propagation** in intraprocedural analysis, n,x,y are all NAC because we ignore the connection between each method. BUT for Interprocedural, we use **call graph** to point the call edges between each method, and n =10, x=42, y = 43. ## Call Graph call graph is a representation of calling relationships in the program. It is made up by a set of call edges from one method to another. ### Call Graph Construction for OOPLs Class Hierarchy Analysis(CHA) Rapid Type Analysis(RTA) Variable Type Analysis(VTA) Pointer Analysis 越往下越精确,越往上效率越高 #### Method Dispatch on Virtual Call a virtual call is resolved based on: 1. type of the receiving project:c 2. method signature:m We use *Dispatch(c,m)* to jsimulate the procedure of method dispatch, if c doesn't contain non-abstract method c, then it will find in the **superclass** of c. ```bash class A{void foo(){...}} class B extends A{} class C extends B{ void foo(){...} void dispatch(){ A x = new B(); x.foo(); A y = new C(); y.foo(); } } ``` 对于x.foo() 实际为Dispatch(B,A.foo()),由于在B中没有找到foo方法,所以寻找B的父类A,实际调用的是A类中的foo方法。 对于y.foo() 实际为Dispatch(C,A.foo()),由于C中就存在了foo方法,所以实际调用为C中的foo方法。 ### Class Hierarchy Analysis We define func *Resolve(cs)* to resolve possible target methods of a call site *cs* by CHA. ```bash Resolve(cs) T = {} m = method signature at CS if CS is a static call T = {m} if CS is a special call cm = class type in m T = {Dispatch(cm,m)} if CS is a virtual call c = declared receiver type at CS foreach c' that is subclass of c do add Dispatch(c',m) to T return T ``` e.g.对于虚拟调用 ```bash class A{void foo(){...}} class B extends A{} class C extends B{ void foo(){...} } class D extends B{ C c = ... c.foo() A a = ... a.foo() B b = ... b.foo() } ``` Resolve(c.foo()) = {C.foo()} Resolve(a.foo()) = {a.foo(),c.foo(),d.foo()} Resolve(b.foo()) = {a.foo(),c.foo(),d.foo()} 如果在声明时使用了B b = new B() 此时Resolve的dispatch结果仍然不变,同样包含c.foo() & d.foo(), 与我们前文中所写的本类中没有应该在父类中调用有所不符,这就是CHA方法带来的“假”调用。 ### Call Graph Construction in CHA Start from entry method, for each reachable method,resolve each CS using CHA, until no method is discovered Algorithms: ```bash BuildCallGraph(entry) WaitList = [Entry],CallGraph = {},ReachableMethod = {} while WL is not empty remove top m from WL if m not belongs to RM add m to RM for each CS in m T = Resolve(CS) for each target method tm in T add CS -> tm to CG add tm to WL return CG ``` ## Interprocedural Control Flow Graph Besides in and out in the CFGs, ICFG has **Call edges** and **Return edges**. *Call edges*: from CS to the **entry node of the called method**,pass the argument value *Return edges*: from **return statement** to the **return sites**(the statement after the call in previous method),pass the argument value In ICFG,when node is a **call site**, transfer function **DOESN'T contain gen_value** on the left of equation. The edge between CS and RS is called call-to-return edge, to transfer local data(which is not used by called method). ## Problem of CHA if there are false positive, it may lead to false Constant Propagation. e.g. ```bash void foo(){ Number x = new ONE() int x = n.get() } interface Number{ int get() } class ONE implements Number{ public int get(){ return 1; } } class TWO implements Number{ public int get(){ return 2; } } class ZERO implements Number{ public int get(){ return 0; } } ``` 当 *Resolve(n.get())* 时,由于是CHA的virtual call,所以会返回zero.get(),one.get(),two.get()三个return edge返回的值0,1,2,此时交运算则会变为NAC(参考前面常量传播规则),这显然是不符合实际的。 But for Pointer Analysis, it will based on point-to relation, it will only return one.get() and return the value 1. ## Pointer Analysis What are pointers pointing? e.g. ```bash void foo() { A a = new A() B x = new B() a.setB(x) B y = a.getB() } class A { B b; void setB(B b){ this.b = b } B getB(){ return this.b } } ``` Points-to Relation |Variable | Object| |:----:|:----:| |a | new A| |x | new B| |this | new A| |b | new B| |new A.b | new B| |y | new B| ### Pointer Analysis and Alias Analysis Pointer Analysis: what a pointer to point to? Alias Analysis: can two pointer point same obj? ### Key Factor in Pointer Analysis Heap Abstraction Context Sensitivity Flow Sensitive Analysis Scope ### Heap Abstraction when in loops and recursion, the heap object can be unbounded. ```bash for(i=0;i<3;i++){ A a = new A(); } ``` when in dynamic analysis, this class will be called three times: o2.iteration i = 0,o2.iteration i = 1,o2.iteration i = 2 to ensure termination in static analysis, we use heap abstraction model dynamically allocated, unbounded concrete objects as finite abstract objects. #### Allocation-site abstraction Model concrete objects by their **allocation sites**, and each abstract object per allocation site represents ALL allocated concrete objects in dynamic analysis. when in allocation-site abstraction, the class call will only be abstracted as **o2** in before example. ### Context Sensitivity Context Sensitive: use each method multiple times. Different use has different data flow. Context insensitie: use each method one time and all data flow merge. ### Flow Sensitivity flow sensitive: respect the execution order of the statements flow insensitive: ignore the CFG order, treat the whole program as a set of unorderd statement. ### Analysis Scope whole-program: compute point-to information in all pointers. demand-driven: only compute information for pointers demanded. ### Pointer Analysis in our case Allocation-Site/Context-Sensitive/Flow-intensitive/Whole-program ### Concerned statement we only focus on pointer affect statement #### Pointers in Java 1. Local variable:x 2. Static field:C.f 3. Instance field:x.f 4. Array element:array[i] #### Pointer affecting statements New/Assign(x = y)/Store(x.f = y)/Load(y = x.f)/Call(r = x.k(a,...)) #### Domains and notations Variables: V Fields: F Objects: $o_i,o_j \in$ O Instance field: $o_i.f , o_j.g \in$ $O \times F$ Pointers include Variables & Instance field Points-to relations: pt-> $\rho(O)$ , $\rho(O)$ is the powerset of the O ### Rules 1. x = new T() => $\frac{}{pt(x).append(o_i)}\frac{\leftarrow premise}{\leftarrow conclusion}$ if no premise then unconditional 2. y = x => $\frac{o_i\in pt(x)}{pt(y).append(o_i)}$ 3. x.f = y => $\frac{o_i\in pt(x),o_j\in pt(y)}{pt(o_i.f).append(o_j)}$ 4. y = x.f => $\frac{o_i\in pt(x),o_j\in pt(x.f)}{pt(y).append(o_j)}$ ## Pointer Analysis Foundation ### Pointer Flow Graph(PFG) display how objects flow among the pointers in the program. Nodes: $V\bigcup U \times F$ Edges: $Pointer \times Pointer$ ### Pointer Analysis Algorithms ```bash # Main Algorithm Solve(S){ WL = [],PFG=[] for each i : x = new T() in S{ add to WL } for each x = y in S{ addEdge(y,x) } while(WL not empty){ remove top from WL delta = pts - pt(n) Propagate(n,delta) if(n == variable x){ for each o_i in delta{ for each x.f = y in S{ addEdge(y,o_i.f) } for each y = x.f in S{ addEdge(o_i.f,y) } } } } } #Propagate Propagate(n,pts){ if(pts not empty){ pt(n) = pt(n) & pts for each n -> (s in PFG){ add to WL } } } # addEdge addEdge(s,t){ if(s -> t not in PFG){ add s-> t to PFG if(pt(s) not empty){ add to WL } } } ``` Propagation aims to transfer delta to reduce loop count. ### Pointer Analysis with Method Calls #### Difference between CHA and pointer-analysis in Call Graph Construction 1. CHA:imprecise, may introduce spurious call edges 2. pointer analysis:precise #### Rules for method call r = x.k(a1,...,an) => $\frac{o_i \in pt(x), m = Dispatch(o_i,k),o_u \in pt(a_j),o_v \in pt(m_{ret})}{o_i\in pt(m_{this}),o_u\in pt(m_{pj}),o_v \in pt(r)}$ 1. $Dispatch(o_i,k),o_i\in pt(x)$ resolve the virtual dispatch of k on o_i 2. $m_{this}$ : this variable of m 3. $m_{pj}$ : the j-th parameter of m 4. $m_{ret}$ : the variable that holds return value ```bash C x = new T() r = x.foo(a1,a2); class T { B foo (A p1, A p2){ this; return ret; } } ``` PFG Edges: a1 -> p1, a2 -> p2, ret -> r ### Algorithms(pointer analysis and Call Graph Construction combined) S: set of reachable statement S_M: set of statement in method M RM: reachable method CG: call graph edges ```bash Solve(m_entry){ WL=[],PFG=[],S=[],RM=[],CG=[] AddReachable(m_entry) while(WL not empty){ remove top from WL delta = pts - pt(n) Propagate(n,delta) if(n == variable x){ for each o_i in delta{ for each x.f = y in S{ addEdge(y,o_i.f) } for each y = x.f in S{ addEdge(o_i.f,y) } ProcessCall(x,o_i) } } } } #Reachable AddReachable(m){ if(m !in RM){ add m to RM S = S combine S_M for each i: x = new T() in S_M{ add to WL } for each x = y in S_M{ AddEdge(y,x) } } } #MethodCall ProcessCall(x,o_i){ for each l: r = x.k(a1,...,an) in S{ m = Dispatch(o_i,k) add to WL if(l -> m !in CG){ add l -> m to CG AddReachable(m) for each p_i of m{ AddEdge(a_i,p_i) } AddEdge(m_ret,r) } } } #Propagate Propagate(n,pts){ if(pts not empty){ pt(n) = pt(n) & pts for each n -> (s in PFG){ add to WL } } } # addEdge addEdge(s,t){ if(s -> t not in PFG){ add s-> t to PFG if(pt(s) not empty){ add to WL } } } ```