Click Here Return to Posts List
# Interprocedural analysis
## Motivation of Interprocedural analysis
for **intra**procedural analysis, we make conservative assumption for method call, leaving most of variable imprecise.
e.g.
```bash
void foo(){
int n =ten();
addOne(42);
}
int ten(){
return 10;
}
int addOne(int x){
int y = x + 1;
return y;
}
```
for **Constant Propagation** in intraprocedural analysis, n,x,y are all NAC because we ignore the connection between each method.
BUT for Interprocedural, we use **call graph** to point the call edges between each method, and n =10, x=42, y = 43.
## Call Graph
call graph is a representation of calling relationships in the program. It is made up by a set of call edges from one method to another.
### Call Graph Construction for OOPLs
Class Hierarchy Analysis(CHA)
Rapid Type Analysis(RTA)
Variable Type Analysis(VTA)
Pointer Analysis
越往下越精确,越往上效率越高
#### Method Dispatch on Virtual Call
a virtual call is resolved based on:
1. type of the receiving project:c
2. method signature:m
We use *Dispatch(c,m)* to jsimulate the procedure of method dispatch, if c doesn't contain non-abstract method c, then it will find in the **superclass** of c.
```bash
class A{void foo(){...}}
class B extends A{}
class C extends B{
void foo(){...}
void dispatch(){
A x = new B();
x.foo();
A y = new C();
y.foo();
}
}
```
对于x.foo() 实际为Dispatch(B,A.foo()),由于在B中没有找到foo方法,所以寻找B的父类A,实际调用的是A类中的foo方法。
对于y.foo() 实际为Dispatch(C,A.foo()),由于C中就存在了foo方法,所以实际调用为C中的foo方法。
### Class Hierarchy Analysis
We define func *Resolve(cs)* to resolve possible target methods of a call site *cs* by CHA.
```bash
Resolve(cs)
T = {}
m = method signature at CS
if CS is a static call
T = {m}
if CS is a special call
cm = class type in m
T = {Dispatch(cm,m)}
if CS is a virtual call
c = declared receiver type at CS
foreach c' that is subclass of c do add Dispatch(c',m) to T
return T
```
e.g.对于虚拟调用
```bash
class A{void foo(){...}}
class B extends A{}
class C extends B{
void foo(){...}
}
class D extends B{
C c = ...
c.foo()
A a = ...
a.foo()
B b = ...
b.foo()
}
```
Resolve(c.foo()) = {C.foo()}
Resolve(a.foo()) = {a.foo(),c.foo(),d.foo()}
Resolve(b.foo()) = {a.foo(),c.foo(),d.foo()}
如果在声明时使用了B b = new B() 此时Resolve的dispatch结果仍然不变,同样包含c.foo() & d.foo(), 与我们前文中所写的本类中没有应该在父类中调用有所不符,这就是CHA方法带来的“假”调用。
### Call Graph Construction in CHA
Start from entry method, for each reachable method,resolve each CS using CHA, until no method is discovered
Algorithms:
```bash
BuildCallGraph(entry)
WaitList = [Entry],CallGraph = {},ReachableMethod = {}
while WL is not empty
remove top m from WL
if m not belongs to RM
add m to RM
for each CS in m
T = Resolve(CS)
for each target method tm in T
add CS -> tm to CG
add tm to WL
return CG
```
## Interprocedural Control Flow Graph
Besides in and out in the CFGs, ICFG has **Call edges** and **Return edges**.
*Call edges*: from CS to the **entry node of the called method**,pass the argument value
*Return edges*: from **return statement** to the **return sites**(the statement after the call in previous method),pass the argument value
In ICFG,when node is a **call site**, transfer function **DOESN'T contain gen_value** on the left of equation. The edge between CS and RS is called call-to-return edge, to transfer local data(which is not used by called method).
## Problem of CHA
if there are false positive, it may lead to false Constant Propagation.
e.g.
```bash
void foo(){
Number x = new ONE()
int x = n.get()
}
interface Number{
int get()
}
class ONE implements Number{
public int get(){
return 1;
}
}
class TWO implements Number{
public int get(){
return 2;
}
}
class ZERO implements Number{
public int get(){
return 0;
}
}
```
当 *Resolve(n.get())* 时,由于是CHA的virtual call,所以会返回zero.get(),one.get(),two.get()三个return edge返回的值0,1,2,此时交运算则会变为NAC(参考前面常量传播规则),这显然是不符合实际的。
But for Pointer Analysis, it will based on point-to relation, it will only return one.get() and return the value 1.
## Pointer Analysis
What are pointers pointing?
e.g.
```bash
void foo()
{
A a = new A()
B x = new B()
a.setB(x)
B y = a.getB()
}
class A {
B b;
void setB(B b){
this.b = b
}
B getB(){
return this.b
}
}
```
Points-to Relation
|Variable | Object|
|:----:|:----:|
|a | new A|
|x | new B|
|this | new A|
|b | new B|
|new A.b | new B|
|y | new B|
### Pointer Analysis and Alias Analysis
Pointer Analysis: what a pointer to point to?
Alias Analysis: can two pointer point same obj?
### Key Factor in Pointer Analysis
Heap Abstraction
Context Sensitivity
Flow Sensitive
Analysis Scope
### Heap Abstraction
when in loops and recursion, the heap object can be unbounded.
```bash
for(i=0;i<3;i++){
A a = new A();
}
```
when in dynamic analysis, this class will be called three times: o2.iteration i = 0,o2.iteration i = 1,o2.iteration i = 2
to ensure termination in static analysis, we use heap abstraction model dynamically allocated, unbounded concrete objects as finite abstract objects.
#### Allocation-site abstraction
Model concrete objects by their **allocation sites**, and each abstract object per allocation site represents ALL allocated concrete objects in dynamic analysis.
when in allocation-site abstraction, the class call will only be abstracted as **o2** in before example.
### Context Sensitivity
Context Sensitive: use each method multiple times. Different use has different data flow.
Context insensitie: use each method one time and all data flow merge.
### Flow Sensitivity
flow sensitive: respect the execution order of the statements
flow insensitive: ignore the CFG order, treat the whole program as a set of unorderd statement.
### Analysis Scope
whole-program: compute point-to information in all pointers.
demand-driven: only compute information for pointers demanded.
### Pointer Analysis in our case
Allocation-Site/Context-Sensitive/Flow-intensitive/Whole-program
### Concerned statement
we only focus on pointer affect statement
#### Pointers in Java
1. Local variable:x
2. Static field:C.f
3. Instance field:x.f
4. Array element:array[i]
#### Pointer affecting statements
New/Assign(x = y)/Store(x.f = y)/Load(y = x.f)/Call(r = x.k(a,...))
#### Domains and notations
Variables: V
Fields: F
Objects: $o_i,o_j \in$ O
Instance field: $o_i.f , o_j.g \in$ $O \times F$
Pointers include Variables & Instance field
Points-to relations: pt-> $\rho(O)$ , $\rho(O)$ is the powerset of the O
### Rules
1. x = new T() => $\frac{}{pt(x).append(o_i)}\frac{\leftarrow premise}{\leftarrow conclusion}$ if no premise then unconditional
2. y = x => $\frac{o_i\in pt(x)}{pt(y).append(o_i)}$
3. x.f = y => $\frac{o_i\in pt(x),o_j\in pt(y)}{pt(o_i.f).append(o_j)}$
4. y = x.f => $\frac{o_i\in pt(x),o_j\in pt(x.f)}{pt(y).append(o_j)}$
## Pointer Analysis Foundation
### Pointer Flow Graph(PFG)
display how objects flow among the pointers in the program.
Nodes: $V\bigcup U \times F$
Edges: $Pointer \times Pointer$
### Pointer Analysis Algorithms
```bash
# Main Algorithm
Solve(S){
WL = [],PFG=[]
for each i : x = new T() in S{
add to WL
}
for each x = y in S{
addEdge(y,x)
}
while(WL not empty){
remove top from WL
delta = pts - pt(n)
Propagate(n,delta)
if(n == variable x){
for each o_i in delta{
for each x.f = y in S{
addEdge(y,o_i.f)
}
for each y = x.f in S{
addEdge(o_i.f,y)
}
}
}
}
}
#Propagate
Propagate(n,pts){
if(pts not empty){
pt(n) = pt(n) & pts
for each n -> (s in PFG){
add to WL
}
}
}
# addEdge
addEdge(s,t){
if(s -> t not in PFG){
add s-> t to PFG
if(pt(s) not empty){
add to WL
}
}
}
```
Propagation aims to transfer delta to reduce loop count.
### Pointer Analysis with Method Calls
#### Difference between CHA and pointer-analysis in Call Graph Construction
1. CHA:imprecise, may introduce spurious call edges
2. pointer analysis:precise
#### Rules for method call
r = x.k(a1,...,an) => $\frac{o_i \in pt(x), m = Dispatch(o_i,k),o_u \in pt(a_j),o_v \in pt(m_{ret})}{o_i\in pt(m_{this}),o_u\in pt(m_{pj}),o_v \in pt(r)}$
1. $Dispatch(o_i,k),o_i\in pt(x)$ resolve the virtual dispatch of k on o_i
2. $m_{this}$ : this variable of m
3. $m_{pj}$ : the j-th parameter of m
4. $m_{ret}$ : the variable that holds return value
```bash
C x = new T()
r = x.foo(a1,a2);
class T {
B foo (A p1, A p2){
this;
return ret;
}
}
```
PFG Edges: a1 -> p1, a2 -> p2, ret -> r
### Algorithms(pointer analysis and Call Graph Construction combined)
S: set of reachable statement
S_M: set of statement in method M
RM: reachable method
CG: call graph edges
```bash
Solve(m_entry){
WL=[],PFG=[],S=[],RM=[],CG=[]
AddReachable(m_entry)
while(WL not empty){
remove top from WL
delta = pts - pt(n)
Propagate(n,delta)
if(n == variable x){
for each o_i in delta{
for each x.f = y in S{
addEdge(y,o_i.f)
}
for each y = x.f in S{
addEdge(o_i.f,y)
}
ProcessCall(x,o_i)
}
}
}
}
#Reachable
AddReachable(m){
if(m !in RM){
add m to RM
S = S combine S_M
for each i: x = new T() in S_M{
add to WL
}
for each x = y in S_M{
AddEdge(y,x)
}
}
}
#MethodCall
ProcessCall(x,o_i){
for each l: r = x.k(a1,...,an) in S{
m = Dispatch(o_i,k)
add to WL
if(l -> m !in CG){
add l -> m to CG
AddReachable(m)
for each p_i of m{
AddEdge(a_i,p_i)
}
AddEdge(m_ret,r)
}
}
}
#Propagate
Propagate(n,pts){
if(pts not empty){
pt(n) = pt(n) & pts
for each n -> (s in PFG){
add to WL
}
}
}
# addEdge
addEdge(s,t){
if(s -> t not in PFG){
add s-> t to PFG
if(pt(s) not empty){
add to WL
}
}
}
```