Compiler Structure
- Source code parsed to produce AST
- AST transformed to CFG
- Data flow analysis operates on control flow graph (and other intermediate representations)
Abstract Syntax Tree (AST)
- Programs are written in text
■ I.e., sequences of characters
■ Awkward to work with - First step: Convert to structured representation
■ Use lexer (like flex) to recognize tokens
– Sequences of characters that make words in the language
■ Use parser (like bison) to group words structurally
– And, often, to produce AST
Abstract Syntax Tree Example
ASTs
- ASTs are abstract
■ They don’t contain all information in the program
– E.g., spacing, comments, brackets, parentheses
■ Any ambiguity has been resolved
– E.g., a + b + c produces the same AST as (a + b) + c - For more info, see CMSC 430
■ In this class, we will generally begin at the AST level
Disadvantages of ASTs
- AST has many similar forms
■ E.g., for, while, repeat...until ■ E.g., if, ?:, switch
- Expressions in AST may be complex, nested
■ (42 * y) + (z > 5 ? 12 * z : z + 20)
• Want simpler representation for analysis
■ …at least, for dataflow analysis
Control-Flow Graph (CFG)
- A directed graph where
■ Each node represents a statement
■ Edges represent control flow - Statements may be
■ Assignments x := y op z or x := op z
■ Copy statements x := y
■ Branches goto L or if x relop y goto L
■ etc.
Control-Flow Graph Example
Variations on CFGs
- We usually don’t include declarations (e.g., int x;)
■ But there’s usually something in the implementation - May want a unique entry and exit node
■ Won’t matter for the examples we give - May group statements into basic blocks
■ A sequence of instructions with no branches into or out of the block
Control-Flow Graph w/Basic Blocks
- Can lead to more efficient implementations
- But more complicated to explain, so…
■ We’ll use single-statement blocks
CFG vs. AST
- CFGs are much simpler than ASTs
■ Fewer forms, less redundancy, only simple expressions
■ - But…AST is a more faithful representation
■ CFGs introduce temporaries
■ Lose block structure of program
■ - So for AST,
■ Easier to report error + other messages
■ Easier to explain to programmer
■ Easier to unparse to produce readable code
Data Flow Analysis
- A framework for proving facts about programs
- Reasons about lots of little facts
- Little or no interaction between facts
■ Works best on properties about how program computes - Based on all paths through program
■ Including infeasible paths
Available Expressions
- An expression e is available at program point p if
■ e is computed on every path to p, and
■ the value of e has not changed since the last time e is computed on p - Optimization
■ If an expression is available, need not be recomputed
– (At least, if it’s still in a register somewhere)
Data Flow Facts
Gen and Kill
- What is the effect of each statement on the set of facts?
Computing Available Expressions
Terminology
- A joint point is a program point where two branches meet
- Available expressions is a forward must problem
■ Forward = Data flow from in to out
■ Must = At join point, property must hold on all paths that are joined
Data Flow Equations
- • Let s be a statement
■ succ(s) = { immediate successor statements of s }
■ pred(s) = { immediate predecessor statements of s}
■ In(s) = program point just before executing s
■ Out(s) = program point just after executing s
■ Note: These are also called transfer functions
Liveness Analysis
- A variable v is live at program point p if
■ v will be used on some execution path originating from p…
■ before v is overwritten - Optimization
■ If a variable is not live, no need to keep it in a register
■ If variable is dead at assignment, can eliminate assignment
Data Flow Equations
- Available expressions is a forward must analysis
■ Data flow propagate in same dir as CFG edges
■ Expr is available only if available on all paths - Liveness is a backward may problem
■ To know if variable live, need to look at future uses
■ Variable is live if available on some path
Gen and Kill
- What is the effect of each statement on the set of facts?
Computing Live Variables
Very Busy Expressions
- An expression e is very busy at point p if
■ On every path from p, e is evaluated before the value of e is changed - Optimization
■ Can hoist very busy expression computation - What kind of problem?
■ Forward or backward? backward
■ May or must? must
Reaching Definitions
- A definition of a variable v is an assignment to v
- A definition of variable v reaches point p if
■ There is no intervening assignment to v - Also called def-use information
- What kind of problem?
■ Forward or backward? forward
■ May or must? may
Space of Data Flow Analyses
- Most data flow analyses can be classified this way
■ A few don’t fit: bidirectional analysis - Lots of literature on data flow analysis
One response to “Abstract Syntax Tree”