RSS

Abstract Syntax Tree

04 Oct

Compiler Structure

  • Source code parsed to produce AST
  • AST transformed to CFG
  • Data flow analysis operates on control flow graph (and other intermediate representations)

Abstract Syntax Tree (AST)

  • Programs are written in text
    ■ I.e., sequences of characters
    ■ Awkward to work with
  • First step: Convert to structured representation
    ■ Use lexer (like flex) to recognize tokens
    – Sequences of characters that make words in the language
    ■ Use parser (like bison) to group words structurally
    – And, often, to produce AST

Abstract Syntax Tree Example

ASTs

  • ASTs are abstract
    ■ They don’t contain all information in the program
    – E.g., spacing, comments, brackets, parentheses
    ■ Any ambiguity has been resolved
    – E.g., a + b + c produces the same AST as (a + b) + c
  • For more info, see CMSC 430
    ■ In this class, we will generally begin at the AST level

Disadvantages of ASTs

  • AST has many similar forms
    ■ E.g., for, while, repeat...until
    ■ E.g., if, ?:, switch
  • Expressions in AST may be complex, nested
    ■ (42 * y) + (z > 5 ? 12 * z : z + 20)
    • Want simpler representation for analysis
    ■ …at least, for dataflow analysis

Control-Flow Graph (CFG)

  • A directed graph where
    ■ Each node represents a statement
    ■ Edges represent control flow
  • Statements may be
    ■ Assignments x := y op z or x := op z
    ■ Copy statements x := y
    ■ Branches goto L or if x relop y goto L
    ■ etc.

Control-Flow Graph Example

Variations on CFGs

  • We usually don’t include declarations (e.g., int x;)
    ■ But there’s usually something in the implementation
  • May want a unique entry and exit node
    ■ Won’t matter for the examples we give
  • May group statements into basic blocks
    ■ A sequence of instructions with no branches into or out of the block

Control-Flow Graph w/Basic Blocks

  • Can lead to more efficient implementations
  • But more complicated to explain, so…
    ■ We’ll use single-statement blocks

CFG vs. AST

  • CFGs are much simpler than ASTs
    ■ Fewer forms, less redundancy, only simple expressions
  • But…AST is a more faithful representation
    ■ CFGs introduce temporaries
    ■ Lose block structure of program
  • So for AST,
    ■ Easier to report error + other messages
    ■ Easier to explain to programmer
    ■ Easier to unparse to produce readable code

Data Flow Analysis

  • A framework for proving facts about programs
  • Reasons about lots of little facts
  • Little or no interaction between facts
    ■ Works best on properties about how program computes
  • Based on all paths through program
    ■ Including infeasible paths

Available Expressions

  • An expression e is available at program point p if
    e is computed on every path to p, and
    ■ the value of e has not changed since the last time e is computed on p
  • Optimization
    ■ If an expression is available, need not be recomputed
    – (At least, if it’s still in a register somewhere)

Data Flow Facts

  • Is expression e available?
  • Facts:
    a + b is available
    a * b is available
    a + 1 is available

Gen and Kill

  • What is the effect of each statement on the set of facts?


Computing Available Expressions

Terminology

  • A joint point is a program point where two branches meet
  • Available expressions is a forward must problem
    ■ Forward = Data flow from in to out
    ■ Must = At join point, property must hold on all paths that are joined

Data Flow Equations

  • • Let s be a statement
    ■ succ(s) = { immediate successor statements of s }
    ■ pred(s) = { immediate predecessor statements of s}
    ■ In(s) = program point just before executing s
    ■ Out(s) = program point just after executing s

■ Note: These are also called transfer functions

Liveness Analysis

  • A variable v is live at program point p if
    ■ v will be used on some execution path originating from p…
    ■ before v is overwritten
  • Optimization
    ■ If a variable is not live, no need to keep it in a register
    ■ If variable is dead at assignment, can eliminate assignment

Data Flow Equations

  • Available expressions is a forward must analysis
    ■ Data flow propagate in same dir as CFG edges
    ■ Expr is available only if available on all paths
  • Liveness is a backward may problem
    ■ To know if variable live, need to look at future uses
    ■ Variable is live if available on some path

Gen and Kill

  • What is the effect of each statement on the set of facts?

Computing Live Variables

Very Busy Expressions

  • An expression e is very busy at point p if
    ■ On every path from p, e is evaluated before the value of e is changed
  • Optimization
    ■ Can hoist very busy expression computation
  • What kind of problem?
    ■ Forward or backward?  backward
    ■ May or must?  must

Reaching Definitions

  • A definition of a variable v is an assignment to v
  • A definition of variable v reaches point p if
    ■ There is no intervening assignment to v
  • Also called def-use information
  • What kind of problem?
    ■ Forward or backward? forward
    ■ May or must? may

Space of Data Flow Analyses

  • Most data flow analyses can be classified this way
    ■ A few don’t fit: bidirectional analysis
  • Lots of literature on data flow analysis
 
1 Comment

Posted by on October 4, 2008 in GWT/ JSNI / COMPILER

 

Tags: , , , , , ,

One response to “Abstract Syntax Tree

Leave a comment