gcc/doc/loop.texi

   1 @c Copyright (c) 2006 Free Software Foundation, Inc.
   2 @c Free Software Foundation, Inc.
   3 @c This is part of the GCC manual.
   4 @c For copying conditions, see the file gcc.texi.
   5
   6 @c ---------------------------------------------------------------------
   7 @c Loop Representation
   8 @c ---------------------------------------------------------------------
   9
  10 @node Loop Representation
  11 @chapter Analysis and Representation of Loops
  12
  13 GCC provides extensive infrastructure for work with natural loops, i.e.,
  14 strongly connected components of CFG with only one entry block.  This
  15 chapter describes representation of loops in GCC, both on GIMPLE and in
  16 RTL, as well as the interfaces to loop-related analyses (induction
  17 variable analysis and number of iterations analysis).
  18
  19 @menu
  20 * Loop representation::         Representation and analysis of loops.
  21 * Loop querying::               Getting information about loops.
  22 * Loop manipulation::           Loop manipulation functions.
  23 * LCSSA::                       Loop-closed SSA form.
  24 * Scalar evolutions::           Induction variables on GIMPLE.
  25 * loop-iv::                     Induction variables on RTL.
  26 * Number of iterations::        Number of iterations analysis.
  27 * Dependency analysis::         Data dependency analysis.
  28 * Lambda::                      Linear loop transformations framework.
  29 @end menu
  30
  31 @node Loop representation
  32 @section Loop representation
  33 @cindex Loop representation
  34 @cindex Loop analysis
  35
  36 This chapter describes the representation of loops in GCC, and functions
  37 that can be used to build, modify and analyze this representation.  Most
  38 of the interfaces and data structures are declared in @file{cfgloop.h}.
  39 At the moment, loop structures are analyzed and this information is
  40 updated only by the optimization passes that deal with loops, but some
  41 efforts are being made to make it available throughout most of the
  42 optimization passes.
  43
  44 In general, a natural loop has one entry block (header) and possibly
  45 several back edges (latches) leading to the header from the inside of
  46 the loop.  Loops with several latches may appear if several loops share
  47 a single header, or if there is a branching in the middle of the loop.
  48 The representation of loops in GCC however allows only loops with a
  49 single latch.  During loop analysis, headers of such loops are split and
  50 forwarder blocks are created in order to disambiguate their structures.
  51 A heuristic based on profile information is used to determine whether
  52 the latches correspond to sub-loops or to control flow in a single loop.
  53 This means that the analysis sometimes changes the CFG, and if you run
  54 it in the middle of an optimization pass, you must be able to deal with
  55 the new blocks.
  56
  57 Body of the loop is the set of blocks that are dominated by its header,
  58 and reachable from its latch against the direction of edges in CFG.  The
  59 loops are organized in a containment hierarchy (tree) such that all the
  60 loops immediately contained inside loop L are the children of L in the
  61 tree.  This tree is represented by the @code{struct loops} structure.
  62 The root of this tree is a fake loop that contains all blocks in the
  63 function.  Each of the loops is represented in a @code{struct loop}
  64 structure.  Each loop is assigned an index (@code{num} field of the
  65 @code{struct loop} structure), and the pointer to the loop is stored in
  66 the corresponding field of the @code{larray} vector in the loops
  67 structure.  Index of a sub-loop is always greater than the index of its
  68 super-loop.  The indices do not have to be continuous, there may be
  69 empty (@code{NULL}) entries in the @code{larray} created by deleting
  70 loops.  The index of a loop never changes.
  71
  72 The entries of the @code{larray} field should not be accessed directly.
  73 The function @code{get_loop} returns the loop description for a loop with
  74 the given index.  @code{number_of_loops} function returns number of
  75 loops in the function.  To traverse all loops, use @code{FOR_EACH_LOOP}
  76 macro.  The @code{flags} argument of the macro is used to determine
  77 the direction of traversal and the set of loops visited.
  78
  79 Each basic block contains the reference to the innermost loop it belongs
  80 to (@code{loop_father}).  For this reason, it is only possible to have
  81 one @code{struct loops} structure initialized at the same time for each
  82 CFG.  The global variable @code{current_loops} contains the
  83 @code{struct loops} structure.  Many of the loop manipulation functions
  84 assume that dominance information is up-to-date.
  85
  86 The loops are analyzed through @code{loop_optimizer_init} function.  The
  87 argument of this function is a set of flags represented in an integer
  88 bitmask.  These flags specify what other properties of the loop
  89 structures should be calculated/enforced and preserved later:
  90
  91 @itemize
  92 @item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
  93 a way that each loop has only one entry edge, and additionally, the
  94 source block of this entry edge has only one successor.  This creates a
  95 natural place where the code can be moved out of the loop, and ensures
  96 that the entry edge of the loop leads from its immediate super-loop.
  97 @item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
  98 force the latch block of each loop to have only one successor.  This
  99 ensures that the latch of the loop does not belong to any of its
 100 sub-loops, and makes manipulation with the loops significantly easier.
 101 Most of the loop manipulation functions assume that the loops are in
 102 this shape.  Note that with this flag, the ``normal'' loop without any
 103 control flow inside and with one exit consists of two basic blocks.
 104 @item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
 105 edges in the strongly connected components that are not natural loops
 106 (have more than one entry block) are marked with
 107 @code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
 108 flag is not set for blocks and edges that belong to natural loops that
 109 are in such an irreducible region (but it is set for the entry and exit
 110 edges of such a loop, if they lead to/from this region).
 111 @item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
 112 and updated for each loop.  This makes some functions (e.g.,
 113 @code{get_loop_exit_edges}) more efficient.  Some functions (e.g.,
 114 @code{single_exit}) can be used only if the lists of exits are
 115 recorded.
 116 @end itemize
 117
 118 These properties may also be computed/enforced later, using functions
 119 @code{create_preheaders}, @code{force_single_succ_latches},
 120 @code{mark_irreducible_loops} and @code{record_loop_exits}.
 121
 122 The memory occupied by the loops structures should be freed with
 123 @code{loop_optimizer_finalize} function.
 124
 125 The CFG manipulation functions in general do not update loop structures.
 126 Specialized versions that additionally do so are provided for the most
 127 common tasks.  On GIMPLE, @code{cleanup_tree_cfg_loop} function can be
 128 used to cleanup CFG while updating the loops structures if
 129 @code{current_loops} is set.
 130
 131 @node Loop querying
 132 @section Loop querying
 133 @cindex Loop querying
 134
 135 The functions to query the information about loops are declared in
 136 @file{cfgloop.h}.  Some of the information can be taken directly from
 137 the structures.  @code{loop_father} field of each basic block contains
 138 the innermost loop to that the block belongs.  The most useful fields of
 139 loop structure (that are kept up-to-date at all times) are:
 140
 141 @itemize
 142 @item @code{header}, @code{latch}: Header and latch basic blocks of the
 143 loop.
 144 @item @code{num_nodes}: Number of basic blocks in the loop (including
 145 the basic blocks of the sub-loops).
 146 @item @code{depth}: The depth of the loop in the loops tree, i.e., the
 147 number of super-loops of the loop.
 148 @item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
 149 sub-loop, and the sibling of the loop in the loops tree.
 150 @end itemize
 151
 152 There are other fields in the loop structures, many of them used only by
 153 some of the passes, or not updated during CFG changes; in general, they
 154 should not be accessed directly.
 155
 156 The most important functions to query loop structures are:
 157
 158 @itemize
 159 @item @code{flow_loops_dump}: Dumps the information about loops to a
 160 file.
 161 @item @code{verify_loop_structure}: Checks consistency of the loop
 162 structures.
 163 @item @code{loop_latch_edge}: Returns the latch edge of a loop.
 164 @item @code{loop_preheader_edge}: If loops have preheaders, returns
 165 the preheader edge of a loop.
 166 @item @code{flow_loop_nested_p}: Tests whether loop is a sub-loop of
 167 another loop.
 168 @item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
 169 to a loop (including its sub-loops).
 170 @item @code{find_common_loop}: Finds the common super-loop of two loops.
 171 @item @code{superloop_at_depth}: Returns the super-loop of a loop with
 172 the given depth.
 173 @item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
 174 number of insns in the loop, on GIMPLE and on RTL.
 175 @item @code{loop_exit_edge_p}: Tests whether edge is an exit from a
 176 loop.
 177 @item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
 178 with @code{EDGE_LOOP_EXIT} flag.
 179 @item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
 180 @code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
 181 loop in depth-first search order in reversed CFG, ordered by dominance
 182 relation, and breath-first search order, respectively.
 183 @item @code{single_exit}: Returns the single exit edge of the loop, or
 184 @code{NULL} if the loop has more than one exit.  You can only use this
 185 function if LOOPS_HAVE_MARKED_SINGLE_EXITS property is used.
 186 @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
 187 @item @code{just_once_each_iteration_p}: Returns true if the basic block
 188 is executed exactly once during each iteration of a loop (that is, it
 189 does not belong to a sub-loop, and it dominates the latch of the loop).
 190 @end itemize
 191
 192 @node Loop manipulation
 193 @section Loop manipulation
 194 @cindex Loop manipulation
 195
 196 The loops tree can be manipulated using the following functions:
 197
 198 @itemize
 199 @item @code{flow_loop_tree_node_add}: Adds a node to the tree.
 200 @item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
 201 @item @code{add_bb_to_loop}: Adds a basic block to a loop.
 202 @item @code{remove_bb_from_loops}: Removes a basic block from loops.
 203 @end itemize
 204
 205 Most low-level CFG functions update loops automatically.  The following
 206 functions handle some more complicated cases of CFG manipulations:
 207
 208 @itemize
 209 @item @code{remove_path}: Removes an edge and all blocks it dominates.
 210 @item @code{split_loop_exit_edge}: Splits exit edge of the loop,
 211 ensuring that PHI node arguments remain in the loop (this ensures that
 212 loop-closed SSA form is preserved).  Only useful on GIMPLE.
 213 @end itemize
 214
 215 Finally, there are some higher-level loop transformations implemented.
 216 While some of them are written so that they should work on non-innermost
 217 loops, they are mostly untested in that case, and at the moment, they
 218 are only reliable for the innermost loops:
 219
 220 @itemize
 221 @item @code{create_iv}: Creates a new induction variable.  Only works on
 222 GIMPLE.  @code{standard_iv_increment_position} can be used to find a
 223 suitable place for the iv increment.
 224 @item @code{duplicate_loop_to_header_edge},
 225 @code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
 226 on GIMPLE) duplicate the body of the loop prescribed number of times on
 227 one of the edges entering loop header, thus performing either loop
 228 unrolling or loop peeling.  @code{can_duplicate_loop_p}
 229 (@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
 230 loop.
 231 @item @code{loop_version}, @code{tree_ssa_loop_version}: These function
 232 create a copy of a loop, and a branch before them that selects one of
 233 them depending on the prescribed condition.  This is useful for
 234 optimizations that need to verify some assumptions in runtime (one of
 235 the copies of the loop is usually left unchanged, while the other one is
 236 transformed in some way).
 237 @item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
 238 extra iterations to make the number of iterations divisible by unroll
 239 factor, updating the exit condition, and removing the exits that now
 240 cannot be taken.  Works only on GIMPLE.
 241 @end itemize
 242
 243 @node LCSSA
 244 @section Loop-closed SSA form
 245 @cindex LCSSA
 246 @cindex Loop-closed SSA form
 247
 248 Throughout the loop optimizations on tree level, one extra condition is
 249 enforced on the SSA form:  No SSA name is used outside of the loop in
 250 that it is defined.  The SSA form satisfying this condition is called
 251 ``loop-closed SSA form'' -- LCSSA.  To enforce LCSSA, PHI nodes must be
 252 created at the exits of the loops for the SSA names that are used
 253 outside of them.  Only the real operands (not virtual SSA names) are
 254 held in LCSSA, in order to save memory.
 255
 256 There are various benefits of LCSSA:
 257
 258 @itemize
 259 @item Many optimizations (value range analysis, final value
 260 replacement) are interested in the values that are defined in the loop
 261 and used outside of it, i.e., exactly those for that we create new PHI
 262 nodes.
 263 @item In induction variable analysis, it is not necessary to specify the
 264 loop in that the analysis should be performed -- the scalar evolution
 265 analysis always returns the results with respect to the loop in that the
 266 SSA name is defined.
 267 @item It makes updating of SSA form during loop transformations simpler.
 268 Without LCSSA, operations like loop unrolling may force creation of PHI
 269 nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
 270 updated locally.  However, since we only keep real operands in LCSSA, we
 271 cannot use this advantage (we could have local updating of real
 272 operands, but it is not much more efficient than to use generic SSA form
 273 updating for it as well; the amount of changes to SSA is the same).
 274 @end itemize
 275
 276 However, it also means LCSSA must be updated.  This is usually
 277 straightforward, unless you create a new value in loop and use it
 278 outside, or unless you manipulate loop exit edges (functions are
 279 provided to make these manipulations simple).
 280 @code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
 281 LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
 282 LCSSA is preserved.
 283
 284 @node Scalar evolutions
 285 @section Scalar evolutions
 286 @cindex Scalar evolutions
 287 @cindex IV analysis on GIMPLE
 288
 289 Scalar evolutions (SCEV) are used to represent results of induction
 290 variable analysis on GIMPLE.  They enable us to represent variables with
 291 complicated behavior in a simple and consistent way (we only use it to
 292 express values of polynomial induction variables, but it is possible to
 293 extend it).  The interfaces to SCEV analysis are declared in
 294 @file{tree-scalar-evolution.h}.  To use scalar evolutions analysis,
 295 @code{scev_initialize} must be used.  To stop using SCEV,
 296 @code{scev_finalize} should be used.  SCEV analysis caches results in
 297 order to save time and memory.  This cache however is made invalid by
 298 most of the loop transformations, including removal of code.  If such a
 299 transformation is performed, @code{scev_reset} must be called to clean
 300 the caches.
 301
 302 Given an SSA name, its behavior in loops can be analyzed using the
 303 @code{analyze_scalar_evolution} function.  The returned SCEV however
 304 does not have to be fully analyzed and it may contain references to
 305 other SSA names defined in the loop.  To resolve these (potentially
 306 recursive) references, @code{instantiate_parameters} or
 307 @code{resolve_mixers} functions must be used.
 308 @code{instantiate_parameters} is useful when you use the results of SCEV
 309 only for some analysis, and when you work with whole nest of loops at
 310 once.  It will try replacing all SSA names by their SCEV in all loops,
 311 including the super-loops of the current loop, thus providing a complete
 312 information about the behavior of the variable in the loop nest.
 313 @code{resolve_mixers} is useful if you work with only one loop at a
 314 time, and if you possibly need to create code based on the value of the
 315 induction variable.  It will only resolve the SSA names defined in the
 316 current loop, leaving the SSA names defined outside unchanged, even if
 317 their evolution in the outer loops is known.
 318
 319 The SCEV is a normal tree expression, except for the fact that it may
 320 contain several special tree nodes.  One of them is
 321 @code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
 322 expressed.  The other one is @code{POLYNOMIAL_CHREC}.  Polynomial chrec
 323 has three arguments -- base, step and loop (both base and step may
 324 contain further polynomial chrecs).  Type of the expression and of base
 325 and step must be the same.  A variable has evolution
 326 @code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
 327 loop) equivalent to @code{x_1} in the following example
 328
 329 @smallexample
 330 while (...)
 331   @{
 332     x_1 = phi (base, x_2);
 333     x_2 = x_1 + step;
 334   @}
 335 @end smallexample
 336
 337 Note that this includes the language restrictions on the operations.
 338 For example, if we compile C code and @code{x} has signed type, then the
 339 overflow in addition would cause undefined behavior, and we may assume
 340 that this does not happen.  Hence, the value with this SCEV cannot
 341 overflow (which restricts the number of iterations of such a loop).
 342
 343 In many cases, one wants to restrict the attention just to affine
 344 induction variables.  In this case, the extra expressive power of SCEV
 345 is not useful, and may complicate the optimizations.  In this case,
 346 @code{simple_iv} function may be used to analyze a value -- the result
 347 is a loop-invariant base and step.
 348
 349 @node loop-iv
 350 @section IV analysis on RTL
 351 @cindex IV analysis on RTL
 352
 353 The induction variable on RTL is simple and only allows analysis of
 354 affine induction variables, and only in one loop at once.  The interface
 355 is declared in @file{cfgloop.h}.  Before analyzing induction variables
 356 in a loop L, @code{iv_analysis_loop_init} function must be called on L.
 357 After the analysis (possibly calling @code{iv_analysis_loop_init} for
 358 several loops) is finished, @code{iv_analysis_done} should be called.
 359 The following functions can be used to access the results of the
 360 analysis:
 361
 362 @itemize
 363 @item @code{iv_analyze}: Analyzes a single register used in the given
 364 insn.  If no use of the register in this insn is found, the following
 365 insns are scanned, so that this function can be called on the insn
 366 returned by get_condition.
 367 @item @code{iv_analyze_result}: Analyzes result of the assignment in the
 368 given insn.
 369 @item @code{iv_analyze_expr}: Analyzes a more complicated expression.
 370 All its operands are analyzed by @code{iv_analyze}, and hence they must
 371 be used in the specified insn or one of the following insns.
 372 @end itemize
 373
 374 The description of the induction variable is provided in @code{struct
 375 rtx_iv}.  In order to handle subregs, the representation is a bit
 376 complicated; if the value of the @code{extend} field is not
 377 @code{UNKNOWN}, the value of the induction variable in the i-th
 378 iteration is
 379
 380 @smallexample
 381 delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
 382 @end smallexample
 383
 384 with the following exception:  if @code{first_special} is true, then the
 385 value in the first iteration (when @code{i} is zero) is @code{delta +
 386 mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
 387 then @code{first_special} must be false, @code{delta} 0, @code{mult} 1
 388 and the value in the i-th iteration is
 389
 390 @smallexample
 391 subreg_@{mode@} (base + i * step)
 392 @end smallexample
 393
 394 The function @code{get_iv_value} can be used to perform these
 395 calculations.
 396
 397 @node Number of iterations
 398 @section Number of iterations analysis
 399 @cindex Number of iterations analysis
 400
 401 Both on GIMPLE and on RTL, there are functions available to determine
 402 the number of iterations of a loop, with a similar interface.  The
 403 number of iterations of a loop in GCC is defined as the number of
 404 executions of the loop latch.  In many cases, it is not possible to
 405 determine the number of iterations unconditionally -- the determined
 406 number is correct only if some assumptions are satisfied.  The analysis
 407 tries to verify these conditions using the information contained in the
 408 program; if it fails, the conditions are returned together with the
 409 result.  The following information and conditions are provided by the
 410 analysis:
 411
 412 @itemize
 413 @item @code{assumptions}: If this condition is false, the rest of
 414 the information is invalid.
 415 @item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
 416 this condition is true, the loop exits in the first iteration.
 417 @item @code{infinite}: If this condition is true, the loop is infinite.
 418 This condition is only available on RTL.  On GIMPLE, conditions for
 419 finiteness of the loop are included in @code{assumptions}.
 420 @item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
 421 that gives number of iterations.  The number of iterations is defined as
 422 the number of executions of the loop latch.
 423 @end itemize
 424
 425 Both on GIMPLE and on RTL, it necessary for the induction variable
 426 analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
 427 On GIMPLE, the results are stored to @code{struct tree_niter_desc}
 428 structure.  Number of iterations before the loop is exited through a
 429 given exit can be determined using @code{number_of_iterations_exit}
 430 function.  On RTL, the results are returned in @code{struct niter_desc}
 431 structure.  The corresponding function is named
 432 @code{check_simple_exit}.  There are also functions that pass through
 433 all the exits of a loop and try to find one with easy to determine
 434 number of iterations -- @code{find_loop_niter} on GIMPLE and
 435 @code{find_simple_exit} on RTL.  Finally, there are functions that
 436 provide the same information, but additionally cache it, so that
 437 repeated calls to number of iterations are not so costly --
 438 @code{number_of_latch_executions} on GIMPLE and @code{get_simple_loop_desc}
 439 on RTL.
 440
 441 Note that some of these functions may behave slightly differently than
 442 others -- some of them return only the expression for the number of
 443 iterations, and fail if there are some assumptions.  The function
 444 @code{number_of_latch_executions} works only for single-exit loops.
 445 The function @code{number_of_cond_exit_executions} can be used to
 446 determine number of executions of the exit condition of a single-exit
 447 loop (i.e., the @code{number_of_latch_executions} increased by one).
 448
 449 @node Dependency analysis
 450 @section Data Dependency Analysis
 451 @cindex Data Dependency Analysis
 452
 453 The code for the data dependence analysis can be found in
 454 @file{tree-data-ref.c} and its interface and data structures are
 455 described in @file{tree-data-ref.h}.  The function that computes the
 456 data dependences for all the array and pointer references for a given
 457 loop is @code{compute_data_dependences_for_loop}.  This function is
 458 currently used by the linear loop transform and the vectorization
 459 passes.  Before calling this function, one has to allocate two vectors:
 460 a first vector will contain the set of data references that are
 461 contained in the analyzed loop body, and the second vector will contain
 462 the dependence relations between the data references.  Thus if the
 463 vector of data references is of size @code{n}, the vector containing the
 464 dependence relations will contain @code{n*n} elements.  However if the
 465 analyzed loop contains side effects, such as calls that potentially can
 466 interfere with the data references in the current analyzed loop, the
 467 analysis stops while scanning the loop body for data references, and
 468 inserts a single @code{chrec_dont_know} in the dependence relation
 469 array.
 470
 471 The data references are discovered in a particular order during the
 472 scanning of the loop body: the loop body is analyzed in execution order,
 473 and the data references of each statement are pushed at the end of the
 474 data reference array.  Two data references syntactically occur in the
 475 program in the same order as in the array of data references.  This
 476 syntactic order is important in some classical data dependence tests,
 477 and mapping this order to the elements of this array avoids costly
 478 queries to the loop body representation.
 479
 480 Three types of data references are currently handled: ARRAY_REF,
 481 INDIRECT_REF and COMPONENT_REF. The data structure for the data reference
 482 is @code{data_reference}, where @code{data_reference_p} is a name of a
 483 pointer to the data reference structure. The structure contains the
 484 following elements:
 485
 486 @itemize
 487 @item @code{base_object_info}: Provides information about the base object
 488 of the data reference and its access functions. These access functions
 489 represent the evolution of the data reference in the loop relative to
 490 its base, in keeping with the classical meaning of the data reference
 491 access function for the support of arrays. For example, for a reference
 492 @code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
 493 one for each array subscript, are:
 494 @code{@{i_init, + i_step@}_1, @{j_init, +, j_step@}_2}.
 495
 496 @item @code{first_location_in_loop}: Provides information about the first
 497 location accessed by the data reference in the loop and about the access
 498 function used to represent evolution relative to this location. This data
 499 is used to support pointers, and is not used for arrays (for which we
 500 have base objects). Pointer accesses are represented as a one-dimensional
 501 access that starts from the first location accessed in the loop. For
 502 example:
 503
 504 @smallexample
 505       for1 i
 506          for2 j
 507           *((int *)p + i + j) = a[i][j];
 508 @end smallexample
 509
 510 The access function of the pointer access is @code{@{0, + 4B@}_for2}
 511 relative to @code{p + i}. The access functions of the array are
 512 @code{@{i_init, + i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
 513 relative to @code{a}.
 514
 515 Usually, the object the pointer refers to is either unknown, or we can't
 516 prove that the access is confined to the boundaries of a certain object.
 517
 518 Two data references can be compared only if at least one of these two
 519 representations has all its fields filled for both data references.
 520
 521 The current strategy for data dependence tests is as follows:
 522 If both @code{a} and @code{b} are represented as arrays, compare
 523 @code{a.base_object} and @code{b.base_object};
 524 if they are equal, apply dependence tests (use access functions based on
 525 base_objects).
 526 Else if both @code{a} and @code{b} are represented as pointers, compare
 527 @code{a.first_location} and @code{b.first_location};
 528 if they are equal, apply dependence tests (use access functions based on
 529 first location).
 530 However, if @code{a} and @code{b} are represented differently, only try
 531 to prove that the bases are definitely different.
 532
 533 @item Aliasing information.
 534 @item Alignment information.
 535 @end itemize
 536
 537 The structure describing the relation between two data references is
 538 @code{data_dependence_relation} and the shorter name for a pointer to
 539 such a structure is @code{ddr_p}.  This structure contains:
 540
 541 @itemize
 542 @item a pointer to each data reference,
 543 @item a tree node @code{are_dependent} that is set to @code{chrec_known}
 544 if the analysis has proved that there is no dependence between these two
 545 data references, @code{chrec_dont_know} if the analysis was not able to
 546 determine any useful result and potentially there could exist a
 547 dependence between these data references, and @code{are_dependent} is
 548 set to @code{NULL_TREE} if there exist a dependence relation between the
 549 data references, and the description of this dependence relation is
 550 given in the @code{subscripts}, @code{dir_vects}, and @code{dist_vects}
 551 arrays,
 552 @item a boolean that determines whether the dependence relation can be
 553 represented by a classical distance vector,
 554 @item an array @code{subscripts} that contains a description of each
 555 subscript of the data references.  Given two array accesses a
 556 subscript is the tuple composed of the access functions for a given
 557 dimension.  For example, given @code{A[f1][f2][f3]} and
 558 @code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
 559 g2), (f3, g3)}.
 560 @item two arrays @code{dir_vects} and @code{dist_vects} that contain
 561 classical representations of the data dependences under the form of
 562 direction and distance dependence vectors,
 563 @item an array of loops @code{loop_nest} that contains the loops to
 564 which the distance and direction vectors refer to.
 565 @end itemize
 566
 567 Several functions for pretty printing the information extracted by the
 568 data dependence analysis are available: @code{dump_ddrs} prints with a
 569 maximum verbosity the details of a data dependence relations array,
 570 @code{dump_dist_dir_vectors} prints only the classical distance and
 571 direction vectors for a data dependence relations array, and
 572 @code{dump_data_references} prints the details of the data references
 573 contained in a data reference array.
 574
 575 @node Lambda
 576 @section Linear loop transformations framework
 577 @cindex Linear loop transformations framework
 578
 579 Lambda is a framework that allows transformations of loops using
 580 non-singular matrix based transformations of the iteration space and
 581 loop bounds. This allows compositions of skewing, scaling, interchange,
 582 and reversal transformations.  These transformations are often used to
 583 improve cache behavior or remove inner loop dependencies to allow
 584 parallelization and vectorization to take place.
 585
 586 To perform these transformations, Lambda requires that the loopnest be
 587 converted into a internal form that can be matrix transformed easily.
 588 To do this conversion, the function
 589 @code{gcc_loopnest_to_lambda_loopnest} is provided.  If the loop cannot
 590 be transformed using lambda, this function will return NULL.
 591
 592 Once a @code{lambda_loopnest} is obtained from the conversion function,
 593 it can be transformed by using @code{lambda_loopnest_transform}, which
 594 takes a transformation matrix to apply.  Note that it is up to the
 595 caller to verify that the transformation matrix is legal to apply to the
 596 loop (dependence respecting, etc).  Lambda simply applies whatever
 597 matrix it is told to provide.  It can be extended to make legal matrices
 598 out of any non-singular matrix, but this is not currently implemented.
 599 Legality of a matrix for a given loopnest can be verified using
 600 @code{lambda_transform_legal_p}.
 601
 602 Given a transformed loopnest, conversion back into gcc IR is done by
 603 @code{lambda_loopnest_to_gcc_loopnest}.  This function will modify the
 604 loops so that they match the transformed loopnest.
 605