Usage of basic components
=========================

This document explains how to use the parser, the pretty printer and the node traverser.

Bootstrapping
-------------

To bootstrap the library, include the autoloader generated by Composer:

```php
require 'path/to/vendor/autoload.php';
```

Additionally, you may want to set the `xdebug.max_nesting_level` ini option to a higher value:

```php
ini_set('xdebug.max_nesting_level', 3000);
```

This ensures that there will be no errors when traversing highly nested node trees. However, it is
preferable to disable Xdebug completely, as it can easily make this library more than five times
slower.

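If the same script runs both with and without Xdebug, the setting can be guarded. The following is a
minimal sketch, assuming that checking `extension_loaded('xdebug')` is sufficient for your setup; the
`$xdebugLoaded` variable is only introduced for this illustration:

```php
<?php
// Only touch the setting when Xdebug is actually loaded.
$xdebugLoaded = extension_loaded('xdebug');

if ($xdebugLoaded) {
    // ini values are strings, so the limit is passed as one.
    ini_set('xdebug.max_nesting_level', '3000');
}
```
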
Parsing
-------

In order to parse code, you first have to create a parser instance:

```php
use PhpParser\ParserFactory;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
```

The factory accepts a kind argument that determines how different PHP versions are treated:

Kind | Behavior
-----|---------
`ParserFactory::PREFER_PHP7` | Try to parse code as PHP 7. If this fails, try to parse it as PHP 5.
`ParserFactory::PREFER_PHP5` | Try to parse code as PHP 5. If this fails, try to parse it as PHP 7.
`ParserFactory::ONLY_PHP7` | Parse code as PHP 7.
`ParserFactory::ONLY_PHP5` | Parse code as PHP 5.

Unless you have a strong reason to use something else, `PREFER_PHP7` is a reasonable default.

The `create()` method optionally accepts a `Lexer` instance as the second argument. Some use cases
that require customized lexers are discussed in the [lexer documentation](component/Lexer.markdown).

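As a rough sketch of passing a lexer, the following creates an emulative lexer that also records file
offsets. The `usedAttributes` option and the `startFilePos`/`endFilePos` attribute names are taken
from the lexer documentation for this version of the library; treat them as assumptions and check
them against your installed version:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\Lexer;
use PhpParser\ParserFactory;

// Ask the lexer to record file offsets in addition to the default attributes.
$lexer = new Lexer\Emulative([
    'usedAttributes' => [
        'comments', 'startLine', 'endLine',
        'startFilePos', 'endFilePos',
    ],
]);

// The lexer is passed as the second argument of create().
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7, $lexer);

$stmts    = $parser->parse('<?php foo();');
$startPos = $stmts[0]->getAttribute('startFilePos');

var_dump($startPos);
```
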
Subsequently you can pass PHP code (including the opening `<?php` tag) to the `parse` method in order to
create a syntax tree. If a syntax error is encountered, a `PhpParser\Error` exception will be thrown:

```php
<?php
use PhpParser\Error;
use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
function printLine($msg) {
    echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

try {
    $stmts = $parser->parse($code);
    // $stmts is an array of statement nodes
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

A parser instance can be reused to parse multiple files.

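Reusing one parser could look like the sketch below. The `$sources` array and its file names are
made up for this illustration; in practice the code would come from `file_get_contents()`:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\Error;
use PhpParser\ParserFactory;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

// Hypothetical inputs; the second one contains a deliberate syntax error.
$sources = [
    'valid.php'   => '<?php echo 1;',
    'invalid.php' => '<?php echo ;',
];

$results = [];
foreach ($sources as $name => $code) {
    try {
        // The same parser instance handles every input.
        $results[$name] = count($parser->parse($code)) . ' statement(s)';
    } catch (Error $e) {
        $results[$name] = 'Parse error: ' . $e->getMessage();
    }
}

foreach ($results as $name => $result) {
    echo $name, ': ', $result, "\n";
}
```
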
Node dumping
------------

To dump the abstract syntax tree in human-readable form, a `NodeDumper` can be used:

```php
<?php
use PhpParser\NodeDumper;

$nodeDumper = new NodeDumper;
echo $nodeDumper->dump($stmts), "\n";
```

For the sample code from the previous section, this will produce the following output:

```
array(
    0: Stmt_Function(
        byRef: false
        name: Identifier(
            name: printLine
        )
        params: array(
            0: Param(
                type: null
                byRef: false
                variadic: false
                var: Expr_Variable(
                    name: msg
                )
                default: null
            )
        )
        returnType: null
        stmts: array(
            0: Stmt_Echo(
                exprs: array(
                    0: Expr_Variable(
                        name: msg
                    )
                    1: Scalar_String(
                        value:

                    )
                )
            )
        )
    )
    1: Stmt_Expression(
        expr: Expr_FuncCall(
            name: Name(
                parts: array(
                    0: printLine
                )
            )
            args: array(
                0: Arg(
                    value: Scalar_String(
                        value: Hello World!!!
                    )
                    byRef: false
                    unpack: false
                )
            )
        )
    )
)
```

You can also use the `php-parse` script to obtain such a node dump by calling it either with a file
name or code string:

```sh
vendor/bin/php-parse file.php
vendor/bin/php-parse "<?php foo();"
```

This can be very helpful if you want to quickly check how certain syntax is represented in the AST.

Node tree structure
-------------------

Looking at the node dump above, you can see that `$stmts` for this example code is an array of two
nodes, a `Stmt_Function` and a `Stmt_Expression`. The corresponding class names are:

 * `Stmt_Function -> PhpParser\Node\Stmt\Function_`
 * `Stmt_Expression -> PhpParser\Node\Stmt\Expression`

The additional `_` at the end of the first class name is necessary, because `Function` is a
reserved keyword. Many node class names in this library have a trailing `_` to avoid clashing with
a keyword.

As PHP is a large language, there are approximately 140 different nodes. In order to make working
with them easier, they are grouped into three categories:

 * `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
   a value and cannot occur in an expression. For example, a class definition is a statement.
   It doesn't return a value and you can't write something like `func(class A {});`.
 * `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
   and thus can occur in other expressions. Examples of expressions are `$var`
   (`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
 * `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
   (`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\LNumber`) or magic constants
   like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`). All `PhpParser\Node\Scalar`s extend
   `PhpParser\Node\Expr`, as scalars are expressions, too.
 * There are some nodes not in any of these groups, for example names (`PhpParser\Node\Name`)
   and call arguments (`PhpParser\Node\Arg`).

The `Node\Stmt\Expression` node is somewhat confusing in that it contains both the terms "statement"
and "expression". This node distinguishes `expr`, which is a `Node\Expr`, from `expr;`, which is
an "expression statement" represented by `Node\Stmt\Expression` and containing `expr` as a sub-node.

Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode, `exprs`. In the above example the
echo statement is the first statement inside the function body, so you would write
`$stmts[0]->stmts[0]->exprs` to access it. If you wanted to access the name of the function
call, you would write `$stmts[1]->expr->name`.

All nodes also define a `getType()` method that returns the node type. The type is the class name
without the `PhpParser\Node\` prefix and `\` replaced with `_`. It also does not contain a trailing
`_` for reserved-keyword class names.

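The following sketch applies this to the sample code from the parsing section. It only uses the
subnode names visible in the node dump; the variable names `$function`, `$echo` and `$call` are
chosen for this illustration:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
function printLine($msg) {
    echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts  = $parser->parse($code);

$function = $stmts[0];           // PhpParser\Node\Stmt\Function_
$echo     = $function->stmts[0]; // PhpParser\Node\Stmt\Echo_
$call     = $stmts[1]->expr;     // PhpParser\Node\Expr\FuncCall

echo $function->getType(), "\n";    // Stmt_Function
echo $echo->getType(), "\n";        // Stmt_Echo
echo count($echo->exprs), "\n";     // 2
echo $call->name->toString(), "\n"; // printLine
```
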
It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.

By default the lexer adds the `startLine`, `endLine` and `comments` attributes. `comments` is an array
of `PhpParser\Comment[\Doc]` instances.

The start line can also be accessed using `getLine()`/`setLine()` (instead of `getAttribute('startLine')`).
The last doc comment from the `comments` attribute can be obtained using `getDocComment()`.

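A short sketch of reading the default attributes and attaching custom metadata follows. The
`reviewed` attribute key is an arbitrary name picked for this example, not something the library
defines:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
/** Prints a line. */
function printLine($msg) {
    echo $msg, "\n";
}
CODE;

$parser   = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$function = $parser->parse($code)[0];

// Attributes added by the lexer.
$startLine = $function->getLine();               // same as getAttribute('startLine')
$endLine   = $function->getAttribute('endLine');
$docText   = $function->getDocComment()->getText();

// Custom metadata; 'reviewed' is a hypothetical key.
$function->setAttribute('reviewed', true);

var_dump($startLine, $endLine, $docText, $function->hasAttribute('reviewed'));
```
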
Pretty printer
--------------

The pretty printer component compiles the AST back to PHP code. As the parser does not retain formatting
information, the formatting is done using a specified scheme. Currently there is only one scheme available,
namely `PhpParser\PrettyPrinter\Standard`.

```php
use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$code = "<?php echo 'Hi ', hi\\getTarget();";

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;

try {
    // parse
    $stmts = $parser->parse($code);

    // change
    $stmts[0]         // the echo statement
          ->exprs     // sub expressions
          [0]         // the first of them (the string node)
          ->value     // its value, i.e. 'Hi '
          = 'Hello '; // change to 'Hello '

    // pretty print
    $code = $prettyPrinter->prettyPrint($stmts);

    echo $code;
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The above code will output:

    echo 'Hello ', hi\getTarget();

As you can see, the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.

The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
single expression using `prettyPrintExpr()`.

The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.

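The two other methods might be used as in this sketch, which reuses the code string from the
example above. The variable names `$exprCode` and `$fileCode` are chosen for illustration:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;

$stmts = $parser->parse("<?php echo 'Hi ', hi\\getTarget();");

// Print only the second echo argument, i.e. the function call expression.
$exprCode = $prettyPrinter->prettyPrintExpr($stmts[0]->exprs[1]);
echo $exprCode, "\n"; // hi\getTarget()

// Print the statements as a complete file, including the opening tag.
$fileCode = $prettyPrinter->prettyPrintFile($stmts);
echo $fileCode, "\n";
```
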
> Read more: [Pretty printing documentation](component/Pretty_printing.markdown)

Node traversal
--------------

The above pretty printing example used the fact that the source code was known and thus it was easy to
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
Usually you want to change / analyze code in a generic way, where you don't know what the node tree is
going to look like.

For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PhpParser\NodeTraverser` looks like this:

```php
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

// add your visitor
$traverser->addVisitor(new MyNodeVisitor);

try {
    $code = file_get_contents($fileName);

    // parse
    $stmts = $parser->parse($code);

    // traverse
    $stmts = $traverser->traverse($stmts);

    // pretty print
    $code = $prettyPrinter->prettyPrintFile($stmts);

    echo $code;
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The corresponding node visitor might look like this:

```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;

class MyNodeVisitor extends NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\String_) {
            $node->value = 'foo';
        }
    }
}
```

The above node visitor would change all string literals in the program to `'foo'`.

All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:

```php
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
```

The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before traversal or
preparing the tree for traversal.

The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
it is called once after the traversal.

The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.

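To make the call order concrete, here is a small sketch with a visitor that records each call.
`OrderLoggingVisitor` and its `$log` property are hypothetical names used only for this illustration:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor that records the order of enterNode()/leaveNode() calls.
class OrderLoggingVisitor extends NodeVisitorAbstract
{
    public $log = [];

    public function enterNode(Node $node) {
        $this->log[] = 'enter ' . $node->getType();
    }

    public function leaveNode(Node $node) {
        $this->log[] = 'leave ' . $node->getType();
    }
}

$parser    = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$visitor   = new OrderLoggingVisitor;
$traverser->addVisitor($visitor);

$traverser->traverse($parser->parse('<?php echo $a;'));

echo implode("\n", $visitor->log), "\n";
// enter Stmt_Echo
// enter Expr_Variable
// leave Expr_Variable
// leave Stmt_Echo
```
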
All four methods can either return the changed node or not return at all (i.e. `null`) in which
case the current node is not changed.

The `enterNode()` method can additionally return the value `NodeTraverser::DONT_TRAVERSE_CHILDREN`,
which instructs the traverser to skip all children of the current node.

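As an illustration, the sketch below collects string literals but does not descend into function
declarations. `TopLevelStringCollector` and its `$strings` property are hypothetical names:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: collects string literals outside of function declarations.
class TopLevelStringCollector extends NodeVisitorAbstract
{
    public $strings = [];

    public function enterNode(Node $node) {
        if ($node instanceof Node\Stmt\Function_) {
            // Skip the parameters and the body of the function.
            return NodeTraverser::DONT_TRAVERSE_CHILDREN;
        }
        if ($node instanceof Node\Scalar\String_) {
            $this->strings[] = $node->value;
        }
    }
}

$parser    = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$collector = new TopLevelStringCollector;
$traverser->addVisitor($collector);

$traverser->traverse($parser->parse("<?php function f() { echo 'inner'; } echo 'outer';"));

print_r($collector->strings); // contains only 'outer'
```
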
The `leaveNode()` method can additionally return the value `NodeTraverser::REMOVE_NODE`, in which
case the current node will be removed from the parent array. Furthermore it is possible to return
an array of nodes, which will be merged into the parent array at the offset of the current node.
I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will
be `array(A, X, Y, Z, C)`.

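Both return values are shown in the following sketch. The visitor removes `var_dump()` call
statements and splits each `echo` into one statement per argument. `CleanupVisitor` is a
hypothetical name and the rules themselves are only an example:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

// Hypothetical visitor demonstrating REMOVE_NODE and array return values.
class CleanupVisitor extends NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Stmt\Expression
            && $node->expr instanceof Node\Expr\FuncCall
            && $node->expr->name instanceof Node\Name
            && $node->expr->name->toString() === 'var_dump'
        ) {
            // Drop the whole statement from the parent array.
            return NodeTraverser::REMOVE_NODE;
        }

        if ($node instanceof Stmt\Echo_) {
            // Replace one echo with several; the array is merged into the parent.
            $replacements = [];
            foreach ($node->exprs as $expr) {
                $replacements[] = new Stmt\Echo_([$expr]);
            }
            return $replacements;
        }
    }
}

$parser    = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$traverser->addVisitor(new CleanupVisitor);

$stmts  = $traverser->traverse($parser->parse('<?php var_dump($a); echo $a, $b;'));
$result = (new PrettyPrinter\Standard)->prettyPrint($stmts);

echo $result, "\n";
```
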
Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which will define empty default implementations for all the above methods.

> Read more: [Walking the AST](component/Walking_the_AST.markdown)

The NameResolver node visitor
-----------------------------

One visitor that is already bundled with the package is `PhpParser\NodeVisitor\NameResolver`. This visitor
helps you work with namespaced code by trying to resolve most names to fully qualified ones.

For example, consider the following code:

    use A as B;
    new B\C();

In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
The `NameResolver` takes care of that and resolves names as far as possible.

After running it, most names will be fully qualified. The only names that will stay unqualified are
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
know which function they are referring to. In most cases this is a non-issue, as usually the global
functions are meant.

The `NameResolver` also adds a `namespacedName` subnode to class, function and constant declarations
that contains the namespaced name instead of only the shortname that is available via `name`.

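A minimal sketch of running the resolver is shown below. The namespace `N` and class `D` are added
to the example above purely for illustration, and the array indexes assume exactly this input:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
namespace N;
use A as B;
new B\C();
class D {}
CODE;

$parser    = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$traverser->addVisitor(new NameResolver);

$stmts = $traverser->traverse($parser->parse($code));

$namespace = $stmts[0];                  // Stmt\Namespace_
$new       = $namespace->stmts[1]->expr; // Expr\New_ inside a Stmt\Expression
$class     = $namespace->stmts[2];       // Stmt\Class_

$resolved   = $new->class->toString();            // A\C
$namespaced = $class->namespacedName->toString(); // N\D

echo $resolved, "\n", $namespaced, "\n";
```
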
> Read more: [Name resolution documentation](component/Name_resolution.markdown)

Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------

A small example to understand the concept: We want to convert namespaced code to pseudo namespaces
so it works on PHP 5.2, i.e. names like `A\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.

We start off with the following base code:

```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;

$inDir  = '/some/path';
$outDir = '/some/other/path';

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

$traverser->addVisitor(new NameResolver); // we will need resolved names
$traverser->addVisitor(new NamespaceConverter); // our own node visitor

// iterate over all .php files in the directory
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
$files = new \RegexIterator($files, '/\.php$/');

foreach ($files as $file) {
    try {
        // read the file that should be converted
        $code = file_get_contents($file);

        // parse
        $stmts = $parser->parse($code);

        // traverse
        $stmts = $traverser->traverse($stmts);

        // pretty print
        $code = $prettyPrinter->prettyPrintFile($stmts);

        // write the converted file to the target directory
        file_put_contents(
            substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
            $code
        );
    } catch (PhpParser\Error $e) {
        echo 'Parse Error: ', $e->getMessage();
    }
}
```

Now let's start with the main code, the `NamespaceConverter` node visitor. One thing it needs to do
is convert `A\B` style names to `A_B` style ones.

```php
use PhpParser\Node;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        }
    }
}
```

The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `str_replace('\\', '_', $node->toString())` does. (If you want to
create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create
a new name from the string and return it. Returning a new node replaces the old node.

Another thing we need to do is change the class/function/const declarations. Currently they contain
only the shortname (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = str_replace('\\', '_', $const->namespacedName->toString());
            }
        }
    }
}
```

There is not much more to it than converting the namespaced name to a string with `_` as separator.

The last thing we need to do is remove the `namespace` and `use` statements:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = str_replace('\\', '_', $const->namespacedName->toString());
            }
        } elseif ($node instanceof Stmt\Namespace_) {
            // returning an array merges it into the parent array
            return $node->stmts;
        } elseif ($node instanceof Stmt\Use_) {
            // remove use nodes altogether
            return NodeTraverser::REMOVE_NODE;
        }
    }
}
```

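To try the conversion without touching any files, the visitor can be run on a string. The sketch
below uses a condensed copy of the converter (names, classes, `namespace` and `use` only) under the
hypothetical name `DemoNamespaceConverter`; the sample input is made up for this illustration:

```php
<?php
require 'path/to/vendor/autoload.php'; // adjust to your project

use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

// Condensed, hypothetical copy of the converter above.
class DemoNamespaceConverter extends NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Namespace_) {
            // merge the namespace body into the parent array
            return $node->stmts;
        } elseif ($node instanceof Stmt\Use_) {
            return NodeTraverser::REMOVE_NODE;
        }
    }
}

$code = <<<'CODE'
<?php
namespace App\Model;
use Lib\Base;
class User extends Base {}
CODE;

$parser    = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$traverser->addVisitor(new NameResolver);           // resolve names first
$traverser->addVisitor(new DemoNamespaceConverter); // then convert them

$stmts     = $traverser->traverse($parser->parse($code));
$converted = (new PrettyPrinter\Standard)->prettyPrint($stmts);

echo $converted, "\n";
```
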
That's all.