Usage of basic components
=========================

This document explains how to use the parser, the pretty printer and the node traverser.

Bootstrapping
-------------

To bootstrap the library, include the autoloader generated by Composer:

```php
require 'path/to/vendor/autoload.php';
```

Additionally, you may want to set the `xdebug.max_nesting_level` ini option to a higher value:

```php
ini_set('xdebug.max_nesting_level', 3000);
```

This ensures that there will be no errors when traversing highly nested node trees. However, it is
preferable to disable Xdebug completely, as it can easily make this library more than five times
slower.

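If Xdebug may or may not be installed, the setting can be applied conditionally. The following is a minimal sketch, assuming the extension is registered under the name `xdebug`; the `$xdebugLoaded` variable is only illustrative:

```php
// Raise the nesting limit only if the Xdebug extension is loaded;
// without Xdebug the ini option does not exist and nothing needs to change.
$xdebugLoaded = extension_loaded('xdebug');

if ($xdebugLoaded) {
    ini_set('xdebug.max_nesting_level', '3000');
}
```
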
Parsing
-------

In order to parse code, you first have to create a parser instance:

```php
use PhpParser\ParserFactory;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
```

The factory accepts a kind argument that determines how different PHP versions are treated:

Kind | Behavior
-----|---------
`ParserFactory::PREFER_PHP7` | Try to parse code as PHP 7. If this fails, try to parse it as PHP 5.
`ParserFactory::PREFER_PHP5` | Try to parse code as PHP 5. If this fails, try to parse it as PHP 7.
`ParserFactory::ONLY_PHP7` | Parse code as PHP 7.
`ParserFactory::ONLY_PHP5` | Parse code as PHP 5.

Unless you have a strong reason to use something else, `PREFER_PHP7` is a reasonable default.

The `create()` method optionally accepts a `Lexer` instance as the second argument. Some use cases
that require customized lexers are discussed in the [lexer documentation](component/Lexer.markdown).

Subsequently, you can pass PHP code (including the opening `<?php` tag) to the `parse()` method in order to
create a syntax tree. If a syntax error is encountered, a `PhpParser\Error` exception will be thrown:

```php
use PhpParser\Error;
use PhpParser\ParserFactory;

$code = '<?php // some code';
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

try {
    $stmts = $parser->parse($code);
    // $stmts is an array of statement nodes
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

A parser instance can be reused to parse multiple files.

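As a rough sketch of such reuse, the following loop feeds several sources through one parser instance and records the outcome per source. The names and code strings in `$sources` are made up for illustration, and the autoloader is assumed to be loaded already:

```php
use PhpParser\Error;
use PhpParser\ParserFactory;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

// Hypothetical in-memory sources; in practice these would be read from files.
$sources = [
    'valid.php'   => '<?php echo 1;',
    'invalid.php' => '<?php echo ;',
];

$results = [];
foreach ($sources as $name => $code) {
    try {
        // The same parser instance handles every source.
        $results[$name] = count($parser->parse($code)) . ' statement(s)';
    } catch (Error $e) {
        // A syntax error in one source does not prevent parsing the next one.
        $results[$name] = 'Parse Error: ' . $e->getMessage();
    }
}

foreach ($results as $name => $result) {
    echo $name, ': ', $result, "\n";
}
```
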
Node tree
---------

If you use the above code with `$code = "<?php echo 'Hi ', hi\\getTarget();"` the parser will
generate a node tree looking like this:

```
array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_String(
                value: Hi
            )
            1: Expr_FuncCall(
                name: Name(
                    parts: array(
                        0: hi
                        1: getTarget
                    )
                )
                args: array(
                )
            )
        )
    )
)
```

Thus `$stmts` will contain an array with only one node, with this node being an instance of
`PhpParser\Node\Stmt\Echo_`.

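A dump in this format can be produced with the `PhpParser\NodeDumper` class that ships with the library. A minimal sketch, assuming the autoloader is already loaded (the exact output may differ slightly between library versions):

```php
use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts  = $parser->parse("<?php echo 'Hi ', hi\\getTarget();");

// Convert the node tree into a human-readable string.
$dumper = new NodeDumper;
$dump   = $dumper->dump($stmts);

echo $dump, "\n";
```
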
As PHP is a large language, there are approximately 140 different nodes. To make working
with them easier, they are grouped into three categories:

 * `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
   a value and cannot occur in an expression. For example, a class definition is a statement.
   It doesn't return a value and you can't write something like `func(class A {});`.
 * `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
   and thus can occur in other expressions. Examples of expressions are `$var`
   (`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
 * `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
   (`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\LNumber`) or magic constants
   like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`). All `PhpParser\Node\Scalar`s extend
   `PhpParser\Node\Expr`, as scalars are expressions, too.
 * There are some nodes that belong to none of these groups, for example names (`PhpParser\Node\Name`)
   and call arguments (`PhpParser\Node\Arg`).

Some node class names have a trailing `_`. This is used whenever the class name would otherwise clash
with a PHP keyword.

Every node has zero or more subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode, `exprs`. So in order to access it
in the above example you would write `$stmts[0]->exprs`. If you wanted to access the name of the function
call, you would write `$stmts[0]->exprs[1]->name`.

All nodes also define a `getType()` method that returns the node type. The type is the class name
without the `PhpParser\Node\` prefix and with `\` replaced by `_`. It also does not contain a trailing
`_` for reserved-keyword class names.

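The naming rule for node types can be illustrated without the library. The helper below, `nodeTypeFromClass()`, is a hypothetical function written only to mirror the rule described above; it is not how the library computes the type:

```php
// Illustration of the naming rule only; nodeTypeFromClass() is not part of the library.
function nodeTypeFromClass($className)
{
    $prefix = 'PhpParser\\Node\\';

    // Drop the common namespace prefix.
    if (strpos($className, $prefix) === 0) {
        $className = substr($className, strlen($prefix));
    }

    // Replace namespace separators and drop the keyword-avoiding trailing underscore.
    return rtrim(str_replace('\\', '_', $className), '_');
}

echo nodeTypeFromClass('PhpParser\Node\Stmt\Echo_'), "\n";     // Stmt_Echo
echo nodeTypeFromClass('PhpParser\Node\Scalar\String_'), "\n"; // Scalar_String
echo nodeTypeFromClass('PhpParser\Node\Expr\FuncCall'), "\n";  // Expr_FuncCall
```

These are the same type names that appear in the node tree dump shown earlier.
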
It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.

By default the lexer adds the `startLine`, `endLine` and `comments` attributes. `comments` is an array
of `PhpParser\Comment[\Doc]` instances.

The start line can also be accessed using `getLine()`/`setLine()` (instead of `getAttribute('startLine')`).
The last doc comment from the `comments` attribute can be obtained using `getDocComment()`.

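The attribute methods might be used as in the following sketch. The sample code string and the `reviewed` attribute name are invented for illustration, and the autoloader is assumed to be loaded already:

```php
use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php

/** Greets the target. */
echo 'Hi';
CODE;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts  = $parser->parse($code);
$echo   = $stmts[0];

// Attributes added by the lexer.
$line = $echo->getLine();       // same value as getAttribute('startLine')
$doc  = $echo->getDocComment(); // a doc comment object, or null if there is none

// Custom metadata; "reviewed" is a hypothetical attribute name.
$echo->setAttribute('reviewed', true);

echo 'Start line: ', $line, "\n";
echo 'Doc comment: ', $doc !== null ? $doc->getText() : '(none)', "\n";
echo 'Reviewed: ', $echo->hasAttribute('reviewed') ? 'yes' : 'no', "\n";
```
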
Pretty printer
--------------

The pretty printer component compiles the AST back to PHP code. As the parser does not retain formatting
information, the formatting is done using a specified scheme. Currently there is only one scheme available,
namely `PhpParser\PrettyPrinter\Standard`.

```php
use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$code = "<?php echo 'Hi ', hi\\getTarget();";

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;

try {
    // parse
    $stmts = $parser->parse($code);

    // change
    $stmts[0]         // the echo statement
          ->exprs     // sub expressions
          [0]         // the first of them (the string node)
          ->value     // its value, i.e. 'Hi '
          = 'Hello '; // change to 'Hello '

    // pretty print
    $code = $prettyPrinter->prettyPrint($stmts);

    echo $code;
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The above code will output:

    echo 'Hello ', hi\getTarget();

As you can see, the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
converted back to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.

The `prettyPrint()` method pretty prints a statements array and does not emit the opening `<?php` tag.
It is also possible to pretty print only a single expression using `prettyPrintExpr()`.

The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.

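The difference between the printing methods can be seen in a short sketch like the one below, which reuses the example code from this section and assumes the autoloader is already loaded. The variable names `$exprCode` and `$fileCode` are illustrative:

```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;

$stmts = $parser->parse("<?php echo 'Hi ', hi\\getTarget();");

// Print only the function call expression (the second echo argument).
$exprCode = $prettyPrinter->prettyPrintExpr($stmts[0]->exprs[1]);

// Print the whole statement list as a file, including the opening tag.
$fileCode = $prettyPrinter->prettyPrintFile($stmts);

echo $exprCode, "\n";
echo $fileCode, "\n";
```
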
Node traversal
--------------

The above pretty printing example used the fact that the source code was known and thus it was easy to
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
Usually you want to change or analyze code in a generic way, where you don't know what the node tree is
going to look like.

For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PhpParser\NodeTraverser` looks like this:

```php
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

// add your visitor
$traverser->addVisitor(new MyNodeVisitor);

try {
    $code = file_get_contents($fileName);

    // parse
    $stmts = $parser->parse($code);

    // traverse
    $stmts = $traverser->traverse($stmts);

    // pretty print
    $code = $prettyPrinter->prettyPrintFile($stmts);

    echo $code;
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The corresponding node visitor might look like this:

```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;

class MyNodeVisitor extends NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\String_) {
            $node->value = 'foo';
        }
    }
}
```

The above node visitor would change all string literals in the program to `'foo'`.

All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:

```php
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
```

The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before traversal or
preparing the tree for traversal.

The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
it is called once after the traversal.

The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.

All four methods can either return the changed node or not return at all (i.e. `null`), in which
case the current node is not changed.

The `enterNode()` method can additionally return the value `NodeTraverser::DONT_TRAVERSE_CHILDREN`,
which instructs the traverser to skip all children of the current node.

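As a sketch of how skipping children might be used, the visitor below collects string literals but does not descend into function declarations. The `StringCollector` class and the sample code are invented for illustration, and the autoloader is assumed to be loaded already:

```php
use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: records string literals found outside of functions.
class StringCollector extends NodeVisitorAbstract
{
    public $strings = [];

    public function enterNode(Node $node) {
        if ($node instanceof Node\Stmt\Function_) {
            // Do not visit anything inside the function declaration.
            return NodeTraverser::DONT_TRAVERSE_CHILDREN;
        }
        if ($node instanceof Node\Scalar\String_) {
            $this->strings[] = $node->value;
        }
    }
}

$parser    = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$collector = new StringCollector;

$traverser = new NodeTraverser;
$traverser->addVisitor($collector);
$traverser->traverse($parser->parse("<?php echo 'outer'; function f() { echo 'inner'; }"));

print_r($collector->strings); // only 'outer' is collected
```
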
The `leaveNode()` method can additionally return the value `NodeTraverser::REMOVE_NODE`, in which
case the current node will be removed from the parent array. Furthermore, it is possible to return
an array of nodes, which will be merged into the parent array at the offset of the current node.
For example, if in `array(A, B, C)` the node `B` is replaced with `array(X, Y, Z)`, the result will
be `array(A, X, Y, Z, C)`.

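The merge behaviour corresponds to what `array_splice()` does on a plain PHP array. The snippet below only illustrates the resulting order, using strings in place of nodes; it is not the traverser's actual implementation:

```php
// Strings stand in for nodes here; this only illustrates the resulting order.
$parent      = ['A', 'B', 'C'];
$replacement = ['X', 'Y', 'Z'];

// Replace the single element 'B' with all elements of $replacement.
$offset = array_search('B', $parent, true);
array_splice($parent, $offset, 1, $replacement);

echo implode(', ', $parent), "\n"; // A, X, Y, Z, C
```
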
Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which will define empty default implementations for all the above methods.

The NameResolver node visitor
-----------------------------

One visitor is already bundled with the package: `PhpParser\NodeVisitor\NameResolver`. This visitor
helps you work with namespaced code by trying to resolve most names to fully qualified ones.

For example, consider the following code:

    use A as B;
    new B\C();

In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
The `NameResolver` takes care of that and resolves names as far as possible.

After running it, most names will be fully qualified. The only names that will stay unqualified are
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
know which function or constant they are referring to. In most cases this is a non-issue, as the global
functions are meant.

The `NameResolver` also adds a `namespacedName` subnode to class, function and constant declarations
that contains the namespaced name instead of only the short name that is available via `name`.

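The effect on the `use A as B; new B\C();` example can be observed with a second visitor that runs after the `NameResolver`. In this sketch the `FullyQualifiedNameCollector` class is a hypothetical helper, and the autoloader is assumed to be loaded already:

```php
use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical helper visitor that records every fully qualified name it sees.
class FullyQualifiedNameCollector extends NodeVisitorAbstract
{
    public $names = [];

    public function enterNode(Node $node) {
        if ($node instanceof Node\Name\FullyQualified) {
            $this->names[] = $node->toString();
        }
    }
}

$parser    = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$collector = new FullyQualifiedNameCollector;

$traverser = new NodeTraverser;
$traverser->addVisitor(new NameResolver); // resolves names first
$traverser->addVisitor($collector);       // then sees the resolved names

$traverser->traverse($parser->parse('<?php use A as B; new B\C();'));

print_r($collector->names);
```

For this input, the collected list should contain `A\C`, the resolved form of `B\C`.
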
Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------

A small example to understand the concept: We want to convert namespaced code to pseudo namespaces
so it works on PHP 5.2, i.e. names like `A\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.

We start off with the following base code:

```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;

$inDir  = '/some/path';
$outDir = '/some/other/path';

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

$traverser->addVisitor(new NameResolver); // we will need resolved names
$traverser->addVisitor(new NamespaceConverter); // our own node visitor

// iterate over all .php files in the directory
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
$files = new \RegexIterator($files, '/\.php$/');

foreach ($files as $file) {
    try {
        // read the file that should be converted
        $code = file_get_contents($file);

        // parse
        $stmts = $parser->parse($code);

        // traverse
        $stmts = $traverser->traverse($stmts);

        // pretty print
        $code = $prettyPrinter->prettyPrintFile($stmts);

        // write the converted file to the target directory
        file_put_contents(
            substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
            $code
        );
    } catch (PhpParser\Error $e) {
        echo 'Parse Error: ', $e->getMessage();
    }
}
```

Now let's start with the main code, the `NamespaceConverter` node visitor. One thing it needs to do
is convert `A\B` style names to `A_B` style ones.

```php
use PhpParser\Node;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name($node->toString('_'));
        }
    }
}
```

The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `$node->toString('_')` does. (If you want to
create a name with backslashes, either write `$node->toString()` or `(string) $node`.) Then we create
a new name from the string and return it. Returning a new node replaces the old node.

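What the conversion computes can be shown on plain strings, without the library. This is an illustration only: it mimics joining name parts with a chosen separator, and the variable names are made up:

```php
// Illustration only: join the parts of a resolved name with a custom separator.
$resolvedName = 'A\B\C';
$parts        = explode('\\', $resolvedName); // the individual name parts

$pseudoName     = implode('_', $parts);  // underscore-separated pseudo namespace
$namespacedName = implode('\\', $parts); // the regular backslash-separated form

echo $pseudoName, "\n";     // A_B_C
echo $namespacedName, "\n"; // A\B\C
```
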
Another thing we need to do is change the class/function/const declarations. Currently they contain
only the short name (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name($node->toString('_'));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = $node->namespacedName->toString('_');
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = $const->namespacedName->toString('_');
            }
        }
    }
}
```

There is not much more to it than converting the namespaced name to a string with `_` as the separator.

The last thing we need to do is remove the `namespace` and `use` statements:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name($node->toString('_'));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = $node->namespacedName->toString('_');
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = $const->namespacedName->toString('_');
            }
        } elseif ($node instanceof Stmt\Namespace_) {
            // returning an array merges it into the parent array
            return $node->stmts;
        } elseif ($node instanceof Stmt\Use_) {
            // returning NodeTraverser::REMOVE_NODE removes the node altogether
            return NodeTraverser::REMOVE_NODE;
        }
    }
}
```

That's all.