This repository has been archived by the owner on Sep 8, 2023. It is now read-only.

Fix bugs, and better support v7.4.0 features, in PHP lexer (rouge-ruby#1397)

This commit makes a number of changes to the PHP lexer, both to fix
outstanding issues and to add support for features introduced in PHP
versions up to 7.4.0.

In particular, this commit:

- makes the following terms match case-insensitively: `<?php`, `as`,
  `use`, the strings in `@keywords` and the strings in `@builtins`;
  
- fixes an issue where a heredoc whose starting label is enclosed in
  double quotes (`<<<"LABEL"`) was lexed incorrectly;

- adds support for:
  - binary numbers;
  - use of `_` as a separator in numbers;
  - `yield from` keyword;
  - Unicode codepoint escape syntax; and
  - partial type hinting;

- adds the following words to `@keywords`: `__CLASS__`, `__DIR__`,
  `__FUNCTION__`, `__halt_compiler`, `__METHOD__`, `__NAMESPACE__`,
  `__TRAIT__`, `callable`, `class`, `fn`, `goto`, `instanceof`,
  `insteadof`, `self`, `trait`;

- removes the following words from `@keywords`: `__sleep`, `__wakeup`,
  `empty`, `php_user_filter`, `stdClass`, `this`, `virtual`; and

- simplifies rules for `E_*` and `PHP_*` constants.
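The first point can be illustrated with a simplified, stand-alone check. This is not Rouge's scanner loop. It runs the `:template` state's open-tag pattern against `<?PHP`, once without the `i` flag and once with it. Without the flag, the optional `php` group cannot match the uppercase spelling, so only `<?` is consumed as the opening tag. The `open_tag` helper is hypothetical.

```ruby
# Open-tag pattern from the :template state, before and after this commit.
OLD_OPEN_TAG = /<\?(php|=)?/
NEW_OPEN_TAG = /<\?(php|=)?/i

# Hypothetical helper: the text the rule would consume, or nil if it does not match.
def open_tag(re, src)
  m = re.match(src)
  m && m[0]
end

p open_tag(OLD_OPEN_TAG, "<?PHP echo 1;")  # => "<?"
p open_tag(NEW_OPEN_TAG, "<?PHP echo 1;")  # => "<?PHP"
p open_tag(NEW_OPEN_TAG, "<?= $x ?>")      # => "<?="
```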
julp authored and mattt committed May 21, 2020
1 parent 5f0615c commit 8536f60
Showing 3 changed files with 69 additions and 32 deletions.
74 changes: 44 additions & 30 deletions lib/rouge/lexers/php.rb
@@ -63,16 +63,23 @@ def builtins
end

def self.keywords
# (echo parent ; echo self ; sed -nE 's/<ST_IN_SCRIPTING>"((__)?[[:alpha:]_]+(__)?)".*/\1/p' zend_language_scanner.l | tr '[A-Z]' '[a-z]') | sort -u | grep -Fwv -e isset -e unset -e empty -e const -e use -e function -e namespace
# - isset, unset and empty are actually keywords (directly handled by PHP's lexer but let's pretend these are functions, you use them like so)
# - self and parent are kind of keywords, they are not handled by PHP's lexer
# - use, const, namespace and function are handled by specific rules to highlight what's next to the keyword
# - class is also listed here, in addition to the rule below, to handle anonymous classes
@keywords ||= Set.new %w(
and E_PARSE old_function E_ERROR or as E_WARNING parent eval
PHP_OS break exit case extends PHP_VERSION cfunction FALSE
print for require continue foreach require_once declare return
default static do switch die stdClass echo else TRUE elseif
var empty if xor enddeclare include virtual endfor include_once
while endforeach global __FILE__ endif list __LINE__ endswitch
new __sleep endwhile not array __wakeup E_ALL NULL final
php_user_filter interface implements public private protected
abstract clone try catch finally throw this use namespace yield
old_function cfunction
__class__ __dir__ __file__ __function__ __halt_compiler
__line__ __method__ __namespace__ __trait__ abstract and
array as break callable case catch class clone continue
declare default die do echo else elseif enddeclare
endfor endforeach endif endswitch endwhile eval exit
extends final finally fn for foreach global goto if
implements include include_once instanceof insteadof
interface list new or parent print private protected
public require require_once return self static switch
throw trait try var while xor yield
)
end

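The new `@keywords` list is written entirely in lowercase. The catch-all name rule further down now downcases the matched text (`name = m[0].downcase`) before the lookup. The minimal sketch below reproduces that lookup using only a handful of entries from the list. It shows why `While` and `__CLASS__` are recognized while the removed `stdClass` is not. `KEYWORDS` and `keyword?` are illustrative names and are not part of the lexer.

```ruby
require 'set'

# A few entries from the new, all-lowercase @keywords list (the real list is longer).
KEYWORDS = Set.new(%w(while class function fn yield instanceof __class__))

# Hypothetical helper mirroring the lookup in the catch-all name rule:
# downcase the matched word, then test membership in the lowercase set.
def keyword?(word)
  KEYWORDS.include?(word.downcase)
end

p keyword?("While")      # => true
p keyword?("__CLASS__")  # => true
p keyword?("stdClass")   # => false (removed from @keywords by this commit)
```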
@@ -85,49 +92,50 @@ def self.detect?(text)
state :root do
# some extremely rough heuristics to decide whether to start inline or not
rule(/\s*(?=<)/m) { delegate parent; push :template }
rule(/[^$]+(?=<\?(php|=))/) { delegate parent; push :template }
rule(/[^$]+(?=<\?(php|=))/i) { delegate parent; push :template }

rule(//) { push :template; push :php }
end

state :template do
rule %r/<\?(php|=)?/, Comment::Preproc, :php
rule %r/<\?(php|=)?/i, Comment::Preproc, :php
rule(/.*?(?=<\?)|.*/m) { delegate parent }
end

state :php do
rule %r/\?>/, Comment::Preproc, :pop!
# heredocs
rule %r/<<<('?)(#{id})\1\n.*?\n\s*\2;?/im, Str::Heredoc
rule %r/<<<(["']?)(#{id})\1\n.*?\n\s*\2;?/im, Str::Heredoc
rule %r/\s+/, Text
rule %r/#.*?$/, Comment::Single
rule %r(//.*?$), Comment::Single
rule %r(/\*\*(?!/).*?\*/)m, Comment::Doc
rule %r(/\*.*?\*/)m, Comment::Multiline

rule %r/(->|::)(\s*)(#{id})/ do
groups Operator, Text, Name::Attribute
end

rule %r/(void|\??(int|float|bool|string|iterable|self|callable))\b/i, Keyword::Type
rule %r/[~!%^&*+=\|:.<>\/?@-]+/, Operator
rule %r/[\[\]{}();,]/, Punctuation
rule %r/(class|interface|trait)(\s+)(#{nsid})/ do
rule %r/(class|interface|trait)(\s+)(#{nsid})/i do
groups Keyword::Declaration, Text, Name::Class
end
rule %r/(use)(\s+)(function|const|)(\s*)(#{nsid})/ do
rule %r/(use)(\s+)(function|const|)(\s*)(#{nsid})/i do
groups Keyword::Namespace, Text, Keyword::Namespace, Text, Name::Namespace
push :use
end
rule %r/(namespace)(\s+)(#{nsid})/ do
rule %r/(namespace)(\s+)(#{nsid})/i do
groups Keyword::Namespace, Text, Name::Namespace
end
# anonymous functions
rule %r/(function)(\s*)(?=\()/ do
rule %r/(function)(\s*)(?=\()/i do
groups Keyword, Text
end

# named functions
rule %r/(function)(\s+)(&?)(\s*)/ do
rule %r/(function)(\s+)(&?)(\s*)/i do
groups Keyword, Text, Operator, Text
push :funcname
end
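The sketch below isolates the heredoc fix by comparing the old and new heredoc patterns on a label wrapped in double quotes. It makes two assumptions. First, it uses a simplified ASCII-only `ID` pattern in place of the lexer's own `id`, whose definition is outside this diff. Second, it relies on a hypothetical `heredoc` helper.

```ruby
# Simplified stand-in for the lexer's `id` pattern (an assumption for this sketch).
ID = /[A-Za-z_][A-Za-z0-9_]*/

# Heredoc patterns from the diff: group 1 is the optional quote, group 2 the label.
OLD_HEREDOC = %r/<<<('?)(#{ID})\1\n.*?\n\s*\2;?/im
NEW_HEREDOC = %r/<<<(["']?)(#{ID})\1\n.*?\n\s*\2;?/im

# Hypothetical helper: the whole heredoc the rule would consume, or nil.
def heredoc(re, src)
  m = re.match(src)
  m && m[0]
end

src = %(<<<"EOT"\nHello\nEOT;\n)
p heredoc(OLD_HEREDOC, src)  # => nil (only a single quote was allowed)
p heredoc(NEW_HEREDOC, src)  # => "<<<\"EOT\"\nHello\nEOT;"
```

Nowdoc labels such as `<<<'EOT'` match under both patterns, so the change only widens what is accepted.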
@@ -136,13 +144,18 @@ def self.detect?(text)
groups Keyword, Text, Name::Constant
end

rule %r/(true|false|null)\b/, Keyword::Constant
rule %r/stdClass\b/i, Name::Class
rule %r/(true|false|null)\b/i, Keyword::Constant
rule %r/(E|PHP)(_[[:upper:]]+)+\b/, Keyword::Constant
rule %r/\$\{\$+#{id}\}/i, Name::Variable
rule %r/\$+#{id}/i, Name::Variable
rule %r/(yield)([ \n\r\t]+)(from)/i do
groups Keyword, Text, Keyword
end

# may be intercepted for builtin highlighting
rule %r/\\?#{nsid}/i do |m|
name = m[0]
name = m[0].downcase

if self.class.keywords.include? name
token Keyword
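The `yield from` rule is listed before the catch-all name rule, and Rouge tries a state's rules in order. As a result, the two words and the whitespace between them are grouped together rather than lexed as separate names. The small sketch below pairs the rule's captures with the token types passed to `groups`. The `yield_from_tokens` helper is hypothetical.

```ruby
# The rule from the diff; /i makes both words case-insensitive.
YIELD_FROM = /(yield)([ \n\r\t]+)(from)/i

# Hypothetical helper: pair each capture with the token type `groups` assigns it.
def yield_from_tokens(src)
  m = YIELD_FROM.match(src)
  return nil unless m
  %w[Keyword Text Keyword].zip(m.captures)
end

p yield_from_tokens("yield from gen();")
# => [["Keyword", "yield"], ["Text", " "], ["Keyword", "from"]]
p yield_from_tokens("YIELD\n  FROM gen();")
# => [["Keyword", "YIELD"], ["Text", "\n  "], ["Keyword", "FROM"]]
```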
@@ -153,30 +166,30 @@ def self.detect?(text)
end
end

rule %r/(\d+\.\d*|\d*\.\d+)(e[+-]?\d+)?/i, Num::Float
rule %r/\d+e[+-]?\d+/i, Num::Float
rule %r/0[0-7]+/, Num::Oct
rule %r/0x[a-f0-9]+/i, Num::Hex
rule %r/\d+/, Num::Integer
rule %r/(\d[_\d]*)?\.(\d[_\d]*)?(e[+-]?\d[_\d]*)?/i, Num::Float
rule %r/0[0-7][0-7_]*/, Num::Oct
rule %r/0b[01][01_]*/i, Num::Bin
rule %r/0x[a-f0-9][a-f0-9_]*/i, Num::Hex
rule %r/\d[_\d]*/, Num::Integer
rule %r/'([^'\\]*(?:\\.[^'\\]*)*)'/, Str::Single
rule %r/`([^`\\]*(?:\\.[^`\\]*)*)`/, Str::Backtick
rule %r/"/, Str::Double, :string
end

state :use do
rule %r/(\s+)(as)(\s+)(#{id})/ do
rule %r/(\s+)(as)(\s+)(#{id})/i do
groups Text, Keyword, Text, Name
:pop!
end
rule %r/\\\{/, Operator, :uselist
rule %r/;/, Punctuation, :pop!
end

state :uselist do
rule %r/\s+/, Text
rule %r/,/, Operator
rule %r/\}/, Operator, :pop!
rule %r/(as)(\s+)(#{id})/ do
rule %r/(as)(\s+)(#{id})/i do
groups Keyword, Text, Name
end
rule %r/#{id}/, Name::Namespace
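The updated numeric rules accept `_` between digits and add a `0b` binary form. The sketch below tries these rules in the order they appear in the diff and reports the first match at the start of the input. This roughly mirrors how the lexer's scanner behaves at its current position. The `\A` anchors and the `lex_number` helper are additions made for the sketch. For comparison, the old `\d+` integer rule would stop matching `1_000_000` at the first underscore.

```ruby
# Updated numeric rules from the diff, in order; \A anchors are added for the sketch.
NUMBER_RULES = [
  [/\A(\d[_\d]*)?\.(\d[_\d]*)?(e[+-]?\d[_\d]*)?/i, "Num::Float"],
  [/\A0[0-7][0-7_]*/,                               "Num::Oct"],
  [/\A0b[01][01_]*/i,                               "Num::Bin"],
  [/\A0x[a-f0-9][a-f0-9_]*/i,                       "Num::Hex"],
  [/\A\d[_\d]*/,                                    "Num::Integer"],
].freeze

# Hypothetical helper: [token, text] for the first rule matching at the start of src.
def lex_number(src)
  NUMBER_RULES.each do |re, token|
    m = re.match(src)
    return [token, m[0]] if m
  end
  nil
end

p lex_number("0b1010;")     # => ["Num::Bin", "0b1010"]
p lex_number("1_000_000;")  # => ["Num::Integer", "1_000_000"]
p lex_number("3.141_592;")  # => ["Num::Float", "3.141_592"]
p lex_number("0x7F_FF;")    # => ["Num::Hex", "0x7F_FF"]
p lex_number("0755;")       # => ["Num::Oct", "0755"]
```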
@@ -189,7 +202,8 @@ def self.detect?(text)
state :string do
rule %r/"/, Str::Double, :pop!
rule %r/[^\\{$"]+/, Str::Double
rule %r/\\([nrt\"$\\]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/,
rule %r/\\u\{[0-9a-fA-F]+\}/, Str::Escape
rule %r/\\([efrntv\"$\\]|[0-7]{1,3}|[xX][0-9a-fA-F]{1,2})/,
Str::Escape
rule %r/\$#{id}(\[\S+\]|->#{id})?/, Name::Variable

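The string-state changes do two things. They add the `\u{...}` codepoint escape, and they widen the simple escapes to cover `\e`, `\f`, `\v` and an uppercase `\X` prefix. The sketch below folds both patterns into one regex and lists the escapes it finds in two sample string bodies. In the folded regex, the capture group is made non-capturing so that `String#scan` returns whole matches. The sketch only extracts escapes and does not reproduce the rest of the `:string` state.

```ruby
# Both escape rules from the diff, combined; (?:...) replaces the capture group.
ESCAPE = /\\u\{[0-9a-fA-F]+\}|\\(?:[efrntv"$\\]|[0-7]{1,3}|[xX][0-9a-fA-F]{1,2})/

# Single-quoted Ruby strings keep the backslashes, like raw PHP source text.
p '\u{FEFF}===== Debut du fichier =====\n'.scan(ESCAPE)
# => ["\\u{FEFF}", "\\n"]
p '\e[0m \v \X41 \101 \q'.scan(ESCAPE)
# => ["\\e", "\\v", "\\X41", "\\101"]
```

`\q` is not an escape in PHP double-quoted strings, so it is left out of the result.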
22 changes: 22 additions & 0 deletions spec/lexers/php_spec.rb
@@ -52,5 +52,27 @@
it 'recognizes trait definition' do
assert_tokens_equal 'trait A {}', ["Keyword.Declaration", "trait"], ["Text", " "], ["Name.Class", "A"], ["Text", " "], ["Punctuation", "{}"]
end

it 'recognizes case insensitively keywords' do
assert_tokens_equal 'While', ["Keyword", "While"]
# class for anonymous classes is recognized as a regular keyword
assert_tokens_equal 'Class {', ["Keyword", "Class"], ["Text", " "], ["Punctuation", "{"]
assert_tokens_equal 'Class BAR', ["Keyword.Declaration", "Class"], ["Text", " "], ["Name.Class", "BAR"]
assert_tokens_equal 'Const BAR', ["Keyword", "Const"], ["Text", " "], ["Name.Constant", "BAR"]
assert_tokens_equal 'Use BAR', ["Keyword.Namespace", "Use"], ["Text", " "], ["Name.Namespace", "BAR"]
assert_tokens_equal 'NameSpace BAR', ["Keyword.Namespace", "NameSpace"], ["Text", " "], ["Name.Namespace", "BAR"]
# function for anonymous functions is also recognized as a regular keyword
assert_tokens_equal 'Function (', ["Keyword", "Function"], ["Text", " "], ["Punctuation", "("]
assert_tokens_equal 'Function foo', ["Keyword", "Function"], ["Text", " "], ["Name.Function", "foo"]
end

it 'recognizes case sensitively E_* and PHP_* as constants' do
assert_tokens_equal 'PHP_EOL', ["Keyword.Constant", "PHP_EOL"]
assert_tokens_equal 'PHP_EOL_1', ["Name.Other", "PHP_EOL_1"]

assert_tokens_equal 'E_user_DEPRECATED', ["Name.Other", "E_user_DEPRECATED"]
assert_tokens_equal 'E_USER_deprecated', ["Name.Other", "E_USER_deprecated"]
assert_tokens_equal 'E_USER_DEPRECATED', ["Keyword.Constant", "E_USER_DEPRECATED"]
end
end
end
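The second new spec depends on the constant rule `/(E|PHP)(_[[:upper:]]+)+\b/` being case-sensitive. Every segment after `E` or `PHP` must be uppercase, and the match must end at a word boundary. The sketch below reproduces the spec's expectations with the same pattern, anchored at the start of the input. Both the anchor and the `builtin_constant?` helper are assumptions of the sketch. In the lexer, a word that fails this rule falls through to the general name rule, which is why the spec expects `Name.Other` for those cases.

```ruby
# Constant rule from the diff (no /i flag), anchored for a whole-word check.
CONSTANT = /\A(E|PHP)(_[[:upper:]]+)+\b/

# Hypothetical helper: would the rule tag this word as Keyword.Constant?
def builtin_constant?(word)
  CONSTANT.match?(word)
end

p builtin_constant?("PHP_EOL")            # => true
p builtin_constant?("PHP_EOL_1")          # => false
p builtin_constant?("E_user_DEPRECATED")  # => false
p builtin_constant?("E_USER_deprecated")  # => false
p builtin_constant?("E_USER_DEPRECATED")  # => true
```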
5 changes: 3 additions & 2 deletions spec/visual/samples/php
@@ -1,5 +1,6 @@
<p>It's html outside php!</p>
<?php
file_put_contents("log.txt", "\u{FEFF}===== Début du fichier =====\n");

$test = function($a) { $lambda = 1; }

@@ -519,7 +520,7 @@ class Zip extends Archive {
$header);

// Valid header?
if($header_info['header'] != 33639248)
if($header_info['header'] != 33_639_248)
return false;

// New position
@@ -577,7 +578,7 @@ class Zip extends Archive {
$header);

// Valid header?
if($header_info['header'] != 67324752)
if($header_info['header'] != 67_324_752)
return false;

// Get content start position