-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nested zsh style Parameter Expansion flags throws parser #88
Comments
Do you know what the name of this syntax is |
(note: this comment is unrelated to why the highlighting is failing) |
Yes, this is called a parameter flag or parameter subscript. It is available in both bash and zsh. Not sure about sh. 15.2.3 Subscript Flags I have been looking at it this morning and I see that the variable regex's are don't provsion for them. I think there is overlap with this and issue #74. I don't have a background in Textmate, but I have extensive experience with Perl Regex. I will look at the Textmate side and offer a proposal. |
This is a downstream consequence of missing the variable identification. The quote flag is still open. I have encountered a variety of confusing behavior that I fiddled with iteratively to see if I could reverse engineer the cause. In this case, I think the other is a symptom of the variable identification. |
In terms of fixing the problem. Theres a pattern, related to variable assignment, for detecting an array assignment. I think the array pattern handles named and not-named arrays. For the named arrays, the string pretty much has to be matched with a one-line pattern (not pattern range) due to parser limitations. One line patterns can't handle nested stuff, like nested strings inside of string interpolation. Instead a pattern range has to be used with a start quote and end quote. Slight hiccup tho, Textmate prioritizes long matches. So matching one-line string (one big chunk) compared to matching just the starting quote, will cause the whole-chunk to "win". That "win" scenario is just a warning, idk if thats even happening here. It might simply be that the pattern-range version of the string pattern isnt even included in the array-literal range. |
Maybe you already realize this, but the best way to read this is as nested parameters. It then expands to become ${(z)pathOpts_str} which the z is a flag that will iterate parse the contents of consistent with the parsing algorithm of the zsh command-line. Thus it becomes ${pathopts_quoted[@]}", which then gets passed through Q@. The @ I think is just syntactically preferred by the community, but not necessary. I haven't figured out under what conditions it is. Either way, each element is dequoted iteratively. Two good references for this are: |
"One line patterns can't handle nested stuff, like nested strings inside of string interpolation. Instead a pattern range has to be used with a start quote and end quote." I understand. Greedy vs non-greedy matching. This can actually be overcome by clever application of look-ahead/look-behind assertions By the way, what limits the parser to Textmate? Is that a VSCode thing? |
I think Theres a bit of miscommunication I'll try and clear up. I don't think the parameter expansion or parameter flag or For the
|
Sort of. Yes textmate is greedy, but its separate from the regex greedyness. Like I say "kinda" because yeah, we still use a lot of lookarounds to solve it on the textmate side. Also seeing as you're a regex expert, that will help a lot. In terms of regex hacking, there is one warning. In theory, with enough recursive regex, matching a nested string with a one-line textmate pattern is possible. Sadly I spent a lot of effort on that once for a different language only who realize that textmate will only tag the last (most-inner) part of a recursive regex pattern. So even if the pattern is matched correctly, its not tagged correctly. Ive got an issue on VS Code textmate about it, but I think you me and @ redcmd are the only people in the world who might care about the issue. That said, even with broken scopes/tags sometimes the fact that the pattern matches correctly is enough. It would be enough in this case, but the massive amount of effort it would take to make a recursive nested string pattern (a pattern that needs to contain the entire grammar thanks to subshell interpolation) would be insane for just fixing this one bug and like 2 other non-cascading bugs.
Yep. Other editors are limited to Textmate too. The alternative is the awesome Tree Sitter parser, which would never even run into this problem in the first place. Atom used it, NeoVim uses it. I use it for parsing tasks. Fun fact though. Bash is one of the few languages (I think Perl is another) that is impossible to statically parse perfectly. There can be runtime changes to the syntax thanks to, at minimum, aliases. So even the tree sitter can't always parse bash. Gotta run it to parse it (sometimes) |
Yep, cool. Wouldnt be surprised if that becomes #90 on this grammar |
It took me forever to find this reference. I expected it to be in the redirection section of the manual, but wouldn't you know it was in the Bash documentation on Command Substitution. What's funny about that is that I searched the document for "(<". If that didn't give it away. |
Actually, before I run, what is the order of operation when it comes to pattern recognition. Does the engine do sweeps based on the nodes within repository (I am referring to the json in autogenerated/shell.tmLanguage.json), does it do some compound search pattern or some other algorithm altogether? I want to make sure that we have regex patterns that don't conflict. Now I will talk to you later. |
Ok. I have some really good solutions that need to be rigorously tested. We also should discuss, what is and is not achievable with Textmate. I did find this good Textmate reference, which had a link to this one which suggests to me the Textmate grammar should be as rich as Perl's, but who knows. We should come up with some more rigorous patterns but they worked with all of my scripts, which are pretty aggressive: \b(\w+)(?:=) # Variables anything with assignments
(?:")([^"/]+)(?:") # Any quoted non-path
(?:\$\{)(\w+)(?:\}) # Any simply $ marked variable
(?:\$\{)((?:#)|\w)+(?:\}) # Any simply marked variable including with counting
(?:\$\{)([^ ]+)(?:\}) # Full Variable pattern Excluded everything but the space
(?:\$\{)(?:\()(@|\w+)(?:\))(\w+)(?:\}) # Flagged Variable
# Stack overflow to the Rescue [Regular Expressions to Match Balanced Parenthesis](https://stackoverflow.com/posts/35271017/revisions)
\((?:[^)(]|\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\))*\) Nested Parenthesis *Wow!!!! Shell_variable_identification_regression.zshShell Variable Identification Regression(Note this code isn't working fully. I was developing when I got sidetracked #!/bin/zsh
# _pathOpts startup is in $BIN for me $SOME_ROOT/bin
pathOpts_path() {
local fname="$BIN/._pathOpts.sh"
[[ -e $fname ]] && echo ${fname}
}
<<!
# Usage: assign "$(sourceof $1 to var)"
alias assign=eval
sourceof() {
local_varname='${(P)3}' source_varname="${(P)1}"
echo "local ${local_varname}=${source_varname}"
}
!
deserialize_pathOpts_from_file() {
typeset -grA _pathOpts
_pathOpts=( "${(Q@)${(z@)"$(<pathOpts_path)"}" )
}
serialize_pathOpts_to_file() {
"${(j: :)${(qkv@)_pathOpts}}" #"
}
# Function to initialize _pathOpts
pathOpts() {
local reinitialize=$1
[[ -v ${_pathOpts} && ${(t)_pathOpts} == assoc-array ]] && return 0
deserialize_pathOpts
# Add your initialization logic here
# echo "Initializing _pathOpts..."
local _pathOpts_arr=(
'.' '$(pwd)'
'some' '$HOME/General-Atomics'
'apath' '$SOME_ROOT/apath'
'core_repos' '$SOME_ROOT/apath/core_repos'
'reference_repos' '$SOME_ROOT/apath/reference_repos'
'subject' '$SOME_ROOT/apath/subject_models'
'projects' '$SOME_ROOT/apath/projects'
'support' '$SOME_ROOT/apath/support'
'thesis' '$SOME_ROOT/apath/thesis'
'somecode' '$SOME_ROOT/apath/subject_models/somecode'
'base-envs' '$SOME_ROOT/apath/conda_environments'
'be-miniforge' '$SOME_ROOT/apath/conda_environments/Miniforge3-MacOSX-arm64'
'be-mf-core' '$SOME_ROOT/apath/conda_environments/mambaforge-core'
'be-core-build' '$SOME_ROOT/apath/conda_environments/core-build'
'my-docs' '$SOME_ROOT/Documentation'
'some-docs' '$SOME_ROOT/SOME-Documentation-Repo'
'CORE-projects-logs' '$SOME_ROOT/CORE-projects-logs'
'core-users' '$SOME_ROOT/core-users'
'zsh_funcs' '$HOME/zsh_funcs'
'completions' '$HOME/zsh_funcs/completions'
'.module' '$HOME/.modules'
)
get_KOBE _pathOpts_arr
typeset -agr _pathOpts=( "${j:[:SPACE:]{4}:_pathOpts_arr[@]}" )
save
}
get_KOBE() { # Key-ordered by entry
assign "$(source $1 to anArray)"
start=1 end=${#anArray} stride=2
typeset -a _pathOpts_KOBE[$end/$stride] # Pre-initializing
for i in {${start}..${end}..${stride}}; do
_pathOpts_KOBE[i]=${(qq)anArray[i]}
done
typeset -agr _pathOpts_KOBE # Changing KOBE's status
}
# Save _pathOpts to disk
save() {
pprint _pathOpts _pathOpts_KOBE > $(pathOpts_path) &2> /dev/null
}
# Manage _pathOpts based on flags
edit_PathOpts() {
local flag key path_value
flag="$1"; key="$2"; path_value="$3";
while [[ $# -gt 0 ]]; do
case $flag in
"--add")
_pathOpts[$key]=$path_value; shift 3
echo "Added path option: $key -> $path_value";;
"--remove")
unset _pathOpts[$key]; shift 2
echo "Removed path option: $key";;
"--reset")
mv "$_pathOptsPath" "$_pathOptsPath_$(gdate +%Y%m%d_%H%M%S)"
unset _pathOpts
initialize_PathOpts; shift 1
echo "Reset _pathOpts to default";;
*)
echo "Unknown flag: $flag"
return 1 ;;
esac
done
save
}
# Main function
main() {
initialize_PathOpts
local debug=false
[[ $1 == "--debug" ]] && shift && debug=true
[[ $1 == "--_pathOpts" ]] && shift && edit_PathOpts "${(P)1[@]}" && return
[[ $debug == true ]] && set -o xtrace
find_folder "$@"
[[ $debug == true ]] && set +o xtrace
}
main "$@" Assignment Identification:Quote Identification non-Path$-wrapped Variable identificationFull Variable pattern - Excludes only spacesFlagged VariableMother of all regex's - Nested Parenthesis - Thank you stack overflow |
I wouldn't say that @jeff-hykin
so in both cases of "patterns": [
{ "match": "'" },
{ "match": "'.+'" },
] the single if you attempt to also capture the whitespace before it EDIT: |
@RedCMD The parsing scheme is not robust toward valid code. I will use your "surely". Surely I do not have to a have a bug free program entirely to have a local behavior behave correctly. First, consider this: Second, I contend that the poor syntax highlighting aided in my missing the syntax error because there was no distinction between contexts. (That said I am slightly embarrassed 😬). So I feel that it is more the tool failing me than me failing the tool. Next: I believe your own illustration still shows the behavior. To me pathOpts_path is highlighted like a string. I recognize that the pattern is a challenging one. But it is a good test because it is not that extraordinary, albeit hard to read. After the open paren the pattern is: "string"SHELLSCRIPT"string" to make one word. I am willing to be corrected, but it looks to me to be highlighted exactly the opposite. I have no doubt that you recognize this, but for posterity, the following is the first string. "${(Q@)${(z@)" Just to illustrate, below is a correctly bracketed showing that the color is opposite of what it should be: In the following image I threw in a "closing" quote before the I did a deep dive that I really benefited from earlier this year, regarding the history of Textmate and it's regex engine Oniguruma and while I am super impressed with Oniguruma I read about some early tradeoffs that were made for speed at the expense of accuracy in the case of Textmate. I would rather have my computer running syntax highlighting more accurately than freeing up some energy so it can do more telemetry. I also looked at the Ruby and the Textmate under the hood and used the developer tools to inspect what the parser was interpreting things and decided that I couldn't start this activity at that time and unfortunately I am still at that point. On the surface though it's pretty simple. Quotes and Brackets are not at the same precedence and that's that. I'm willing to listen, because I have been known to make mistakes ;-) but on this one, I think I' pretty spot on. |
@jeff-hykin I assume nested strings require double quotes to be backslashed? |
Yeah, me too. Complain to VS Code. I didn't pick textmate.
Fun fact, bash-like shell scripts are actually not (fully) possible to statically parse, even with the tree sitter, even with a custom built-for-bash parser. Bash syntax is runtime defined. For example: if (( RANDOM % 2 == 0 )); then
alias this_works='if [ "hello '
else
alias this_works='cat << EOF
'
fi
alias EOF=":"
this_works bob" = "hello bob" ]; then
echo 'bash was a mistake'
if
EOF
# sometimes that ends up being
# if [ "hello bob" = "hello bob" ]; then
# echo 'bash was a mistake'
# if
# EOF # <- null command
# other times it ends up being:
# cat << EOF
# bob" = "hello bob" ]; then
# echo 'bash was a mistake'
# if
# EOF # <- end of string being echoed If aliased are ignored (or assumed to be normal) then its statically parsable AFAIK. Just a fun fact. |
@jeff-hykin That's a good one! I often wonder where the line between syntax "aware" is and a full on syntax engine lies. I personally would like a tool that did the expansion real-time but I want to be able to toggle it to pause until I finish typing. LOL. It drives me batty when an engine is aggressive and I am mid thought. I'm working on trying to work out the logic myself and I am battling this beast to stop correcting me while I am typing. It robs me of focus. I seem to recall that no one has shown that shell script is Turing complete and won't result in conflicting conclusions when all available rules are applied in all their possible parametrics. I don't know if there is a way to mathematically reduce the problem to something that is provable. But it is no doubt a hard problem and I imagine people just prefer to work, tend to apply a simpler constraint on themselves so that their codes work and move along. That said, many parsing engines will limit the number of allowable expansions to like 3 or 7 or what have you. Infinite might be mathematically allowed, but it's not practical on a human scale. I take your challenge though. I will look over this and see how I would approach it. Gauntlet thrown, gauntlet accepted. |
@jeff-hykin and @RedCMD I am going to start a discussion in addition to keeping this issue open. My thought is that there is a good discussion to be had about what can and can't be done with syntax highlighting and the three of us, and I bet some others who sound like they ask similar questions might enjoy weighing in. I'll try to keep my discussion on this thread directly related to how do we solve the issue at hand. Oops. I assumed there was a conversation thread, but I don't think it has been enabled. @jeff-hykin would you be open to enabling the discussions feature for this repo? There are interesting discussions to be had about what is achievable and what is beyond the scope of a syntax highlighter that maybe related to this issue but would mostly be indirectly related. |
The code with a problem is:
With all extensions disabled, the resulting code looks like:
To get the parser to highlight correctly I have to append the following:
Interestingly, If I put on a single quote on the deserialize _pathOpts assignment, I get a syntax error indication. Which is not true unless I don't know something about shell grammar. It's just clear that this is not an officially pattern or I should be doing something different somewhere.
The text was updated successfully, but these errors were encountered: