Add ECMA definition of symmetry operation regexp and associated tests #488

Merged
merged 11 commits into from
Mar 22, 2024
3 changes: 3 additions & 0 deletions GNUmakefile
@@ -55,6 +55,9 @@
# - tests/generated/identifiers.ere
# - tests/generated/numbers.ere
# - tests/generated/strings.ere
# - tests/generated/symops.pcre
# - tests/generated/symop_definitions.pcre
# - tests/generated/symops.ecma
#
#
# Targets for testing / auditing the specification
51 changes: 32 additions & 19 deletions optimade.rst
@@ -4095,35 +4095,48 @@ The strings below contain Extended Regular Expressions (EREs) to recognize ident
The Symmetry Operation String Regular Expressions
-------------------------------------------------

Symmetry operation strings that comprise the :property:`space\_group\_symmetry\_operations\_xyz` property MUST conform to the following regular expressions.
The regular expressions are recorded in the Perl Compatible Regular Expression (PCRE) syntax, with `Perl extensions <https://perldoc.perl.org/perlre>`__ used for readability.
The :val:`symop_definitions` section defines several variables in Perl syntax that capture common parts of the regular expressions (REs) and need to be interpolated into the final REs used for matching.
The :val:`symops` section contains the REs themselves.
The whitespace characters in these definitions are not significant; if used in Perl programs, these expressions MUST be processed with the :code:`/x` RE option.
A working example of these REs in action can be found in the :code:`tests/cases/pcre_symops_001.sh` and other test cases.
Symmetry operation strings that comprise the :property:`space\_group\_symmetry\_operations\_xyz` property MUST conform to the following regular expressions.
The regular expressions are recorded below in two forms, one in a more readable form using variables and the other as an explicit pattern compatible with the `OPTIMADE Regular Expression Format`_.

.. code:: PCRE
- Perl Compatible Regular Expression (PCRE) syntax, with `Perl extensions <https://perldoc.perl.org/perlre>`__ used for readability and expressivity.
The :val:`symop_definitions` section defines several variables in Perl syntax that capture common parts of the regular expressions (REs) and need to be interpolated into the final REs used for matching.
The :val:`symops` section contains the REs themselves.
The whitespace characters in these definitions are not significant; if used in Perl programs, these expressions MUST be processed with the :code:`/x` RE modifier.
A working example of these REs in action can be found in the :code:`tests/cases/pcre_symops_001.sh` and other test cases.

#BEGIN PCRE symop_definitions

$translations = '1\/2|[12]\/3|[1-3]\/4|[1-5]\/6';
.. code:: PCRE

$symop_translation_appended = "[-+]? [xyz] ([-+][xyz])? ([-+] ($translations) )?";
$symop_translation_prepended = "[-+]? ($translations) ([-+] [xyz] ([-+][xyz])? )?";
#BEGIN PCRE symop_definitions

$symop_re = "($symop_translation_appended|$symop_translation_prepended)";
$translations = '1\/2|[12]\/3|[1-3]\/4|[1-5]\/6';

#END PCRE symop_definitions
$symop_translation_appended = "[-+]? [xyz] ([-+][xyz])? ([-+] ($translations) )?";
$symop_translation_prepended = "[-+]? ($translations) ([-+] [xyz] ([-+][xyz])? )?";

.. code:: PCRE
$symop_re = "($symop_translation_appended|$symop_translation_prepended)";

#BEGIN PCRE symops
#END PCRE symop_definitions

^ # From the beginning of the string...
($symop_re)(,$symop_re){2}
$ # ... match to the very end of the string
.. code:: PCRE

#END PCRE symops
#BEGIN PCRE symops

^ # From the beginning of the string...
($symop_re)(,$symop_re){2}
$ # ... match to the very end of the string

#END PCRE symops

- The regular expression is also provided in an expanded form:

.. code:: ECMA

#BEGIN ECMA symops

^([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?)$

#END ECMA symops
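
As an illustration of how the two forms relate, the following is a minimal Python sketch, not part of the specification or its test suite. It assumes the Perl variables can be transcribed as Python strings (the names `appended`, `prepended`, `readable` and `expanded` are hypothetical) and that `re.VERBOSE` is an acceptable stand-in for the Perl `/x` modifier. The readable form is built by interpolation; the expanded form is the same component with whitespace removed and written out three times.

```python
import re

# Hypothetical Python transcription of the Perl variables defined above.
translations = r"1/2|[12]/3|[1-3]/4|[1-5]/6"
appended = rf"[-+]? [xyz] ([-+][xyz])? ([-+] ({translations}) )?"
prepended = rf"[-+]? ({translations}) ([-+] [xyz] ([-+][xyz])? )?"
symop_re = rf"({appended}|{prepended})"

# Readable form: whitespace is insignificant, so compile it with re.VERBOSE
# (used here as Python's analogue of the Perl /x modifier).
readable = re.compile(rf"^ ({symop_re}) (,{symop_re}){{2}} $", re.VERBOSE)

# Expanded form: one component with the spaces removed, repeated three times
# and joined by commas. No flags are needed to compile it.
component = symop_re.replace(" ", "")
expanded = re.compile("^" + ",".join([component] * 3) + "$")

samples = ["x,y,z", "-x,-y,z+1/2", "1/2+x,y,z", "x-y,-x,z+2/3",
           "x,y", "x+1/5,y,z", "x,y,z,x"]
for s in samples:
    # Print the verdict of each form side by side; they are expected to agree.
    print(s, bool(readable.match(s)), bool(expanded.match(s)))
```

The first four samples should be accepted by both forms and the last three rejected: `x,y` has too few components, `1/5` is not among the allowed translations, and `x,y,z,x` has too many components.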

OPTIMADE JSON lines partial data format
---------------------------------------
23 changes: 23 additions & 0 deletions tests/cases/ecma_symops_001.sh
@@ -0,0 +1,23 @@
#! /bin/sh

# Test case: test if the provided ECMA-compatible regular expression correctly
# recognises symmetry operation strings.

#BEGIN DEPEND

INPUT_GRAMMAR=tests/generated/symops.ecma

#END DEPEND


/usr/bin/env python << EOF
import re
import sys
with open("${INPUT_GRAMMAR}") as f:
    # Use the first line that is neither blank nor a "#" comment or marker.
    expression = [line.strip() for line in f.readlines() if line.strip() and not line.strip().startswith("#")][0]

with open("tests/inputs/symops.lst") as cases:
    for case in cases:
        # Each case keeps its trailing newline; the end anchor still matches before it.
        if re.match(expression, case):
            print(case, end="")
EOF
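
A note on the matching step in this test: lines read from the input file keep their trailing newline, and the test relies on Python's `$` matching just before a final newline. The sketch below illustrates that behaviour with a deliberately simplified stand-in pattern (an assumption for brevity, not the real symmetry operation expression).

```python
import re

# Simplified stand-in pattern, NOT the real symmetry operation expression.
anchored = r"^[xyz](,[xyz]){2}$"
unanchored = r"[xyz](,[xyz]){2}"

line = "x,y,z\n"  # a line as yielded when iterating over an open file

# Without re.MULTILINE, "$" matches at the very end of the string or just
# before a single trailing newline, so the raw line is accepted.
accepted_raw = re.match(anchored, line) is not None

# re.fullmatch requires the whole string to match, so the newline must be
# stripped first.
fullmatch_raw = re.fullmatch(unanchored, line) is not None
fullmatch_stripped = re.fullmatch(unanchored, line.rstrip("\n")) is not None

print(accepted_raw, fullmatch_raw, fullmatch_stripped)
```

A stricter consumer that must reject a trailing newline could strip the line and use `re.fullmatch`, or anchor with `\Z` instead of `$`.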
44 changes: 44 additions & 0 deletions tests/cases/symops_pcre_to_ecma_001.sh
@@ -0,0 +1,44 @@
#! /bin/sh

# Test case: tests the equivalence of the provided PCRE and ECMA regular
# expressions used in the validation of space group symmetry operations.
# The equivalence is tested by translating the expression from PCRE to the
# subset of the ECMA 262 dialect supported by OPTIMADE.

set -ue

#BEGIN DEPEND

INPUT_PCRE_DEFS=tests/generated/symop_definitions.pcre
INPUT_PCRE_GRAMMAR=tests/generated/symops.pcre
INPUT_ECMA_GRAMMAR=tests/generated/symops.ecma

#END DEPEND

PCRE_REGEX=$( \
grep -v -e '^ *#' -e '^\s+$' ${INPUT_PCRE_GRAMMAR} | \
perl -ne 's/\s+//g; s/#.*//; s/[\$]$/\\\$/; print;' | \
perl -ne 's/^[\^][(](.+)[)][(](.+)[)]\{2\}\\[\$]$/^$1$2$2$3\\\$/; print;' \
)

EXPANDED_PCRE_REGEX=$( \
perl -I. -w \
-e "require '${INPUT_PCRE_DEFS}';" \
-e "my \$extended_regex = \"${PCRE_REGEX}\";" \
-e '$extended_regex =~ s/[\s\\]+//g;' \
-e 'print $extended_regex;' \
)

ECMA_REGEX=$( \
grep -v -e '^ *#' -e '^\s+$' ${INPUT_ECMA_GRAMMAR} | \
perl -n -e 's/\s+//g; print;' \
)

if [ "${EXPANDED_PCRE_REGEX}" = "${ECMA_REGEX}" ]
then
printf '%s\n' 'PASS: expanded regular expressions match.'
else
printf '%s\n' 'FAIL: expanded regular expressions do not match.'
echo "PCRE: ${EXPANDED_PCRE_REGEX}"
echo "ECMA: ${ECMA_REGEX}"
fi
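
The translation performed by this script can be sketched in Python as follows. This is a hedged re-implementation for illustration only: the helper names `unroll` and `normalise` and the `SYMOP_RE` placeholder are hypothetical, the PCRE definitions are transcribed by hand, and `ECMA_COMPONENT` is one of the three identical components copied from the expanded expression in `optimade.rst`.

```python
import re

# Hand-transcribed PCRE definitions, keeping the escaped slashes and spaces.
translations = r"1\/2|[12]\/3|[1-3]\/4|[1-5]\/6"
appended = "[-+]? [xyz] ([-+][xyz])? ([-+] (" + translations + ") )?"
prepended = "[-+]? (" + translations + ") ([-+] [xyz] ([-+][xyz])? )?"
symop_re = "(" + appended + "|" + prepended + ")"

# One component of the expanded (ECMA) expression, copied from the document.
ECMA_COMPONENT = (
    "([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?"
    "|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?)"
)
ecma = "^" + ",".join([ECMA_COMPONENT] * 3) + "$"


def unroll(template: str) -> str:
    """Rewrite '^(X)(,X){2}$' as '^X,X,X$' before variables are expanded."""
    m = re.fullmatch(r"\^\((.+)\)\((.+)\)\{2\}\$", template)
    if m is None:
        raise ValueError("template does not have the form ^(X)(,X){2}$")
    first, repeated = m.groups()
    return "^" + first + repeated * 2 + "$"


def normalise(pattern: str) -> str:
    """Drop whitespace and backslashes, as the script's final Perl step does."""
    return re.sub(r"[\s\\]+", "", pattern)


# Unroll the symbolic pattern first, then substitute the definition.
unrolled = unroll("^(SYMOP_RE)(,SYMOP_RE){2}$")
expanded_pcre = normalise(unrolled.replace("SYMOP_RE", symop_re))

print("PASS" if expanded_pcre == ecma else "FAIL")
```

Unrolling before substitution mirrors the order used in the shell script, where the `{2}` repetition is rewritten while the pattern still contains the variable name rather than its expansion.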
8 changes: 7 additions & 1 deletion tests/makefiles/Makelocal-grammars
@@ -20,7 +20,7 @@ EBNF_FILES ?= ${GRAMMARS:%=${GRAMMAR_DIR}/%.ebnf}
GRAMMAR_FILES ?= ${EBNF_FILES:%.ebnf=%.g}

REGEXPS = $(sort $(shell awk '/^ *${RE_START_STRING}/{print $$3}' ${RST_FILES} | tr -d "\r"))
REGEXP_FILES = ${REGEXPS:%=${GRAMMAR_DIR}/%.ere} ${REGEXPS:%=${GRAMMAR_DIR}/%.pcre}
REGEXP_FILES = ${REGEXPS:%=${GRAMMAR_DIR}/%.ere} ${REGEXPS:%=${GRAMMAR_DIR}/%.pcre} ${REGEXPS:%=${GRAMMAR_DIR}/%.ecma}

GRAMMAR_DEPENDENCIES = .grammars.d

@@ -47,6 +47,8 @@ ${GRAMMAR_DEPENDENCIES}: ${RST_FILES}
$^ | tr -d "\r" >> $@
awk '/^ *${RE_START_STRING} PCRE/{print "${GRAMMAR_DIR}/"$$3".pcre:", FILENAME}' \
$^ | tr -d "\r" >> $@
awk '/^ *${RE_START_STRING} ECMA/{print "${GRAMMAR_DIR}/"$$3".ecma:", FILENAME}' \
$^ | tr -d "\r" >> $@

${GRAMMAR_DIR}/%.ebnf:
awk '/^ *${GRAMMAR_START_STRING} $*/,/^ *${GRAMMAR_END_STRING} $*/' $< \
@@ -60,6 +62,10 @@ ${GRAMMAR_DIR}/%.pcre:
awk '/^ *${RE_START_STRING} PCRE $*/,/^ *${RE_END_STRING} PCRE $*/' $< \
| sed 's/^ //' | tr -d "\r" > $@

${GRAMMAR_DIR}/%.ecma:
awk '/^ *${RE_START_STRING} ECMA $*/,/^ *${RE_END_STRING} ECMA $*/' $< \
| sed 's/^ //' | tr -d "\r" > $@
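
For readers less familiar with awk range patterns, the extraction rule above can be approximated in Python as below. This is a sketch under stated assumptions: the marker strings are taken to be `#BEGIN` and `#END` (the actual values of `RE_START_STRING` and `RE_END_STRING` are not shown in this diff), the function name `extract_block` is hypothetical, and all leading spaces are stripped, whereas the `sed` step removes only a fixed prefix.

```python
def extract_block(lines, dialect, name, begin="#BEGIN", end="#END"):
    """Return the lines from the BEGIN marker through the END marker, inclusive.

    `begin` and `end` are assumed marker strings, not values confirmed by the
    makefile shown here.
    """
    start_marker = f"{begin} {dialect} {name}"
    end_marker = f"{end} {dialect} {name}"
    out = []
    inside = False
    for line in lines:
        stripped = line.lstrip(" ")  # simplification of the sed step
        if not inside and stripped.startswith(start_marker):
            inside = True
        if inside:
            out.append(stripped.rstrip("\r\n"))  # also drops "\r", like tr -d
            if stripped.startswith(end_marker):
                inside = False
    return out


sample = [
    "Some prose.\n",
    "    #BEGIN ECMA symops\n",
    "\n",
    "    ^x$\r\n",
    "    #END ECMA symops\n",
    "More prose.\n",
]
print(extract_block(sample, "ECMA", "symops"))
```

Because the markers themselves are kept in the output, a consumer such as the ECMA test case still has to skip lines starting with `#` before using the expression.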

.PHONY: tools

${GRAMMAR_DIR}/%.g: ${GRAMMAR_DIR}/%.ebnf | tools