Add ECMA definition of symmetry operation regexp and associated tests
rartino authored Mar 22, 2024
2 parents bce1fc9 + dff8688 commit 588f001
Showing 7 changed files with 3,982 additions and 19 deletions.
3 changes: 3 additions & 0 deletions GNUmakefile
@@ -55,6 +55,9 @@
# - tests/generated/identifiers.ere
# - tests/generated/numbers.ere
# - tests/generated/strings.ere
+# - tests/generated/symops.pcre
+# - tests/generated/symop_definitions.pcre
+# - tests/generated/symops.ecma
#
#
# Targets for testing / auditing the specification
49 changes: 31 additions & 18 deletions optimade.rst
@@ -4096,34 +4096,47 @@ The Symmetry Operation String Regular Expressions
-------------------------------------------------

Symmetry operation strings that comprise the :property:`space\_group\_symmetry\_operations\_xyz` property MUST conform to the following regular expressions.
-The regular expressions are recorded in the Perl Compatible Regular Expression (PCRE) syntax, with `Perl extensions <https://perldoc.perl.org/perlre>`__ used for readability.
-The :val:`symop_definitions` section defines several variables in Perl syntax that capture common parts of the regular expressions (REs) and need to be interpolated into the final REs used for matching.
-The :val:`symops` section contains the REs themselves.
-The whitespace characters in these definitions are not significant; if used in Perl programs, these expressions MUST be processed with the :code:`/x` RE option.
-A working example of these REs in action can be found in the :code:`tests/cases/pcre_symops_001.sh` and other test cases.
+The regular expressions are recorded below in two forms, one in a more readable form using variables and the other as an explicit pattern compatible with the `OPTIMADE Regular Expression Format`_.

-.. code:: PCRE
+- Perl Compatible Regular Expression (PCRE) syntax, with `Perl extensions <https://perldoc.perl.org/perlre>`__ used for readability and expressivity.
+The :val:`symop_definitions` section defines several variables in Perl syntax that capture common parts of the regular expressions (REs) and need to be interpolated into the final REs used for matching.
+The :val:`symops` section contains the REs themselves.
+The whitespace characters in these definitions are not significant; if used in Perl programs, these expressions MUST be processed with the :code:`/x` RE modifier.
+A working example of these REs in action can be found in the :code:`tests/cases/pcre_symops_001.sh` and other test cases.

-#BEGIN PCRE symop_definitions

-$translations = '1\/2|[12]\/3|[1-3]\/4|[1-5]\/6';
-$symop_translation_appended = "[-+]? [xyz] ([-+][xyz])? ([-+] ($translations) )?";
-$symop_translation_prepended = "[-+]? ($translations) ([-+] [xyz] ([-+][xyz])? )?";
-$symop_re = "($symop_translation_appended|$symop_translation_prepended)";
-#END PCRE symop_definitions
-.. code:: PCRE
-#BEGIN PCRE symops
-^ # From the beginning of the string...
-($symop_re)(,$symop_re){2}
-$ # ... match to the very end of the string
-#END PCRE symops
+.. code:: PCRE
+#BEGIN PCRE symop_definitions
+$translations = '1\/2|[12]\/3|[1-3]\/4|[1-5]\/6';
+$symop_translation_appended = "[-+]? [xyz] ([-+][xyz])? ([-+] ($translations) )?";
+$symop_translation_prepended = "[-+]? ($translations) ([-+] [xyz] ([-+][xyz])? )?";
+$symop_re = "($symop_translation_appended|$symop_translation_prepended)";
+#END PCRE symop_definitions
+.. code:: PCRE
+#BEGIN PCRE symops
+^ # From the beginning of the string...
+($symop_re)(,$symop_re){2}
+$ # ... match to the very end of the string
+#END PCRE symops
+- The regular expression is also provided in an expanded form as an OPTIMADE regex:

+.. code:: ECMA
+#BEGIN ECMA symops
+^([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?)$
+#END ECMA symops
OPTIMADE JSON lines partial data format
---------------------------------------
23 changes: 23 additions & 0 deletions tests/cases/ecma_symops_001.sh
@@ -0,0 +1,23 @@
#! /bin/sh

# Test case: test if the provided ECMA-compatible regular expression correctly
# recognises symmetry operation strings.

#BEGIN DEPEND

INPUT_GRAMMAR=tests/generated/symops.ecma

#END DEPEND


/usr/bin/env python << EOF
import re
import sys
with open("${INPUT_GRAMMAR}") as f:
    expression = [line.strip() for line in f.readlines() if line.strip() and not line.strip().startswith("#")][0]
with open("tests/inputs/symops.lst") as cases:
    for case in cases:
        if re.match(expression, case):
            print(case, end="")
EOF
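For readers who want to try the pattern outside the test harness, the following is a minimal standalone sketch of the same check. It rebuilds the pattern from the pieces visible in the expanded ECMA form rather than reading `tests/generated/symops.ecma`; the names `is_symop` and `SAMPLES`, and the sample strings themselves, are hypothetical and not part of this commit.

```python
import re

# Pieces of the expanded ECMA pattern from optimade.rst.
TRANSLATIONS = r"(1/2|[12]/3|[1-3]/4|[1-5]/6)"
# Form 1: xyz terms first, optional translation appended (e.g. "z+1/2").
APPENDED = r"[-+]?[xyz]([-+][xyz])?([-+]" + TRANSLATIONS + r")?"
# Form 2: translation first, optional xyz terms after it (e.g. "1/2+x").
PREPENDED = r"[-+]?" + TRANSLATIONS + r"([-+][xyz]([-+][xyz])?)?"
COMPONENT = "(" + APPENDED + "|" + PREPENDED + ")"

# Exactly three comma-separated components, anchored at both ends.
SYMOP_PATTERN = re.compile("^" + ",".join([COMPONENT] * 3) + "$")


def is_symop(candidate: str) -> bool:
    """Return True if the string matches the symmetry operation pattern."""
    return SYMOP_PATTERN.match(candidate) is not None


# Hypothetical sample inputs, chosen for illustration only.
SAMPLES = [
    "x,y,z",
    "-x,-y,z+1/2",
    "1/2+x,y,z",
    "x-y,x,z+5/6",
    "x,y",        # only two components
    "x,y,z+1/5",  # 1/5 is not an allowed translation
    "x, y, z",    # whitespace is not allowed
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample!r}: {is_symop(sample)}")
```

Note that Python's `$` also matches just before a trailing newline, which is why the test script above can pass lines read from the input file without stripping them.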
44 changes: 44 additions & 0 deletions tests/cases/symops_pcre_to_ecma_001.sh
@@ -0,0 +1,44 @@
#! /bin/sh

# Test case: tests the equivalence of the provided PCRE and ECMA regular
# expressions used in the validation of space group symmetry operations.
# The equivalence is tested by translating the expression from PCRE to the
# subset of the ECMA 262 dialect supported by OPTIMADE.

set -ue

#BEGIN DEPEND

INPUT_PCRE_DEFS=tests/generated/symop_definitions.pcre
INPUT_PCRE_GRAMMAR=tests/generated/symops.pcre
INPUT_ECMA_GRAMMAR=tests/generated/symops.ecma

#END DEPEND

PCRE_REGEX=$( \
grep -v -e '^ *#' -e '^\s+$' ${INPUT_PCRE_GRAMMAR} | \
perl -ne 's/\s+//g; s/#.*//; s/[\$]$/\\\$/; print;' | \
perl -ne 's/^[\^][(](.+)[)][(](.+)[)]\{2\}\\[\$]$/^$1$2$2$3\\\$/; print;' \
)

EXPANDED_PCRE_REGEX=$( \
perl -I. -w \
-e "require '${INPUT_PCRE_DEFS}';" \
-e "my \$extended_regex = \"${PCRE_REGEX}\";" \
-e '$extended_regex =~ s/[\s\\]+//g;' \
-e 'print $extended_regex;' \
)

ECMA_REGEX=$( \
grep -v -e '^ *#' -e '^\s+$' ${INPUT_ECMA_GRAMMAR} | \
perl -n -e 's/\s+//g; print;' \
)

if [ "${EXPANDED_PCRE_REGEX}" = "${ECMA_REGEX}" ]
then
printf '%s\n' 'PASS: expanded regular expressions match.'
else
printf '%s\n' 'FAIL: expanded regular expressions do not match.'
echo "PCRE: ${EXPANDED_PCRE_REGEX}"
echo "ECMA: ${ECMA_REGEX}"
fi
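The equivalence check above can also be sketched in Python, which may be easier to follow than the nested shell and Perl quoting. This is an illustration only, under two stated simplifications: the Perl variables are transcribed by hand instead of being loaded from `tests/generated/symop_definitions.pcre`, and the three components are joined directly rather than by rewriting `($symop_re)(,$symop_re){2}`. The variable names are hypothetical.

```python
import re

# PCRE definitions transcribed from the symop_definitions block.
translations = r"1\/2|[12]\/3|[1-3]\/4|[1-5]\/6"
symop_translation_appended = (
    f"[-+]? [xyz] ([-+][xyz])? ([-+] ({translations}) )?"
)
symop_translation_prepended = (
    f"[-+]? ({translations}) ([-+] [xyz] ([-+][xyz])? )?"
)
symop_re = f"({symop_translation_appended}|{symop_translation_prepended})"

# Simplification: write the three components out explicitly.
expanded = "^" + ",".join([symop_re] * 3) + "$"

# Mirror the script's s/[\s\\]+//g step: drop the /x whitespace and the
# backslashes that escape "/" in the Perl source.
expanded_pcre_regex = re.sub(r"[\s\\]+", "", expanded)

# Expanded ECMA pattern copied from the ECMA symops block.
ecma_regex = (
    r"^([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?"
    r"|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),"
    r"([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?"
    r"|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),"
    r"([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?"
    r"|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?)$"
)

if __name__ == "__main__":
    if expanded_pcre_regex == ecma_regex:
        print("PASS: expanded regular expressions match.")
    else:
        print("FAIL: expanded regular expressions do not match.")
        print("PCRE:", expanded_pcre_regex)
        print("ECMA:", ecma_regex)
```

Comparing the patterns as strings, as the shell test does, checks textual identity after expansion; it does not prove semantic equivalence of two differently written expressions.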
8 changes: 7 additions & 1 deletion tests/makefiles/Makelocal-grammars
@@ -20,7 +20,7 @@ EBNF_FILES ?= ${GRAMMARS:%=${GRAMMAR_DIR}/%.ebnf}
GRAMMAR_FILES ?= ${EBNF_FILES:%.ebnf=%.g}

REGEXPS = $(sort $(shell awk '/^ *${RE_START_STRING}/{print $$3}' ${RST_FILES} | tr -d "\r"))
-REGEXP_FILES = ${REGEXPS:%=${GRAMMAR_DIR}/%.ere} ${REGEXPS:%=${GRAMMAR_DIR}/%.pcre}
+REGEXP_FILES = ${REGEXPS:%=${GRAMMAR_DIR}/%.ere} ${REGEXPS:%=${GRAMMAR_DIR}/%.pcre} ${REGEXPS:%=${GRAMMAR_DIR}/%.ecma}

GRAMMAR_DEPENDENCIES = .grammars.d

@@ -47,6 +47,8 @@ ${GRAMMAR_DEPENDENCIES}: ${RST_FILES}
$^ | tr -d "\r" >> $@
awk '/^ *${RE_START_STRING} PCRE/{print "${GRAMMAR_DIR}/"$$3".pcre:", FILENAME}' \
$^ | tr -d "\r" >> $@
+awk '/^ *${RE_START_STRING} ECMA/{print "${GRAMMAR_DIR}/"$$3".ecma:", FILENAME}' \
+$^ | tr -d "\r" >> $@

${GRAMMAR_DIR}/%.ebnf:
awk '/^ *${GRAMMAR_START_STRING} $*/,/^ *${GRAMMAR_END_STRING} $*/' $< \
@@ -60,6 +62,10 @@ ${GRAMMAR_DIR}/%.pcre:
awk '/^ *${RE_START_STRING} PCRE $*/,/^ *${RE_END_STRING} PCRE $*/' $< \
| sed 's/^ //' | tr -d "\r" > $@

+${GRAMMAR_DIR}/%.ecma:
+awk '/^ *${RE_START_STRING} ECMA $*/,/^ *${RE_END_STRING} ECMA $*/' $< \
+| sed 's/^ //' | tr -d "\r" > $@

.PHONY: tools

${GRAMMAR_DIR}/%.g: ${GRAMMAR_DIR}/%.ebnf | tools
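The new `%.ecma` rule relies on an awk range pattern to copy everything between a start and an end marker out of the RST source. A rough Python equivalent is sketched below to make that behaviour concrete. It assumes the markers are `#BEGIN` and `#END`, as they appear in `optimade.rst` (the values of `RE_START_STRING` and `RE_END_STRING` are not shown in this diff), it strips all leading spaces where the rule uses `sed`, and it stops after the first block. The function name `extract_block` and the sample text are hypothetical.

```python
import re


def extract_block(rst_text: str, dialect: str, name: str) -> str:
    """Return the lines from the start marker to the end marker, inclusive.

    Markers are assumed to look like "#BEGIN ECMA symops" and
    "#END ECMA symops", optionally preceded by spaces.
    """
    start = re.compile(r" *#BEGIN " + re.escape(dialect) + " " + re.escape(name))
    end = re.compile(r" *#END " + re.escape(dialect) + " " + re.escape(name))

    collected = []
    inside = False
    # splitlines() also drops "\r", similar to the rule's `tr -d "\r"`.
    for line in rst_text.splitlines():
        if not inside and start.match(line):
            inside = True
        if inside:
            collected.append(line.lstrip(" "))
            if end.match(line):
                break  # simplification: only the first block is extracted
    return "\n".join(collected) + "\n" if collected else ""


# Hypothetical miniature RST input for illustration.
SAMPLE_RST = """\
Some prose.

.. code:: ECMA

    #BEGIN ECMA symops
    ^x$
    #END ECMA symops

More prose.
"""

if __name__ == "__main__":
    print(extract_block(SAMPLE_RST, "ECMA", "symops"), end="")
```

The marker lines are kept in the output, which is consistent with the test scripts above filtering out lines that start with `#` before using the pattern.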