Add schema validation code to C generator
Modify C generator to compare floating point and integer numbers with
enumerations and ranges specified in DFDL schemas.  Test validation
code with additional simple type root elements in simple.dfdl.xsd
schema and additional TDML tests in simple_errors.tdml.

Also allow C generator to compare hexBinary elements to values in
enumerations since Daffodil does the same thing and C generator
already compares hexBinary elements to values in fixed attributes.

Allow C generator to ignore dfdl:assert expressions in DFDL schemas
and generate code anyway instead of throwing an exception.

Allow C generator to get schema version from DFDL schemas and put it
into the generated C code to version the generated code as well.

Found out Daffodil's default alignment properties (alignment="1" and
alignmentUnits="bytes") make Daffodil append extra fill bits to
elements whose sizes are not a multiple of 8 bits.  Because
compatibility between Daffodil and DaffodilC depends on defining both
dfdl:alignment="1" (default) and dfdl:alignmentUnits="bits"
(non-default), define a known good binary data format in one place
(network/format.dfdl.xsd) and include it in the rest of
daffodil-codegen-c's test schemas.
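
As a worked illustration of the alignment arithmetic described above,
here is a minimal sketch.  It assumes fill bits simply advance the
current bit position to the next multiple of the alignment; the
function name `fill_bits_needed` is hypothetical and is not part of
Daffodil or DaffodilC.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of Daffodil's runtime): return how many
// fill bits are needed to advance bit_position to the next multiple of
// alignment_in_bits
size_t
fill_bits_needed(size_t bit_position, size_t alignment_in_bits)
{
    assert(alignment_in_bits > 0);
    size_t remainder = bit_position % alignment_in_bits;
    return remainder == 0 ? 0 : alignment_in_bits - remainder;
}
```

Under that assumption, a 12-bit element with alignment="1" and
alignmentUnits="bytes" (8 bits) is followed by 4 fill bits, while the
same element with alignmentUnits="bits" (1 bit) needs none.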

Finally, make changes requested by PR review.

DAFFODIL-2853

BUILD.md: Document how to install iwyu and libcriterion-dev (used only
for Daffodil C code generator development and maintenance).

README.md: Document how to check formatting of or reformat Daffodil.
Also document how to put the Daffodil jars in the maven and ivy
caches.

c/files/.clang-format: Update to clang-format 14.  Format declarations
more concisely by removing AlignConsecutiveDeclarations (this is why
some reformatted C files lose some extra whitespace).

c/files/Makefile: Shorten iwyu's maximum line length from 999 to 111.

c/files/**/**.[ch]: Add "auto-maintained by iwyu" comment to headers.

c/files/libcli/cli_errors.[ch]: Rename CLI_ZZZ, ERR_ZZZ, and FIELD_ZZZ
to CLI__NUM_CODES, ERR__NUM_CODES, and FIELD__NO_ARGS.
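
The renamed `__NUM_CODES` enumerators act as counts: each enum ends
with one, and the next enum starts numbering from it.  The sketch
below shows that pattern in isolation.  All names in it (`LibCode`,
`AppCode`, `app_message`) are hypothetical stand-ins, not the real
libruntime or libcli identifiers.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical library error codes; the last enumerator counts the codes
enum LibCode
{
    LIB_BAD_INPUT,
    LIB_OUT_OF_MEMORY,
    LIB__NUM_CODES,
};

// Hypothetical application codes continue numbering where the library stops
enum AppCode
{
    APP_FILE_OPEN = LIB__NUM_CODES,
    APP_FILE_CLOSE,
    APP__NUM_CODES,
};

// Look up an application message; the table needs only
// APP__NUM_CODES - LIB__NUM_CODES rows
const char *
app_message(int code)
{
    static const char *table[APP__NUM_CODES - LIB__NUM_CODES] = {
        "error opening file",
        "error closing file",
    };

    if (code >= LIB__NUM_CODES && code < APP__NUM_CODES)
    {
        const char *message = table[code - LIB__NUM_CODES];

        // Double check that the table has a row for this code
        assert(message != NULL);
        return message;
    }
    return NULL;
}
```

A name like `APP__NUM_CODES` states what the sentinel is for, which a
placeholder like `APP_ZZZ` would not.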

c/files/libcli/daffodil_getopt.[ch]: Merge
daffodil_parse_cli and daffodil_unparse_cli structs and objects into
daffodil_pu_cli and daffodil_pu.  Skip and ignore -r and -s options
so one can run "c/daffodil parse" with the same options as "daffodil
parse -r root -s schema -o outfile infile" without having to remove
these options.

c/files/libcli/daffodil_main.c: Merge daffodil_parse_cli and
daffodil_unparse_cli structs and objects into daffodil_pu_cli and
daffodil_pu.  Initialize ParserOrUnparserState fields as well as
PState/UState fields.  Make sure any diagnostics will fail unparse as
well as parse if validate mode is on.

c/files/libruntime/errors.[ch]: Add ERR_ENUM_MATCH and
ERR_OUTSIDE_RANGE error messages to diagnose elements not matching any
of their enumerations or having values outside their allowed ranges.
Rename ERR_ZZZ and FIELD_ZZZ to ERR__NUM_CODES and FIELD__NO_ARGS.
Rename ERR_ENUM_MATCH, ERR_FIXED_VALUE, and ERR_OUTSIDE_RANGE to
ERR_RESTR_ENUM, ERR_RESTR_FIXED, and ERR_RESTR_RANGE. Remove
`daffodil_program_version` since we use `daffodil_version` in
daffodil_version.h instead.

c/files/libruntime/infoset.h: Move common PState/UState fields to
ParserOrUnparserState to allow parse and unparse functions to call
common validate functions.

c/files/libruntime/parsers.[ch]: Use ParserOrUnparserState fields as
well as PState/UState fields.  Replace parse_check_bounds and
parse_validate_fixed functions with validate_array_bounds and
validate_fixed_attribute functions in validators.[ch].  Rename
parse_align and parse_fill_bits to parse_align_to and
parse_alignment_bits.

c/files/libruntime/unparsers.[ch]: Use ParserOrUnparserState fields as
well as UState fields.  Replace unparse_check_bounds and
unparse_validate_fixed functions with validate_array_bounds and
validate_fixed_attribute functions in validators.[ch].  Rename
unparse_align and unparse_fill_bits to unparse_align_to and
unparse_alignment_bits.

c/files/libruntime/validators.[ch]: Move common validation functions
here from parsers.[ch] and unparsers.[ch].  Add new float, hexBinary,
int, and universal validation functions to check that elements match
their allowed enumerations (requires passing array of floating point
numbers, hexBinary structs, or integer numbers) and fit within their
allowed ranges.  Validation functions create diagnostics instead of
errors; the CLI or a caller is responsible for printing the
diagnostics and exiting with the appropriate status code.
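
To illustrate the "diagnostics instead of errors" design described
above, here is a minimal sketch.  It assumes a simplified diagnostics
counter; the `Diagnostics` struct and both function names are
hypothetical and do not reproduce the actual validators.[ch]
signatures.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified stand-in for the runtime's diagnostics
typedef struct
{
    size_t num_diagnostics;
} Diagnostics;

// Record a diagnostic if number matches none of the enumerations
void
check_int_enumeration(int64_t number, const int64_t *enums, size_t num_enums, Diagnostics *diagnostics)
{
    for (size_t i = 0; i < num_enums; i++)
    {
        if (number == enums[i])
        {
            return;
        }
    }
    diagnostics->num_diagnostics++;
}

// Record a diagnostic if number lies outside [minimum, maximum]
void
check_int_range(int64_t number, int64_t minimum, int64_t maximum, Diagnostics *diagnostics)
{
    if (number < minimum || number > maximum)
    {
        diagnostics->num_diagnostics++;
    }
}
```

Neither function stops processing; a caller would inspect
`num_diagnostics` afterwards and choose its own exit status.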

c/files/tests/bits.c: Use ParserOrUnparserState fields as well as
PState/UState fields.

c/files/tests/extras.c: Regenerate iwyu comments.

c/DaffodilCCodeGenerator.scala:  Accept but ignore dfdl:assert statements
(for now).

c/DaffodilCExamplesGenerator.scala: Generate code from simple's
all-in-one root element in order to show generated code for all simple
types, enums, and ranges.

c/generators/AlignmentFillCodeGenerator.scala: Rename parse_align and
unparse_align to parse_align_to and unparse_align_to.  Use
ParserOrUnparserState fields as well as PState/UState fields.

c/generators/BinaryBooleanCodeGenerator.scala: Use
ParserOrUnparserState fields as well as PState/UState fields.

c/generators/BinaryValueCodeGenerator.scala: Generate C code to
validate enumerations and ranges of primitive elements.  Get raw
enumeration values into a Seq[String].  Avoid unsigned >= 0
comparisons to prevent gcc warnings.  Use ParserOrUnparserState fields
as well as PState/UState fields.  Call correct function to validate
enums depending on element's primType.  Generate extra C
initialization code for hexBinary enumerations to define arrays of
hexBinary structs and pass them to validate_hexbinary_enumeration.
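
The hexBinary case needs an array of structs because each enumeration
value carries its own bytes and length.  A minimal sketch of such a
comparison follows; the `HexBytes` struct and the function name are
hypothetical simplifications, not the generated code or the real
validate_hexbinary_enumeration signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical, simplified hexBinary value: a byte array plus its length
typedef struct
{
    const uint8_t *array;
    size_t length_in_bytes;
} HexBytes;

// Return true if value has the same length and bytes as any enumeration
bool
hexbytes_matches_enumeration(const HexBytes *value, const HexBytes *enums, size_t num_enums)
{
    for (size_t i = 0; i < num_enums; i++)
    {
        if (value->length_in_bytes == enums[i].length_in_bytes &&
            memcmp(value->array, enums[i].array, value->length_in_bytes) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Comparing lengths before calling `memcmp` keeps a short value from
matching the prefix of a longer enumeration.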

c/generators/CodeGeneratorState.scala: Remove unnecessary immutable
(built-in Set is immutable).  Use ParserOrUnparserState fields as well
as PState/UState fields.  Rename parse_fill_bits and unparse_fill_bits
to parse_alignment_bits and unparse_alignment_bits.  Get actual schema
version from Daffodil and assign its value to `schema_version` in
generated_code.c.  Declare `schema_version` in generated_code.h.  Call
validate_array_bounds instead of parse/unparse_check_bounds.  Make
cStructFieldAccess work correctly for DFDL expressions like
"/rr:ReqstReply/..." used by NFS schemas.

c/generators/HexBinaryCodeGenerator.scala: Use ParserOrUnparserState
fields as well as PState/UState fields.  Call validate_fixed_attribute
instead of parse/unparse_validate_fixed.

examples/**/generated_code.[ch]: Regenerate to show changes in C
generator such as adding new "auto-maintained by iwyu" comment, calls
to renamed functions such as parse_align_to and unparse_align_to,
defining schema_version, using pu fields, and calling validation
functions.

c/data/simple*.dat: Remove (contents now inside simple.tdml).

c/ex_nums.dfdl.xsd: Define schema version to be "1.0.2".  Include
network format instead of defining own binary format.  Adjust
elements' own format properties as needed to match original data file
(explicitness is better than using defaults).

c/infosets/simple*.xml: Remove (contents now inside simple.tdml).

c/nested.dfdl.xsd: Include network format instead of defining own
binary format.

c/network/format.dfdl.xsd: Define network format in only one place,
making sure to use vitally needed alignment properties but keep the
format as minimal as possible to make it easier to include in a wide
variety of schemas.  Change "Network order binary format" comment to
"Network order big endian format" as requested.  Define bitOrder and
byteOrder explicitly in format schema instead of inheriting them from
DFDLGeneralFormat schema.

c/padtest.dfdl.xsd: Include network format instead of defining own
binary format.

c/simple.dfdl.xsd: Define schema version to be "1.0.0".  Include
network format instead of defining own binary format.  Add new
elements to test enumerations and ranges of primitive types.  Use
custom simple types in order to define simple type root elements,
enumerations of simple type root elements, ranges of simple type
elements, and all-in-one root element as compactly as possible.

c/simple.tdml: Make comments clearer about how to run tests and include
all data and infosets in test cases instead of using files.

c/simple_errors.tdml: Make comments clearer about how to run tests and
add new test cases to validate enumerations and ranges of primitive types.

c/variablelen.dfdl.xsd: Include network format instead of defining own
binary format.

core/dsom/SchemaDocument.scala: Modify SchemaDocument to capture
schema versions from XML schema definitions and provide to codegen-c.

tdml/TestDaffodilC.scala: Add future-proofing test to check test
schema compiles without any warnings (would catch "relative location
deprecated" warning if DFDLGeneralFormat url was still relative).  Fix
inconsistent use of "dp"/"tdp" and "isError"/"isProcessingError".

c/TestSimpleErrors.scala: Call new test cases in simple_errors.tdml.
tuxji committed Oct 22, 2023
1 parent f3fb555 commit 604d509
Showing 85 changed files with 3,331 additions and 1,474 deletions.
19 changes: 11 additions & 8 deletions BUILD.md
@@ -50,9 +50,9 @@ following its website's instructions. This is necessary to install the
 [libmxml-devel][Mini-XML] package.

 You can use the `dnf` package manager to install most of the tools
-needed to build Daffodil:
+used to develop Daffodil:

-sudo dnf install clang gcc git java-11-openjdk-devel llvm make mxml-devel pkgconf
+sudo dnf install clang gcc git iwyu java-11-openjdk-devel llvm make mxml-devel pkgconf

 If you want to use clang instead of gcc, you'll have to set your
 environment variables `CC` and `AR` to the clang binaries' names:
@@ -68,15 +68,17 @@ commands you type will be able to call the C compiler.

 ## Ubuntu

-You can use the `apt` package manager to install most of the tools
-needed to build Daffodil:
+You can use the `apt` package manager to install all of the tools
+used to develop Daffodil:

-sudo apt install build-essential clang clang-format default-jdk git libmxml-dev
+sudo apt install build-essential clang clang-format default-jdk git iwyu libcriterion-dev libmxml-dev
+# If "iwyu -print-resource-dir" prints /usr/lib/clang/13.0.1 and it doesn't exist:
+sudo apt install libclang-common-13-dev

 If you want to use clang instead of gcc, you'll have to set your
 environment variables `CC` and `AR` to the clang binaries' names:

-export CC=clang AR=llvm-ar-14 # or llvm-ar-10 or whatever you have
+export CC=clang AR=llvm-ar-14 # or whatever llvm-ar-* version you have

 However, Ubuntu has no [sbt] package in its own repositories.
 You'll have to install the latest [sbt] version following its
@@ -95,7 +97,7 @@ Install [MSYS2] following its website's instructions and open a new
 libraries.

 You can use the `pacman` package manager to install most of the tools
-needed to build Daffodil:
+used to develop Daffodil:

 pacman -S clang diffutils gcc git make pkgconf

@@ -130,9 +132,10 @@ Install the Xcode Command Line Tools:
 xcode-select --install

 You can use the [Homebrew] package manager to install most of the tools
-needed to build Daffodil:
+used to develop Daffodil:

 brew install clang-format
+brew install criterion
 brew install git
 brew install llvm # needed by iwyu
 brew install include-what-you-use
39 changes: 30 additions & 9 deletions README.md
@@ -59,32 +59,53 @@ Compile source code:

 sbt compile

-### Tests
+### Test

-Run unit tests:
+Check all unit tests pass:

 sbt test

-Run slower integration tests:
+Check all integration tests pass:

 sbt daffodil-test-integration/test

-### Command Line Interface
+### Format

-Build the command line interface (Linux and Windows shell scripts in
-`daffodil-cli/target/universal/stage/bin/`; see the [Command Line
-Interface] documentation for details on their usage):
+Check format of source and sbt files:
+
+sbt scalafmtCheckAll scalafmtSbtCheck
+
+Reformat source and sbt files if necessary:
+
+sbt scalafmtAll scalafmtSbt
+
+### Build
+
+Build the Daffodil command line interface (Linux and Windows shell
+scripts in `daffodil-cli/target/universal/stage/bin/`; see the
+[Command Line Interface] documentation for details on their usage):

 sbt daffodil-cli/stage

-### License Check
+Publish the Daffodil jars to a Maven repository (for Java projects) or
+Ivy repository (for Scala or schema projects).
+
+Maven (for Java or mvn):
+
+sbt publishM2
+
+Ivy (for Scala or sbt):
+
+sbt publishLocal
+
+### Check Licenses

 Run [Apache RAT] (license audit report in `target/rat.txt` and error
 if any unapproved licenses are found):

 sbt ratCheck

-### Test Coverage Report
+### Check Coverage

 Run [sbt-scoverage] (report in `target/scala-ver/scoverage-report/`):

@@ -13,13 +13,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-AlignConsecutiveDeclarations: true
 AllowShortFunctionsOnASingleLine: None
-AllowShortIfStatementsOnASingleLine: true
+AllowShortIfStatementsOnASingleLine: WithoutElse
 AlwaysBreakAfterReturnType: TopLevelDefinitions
-BasedOnStyle: llvm
+BasedOnStyle: LLVM
 BreakBeforeBraces: Allman
 ColumnLimit: 110
 IndentWidth: 4
 KeepEmptyLinesAtTheStartOfBlocks: false
-SortIncludes: false
+SortIncludes: Never
@@ -105,7 +105,7 @@ clean:
 # $ make iwyu

 FMT = clang-format -i
-IWYU = iwyu -Xiwyu --max_line_length=999 -Xiwyu --update_comments
+IWYU = iwyu -Xiwyu --max_line_length=111 -Xiwyu --update_comments

 format:
 $(FMT) $(HEADERS) $(SOURCES) $(TSOURCES)
@@ -15,6 +15,7 @@
 * limitations under the License.
 */

+// auto-maintained by iwyu
 // clang-format off
 #include "cli_errors.h"
 #include <assert.h> // for assert
@@ -31,9 +32,9 @@
 static const ErrorLookup *
 error_lookup(uint8_t code)
 {
-static const ErrorLookup table[CLI_ZZZ - ERR_ZZZ] = {
+static const ErrorLookup table[CLI__NUM_CODES - ERR__NUM_CODES] = {
 {CLI_DIAGNOSTICS, "parse failed with %" PRId64 " diagnostics\n", FIELD_D64},
-{CLI_FILE_CLOSE, "error closing file\n", FIELD_ZZZ},
+{CLI_FILE_CLOSE, "error closing file\n", FIELD__NO_ARGS},
 {CLI_FILE_OPEN, "error opening file '%s'\n", FIELD_S},
 {CLI_HELP_USAGE,
 "Usage: %s [OPTION...] <command> [infile]\n"
@@ -61,36 +62,36 @@ error_lookup(uint8_t code)
 {CLI_INVALID_INFOSET, "invalid infoset type -- '%s'\n" USAGE, FIELD_S},
 {CLI_INVALID_OPTION, "invalid option -- '%c'\n" USAGE, FIELD_C},
 {CLI_INVALID_VALIDATE, "invalid validate mode -- '%s'\n" USAGE, FIELD_S},
-{CLI_MISSING_COMMAND, "missing command\n" USAGE, FIELD_ZZZ},
+{CLI_MISSING_COMMAND, "missing command\n" USAGE, FIELD__NO_ARGS},
 {CLI_MISSING_VALUE, "option requires an argument -- '%c'\n" USAGE, FIELD_C},
 {CLI_PROGRAM_ERROR,
 "unexpected getopt code %" PRId64 "\n"
 "Check for program error\n",
 FIELD_D64},
 {CLI_PROGRAM_VERSION, "%s\n", FIELD_S_ON_STDOUT},
-{CLI_STACK_EMPTY, "stack empty, stopping program\n", FIELD_ZZZ},
-{CLI_STACK_OVERFLOW, "stack overflow, stopping program\n", FIELD_ZZZ},
-{CLI_STACK_UNDERFLOW, "stack underflow, stopping program\n", FIELD_ZZZ},
+{CLI_STACK_EMPTY, "stack empty, stopping program\n", FIELD__NO_ARGS},
+{CLI_STACK_OVERFLOW, "stack overflow, stopping program\n", FIELD__NO_ARGS},
+{CLI_STACK_UNDERFLOW, "stack underflow, stopping program\n", FIELD__NO_ARGS},
 {CLI_STRTOBOOL, "error converting XML data '%s' to boolean\n", FIELD_S},
 {CLI_STRTOD_ERRNO, "error converting XML data '%s' to number\n", FIELD_S},
 {CLI_STRTOI_ERRNO, "error converting XML data '%s' to integer\n", FIELD_S},
 {CLI_STRTONUM_EMPTY, "found no number in XML data '%s'\n", FIELD_S},
 {CLI_STRTONUM_NOT, "found non-number characters in XML data '%s'\n", FIELD_S},
 {CLI_STRTONUM_RANGE, "number in XML data '%s' out of range\n", FIELD_S},
 {CLI_UNEXPECTED_ARGUMENT, "unexpected extra argument -- '%s'\n" USAGE, FIELD_S},
-{CLI_XML_DECL, "error making new XML declaration\n", FIELD_ZZZ},
+{CLI_XML_DECL, "error making new XML declaration\n", FIELD__NO_ARGS},
 {CLI_XML_ELEMENT, "error making new XML element '%s'\n", FIELD_S},
 {CLI_XML_ERD, "unexpected ERD typeCode %" PRId64 " while reading XML data\n", FIELD_D64},
-{CLI_XML_GONE, "ran out of XML data\n", FIELD_ZZZ},
-{CLI_XML_INPUT, "unable to read XML data from input file\n", FIELD_ZZZ},
+{CLI_XML_GONE, "ran out of XML data\n", FIELD__NO_ARGS},
+{CLI_XML_INPUT, "unable to read XML data from input file\n", FIELD__NO_ARGS},
 {CLI_XML_LEFT, "did not consume all of the XML data, '%s' left\n", FIELD_S},
 {CLI_XML_MISMATCH, "found mismatch between XML data and infoset '%s'\n", FIELD_S},
-{CLI_XML_WRITE, "error writing XML document\n", FIELD_ZZZ},
+{CLI_XML_WRITE, "error writing XML document\n", FIELD__NO_ARGS},
 };

-if (code >= ERR_ZZZ && code < CLI_ZZZ)
+if (code >= ERR__NUM_CODES && code < CLI__NUM_CODES)
 {
-const ErrorLookup *lookup = &table[code - ERR_ZZZ];
+const ErrorLookup *lookup = &table[code - ERR__NUM_CODES];

 // Double check that we looked up correct row
 assert(code == lookup->code);
@@ -18,15 +18,16 @@
 #ifndef CLI_ERRORS_H
 #define CLI_ERRORS_H

+// auto-maintained by iwyu
 // clang-format off
-#include "errors.h" // for ERR_ZZZ
+#include "errors.h" // for ERR__NUM_CODES
 // clang-format on

 // CliCode - identifiers of libcli errors

 enum CliCode
 {
-CLI_DIAGNOSTICS = ERR_ZZZ,
+CLI_DIAGNOSTICS = ERR__NUM_CODES,
 CLI_FILE_CLOSE,
 CLI_FILE_OPEN,
 CLI_HELP_USAGE,
@@ -59,7 +60,7 @@ enum CliCode
 CLI_XML_LEFT,
 CLI_XML_MISMATCH,
 CLI_XML_WRITE,
-CLI_ZZZ,
+CLI__NUM_CODES,
 };

 // CliLimits - limits on how many elements static arrays can hold
@@ -15,6 +15,7 @@
 * limitations under the License.
 */

+// auto-maintained by iwyu
 // clang-format off
 #include "daffodil_getopt.h"
 #include <string.h> // for strcmp, strrchr
@@ -29,23 +30,15 @@ struct daffodil_cli daffodil_cli = {
 DAFFODIL_MISSING_COMMAND, // default subcommand
 };

-// Initialize our "daffodil parse" CLI options
+// Initialize our "daffodil parse/unparse" CLI options

-struct daffodil_parse_cli daffodil_parse = {
+struct daffodil_pu_cli daffodil_pu = {
 "xml", // default infoset type
 "-", // default infile
 "-", // default outfile
 false, // default validate
 };

-// Initialize our "daffodil unparse" CLI options
-
-struct daffodil_unparse_cli daffodil_unparse = {
-"xml", // default infoset type
-"-", // default infile
-"-", // default outfile
-};
-
 // Parse our command line interface. Note there is NO portable way to
 // parse "daffodil [options] command [more options] arguments" with
 // getopt. We will have to put all options before all arguments,
@@ -63,7 +56,7 @@ parse_daffodil_cli(int argc, char *argv[])

 // We expect callers to put all non-option arguments at the end
 int opt = 0;
-while ((opt = getopt(argc, argv, ":hI:o:V:v")) != -1)
+while ((opt = getopt(argc, argv, ":hI:o:r:s:V:v")) != -1)
 {
 switch (opt)
 {
@@ -78,17 +71,21 @@ parse_daffodil_cli(int argc, char *argv[])
 error.arg.s = optarg;
 return &error;
 }
-daffodil_parse.infoset_converter = optarg;
-daffodil_unparse.infoset_converter = optarg;
+daffodil_pu.infoset_converter = optarg;
 break;
 case 'o':
-daffodil_parse.outfile = optarg;
-daffodil_unparse.outfile = optarg;
+daffodil_pu.outfile = optarg;
 break;
+case 'r':
+// Ignore "-r root" option/optarg
+break;
+case 's':
+// Ignore "-s schema" option/optarg
+break;
 case 'V':
 if (strcmp("limited", optarg) == 0 || strcmp("on", optarg) == 0)
 {
-daffodil_parse.validate = true;
+daffodil_pu.validate = true;
 }
 else if (strcmp("off", optarg) != 0)
 {
@@ -122,25 +119,14 @@ parse_daffodil_cli(int argc, char *argv[])
 {
 const char *arg = argv[i];

-if (DAFFODIL_PARSE == daffodil_cli.subcommand)
-{
-if (strcmp("-", daffodil_parse.infile) == 0)
-{
-daffodil_parse.infile = arg;
-}
-else
-{
-error.code = CLI_UNEXPECTED_ARGUMENT;
-error.arg.s = arg;
-return &error;
-}
-}
-else if (DAFFODIL_UNPARSE == daffodil_cli.subcommand)
+if (DAFFODIL_MISSING_COMMAND != daffodil_cli.subcommand)
 {
-if (strcmp("-", daffodil_unparse.infile) == 0)
+// Set infile only once
+if (strcmp("-", daffodil_pu.infile) == 0)
 {
-daffodil_unparse.infile = arg;
+daffodil_pu.infile = arg;
 }
+// Error if infile is followed by another arg
 else
 {
 error.code = CLI_UNEXPECTED_ARGUMENT;
@@ -18,6 +18,7 @@
 #ifndef DAFFODIL_GETOPT_H
 #define DAFFODIL_GETOPT_H

+// auto-maintained by iwyu
 // clang-format off
 #include <stdbool.h> // for bool
 #include "errors.h" // for Error
@@ -35,24 +36,15 @@ extern struct daffodil_cli
 } subcommand;
 } daffodil_cli;

-// Declare our "daffodil parse" CLI options
+// Declare our "daffodil parse/unparse" CLI options

-extern struct daffodil_parse_cli
+extern struct daffodil_pu_cli
 {
 const char *infoset_converter;
 const char *infile;
 const char *outfile;
-bool validate;
-} daffodil_parse;
-
-// Declare our "daffodil unparse" CLI options
-
-extern struct daffodil_unparse_cli
-{
-const char *infoset_converter;
-const char *infile;
-const char *outfile;
-} daffodil_unparse;
+bool validate;
+} daffodil_pu;

 // Parse our command line interface
