Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing region parameters without requiring the Spec. #664

Closed
dkeeney opened this issue Sep 10, 2019 · 12 comments
Closed

Parsing region parameters without requiring the Spec. #664

dkeeney opened this issue Sep 10, 2019 · 12 comments
Assignees

Comments

@dkeeney
Copy link

dkeeney commented Sep 10, 2019

I have been working on a generic EncoderRegion for NetworkAPI which can use C++ and python encoder modules as plugins.

Problem
The problem is how to get the parameters from the Yaml or JSON strings into a format that can be used to create the encoder modules.

Details
Currently we parse the parameter string and then match it up with the region's spec so we know the data type of each item and these are inserted into the ValueMap structure that is passed to the Region when the region implementation is created.

But for the EncoderRegion, it does not have a fixed Spec. The fields needed and their types depend on the encoder modules being plugged in. The encoder modules do not provide a Spec that describes their parameters.

Proposal
What if we just parsed the parameter string, without using the Spec, into a structure of strings (ValueMap) and passed that to the Regions when they are created. So, prior to creating the regions,

  • no validation of field names
  • no data type conversions
  • no insertion of default values for missing or empty fields.

At Region implementation creation time, the Region Imp can then pull out the parameters it needs, performing validation, type conversion, and inserting of default values inside the class. This can be done with almost no coding changes in the Region imp.

In the case of the EncoderRegion, since it does not know about the parameters, it can pass them on down into the encoder and it can validate, convert, and default as it needed. There is no need for the encoders to provide info about their parameters.

Implementation
To make this work, the ValueMap that is passed to the regions needs to hold strings rather than native types so this needs a rewrite. This Modifies the files Value.cpp, Value.hpp (which defines ValueMap) and YAMLUtils.cpp/hpp. And we can eliminate the Scalar.cpp/hpp files.

@breznak
Copy link
Member

breznak commented Sep 10, 2019

But for the EncoderRegion, it does not have a fixed Spec.

or, if things are getting complicated.. would it help to just have some abstract EncoderRegion and derived XYEncoderRegion for each needed enc?

But for the EncoderRegion, it does not have a fixed Spec.

follow up to the above, can you have a GenericEncReg, that takes a Spec as its argument, and derived class XYEncReg which has its specific Spec...or that does not help any?

Implementation
To make this work, the ValueMap that is passed to the regions needs to hold strings rather than native types so this needs a rewrite. This Modifies the files Value.cpp, Value.hpp (which defines ValueMap) and YAMLUtils.cpp/hpp. And we can eliminate the Scalar.cpp/hpp files.

would this mean the region stores only {"age", "99"} instead of current {age, (int) 99}, and each algorithm (ie Encoder) would have to know to parse from "text" (JSON) params? (That could be linked with the task to have each Algorithm enabled with a constructor(JSON)).

@dkeeney
Copy link
Author

dkeeney commented Sep 10, 2019

After spending three days rewriting the ValueMap class I got to thinking about the parser. We are using the yaml-cpp parser which has some problems. See Issue #218. But this builds a structure that is basically what I was developing...a container of parsed strings with templated functions to extract values in whatever format is needed.

In Cereal, our serialization package, contains a JSON parser called RapidJSON. http://rapidjson.org/ This provides a standard DOM object as its output which also could do what we want as a ValueMap to hold the parameters. JSON and YAML have different syntax but otherwise are almost the same. However we cannot pass yaml strings to RapidJSON so this is probably not a solution.

@dkeeney
Copy link
Author

dkeeney commented Sep 10, 2019

would this mean the region stores only {"age", "99"} instead of current {age, (int) 99}

Yes, but the class has functions that can perform the conversion as part of the extract.
For example. Here is the current code inside the SPRegion constructor.

  args_.columnCount = values.getScalarT<UInt32>("columnCount", 0);
  args_.potentialRadius = values.getScalarT<UInt32>("potentialRadius", 0);
  args_.potentialPct = values.getScalarT<Real32>("potentialPct", 0.5);
  args_.globalInhibition = values.getScalarT<bool>("globalInhibition", true);
  args_.localAreaDensity = values.getScalarT<Real32>("localAreaDensity", 0.05f);
  args_.stimulusThreshold = values.getScalarT<UInt32>("stimulusThreshold", 0);
  args_.synPermInactiveDec = values.getScalarT<Real32>("synPermInactiveDec", 0.008f);
  args_.synPermActiveInc = values.getScalarT<Real32>("synPermActiveInc", 0.05f);
  args_.synPermConnected = values.getScalarT<Real32>("synPermConnected", 0.1f);
  args_.minPctOverlapDutyCycles = values.getScalarT<Real32>("minPctOverlapDutyCycles", 0.001f);
  args_.dutyCyclePeriod = values.getScalarT<UInt32>("dutyCyclePeriod", 1000);
  args_.boostStrength = values.getScalarT<Real32>("boostStrength", 0.0f);
  args_.seed = values.getScalarT<Int32>("seed", 1);
  args_.spVerbosity = values.getScalarT<UInt32>("spVerbosity", 0);
  args_.wrapAround = values.getScalarT<bool>("wrapAround", true);
  spatialImp_ = values.getString("spatialImp", "");

values is our ValueMap that is passed to the constructor.
What I am saying is that if the ValueMap justs contains a map of strings, the conversion can occur when the getScalarT( name) function is called. Right now they are already converted in the ValueMap based on the Spec.

@dkeeney
Copy link
Author

dkeeney commented Sep 10, 2019

would it help to just have some abstract EncoderRegion and derived XYEncoderRegion for each needed enc?

I was hoping to avoid having a wrapper around encoders like we do around the other algorithms. It makes for a lot of code to maintain. Although, maybe that is simpler.

@breznak
Copy link
Member

breznak commented Sep 11, 2019

In Cereal, our serialization package, contains a JSON parser called RapidJSON. http://rapidjson.org/ This provides a standard DOM object as its output which also could do what we want as a ValueMap to hold the parameters. JSON and YAML have different syntax but otherwise are almost the same.

that would be a nice idea to use DOM,

However we cannot pass yaml strings to RapidJSON so this is probably not a solution.

I was surprised, a quick search on "cpp yaml json parser" didn't give me any library

ValueMap justs contains a map of strings, the conversion can occur when the getScalarT( name) function is called. Right now they are already converted in the ValueMap based on the Spec.

which would allow you to have different encoders (more precisely encoder region) with different Specs.
But wouldn't there still be the same code needed for "if exists "this param for encoder X", then convert int pX = getScalar(xxx); if exists "param for enc Y", then parse float pY = getScalar(..)" (?)

@dkeeney
Copy link
Author

dkeeney commented Sep 11, 2019

wouldn't there still be the same code needed for "if exists ...

Yes that needs to be someplace. So this does not solve my problem of making a generic encoder region. It does make my code cleaner however.

@Thanh-Binh
Copy link

@dkeeney from my point of view, our implementation of region should be small and clear as possible. Parameter passing is user-depending.
In SPRegion, for example, we have too much getXXX and setXXX for parameters, that makes our codes not so clear!
What do you think?

@dkeeney
Copy link
Author

dkeeney commented Sep 13, 2019

we have too much getXXX and setXXX for parameters,

I absolutely agree. The Region implementations require way too much work to create and maintain.

However I am working under the following constraints:

  • Must not break the API for NetworkAPI interfaces.
  • Parameters are provided in two ways.
  1. via the JSON or YAML string in the third argument of the addRegion( ) call. These are at configuration time.
  2. via the getters and setters. These are after initialization().
  • We need some way of translating parameters in a tree of parsed strings into native data types that can be used to call the algorithm constructor.
  • We need a way for the user to read and/or modify parameters after initialization for parameters that can be used that way. The Region implementation does not need to implement any getters or setters unless the parameters need to be readable or changeable after initialization. If not implemented the calls are passed through to the base class.

These have been driven by the Spec that each Region Implementation must provide. My current investigation is trying to find a way to handle the parameters without the Spec. That will help but getting rid of the getters and setters ... or moving them to the base class is something else I want to do but I have no good ideas at the moment.

@Thanh-Binh
Copy link

@dkeenney understood and hope you will finish it soon

@dkeeney
Copy link
Author

dkeeney commented Sep 27, 2019

yaml_cpp just did a new release. v0.6.3
We were pulling from the master because v0.6.2 release was broken.
Now the master is broken but the new v0.6.3 seems to be ok so far.

@breznak
Copy link
Member

breznak commented Sep 27, 2019

yaml_cpp just did a new release. v0.6.3

guess it still does not resolve the issue on linux & Debug #218 , still great there's finally a new release

@dkeeney
Copy link
Author

dkeeney commented Oct 27, 2019

With PR #715 we fixed the yaml parser problem by switching to libyaml. We also accomplished most of the things being talked about here. I still do not have a universal encoder plugin (EncoderRegion) but that can be a separate issue. So closing this issue.

@dkeeney dkeeney closed this as completed Oct 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants