
The EncoderRegion, a generic plugin container for encoders. #657

Closed
wants to merge 5 commits

Conversation


@dkeeney dkeeney commented Sep 6, 2019

This is a first crack at a plugin container for the encoder implementations. Feel free to offer suggestions. The basic idea is that when you declare a region for an encoder using addRegion() you use the type EncoderRegion:<encoder name>. This will load the EncoderRegion implementation, which in turn loads the requested encoder algorithm module.

  • The GenericEncoder class is the base class for encoders that can be plugins. Once all of them are plugins we can change its name to BaseEncoder. The BaseEncoder cannot use a class template if it is to be used with polymorphism, so there are some differences between the two classes.

  • I have not yet added the Python bindings, so that encoders can be loaded from Python code as well.

  • You will notice that I had to add a little block of code to each encoder module so that the generic code in the EncoderRegion can create the expected configuration structure without having a reference to the class. I implemented this on ScalarEncoder and the RDSE encoder. The Document encoder will have to wait until I extend the parameter handling a little more.


dkeeney commented Sep 6, 2019

I am not quite ready to push this yet... First I want to make sure that everyone likes the idea of incorporating the encoders into NetworkAPI as plugins. Then I need to add the ability to plug in the Python-written encoders.

Questions:

  • Is there a better way for the EncoderRegion to 'discover' the parameters with their types and default values?
  • Does this approach impose restrictions on the encoder algorithm that would prevent writing an algorithm that encodes some other type of data?
  • The Python encoders use the signature sdr = encoder(value). NetworkAPI likes to allocate the output buffer prior to calling the algorithm. Is there a way to do this without copying the output?
  • How should we be thinking about dimensions? Are dimensions naturally a result of the data being encoded, and therefore should be set by the algorithm, or should they be manually determined before the encoder() call? ... or should they all be flat arrays?


dkeeney commented Sep 6, 2019

The Python encoders use the signature sdr = encoder(value). NetworkAPI likes to allocate the output buffer prior to calling the algorithm. Is there a way to do this without copying the output?

I see python already handles this by allowing the SDR buffer to be an optional argument.

@ctrl-z-9000-times (Collaborator) commented:

How should we be thinking about dimensions? Are dimensions naturally a result of the data being encoded and therefore should be set by the algorithm or should they be manually determined before the encoder( ) call? ... or should they all be flat arrays.

Encoders can have dimensions. The dimensions are typically derived from the raw sensory inputs (which can have dimensions) [1], and from the parameters to the encoder [2]. Both of these things are typically known before the encoder is constructed.
[1] For example vision & images are 2D rectangles, with topology which is preserved as the image is processed by the HTM.
[2] For example, the scalar encoder has a constant number of bits in all of its encoded output SDRs.

However IIRC, all of the encoders currently in htm.core have a single dimension.


dkeeney commented Sep 7, 2019

Thanks for your reply. I was hoping that you would respond. :-)

The dimensions are typically derived from the raw sensory inputs (which can have dimensions) [1], and from the parameters to the encoder [2]. Both of these things are typically known before the encoder is constructed.

This is sort of what I thought. This means that the encoder itself knows this info.

In my NetworkAPI situation, any encoder algorithm module should be able to be plugged into the new EncoderRegion but the EncoderRegion knows nothing about what any particular encoder needs. It does know:

  • It knows there will probably be parameters but it does not know what they are, their type, or what the default values should be. It does have the facility to capture and parse the Yaml parameter string provided when the EncoderRegion is created but it does not know how to treat them (type, etc).
  • It knows there is a structure that must be filled in with the parameters to create the encoder but it does not know the field names, data types or default values of any of them. It does not have a reference to the structure definition.
  • It knows there will probably be some inputs. They could come in from a connected Link from another Region, or from one of the setParameterXXX() calls. But it does not know which data type is expected.
  • It does know that the output will be an SDR but it does not know its dimensions.

So, it needs a way to get this info from the encoder itself. I do not want to require a lot of new code in each encoder to provide this info, but I do need something. So, here is my proposal for C++ encoders:

  • I defined this structure and a macro in the base class for the encoder.
// This structure is populated with the FIELD( ) macro
typedef struct {
  std::string name;
  int offset;
  NTA_BasicType type;
  std::string default_value;
} ParameterDescriptorFields;
typedef struct {
  NTA_BasicType expectedInputType;
  size_t expectedInputSize;
  char *parameterStruct;
  size_t parameterSize;
  std::map<std::string, ParameterDescriptorFields> parameters;
} ParameterDescriptor;

#define FIELD(n)                                        \
  { #n,                                                 \
    { #n,                                               \
      (int)((char *)&args_.n - (char *)&args_),         \
      BasicType::getType(typeid(args_.n)),              \
      std::to_string(args_.n)}                          \
  }
  • Then I require the following little bit of code in each encoder to describe itself:
    std::string getName() const override { return "ScalarEncoder"; }
    // A description of the fields in the Parameters structure.
    ParameterDescriptor getDescriptor() override {
      ParameterDescriptor desc = {
          NTA_BasicType_Real64, // expected input type
          1,                    // expected input array size (0 means variable)
          (char *)&args_,       // pointer to parameter structure (for getters)
          sizeof(args_),        // sizeof parameter structure
          // list of fields in the parameter structure
          {FIELD(minimum),  FIELD(maximum),    FIELD(clipInput), 
           FIELD(periodic), FIELD(category),   FIELD(activeBits), 
           FIELD(sparsity),  FIELD(size),      FIELD(radius),     
           FIELD(resolution)}};
      return desc;
    }

With this information I can build the parameter structure to initialize the encoder and prepare the input to be passed to it.

Is there a better way to do this?
Perhaps a Yaml or JSON string with this info...


dkeeney commented Sep 7, 2019

I am struggling with how to plug in encoders written in Python. I can use introspection to find the fields that an encoder expects at initialization, but types and default values are a little hard to derive without making some code changes. Looking for ideas.

@breznak breznak left a comment:

I had a quick review of the code; I'll need to think a bit more about the conceptual questions you are asking.

v2.0.0
breznak (Member):

Off topic: are we going back a version here?

dkeeney (Author):

The VERSION file is changed by the build every time it runs.
I don't know if that is good or bad that it does that.

breznak (Member):

should we add it to gitignore? Or is VERSION to be manually set for the Releases? Then builds should not auto-increment the value

dkeeney (Author):

The builds do not auto increment. They just find the most recent tag and stick that in the VERSION file. The VERSION file is then used by everything that needs version during the build.
I think it is useful to know what was the current tag at the point a build occurred.

breznak (Member):

isn't the latest current tag something like v2.0.10, or so?

dkeeney (Author):

Depends on what tags it finds on your branch. If it does not find any it remains unchanged.

* --------------------------------------------------------------------- */

/** @file
* Defines the base class for all encoders.
breznak (Member):

this is renamed from BaseEncoder to GenericEncoder? Why?

dkeeney (Author):

The GenericEncoder name is just a temporary name until all encoders are converted. Then we can change its name back to BaseEncoder.

(int)((char *)&args_.n - (char *)&args_), \
BasicType::getType(typeid(args_.n)), \
std::to_string(args_.n)} \
}
breznak (Member):

please mention in some comment that this code & structs are used for the NetworkAPI / EncoderRegions

dkeeney (Author):

please mention in some comment that this code & structs are used for the NetworkAPI / EncoderRegions

Yes I will, assuming that we use this way of obtaining the spec.

std::string getName() const override { return "RDSE"; }

// A description of the fields in the RDSE_Parameters structure.
ParameterDescriptor getDescriptor() override {
breznak (Member):

Document that this is for NetworkAPI/Regions.
It's OK if the override is optional; I wouldn't be too happy if it were required for each new encoder.

dkeeney (Author):

Ok if the override is optional, I wouldn't be too happy if it were required for each new enc

If we are to use an encoder as a plugin to NetworkAPI (with a Spec) then yes, all encoders will be required to provide this information. Otherwise, NetworkAPI does not know how to call it.


const RDSE_Parameters &parameters = args_;

void encode(Real64 input, SDR &output) override;

void encode(Real64 input, SDR &output);
breznak (Member):

why isn't this required (override) anymore? The c++ API uses this method.

dkeeney (Author):

why isn't this required (override) anymore? The c++ API uses this method.

Well, that would mean that ALL encoders must require an input of a Real64. I think that only applies to the ScalarEncoder and RDSE.

breznak (Member):

That was the reason why BaseEncoder was templated, #657 (comment)

dkeeney (Author):

Understood, but I cannot use a templated BaseEncoder with polymorphism. So if we have void encode(Real64 input, SDR &output); it cannot use override. I know... it's a bummer.


#include <htm/utils/Log.hpp>

// from http://stackoverflow.com/a/9096509/1781435
#define stringify(x) #x
#define expand_and_stringify(x) stringify(x)

static bool startsWith(const std::string &s, const std::string &prefix) {
breznak (Member):

nit, startsWith_

dkeeney (Author):

nit, startsWith_

Good point.

src/htm/engine/YAMLUtils.cpp (resolved)
/**
* Regpresents a boolean value.
*/
typedef bool Bool;
breznak (Member):

Typo above. And do we want a Bool? Why not just bool?

dkeeney (Author):

I needed to be able to create getParameterBool() with a macro. Otherwise it was getParameterbool() and I had to define a function with that name that called getParameterBool().


dkeeney commented Sep 7, 2019

I am beginning to wonder if all of this under-the-cover magic is useful. I am doing this in order to build the Spec that each Region object must provide in a static function. This is a hold-over from the OPF framework, which called this function to get the info to fill the GUI with all of the options available for configuring the Region. The NetworkAPI engine also used this info in an attempt to be more intelligent about setting up the buffers and doing error checks at configuration time rather than run time.

But I wonder if this is overkill. What if we ripped out the Spec... what would we lose? What would we gain?

  • When parsing the Yaml parameters we would have to parse to a set of strings rather than native types. i.e. the ValueMap containing parsed parameters would be a map containing values as strings rather than native types. Conversion to native types could be performed by the regions when they are being created. Also the engine would not be able to validate the parameter names for a Region. The Region would have to do that when it is created. Not a problem.
  • The engine would not be able to create all buffers and Links at configuration time. The inputs would not be known to a Region until run time when it can see what it was sent. I don't see why that would not work. The link would be less intelligent. Fan-in and buffer delays would have to be done by the Region at run-time rather than by the link between calls to the Regions. Not difficult. The region would just have to make a few more calls to prepare the inputs.
  • Without the need to generate a Spec the EncoderRegion could just pass the ValueMap on to the encoder and let it turn the strings into native values and complain if something is unexpected.
  • Dimensions would be handled at run-time as well. The dimensions on an Input would be determined solely by what the connected output sent. The Region can determine how to use that info at runtime. Output dimensions would be set and buffers created by the Region based on parameters or whatever criterion it wants to use. This increases the flexibility.

This would mean discarding parts of the API, but I am not sure it would be all that visible to external functions and processes, other than that the Spec would no longer be available. Each Region might have a little more code, but in general the NetworkAPI would be simpler and easier to understand.

I don't know... what do you think?


dkeeney commented Sep 7, 2019

Here is another idea. Rather than re-write NetworkAPI to get rid of the Spec, what if we required each Region to provide just a manually created JSON string that described the region, and in the case of an EncoderRegion, a JSON string provided by each encoder that described that encoder? This would work for code in both C++ and Python.

The downside is that I would still have trouble generically creating and accessing the parameter structure that we currently use to pass parameters to the C++ encoders. What is passed to Regions is the ValueMap, which is basically a **kwargs. What I want to do is pass this on to the encoders; this is what the Python encoders are expecting. But the C++ encoders use that parameter structure. That was an elegant solution for C++ apps that can create an instance of the structure and fill in the data, but for generic interfaces that cannot include the .hpp file it becomes a problem.

Let me think about what might need to be the minimum needed in a JSON descriptor string if we did not use a structure for the parameters.


dkeeney commented Sep 7, 2019

Here is what a descriptor string might look like in YAML syntax if one were written for the ScalarEncoder.

  • Each encoder would need something like this.
  • With this the EncoderRegion could create the Spec object that would be available to callers of the API.
  • We could probably remove the 'description:' fields since they would only be viewed by external entities such as OPF.
  • We could probably default the 'default' fields to all zeros.
  • The parameter values could get complicated and the 'type' fields may not be sufficient. See the 'excludes' and 'category' parameters on the SimHashDocumentEncoder. Perhaps those could be passed as JSON strings and parsed within the encoder.
  • The "values" input described near the end defines the type of input data to expect. The "reset" input and the "encoded" output will be the same for all encoders so the EncoderRegion could add those.
  • This approach assumes that the constructor with the parameter structure is replaced with something that accepts something more like a **kwargs.
    name: "ScalarEncoder"
    parameters:
      - name: "minimum"
        description: >- 
          Members 'minimum' and 'maximum' define the range of the input signal.
          These endpoints are inclusive.
        type: NTA_BasicType_Real64
        count: 1                 # 0 means variable, 1 means not an array, 
        default: "0.0"           # a quoted JSON string to define default value.
        required: false          # means parameter must be present if true.
        access: "CreateAccess"   # means part of parameters but otherwise Read-only.
       
      - name: "maximum"
        type: NTA_BasicType_Real64
        count: 1           # 0 means variable, 1 means not an array, 
        default: "0"
        required: false
        access: "CreateAccess"
        
      - name: "clipInput"
        description: >- 
          determines whether to allow input values outside the range [minimum, maximum].
          If true, the input will be clipped into the range [minimum, maximum].
          If false, inputs outside of the range will raise an error.
        type: NTA_BasicType_Bool
        count: 1           # 0 means variable, 1 means not an array, 
        default: "false"   
        required: false
        access: "CreateAccess" 
        
      - name: "periodic"
        type: NTA_BasicType_Bool   
        default: "false" 
        # the rest of the parts of the parameter specification can be the default values.
        
      - name: "category"
        type: NTA_BasicType_Bool   
        default: "false" 
        
      - name: "activeBits"
        type: NTA_BasicType_UInt32   
        default: "0" 
        
      - name: "sparsity"
        type: NTA_BasicType_Real32   
        default: "0.0" 

      - name: "size"
        type: NTA_BasicType_UInt32   
        default: "0" 
        
      - name: "radius"
        type: NTA_BasicType_Real64   
        default: "0.0" 
        
      - name: "resolution"
        type: NTA_BasicType_Real64   
        default: "0.0" 
        
      - name: "sensedValue"
        description: "Scalar input (for use with setParameterReal64())"
        type: NTA_BasicType_Real64
        count: 1               # means a scalar value
        default: "0.0"  
        required: false
        access: "ReadWriteAccess"    # means writeable after creation
        
    inputs:
      - name: "values"
        description:
        type: NTA_BasicType_Real64
        count: 0
        required: false
        
      - name: "reset"
        description:
        type: NTA_BasicType_Bool
        count: 1
        required: false
        
    outputs:
      - name: "encoded"
        description:
        type: NTA_BasicType_SDR

I don't know. This seems like a lot of work just to support the creation of the Spec object.


dkeeney commented Sep 7, 2019

Obviously this is going to take some time to decide how to proceed.

So I am going to put this on hold and take on a smaller project: changing the Yaml parser so that it does not require the Spec during parsing. It currently uses the Spec to determine the type of each field. I can remove the Spec requirement by parsing into a ValueMap object that is a map of strings rather than a map of native-type fields. When a field is extracted from the map it will be converted to whatever type the caller needs.


dkeeney commented Sep 10, 2019

I created an issue (#664 ) regarding the ValueMap and the Spec so we can talk about this one part.


dkeeney commented Nov 12, 2019

So, I will be making a Region for each encoder rather than a universal EncoderRegion.

This is what the Spec would look like for ScalarEncoder if I use the yaml parser, take advantage of the default fields, do not show the description fields, and tighten up the syntax a little (shown here in JSON syntax):

{name: "ScalarEncoder",
 parameters:{
    minimum:     {type: Real64, default: "0.0"},
    maximum:     {type: Real64, default: "0"},
    clipInput:   {type: Bool,   default: "false"},
    periodic:    {type: Bool,   default: "false"},
    category:    {type: Bool,   default: "false"},
    activeBits:  {type: UInt32, default: "0"},
    sparsity:    {type: Real32, default: "0.0"},
    size:        {type: UInt32, default: "0"},
    radius:      {type: Real64, default: "0.0"},
    resolution:  {type: Real64, default: "0.0"},
    sensedValue: {type: Real64, default: "0.0",
                  description: "Scalar input (for use with setParameterReal64())",
                  count: 1, required: false, access: ReadWrite}},
 inputs: {
    values: { type: Real64 },
    reset:  { type: Bool, count: 1, required: false}},
 outputs: {
    encoded: {type: SDR }}}

Should I go with this approach?


breznak commented Nov 12, 2019

So, I will be making a Region for each encoder rather than a universal EncoderRegion.

thanks David for continuing the work on this! Although at first I was advocating for a Universal Enc region, I think for now the 1:1 approach will be better 👍

inputs: {
values: { type: Real64 },
reset: { type: Bool, count: 1, required: false}},

I'm not sure the encoder handles reset; and by this syntax, shouldn't values be count: 1, required: true?

PS: ScalarEncoder has some nice properties, but for real-world inputs we'll use RDSE, so maybe start with that one first? (And we might not need a ScalarEncoder region at all?)


dkeeney commented Nov 12, 2019

values should be count: 1

You are correct. Each iteration only uses 1 Real64 value.
The 'values' input would not be required because an app could also poke the sensedValue field each iteration rather than use an input stream from something like the VectorFileSensor Region.

Along this topic... I have been thinking that a small Region that takes a stream input and outputs it over a Link would be handy.


dkeeney commented Feb 20, 2020

Let's close this because I think each encoder will be its own region.

@dkeeney dkeeney closed this Feb 20, 2020