Add support for protected types in JSONToTuple #12

Closed
rohitsw opened this issue Apr 17, 2014 · 15 comments

Comments

@rohitsw
Member

rohitsw commented Apr 17, 2014

For example, this is valid JSON:
{ "rstring" : "mystring", "type" : "mytype" }

However, no tuple structure allows this JSON to be mapped, since rstring and type are SPL protected names that cannot be used as attribute names in tuples.
We need a way to make these kinds of attributes accessible.

@hildrum
Contributor

hildrum commented May 11, 2014

FWIW, I did this for the text toolkit.

@rohitsw
Member Author

rohitsw commented May 12, 2014

@hildrum how did you implement it?

@hildrum
Contributor

hildrum commented May 12, 2014

It looks like it's not a documented feature. Oops, gotta update that.

I picked a quotePrefix protectreserved_. (This is clunky, but it was an unlikely case.) If you want to take something from the SystemT to a streams tuple, and the attribute in SystemT has a name that's a reserved word like "rstring", then you use a streams attribute named "protectreserved_rstring", and the operator knows to strip off the "protectreserved_" when determining how to populate the streams attribute. It only strips off one, I believe, so protectreserved_ could be used to protect an attribute beginning with protectreserved_, though I didn't test that.
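
A minimal sketch of the prefix rule described above, assuming the operator walks the Streams attribute names and derives the source-side name from each one. The class and method names are hypothetical and are not taken from the text toolkit; only the prefix string comes from the comment. The prefix is stripped at most once, so a doubled prefix protects a name that itself begins with the prefix.

```java
final class ReservedPrefix {
    // Prefix quoted in the comment above; everything else here is illustrative.
    static final String PREFIX = "protectreserved_";

    /**
     * Maps a Streams attribute name to the name used on the source side.
     * The prefix is removed at most once.
     */
    static String sourceNameFor(String streamsAttributeName) {
        if (streamsAttributeName.startsWith(PREFIX)) {
            return streamsAttributeName.substring(PREFIX.length());
        }
        return streamsAttributeName;
    }

    public static void main(String[] args) {
        System.out.println(sourceNameFor("protectreserved_rstring"));
        System.out.println(sourceNameFor("protectreserved_protectreserved_x"));
        System.out.println(sourceNameFor("city"));
    }
}
```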

@rohitsw
Member Author

rohitsw commented May 12, 2014

OK, I had something similar in the earlier version of the JSON operators. However, I don't find that to be an elegant solution. I was thinking that the operator could accept parameters in key=value format, where the key is the JSON attribute name and the value is the name of the tuple attribute it maps to. This should give more flexibility in attribute naming.

e.g.
attributeMap : "rstring=myrstring";

Thoughts?

@hildrum
Contributor

hildrum commented May 12, 2014

I think that's a good, more general, solution. We have something similar proposed for HBASE:
IBMStreams/streamsx.hbase#23 (comment)

@ulemanstreaming

The attributeMap parameter approach sounds good. A related but slightly different mechanism is used in the Mining toolkit: Not a single parameter whose value is a map, but one operator parameter for each model parameter. I can't fully articulate the pros and cons of the two approaches, but I think the single parameter with a map value (the one proposed here) is cleaner. There's something weird about parameters that are not predefined in the operator model.

Last I checked, this feature is still missing. Is this likely to move forward? Who is maintaining this toolkit now?

@hildrum
Contributor

hildrum commented Sep 1, 2015

I'm the one maintaining the toolkit. But since this issue hadn't been commented on for a year, I didn't think there was anyone who could use this feature. Is this something you need?

@ulemanstreaming

Recently I was building a demo using FAA airport data, which included an element called "type". I found myself having to massage the JSON string before converting it:

      stream<Records> FixedUpJSONRecords as O = Functor(JSONRecords as I)
      {
         output O:
            record = regexReplace(I.record, '"type":', '"delayType":', false);
      }

That works, but has workaround written all over it. I had used an earlier version of the toolkit back in 2013 and was wondering what had happened to the protected-prefix parameter I remembered seeing (but did not need at the time). To be honest, I failed to find this discussion thread.

So "need" is relative, but I've only used JSON parsing twice and needed something like it 50% of the time. (And no, I'm not taking that statistic seriously.) I'd say, it's a very nice-to-have.

@hildrum
Contributor

hildrum commented Sep 3, 2015

I'll try to get the feature in sometime soon, then. I think the way it'd work is that you'd specify
attributeMap: "streamstype=jsontype"; (this is the reverse of what rohitsw proposed above), i.e., in your example:
attributeMap: "delayType=type";
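
A rough sketch of how such a string parameter could be parsed on the Java side, in the "streamsName=jsonName" direction proposed here. This is an illustration under assumptions, not the toolkit's implementation: the class and method names are hypothetical, and the comma separator for multiple entries is an assumption, since the thread only shows a single entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AttributeMapParser {
    /**
     * Parses entries of the form "streamsName=jsonName" into a map from
     * Streams attribute name to JSON key.
     * Assumption: entries are separated by commas, so a JSON key that
     * contains a comma cannot be expressed in this format.
     */
    static Map<String, String> parse(String spec) {
        Map<String, String> streamsToJson = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return streamsToJson;
        }
        for (String entry : spec.split(",")) {
            // Split at the first '=' only; Streams attribute names cannot contain '='.
            int eq = entry.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in entry: " + entry);
            }
            String streamsName = entry.substring(0, eq).trim();
            String jsonName = entry.substring(eq + 1).trim();
            if (streamsName.isEmpty() || jsonName.isEmpty()) {
                throw new IllegalArgumentException("Empty name in entry: " + entry);
            }
            if (streamsToJson.put(streamsName, jsonName) != null) {
                throw new IllegalArgumentException("Duplicate attribute: " + streamsName);
            }
        }
        return streamsToJson;
    }

    /** Returns the JSON key for a Streams attribute; unmapped names map to themselves. */
    static String jsonKeyFor(String streamsName, Map<String, String> streamsToJson) {
        return streamsToJson.getOrDefault(streamsName, streamsName);
    }

    public static void main(String[] args) {
        Map<String, String> m = parse("delayType=type");
        System.out.println(jsonKeyFor("delayType", m));
        System.out.println(jsonKeyFor("city", m));
    }
}
```

With a lookup like this, the operator would read the JSON key returned by jsonKeyFor and store the value under the Streams attribute name, so no rewriting of the JSON string is needed.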

@ulemanstreaming

+1 for adding this feature.

Just wondering whether there is a best-practice way of implementing a map-like parameter. Here it seems that you're thinking of a string containing a list of expressions that you have to custom-parse. Would an SPL map literal make sense?
{delayType : "type", ...} , if that's even possible, or {"delayType" : "type", ...} ?

I did not do an exhaustive search of other operators but maybe the compiler team has suggestions.

@hildrum
Contributor

hildrum commented Sep 9, 2015

What's best practice depends on whether you're talking about a Java primitive operator or a C++ primitive operator. Java operators don't support parameters that are maps or tuples, so we end up having to work around that with strings.

If this were a C++ primitive operator, we could use maps, but I'd probably take a different approach and use custom output functions, so it'd be something like

output O:
   delayType = getField("type");

@ulemanstreaming

Makes sense. Thanks.

@jchailloux

I had this issue with a JSON file that contains {type:"xxx" and timestamp:"1234567"}.
The workaround I found was to name the string attributes _type and _timestamp in the tuple,

and then tweak the Java code to check for _type and _timestamp:

private Map<String, Object> jsonToAtributeMap(JSONObject jbase, StreamSchema schema) throws Exception {
    Map<String, Object> attrmap = new HashMap<String, Object>();
    for (Attribute attr : schema) {
        String name = attr.getName();
        // Workaround: tuple attributes _type and _timestamp are looked up
        // in the JSON under the keys type and timestamp.
        boolean underscore = false;
        if (name.startsWith("_type") || name.startsWith("_timestamp")) {
            underscore = true;
            name = name.substring(1);
        }
        try {
            if (l.isLoggable(TraceLevel.DEBUG)) {
                l.log(TraceLevel.DEBUG, "Checking for: " + name);
            }
            Object childobj = jbase.get(name);
            if (childobj == null) {
                if (l.isLoggable(TraceLevel.DEBUG)) {
                    l.log(TraceLevel.DEBUG, "Not Found: " + name);
                }
                continue;
            }
            Object obj = jsonToAttribute(name, attr.getType(), childobj, null);
            if (obj != null) {
                // Store the value under the tuple attribute name, with the underscore restored.
                attrmap.put((underscore ? "_" + name : name), obj);
            }
        } catch (Exception e) {
            l.log(TraceLevel.ERROR, "Error converting object: " + name, e);
            throw e;
        }
    }
    return attrmap;
}
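
One thing to watch in the snippet above: startsWith("_type") also matches attribute names such as _typeCode, whose underscore would then be stripped as well. A small sketch of an exact-match variant follows, assuming the set of escaped names is known up front. The class name, method name, and the set itself are hypothetical and not part of the toolkit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class UnderscoreNames {
    // JSON keys that cannot be used directly as SPL attribute names (illustrative set).
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList("type", "timestamp"));

    /**
     * Returns the JSON key for a tuple attribute name. The leading underscore
     * is dropped only when the remainder is exactly one of the reserved names.
     */
    static String jsonKeyFor(String attrName) {
        if (attrName.startsWith("_") && RESERVED.contains(attrName.substring(1))) {
            return attrName.substring(1);
        }
        return attrName;
    }

    public static void main(String[] args) {
        System.out.println(jsonKeyFor("_type"));
        System.out.println(jsonKeyFor("_typeCode"));
        System.out.println(jsonKeyFor("city"));
    }
}
```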

@markheger
Member

Operators should handle this similarly to the XML parse operator, with its ignorePrefix parameter.

schubon pushed a commit that referenced this issue Sep 4, 2017
support JSON with reserved SPL keywords: issue #12 and #72
@schubon
Member

schubon commented Sep 7, 2017

Closing after merge of Mark's changes.

@schubon schubon closed this as completed Sep 7, 2017