Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Make TICKscript branch points more readable #299

Closed
nathanielc opened this issue Mar 9, 2016 · 27 comments
Closed

[Proposal] Make TICKscript branch points more readable #299

nathanielc opened this issue Mar 9, 2016 · 27 comments

Comments

@nathanielc
Copy link
Contributor

Since TICKscript ignores whitespace it is possible to define a TICKscript that is really hard to read since it is not clear when a new node is being created vs a property is being set on a node. Example:

stream.from()
.groupBy('service')
.alert()
.id('kapacitor/{{ index .Tags "service" }}')
.message('{{ .ID }} is {{ .Level }} value:{{ index .Fields "value" }}')
.info(lambda: "value" > 10)
.warn(lambda: "value" > 20)
.crit(lambda: "value" > 30)
.post("http://example.com/api/alert")
.post("http://another.example.com/api/alert")
.email().to('[email protected]')

A possible solution is to use a different operator for what the docs call property methods and chaining methods, where a property method modifies a node and a chaining method creates a new node in the pipeline. Using the example above and not changing whitespace.

stream->from()
.groupBy('service')
->alert()
.id('kapacitor/{{ index .Tags "service" }}')
.message('{{ .ID }} is {{ .Level }} value:{{ index .Fields "value" }}')
.info(lambda: "value" > 10)
.warn(lambda: "value" > 20)
.crit(lambda: "value" > 30)
.post("http://example.com/api/alert")
.post("http://another.example.com/api/alert")
.email().to('[email protected]')

Or another example with more chaining methods:

stream
->from()
.where(lambda: ...)
.groupBy(...)
->window()
.period(10s)
.every(10s)
->mapReduce(influxql.count('value')).as('value')
->alert()

Or even an example where it is necessary to disambiguate between a property and chaining method.

batch->query('SELECT mean(used_percent) FROM "telegraf"."default"."disk"')
      .period(10s)
      .every(10s)
      .groupBy('host','path') // We want to compute the mean by host and path
    ->groupBy() // But then to we want to compute the top of all groups so we need to change the groupBy. Without a different operator or a node between these steps it is impossible.
    ->top(2, 'mean')
    ->influxDBOut()
      .database('mean_output')
      .measurement('avg_disk')
      .retentionPolicy('default')
      .flushInterval(1s)
      .precision('s')

Questions:

  • Does using a different operator make writing a TICKscript overly complex? You will not be able to define a the task until you have used the correct operator for chaining vs property methods. You will have to learn via trial and error as well as consulting docs.
  • Is -> a good operator? Would | or something else read better?
stream
|from()
.where(lambda: ...)
.groupBy(...)
|window()
.period(10s)
.every(10s)
|mapReduce(influxql.count('value')).as('value')
|alert()

Using whitespace to further improve readability

stream
    |from()
        .where(lambda: ...)
        .groupBy(...)
    |window()
        .period(10s)
        .every(10s)
    |mapReduce(influxql.count('value')).as('value')
    |alert()
@mark-rushakoff
Copy link
Contributor

What if there was a tick fmt command analogous to go fmt that just automatically formats your tick script to the "correct" levels of indentation?

@nathanielc
Copy link
Contributor Author

@mark-rushakoff I think that is a great idea. Not sure how the parser would do it without a different operator to help it along. Unless fmt was aware of the more than just the language EBNF spec but also all available methods.

@rossmcdonald
Copy link
Contributor

I like @mark-rushakoff's idea of a formatting tool that helps with indentation (and maybe highlighting), rather than modifying the syntax of the TICK scripts themselves.

I'm worried it will be more confusing to add more operators to the syntax, since now instead of having to remember the name of the method I want to use, I have to remember the name and the the type of method it is. Also, based on the examples, it doesn't really improve readability of the scripts (though that may just be me).

@yosiat
Copy link
Contributor

yosiat commented Mar 11, 2016

I find the operators of "|" or "->" really hard to understand because they show a sign of semantic meaning and it's much more complex.

I do like @mark-rushakoff idea of "tick fmt" and I have an idea to improve the readability,
currently each node accepts it's "configuration" by method chain which makes long chain, for example:
window.every(1m).period(5m)
alert().id(...).message(...)

I think those configuration should be passed to the node, something like this:
alert(id: "..", message: "...")
window(every: .., period: ..)

otherwise the window and it's config nodes (every, period) might look disconnected.

@nathanielc
Copy link
Contributor Author

@yosiat Something like this?

stream
.from(measurement: 'mymeasurement')
.groupBy(dimensions: 'service')
alert(
 id: 'kapacitor/{{ index .Tags "service" }}',
 message: '{{ .ID }} is {{ .Level }} value:{{ index .Fields "value" }}',
 info: lambda: "value" > 10,
 warn: lambda: "value" > 20,
 crit: lambda: "value" > 30, 
 post: "http://example.com/api/alert".
 post: "http://another.example.com/api/alert",
 email: to: '[email protected]'
)

It works pretty well, how would you handle the .email().to(...) case? Since that is multiply chained properties?

As for the tick fmt idea I like it but it can't be done unless the format engine is made aware of all the available methods and their types, essentially making each method name a keyword in the language. The go fmt tool is able to format code based off the syntax alone, it doesn't care what you named things. To me its an indication of a problem that the code can't be formatted based on syntax alone.

@yosiat
Copy link
Contributor

yosiat commented Mar 11, 2016

Umm.. not exactly..
I really like the chaining of severities and reporters (email/post), I think about something like this:

stream
 from(measurement: 'mymeasurement').
 groupBy('service'). // I ommited the "dimensions" since groupBy have only one argument
  alert(
   id: 'kapacitor/{{ index .Tags "service" }}',
   message: '{{ .ID }} is {{ .Level }} value:{{ index .Fields "value" }}',
  ).
 info(lambda: "value" > 10).
 warn(lambda: "value" > 20).
 crit(lambda: "value" > 30).
 post("http://example.com/api/alert").
 email(to: '[email protected]')
stream.
  from(measurement: 'x').
  influxDbOut(database: 'yosi', retentionPolciy: 'attias')

By the way - which language you are using in github syntax highlighting for the tick scripts?

@nathanielc
Copy link
Contributor Author

By the way - which language you are using in github syntax highlighting for the tick scripts?

javascript

@nathanielc
Copy link
Contributor Author

@yosiat What is the difference between info and id? How would a user know when it belongs in the call to the node and when it doesn't? Seems like mixing it makes the problem worse?

@yosiat
Copy link
Contributor

yosiat commented Mar 11, 2016

I think they differ, because in my head the "alert" and "info" are different "logics" -
So the "id" and "message" configures something generic to alert while "info" / "post" enriches the "alert" logic.

But yes, you are right this can confuse but I think this can be solved with a right tooling - syntax highlighting, fmt (and maybe define some superset nodes - like "alert"/ "influxDbout" and increase the indentation of their leaf nodes?) and docs.

And I don't know how much hard is it to implement, but adding explanative errors like:
"You did alert(info: ...), did you mean: alert().info()"

@gunnaraasen
Copy link
Contributor

How about treating extra parentheses as the equivalent to whitespace to allow disambiguation of chaining and property methods without messing around with the core syntax.

stream
.from(measurement: 'mymeasurement')
.groupBy(dimensions: 'service')
.alert()(
    .id('kapacitor/{{ index .Tags "service" }}')
    .message('{{ .ID }} is {{ .Level }} value:{{ index .Fields "value" }}')
    .info(lambda: "value" > 10)
    .warn(lambda: "value" > 20)
    .crit(lambda: "value" > 30)
    .post("http://example.com/api/alert")
    .post("http://another.example.com/api/alert")
    .email()(
        .to('[email protected]')
    )
)

The above would be equivalent to the original TICKscript (and would parse exactly the same when the extra parentheses are stripped):

stream
.from(measurement: 'mymeasurement')
.groupBy(dimensions: 'service')
.alert()
.id('kapacitor/{{ index .Tags "service" }}')
.message('{{ .ID }} is {{ .Level }} value:{{ index .Fields "value" }}')
.info(lambda: "value" > 10)
.warn(lambda: "value" > 20)
.crit(lambda: "value" > 30)
.post("http://example.com/api/alert")
.post("http://another.example.com/api/alert")
.email().to('[email protected]')

All existing TICKscripts would not be broken. Characters other than parentheses could be supported, like semicolons, to make it extra clear what is parsed as whitespace. The only problem is that new users will still need to look up which methods are chaining vs property. Maybe a fmt tool or editor plugin could automatically add the extra parentheses/indentation when it sees a chaining method.

A more involved change would be to make the parentheses on the chaining methods become the wrapper, e.g. .alert(...) instead of .alert()(...).

@nathanielc
Copy link
Contributor Author

In my mind the options boil down to these:

  • We force the distinction with syntax. Which syntax we can figure out later.
  • We do not force the distinction but have flexible syntax that makes it easy to distinguish. I think whitespace is sufficient here.

If we force the syntax then we gain/loose a few things:

  • TICKscripts are more readable.
  • It is now trivial to create a fmt command to help out with even more readable scripts.
  • The user is always aware of which methods chain and which do not.
  • Writing TICKscripts is harder since the user needs to know more before being able to write correct syntax.
  • We have to migrate. It will likely be possible to keep both old and new syntax for a release and issue a warning for old syntax to ease the migration.

If we don't force it:

  • We have the possibility of writing unreadable TICKscripts
  • Writing a fmt tool is possible but difficult.
  • The user may be confused about where a branch is made in the script.

To me the decision boils down to: Do we think that forcing the syntax will also enable the dissemination of knowledge about the types of methods such that writing a TICKscript is easy again?

Imagine seeing this for the first time:

stream
    |from()
        .where(lambda: ...)
        .groupBy(...)
    |window()
        .period(10s)
        .every(10s)
    |mapReduce(influxql.count('value')).as('value')
    |alert()

You would see different operators and ask yourself why they are important. I think you could easily conclude without consulting any documentation what they mean and what purpose they serve. I think a language should make the author think about the important pieces to a problem and not the insignificant details. Thinking about where the branch points are in a task is important.

@nathanielc
Copy link
Contributor Author

This is a proposal to make a rather big change to TICKscripts. Pinging you guys since you have been quite active on Kapacitor. Thoughts feedback welcome. @jonseymour @ericiles @md14454 @sstarcher @adrianlzt @AlexGrs @rubycut @alprs @jcmartins @wutaizeng @ralidousti Thanks

@rubycut
Copy link

rubycut commented Mar 21, 2016

Hi,

I am coming from ruby background, for me, current TICK script is pure poetry, I like it as it is.

You have to have basic understanding of nodes to be able to write or understand any tick script.

Once you have basic understanding, it is easy to both read and write tick script.

I mean, I was working with Riemann, clojure script is worst than hell.

It is true that by putting separate operator for nodes, it might be slightly harder to write and slightly easier to read.

If readability is only reason that you want to make this change, then I would rather leave it as it is.

I mean people mostly write scripts for themselves, and it is their responsibility to to write it properly. If you are explaining something in blog or documentation, then you can indent it properly for easier understanding.

Python forces you to write code in a certain format, I think that sucks.

But even if you decide to add separate operator, I don't think it's deal-breaker, it just might slow down writing of scripts, one more thing to remember. Writing scripts is faster if you have one operator than if you have two.

@ericiles
Copy link
Contributor

@nathanielc If you are talking about going the syntax route purely for readability, I am not sure a whole lot needs changing. Everything is going to require some level of learning, and I think TICKscript is about as easy as it gets. However, if the need arises for more structure to allow more advanced functionality, I would likely throw my hat into the JSON arena. It is widely used across most languages, and there are plenty of tools to help those that might not know the specific formatting/syntax required to build it. One example would be http://jsoneditoronline.org/.

If we are going to require users to learn a syntax, at least this route would provide a solution that I would figure a decent percentage of users might already know, or at least have some experience working with.

JSON Example:

{
  "stream": {
    "from": {
      "where": {
        "lambda": "..."
      },
      "groupBy": [
        "value1",
        "value2"
      ]
    },
    "window": {
      "period": "10s",
      "every": "10s"
    },
    "mapReduce": {
      "operation": "influxql.count('value')",
      "as": "value"
    },
    "alert": {
      "id": "kapacitor/{{ index .Tags \"service\" }}",
      "message": "{{ .ID }} is {{ .Level }} value:{{ index .Fields \"value\" }}",
      "levels": {
        "info": "value > 10",
        "warn": "value > 20",
        "crit": "value > 40"
      },
      "actions": {
        "post": [
          "http://example.com/api/alert",
          "http://another.example.com/api/alert"
        ],
        "email": {
          "to": "[email protected]"
        }
      }
    }
  }
}

@sstarcher
Copy link
Contributor

@nathanielc We have several discussions already started up so I'll try to keep it brief.

  • Yes to improving readability by forcing a syntax distinction and to supporting a formatter. Code is read more often then written. As for the exact syntax I would have to put more thought into it.
  • @ericiles JSON is fine for computers, but terrible for humans. For the love of god don't do this to me.

@jonseymour
Copy link
Contributor

I like the idea of using syntax to distinguish between properties and chaining - I found TICKScripts hard to read precisely because this distinction is lacking. Formatting of the examples certainly helped, but I was always left wondering how it worked and where the node boundaries really were. With a syntax distinction, it would be far easier to parse where each node begins and would, I think, help readers more quickly develop the conceptual models required to read and write such scripts. The fact that it enables parsers external to the interpreter to correctly parse such scripts can only be a good thing, IMO.

@nathanielc
Copy link
Contributor Author

Thanks everyone for the feedback. What I am hearing is some people are in favor of the change. Some don't have a strong preference and a few people feel like its a bad idea. To me that is weighted towards making the change which is how I am leaning as well.

I think forcing the distinction is going to be better in the long run for three specific reasons:

  1. Disambiguation of using a same named property method vs chaining method. i.e. gorupBy and log was also introduced recently too. This is only going to get worse as more features are added to Kapacitor.
  2. Readability for new users,
    "I found TICKScripts hard to read precisely because this distinction is lacking"
    I have heard that from several people now and seen it in interactions with other confused users in the community.
  3. We can write and use a tickfmt tool to define a standard formatting.

Now to chose an operator...
I wrote a bunch of scripts out using pretty much every special char in a standard US keyboard. My favorites are: +,|,@. In that order:

stream
    +from()
        .where(lambda: ...)
        .groupBy(...)
    +window()
        .period(10s)
        .every(10s)
    +mapReduce(influxql.count('value')).as('value')
    +alert()

stream
    |from()
        .where(lambda: ...)
        .groupBy(...)
    |window()
        .period(10s)
        .every(10s)
    |mapReduce(influxql.count('value')).as('value')
    |alert()

stream
    @from()
        .where(lambda: ...)
        .groupBy(...)
    @window()
        .period(10s)
        .every(10s)
    @mapReduce(influxql.count('value')).as('value')
    @alert()
  • + conveys the addition of something new, like a new node
  • | conveys linking things together like a standard UNIX pipe
  • @ doesn't convey any special meaning, which means we can give it it's own meaning.

The specific operator can easily be changed right up until we merge the change ;) So I am going to get started implementing this change now.

@jonseymour
Copy link
Contributor

@nathanielc Can you explain why you want to avoid the analogy with unix pipes? I would have thought the analogy is a reasonable one, but then I haven't actually written any TICKScripts in anger yet (I am currently knee deep in ansible fixing my docker infrastructure so that I can easily run Kapacitor inside a docker container !!)

@nathanielc
Copy link
Contributor Author

Sorry that wasn't clear, I like the analogy to Unix pipes, that's why I
included it. I would be fine with any of those three operators.
On Mar 22, 2016 5:50 PM, "Jon Seymour" [email protected] wrote:

@nathanielc https://github.com/nathanielc Can you explain why you want
to avoid the analogy with unix pipes? I would have thought the analogy is a
reasonable one, but then I haven't actually written any TICKScripts in
anger yet (I am currently knee deep in ansible fixing my docker
infrastructure so that I can easily run Kapacitor inside a docker container
!!)


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#299 (comment)

@rubycut
Copy link

rubycut commented Mar 23, 2016

+1 for +

@rossmcdonald
Copy link
Contributor

@nathanielc My vote would also be for +. I like all of them, but I think the 'adding a new node' analogy makes a lot of sense and makes the script more readable in that context.

@jonseymour
Copy link
Contributor

I'll record my vote for |, since I think the there is a good analogy with unix pipes and + tends to have other meanings unrelated to pipe-like behaviour.

@nathanielc
Copy link
Contributor Author

I like + too but I think it will cause problems later on...

var a = 42
var b = a + c()

Does that mean add the value of a to the result of c()? No, it means call the chain method c on the value 42 which is an error.

Currently the above syntax is invalid and with the change it would become valid syntax and then fail to evaluate since 42 does not have the method c.

Also it would rule out the possibility of using the above syntax later for the actual addition case.

I think we need to use something else besides +, my second choice is |.

Here its obvious that | doesn't make sense

var a = 42
var b = a | c()

but here is still does

var a = stream|from()
var b = a | c()

@rossmcdonald
Copy link
Contributor

Ah, I didn't consider the implications to arithmetic. The + symbol will definitely be more confusing then. I agree that | is probably the better option.

@sstarcher
Copy link
Contributor

Of those 3 my vote would be |

@md14454
Copy link
Contributor

md14454 commented Mar 23, 2016

Sorry I'm a little late to the party.

I understand the need for additional markup and I think it's definitely a good idea that the write of the scripts should understand the distinction of nodes vs properties.

Out of the options presented I like the | character.

Having said that I think there is merit in the JSON style. However, as others have noted it is particularly readable by humans. I wonder whether a markup that is YAML like may help?

stream
    -from
        -where lambda: ...
        -groupBy ...
    -window
        -period 10s
        -every 10s
    -mapReduce
        -influxql
            -count 'value'
        -as value
    -alert

I think that this is then readable both by humans and machines and allows for linters and formatters to be created (reused). This approach also allows for clear hierarchy of the script such that keywords could be reused without conflicting with each other. Each line is of the format; ident, keyword, parameters. It is worth noting that I do have a slight bias in that I tend to code in Python.

@nathanielc
Copy link
Contributor Author

Thanks everyone for the input! I have a PR up #380 that has that makes the change to using the | operator.

While making the change I noticed it would be trivial to add another operator for UDFs as opposed to built-in methods. Something like:

stream
    |from()
        .where(lambda: ...)
        .groupBy(...)
    |window()
        .period(10s)
        .every(10s)
    @customUDF()
    |alert()

This way when you share a TICKscript that uses a UDF it is obvious that it depends on a UDF to be defined in order to work. By using yet another operator it adds clarity to the reader and the writer should already be aware that a UDF is being used since it had to be explicitly configured.

I am going to make the UDF change as well unless people strongly disagree.

nathanielc referenced this issue Mar 28, 2016
Adds a new '|' operator for chaining methods in TICKscript
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants