-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Second attempt at logging refactor #1535
Conversation
35513ea
to
0e50102
Compare
245be10
to
5fb3f06
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed about half of the changes.
I have a crazy idea. Should we just drop zap and write our own tiny log package? At this point we have reduced the responsibility of logging down to just formatting data, which we had to write our own logfmt implementation anyways. How much more work is it to just consume the keyvalue.T type and directly write data in logfmt at this point? We could make the package internal to the diagnostic service so its clear its not general purpose.
derivative.go
Outdated
@@ -125,7 +126,7 @@ func (n *DerivativeNode) derivative(prev, curr models.Fields, prevTime, currTime | |||
f1, ok := numToFloat(curr[n.d.Field]) | |||
if !ok { | |||
n.incrementErrorCount() | |||
n.logger.Printf("E! cannot apply derivative to type %T", curr[n.d.Field]) | |||
n.diag.Error("cannot perform derivative", fmt.Errorf("field %s is the wrong %T", n.d.Field, curr[n.d.Field])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one should read:
n.diag.Error("cannot perform derivative", errors.New("field is the wrong type") keyvalue.KV("field",n.d.Field), keyvalue.KV("type", reflect.TypeOf(curr[n.d.Field]).String()))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reflect.TypeOf(curr[n.d.Field]).String()
panics when curr[n.d.Field]
is nil
https://play.golang.org/p/gjlr3kuusL
Went with
keyvalue.KV("type", fmt.Sprintf("%T", curr[n.d.Field]))
instead
derivative.go
Outdated
@@ -140,7 +141,7 @@ func (n *DerivativeNode) derivative(prev, curr models.Fields, prevTime, currTime | |||
elapsed := float64(currTime.Sub(prevTime)) | |||
if elapsed == 0 { | |||
n.incrementErrorCount() | |||
n.logger.Printf("E! cannot perform derivative elapsed time was 0") | |||
n.diag.Error("cannot perform derivative", errors.New("elaspsed time was 0")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the n.diag.Error() method call n.incrementErrorCount() internally? That would remove the onus on each node to correctly count their errors and rather only need to report them to the Diagnostic interface. This probably means you need a small shim implementation of the interface within the kapacitor package to intercept Error and With* calls.
Does that sound reasonable and useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about doing something like that, but I wasn't sure exactly the best way to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
log.go
Outdated
return batch, nil | ||
} | ||
n.logger.Println(n.key, n.buf.String()) | ||
// TODO: fix prefix and other loger here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO and others in the file complete?
node.go
Outdated
@@ -132,7 +153,8 @@ func (n *node) start(snapshot []byte) { | |||
err = fmt.Errorf("%s: Trace:%s", r, string(trace[:n])) | |||
} | |||
n.abortParentEdges() | |||
n.logger.Println("E!", err) | |||
n.diag.Error("encountered panic", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not always true that a panic was encountered here. This error message should be something like node failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
server/config.go
Outdated
Task task_store.Config `toml:"task"` | ||
InfluxDB []influxdb.Config `toml:"influxdb" override:"influxdb,element-key=name"` | ||
// TODO(desa): this should change to diagnostic eventually | ||
Logging diagnostic.Config `toml:"logging"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config is about logging, but logging is implemented through diagnostics so I think this name is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
services/diagnostic/handlers.go
Outdated
} | ||
|
||
// TODO: deal with prefix and level | ||
func (h *KapacitorHandler) LogBatchData(level, prefix string, batch edge.BufferedBatchMessage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think prefix can just be a String field. The meaning prefix will be wrong at that point, but the intent is to make the logs unique which a field will do fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nathanielc any idea what the field should be named?
services/diagnostic/handlers.go
Outdated
} | ||
|
||
func (h *ReplayHandler) Error(msg string, err error, ctx ...keyvalue.T) { | ||
if len(ctx) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again repeated code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup. don't know how I missed this.
services/diagnostic/handlers.go
Outdated
} | ||
|
||
func (h *EdgeHandler) Collect(mtype edge.MessageType) { | ||
h.l.Debug("collected message", zap.Stringer("message_type", mtype)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we are logging every message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comes from https://github.com/influxdata/kapacitor/blob/md-log-v2/edge/log.go#L24-L35 and I don't think the LogEdge
is ever used.
services/diagnostic/handlers.go
Outdated
h.l.Debug("emitted message", zap.Stringer("message_type", mtype)) | ||
} | ||
|
||
type LogLevel int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does these need to be exported types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope. No reason they need to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed now.
services/diagnostic/handlers.go
Outdated
h.l.Debug(string(buf)) | ||
case LLError: | ||
h.l.Error(string(buf)) | ||
case LLFatal: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need Fatal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope.
Don't think it would be too much work, but I could be off. Currently the |
@desa If the types of keyvalue.T are strings doesn't that work well for logging, since everything will be encoded as text to be able to log it? |
1320843
to
3d88527
Compare
@nathanielc I think I've addressed all of the issues from the review so far. |
@nathanielc this should be in a ready for another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few small suggestions , then I think its ready to go.
services/diagnostic/handlers.go
Outdated
var el klog.Field | ||
switch v.(type) { | ||
case int: | ||
el = klog.Int(k, v.(int)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fields are never int
they are always int64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
services/diagnostic/handlers.go
Outdated
ts := []klog.Field{} | ||
for k, v := range tags { | ||
var el klog.Field | ||
switch v.(type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type switches in Go can create an autocasted identifier
switch t := v.(type) {
case int64:
el = klog.Int64(k, t)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that 👍 .
Fixed.
services/diagnostic/handlers.go
Outdated
case bool: | ||
el = klog.Bool(k, v.(bool)) | ||
default: | ||
el = klog.String(k, fmt.Sprintf("%v", v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious why there are fields implementations for each type instead of simply using fmt.Sprintf("%v",v) for everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I had already needed the other types elsewhere and I was trying to be somewhat performance conscious. In the past whenever I've used fmt.Sprintf
a ton I've had a very sad CPU.
@nathanielc I think it's ready to merge on your signal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read through your TODOs, and may some comments.
services/httpd/logger.go
Outdated
// // response time, report in microseconds because this is consistent | ||
// // with apache's %D parameter in mod_log_config | ||
// strconv.FormatInt(time.Since(start).Nanoseconds()/1000, 10), | ||
//} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this dead code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
services/httpd/logger.go
Outdated
// // response time, report in microseconds because this is consistent | ||
// // with apache's %D parameter in mod_log_config | ||
// strconv.FormatInt(time.Since(start).Nanoseconds()/1000, 10), | ||
//} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
services/influxdb/service.go
Outdated
@@ -1225,15 +1232,15 @@ func (c *influxdbCluster) startUDPListener(se subEntry, port string) (*net.UDPAd | |||
conf.Buffer = c.udpBuffer | |||
conf.ReadBuffer = c.udpReadBuffer | |||
|
|||
l := c.LogService.NewLogger(fmt.Sprintf("[udp:%s.%s] ", se.db, se.rp), log.LstdFlags) | |||
service := udp.NewService(conf, l) | |||
d := c.diag.WithUDPContext(fmt.Sprintf("udp:%s.%s", se.db, se.rp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have the WithUDPContext accept two string arguments for db and rp? Then the context is correctly tagged instead of having a composite string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea.
services/influxdb/service.go
Outdated
service.PointsWriter = c.PointsWriter | ||
err := service.Open() | ||
if err != nil { | ||
return nil, err | ||
} | ||
c.services[se] = service | ||
c.logger.Println("I! started UDP listener for", se.db, se.rp) | ||
c.diag.StartedUDPListener(fmt.Sprintf("%s.%s", se.db, se.rp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again here, why not pass the db and rp strings to the method directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
task.go
Outdated
@@ -551,7 +554,8 @@ func (et *ExecutingTask) runSnapshotter() { | |||
case <-ticker.C: | |||
snapshot, err := et.Snapshot() | |||
if err != nil { | |||
et.logger.Println("E! failed to snapshot task", et.Task.ID, err) | |||
// TODO(desa): I think this might already have the task id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the result of this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's set.
task.go
Outdated
@@ -562,7 +566,8 @@ func (et *ExecutingTask) runSnapshotter() { | |||
if size > 0 { | |||
err = et.tm.TaskStore.SaveSnapshot(et.Task.ID, snapshot) | |||
if err != nil { | |||
et.logger.Println("E! failed to save task snapshot", et.Task.ID, err) | |||
// TODO(desa): I think this might already have the task id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's set.
udf.go
Outdated
@@ -266,7 +265,10 @@ func (p *UDFProcess) logStdErr() { | |||
defer p.logStdErrGroup.Done() | |||
scanner := bufio.NewScanner(p.stderr) | |||
for scanner.Scan() { | |||
p.logger.Println("I!P", scanner.Text()) | |||
// TODO(desa): fix the issue here. issue comes from loadUDFInfo in services/udf/service.go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
services/udf/service.go
Outdated
|
||
// TODO(desa): that okay? Nil check against this happens in udf.go logStdErr() | ||
// alternative would be to have a mock implementation that gets passed in here | ||
u, err := s.Create(name, "", "", nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is okay, its possible valuable information is lost if we do not provide a diagnostic here. Looking at it the simplest approach would be to add a WithUDFDiagnotic method to the Diagnostic interface within this package. The udf.Diagnostic interface is quite simple so its shouldn't be hard to accommodate. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebase and squash and this is good to go!
vendor zap for logging refactor Change name of diagnostic handlers file Add diagnostics for slack service Use keyvalue instead of map[string]string for ctx Add diagnostic to task_store service Add diagnostics to reporting handler Add diagnostics to storage service Fix wrong service type Add diagnostic to http service Add diagnostics to alerta service Add diagnostic to kapacitor package SQUASH ME: figure out how to deal with UDF node Add diagnostic to alert service Add hipchat diagnostic Add diagnostic for udf and udf service Add diagnostic to pagerduty Add diagnostic to SMTP service Add diagnostic to opsgenie service Add diagnostic to pushover service Add diagnostic to httppost service Add diagnostic to sensu service Add diagnostic to snmp service Add diagnostic to telegram service Add diagnostic to mqtt service Add diagnostic to talk handler Add diagnostics to config override service Remove logger from test service I'm not sure if this is what we want to do Add diagnostic to server package Not totally sold on the interface, for server, but it felt okay for this package for some reason Add diagnostic to replay service Add diagnostic to kubernetes service Add diagnostic to swarm service Add diagnostic to deadman service Add diagnostic to noauth service Add diagnostic to stats service Add diagnostic to UDP service Add diagnostic to InfluxDB service Add diagnostic to scraper services Add diagnostic to edge logger Add static level diagnostic handler Remove log service from server package Remove log service from kapacitord Change diagnostic Service to struct SQUASH ME Address various TODO items Address various TODOs Remove logging service Change from interfaces to explicit types on service Add time configuration to to zap Use ioutil.Discard in tests Set default level to debug Remove commented out code Address TODOs in diagnostic package Prevent newlines from scaper service DRY up common code Fix level logger Add log level setting from CLI Fix log data node Address a few of the changes suggested in the PR Fix addition comments from PR Fix log node level and prefix Wrap NodeDiagnostic to increment error count Clarify comment in Error function Fix derivative error function Fix alerta diagnostic First pass: impelment basic logfmt logger Remove zap dependency Remove usage of zap Don't use pointers to objects Fix log line format Format fields and tags appropriately Add time to log function Add logger tests Add tests to logger Fix logger test wording Add changes suggested in PR Address issues from PR
No description provided.