
fluentd-plugin-grafana-loki: error_class=NoMethodError error="undefined method `each' for nil:NilClass" #749

Closed
candlerb opened this issue Jul 14, 2019 · 2 comments · Fixed by #750


@candlerb
Contributor

Describe the bug

fluentd-plugin-grafana-loki gives a backtrace whenever any message is written to loki. The full backtrace is provided at the end.

To Reproduce
Steps to reproduce the behavior:

  1. Start loki (de83272)
  2. Install fluentd gem (1.6.2)
  3. Install fluent-plugin-grafana-loki gem (1.0.0). Note that a local gem build is required, as described in Can't install fluent-plugin-grafana-loki #719 (comment)
  4. Configure:
<source>
  @type udp
  tag syslog
  port 5140
  bind 0.0.0.0
  source_address_key instance
  <parse>
    @type none
  </parse>
</source>

<match syslog.**>
  @type copy
  <store>
    @type file
    path /var/log/fluent/syslog
    compress gzip
    #<format>
    #  @type single_value
    #</format>
    <buffer>
      timekey_use_utc
    </buffer>
  </store>

  <store>
    @type loki
    url "http://x.x.x.x:3100"
    extra_labels {"job":"syslog"}
    drop_single_key true
    flush_interval 10s
    flush_at_shutdown true
    buffer_chunk_limit 1m
  </store>
</match>
  5. Send syslog packets to port 5140

Expected behavior

Syslog messages should be delivered to loki.

Environment:

  • Infrastructure: Ubuntu 16.04.5 LXD container on an Ubuntu 18.04.2 host
  • Deployment tool: loki built from source, fluentd installed using "gem install"

Screenshots, promtail config, or terminal output

Jul 14 19:23:47 fluentd fluentd[15731]: 2019-07-14 19:23:47 +0000 [warn]: #0 got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `each' for nil:NilClass"
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-grafana-loki-1.0.0/lib/fluent/plugin/out_loki.rb:184:in `line_to_loki'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-grafana-loki-1.0.0/lib/fluent/plugin/out_loki.rb:213:in `block in chunk_to_loki'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/event.rb:323:in `each'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/event.rb:323:in `block in each'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/plugin/buffer/memory_chunk.rb:80:in `open'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/plugin/buffer/memory_chunk.rb:80:in `open'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/event.rb:322:in `each'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-grafana-loki-1.0.0/lib/fluent/plugin/out_loki.rb:210:in `chunk_to_loki'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-grafana-loki-1.0.0/lib/fluent/plugin/out_loki.rb:116:in `generic_to_loki'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-grafana-loki-1.0.0/lib/fluent/plugin/out_loki.rb:84:in `write'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/plugin/output.rb:1128:in `try_flush'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/plugin/output.rb:1434:in `flush_thread_run'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/plugin/output.rb:457:in `block (2 levels) in start'
Jul 14 19:23:47 fluentd fluentd[15731]:   2019-07-14 19:23:47 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.6.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
Jul 14 19:23:47 fluentd fluentd[15731]: 2019-07-14 19:23:47 +0000 [warn]: #0 bad chunk is moved to /tmp/fluent/backup/worker0/object_536830/58da91480cb370a8382cb05ec00edd47.log

Subsequent messages just give:

Jul 14 19:23:57 fluentd fluentd[15731]: 2019-07-14 19:23:57 +0000 [warn]: #0 got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `each' for nil:NilClass"
Jul 14 19:23:57 fluentd fluentd[15731]:   2019-07-14 19:23:57 +0000 [warn]: #0 suppressed same stacktrace
Jul 14 19:23:57 fluentd fluentd[15731]: 2019-07-14 19:23:57 +0000 [warn]: #0 bad chunk is moved to /tmp/fluent/backup/worker0/object_536830/58da91521a9ad4f35c976a8483102e2d.log
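The innermost frame of the trace is `line_to_loki` (out_loki.rb:184), and the failure mode can be reproduced in plain Ruby without fluentd. This is a minimal sketch: `LokiLikeOutput`, `strip_keys` and `reproduce` are hypothetical names, and the only detail taken from this thread is that `@remove_keys` is `nil` when the option is not set. The wording of the `NoMethodError` message differs between Ruby versions, so the sketch inspects `NameError#name` rather than matching the text.

```ruby
# Hypothetical stand-in for the output plugin; only the nil default of
# @remove_keys (as reported in this thread) is modeled.
class LokiLikeOutput
  def initialize(remove_keys: nil)
    @remove_keys = remove_keys # nil when the option is absent from the config
  end

  # Unguarded loop, shaped like the pre-fix code in line_to_loki.
  def strip_keys(record)
    @remove_keys.each { |k| record.delete(k) }
    record
  end
end

# Returns the NoMethodError raised for an unset option, or nil if none occurs.
def reproduce
  LokiLikeOutput.new.strip_keys({ "message" => "hello" })
  nil
rescue NoMethodError => e
  e
end

err = reproduce
# NameError#name is the method that was missing on the receiver (here nil).
puts "#{err.class}: missing method #{err.name.inspect}" if err
```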
@candlerb
Contributor Author

@remove_keys and @label_keys default to nil, and as the error says, you can't iterate over nil. The following fixes it for me:

--- /var/lib/gems/2.3.0/gems/fluent-plugin-grafana-loki-1.0.0/lib/fluent/plugin/out_loki.rb.orig	2019-07-14 19:40:13.521379017 +0000
+++ /var/lib/gems/2.3.0/gems/fluent-plugin-grafana-loki-1.0.0/lib/fluent/plugin/out_loki.rb	2019-07-14 19:40:48.352519196 +0000
@@ -183,14 +183,14 @@
           # remove needless keys.
           @remove_keys.each { |v|
             record.delete(v)
-          }
+          } if @remove_keys
           # extract white listed record keys into labels.
           @label_keys.each do |k|
             if record.key?(k)
               chunk_labels[k] = record[k]
               record.delete(k)
             end
-          end
+          end if @label_keys
           line = record_to_line(record)
         else
           line = record.to_s

Looks like this plugin hasn't had much in the way of testing though :-(
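The same guard can also be written with the safe-navigation operator (`&.`) or `Kernel#Array`, both available on the Ruby 2.3 runtime implied by the `/var/lib/gems/2.3.0` paths above. The following is a minimal sketch, not the code merged in #750: `extract_labels` is a hypothetical helper, records are assumed to be Hashes with String keys, and unlike the plugin it works on a copy of the record and returns the labels and the remaining fields.

```ruby
# Hypothetical helper; remove_keys and label_keys may each be nil (unset) or an Array.
def extract_labels(record, remove_keys, label_keys)
  record = record.dup # work on a copy so the caller's Hash is left untouched
  labels = {}

  # Safe navigation skips the loop when remove_keys is nil,
  # matching the `} if @remove_keys` guard in the patch.
  remove_keys&.each { |k| record.delete(k) }

  # Array(nil) is [], so an unset label_keys simply yields no labels.
  Array(label_keys).each do |k|
    labels[k] = record.delete(k) if record.key?(k)
  end

  [labels, record]
end

labels, rest = extract_labels({ "message" => "hi", "host" => "a" }, nil, nil)
p labels # => {}
p rest
```

Another option, assuming the settings are declared with Fluentd's `config_param`, would be to give them an empty-array default so the loops never see `nil`; whether that suits the plugin's other uses of these settings has not been checked here.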

@dawidmalina
Contributor

Good catch 👍 Can you drop a pull request with this change? I agree that a testing plan (strategy) for this plugin is needed.

candlerb added a commit to candlerb/loki that referenced this issue Jul 14, 2019
cyriltovena pushed a commit that referenced this issue Jul 15, 2019
udyvish added a commit to udyvish/loki that referenced this issue Jul 30, 2019
* Make sure the default for EnforceMetricName is ✅ (grafana#518)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Enable tracing loki in helm chart (grafana#496)

* Enable tracing loki in helm chart

Tracing support from grafana#328, need to support config in helm chart

Signed-off-by: Xiang Dai <[email protected]>

* Use camel case for variables

Signed-off-by: Xiang Dai <[email protected]>

* update condition

Signed-off-by: Xiang Dai <[email protected]>

* Update troubleshooting (grafana#498)

* Update troubleshooting

- remove "no label" title
- add tracing part

Signed-off-by: Xiang Dai <[email protected]>

* Add helm help

Signed-off-by: Xiang Dai <[email protected]>

* Take 2 for fix the limit settings (grafana#519)

* Revert "Make sure the default for EnforceMetricName is ✅ (grafana#518)"

This reverts commit 199746a.

* Fix overrides unmarshalling properly

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Add target config (grafana#486)

* Add target config

Support customizing target config.

Signed-off-by: Xiang Dai <[email protected]>

* Update default value

Signed-off-by: Xiang Dai <[email protected]>

* bump helm version

Signed-off-by: Xiang Dai <[email protected]>

* fix a typo

Signed-off-by: Xiang Dai <[email protected]>

* bump up chart version

Signed-off-by: Xiang Dai <[email protected]>

* support config chunk size (grafana#464)

* support config chunk size

Signed-off-by: Xiang Dai <[email protected]>

* fix lint

Signed-off-by: Xiang Dai <[email protected]>

* review feedback

* Add option (grafana#528)

* Add chunk_block_size option

Introduced by grafana#464.

Signed-off-by: Xiang Dai <[email protected]>

* fix white noise

Signed-off-by: Xiang Dai <[email protected]>

* Bump up helm version

Signed-off-by: Xiang Dai <[email protected]>

* Fix typo in example config (grafana#530)

* Fix grafana#531 chunk_block_size not found

Signed-off-by: Steven Sheehy <[email protected]>

* don't delete elements from a map while iterating over it
improve locking of positions

* need to remove this return statement or positions will never be cleaned up

* Adding `make debug` support to build debug binaries and debug containers which wrap the binary with delve and allow for remote debugging

* using the timestamp from the log file might lead to undesired behavior as it could be way in the past or future depending on the clock for the source system generating the logs.  Given that it's compared to the servers time.Now() when looking for idle chunks it's probably best to use the server time when setting the value. (grafana#546)

* Better buckets for chunk sizes (grafana#529)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* switch to golangci-lint linter (grafana#539)

* switch to golangci-lint linter

* fixes for review comments

* Promtail targets and service discovery pages (grafana#532)

* add a custom server for loki with assets embedded

* adds a targets page for promtail to see discovered files and labels

* lint fixes

* fix css table size

* add service discovery logic

* add service discovery template

* add promtail pages documentation

* ignored gen file in linter

* improves memory usage of golangci-linter

* Send logs to multiple loki instances (grafana#536)

* Adds the ability to provide multiple Loki URL

For backward compatibility `client:` still works with flag.

* add some tests for multi client

* update ksonnet module to support multiple client

* fix comment

* fix lint issues

* fix backward compatibility with client config (grafana#554)

* fix backward compatibility with client config

* fix comment

* Add scrape config for control plane static pods

Signed-off-by: Steven Sheehy <[email protected]>

* make clean removes `pkg/promtail/server/ui/assets_vfsdata.go` which meant the `pkg/promtail/server/ui/assets_vfsdata.go: assets` target is never activated causing a build error, updating to make the target hit on `server.go` which has a dependency on the deleted `assets_vfsdata.go` triggering a run of `make assets` before go builds that file (grafana#555)

* Support CRI 1.14+ directory change

Signed-off-by: Steven Sheehy <[email protected]>

* Parse pipelines of regex matches.

Signed-off-by: Tom Wilkie <[email protected]>

* Errors and wiring.

Signed-off-by: Tom Wilkie <[email protected]>

* Wire up LogQL filters in the ingester.

Signed-off-by: Tom Wilkie <[email protected]>

* Remove RegexpIter, we don't need it.

Signed-off-by: Tom Wilkie <[email protected]>

* Fix logcli.

Signed-off-by: Tom Wilkie <[email protected]>

* Document the filter expression syntax.

Signed-off-by: Tom Wilkie <[email protected]>

* Fix ingester test.

Signed-off-by: Tom Wilkie <[email protected]>

* Add an extra filter step if the user specifies a regex using old API.

Signed-off-by: Tom Wilkie <[email protected]>

* Better error messages.

Signed-off-by: Tom Wilkie <[email protected]>

* Review feedback.

Signed-off-by: Tom Wilkie <[email protected]>

* link to helm/README.md

* Typo

* Add two more links to readme (grafana#560)

* Close all chunks before flushing.
Moved the check for already closed to make sure we honor this check when `immediate` flushing is requested

* beginnings of label extraction pipeline

* starting to add label extraction, super rough but need to do some work in a different branch

* Adds json log entry parser with configuration.

* rebase to the new structure

* Plug the log entry pipeline into promtail

Co-authored-by: Edward Welch <[email protected]>

* Polish json entry stage

* adding regex pipeline stage

* updating tests
adding CRI and Docker format pipeline stage extensions
adding backwards compatibility with current config

* adding benchmarks

* adding a pipeline processing histogram

* fixing capitalization on CRI stage and some lints

* Updating helm/ksonnet to use new pipeline config
Updating docs
Rebase fixes

* Remove unnecessarily long wait times in tests.

Signed-off-by: Tom Wilkie <[email protected]>

* Explicitly listen on localhost in tests, to stop warning on MacOS.

Signed-off-by: Tom Wilkie <[email protected]>

* Update vendor for weaveworks/common to Tom's fork, for now.

Signed-off-by: Tom Wilkie <[email protected]>

* rename __filename__ label to filename

* fix review feedback

* Add a histogram for chunk encoding time (grafana#565)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* add optional PodDisruptionBudget to helm chart (grafana#515)

* add optional PodDisruptionBudget to helm chart

* address comments

* add back new line - no change to helpers template

* fix silly typo in loki-stack version

* Fix a bug where we should use the glob matcher on fsnotify new file create.
Changed the timing of the sync function in the unit test to fix flakiness in the test.

* Support using an IPv6 in the Loki push url (grafana#566)

* Clean up the metric for pipeline duration to keep it within the pipeline file, also changing unit to seconds and namespace to logentry

* initial checkin
mostly working, need to extract out the config
needs a proper ksonnet or helm deployment or both
needs a makefile

* improving metrics
improving tests
config via flags
added ksonnet config

* Add support for memcaches (grafana#564)

* vendor: update cortex

Don't block on cortexproject/cortex#1345 being
merged, and still maintain the fork

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* add memcached ksonnet

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* commit generated proto

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* fix minor issues with memcached config in ksonnet (grafana#578)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Add LICENSE

* using an environment variable passed into an arg instead of reading the pod name from a file and the downward API
added a makefile

* adding namespace to ksonnet config, moved images into a config file

* fixing how auth is applied

* change response_latency bucket sizes

* changing response_latency buckets some more, making exponential and configurable

* Limiting query start time with config (grafana#572)

* Limiting query start time with config

* Fixed maxLookBackPeriod check, added it to existing configs, some other fixes

* gofmted file

* Changed maxLookBackPeriod to 4 weeks in helm chart

* Setting maxLookBackPeriod in helm chart to 0 and bumping up the helm chart version

* changed maxLookBackPeriod in yaml in cmd to 0 and some nit

* Update table manager image in production/ksonnet (grafana#582)

* Create CODE_OF_CONDUCT.md (grafana#516)

* Add a pull request template (grafana#517)

* Helm: Allow custom pipeline stages (grafana#580)

Signed-off-by: Steven Sheehy <[email protected]>

* adds external labels to be passed via flags (grafana#510)

* impr/logcli: Added label output filters + tests (grafana#563)

* impr/logcli: Added label filters + tests

* Address review

* update readme

* updating README

* adding diagram

* impr/clients: Handle TLS config and MTLS for logcli and promtail (grafana#540)

* impr/clients: Handle TLS config and MTLS for logcli and promtail

* fix/tls: Please gofmt...

* impr/clients: use prometheus HTTPClientConfig for logcli and promtail

* fix/promtail: Set proper Client config name

* impr/promtail: Use prometheus HTTPClientConfig configuration

* adapt with master

* address review

* fix conflicts

* address requested changes

* remove file

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Change tail response for backward compatibility with future changes (grafana#590)

Changed it to dict with single key called "streams" and value set to list of logproto.Stream

* updating to match websocket api changes

* fixes test in circle ci getting OOMKilled (grafana#593)

* Improve readiness probe output (grafana#556)

Signed-off-by: Steven Sheehy <[email protected]>

* bump up promtail chart version

Signed-off-by: Xiang Dai <[email protected]>

* fixes chunks lazy loading (grafana#595)

* fixes ingester querier not honoring filters (grafana#594)

* Get rid of the cortex fork with lazy loading upstreamed (grafana#596)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Fix local config: Use DayTime (grafana#598)

* Fix local config: Use DayTime

Psst, the date is the first commit date for loki ;)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Fix the config in ksonnet and helm too

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Typo (grafana#588)

Fix typo

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* need to bump helm chart after PR grafana#598

* Update operations.md (grafana#615)

Small update to fix the link to the Cortex project.

* Also bumping loki-stack version which needs to be done if the loki or promtail chart versions are changed (grafana#612)

* Added namespace to helm install notes (grafana#617)

* This add make target to deploy a dev version using helm (grafana#586)

* add a dev target to deploy the current image in k8s

* impr/clients: Handle TLS config and MTLS for logcli and promtail (grafana#540)

* impr/clients: Handle TLS config and MTLS for logcli and promtail

* fix/tls: Please gofmt...

* impr/clients: use prometheus HTTPClientConfig for logcli and promtail

* fix/promtail: Set proper Client config name

* impr/promtail: Use prometheus HTTPClientConfig configuration

* adapt with master

* address review

* fix conflicts

* address requested changes

* remove file

* add helm dev targets

* adding back assets

* fix review comments

* Review feedback

* Update cortex vendor (grafana#610)

* Update cortex vendor
Use query max look back from cortex
Update config changes from cortex
Fixed breaking code due to updating cortex vendor

* fixed linter error

* Changes for running Table Manager with loki in single binary (grafana#600)

* Update readme (loki-stack)

* ksonnet changes for running loki in single binary (grafana#622)

* ksonnet changes for running loki in single binary
Added retention config with default values

* Fixed indentation

* provide Cassandra Index Store Example (grafana#625)

Mentioned in title. To save beginners time, provide a config example for using Cassandra as index storage.

* Helm chart tracing variable fix (grafana#621)

* loki chart deployment.yaml: Only set JAEGER_AGENT_HOST if set.

* docs/troubleshooting.md: Fixed variable name for chart option.

* Bumped loki chart version for variable fix.

* Switch Loki to StatefulSet (grafana#585)

Signed-off-by: Steven Sheehy <[email protected]>

* Update details about logs retention

* Limits: Reject entries based on age set in limits (grafana#631)

* Reject entries based on age set in limits

* Save people time until grafana#535 is resolved.

* working version with Gatherer

* better counters

* implements json stage metrics

* adds test for json metrics

* adds matchers and metrics to regex and json

* tidy up

* add more memory to circle ci

* tweaking test target

* fix typo

* move custom metrics to /metrics and prefix them

* fix linter

* Refactor to make everything a stage.
Still needs:  Matcher stage work, pipeline test work

* Making the pipeline itself a Stage so that we can use it to better implement the Match stage (and it cleans up the Docker and CRI extensions some too)
Still needs to fix the metrics test

* implementing all the functions for the counter and gauge metric types
metric test is still failing because of missing metrics when the counter is 0
added a lot of TODO's I need to go back and cleanup

* cleaned up most of the TODO's
refactored and improved some of the tests, each stage type should now have a test using actual YAML
A few TODO's remaining and some comment cleanup

* cleaning up remaining TODO's, adding tests
cleaning up GoDoc
pipeline_name is now optional for matcher stage

* Remove LogCount

* fix flaky timestamp test

* PR feedback

* renaming `metric` stage to `metrics` as it defines multiple metrics, similar to labels stage which is also plural.
Adding a couple unit tests to regex and json stage to act as examples

* fix labels for PodDisruptionBudget on helm (grafana#623)

* updating docs for new pipeline config.
removed helm entry in scrape_config.sh because there doesn't seem to be any way to make it work with new pipeline config

* PR Feedback

* adding a release process

* also restrict the release to master branch only (not sure if this is actually necessary)

* change the helm pullPolicy since we will be using releases now

* updating versions for loki v0.1.0

* release stage is broken, removing for now

* Remove label __name__ from store querier (grafana#648)

* chore(vendor): update cortex vendor

Updates cortex vendor to get the changes from cortexproject/cortex#1431.
Required for making promtail build on Windows

Signed-off-by: sh0rez <[email protected]>

* feat(ci): promtail cross platform

Build promtail on linux and windows, discard artifacts

* Remove 404 link (grafana#637)

Signed-off-by: Xiang Dai <[email protected]>

* feat(promtail): initContainers (grafana#655)

* fix(promtail/targets): remove dependency on prometheus/relabel

`filetargetmanager.go` had a dependency on prometheus/relabel, a package that
has been removed in favor of prometheus/pkg/relabel.
This converts the code to the new package, to allow the prometheus vendor to be
updated to master

* chore(vendor): update prometheus vendor

Updates prometheus vendor to current master, to add support for InitContainers
in kubernetes service discovery

* chore(vendor): do dep's homework

Gives dep hints on how to resolve the vendor so that it works

* fix(promtail/targets): non-nil check

Accidentally checked against the wrong labelSet; that one can never be nil

* fix(loki): honor log level from config file (grafana#657)

Because of the logger being initialized before the configuration file is parsed,
the log_level from the config file is ignored.
To solve this, the logger is reinitialized after the file is parsed, as it is
already being done in promtail.

* add support for RFC3339Nano in query timestamps (grafana#656)

* refactoring things so that the comparator can use the reader to make a direct query to loki to look for logs not received over the websocket before reporting them as missing

* Helm: Integration testing (grafana#641)

* Helm chart integration testing

Signed-off-by: Steven Sheehy <[email protected]>

* Don't check helm version on every push

Signed-off-by: Steven Sheehy <[email protected]>

* Remove chart upgrade testing

Signed-off-by: Steven Sheehy <[email protected]>

* Add dynamodb sample for overriding default provisioning capacity units (grafana#626)

* adding a list of received entries (acknowledged) to use for comparison against entries which were not expected so that we can report them as duplicates.

* Docker Logging Driver (grafana#663)

* adds first version of docker driver

* without logrus and fixes some linter issue

* fix the driver and start a build system

* adds swarm label discovery

* Add documentation and more targets

* make the linter happy ❤️

* indent config.json

* with circleci steps for master and branch

* Review feedback

* fix docker plugin ci (grafana#664)

* fix docker plugin ci

* use binary cache

* Fix publish-helm failure (grafana#665)

Signed-off-by: Steven Sheehy <[email protected]>

* rename fluent plugin and update docs

* Removing the pre-allocation of a buffer when serializing blocks, in most of my empirical testing we were using 9-10k out of the 32k allocated leaving about 2/3 of the buffer allocated and unused times the number of blocks per chunk, times the number of chunks kept in memory.  This was adding up to a fair amount of allocated but unused space.

* fix missing logger in client (grafana#673)

* updating the loki dashboards to use the metrics from the new gateway

* fix helm lint issue

Signed-off-by: Xiang Dai <[email protected]>

* Improvements in live tailing of logs (grafana#541)

* Improvements in live tailing of logs

Instead of polling for new logs, a grpc stream is opened between ingester and querier to get live logs
Querier reconnects to disconnected or newly added ingesters

* Added more comments to code for live log tailing

* Some code refactoring in live log tailing

* handling delayfor in logcli, max delayfor to be 5 seconds

* some changes in tail response in live tailing

* Fixed issue with stopping ingesters gracefully when live tailing is being used

* Added tests for tailer in querier

* Live tailing made logql compatible, some code refactoring suggested in PR

* Fix helm test error

Signed-off-by: Xiang Dai <[email protected]>

* Update chart version

Signed-off-by: Xiang Dai <[email protected]>

* Removed test for tailer in querier due to issues with synchronization

* adding resource requests in jsonnet

* changing the default prune interval to 60 seconds, at the previous 1 second it would cause some way too aggressive querying when scaled out over all our clusters and loki had a hiccup

* fix logcli code src path

* changing the log length histogram to be a normal histogram with only a `path` label, also removing the label values when no longer tailing the file.  Following the same pattern we used for readBytes and totalBytes.

* removing entries.go as we are no longer using the custom Histogram for the log_entries_bytes histogram

* Add selector as required by k8s 1.8 and higher. (grafana#716)

Fixes grafana#715

* fluent-plugin: Mark as multi-workers ready (grafana#709)

* fluent-plugin: Mark as multi-workers ready

* fluent-plugin: Add info on multi-worker usage to README

* sync with Cortex for s3 path style url (grafana#705)

* fix(loki|promtail): logger re-init nil config panic (grafana#697)

When a more or less invalid config (e.g. `null`) is supplied, the log level prop
of the config receives an invalid value which causes the logger to panic on
re-init.

Prevents this by checking for a nil log level and notifying the user in that case

* BREAKING fix(loki-mixin): rename rules key (grafana#691)

Renames the `prometheus_rules` key to `prometheusRules` to comply with the
spec (https://github.com/monitoring-mixins/docs/blob/master/design.pdf)

BREAKING `prometheus_rules` is not available anymore

* update grafana to fix dashboard provider (grafana#674)

* fluent-plugin: Add separate license to fluent-plugin-grafana-loki to fix gem installation (grafana#682)

* Fixed orderedDeps() order stability (grafana#721)

* ability to specify keys to remove (grafana#669)

* feat(docker): multi-arch Dockerfile (grafana#668)

Adds an entirely new Dockerfile to the repository root which is capable of:

- building promtail and loki from the same image
- based on `scratch`, so that the final image can be executed on every supported
  GOARCH

Please note that this Dockerfile should always be built using
BuildKit (`buildctl` or `DOCKER_BUILDKIT=1` in recent versions) for maximum
performance, as BuildKit's DAG allows for smart skipping of unneeded stages.

These changes were proposed in grafana#659

* fix(loki): panic on missing config (grafana#720)

* fix(loki): pass missing config error to user

Missing config errors are handled at the library level. Our own check mitigated
this and causes loki to SEGFAULT later on

* feat(loki): default config file in container

The container provides a default config file. Use it by default

* Revert "fix(loki): pass missing config error to user"

This reverts commit b2744fc, because it
assumed loki was incapable of running without config, which is not the case.

* move Dockerfile multi-arch to build and ignore that folder (grafana#723)

* move Dockerfile multi-arch to build and ignore that folder

* docker-driver fix

* Query label values and names are now fetched from the store. (grafana#521)

* Query label values and names are now fetched from the store.

A time range is now required by the /api/prom/label endpoint, with a sane default (6 hours from now).

* fix http querystring and update doc

* update vendor

* rebased

* Typo on values.yaml (grafana#728)

* prune interval is configurable
canary will suspend all operations on SIGINT but not exit, allowing you to shutdown the canary without it being restarted by docker/kubernetes
SIGTERM will shutdown everything and end the process

* Add support to timestamp stage to parse Unix seconds, milliseconds, and nanosecond timestamps

* Redirect / to /targets in promtail server

* Fixed RFC3339Nano examples in doc

* use strconv.FormatFloat instead of fmt.Sprintf for converting floats to strings; this way we can eliminate non-significant trailing zeros, so that the float value 1 would be "1" as a string instead of "1.000000"

* adding a golang Template stage

* Documented /ready, /metrics and /flush endpoints (grafana#743)

* feat(logcli): query from absolute timestamp (grafana#736)

* feat(logcli): query from absolute timestamp

Adds a new flag (`--from`) as to complement `--since`.
While since subtracts a relative duration from the current time, from allows to
specify the absolute start of the lookback window.

Note: `--from` takes precedence over `--since`, but only if set.

* fix(logcli): use RFC3339Nano for -from

To comply with Prometheus, this flag now honors Nanoseconds as well (was using
RFC3339 so far)

* Added source support to regex and json stages

* Converted source in regex and json stages to string pointer

* Added source validation to regex and json stages

* Run regex and json stage pipeline tests in parallel

* feat(logcli): output modes (grafana#731)

* feat(logcli): quiet mode

Adds a quiet mode (-q / --quiet) to suppress the debug messages to stderr

* feat(logcli): output modes

Adds two alternative output modes (-o / --output)

- raw: emits the line as parsed
- jsonl: emits the line plus all known metadata as JSONL (JSON Line)

Usage: -o [default, raw, jsonl]

* feat(logcli): quiet tailing mode

* feat(logcli): output modes while tailing

Supports the three different output modes in tail mode as well

* feat(logcli): print labels in jsonl tail mode

* refactor(logcli): clean up entry printing

Moves the entry printing into a standardized interface, implements this three
times:

- default (human readable)
- jsonl (for scripts)
- raw ('as is')

* fluentd-plugin-grafana-loki: change log.info to log.debug (grafana#751)

Reduces background chatter which in turn can trigger more log writes

Signed-off-by: Brian Candler <[email protected]>

* fluentd-plugin-grafana-loki: avoid exception when remove/label_keys is unset (grafana#750)

Fixes grafana#749

Signed-off-by: Brian Candler <[email protected]>

* Added a note about regex accepted syntax in the filter expression (grafana#746)

* cleanup rake warnings, bump version

* promtail: Add systemd journal support (grafana#730)

Support for reading systemd journal entries has been added. 
promtail will look for a job in scrape_configs with a journal key 
to activate the journal target. 

If GOOS=linux and CGO_ENABLED=1, promtail will now require 
libsystemd headers to be available for building. If GOOS is not 
linux or CGO_ENABLED is not 1, journal support will be unavailable
and a log message will be printed warning the user that their config 
file has journal tailing configured without it being built into promtail. 

See docs/promtail-examples.md for a concrete example of 
using journal support. 

Other structural changes made: 

  1. Added the ability to check whether scrape.Config.ServiceDiscoveryConfig
     is non-zero.

     This was chosen over making ServiceDiscoveryConfig a pointer
     because yaml.v2 cannot parse an inline struct into a pointer value.

  2. Updated pipeline logger component name to journal_pipeline and
     file_pipeline for JournalTargetManager and FileTargetManager
     respectively.

  3. The positions file will now store positions as strings instead of 
     integers. Existing positions will be read in properly but written out 
     as strings the next time the positions file is saved. This is done to 
     be able to store the journal cursor, which is a string. The positions 
     API has been updated to support reading in the old integer values 
     and the new string values.

* Rollback changes to Makefile and build/Dockerfile from grafana#730 (grafana#758)

This commit rolls back the changes to the Makefile and build/Dockerfile
that caused CGO_ENABLED=1 to be set during some builds. As a result,
journal support is disabled in any build produced by make.

Journal support can still be enabled in a manual build:

  CGO_ENABLED=1 go build -o cmd/promtail/promtail ./cmd/promtail

* Storage memory improvement (grafana#713)

* add benchmark for storage queries
* improve iterator to load only on next
* fix memory retained by lazy chunks
* reverse backward lazy iterator

* fixed helm installation instructions (grafana#761)

* Added date without year support to timestamp stage (grafana#760)

* Documented timestamp's custom format syntax (grafana#763)

* Added tail length limit to limit duration of live tailing of logs to 1 hour (grafana#756)

* Added tail length limit to limit duration of live tailing of logs to 1 hour

* Some code refactoring suggested from PR review for live tailing duration limit

* Fixed error messages in live tailing of logs

* Fixed lint errors

* preparing for move into loki repo

* finishing up loki-canary move into loki repo

* round nanosecond boundaries to milliseconds (grafana#771)

* round nanosecond boundaries to milliseconds

* also convert the store query

* feat(logcli): add --to flag to specify latest RFC3339 time for query (grafana#776)

Fixes grafana#774

* Parse the addr into a URL so we can extract the Host name for use in the TLSConfig (grafana#778)

* fix path escape (grafana#779)

* better job of fixing url parsing and host name extraction for logcli

* feat(loki): extended tailing (grafana#764)

* refactor(querier/ingester): TailRequest Lookback window

Moves the specification of the Lookback Window out of
logproto.QueryRequest into its own type, logproto.Lookback.
This is required because the Lookback Window will be used in the TailRequest
as well.

* feat(querier): parse Lookback from HTTP Request

* feat(logcli): send Lookback Window spec with tail request

* feat(querier): include historic entries in tail mode

Extends tailing by sending a configurable number of historic entries
before the live entries. This enables behaviour closer to `kubectl logs -f`
and `docker logs -f`.

It is implemented by running a regular Query before subscribing to the ingesters.

* fix: adapt tests to Lookback change

* feat(querier): check all errors to make the linter happy

* fix(logproto): flatten Lookback window spec

Flattens the Lookback window spec into the individual queries

* fix(ingester): adapt test to Lookback flatten

* adjust old instructions for troubleshooting docker daemon (grafana#785)

* promtail: clarify linux build instructions in docs

* Use prometheus pool for line buffer. (grafana#790)

* Use prometheus pool for line buffer.

* Document CentOS dependency

- document CentOS dependency
- add `-y` option

Signed-off-by: Xiang Dai <[email protected]>

* add missing `

Signed-off-by: Xiang Dai <[email protected]>

* fix: Speed up Loki shutdown when using the sample local config (grafana#784)

* add the ability to supply a timezone to the timestamp pipeline stage

* ingester: support chunk transfers on ingester shutdown.

This commit introduces chunk transfers, borrowing the mechanism from
Cortex's implementation: when an ingester is shut down with claim
on rollout enabled, the ingester will find a pending ingester and
transfer all of its chunks to it.

* ingester: fix lint issues for chunk transfers

* ingester: Add test for chunk transfers

* add a Name() method to the stage interface so that debug logging can show the name of the pipeline stage that just processed the log line
remove some unnecessary logging around fsnotify events we don't care about and around saving positions
Make the Processing Log Lines doc a first-class citizen; I reference it a lot and it is currently hidden behind 3 clicks

* ingester: clean up chunk transfer code

* ingester: fix feedback from PR review

* ingester: log error if closing client after transfer fails

* Added -querier.query_timeout support (grafana#788)

* Add prometheus for metrics and upgrade some of the outdated gems. (grafana#792)

* Fix broken link in readme file

* fix dependencies order and bump version (grafana#803)

* promtail: restore ability to show target labels in promtail UI

PR grafana#791 accidentally removed the ability to use the promtail UI
by removing methods from the Target interface that were only
used within the HTML templates.

Along with restoring the methods in the Target interface, this
commit also introduces details for the JournalTarget, which
currently only provides the position in the journal being
tracked.

* ingester: register Ingester service in gRPC on loki start

Registering the Ingester service to gRPC enables the chunk transfer
mechanism to work.

* Simplify our makefile as much as possible (grafana#753)

refactor(makefile): simplify the makefile by removing the autogenerated targets and making binary builds and Docker image builds separate operations.

* fix panic in docker driver for newer docker version (18.09.7+) (grafana#813)

* docker-driver-push builds all the Go files and therefore needs to be called with BUILD_IN_CONTAINER=false from CircleCI

* fix tail library logs to use our own log format (grafana#579)

* fix tail library logs to use our own log format

* PR feedback

* Update logcli usage in docs and improve help text