Skip to content
This repository has been archived by the owner on Oct 25, 2022. It is now read-only.

BareOS exporter stops working when there multiple filesets with the same name #10

Closed
karlism opened this issue Jul 16, 2021 · 5 comments · Fixed by #13
Closed

BareOS exporter stops working when there multiple filesets with the same name #10

karlism opened this issue Jul 16, 2021 · 5 comments · Fixed by #13
Labels
bug Something isn't working

Comments

@karlism
Copy link

karlism commented Jul 16, 2021

BareOS exporter (version 0.5.6) stops working when there are failed backup jobs:

$ curl -s localhost:9625/metrics
An error has occurred while serving metrics:

480 error(s) occurred:
* collected metric "bareos_jobs_run" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > counter:<value:8 > } was collected before with the same name and label values
* collected metric "bareos_bytes_saved" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_files_saved" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_last_job_bytes_saved" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_last_job_files_saved" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_last_job_errors" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > counter:<value:1 > } was collected before with the same name and label values
* collected metric "bareos_last_job_start_unix_timestamp" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > counter:<value:1.626365576e+09 > } was collected before with the same name and label values
* collected metric "bareos_last_job_end_unix_timestamp" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > counter:<value:1.626365577e+09 > } was collected before with the same name and label values
* collected metric "bareos_last_job_status" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > label:<name:"status" value:"C" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_last_job_status" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > label:<name:"status" value:"R" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_last_job_status" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > label:<name:"status" value:"B" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_last_job_status" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > label:<name:"status" value:"T" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_last_job_status" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > label:<name:"status" value:"E" > counter:<value:0 > } was collected before with the same name and label values
* collected metric "bareos_last_job_status" { label:<name:"client" value:"devdb2" > label:<name:"fileset" value:"PostgreSQL" > label:<name:"jobname" value:"backup-devdb2-postgresql" > label:<name:"jobtype" value:"B" > label:<name:"last_selector" value:"last" > label:<name:"status" value:"e" > counter:<value:0 > } was collected before with the same name and label values

Here's database entries related to this backup job:

$ echo "select * from job where name='backup-devdb2-postgresql';" | psql bareos -x
-[ RECORD 1 ]---+---------------------------------------------------
jobid           | 253
job             | backup-devdb2-postgresql.2021-07-15_15.29.01_08
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 15:29:01
starttime       | 2021-07-15 15:29:03
endtime         | 2021-07-15 15:29:03
realendtime     | 2021-07-15 15:29:03
jobtdate        | 1626362943
volsessionid    | 215
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 1
jobmissingfiles | 0
poolid          | 3
filesetid       | 17
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 
-[ RECORD 2 ]---+---------------------------------------------------
jobid           | 247
job             | backup-devdb2-postgresql.2021-07-15_14.09.47_53
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 14:09:47
starttime       | 2021-07-15 14:09:49
endtime         | 2021-07-15 14:09:50
realendtime     | 2021-07-15 14:09:50
jobtdate        | 1626358190
volsessionid    | 209
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 2
jobmissingfiles | 0
poolid          | 3
filesetid       | 14
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 
-[ RECORD 3 ]---+---------------------------------------------------
jobid           | 244
job             | backup-devdb2-postgresql.2021-07-15_10.13.56_17
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 10:13:56
starttime       | 2021-07-15 10:13:59
endtime         | 2021-07-15 10:13:59
realendtime     | 2021-07-15 10:13:59
jobtdate        | 1626344039
volsessionid    | 206
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 1
jobmissingfiles | 0
poolid          | 3
filesetid       | 15
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 
-[ RECORD 4 ]---+---------------------------------------------------
jobid           | 250
job             | backup-devdb2-postgresql.2021-07-15_14.28.29_23
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 14:28:29
starttime       | 2021-07-15 14:28:31
endtime         | 2021-07-15 14:28:32
realendtime     | 2021-07-15 14:28:32
jobtdate        | 1626359312
volsessionid    | 212
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 2
jobmissingfiles | 0
poolid          | 3
filesetid       | 14
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 
-[ RECORD 5 ]---+---------------------------------------------------
jobid           | 242
job             | backup-devdb2-postgresql.2021-07-15_10.00.18_21
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 10:00:18
starttime       | 2021-07-15 10:00:21
endtime         | 2021-07-15 10:00:21
realendtime     | 2021-07-15 10:00:21
jobtdate        | 1626343221
volsessionid    | 204
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 1
jobmissingfiles | 0
poolid          | 3
filesetid       | 14
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 
-[ RECORD 6 ]---+---------------------------------------------------
jobid           | 248
job             | backup-devdb2-postgresql.2021-07-15_14.25.03_05
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 14:25:03
starttime       | 2021-07-15 14:25:06
endtime         | 2021-07-15 14:25:06
realendtime     | 2021-07-15 14:25:06
jobtdate        | 1626359106
volsessionid    | 210
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 1
jobmissingfiles | 0
poolid          | 3
filesetid       | 14
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 
-[ RECORD 7 ]---+---------------------------------------------------
jobid           | 243
job             | backup-devdb2-postgresql.2021-07-15_10.07.54_05
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 10:07:54
starttime       | 2021-07-15 10:07:57
endtime         | 2021-07-15 10:07:57
realendtime     | 2021-07-15 10:07:57
jobtdate        | 1626343677
volsessionid    | 205
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 1
jobmissingfiles | 0
poolid          | 3
filesetid       | 15
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 
-[ RECORD 8 ]---+---------------------------------------------------
jobid           | 252
job             | backup-devdb2-postgresql.2021-07-15_15.17.23_04
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 15:17:23
starttime       | 2021-07-15 15:17:26
endtime         | 2021-07-15 15:17:26
realendtime     | 2021-07-15 15:17:26
jobtdate        | 1626362246
volsessionid    | 214
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 1
jobmissingfiles | 0
poolid          | 3
filesetid       | 16
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 
-[ RECORD 9 ]---+---------------------------------------------------
jobid           | 246
job             | backup-devdb2-postgresql.2021-07-15_14.06.32_39
name            | backup-devdb2-postgresql
type            | B
level           | F
clientid        | 4
jobstatus       | f
schedtime       | 2021-07-15 14:06:32
starttime       | 2021-07-15 14:06:34
endtime         | 2021-07-15 14:06:35
realendtime     | 2021-07-15 14:06:35
jobtdate        | 1626357995
volsessionid    | 208
volsessiontime  | 1625060238
jobfiles        | 0
jobbytes        | 0
readbytes       | 0
joberrors       | 1
jobmissingfiles | 0
poolid          | 3
filesetid       | 14
priorjobid      | 0
purgedfiles     | 0
hasbase         | 0
hascache        | 1
reviewed        | 0
comment         | 

Judging by Prometheus up metric, I can see that exporter stopped working around 10:08, thus I conclude that this failure was triggered by jobid 243. I'm using BareOS 20.0.1 and PostgreSQL 13 backend database.

@vierbergenlars
Copy link
Owner

Can you check the results for the following query: SELECT j.Name, j.Type, j.ClientId, j.FileSetId, COALESCE(c.Name, ''), COALESCE(f.FileSet, ''), COUNT(*), SUM(j.JobBytes), SUM(j.JobFiles) FROM job j LEFT JOIN client c ON c.ClientId = j.ClientId LEFT JOIN fileset f ON f.FileSetId = j.FileSetId GROUP BY j.Name, j.Type, j.ClientId, j.FileSetId, c.Name, f.FileSet HAVING MAX(j.SchedTime) >= '2021-07-15 00:00:00'

For my case, the first column (j.Name) (the job ID) is unique, but from the query output you posted, it looks like you have multiple different filesets for the same job name, which I did not know could happen during normal operations.

To determine how bareos_exporter can handle this situation, it would be helpful to know how this situation occured.
It looks like you were trying to get a job working, and trying to get it working. So I wonder if you modified the fileset at som
e point?
In that case, the failed jobs are just not the actual problem, but multiple rows for a job name being returned. As a quick fix, you can delete the failed jobs with delete jobid=? in bconsole. Or you can wait for 7 days (default, or the number specified with the -job-discovery-days argument) until the jobs with an obsolete fileset ID are outside the window considered by bareos_exporter.

@vierbergenlars vierbergenlars added the bug Something isn't working label Jul 17, 2021
@vierbergenlars
Copy link
Owner

Hmm, I tried to reproduce this case, where a job has different filesets, and that appears to work.

I have no clue what exactly goes wrong here.

@karlism if you can run the query from my previous comment, that would be very helpful.

I don't know what to look for exactly yet, but the combination (j.Name, j.Type, c.Name, f.FileSet) should be unique in the results.

Although it is not enforced in the database schema, I believe j.ClientId and c.Name (client name) map one-to-one to each other, as well as j.FileSetId and f.FileSet. It might be interesting to look at your client and fileset tables to see if my assumption there is correct.

@karlism
Copy link
Author

karlism commented Jul 20, 2021

@vierbergenlars, thanks for looking into this!

echo "SELECT j.Name, j.Type, j.ClientId, j.FileSetId, COALESCE(c.Name, ''), COALESCE(f.FileSet, ''), COUNT(*), SUM(j.JobBytes), SUM(j.JobFiles) FROM job j LEFT JOIN client c ON c.ClientId = j.ClientId LEFT JOIN fileset f ON f.FileSetId = j.FileSetId GROUP BY j.Name, j.Type, j.ClientId, j.FileSetId, c.Name, f.FileSet HAVING MAX(j.SchedTime) >= '2021-07-15 00:00:00';" | psql bareos
                  name                   | type | clientid | filesetid |    coalesce    |         coalesce         | count |     sum     |   sum   
-----------------------------------------+------+----------+-----------+----------------+--------------------------+-------+-------------+---------
 backup-xxxdeploytest1-vm-xxxdeploytest1 | B    |       11 |        21 | xxxdeploytest1 | vm-xxxdeploytest1        |     6 |  8274234063 |      10
 backup-xxxdevketcd1-etc                 | B    |        2 |         2 | xxxdevketcd1   | LinuxEtc                 |    23 |    11123529 |    1682
 backup-xxxawx1-postgresql               | B    |       13 |        28 | xxxawx1        | PostgreSQL_13_RedHat     |     4 |           0 |       0
 backup-xxxdevdb2-postgresql             | B    |        4 |        17 | xxxdevdb2      | PostgreSQL               |     2 |           0 |       0
 backup-xxxdevdb2-postgresql             | B    |        4 |        27 | xxxdevdb2      | PostgreSQL               |     1 |           0 |       0
 backup-yyydevketcd1-etc                 | B    |        3 |         2 | yyydevketcd1   | LinuxEtc                 |    22 |    11122419 |    1681
 backup-xxxdevdb2-postgresqlbasebackup   | B    |        4 |         3 | xxxdevdb2      | PostgreSQL_pg_basebackup |    29 |   160831503 |   36801
 backup-xxxdevdb2-postgresql             | B    |        4 |        26 | xxxdevdb2      | PostgreSQL               |     1 |           0 |       0
 backup-xxxdevdb2-postgresql             | B    |        4 |        14 | xxxdevdb2      | PostgreSQL               |     9 |     5516504 |    1287
 backup-xxxdevdb2-postgresql             | B    |        4 |        24 | xxxdevdb2      | PostgreSQL               |     1 |           0 |       0
 backup-xxxexchange1-exchangeconf        | B    |        8 |        10 | xxxexchange1   | Exchange_Conf            |     7 | 10254884770 |  133607
 backup-yyyexchange1-windowsc            | B    |        5 |         9 | yyyexchange1   | WindowsC                 |    14 | 61807025613 | 1053778
 backup-zzzdeploytest3-etc               | B    |        6 |         2 | zzzdeploytest3 | LinuxEtc                 |    17 |      539737 |    1315
 backup-xxxexchange1-windowsc            | B    |        8 |         9 | xxxexchange1   | WindowsC                 |     9 | 28717961249 |  526179
 backup-xxxdevdb2-postgresql             | B    |        4 |        16 | xxxdevdb2      | PostgreSQL               |     1 |           0 |       0
 backup-yyydeploytest1-etc               | B    |       12 |         2 | yyydeploytest1 | LinuxEtc                 |     3 |     5424299 |     795
 backup-xxxdeploytest1-etc               | B    |       11 |         2 | xxxdeploytest1 | LinuxEtc                 |     5 |     5436981 |     829
 backup-xxxdevdb2-postgresql             | B    |        4 |        25 | xxxdevdb2      | PostgreSQL               |     1 |           0 |       0
 backup-xxxdevdb2-postgresql             | B    |        4 |        19 | xxxdevdb2      | PostgreSQL               |     8 |    27681914 |    6439
 backup-xxxad1-windowsc                  | B    |        9 |         9 | xxxad1         | WindowsC                 |     7 | 17714683477 |  397606
 RestoreFiles                            | R    |        4 |         0 | xxxdevdb2      |                          |     2 |   107559404 |    2559
 backup-yyydevketcd1-etcd                | B    |        3 |        13 | yyydevketcd1   | etcd_all                 |    13 |   132666174 |      76
 backup-yyygit1-etc                      | B    |        7 |         2 | yyygit1        | LinuxEtc                 |    17 |     7283331 |    5728
 backup-xxxdevdb2-postgresql             | B    |        4 |        18 | xxxdevdb2      | PostgreSQL               |     3 |           0 |       0
 backup-yyyexchange1-exchangedb          | B    |        5 |        11 | yyyexchange1   | Exchange_DB              |    38 |  3148882950 |   67267
 backup-yyyexchange1-exchangeconf        | B    |        5 |        10 | yyyexchange1   | Exchange_Conf            |    15 | 21539936089 |  251007
 backup-yyydeploytest1-vm-yyydeploytest1 | B    |       12 |        23 | yyydeploytest1 | vm-yyydeploytest1        |     8 |  8725194082 |      10
 backup-xxxexchange1-exchangedb          | B    |        8 |        11 | xxxexchange1   | Exchange_DB              |    10 |  1954248567 |   42427
 backup-yyygit1-gitlab_backup            | B    |        7 |         6 | yyygit1        | GitLab_backup            |    17 | 42480888703 |      37
 backup-yyyad1-windowsc                  | B    |       10 |         9 | yyyad1         | WindowsC                 |     7 | 16640927093 |  391981
 RestoreFiles                            | R    |       12 |         0 | yyydeploytest1 |                          |     9 | 13031844378 |       6
 backup-yyybackup1-mycatalog             | B    |        1 |         1 | yyybackup1     | Catalog                  |    22 |  1469221104 |    3303
 backup-xxxdeploytest1-vm-xxxdeploytest1 | B    |       11 |        20 | xxxdeploytest1 | vm-xxxdeploytest1        |     1 |           0 |       0
 backup-xxxdevdb2-postgresql             | B    |        4 |        28 | xxxdevdb2      | PostgreSQL_13_RedHat     |     4 |           0 |       0
 backup-yyydeploytest1-vm-yyydeploytest1 | B    |       12 |        22 | yyydeploytest1 | vm-yyydeploytest1        |     1 |           0 |       0
 backup-xxxdevdb2-postgresql             | B    |        4 |        15 | xxxdevdb2      | PostgreSQL               |     2 |           0 |       0
(36 rows)

Yes, it is possible that fileset has been changed at some point as we are now in process of switching from different backup solution to BareOS so me and my colleagues are doing lots of tests and adjustments to existing BareOS configuration.

@vierbergenlars
Copy link
Owner

According to this, bareos can have multiple entries with the same fileset name. Bareos will select the newest one.

When creating a new fileset record, bareos will first check if one with the same name and MD5 sum exists (https://github.com/bareos/bareos/blob/a89617abc0f40425a1a9faa00c1825200be6777a/core/src/cats/sql_create.cc#L702-L705), if it does not, it will insert a new record (https://github.com/bareos/bareos/blob/a89617abc0f40425a1a9faa00c1825200be6777a/core/src/cats/sql_create.cc#L747-L750).

So, multiple filesets with the same name and different ids should be handled. Ideally, they are handled as the single fileset they are. The difference between these is the contents of include/exclude rules, which we don't care about as a metrics exporter.

vierbergenlars added a commit that referenced this issue Jul 24, 2021
There can be multiple entries with the same fileset name present in the fileset table.
vierbergenlars added a commit that referenced this issue Jul 27, 2021
Previously, filesets would be indexed by their ID. We shouldn't do that, as there can be multiple filesets with the same name and a different id
in the database. This happens when fileset include/exclude configurations are changed.

Fixes #10
@vierbergenlars vierbergenlars changed the title BareOS exporter stops working when there are failed backups BareOS exporter stops working when there multiple filesets with the same name Jul 28, 2021
@karlism
Copy link
Author

karlism commented Jul 28, 2021

@vierbergenlars, thanks a lot for fixing this! I will try it out as soon as 0.5.7 binaries are published.

roock referenced this issue in roock/bareos_exporter Apr 13, 2023
Previously, filesets would be indexed by their ID. We shouldn't do that, as there can be multiple filesets with the same name and a different id
in the database. This happens when fileset include/exclude configurations are changed.

Fixes #10
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants