This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Net::Statsd error. #28

Closed
obigriffith opened this issue Oct 24, 2013 · 12 comments
@obigriffith
Collaborator

No idea what this is about, or whether it is consequential. Don't think I have seen it before either.

ogriffit@GGMS ~> genome model build start 2891230328 2891230330
'models' may require verification...
Resolving parameter 'models' from command argument '2891230328,2891230330'... found 2
Trying to start #1: H_NJ-HCC1395-HCC1395.prod-microarray.wugc.infinium.GRCh37-lite-build37 (2891230328)...
************** bsub -N -H -P build7ec6ea183cf011e3a15473ebc88d4673 -q normal -u [email protected] -o /opt/gms/2JDB553/fs/2JDB553/info/model_data/2891230328/build7ec6ea183cf011e3a15473ebc88d4673/logs/workflow-server.out -e /opt/gms/2JDB553/fs/2JDB553/info/model_data/2891230328/build7ec6ea183cf011e3a15473ebc88d4673/logs/workflow-server.err " annotate-log genome model services build run --model-id 2891230328 --build-id 7ec6ea183cf011e3a15473ebc88d4673 1>/opt/gms/2JDB553/fs/2JDB553/info/model_data/2891230328/build7ec6ea183cf011e3a15473ebc88d4673/logs/workflow-server.out 2>/opt/gms/2JDB553/fs/2JDB553/info/model_data/2891230328/build7ec6ea183cf011e3a15473ebc88d4673/logs/workflow-server.err " ***********
DEBUG: Recorded LSF job submission (101).
DEBUG: Added commit observer to resume LSF job (101).
Successfully started build (7ec6ea183cf011e3a15473ebc88d4673 of H_NJ-HCC1395-HCC1395.prod-microarray.wugc.infinium.GRCh37-lite-build37).
Net::Statsd can't create a socket to apipe-statsd.gsc.wustl.edu:8125: Invalid argument at /data/opt-gms/2JDB553/sw/genome/lib/perl/Genome/Utility/Instrumentation.pm line 77

@sakoht
Contributor

sakoht commented Oct 30, 2013

The Net::Statsd module is okay with there being no server, but makes noise.

This quiets it:
git show 659668

@gatoravi
Contributor

gatoravi commented Nov 7, 2013

So my builds fail to start after install, and I've tracked down where they fail. It's a line in the timer subroutine in Genome/Utility/Instrumentation.pm: 'return unless $Net::Statsd::HOST;'. It looks like this variable is not defined. Looking into why.

@gatoravi
Contributor

gatoravi commented Nov 8, 2013

/opt/gms/DY46678/sw/genome/lib/perl/Genome/Site/TGI.pm: $ENV{GENOME_STATSD_HOST} ||= 'apipe-statsd.gsc.wustl.edu';
Will this work on the standalone install, @sakoht?

@malachig
Collaborator

malachig commented Nov 8, 2013

'apipe-statsd.gsc.wustl.edu' will not work on a standalone installation...

In the standalone GMS are we even attempting to install statsd and get the service up and running? From Scott's comment above it looks like we added logic to avoid warning messages when the statsd service is not present. From that I assume we do not have the service up and running. Questions:

1.) Should we try to install and configure statsd so that we get the benefit of aggregating stats? How hard would this be? The instructions don't seem too crazy (https://github.com/etsy/statsd/). Perhaps Nathan could advise...

2.) If we did get this service up and running, would it consume many resources in a test environment where all services are running on a single box? So far everything seems to be fine with apache, postgres, openlava, etc. all running on one master box.

3.) Is the current problem with starting builds related to the commit mentioned above? Or perhaps something that was added during the last merge from master into gms-pub? Should we just patch that logic?

4.) What is the relationship between the environment variables defined in genome/lib/perl/Genome/Site/TGI.pm and those defined in the standalone installation in /etc/genome.conf? Is Site/TGI.pm invoked in the standalone installation?

If we can add installation/configuration of statsd to the standalone install without too much difficulty, that would seem to be the desired solution because it means one less divergence from GMS internal and it allows us to potentially take advantage of statsd + graphite features down the road.

@malachig
Collaborator

This problem was resolved in the end by a modification to:
lib/perl/Genome/Utility/Instrumentation.pm

https://github.com/genome/gms-core/commit/f1f5482ddf742439e986d1ecdf39332ac66087fb#diff-32604f06c3fd2f13766fdd79d133a353

Basically, if the statsd host environment variable is not defined, the host is assumed to be localhost, as follows:

$Net::Statsd::HOST = $ENV{GENOME_STATSD_HOST} || 'localhost';

Further modifications to lib/perl/Genome/Utility/Instrumentation.pm were added so that warning messages are silenced if no statsd service is running.
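
One way such silencing could be done is sketched below. This is not the code from the commit: the helper name `quietly` is hypothetical, the warning text is a stand-in, and Net::Statsd itself is never loaded so that the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: run a block of code and swallow any warning that
# mentions Net::Statsd, while passing every other warning through.
sub quietly {
    my ($code) = @_;
    local $SIG{__WARN__} = sub {
        my ($message) = @_;
        return if $message =~ /Net::Statsd/;
        warn $message;    # the __WARN__ hook is disabled inside the handler
    };
    return $code->();
}

# Stand-in for a metrics call that complains when no server is reachable.
my $result = quietly(sub {
    warn "Net::Statsd can't create a socket to localhost:8125\n";
    return 'build continues';
});
print "$result\n";
```

The handler is installed with `local`, so the filter only applies for the duration of the wrapped call and the caller's warning behaviour is restored afterwards.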

@obigriffith
Collaborator Author

I'm still getting Statsd errors when starting builds on an external GMS box. They don't seem to prevent successful builds; they just look messy.

Net::Statsd can't create a socket to apipe-statsd.gsc.wustl.edu:8125: Invalid argument at /opt/gms/LNXE927/sw/genome/lib/perl/Genome/Utility/Instrumentation.pm line 82

@obigriffith obigriffith reopened this Jan 11, 2014
@obigriffith
Collaborator Author

I don't understand how the standalone GMS even knows about a host such as "apipe-statsd.gsc.wustl.edu:8125". The only place where this is specified in the code base is TGI.pm, and it looks like that should all be skipped in a standalone install.

@obigriffith
Collaborator Author

With some simple debugging the problem is not clearer to me. When a new build is started 'Genome::Utility::Instrumentation::timer' is called many times. I am printing out the $name, $start, $end, $hostname, and $genome_sys_id for each time. For some unknown reason, when running timer on 'command.model.build.start' it is trying to use 'apipe-statsd.gsc.wustl.edu' but all other times is using 'localhost' as expected. Why is this happening?

obig@GGMS ~> genome model build start "name='hcc1395-normal-snparray'" --force
'models' may require verification...
Resolving parameter 'models' from command argument 'name='hcc1395-normal-snparray''... found 1
Trying to start #1: hcc1395-normal-snparray (2891230330)...
DEBUGGING: disk.allocation.require, 1389568044.72472, 1389568044.58934, localhost, LNXE927
DEBUGGING: disk.allocation.get_parent_allocation, 1389568045.18387, 1389568045.15272, localhost, LNXE927
DEBUGGING: disk.allocation.child_allocation_query, 1389568045.18488, 1389568045.18398, localhost, LNXE927
DEBUGGING: disk.allocation.create.candidate_volumes.selection, 1389568045.30308, 1389568045.18494, localhost, LNXE927
DEBUGGING: disk.volume.allocated_kb, 1389568045.37023, 1389568045.35475, localhost, LNXE927
DEBUGGING: disk.allocation.create.candidate_volumes.existing_allocation_path_check, 1389568045.37037, 1389568045.37032, localhost, LNXE927
DEBUGGING: disk.volume.allocated_kb, 1389568045.69994, 1389568045.69094, localhost, LNXE927
DEBUGGING: disk.allocation.create.get_allocation_without_lock, 1389568045.71583, 1389568045.30314, localhost, LNXE927
DEBUGGING: disk.allocation.create.create_directory, 1389568045.71667, 1389568045.71589, localhost, LNXE927
DEBUGGING: disk.allocation.create, 1389568046.03594, 1389568043.29227, localhost, LNXE927
************** bsub -N -H -P build18ad534d4d0a491c82114c166fe99264 -q normal -u [email protected] -o /opt/gms/LNXE927/fs/LNXE927/info/model_data/2891230330/build18ad534d4d0a491c82114c166fe99264/logs/workflow-server.out -e /opt/gms/LNXE927/fs/LNXE927/info/model_data/2891230330/build18ad534d4d0a491c82114c166fe99264/logs/workflow-server.err " annotate-log genome model services build run --model-id 2891230330 --build-id 18ad534d4d0a491c82114c166fe99264 1>/opt/gms/LNXE927/fs/LNXE927/info/model_data/2891230330/build18ad534d4d0a491c82114c166fe99264/logs/workflow-server.out 2>/opt/gms/LNXE927/fs/LNXE927/info/model_data/2891230330/build18ad534d4d0a491c82114c166fe99264/logs/workflow-server.err " ***********
Successfully started build (18ad534d4d0a491c82114c166fe99264 of hcc1395-normal-snparray).
DEBUGGING: command.model.build.start, 1389568046.3296, 1389568041.84493, apipe-statsd.gsc.wustl.edu, LNXE927
Build IDs: 18ad534d4d0a491c82114c166fe99264

@obigriffith
Collaborator Author

Tracing backwards with Net::Statsd it seems that 'localhost' is changed to 'apipe-statsd.gsc.wustl.edu' in Model.pm at the command:
my $operation_type = Workflow::OperationType::Command->get('Genome::Model::Build::ExecuteBuildWrapper');

Still going down that rabbit hole.

@obigriffith
Collaborator Author

It looks like the problem is here in Workflow: https://github.com/genome/tgi-workflow/blob/master/lib/Workflow/Instrumentation.pm

BEGIN {
    if ($ENV{UR_DBI_NO_COMMIT}) {
        $Net::Statsd::HOST = ''; # disabled if testing
        $Net::Statsd::PORT = 0;
    } else {
        $Net::Statsd::HOST = 'apipe-statsd.gsc.wustl.edu';
        $Net::Statsd::PORT = 8125;
    }
};
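
That would explain the earlier debug output: a BEGIN block runs when its module is compiled, so a module that is only loaded later can overwrite a package global that another module has already set. A self-contained sketch of the effect follows, using `eval` on a string to stand in for a late `require`. The package name `Late::Instrumentation` is made up for illustration, and Net::Statsd itself is never loaded.

```perl
use strict;
use warnings;

# The first module to load sets the host from the environment.
$Net::Statsd::HOST = $ENV{GENOME_STATSD_HOST} || 'localhost';
my $before = $Net::Statsd::HOST;

# Stand-in for a module that is only compiled later, on first use.
# Its BEGIN block runs at that moment and clobbers the shared global.
my $late_module = <<'END_MODULE';
package Late::Instrumentation;
BEGIN { $Net::Statsd::HOST = 'apipe-statsd.gsc.wustl.edu'; }
1;
END_MODULE

eval $late_module or die "could not compile: $@";
my $after = $Net::Statsd::HOST;

print "before lazy load: $before\n";
print "after lazy load:  $after\n";
```

Every timer call made before the late module is compiled sees the first value, and every call after it sees the hard-coded one.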

@obigriffith
Collaborator Author

I think we can change this to the following, similar to https://github.com/genome/gms-core/blob/gms-pub/lib/perl/Genome/Utility/Instrumentation.pm.

BEGIN {
    if ($ENV{UR_DBI_NO_COMMIT}) {
        $Net::Statsd::HOST = ''; # disabled if testing
        $Net::Statsd::PORT = 0;
    } else {
        $Net::Statsd::HOST = $ENV{GENOME_STATSD_HOST} || 'localhost';
        $Net::Statsd::PORT = $ENV{GENOME_STATSD_PORT} || 8125;
    }
};

I committed this to gms-pub on our internal git server. It does not seem to have synced to GitHub. Perhaps the gms-pub branch of Workflow is not set up correctly for automatic sync.
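
The resolution logic in the proposed BEGIN block can also be exercised on its own. The sketch below is only an illustration: `statsd_endpoint` is a hypothetical helper that returns the host and port for a given environment hash instead of assigning to $Net::Statsd::HOST and $Net::Statsd::PORT, and `stats.example.org` is a made-up host name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the proposed BEGIN block.
sub statsd_endpoint {
    my ($env) = @_;
    return ('', 0) if $env->{UR_DBI_NO_COMMIT};    # disabled if testing
    my $host = $env->{GENOME_STATSD_HOST} || 'localhost';
    my $port = $env->{GENOME_STATSD_PORT} || 8125;
    return ($host, $port);
}

my @cases = (
    {},                                             # standalone install, nothing set
    { GENOME_STATSD_HOST => 'stats.example.org' },  # site-specific host
    { UR_DBI_NO_COMMIT   => 1 },                    # test run
);
for my $env (@cases) {
    my ($host, $port) = statsd_endpoint($env);
    print "host='$host' port=$port\n";
}
```

Because `||` is used, an empty string or 0 in either variable also falls back to the default.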

@obigriffith
Collaborator Author

The syncing issues were resolved and the change seems to prevent statsd warnings. Resolving this issue.
