
Distributed Framework


Introduction

The MLaaS Distributed Framework is an automated framework for scheduling tasks on a cluster of worker nodes. Certain tasks in the MLaaS framework require asynchronous operations and therefore cannot be implemented in a "blocking" manner; these tasks are handled by the distributed framework.

Deploying/Running the MLaaS Distributed Framework

The MLaaS Distributed Framework has the following dependencies, which must be satisfied before running the commands below.

  1. Docker
  2. Docker Machine
  3. VirtualBox (to run a Docker machine locally for testing)
  4. Cloud provider keys (currently only AWS EC2 is supported)

The first step in building the distributed framework is to prepare the configuration files for the services that the framework will bring up.

  • Cloud Provider config

    First, create a file called "env" under server/distributedframework with the following content:

      export AWS_ACCESS_KEY_ID="<<YOUR_ACCESS_KEY_ID>>"
      export AWS_SECRET_ACCESS_KEY="<<YOUR_AWS_SECRET_ACCESS_KEY>>"
      export AWS_VPC_ID="<<YOUR_AWS_VPC_ID>>"
      export AWS_EC2_REGION="<<YOUR_PREFERRED_REGION eg. us-west-2>>"
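
    These variables are referenced by the docker-machine commands below, so source the file into your shell before provisioning (a usage note, assuming a POSIX-compatible shell run from the repository root):

      source server/distributedframework/env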
    
  • Redis config

    The next step is to define the Redis config file. Feel free to add any custom settings your deployment requires; the one essential parameter is binding the process to all interfaces. Make sure your config file includes the following line (a fuller minimal example follows below).

      bind 0.0.0.0
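
    A minimal config that satisfies this might look like the sketch below (the filename redis.conf and every setting other than the bind are assumptions; adjust to your requirements):

      # redis.conf -- minimal sketch
      bind 0.0.0.0
      port 6379
      # With no password configured, protected-mode (Redis 3.2+) must be
      # disabled for remote clients to connect
      protected-mode no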
    

After preparing the config files, the next step is to build the multi-container Docker application. Before doing that, you must provision docker-machine instances to run the services. Currently the distributed platform supports only Amazon AWS, so the following instructions are AWS-specific.

The first thing to do on AWS is to create the multi-host keystore machine, which hosts the Consul key-value store used for Swarm discovery.

docker-machine create --driver amazonec2 --amazonec2-access-key $AWS_ACCESS_KEY_ID --amazonec2-secret-key $AWS_SECRET_ACCESS_KEY --amazonec2-vpc-id $AWS_VPC_ID --amazonec2-region $AWS_EC2_REGION --engine-opt dns=8.8.8.8 aws-mh-keystore

After doing that, use the following command to load the keystore machine's Docker environment variables into your local shell.

eval "$(docker-machine env aws-mh-keystore)"

Now start Consul on the keystore machine.

docker run -d -p "8500:8500" -h "consul" progrium/consul -server -bootstrap
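
To verify that Consul is up, query its HTTP status API on the keystore machine (a hedged check: it assumes port 8500 is reachable, i.e. the security group rules described below are in place; any non-empty leader address indicates a healthy server):

curl "http://$(docker-machine ip aws-mh-keystore):8500/v1/status/leader"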

After doing that, it is IMPORTANT to configure the inbound security group rules for your docker-machine instances. At a minimum, allow the traffic this setup relies on: typically TCP 8500 for Consul's HTTP API, TCP 2376 for the Docker daemons, TCP 3376 for the Swarm master, and ICMP if you want ping-based checks. Exactly how you open these ports is left to your discretion; a hedged AWS CLI sketch follows.
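
The snippet below opens Consul's port 8500 with the AWS CLI (an assumption: docker-machine placed the instances in its default "docker-machine" security group, and docker-machine already opened 2376/3376 itself, which it typically does):

SG_ID=$(aws ec2 describe-security-groups \
  --filters Name=group-name,Values=docker-machine Name=vpc-id,Values=$AWS_VPC_ID \
  --query 'SecurityGroups[0].GroupId' --output text)
# Allow the Consul HTTP API used for Swarm discovery and the engine cluster store
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol tcp --port 8500 --cidr 0.0.0.0/0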

Now, create the Swarm Master.

docker-machine create --driver amazonec2 --amazonec2-access-key $AWS_ACCESS_KEY_ID --amazonec2-secret-key $AWS_SECRET_ACCESS_KEY --amazonec2-vpc-id $AWS_VPC_ID --amazonec2-region $AWS_EC2_REGION --engine-opt dns=8.8.8.8 --engine-label n_type=master --swarm --swarm-master --swarm-strategy "spread" --swarm-discovery="consul://$(docker-machine ip aws-mh-keystore):8500" --engine-opt="cluster-store=consul://$(docker-machine ip aws-mh-keystore):8500" --engine-opt="cluster-advertise=eth0:2376" aws-swarm-master
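
At this point, a quick sanity check is to list your machines; aws-swarm-master should appear with "(master)" in the SWARM column:

docker-machine ls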

Use the following command to load the Swarm's Docker environment variables into your terminal so that you can query the machines running on AWS from your local shell. This step lets Docker treat your remote applications as though they were running locally, which makes debugging a lot easier.

eval "$(docker-machine env --swarm aws-swarm-master)"
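
With the Swarm environment loaded, ordinary docker commands address the whole cluster. For instance, docker info reports cluster-wide state, and its node list should grow as workers join in the next step:

docker info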

After this, use the following template to create your Swarm worker nodes.

docker-machine create --driver amazonec2 --amazonec2-access-key $AWS_ACCESS_KEY_ID --amazonec2-secret-key $AWS_SECRET_ACCESS_KEY --amazonec2-vpc-id $AWS_VPC_ID --amazonec2-region $AWS_EC2_REGION --engine-opt dns=8.8.8.8 --engine-label n_type=worker --swarm --swarm-discovery="consul://$(docker-machine ip aws-mh-keystore):8500" --engine-opt="cluster-store=consul://$(docker-machine ip aws-mh-keystore):8500" --engine-opt="cluster-advertise=eth0:2376" <<YOUR_PREFERRED_NAME_FOR_NODE>>
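
For example, to provision two workers in one go (the node names aws-swarm-node-01 and aws-swarm-node-02 are hypothetical), the template can be wrapped in a small shell loop:

# Hypothetical names; adjust the count and prefix to your needs
for i in 01 02; do
  docker-machine create --driver amazonec2 \
    --amazonec2-access-key $AWS_ACCESS_KEY_ID \
    --amazonec2-secret-key $AWS_SECRET_ACCESS_KEY \
    --amazonec2-vpc-id $AWS_VPC_ID \
    --amazonec2-region $AWS_EC2_REGION \
    --engine-opt dns=8.8.8.8 \
    --engine-label n_type=worker \
    --swarm \
    --swarm-discovery="consul://$(docker-machine ip aws-mh-keystore):8500" \
    --engine-opt="cluster-store=consul://$(docker-machine ip aws-mh-keystore):8500" \
    --engine-opt="cluster-advertise=eth0:2376" \
    aws-swarm-node-$i
done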

Now use the following commands to build and deploy MLaaS's distributed framework across your AWS instances.

cd server/distributedframework
docker-compose up --build -d
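
Once the build finishes, the following commands confirm that every service is running and let you inspect their output:

# List the service containers and their state
docker-compose ps
# Show logs from all services
docker-compose logs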

To find the address and port at which the distributed framework can be reached, run the following command.

docker-compose port web-api 5000
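
This prints the published host:port pair (e.g. 0.0.0.0:5000). The one-liner below assembles a full URL; it is a sketch that assumes the web-api container was scheduled on the Swarm master; if not, use the host printed by the port command itself:

echo "http://$(docker-machine ip aws-swarm-master):$(docker-compose port web-api 5000 | cut -d: -f2)"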

SOLVED ISSUES:

  1. docker-compose runs successfully, but services exit with the error standard_init_linux.go:178: exec user process caused "no such file or directory"
    • SOLUTION: the run_* scripts have Windows (CRLF) line endings, which break their shebang lines; convert them with dos2unix run_*