
Posts

Showing posts from January, 2018

CI/CD with CircleCI - Heroku deploy

Note

In this post, we'll deploy a Flask app to Heroku. Any commit to GitHub triggers CircleCI, which runs the tests. If the tests finish successfully, CircleCI deploys our app to Heroku.

Sign up by authorizing GitHub

First, sign up for CircleCI. We can log in to the CircleCI platform via GitHub by allowing access to the repo. After clicking "Authorize application", we get a welcome screen with a list of our GitHub repositories.

Installing and running locally

We'll use the following source on GitHub: circleci-heroku. Here are the steps to install and run the tests: Clone the repo and cd circleci-heroku. Set up virtualenv: virtualenv venv and then source venv/bin/activate. Run pip install -r requirements.txt (preferably inside a virtualenv) to install the dependencies. To run the "hello" app locally:

(venv) k@laptop:~/TEST/circleci-heroku$ python hello/hello_app.py * Running on http...
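The test-then-deploy flow described in the note can be sketched in a few lines of Python. This is only an illustration of the gating logic, not CircleCI's configuration format or the post's actual setup. The names `run_pipeline`, `run_tests`, and `deploy` are hypothetical and stand in for the CI test step and the Heroku deploy step.

```python
from typing import Callable


def run_pipeline(run_tests: Callable[[], bool], deploy: Callable[[], None]) -> str:
    """Run the test step and call the deploy step only if the tests passed.

    Both arguments are hypothetical stand-ins: `run_tests` returns True on
    success, and `deploy` performs the deployment (for example, to Heroku).
    """
    if not run_tests():
        # A failed test run stops the pipeline before any deployment happens.
        return "tests failed, deploy skipped"
    deploy()
    return "deployed"


if __name__ == "__main__":
    deployments = []
    # Passing tests: the deploy step runs.
    print(run_pipeline(lambda: True, lambda: deployments.append("heroku")))
    # Failing tests: the deploy step is never called.
    print(run_pipeline(lambda: False, lambda: deployments.append("heroku")))
```

The deploy callable is invoked at most once per run and never after a failed test step, which is the property the post relies on when it says the app is deployed only if the tests finish successfully.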

Scaling Kubernetes to 2,500 Nodes

We’ve been running Kubernetes for deep learning research for over two years. While our largest-scale workloads manage bare cloud VMs directly, Kubernetes provides a fast iteration cycle, reasonable scalability, and a lack of boilerplate which makes it ideal for most of our experiments. We now operate several Kubernetes clusters (some in the cloud and some on physical hardware), the largest of which we’ve pushed to over 2,500 nodes. This cluster runs in Azure on a combination of D15v2 and NC24 VMs. On the path to this scale, many system components caused breakages, including etcd, the Kube masters, Docker image pulls, network, KubeDNS, and even our machines’ ARP caches. We felt it’d be helpful to share the specific issues we ran into, and how we solved them.

etcd

After passing 500 nodes in our cluster, our researchers started reporting regular timeouts from the kubectl command line tool. We tried adding more Kube masters (VMs running kube-apiserv...