Python, Anaconda and Tensorflow

Posted on Fri 11 November 2016 in development

So I have used Python in the past, but mostly for small Sinatra-style web scripts and CLI tools. My new position, though, has required me to get much more familiar with the Python ecosystem.

One of the tools that several other engineers on my team use is Anaconda.

Overview

I have managed Python dependencies before, but until now only with virtualenv and pip. I really liked that setup and the ability to quickly install whatever I needed with pip install <dependency>.

Once I needed many dependencies and had to run them on different machines, I got used to requirements files and pip install -r requirements.txt.
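The same requirements-file install can also be driven from a Python script. This is a minimal sketch using only the standard library; running pip as `python -m pip` via `sys.executable` ties the install to whichever interpreter (and therefore whichever virtualenv) is running the script. The helper names are hypothetical.

```python
import subprocess
import sys


def pip_install_command(requirements_file="requirements.txt"):
    """Build the argv for `python -m pip install -r <requirements_file>`."""
    # sys.executable is the interpreter running this script, so packages
    # land in the same virtualenv rather than in whatever `pip` is on PATH.
    return [sys.executable, "-m", "pip", "install", "-r", requirements_file]


def install_requirements(requirements_file="requirements.txt"):
    # check=True raises CalledProcessError if pip exits with an error.
    subprocess.run(pip_install_command(requirements_file), check=True)
```

Calling `install_requirements()` from inside an activated virtualenv is then equivalent to typing the pip command by hand.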

Both approaches worked well, but I sometimes ran into installs that failed because of missing OS-level dependencies (system libraries and compilers that pip does not manage). This was never more apparent than when I started installing some of the data science tools.

Anaconda

It was at this point that I stumbled across Anaconda. I was quite surprised by how easy it made creating environments, at least from bash or zsh. The installation itself was painless and straightforward. What I really wanted, though, was to declare the dependencies I needed in one place and have Anaconda set them up quickly. Luckily this was easy to do with the environment file format (environment.yml):

    name: myenv
    dependencies:
      - numpy
      - jupyter
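Because the environment file is plain YAML, it can also be generated from a script. The following is a minimal sketch, assuming you only need the `name` and `dependencies` keys used in this post (it does not cover the full environment-file schema); `render_environment` is a hypothetical helper. It also shows the nested `pip:` list layout that conda expects for pip-only packages.

```python
def render_environment(name, conda_deps, pip_deps=()):
    """Render a minimal conda environment.yml as a string."""
    lines = [f"name: {name}", "dependencies:"]
    for dep in conda_deps:
        lines.append(f"  - {dep}")
    if pip_deps:
        # List pip itself as a conda dependency so the environment gets
        # its own pip, then nest the pip-only packages under "pip:".
        if "pip" not in conda_deps:
            lines.append("  - pip")
        lines.append("  - pip:")
        for dep in pip_deps:
            lines.append(f"    - {dep}")
    return "\n".join(lines) + "\n"
```

For example, `render_environment("myenv", ["numpy", "jupyter"])` produces the same four lines as the file above.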

Once the file existed, a single command created the environment and installed the dependencies into it:

    conda env create -f environment.yml
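If environment creation is part of a larger setup script, the command above can be wrapped in Python. This is a minimal sketch using only the standard library; it assumes conda is already on PATH, and the helper names are hypothetical. It checks for conda first so a missing PATH entry produces a clear error instead of a bare FileNotFoundError.

```python
import shutil
import subprocess


def conda_create_command(env_file="environment.yml"):
    """argv for `conda env create -f <env_file>`."""
    return ["conda", "env", "create", "-f", env_file]


def create_env(env_file="environment.yml"):
    # Fail early with a readable message if conda is not on PATH.
    if shutil.which("conda") is None:
        raise RuntimeError("conda not found on PATH; is Anaconda's bin directory on PATH?")
    # check=True raises CalledProcessError if conda exits with an error.
    subprocess.run(conda_create_command(env_file), check=True)
```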

One issue I ran into was wanting a dependency that did not have an existing conda package. It turned out I could still install it with pip from the environment file itself, by listing pip as a dependency and adding a pip: section:

    name: myenv
    dependencies:
      - numpy
      - jupyter
      - pip
      - pip:
        - https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp35-cp35m-linux_x86_64.whl
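A wheel URL like this only installs if the environment's Python matches the tags in the filename. Under the wheel naming convention (PEP 427), `cp35-cp35m-linux_x86_64` means CPython 3.5, the `cp35m` ABI, on 64-bit Linux. Here is a minimal sketch that splits out those tags and compares the Python tag with the running interpreter. It assumes the simple five-part filename without a build tag or compressed tag sets such as `py2.py3`, and `parse_wheel_filename` is a hypothetical helper, not pip's API.

```python
import posixpath
import sys
from urllib.parse import urlparse


def parse_wheel_filename(url_or_name):
    """Split a wheel filename into (name, version, python, abi, platform).

    Assumes {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    """
    filename = posixpath.basename(urlparse(url_or_name).path)
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return tuple(parts)


def matches_running_python(python_tag):
    # "cp35" means CPython 3.5; compare against the interpreter running this.
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return sys.implementation.name == "cpython" and python_tag == expected
```

This is also why the environment needs Python 3.5 for this particular wheel. Adding `- python=3.5` to the dependencies list makes that requirement explicit instead of relying on whatever Python conda picks by default.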

Dockerize It

The last piece for me was getting this setup working in a Docker container. Luckily this was a very straightforward process, and because Anaconda brings its own Python, it even made it easy to get Python 3.5 onto Ubuntu 14.04 (the Dockerfile below uses ubuntu:16.04, but the steps are the same).

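    FROM ubuntu:16.04

    RUN apt-get update && \
        apt-get install -y curl bzip2 build-essential && \
        apt-get clean && \
        rm -rf /tmp/* /var/tmp/* /var/lib/apt/lists/*

This is just the base system: curl (needed to pull down the Anaconda installer), bzip2 (which the installer relies on to unpack its bundled packages) and build-essential (gcc, make and friends). Doing the cleanup in the same RUN means the apt caches never end up in a layer, and deleting /tmp/* rather than /tmp/ leaves the directory itself in place for later steps.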

    RUN curl -sSL -o /installer.sh https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh && \
        bash /installer.sh -b -f && \
        rm /installer.sh

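In the above snippet I download the installer, run it and delete it in a single RUN instruction. I chose curl over Docker's ADD command to keep the layers as thin as possible: a file brought in with ADD is stored in its own layer, and a later rm cannot shrink that layer, whereas downloading and removing the installer within one RUN means it never ends up in the image.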

The other thing to note is the -b and -f options passed to the installer. -b runs it in batch mode, without any prompts (and therefore without needing a TTY), accepting the license and using the default prefix, which for root is /root/anaconda3. -f stops it from failing if that prefix already exists.
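To make the flags concrete, here is a minimal sketch that builds the same non-interactive installer command from Python. The `-p PREFIX` option, which the installer also accepts, is included to show how to override the default location. The helper names are hypothetical, and the code only builds the argument list; it does not download or run anything.

```python
import os


def anaconda_installer_command(installer_path, prefix=None, force=True):
    """argv for a non-interactive Anaconda install.

    -b: batch mode, no prompts
    -f: do not fail if the prefix already exists
    -p: explicit install prefix (default is ~/anaconda3)
    """
    cmd = ["bash", installer_path, "-b"]
    if force:
        cmd.append("-f")
    if prefix is not None:
        cmd += ["-p", prefix]
    return cmd


def default_prefix():
    # Mirrors the installer's default of $HOME/anaconda3 (/root/anaconda3 for root).
    return os.path.join(os.path.expanduser("~"), "anaconda3")
```

`anaconda_installer_command("/installer.sh")` yields the same arguments as the RUN line in the Dockerfile.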

    ENV PATH "/root/anaconda3/bin:$PATH"
    ADD startup /startup

    EXPOSE 8888 6006
    VOLUME /notebooks
    WORKDIR "/notebooks"

    CMD ["jupyter", "notebook", "--port=8888", "--no-browser", "--ip=0.0.0.0"]

Here we put Anaconda's bin directory on the PATH and expose two ports: 8888, the main port for the notebook, and 6006, TensorBoard's default port. /notebooks is declared as a volume and used as the working directory. The default command starts the Jupyter notebook listening on all interfaces (--ip=0.0.0.0) rather than only on localhost, which is needed because connections arrive from outside the container.
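Once the container is running with its ports published (for example `docker run -p 8888:8888 -p 6006:6006 <image>`), one quick check from the host is whether anything accepts TCP connections on those ports. This is a minimal sketch using only the standard library, and `port_is_open` is a hypothetical helper. A successful connection only shows that something is listening, not that it is Jupyter.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For instance, `port_is_open("localhost", 8888)` should return True once the notebook server inside the container is up.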

    ADD environment.yml /environment.yml
    RUN conda env create -f /environment.yml
    ENV CONDA_ENV tensorflow

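These final three instructions are what make an onbuild variant of this Dockerfile useful: turned into ONBUILD triggers, they let others build on the image and supply their own environment.yml to specify the environment's dependencies. Note that CONDA_ENV should match the name: field of that environment.yml.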
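Whatever consumes CONDA_ENV (a startup script, for instance) has to agree with the environment file, because `conda env create` names the environment after its `name:` field and places it under `<prefix>/envs/<name>`. This is a minimal sketch of such a check, assuming the /root/anaconda3 prefix used above. `read_env_name` and `env_python` are hypothetical helpers, and the parser only looks for an unindented `name:` line; it is not a full YAML parser and does not strip quotes.

```python
import os


def read_env_name(yml_text):
    """Return the top-level `name:` value from environment.yml text."""
    for line in yml_text.splitlines():
        # Only an unindented "name:" line is treated as the top-level key.
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("environment file has no top-level name")


def env_python(yml_text, prefix="/root/anaconda3", env_var="CONDA_ENV"):
    """Path to the environment's python, checking that CONDA_ENV agrees."""
    name = read_env_name(yml_text)
    expected = os.environ.get(env_var)
    if expected is not None and expected != name:
        raise ValueError(f"{env_var}={expected!r} but environment.yml names {name!r}")
    return os.path.join(prefix, "envs", name, "bin", "python")
```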

Conclusion

I really enjoyed using Anaconda and how easily it let me set up custom Python environments. I have already moved most of my existing machines over to it, and I am sure I will find many more reasons to enjoy it.

If you are interested in checking out either the docker image or the repo with the Dockerfile, links can be found below.