This tutorial is similar to the Docker introduction found on this website. It has been expanded to cover multiple files, with scripts and CSV files for data analysis.
What is Docker?
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.
Why use Docker?
Your code works on your machine, but doesn’t work on another person’s machine. A common reason is that the other machine is running different (often older) versions of the libraries or dependencies your code relies on. By using containers, your application will run on any other Linux machine regardless of any customized settings that machine might have that could differ from the machine used for writing and testing the code.
Assumptions
Let’s assume we have a local folder called Analysis. In it are two scripts: Analysis_1.py and Analysis_2.py. We want to create a container that executes both of these scripts and outputs the results.
We also have our CSV files in a folder called data, which sits at the same folder level as Analysis.
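Putting that together, the assumed project layout looks roughly like the following (the CSV file names are placeholders for whatever data files you actually have):

```
project/
├── Analysis/
│   ├── Analysis_1.py
│   └── Analysis_2.py
└── data/
    ├── example_1.csv   # placeholder name
    └── example_2.csv   # placeholder name
```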
1. Install Docker
Follow the installation instructions in the official Docker documentation (docs.docker.com) to get Docker installed for your operating system.
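Once the installation finishes, you can confirm that Docker is available from the command line, for example:

```
$ docker --version
$ docker run hello-world
```

The hello-world image is a small test image published by Docker that simply prints a confirmation message.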
2. Add requirements.txt and entrypoint.sh
Below are the libraries to include in our container. These are saved in a file called requirements.txt.
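The original list of libraries isn’t reproduced here, so the following is only a sketch; pandas and numpy are reasonable assumptions for scripts that analyze CSV files, but you should list whatever your own scripts import:

```
# requirements.txt -- assumed contents; replace with your scripts' actual dependencies
pandas
numpy
```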
Because we want to run multiple scripts in our container, we’ll also need a bash script for the container to execute. This is saved in a file called entrypoint.sh.
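A minimal version of entrypoint.sh, assuming the scripts are copied into the container under an Analysis/ directory (this matches the Dockerfile sketch in the next step; adjust the paths if your layout differs), could look like this:

```
#!/bin/bash
# Run both analysis scripts in sequence
python Analysis/Analysis_1.py
python Analysis/Analysis_2.py
```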
3. Configure Dockerfile
Create an empty directory on your local machine. Change directories (cd) into the new directory, create a file called Dockerfile, copy-and-paste the following content into that file, and save it.
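The original Dockerfile contents aren’t reproduced here, so the following is a sketch under some assumptions: a python:3 base image, the Analysis and data folders copied into the image alongside requirements.txt and entrypoint.sh, and entrypoint.sh used as the container’s entry point. Adjust names and paths to match your own layout:

```
# Use an official Python runtime as the base image (assumed version)
FROM python:3

# Set the working directory inside the image
WORKDIR /app

# Install the Python dependencies first so the layer is cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the scripts, the data, and the entrypoint script
COPY Analysis/ ./Analysis/
COPY data/ ./data/
COPY entrypoint.sh .
RUN chmod +x entrypoint.sh

# Run both analysis scripts when the container starts
ENTRYPOINT ["./entrypoint.sh"]
```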
4. Build Dockerfile
At the top level of your directory, you should have the following when you do ls:
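Assuming the layout described above, with the Dockerfile, requirements.txt, and entrypoint.sh created next to the Analysis and data folders, the listing would look something like this:

```
$ ls
Analysis  data  Dockerfile  entrypoint.sh  requirements.txt
```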
Run this command to build an image called dataanalysis.
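The exact command isn’t shown above, but with the Dockerfile in the current directory it is the standard docker build invocation, tagging the image as dataanalysis:

```
$ docker build -t dataanalysis .
```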
To see if the image is created, run $ docker image ls.
You can now run $ docker run dataanalysis and share the output. (No -p port mapping is needed here, since the scripts don’t expose a network service.)
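If you want to save the printed results so you can share them, you can redirect the container’s output to a file on the host; results.txt below is just an example name:

```
$ docker run dataanalysis > results.txt
```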