All project files were given by the school (except the Docker-related ones). Here is a small description:
- it is a web application with a poll part - where you can vote - and a result part - where you can see the results of the vote
- poll - written in Python Flask; it pushes the results of the vote to the Redis queue
- redis - holds the results until the worker consumes and processes them
- worker - a Java application that consumes votes and saves them to the PostgreSQL database
- database - persistently stores the results
- result - a Node.js application that fetches the results from the database and displays them. Uses Socket.io
So the general schema for infrastructure is the following:
[Poll]                 [Result]       <- (frontend)
   |                      |
[Redis] - [Worker] - [Database]       <- (backend)
So there are 5 microservices that need to be connected and "communicate" with each other. The final project tree will be the following:
.
├── compose.yml
├── poll
│   ├── Dockerfile
│   └── (poll python files)
├── result
│   ├── Dockerfile
│   └── (result JS files)
├── schema.sql
└── worker
    ├── Dockerfile
    └── (worker Java files)
So there will be 3 custom-built images (poll, result and worker) and a Compose file that connects all five services together
There is a recurring problem: the same code works perfectly on one machine and gives errors on another. This happens due to different operating systems, library versions, etc. This is where Docker comes into play: it creates an isolated environment called a container where your code runs, and since the container comes pre-configured, the code works the same everywhere.
- Dockerfile
- Text file containing instructions to build a Docker image
- Series of commands that Docker executes to create an image
- Each instruction creates an immutable layer in the image
- Base component for creating reproducible builds
- Docker Image
- A read-only template containing application code, libraries, dependencies, tools, and other files
- Like a "snapshot" or blueprint for creating containers
- Can be stored in registries (like Docker Hub)
- Built using a Dockerfile
- Layered architecture (each instruction creates a new layer)
- Docker Container
- A runnable instance of a Docker image
- Isolated environment with its own filesystem, network interface, and process space
- Can be started, stopped, moved, and deleted
- Like a lightweight, isolated virtual machine
- Docker Volume
- Mechanism for persisting data generated by and used by containers
- Exists outside the container lifecycle
- Three types:
- Named volumes (managed by Docker)
- Bind mounts (direct link to host filesystem)
- tmpfs mounts (stored in host memory)
- Necessary because data inside a container is not persistent: once the container is removed, all the data generated inside it disappears
- Docker Network
- Enables communication between containers
- Isolates container communications
- Types:
- Bridge (default)
- Host
- None
- Custom networks
- Docker Registry
- Storage and distribution system for Docker images
- Can be public (like Docker Hub) or private
- Repository for sharing and versioning images
- Docker Compose
- Tool for defining and running multi-container applications
- Uses YAML file to configure application services
- Manages the complete application lifecycle, sets up networks, and controls the order of container creation (when there are dependencies between services)
Relationships:
Dockerfile -> Docker Image -> Docker Container
                   ^
            Docker Registry
Container <-> Volume  (for persistence)
Container <-> Network (for communication)
Install Docker Desktop, which includes everything necessary plus a UI.
To check if you have Docker and Compose:
docker --version
docker compose version
FROM # Base image to build upon
WORKDIR # Sets working directory for instructions
COPY # Copies files from host to container
ADD # Copies files (with extra features like URL support and tar extraction)
RUN # Executes commands during image build
ENV # Sets environment variables
EXPOSE # Documents which ports are intended to be published
CMD # Default command to run when container starts
ENTRYPOINT # Main command to run (CMD becomes arguments to this)
VOLUME # Creates a mount point for external volumes
# Images
docker build -t name:tag . # Build image from Dockerfile
docker pull image:tag # Pull image from registry
docker push image:tag # Push to registry
docker images # List local images
docker rmi image # Remove image
# Containers
docker run image # Create and start container
docker start/stop name # Start/stop existing container
docker ps # List running containers
docker ps -a # List all containers
docker rm container # Remove container
docker logs container # View container logs
docker exec -it container bash # Enter running container
# System
docker system prune # Clean up unused resources
docker volume ls # List volumes
docker network ls # List networks
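Putting a few of these together, a typical build-run-debug cycle looks like this (a sketch; myapp is a placeholder image name):
docker build -t myapp:1.0 .                       # Build an image from the current directory
docker run -d --name myapp -p 8080:80 myapp:1.0   # Start it in the background
docker logs myapp                                 # Check its output
docker stop myapp && docker rm myapp              # Stop and remove it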
version:          # Compose file version
services:         # Define application services
  webapp:         # Service name
    build:        # Build from Dockerfile
    image:        # Use existing image
    ports:        # Port mapping (host:container)
    volumes:      # Mount volumes
    environment:  # Environment variables
    networks:     # Connect to networks
    depends_on:   # Service dependencies
    restart:      # Restart policy
networks:         # Define custom networks
volumes:          # Define named volumes
-d # Run in background (detached)
-p # Port mapping
-v # Volume mounting
--name # Assign container name
--network # Connect to network
-e # Set environment variables
--rm # Remove container when it exits
-it # Interactive terminal
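Several of these flags are often combined in one command. A sketch (web, my-net and mydata are placeholder names):
docker network create my-net
docker run -d --rm --name web --network my-net -p 8080:80 -v mydata:/usr/share/nginx/html -e DEBUG=1 nginx:alpine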
# Named volumes
volumes:
  mydata:
# Bind mounts
volumes:
  - ./host/path:/container/path
# tmpfs mounts (memory only)
tmpfs:
  - /temp
restart:
  "no"            # Never restart (quote it - a bare no is parsed as a YAML boolean)
  always          # Always restart
  on-failure      # Restart only on failure
  unless-stopped  # Always restart unless manually stopped
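The same policies also work outside Compose, via the --restart flag of docker run, for example:
docker run -d --restart unless-stopped redis:alpine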
There are the following requirements:
- create 3 images, respecting the specifications described below.
- no ENTRYPOINT.
- no "latest" version tags
- Poll
  - the image is based on an official Python image
  - the app exposes and runs on port 80
- Result
  - the image is based on an official Node.js Alpine image
  - the app exposes and runs on port 80
  - the node_modules folder must be excluded from the build context
- Worker
  - the image is built using a multi-stage build:
    - First stage - compilation:
      - is based on maven:3.9.6-eclipse-temurin-21-alpine and is named builder
      - is used to build and package the Worker application using mvn dependency:resolve from within the folder containing pom.xml, then mvn package from within the folder containing the src folder
      - generates a file in the target folder named worker-jar-with-dependencies.jar
    - Second stage - run:
      - is based on eclipse-temurin:21-jre-alpine
      - is the one really running the worker using java -jar worker-jar-with-dependencies.jar
- Docker images must be as simple and lightweight as possible.
- Name of the Compose file is compose.yml
- Compose file should contain:
- 5 services:
  - poll (builds the poll image, redirects port 5000 of the host to port 80 of the container)
  - redis (uses an existing official Redis image, opens port 6379)
  - worker (builds the worker image)
  - db (uses an existing official PostgreSQL image, has its database schema created during the container's first start)
  - result (builds the result image, redirects port 5001 of the host to port 80 of the container)
- 3 networks: poll-tier, result-tier and back-tier.
- 1 volume: db-data.
# poll/Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
ENV PORT=80
EXPOSE 80
CMD ["python", "app.py"]
- FROM - takes the official Python image, version 3.11; the slim variant keeps the image size minimal
- WORKDIR - creates a directory INSIDE the container (each container has its own filesystem) and sets it as the working directory. It is like running the command mkdir /app && cd /app
- COPY - copies files from the local directory (the first .) into the container's current working directory (the second .). The first path is just . because the Dockerfile is already inside the poll directory, together with all the source code and configs. COPY includes all files except those listed in .dockerignore
- RUN - executes the specified command during the image build. In this case the command is pip install -r requirements.txt, which installs the libraries listed in requirements.txt
- ENV - sets an environment variable inside the container. In this case the variable PORT is created with the value 80
- EXPOSE - is for documentation purposes and does not affect the container functionally. In this case it exposes port 80 just to put a label that "this container will use port 80". The instruction itself does not make the port accessible; you need to explicitly publish it in order to use it, for example with docker run -p 5000:80 your-image
- CMD - specifies the default command to run when the container starts. This command can be overridden when starting a container, unlike ENTRYPOINT, which sets a fixed command that cannot be easily overridden
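To sanity-check this image on its own, you can build and run it from the project root (a sketch; without the rest of the stack the app won't reach Redis, but the port mapping is visible):
docker build -t poll ./poll
docker run --rm -p 5000:80 poll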
# result/Dockerfile
FROM node:21-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
ENV PORT=80
EXPOSE 80
CMD ["npm", "start"]
Here, everything is the same as in the Poll Dockerfile, with one key difference: there are 2 COPY instructions.
- The first COPY copies just the package*.json files, and after that the necessary dependencies are installed
- The second COPY copies the rest (including package.json a second time, but that just overwrites package.json without installing the dependencies one more time)
- Why is it done this way? Docker uses layer caching. If the layer created with COPY package*.json is unchanged and only the code (the second COPY) has changed, Docker will copy just the rest of the files and skip the dependency installation step, saving time. Docker will reinstall dependencies only if package.json is changed.
- Both ./ and . mean "current directory", but . is "everything in the current directory" while ./ is explicitly "the current directory"
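You can see the caching in action by rebuilding after touching only the source code (a sketch, assuming the image is tagged result):
docker build -t result ./result   # First build: npm install runs
# edit any source file except package.json, then rebuild:
docker build -t result ./result   # The npm install step is now reported as CACHED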
# worker/Dockerfile
FROM maven:3.9.6-eclipse-temurin-21-alpine AS builder
WORKDIR /app
COPY . .
RUN mvn dependency:resolve
RUN mvn package
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY --from=builder /app/target/worker-jar-with-dependencies.jar .
CMD ["java", "-jar", "worker-jar-with-dependencies.jar"]
- this Dockerfile is interesting because it is multi-stage, which is used to create a smaller and more efficient final image.
- First stage:
  - AS builder names this stage for later reference
  - RUN executes 2 commands: mvn dependency:resolve to download dependencies, and mvn package to create the jar file
- Second stage:
  - COPY --from=builder takes the jar file from the stage named builder
Stage 1 (builder)           Stage 2 (final)
+------------------+        +----------------+
| Maven image      |        | JRE image      |
| Source code      |  JAR   |                |
| Builds JAR ------→--------→ Only JAR file  |
| ~400MB           |        | ~100MB         |
+------------------+        +----------------+
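To see the effect yourself (sizes are approximate and depend on the exact base images):
docker build -t worker ./worker
docker images worker   # Only the small JRE-based final image gets the tag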
The .dockerignore file:
- It tells Docker which files/directories to EXCLUDE during the build process
- Makes builds faster by copying fewer files
- Reduces the final image size
In the result folder, I have created a .dockerignore file and added node_modules to it.
Thanks to this:
- Docker skips copying the node_modules directory
- Dependencies are cleanly installed inside the container (dependencies installed on the host could have been built for a specific OS, which can be unsupported on other systems)
- The build process is faster and cleaner
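So result/.dockerignore is a single line:
node_modules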
Now, in the root of the project, create the compose.yml file:
version: '3.8'

services:
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    networks:
      - poll-tier
      - back-tier
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    volumes:
      - db-data:/var/lib/postgresql/data
      - ./schema.sql:/docker-entrypoint-initdb.d/schema.sql
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    networks:
      - back-tier
      - result-tier
    restart: unless-stopped

  worker:
    build: ./worker
    environment:
      - REDIS_HOST=redis
      - POSTGRES_HOST=db
      - POSTGRES_PORT=${POSTGRES_PORT}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    networks:
      - back-tier
    depends_on:
      - redis
      - db
    restart: unless-stopped

  poll:
    build: ./poll
    ports:
      - "5000:80"
    environment:
      - REDIS_HOST=redis
      - OPTION_A=${OPTION_A}
      - OPTION_B=${OPTION_B}
      - OPTION_C=${OPTION_C}
      - OPTION_D=${OPTION_D}
    networks:
      - poll-tier
    depends_on:
      - redis
    restart: unless-stopped

  result:
    build: ./result
    ports:
      - "5001:80"
    environment:
      - POSTGRES_HOST=db
      - POSTGRES_PORT=${POSTGRES_PORT}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    networks:
      - result-tier
    depends_on:
      - db
    restart: unless-stopped

networks:
  poll-tier:
  result-tier:
  back-tier:

volumes:
  db-data:
- version affects available features and syntax and specifies the Docker Compose file format version. 3.8 is very stable, even though it is not the latest one
- services defines the application containers. The order of service declarations in the compose.yml file doesn't determine the startup order; the actual startup order is determined by the depends_on configuration.
redis:
  image: redis:alpine      # Uses a pre-built Redis image
  ports:
    - "6379:6379"          # Port mapping (host:container)
  networks:                # Connected networks
    - poll-tier
    - back-tier
  restart: unless-stopped
- regarding ports: 5000:80 means that the outside world uses localhost:5000 while the container internally uses port 80. When you open localhost:5000 in a browser, Docker forwards that traffic to port 80 in the container
db:
  image: postgres:15-alpine
  volumes:                 # Data persistence
    - db-data:/var/lib/postgresql/data                    # Named volume
    - ./schema.sql:/docker-entrypoint-initdb.d/schema.sql # Init script
  environment:             # Environment variables
    - POSTGRES_USER=${POSTGRES_USER}
- environment creates environment variables in the container. ${POSTGRES_USER} is a value saved in the .env file.
- for volumes:
  - db-data:/var/lib/postgresql/data mounts the named volume db-data at the location /var/lib/postgresql/data
  - in ./schema.sql:/docker-entrypoint-initdb.d/schema.sql, ./schema.sql is the source file on the host machine and /docker-entrypoint-initdb.d/schema.sql is the destination path in the container. /docker-entrypoint-initdb.d/ is a special directory in PostgreSQL:
    - PostgreSQL automatically executes any .sql files in this directory
    - it only runs when the database is first initialized (first-time startup)
    - it is used to set up the initial database schema, tables, etc.
- so you don't need to additionally create a user or configure the database; the container will:
  - use the credentials from your .env file
  - automatically create the user
  - run schema.sql when the container first starts
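To verify that the schema was applied, you can run psql inside the db container (a sketch; substitute the user and database values from your .env):
docker compose exec db psql -U <POSTGRES_USER> -d <POSTGRES_DB> -c '\dt'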
worker:
  build: ./worker          # Build from Dockerfile
  environment:
    - REDIS_HOST=redis     # Service discovery
    - POSTGRES_HOST=db     # Reference other services
  depends_on:              # Startup order
    - redis
    - db
- build includes the path to the directory with the corresponding Dockerfile
- the environment values in this case (redis and db) are the names of the services. Services on the same network can talk to each other using service names as hostnames. Example: the worker can reach Redis using just "redis" as the hostname. Docker's internal DNS automatically resolves these service names to the correct container IP addresses. So REDIS_HOST=redis tells the app to look for a host named "redis"
- depends_on declares an order: the worker must be created after redis and db
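You can observe this DNS resolution from inside a running container (assuming the Alpine base image provides busybox's nslookup):
docker compose exec worker nslookup redis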
networks:
  poll-tier:    # For poll and redis
  result-tier:  # For result and db
  back-tier:    # For worker, redis, and db
is the declaration of the networks, and
volumes:
  db-data:      # Named volume for database persistence
is the declaration of the volumes.
In the root, there is also a .env file with the values for all environment variables.
# Database settings
POSTGRES_USER=____
POSTGRES_PASSWORD=____
POSTGRES_DB=____
POSTGRES_PORT=____
# Vote options
OPTION_A=____
OPTION_B=____
OPTION_C=____
OPTION_D=____
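To check that these values are picked up correctly, you can render the final configuration:
docker compose config   # Prints compose.yml with all ${...} values substituted from .env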
And .gitignore lists the files that should not be pushed to git:
.env
result/node_modules
worker/target
Now, to start the project, open Docker Desktop to start the Docker daemon, and then run the command docker compose up --build.
You don't need to use --build every time you launch the project, only the first time (or after changing a Dockerfile or the source code baked into an image).
To stop the containers: docker compose down.
The application should be accessible at:
- Poll interface: http://localhost:5000
- Results interface: http://localhost:5001
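A few other handy commands while the stack is running:
docker compose ps               # List the services and their states
docker compose logs -f worker   # Follow the logs of a single service
docker compose down -v          # Stop everything AND remove the db-data volume (wipes the database)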