Learning Docker: What I Wish Someone Had Told Me Earlier
Why most Docker tutorials fail beginners, the critical difference between images and containers, and what actually happens when you run a container.

I tried learning Docker three times before it clicked. The first time, I followed a tutorial that compared containers to shipping containers on a boat. Cute analogy but it didn't help me understand why my Dockerfile wasn't working. The second time, someone told me "it's like a lightweight virtual machine" which is technically wrong and set me back months because I kept thinking about Docker in VM terms. The third time, I sat down and actually read what happens at the Linux kernel level when you run docker run, and suddenly the weird behaviors I'd been fighting all made sense.
I'm going to try to explain Docker the way I wish it had been explained to me. Fewer analogies, more of what's actually happening.
What a Container Actually Is
Forget the shipping container metaphor. Forget the VM comparison. Here's what's happening when you run a Docker container.
Your operating system has a kernel. The kernel manages processes, memory, filesystems, network interfaces. When you run a normal program - say, node server.js - the kernel starts a process that can see all the files on your disk, all the network interfaces, all the other running processes. It has full visibility of the host system.
A container is just a regular process with restrictions applied. The kernel uses two features - namespaces and cgroups - to limit what the process can see and how much of the machine's resources it can use. A containerized process gets its own view of the filesystem (it can only see the files in the container image), its own process tree (PID 1 inside the container is not PID 1 on the host), its own network stack. But there's no separate kernel. No separate OS. It's the same Linux kernel running the container process and all your other processes. The container just can't see the rest of the system.
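You can see the isolation directly. Run ps in a throwaway Alpine container and the process tree starts over at PID 1, because the container gets its own PID namespace (the exact output format depends on the image, but it looks roughly like this):
# ps inside the container only sees the container's own processes
docker run --rm alpine ps
# PID   USER     TIME  COMMAND
#     1 root      0:00 ps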
This is why containers start in milliseconds while VMs take minutes. A VM boots an entire operating system. A container just starts a process with some kernel-level restrictions. There's way less going on.
Understanding this changed how I thought about everything else in Docker. Volumes make sense - they're punching a hole through the filesystem restriction so the container process can access a specific directory on the host. Port mapping makes sense - you're routing network traffic from a host port through to the container's isolated network namespace. Even the security model makes sense - the container process runs as root inside its namespace by default, but the kernel restrictions contain the blast radius.
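Port mapping is the easiest of those to try out. A minimal sketch, assuming an image called myapp (a placeholder) whose server listens on port 3000 inside the container:
# Route traffic arriving on host port 8080 into port 3000
# inside the container's isolated network namespace
docker run -d -p 8080:3000 myapp
curl http://localhost:8080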
Images vs Containers: They're Not the Same Thing
This confused me for longer than I'd like to admit. An image and a container are different things and the terminology matters.
An image is a read-only filesystem snapshot. It contains your OS libraries, your application code, your dependencies - everything your program needs to run. Think of it like a zip file of a perfectly configured computer. When you build an image with a Dockerfile, you're creating this snapshot layer by layer.
A container is what you get when you run an image. It's a live process based on that snapshot. The container can read files from the image but if it writes or modifies anything, those changes go into a temporary layer that sits on top of the image. When the container stops and gets removed, that temporary layer disappears. The original image is unchanged.
This means two things that surprised me when I was learning:
First, you can run multiple containers from the same image simultaneously. Ten copies of the same web server, each with their own temporary write layer, all sharing the same underlying read-only image. This is efficient - the image layers are shared between containers, not duplicated.
Second, and this is the one that catches people - containers lose their changes when they're removed. If you boot a Postgres container, create a database, insert 50,000 rows, and then run docker rm on the container, all of that data is gone. It lived in the temporary write layer, which got deleted.
The first time this happened to me, I thought Docker was broken. Nope. That's the design. Containers are disposable. If you need data to persist, you have to tell Docker to store it outside the container.
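You can watch this happen with a throwaway container (the name and file path here are arbitrary):
# Start a container and write a file into its temporary layer
docker run -d --name scratchpad alpine sleep 600
docker exec scratchpad sh -c 'echo hello > /data.txt'
docker exec scratchpad cat /data.txt    # prints "hello"
# Remove the container - the write layer disappears with it
docker rm -f scratchpad
# A fresh container from the same image has no /data.txt
docker run --rm alpine cat /data.txt    # no such file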
Volumes: Why Data Disappears and How to Stop It
Here's the command that would have saved me an hour of frustration early on:
docker volume create my_database_data
docker run -d -e POSTGRES_PASSWORD=secret -v my_database_data:/var/lib/postgresql/data postgres:16
The -v flag maps a Docker volume to a path inside the container. Postgres stores its data files at /var/lib/postgresql/data. By mounting a volume there, you're telling Docker "don't use the temporary container layer for this directory - use persistent storage instead." (The -e POSTGRES_PASSWORD flag isn't about volumes; the official Postgres image simply refuses to start without a superuser password.)
Now when you stop and remove the container, the volume remains. Start a new Postgres container with the same volume mounted, and all your data is right where you left it.
You can also mount a directory from your host machine instead of using a Docker volume:
docker run -d -e POSTGRES_PASSWORD=secret -v /home/anurag/pgdata:/var/lib/postgresql/data postgres:16
Same idea but now the data lives in a regular directory on your host filesystem. I use this approach in development because I can browse the files with my normal tools. In production, Docker volumes are better because they're managed by Docker's storage driver and perform better on some platforms.
The thing that wasn't obvious to me is that you need to think about volumes before you start the container, not after. There's no easy way to retroactively add a volume to a running container. You have to stop it, remove it, and start a new one with the volume mount. Which is fine once you know, but it's a gotcha for beginners.
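In practice the swap looks like this - a sketch, assuming the old container was named my_postgres:
# Stop and remove the container that was started without a volume
docker stop my_postgres
docker rm my_postgres
# Start a replacement with the volume mounted from the beginning
docker run -d --name my_postgres \
  -e POSTGRES_PASSWORD=secret \
  -v my_database_data:/var/lib/postgresql/data \
  postgres:16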
Dockerfiles: Building Your Own Images
A Dockerfile is a recipe for building an image. Each instruction adds a layer to the image (strictly, instructions like RUN and COPY add filesystem layers; others, like EXPOSE and CMD, just record metadata).
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
Here's what each line does:
FROM node:20-alpine - Start with an existing image that has Node.js 20 installed on Alpine Linux. Alpine is a tiny Linux distribution, about 5MB. Using it instead of the default Debian-based image means your final image will be much smaller.
WORKDIR /app - Set the working directory inside the image. All subsequent commands run from here.
COPY package*.json ./ - Copy just the package files first. This matters for caching, which I'll explain in a second.
RUN npm ci - Install dependencies. This creates a new layer with all of node_modules.
COPY . . - Now copy the rest of your source code.
The order here is deliberate. Docker caches each layer. If a layer hasn't changed since the last build, Docker reuses the cached version instead of rebuilding it. By copying package.json before the source code, we ensure that the expensive npm ci step only re-runs when the dependencies change. If you just changed a line in server.js, Docker uses the cached node_modules layer and only rebuilds the COPY . . layer. This can cut build times from minutes to seconds.
I didn't understand this when I started and my builds were painfully slow because I had COPY . . at the top, which invalidated the cache every time any file changed.
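For contrast, this is roughly what my slow version looked like - not the exact file, but the shape of the mistake:
FROM node:20-alpine
WORKDIR /app
# Copying everything first means this layer changes on every edit...
COPY . .
# ...so this expensive step re-runs on every build
RUN npm ci
EXPOSE 3000
CMD ["node", "server.js"]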
CMD ["node", "server.js"] โ The command that runs when a container starts from this image. Note that this doesn't run during the build. It's metadata that says "when someone runs this image, start this command."
Security: Root by Default Is a Problem
Here's something that bothered me once I understood it. By default, the process inside a Docker container runs as root. It's root inside the container's namespaces - and unless you've turned on user-namespace remapping, it's even the same UID 0 as root on the host, just with restricted capabilities. So if there's a vulnerability in your application that lets an attacker execute commands, they're executing as root inside the container. Container isolation limits what they can do, but root is still root, and there have been container escape vulnerabilities in the past.
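You can check what user an image runs as by default - for node:20-alpine, I'd expect:
docker run --rm node:20-alpine whoami
# root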
The fix is simple but I don't see it in enough beginner tutorials:
# Create a non-privileged user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Give the new user ownership of the app files
RUN chown -R appuser:appgroup /app
# Switch to that user
USER appuser
CMD ["node", "server.js"]
Now the application process runs as appuser instead of root. If someone exploits a vulnerability in your Node.js app, they get shell access as an unprivileged user, which limits the damage.
I've been adding this to every Dockerfile I write, even for development. It's habit now. There's almost never a reason for your application to run as root.
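After rebuilding, a quick sanity check confirms the switch (myapp again being a placeholder tag):
docker run --rm myapp whoami
# appuser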
Image Tags and the Problem with latest
When you pull an image from Docker Hub, you specify a tag: docker pull node:20-alpine. The tag 20-alpine points to the current Node.js 20 release on Alpine Linux. Reasonably predictable: pull it on two machines today and you'll almost certainly get the same image, though the tag does move forward as new 20.x patch releases come out.
The latest tag is different. It's a moving target. It points to whatever the most recent version is at the time you pull. Today it might be Node 20. Next month it might be Node 22. If you build your production image with FROM node:latest, the image you build today and the image your colleague builds next week could be running different versions of Node.js. Same Dockerfile, different results.
In development, latest is fine. You probably want the newest version anyway. In production, pin your versions:
# Don't do this in production Dockerfiles
FROM node:latest
# Do this instead
FROM node:20.11.1-alpine3.19
I go one step further and pin the image digest for production:
FROM node:20.11.1-alpine3.19@sha256:abcdef123456...
The digest is a hash of the actual image content. Even if someone re-tags 20.11.1-alpine3.19 to point to a different image (which is possible, though unlikely with official images), the digest won't match and the build will fail instead of silently pulling a different image. This is probably overkill for most teams but it's nice for paranoia.
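If you want the digest, Docker will tell you for any image you've already pulled:
# Show digests alongside local images
docker images --digests node:20.11.1-alpine3.19
# Or read it straight from the image metadata
docker inspect --format '{{index .RepoDigests 0}}' node:20.11.1-alpine3.19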
Networking Between Containers
This tripped me up early on. You run a Postgres container and a Node.js container. Your Node.js app tries to connect to localhost:5432 and gets connection refused. Why?
Because each container has its own network namespace. localhost inside the Node.js container refers to the Node.js container itself, not the Postgres container. They're separate network environments.
If both containers are on the same Docker network (which Docker Compose sets up automatically), they can reach each other using the container name or service name as the hostname:
// Inside the Node.js container, connect to the Postgres container
const { Client } = require('pg');

const client = new Client({
  host: 'db', // 'db' is the service name from docker-compose.yml
  port: 5432,
  user: 'postgres',
  password: 'secret',
});
Docker's built-in DNS resolves db to the IP address of the Postgres container. I think this is one of those things that's easy to explain in hindsight but genuinely confusing when you're starting out. The networking model is simple once you understand it, but most tutorials gloss over it.
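Compose sets the network up for you, but the same mechanism works with plain docker commands - containers on a shared user-defined network can reach each other by name (the network name and the myapp image are placeholders):
# User-defined networks come with built-in DNS
docker network create backend
docker run -d --name db --network backend -e POSTGRES_PASSWORD=secret postgres:16
docker run -d --name app --network backend myapp
# Inside 'app', the hostname 'db' now resolves to the Postgres container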
When Not to Use Docker
I feel like I should mention this because not everything needs to be containerized. If you're a solo developer working on a personal project, and you've already got Node.js installed, running npm run dev directly on your machine is perfectly fine. Docker adds overhead - build times, disk space for images, another layer of abstraction to debug through.
Where Docker pays off is when other people need to run your code. A new teammate joins and instead of spending half a day installing PostgreSQL, Redis, and the right version of Node, they run docker compose up and everything starts. Or when you need your CI/CD pipeline to run tests in the same environment as production. Or when you're deploying to a cloud provider that expects container images.
For solo development on a project only you will ever touch? Honestly, probably not worth the setup friction. I know that's not the popular opinion but it's been my experience. Docker is a collaboration and deployment tool. If you're not collaborating or deploying, the benefit is marginal.
That said, I still use Docker for local databases even on solo projects, because upgrading Postgres through Homebrew has burned me more than once. A docker run postgres:16 that I can throw away and recreate in seconds is worth the small overhead.
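For reference, the throwaway setup is a single command (the password and port mapping are whatever your project expects):
# Disposable local Postgres: --rm removes the container as soon as it stops
docker run --rm -d --name dev-pg -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres:16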
The bottom line for me: Docker is a deployment tool first and a development tool second. Once you accept that ordering, the decisions about when to use it get simpler.
Written by
Anurag Sinha
Developer who writes about the stuff I actually use day-to-day. If I got something wrong, let me know.