Docker originally used Linux Containers (LXC), but later switched to runC (formerly known as libcontainer), which runs in the same operating system as its host. This allows it to share a lot of the host operating system resources.
Docker also uses a layered filesystem and manages networking. AuFS is a layered file system, so you can have a read-only part and a writable part which are merged together. One could have the common parts of the operating system as read-only (and shared amongst all of your containers) and then give each container its own mount for writing.
So, let's say you have a 1 GB container image; if you wanted to use a full VM, you would need 1 GB times the number of VMs you want. With Docker and AuFS you can share the bulk of the 1 GB between all the containers, so even with 1000 containers you might still use only a little over 1 GB of space for the containers' OS (assuming they are all running the same OS image). A fully virtualized system gets its own set of resources allocated to it, and does minimal sharing.
You get more isolation, but it is much heavier (requires more resources). With Docker you get less isolation, but the containers are lightweight (require fewer resources). So you could easily run thousands of containers on a host, and it won't even blink. Try doing that with Xen, and unless you have a really big host, I don't think it is possible. A fully virtualized system usually takes minutes to start, whereas Docker/LXC/runC containers take seconds, and often even less than a second. There are pros and cons for each type of virtualized system.
If you want full isolation with guaranteed resources, a full VM is the way to go. If you just want to isolate processes from each other and want to run a ton of them on a reasonably sized host, then Docker/LXC/runC seems to be the way to go. For more information, check out which do a good job of explaining how LXC works.
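To put rough numbers on the disk-space comparison above, here is a minimal sketch. The function names and the 0.0001 GB written per container are my own assumptions for illustration, not measurements from a real Docker host.

```python
def vm_disk_gb(image_gb: float, instances: int) -> float:
    """Each full VM carries its own complete copy of the image."""
    return image_gb * instances


def container_disk_gb(image_gb: float, instances: int, writable_gb_each: float) -> float:
    """Containers share one read-only image; each adds only its own writable layer."""
    return image_gb + instances * writable_gb_each


if __name__ == "__main__":
    # Assumed numbers: 1 GB image, 1000 instances, 0.0001 GB written per container.
    print("VMs:       ", vm_disk_gb(1.0, 1000), "GB")
    print("Containers:", container_disk_gb(1.0, 1000, 0.0001), "GB")
```

The point is only the shape of the two formulas: VM storage grows with the full image size per instance, while container storage grows with whatever each container actually writes.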
Why is deploying software to a docker image (if that's the right term) easier than simply deploying to a consistent production environment?
Deploying a consistent production environment is easier said than done. Even if you use configuration management tools such as Puppet, there are always OS updates and other things that change between hosts and environments. Docker gives you the ability to snapshot the OS into a shared image, and makes it easy to deploy on other Docker hosts. Locally, dev, qa, prod, etc.: all the same image. Sure, you can do this with other tools, but not nearly as easily or as fast.
This is great for testing; let's say you have thousands of tests that need to connect to a database, and each test needs a pristine copy of the database and will make changes to the data. The classic approach is to reset the database after every test, either with custom code or with a dedicated tool; this can be very time-consuming and means that tests must be run serially. However, with Docker you could create an image of your database and start one instance per test, and then run all the tests in parallel, since you know they will all be running against the same snapshot of the database. Since the tests are running in parallel and in Docker containers, they could all run on the same box at the same time and should finish much faster. Try doing that with a full VM.
From comments: I suppose I'm still confused by the notion of 'snapshotting the OS'. How does one do that without, well, making an image of the OS?
Well, let's see if I can explain. You start with a base image, and then make your changes, and commit those changes using docker, and it creates an image. This image contains only the differences from the base.
When you want to run your image, you also need the base, and it layers your image on top of the base using a layered file system: as mentioned above, Docker uses AuFS. AuFS merges the different layers together and you get what you want; you just need to run it. You can keep adding more and more images (layers) and it will continue to only save the diffs. Since Docker typically builds on top of ready-made images from a registry, you rarely have to 'snapshot' the whole OS yourself.
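If it helps, the "only save the diffs" idea can be sketched with plain dictionaries standing in for filesystems. This is a toy model under my own assumptions (the names commit and run are hypothetical helpers, not Docker's API, and deletions are ignored here), not how AuFS is implemented.

```python
def commit(base: dict, current: dict) -> dict:
    """Return a layer holding only the files added or changed relative to base."""
    return {path: data for path, data in current.items() if base.get(path) != data}


def run(*layers: dict) -> dict:
    """Merge layers bottom-to-top; entries in later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


if __name__ == "__main__":
    base = {"/bin/sh": "shell", "/etc/os-release": "ubuntu"}
    # State of the filesystem after installing an app on top of the base.
    current = dict(base, **{"/app/server.py": "print('hi')"})

    layer = commit(base, current)
    print(layer)                        # only the new file is stored in the layer
    print(run(base, layer) == current)  # base + diff reproduces the full state
```

Every image built on the same base can reuse that base dictionary, which is the sharing the answer describes.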
It might be helpful to understand how virtualization and containers work at a low level. That will clear up a lot of things. Note: I'm simplifying a bit in the description below. See references for more information.
How does virtualization work at a low level?
In this case the VM manager takes over CPU ring 0 (or the 'root mode' in newer CPUs) and intercepts all privileged calls made by the guest OS to create the illusion that the guest OS has its own hardware. Fun fact: before 1998 it was thought to be impossible to achieve this on the x86 architecture because there was no way to do this kind of interception. The folks at VMware had the idea to rewrite the executable bytes in memory for the privileged calls of the guest OS to achieve this. The net effect is that virtualization allows you to run two completely different OSes on the same hardware. Each guest OS goes through the whole process of bootstrapping, loading the kernel, etc. You can have very tight security; for example, a guest OS can't get full access to the host OS or other guests and mess things up.
How do containers work at a low level?
People, including some Google employees, implemented a new kernel-level feature called namespaces (although the idea existed before). One function of the OS is to allow sharing of global resources like network and disk among processes. What if these global resources were wrapped in namespaces so that they are visible only to those processes that run in the same namespace? Say, you can get a chunk of disk and put that in namespace X, and then processes running in namespace Y can't see or access it. Similarly, processes in namespace X can't access anything in memory that is allocated to namespace Y.
Of course, processes in X can't see or talk to processes in namespace Y. This provides a kind of virtualization and isolation for global resources. This is how Docker works: each container runs in its own namespace but uses exactly the same kernel as all other containers. The isolation happens because the kernel knows the namespace that was assigned to the process, and during API calls it makes sure that the process can only access resources in its own namespace.
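The "kernel checks the namespace on every call" idea can be illustrated with a toy model. Everything here (the Kernel class and its methods) is hypothetical and heavily simplified: real Linux has separate namespace types (PID, network, mount and so on) rather than one label per process.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Toy model: one shared kernel that tags every process and resource with a namespace."""
    process_ns: dict = field(default_factory=dict)   # pid -> namespace label
    resource_ns: dict = field(default_factory=dict)  # resource name -> namespace label

    def spawn(self, pid: int, namespace: str) -> None:
        self.process_ns[pid] = namespace

    def add_resource(self, name: str, namespace: str) -> None:
        self.resource_ns[name] = namespace

    def visible_resources(self, pid: int) -> list:
        """What a 'system call' made by this process is allowed to see."""
        ns = self.process_ns[pid]
        return sorted(r for r, owner in self.resource_ns.items() if owner == ns)


if __name__ == "__main__":
    kernel = Kernel()
    kernel.spawn(1, "X")
    kernel.spawn(2, "Y")
    kernel.add_resource("disk-a", "X")
    kernel.add_resource("disk-b", "Y")
    print(kernel.visible_resources(1))  # the process in X sees only X's resources
    print(kernel.visible_resources(2))  # the process in Y sees only Y's resources
```

Note that there is a single Kernel object serving both processes, which is exactly the difference from a VM, where each guest would have its own.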
The limitations of containers vs VMs should be obvious now: you can't run a completely different OS in a container like you can in a VM. However, you can run different distros of Linux because they share the same kernel. The isolation level is not as strong as in a VM.
In fact, there was a way for a 'guest' container to take over the host in early implementations. Also, you can see that when you load a new container, an entire new copy of the OS doesn't start like it does in a VM. All containers share the same kernel. This is why containers are lightweight. Also, unlike a VM, you don't have to pre-allocate a significant chunk of memory to containers, because we are not running a new copy of the OS.
This makes it possible to run thousands of containers on one OS while sandboxing them, which might not be possible if we were running a separate copy of the OS in each VM.
Wow, thanks for the great low-level explanation (and historical facts). I was looking for that and it is not found above. What do you mean by 'you can run different distros of Linux because they do share the same kernel'? Are you saying that a guest container must have the exact same Linux kernel version, or that it doesn't matter?
If it doesn't matter, what happens if I invoke an OS command on the guest that is only supported by the guest kernel? Or, for example, take a bug fixed in the guest kernel but not in the host kernel.
All guests would manifest the bug, correct? Even though the guests were patched.
– Jun 9 '16 at 21:23
I like Ken Cochrane's answer. But I want to add an additional point of view, not covered in detail here. In my opinion Docker also differs in the whole process. In contrast to VMs, Docker is not (only) about optimal sharing of hardware resources; moreover, it provides a 'system' for packaging applications (preferably, but not necessarily, as a set of microservices). To me it fits in the gap between developer-oriented tools like rpm, packages, npm + Git on one side and ops tools like VMware, Xen, you name it, on the other.
Why is deploying software to a docker image (if that's the right term) easier than simply deploying to a consistent production environment?
Your question assumes some consistent production environment. But how do you keep it consistent?
Consider some number (10) of servers and applications, and stages in the pipeline. To keep this in sync you'll start to use something like Puppet, or your own provisioning scripts, unpublished rules and/or a lot of documentation.
In theory servers can run indefinitely, and be kept completely consistent and up to date. In practice it is hard to manage a server's configuration completely, so there is considerable scope for configuration drift and unexpected changes to running servers. There is a known pattern to avoid this, the so-called immutable server. But the immutable server pattern was not loved, mostly because of the limitations of the VMs that were used before Docker. Dealing with images several gigabytes in size, and moving those big images around just to change some fields in the application, was very laborious.
With a Docker ecosystem, you will never need to move gigabytes around for 'small changes' (thanks to aufs and the Registry), and you don't need to worry about losing performance at runtime by packaging applications into a Docker container. You don't need to worry about versions of that image. And finally, you will often be able to reproduce complex production environments even on your Linux laptop (don't call me if it doesn't work in your case ;)). And of course you can start Docker containers in VMs (it's a good idea). Reduce your server provisioning on the VM level. All of the above can be managed by Docker. Meanwhile, Docker uses its own implementation, 'libcontainer', instead of LXC. But LXC is still usable.
Docker isn't a virtualization methodology. It relies on other tools that actually implement container-based virtualization or operating system level virtualization. For that, Docker initially used the LXC driver, then moved to libcontainer, which is now renamed runc. Docker primarily focuses on automating the deployment of applications inside application containers.
Application containers are designed to package and run a single service, whereas system containers are designed to run multiple processes, like virtual machines. So, Docker is considered a container management or application deployment tool on containerized systems. To see how it differs from other kinds of virtualization, let's go through virtualization and its types. Then it will be easier to understand the difference.
Virtualization
In its conceived form, it was considered a method of logically dividing mainframes to allow multiple applications to run simultaneously. However, the scenario changed drastically when companies and open source communities were able to provide a method of handling the privileged instructions in one way or another and allow multiple operating systems to run simultaneously on a single x86-based system.
Hypervisor
The hypervisor handles creating the virtual environment on which the guest virtual machines operate. It supervises the guest systems and makes sure that resources are allocated to the guests as necessary. The hypervisor sits in between the physical machine and the virtual machines and provides virtualization services to the virtual machines. To do this, it intercepts the guest operating system's operations on the virtual machines and emulates them on the host machine's operating system. The rapid development of virtualization technologies, primarily in the cloud, has driven the use of virtualization further by allowing multiple virtual servers to be created on a single physical server with the help of hypervisors such as Xen, VMware Player, KVM, etc., and by the incorporation of hardware support in commodity processors, such as Intel VT and AMD-V.
Types of Virtualization
The virtualization method can be categorized based on how it mimics hardware to a guest operating system and emulates the guest operating environment. Primarily, there are three types of virtualization: emulation, paravirtualization, and container-based virtualization.
Emulation
Emulation, also known as full virtualization, runs the virtual machine OS kernel entirely in software. The hypervisor used in this type is known as a Type 2 hypervisor.
It is installed on top of the host operating system, which is responsible for translating guest OS kernel code to software instructions. The translation is done entirely in software and requires no hardware involvement. Emulation makes it possible to run any non-modified operating system that supports the environment being emulated. The downside of this type of virtualization is the additional system resource overhead, which leads to a decrease in performance compared to other types of virtualization. Examples in this category include VMware Player, VirtualBox, QEMU, Bochs, Parallels, etc.
Paravirtualization
Paravirtualization, also known as a Type 1 hypervisor, runs directly on the hardware, or “bare-metal”, and provides virtualization services directly to the virtual machines running on it. It helps the operating system, the virtualized hardware, and the real hardware to collaborate to achieve optimal performance. These hypervisors typically have a rather small footprint and do not, themselves, require extensive resources. Examples in this category include Xen, KVM, etc.
Container-based Virtualization
Container-based virtualization, also known as operating system-level virtualization, enables multiple isolated executions within a single operating system kernel. It has the best possible performance and density and features dynamic resource management. The isolated virtual execution environment provided by this type of virtualization is called a container and can be viewed as a traced group of processes.
The concept of a container is made possible by the namespaces feature added to Linux kernel version 2.6.24. The container adds its ID to every process and adds new access control checks to every system call. It is accessed by the clone() system call, which allows creating separate instances of previously-global namespaces.
Namespaces can be used in many different ways, but the most common approach is to create an isolated container that has no visibility or access to objects outside the container. Processes running inside the container appear to be running on a normal Linux system although they are sharing the underlying kernel with processes located in other namespaces; the same goes for other kinds of objects. For instance, when using namespaces, the root user inside the container is not treated as root outside the container, adding additional security. The Linux Control Groups (cgroups) subsystem, the next major component enabling container-based virtualization, is used to group processes and manage their aggregate resource consumption. It is commonly used to limit memory and CPU consumption of containers.
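As a small illustration of the memory limit mentioned above, the sketch below reads the limit the kernel has recorded for the current cgroup. It assumes cgroup v2 mounted at /sys/fs/cgroup (the path is my assumption and varies by system); on cgroup v1, on non-Linux hosts, or in the root cgroup the file is simply absent and the function returns None.

```python
from pathlib import Path
from typing import Optional


def parse_memory_max(text: str) -> Optional[int]:
    """Parse the contents of a cgroup v2 memory.max file.

    The file holds either the literal string "max" (no limit) or a byte count.
    """
    value = text.strip()
    return None if value == "max" else int(value)


def current_memory_limit(path: str = "/sys/fs/cgroup/memory.max") -> Optional[int]:
    """Return the memory limit in bytes, or None if unlimited or unavailable.

    The default path is an assumption: cgroup v2 mounted at /sys/fs/cgroup.
    """
    p = Path(path)
    if not p.exists():
        return None  # not Linux, cgroup v1, or no memory.max in this cgroup
    return parse_memory_max(p.read_text())


if __name__ == "__main__":
    limit = current_memory_limit()
    print("no limit found" if limit is None else f"limit: {limit} bytes")
```

Running this inside a container started with a memory limit and again on the host should show the difference, provided the assumptions about the cgroup layout hold on your system.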
Since a containerized Linux system has only one kernel and the kernel has full visibility into the containers, there is only one level of resource allocation and scheduling. Several management tools are available for Linux containers, including LXC, LXD, systemd-nspawn, lmctfy, Warden, Linux-VServer, OpenVZ, Docker, etc.
Containers vs Virtual Machines
Unlike a virtual machine, a container does not need to boot the operating system kernel, so containers can be created in less than a second. This feature makes container-based virtualization unique and more desirable than other virtualization approaches. Since container-based virtualization adds little or no overhead to the host machine, it has near-native performance. For container-based virtualization, no additional software is required, unlike other virtualizations. All containers on a host machine share the scheduler of the host machine, saving the need for extra resources. Container states (Docker or LXC images) are small compared to virtual machine images, so container images are easy to distribute.
Resource management in containers is achieved through cgroups. Cgroups do not allow containers to consume more resources than allocated to them. However, as of now, all resources of the host machine are visible inside containers, but can't be used. This can be seen by running top or htop in containers and on the host machine at the same time. The output across all environments will look similar.
Update: How does Docker run containers on non-Linux systems?
If containers are possible because of features available in the Linux kernel, then the obvious question is how non-Linux systems run containers.
Both Docker for Mac and Docker for Windows use Linux VMs to run the containers. Docker Toolbox used to run containers in VirtualBox VMs. But the latest Docker uses Hyper-V on Windows and Hypervisor.framework on Mac.
Now, let me describe how Docker for Mac runs containers in detail. Docker for Mac uses HyperKit to emulate the hypervisor capabilities, and HyperKit uses Hypervisor.framework at its core. Hypervisor.framework is Mac's native hypervisor solution. HyperKit also uses VPNKit and DataKit to namespace network and filesystem respectively. The Linux VM that Docker runs on Mac is read-only. However, you can bash into it by running: screen /Library/Containers/com.docker.docker/Data/vms/0/tty.
Now, we can even check the Kernel version of this VM: # uname -a Linux linuxkit-01 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:8664 Linux. All containers run inside this VM. There are some limitations to hypervisor.framework. Because of that Docker doesn't expose docker0 network interface in Mac. So, you can't access containers from the host.
As of now, docker0 is only available inside the VM. Hyper-V is the native hypervisor in Windows.
They are also trying to leverage Windows 10's capabilities to run Linux systems natively.
Through this post we are going to draw some lines of difference between VMs and LXCs.
Let's first define them.
VM: A virtual machine emulates a physical computing environment, but requests for CPU, memory, hard disk, network and other hardware resources are managed by a virtualization layer which translates these requests to the underlying physical hardware. In this context the VM is called the guest, while the environment it runs on is called the host.
LXCs: Linux Containers (LXC) are operating system-level capabilities that make it possible to run multiple isolated Linux containers on one control host (the LXC host). Linux Containers serve as a lightweight alternative to VMs as they don’t require hypervisors, viz. VirtualBox, KVM, Xen, etc.
Now unless you were drugged by Alan (Zach Galifianakis, from the Hangover series) and have been in Vegas for the last year, you will be well aware of the tremendous spurt of interest in Linux container technology. To be specific, one container project that has created a buzz around the world in the last few months is Docker, leading to some echoing opinions that cloud computing environments should abandon virtual machines (VMs) and replace them with containers due to their lower overhead and potentially better performance. But the big question is: is it feasible? Will it be sensible?
LXCs are scoped to an instance of Linux. It might be different flavors of Linux (e.g. an Ubuntu container on a CentOS host, but it’s still Linux). Similarly, Windows-based containers are scoped to an instance of Windows. VMs, by contrast, have a much broader scope: using hypervisors you are not limited to the Linux or Windows operating systems. LXCs have low overhead and better performance compared to VMs. Docker, which was built on the shoulders of LXC technology, has provided developers with a platform to run their applications and at the same time has empowered operations people with a tool that allows them to deploy the same container on production servers or data centers. It tries to make the experience between a developer running, booting and testing an application and an operations person deploying that application seamless, because this is where all the friction lies, and the purpose of DevOps is to break down those silos. So the best approach is for cloud infrastructure providers to advocate an appropriate use of VMs and LXC, as they are each suited to handle specific workloads and scenarios. Abandoning VMs is not practical as of now.
So both VMs and LXCs have their own individual existence and importance.
I have a hard time understanding the '(e.g. A Ubuntu container on a Centos host but it’s still Linux)' part of the containers. The way I understand it is that containers share the host kernel. Suppose I have a host VM running Linux kernel 4.6, for example, with several guest VMs running Linux kernels 2.4, 2.6, 3.2, 4.1 and 4.4. If I execute commands specific to one of those kernels, I will get the guest kernel's behavior (and not the host's). But if my guest VMs are containers now, would the executed command be determined by the host kernel?
– Jun 9 '16 at 21:06
Most of the answers here talk about virtual machines.
I'm going to give you a one-liner response to this question that has helped me the most over the last couple years of using Docker. It's this: Docker is just a fancy way to run a process, not a virtual machine. Now, let me explain a bit more about what that means. Virtual machines are their own beast.
I feel like explaining what Docker is will help you understand this more than explaining what a virtual machine is. Especially because there are many fine answers here telling you exactly what someone means when they say 'virtual machine'.
A Docker container is just a process (and its children) that is compartmentalized from the rest of the processes using kernel features such as namespaces and cgroups inside the host system's kernel. You can actually see your Docker container processes by running ps aux on the host. For example, starting apache2 'in a container' is just starting apache2 as a special process on the host.
It's just been compartmentalized from other processes on the machine. It is important to note that your containers do not exist outside of your containerized process' lifetime.
When your process dies, your container dies. That's because Docker replaces pid 1 inside your container with your application (pid 1 is normally the init system). This last point about pid 1 is very important. As for the filesystem used by each of those container processes, Docker uses UnionFS-backed images, which is what you're downloading when you do a docker pull ubuntu. Each 'image' is just a series of layers and related metadata. The concept of layering is very important here.
Each layer is just a change from the layer underneath it. For example, when you delete a file in your Dockerfile while building a Docker container, you're actually just creating a layer on top of the last layer which says 'this file has been deleted'. Incidentally, this is why you can delete a big file from your filesystem, but the image still takes up the same amount of disk space. The file is still there, in the layers underneath the current one.
Layers themselves are just tarballs of files. You can test this out with docker save --output /tmp/ubuntu.tar ubuntu and then cd /tmp && tar xvf ubuntu.tar. Then you can take a look around. All those directories that look like long hashes are actually the individual layers. Each one contains files (layer.tar) and metadata (json) with information about that particular layer.
Those layers just describe changes to the filesystem which are saved as a layer 'on top of' its original state. When reading the 'current' data, the filesystem reads data as though it were looking only at the top-most layers of changes. That's why the file appears to be deleted, even though it still exists in 'previous' layers, because the filesystem is only looking at the top-most layers.
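Here is a rough sketch of that top-down lookup, including why a "deleted" file still costs disk space. It models layers as dictionaries and uses a WHITEOUT marker of my own invention for deletions; real union filesystems record deletions differently, so treat this as an illustration only.

```python
WHITEOUT = object()  # marker meaning "this path was deleted in this layer"


def read_file(layers: list, path: str) -> bytes:
    """Look up path from the top-most layer down; layers[0] is the base layer."""
    for layer in reversed(layers):
        if path in layer:
            data = layer[path]
            if data is WHITEOUT:
                # A higher layer says the file is gone, so stop looking lower.
                raise FileNotFoundError(path)
            return data
    raise FileNotFoundError(path)


def stored_bytes(layers: list) -> int:
    """Total bytes kept on disk across all layers, hidden files included."""
    return sum(len(d) for layer in layers for d in layer.values() if d is not WHITEOUT)


if __name__ == "__main__":
    base = {"/big.bin": b"x" * 1000, "/etc/hostname": b"box"}
    top = {"/big.bin": WHITEOUT}  # equivalent of deleting the file in a later build step
    layers = [base, top]

    print(read_file(layers, "/etc/hostname"))
    try:
        read_file(layers, "/big.bin")
    except FileNotFoundError:
        print("/big.bin appears deleted")
    print(stored_bytes(layers), "bytes still stored")
```

The file is invisible to the reader, yet the bytes in the base layer are untouched, which matches the disk-usage surprise described earlier.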
This allows completely different containers to share their filesystem layers, even though some significant changes may have happened to the filesystem on the top-most layers in each container. This can save you a ton of disk space, when your containers share their base image layers. However, when you mount directories and files from the host system into your container by way of volumes, those volumes 'bypass' the UnionFS, so changes are not stored in layers. Networking in Docker is achieved by using an ethernet bridge (called docker0 on the host), and virtual interfaces for every container on the host. It creates a virtual subnet in docker0 for your containers to communicate 'between' one another. There are many options for networking here, including creating custom subnets for your containers, and the ability to 'share' your host's networking stack for your container to access directly. Docker is moving very fast.
Its documentation is some of the best documentation I've ever seen. It is generally well-written, concise, and accurate. I recommend you check the documentation available for more information, and trust the documentation over anything else you read online, including Stack Overflow. If you have specific questions, I highly recommend joining #docker on Freenode IRC and asking there (you can even use Freenode's for that!).