I'm looking at using OCI at $DAY_JOB for model distribution for fleets of machines also so it's good to see it's getting some traction elsewhere.
OCI has some benefits over other systems, namely that tiered caching/pull-through is already pretty battle-tested as is signing etc, beating more naive distribution methods for reliability, performance and trust.
If combined with eStargz or zstd:chunked it's also pretty nice for distributed systems, as long as you can slice things up into files in such a way that not every machine needs to pull the full model weights.
Failing that there are P2P distribution mechanisms for OCI (Dragonfly etc) that can lessen the burden without resorting to DIY on Bittorrent or similar.
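To make the "slice things up" idea concrete, here's a rough sketch of a selective pull against a plain OCI registry (the registry URL, repo name, and shard annotation are invented for illustration, and auth is omitted):

    import requests

    REGISTRY = "https://registry.example.com"  # hypothetical internal registry
    REPO = "models/my-llm"                     # hypothetical repository
    TAG = "v1"

    # Fetch the manifest for the tag (standard OCI distribution API).
    manifest = requests.get(
        f"{REGISTRY}/v2/{REPO}/manifests/{TAG}",
        headers={"Accept": "application/vnd.oci.image.manifest.v1+json"},
        timeout=30,
    ).json()

    # Keep only the layers this machine needs, e.g. via a made-up annotation.
    wanted = [
        layer for layer in manifest["layers"]
        if layer.get("annotations", {}).get("org.example.shard") == "0"
    ]

    # Blobs are content-addressed, so a pull-through cache or P2P mesh can
    # serve these fetches transparently.
    for layer in wanted:
        with requests.get(
            f"{REGISTRY}/v2/{REPO}/blobs/{layer['digest']}",
            stream=True, timeout=300,
        ) as blob:
            with open(layer["digest"].split(":", 1)[1], "wb") as out:
                for chunk in blob.iter_content(1 << 20):
                    out.write(chunk)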
wofo 2 minutes ago [-]
I've been pretty disappointed with eStargz performance, though... Do you have any numbers you can share? All over the internet people refer to numbers from 10 years ago, from workloads that don't seem realistic at all. In my experiments it didn't provide a significant enough speedup.
(I ended up developing an alternative pull mechanism, which is described in https://outerbounds.com/blog/faster-cloud-compute though note that the article is a bit light on the technical details)
Damn, that's handy. I now wonder how much trouble it would be to make a CSI driver that does this, for backporting to the 1.2x clusters (since I don't think that Kubernetes does backports for anything).
israrkhan 10 hours ago [-]
Be aware of licensing restrictions. Docker Desktop is free for personal use, but it requires a paid license if you work for an organization with 250+ employees. This feature seems to be available in Docker Desktop only.
francesco-corti 3 hours ago [-]
Note: I'm part of the team developing this feature.
Soon (end of May, according to the current roadmap) this feature will also be available with the Docker Engine (so not only as part of Docker Desktop).
As a reminder, Docker Engine is the Community Edition, Open Source and free for everyone.
cmiles74 1 hours ago [-]
My understanding has always been that Docker Engine was only available directly on Linux. If you are running another operating system then you will need to run Docker Desktop (which, in turn, runs a Docker Engine instance in a VM).
This comment kind of makes it sound like maybe you can run Docker Engine directly on these operating systems (macOS, Windows, etc.). Is that the case?
mdaniel 31 minutes ago [-]
I wanted to offer that products like Rancher Desktop, lima, and colima also launch a virtual machine and install docker on it, so one doesn't need Docker Desktop to do that. My experience has been that the choice of "frontend" to manage the VM and its software largely comes down to one's comfort level with the CLI, and/or how much customization one wishes over that experience.
dboreham 43 minutes ago [-]
Quick note that on Windows you don't need docker desktop. It's convenient, but regular docker can be run in WSL2 (which is the same VM that docker desktop uses).
daveguy 1 hours ago [-]
Is it still the case that you can't run Docker Engine Community Edition on a Windows machine?
leowoo91 10 hours ago [-]
I don't understand why they'd add another domain-specific command to a container manager and go out of scope for what the tool was designed for in the first place.
anentropic 3 hours ago [-]
gotta have an AI strategy to report to the board
saidinesh5 8 hours ago [-]
The main benefit I see for cloud platforms: caching/co-hosting various services based on model instead of (model + user's API layer on top).
For the end user, it would be one less deployment headache to worry about: not having to package ollama + the model into docker containers for deployment. Also a more standardized deployment for hardware accelerated models across platforms.
Havoc 5 hours ago [-]
Can’t say I'm a fan of packaging models as docker images. Feels forced - a solution in search of a problem.
The existing stack - a server and model file - works just fine. There doesn’t seem to be a need to jam an abstraction layer in there. The core problem docker solves just isn’t there
gardnr 4 hours ago [-]
> GPU acceleration on Apple silicon
There is at least one benefit. I'd be interested to see what their security model is.
cmiles74 1 hours ago [-]
Is this really a Docker feature, though? llama.cpp provides acceleration on Apple hardware; I guess you could create a Docker image with llama.cpp and an LLM and have mostly this feature.
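FWIW, from the client's point of view the two look much the same; something like this (the port and model name are placeholders for whatever your llama.cpp container or model runner exposes) works against any OpenAI-compatible endpoint:

    import requests

    # Placeholder endpoint: llama.cpp's llama-server, ollama, or Docker's
    # model runner can all be reached through an OpenAI-compatible /v1 route.
    BASE_URL = "http://localhost:8080/v1"

    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": "my-local-model",  # placeholder model name
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])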
rockwotj 14 hours ago [-]
Looks exactly like ollama but built into Docker desktop? Anyone know of any differences?
blitzar 9 hours ago [-]
Hear me out here ... it's like docker, but with Ai <pause for gasps and applause>.
Seems fair to raise 1bn at a valuation of 100bn. (Might roll the funds over into pitching Kubernetes, but with Ai next month)
danparsonson 8 hours ago [-]
What they really need is a Studio Ghibli'd version of their logo
ammo1662 13 hours ago [-]
They are using OCI artifacts to package models, so you can use your own registry to host these models internally. However, I just can't see any improvement compared with a simple FTP server. I don't think LLM models can adopt a hierarchical structure the way Docker images do, and thus they cannot leverage the benefits of layered file systems, such as caching and reuse (rough sketch of why below).
Yes, ollama also uses OCI, but currently only works with unauthenticated registries.
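To put the caching point concretely: registries and pull-through caches key everything on a blob's sha256 digest, so a single multi-gigabyte weights file is only reused when it's byte-for-byte identical. Rough sketch (the file names are hypothetical):

    import hashlib

    def blob_digest(path: str) -> str:
        # OCI blobs are content-addressed: the digest is the sha256 of the bytes.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return "sha256:" + h.hexdigest()

    # If any byte of the weights changes, the digest changes and the whole
    # blob is re-uploaded/re-downloaded; there is no layer-level reuse inside it.
    print(blob_digest("model-v1.gguf"))
    print(blob_digest("model-v1.1.gguf"))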
jesserwilliams 3 hours ago [-]
It's not the only one using OCI to package models. There's a CNCF project called KitOps (https://kitops.org) that has been around for quite a bit longer. It solves some of the limitations that using Docker has, one of those being that you don't have to pull the entire project when you want to work on it. Instead, you can pull just the data set, tuning, model, etc.
krick 14 hours ago [-]
They imply it should be somehow optimized for apple silicon, but, yeah, I don't understand what this is. If docker can use GPU, well, it should be able to use GPU in any container that makes use of it properly. If (say) ollama as an app doesn't use it properly, but they figured a way to do it better, it would make more sense to fix ollama. I have no idea why this should be a different app than, well, the very docker daemon itself.
mappu 13 hours ago [-]
All that work (AGX acceleration...) is done in llama.cpp, not ollama. Ollama's raison d'être is a docker-style frontend to llama.cpp, so it makes sense that Docker would encroach from that angle.
Looks like Docker is feeling left out of the GenAI bubble. It’s a little late…
bsenftner 4 hours ago [-]
I wonder if the adult kids of some Docker execs own Macs, and that's why they made it. Why on Earth is this not for the larger installed OSes, you know, the ones running Docker in production?
pridkett 3 hours ago [-]
Because the ones running Docker in production aren’t paying the license fees they make you pay to use Docker Desktop.
amouat 3 hours ago [-]
I'm pretty sure that's in development, it's just more difficult.
avs733 56 minutes ago [-]
I'm going to take a contrarian perspective to the theme of comments here...
There are currently very good uses for this and there are likely going to be more. There are increasing numbers of large generative AI models used in technical design work (e.g., semiconductor rules-based design/validation, EUV mask design, design optimization). Many/most don't need to run all the time. Some have licensing that is based on length of time running, credits, etc. Some are just huge and intensive, but not run very often in the design flow. Many are run on the cloud, but industrial customers are reluctant to run them on someone else's cloud.
Being able to have my GPU cluster/data center run a ton of different and smaller models during the day or early in the design, and then be turned over to a full CFD or validation run as your office staff goes home, seems to me to be useful. Especially if you are in any way getting billed by your vendor based on run time or similar. It can mean a more flexible hardware investment. The use case here is going to be Formula 1 teams, silicon vendors, etc. - not pure tech companies.
tuananh 13 hours ago [-]
they are ~2 years late.
ako 2 hours ago [-]
Doesn't matter. I have docker and ollama running, would be nice to ditch ollama and run everything through docker.