You can actually run a very small Gemma 4 model with about 8 GB of VRAM on your computer. I tried that on my old machine, but it made a lot of noise and was very slow (I only have 16 GB of RAM, Windows uses a big part of it, and there is no dedicated GPU), so it didn't really work. I researched a bit, and these are the best options for hosting really big models locally:
All three are actually quite low in power consumption and can run pretty big models efficiently. They are built in a similar way, with shared RAM.
I think with 16 GB of VRAM you could actually run gemma 4:e2b or even gemma 4:e4b. You would need to install Ollama and download the model through it. Then you can run it directly in Ollama and use it as a local LLM. It probably wouldn't be enough to run agents, but it should be enough for a local LLM. You would need to check whether your computer can handle the workload :-)
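If you want to talk to the model from a script instead of the Ollama prompt, here is a rough sketch of how that could look. It assumes Ollama is already running on its default local port (11434) and that you have already downloaded a model. The tag `gemma4:e2b` is only a placeholder I made up for illustration, so check `ollama list` for the real tag on your machine. The helper names `build_payload` and `ask` are mine too, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: placeholder tag, replace it with whatever `ollama list` shows on your machine.
MODEL_TAG = "gemma4:e2b"


def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a single, non-streaming generate request."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask(prompt: str, model: str = MODEL_TAG, url: str = OLLAMA_URL,
        timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # Passing `data` makes urllib send a POST request.
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    # Only works while the Ollama server is running and the model is downloaded.
    print(ask("Explain in one sentence what a local LLM is."))
```

On a machine without a dedicated GPU the first answer can take a while, which is why the timeout is set generously here.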
Ahh, it even seems to run on Linux. Let's set up an Ubuntu VM and play with it :)