The future of AI – self-hosted open-source models?

In the past few weeks, I have been studying AI agents, especially in the context of the Hermes agent framework, which in my opinion is one of the best setups. The great thing about AI agents is that they are not just question-and-answer machines: they can carry out actual tasks, from writing e-books to generating social media posts and YouTube thumbnails, or even 3D printing.

On the other hand, the problem with AI agents is that API costs can explode. I watched tons of tutorials and reviews on YouTube and was surprised that many people were running their agents on Claude Opus 4.8, which is probably the best model but also by far the most expensive to run. When you run an agent, it doesn't just give you a straight answer: it thinks things through, looks at its skills and instructions, and then generates the output you want. This means you have no idea in advance how many tokens a task will consume. If your agent gets stuck in a loop, it can get expensive very quickly, and you might end up with monthly bills that go far beyond the subscription you are paying for.
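To see why a loop gets expensive, here is a rough Python sketch, assuming that every agent step re-sends the whole conversation so far as input and adds a fixed number of new output tokens. The prices and token counts are hypothetical placeholders, not any provider's real rates.

```python
def loop_cost(steps: int, base_prompt_tokens: int, tokens_per_step: int,
              input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Estimated total cost in dollars of `steps` agent iterations.

    Assumes each call re-sends the full context (prompt + all previous outputs)
    and generates `tokens_per_step` new output tokens.
    """
    total = 0.0
    context = base_prompt_tokens
    for _ in range(steps):
        total += context * input_usd_per_mtok / 1_000_000         # input: whole context again
        total += tokens_per_step * output_usd_per_mtok / 1_000_000  # output: new tokens
        context += tokens_per_step  # the new output joins the context for the next call
    return total


# Hypothetical prices: $5 per million input tokens, $25 per million output tokens.
short_run = loop_cost(10, 2_000, 500, 5.0, 25.0)
long_run = loop_cost(100, 2_000, 500, 5.0, 25.0)
print(f"10 steps: ${short_run:.2f}, 100 steps: ${long_run:.2f}")
```

With these made-up numbers, 10 steps cost roughly $0.34 while 100 steps cost roughly $14.6: ten times the steps, but more than forty times the bill, because the re-sent context keeps growing.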

This was the main reason why I hesitated for a long time to get into AI agents. Another thing I don't appreciate is how people in these tutorials handle API keys and MCP access. If keys are not properly hidden and protected, a prompt injection could expose them. If your API key is compromised and you don't notice, somebody else can use it and you pay the bill. For these reasons, I started with AI agents at a very slow pace, trying to eliminate risks and problems first.
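One common way to reduce that risk is to keep keys out of prompts, skill files and code entirely, read them from the environment, and scrub them from anything the agent logs or echoes back. A minimal Python sketch; the variable name `LLM_API_KEY` is a hypothetical placeholder:

```python
import os


def load_api_key(var_name: str = "LLM_API_KEY") -> str:
    """Read an API key from the environment instead of hard-coding it.

    LLM_API_KEY is a hypothetical variable name; use whatever your setup expects.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key


def redact(text: str, secrets: list) -> str:
    """Replace every occurrence of each known secret with a placeholder."""
    for secret in secrets:
        if secret:  # skip empty strings, which would otherwise match everywhere
            text = text.replace(secret, "[REDACTED]")
    return text
```

Redaction is only a last line of defence: it stops a key from being printed, but an injected instruction could still misuse a tool that holds the key, so limiting what each agent can access matters too.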

How to limit costs?

For most tasks, using Claude Opus, the best model, is overkill. To scrape a website, a much lighter and cheaper model is more than enough. I have a Claude Pro account, but the subscription can't be used with Hermes or OpenClaw. So what I do is use Claude to generate the planning and the PRDs for my projects, and then feed these to my manager agent in Hermes. That's where costs could explode if no guardrails are in place. I explored several different setups:
1- API from Anthropic or OpenAI
2- API from services like OpenRouter or other free or cheap services
3- Open-source models on a local machine
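Whichever of these setups you pick, a hard spending cap per agent run is one kind of guardrail. The sketch below is hypothetical and not a Hermes feature: `TokenBudget`, `run_agent` and the `step_fn` callback are illustrative names, and the per-million-token prices are whatever your provider charges.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run goes over its spending cap."""


class TokenBudget:
    """Hypothetical per-run spending cap for an agent loop."""

    def __init__(self, max_usd: float, input_usd_per_mtok: float, output_usd_per_mtok: float):
        self.max_usd = max_usd
        self.input_price = input_usd_per_mtok
        self.output_price = output_usd_per_mtok
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record the cost of one model call and stop the run once the cap is exceeded.

        The check happens after the call, so the last call can overshoot the cap.
        """
        self.spent_usd += (input_tokens * self.input_price
                           + output_tokens * self.output_price) / 1_000_000
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.4f}, cap is ${self.max_usd:.2f}")


def run_agent(step_fn, budget: TokenBudget, max_steps: int = 50) -> int:
    """Run step_fn until it reports done; return the number of steps taken.

    step_fn is a hypothetical callback returning (done, input_tokens, output_tokens).
    The loop stops early if the budget is exceeded or max_steps is reached.
    """
    for step in range(1, max_steps + 1):
        done, input_tokens, output_tokens = step_fn()
        budget.charge(input_tokens, output_tokens)
        if done:
            return step
    raise RuntimeError(f"agent did not finish within {max_steps} steps")
```

Pairing a dollar cap with a step limit catches both expensive individual calls and agents that loop without making progress.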

To be honest, solution 1 doesn't make any sense to me. It's simply too expensive. While I'm learning how to work with Hermes, I don't want to rack up thousands of dollars in API costs just to run a pseudo-swarm of agents. Solution 3 seems to me to be the long-term holy grail. Very powerful open-source models are available that you can download and run locally on your own computer. The advantage: no more API costs for AI! The catch is that you need a computer powerful enough to run such models. I initially thought that only Nvidia cards could handle this, but soon realised that you need several cards to get enough VRAM for these models. Apple and AMD offer alternatives that are not only somewhat cheaper than Nvidia but also use far less power.
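A rough rule of thumb for how much memory a model needs: parameter count times bytes per weight, plus some headroom for the context (KV cache) and runtime. The 20% overhead below is an assumption, not a measured figure; real usage depends on context length and the inference engine.

```python
def estimate_model_memory_gb(params_billion: float, bits_per_weight: float,
                             overhead: float = 1.2) -> float:
    """Rough memory (GB) to run a model: weights plus a fudge factor.

    Weights take params * bits / 8 bytes, so 1 billion params at 8 bits is ~1 GB.
    `overhead` (20% by default) is an assumed allowance for the KV cache and runtime
    buffers; the real figure depends on context length and the inference engine.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * overhead


for bits in (16, 8, 4):
    print(f"31B model at {bits}-bit: ~{estimate_model_memory_gb(31, bits):.0f} GB")
```

By this estimate a 31B model needs about 74 GB at 16-bit precision but under 20 GB when quantised to 4 bits, which is why quantised models and large pools of memory matter so much for local hosting.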

My long-term goal is to go that route, but only once I have learned how to work with agents and see that they are genuinely useful to me.

My free setup

To test things out, I decided to run Hermes on my old desktop computer and use the only free API model I have found: Gemma 4. It belongs to Google's model family, and you can get a free API key in Google AI Studio. I first ran Gemini 3.5 Flash, but wasn't happy with the results and hit the rate limits fast. Then I set up Gemma 4 31B, and it was actually much better than Gemini for agentic work. I have to say that Gemma 4 has become one of my favourite models to run. This setup is basically free, but sometimes the servers are too busy and my agent stops. Another limitation is that my computer needs to be on for this to work, so I can't really use cron jobs, and the Telegram or Discord connection only works while the computer is up and running. I'm using the Hermes desktop app, which runs quite well on Windows and is good enough to experiment, create agents and test them.
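When the free servers are busy or a rate limit is hit, retrying with exponential backoff instead of letting the agent stop can help. This is a generic Python sketch, not something Hermes or Google AI Studio provides: `fn` stands for whatever makes the model call, and which exception types signal "busy" or "rate limited" depends on the client library you use, so `retry_on` should be narrowed accordingly.

```python
import random
import time


def call_with_retry(fn, max_attempts: int = 5, base_delay: float = 2.0,
                    max_delay: float = 60.0, retry_on: tuple = (Exception,),
                    sleep=time.sleep):
    """Call fn() and retry on errors listed in retry_on, with exponential backoff.

    The wait before retry n (0-based) is min(max_delay, base_delay * 2**n) plus up to
    50% random jitter, so several agents do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: hand the error back to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter defaults to `time.sleep` but can be swapped out, which makes the retry logic easy to test without actually waiting.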

[Image: the Hermes desktop app]

When I look at this, a computer capable of running big models costs between $4,000 and $5,000, which is actually not that much if you consider that you will never need to pay API costs again. Another big advantage is that your data is not sent to API providers: it goes to your own computer and stays there. A framework like Hermes lets you define what each agent is allowed to do, which MCP servers and APIs it can access, and which model it uses. With a local infrastructure, you become model-agnostic: you download the models you want, and each one is loaded into RAM when an agent uses it.
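The break-even point is simple arithmetic: hardware cost divided by the monthly API bill it replaces, after subtracting the machine's electricity cost. A sketch in Python; the $4,500 hardware price sits in the range above, while the $300 monthly API spend and $20 power cost are hypothetical:

```python
def break_even_months(hardware_usd: float, monthly_api_usd: float,
                      monthly_power_usd: float = 0.0) -> float:
    """Months until a local machine pays for itself compared with paying for API calls.

    The monthly saving is the API bill you avoid minus the machine's electricity cost.
    Returns infinity if local hosting never pays off on cost alone.
    """
    monthly_saving = monthly_api_usd - monthly_power_usd
    if monthly_saving <= 0:
        return float("inf")
    return hardware_usd / monthly_saving


# $4,500 is within the hardware range above; $300 API and $20 power are hypothetical.
print(f"{break_even_months(4_500, 300, 20):.1f} months")
```

With those assumed numbers the machine pays for itself in about 16 months; at a lower API spend the payback period stretches out accordingly.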

[Image: a small extract of the open-source models available on Ollama for local hosting]
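To make the per-agent permissions described above concrete, here is a hypothetical Python sketch of an allowlist that pins each agent to one model and a fixed set of MCP servers and APIs. It is not Hermes's actual configuration format; the agent, model and tool names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical per-agent policy: the model it runs and the tools it may use."""
    name: str
    model: str
    allowed_mcp_servers: frozenset = field(default_factory=frozenset)
    allowed_apis: frozenset = field(default_factory=frozenset)

    def check_tool(self, kind: str, tool: str) -> None:
        """Raise PermissionError unless `tool` of the given kind ("mcp" or "api") is allowed."""
        allowed = {"mcp": self.allowed_mcp_servers, "api": self.allowed_apis}.get(kind)
        if allowed is None:
            raise ValueError(f"unknown tool kind: {kind!r}")
        if tool not in allowed:
            raise PermissionError(f"agent {self.name!r} may not use {kind} {tool!r}")


# Placeholder names: a small local model for scraping, a larger one for the manager.
scraper = AgentPolicy("scraper", "local-small-model",
                      allowed_mcp_servers=frozenset({"web-fetch"}))
manager = AgentPolicy("manager", "local-large-model",
                      allowed_mcp_servers=frozenset({"web-fetch", "filesystem"}),
                      allowed_apis=frozenset({"telegram"}))

scraper.check_tool("mcp", "web-fetch")       # allowed, returns None
try:
    scraper.check_tool("mcp", "filesystem")  # not on the scraper's list
except PermissionError as err:
    print(err)
```

Denying by default and listing only what each agent needs also limits what a prompt injection can reach.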

For me, this setup with locally hosted models is the only solution for extensive work with AI agents. What do you think about open-source models?


With @ph1102, I'm running the @liotes project.

Please consider supporting our Witness nodes:

Comments

I used the "famous" ones not because they're open or not, but because they were the most readily available, with lots of resources on how to get everything up and running. 😜 !PIZZA !LOL !LUV

ChatGPT and Claude dominate the content world, and you will find many more tutorials on how to use them than on Gemma 4 :-)

I think I could set up an AI at home very easily, but as far as I know I have no use for it at the moment.
The comparisons you make are hard to judge without knowing the hardware specs of your old desktop. It could be an old 486-DX100 CPU with 8 MB of memory 😁, but I guess it's something more modern, and maybe no longer Windows 11 compatible?

What hardware would you need to run it at home without too many problems? Will it run on a VM server? Does it require a dedicated GPU? I have played with Amuse 3.0 on my PC to make graphics, and it works fine with 16 GB of VRAM. But do all AI servers need VRAM?

You can actually run a very small Gemma 4 model with about 8 GB of VRAM on your computer, but when I tried that on my old one, it made a lot of noise and was very slow (I have only 16 GB of RAM, Windows uses a big part of it, and there is no dedicated GPU). So it didn't work. I researched a bit, and these are the best solutions for hosting really big models locally:

  • AMD Ryzen AI Max+ 395 with 128 GB: in my opinion, this is the powerhouse for locally hosted AIs
  • NVIDIA DGX Spark, which has a similar design with 128 GB but is slightly more expensive
  • Mac Studio, but it's difficult to find one with enough RAM, and high-RAM configurations get quite expensive

All three machines have fairly low power consumption and can run pretty big models efficiently. They share a similar design, with RAM shared between the CPU and GPU.

I think with 16 GB of VRAM you could actually run gemma 4:e2b or even gemma 4:e4b. You would need to install Ollama and download the model through it. Then you could run it directly in Ollama and use it as a local LLM. It probably wouldn't be enough to run agents, but it would be enough for a local LLM. You would need to check whether your computer can handle the workload :-)
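Once Ollama is running and a model has been pulled, it can also be called from a script through Ollama's local HTTP API (by default on port 11434). A minimal standard-library Python sketch; the model tag `gemma4:e4b` is an assumption based on the names above, so check the exact tag with `ollama list`:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a locally pulled model."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(OLLAMA_URL, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def ask_local_model(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

For example, `ask_local_model("gemma4:e4b", "Explain what an MCP server is in two sentences.")` returns the model's answer as a string. Everything stays on the local machine; the generous timeout allows for slow generation on modest hardware.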

Ahh, it even seems to run on Linux, let's make an Ubuntu VM and play with it :)

I've never tried installing an AI locally before, though I've read that it's possible even on older machines, where it probably runs a lot slower. There are tons of models here: https://huggingface.co/models
As for privacy, I use duck.ai (I think it used to be a search engine).
