My Progress with Local AI - Running LLMs on an AMD Ryzen 7640HS


I've been learning about using AI locally for a few months now. First, I learned about quantization, llama.cpp, and the GGUF format. I managed to get some models running on my Steam Deck, though they were heavily quantized and not very useful. Models smaller than 3GB just haven't gotten there yet...
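
To make that concrete, here's a minimal sketch of what running a quantized GGUF model looks like with the llama-cpp-python bindings. The model path and settings below are placeholders, not the exact files or values I used.

```python
# Minimal sketch: loading a quantized GGUF model with llama-cpp-python.
# The model path is a placeholder; any Q4-style GGUF file works the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-small-model.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=4096,      # context window to allocate
    n_threads=8,     # CPU threads; tune for your machine
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```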

Now I finally got a Zen 4 device with a decent amount of RAM, and I went wild. I downloaded a bunch of models and set out to test the limits of what I can do with this new computer.

I dabbled with LM Studio and I'm thinking of adding Open WebUI too, but right now I'm using KoboldCpp as both the server and the frontend. I like how it provides an API as well as web-based UIs accessible from my tablet and other devices connected to the same router.
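
As a rough sketch of what I mean by "provides an API": once KoboldCpp is running, any device on the same network can call its KoboldAI-compatible endpoint. The IP address, port, and request fields below are assumptions (5001 is the usual default port), not my exact setup.

```python
# Rough sketch: calling a KoboldCpp server from another device on the LAN.
# The IP address is made up; 5001 is KoboldCpp's usual default port.
import requests

payload = {
    "prompt": "Write a haiku about RAM bandwidth.",
    "max_length": 80,     # tokens to generate
    "temperature": 0.7,
}

resp = requests.post("http://192.168.1.50:5001/api/v1/generate",
                     json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```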

Hugging Face is full of models I want to try, but my daily internet bandwidth is limited, so I can only download one big model or a few smaller ones at a time...

I managed to get Qwen 30B A3B running, and it's the most useful model I've run locally (for now)! The fastest generation speed I got was 18 tokens/s, with my RAM bandwidth probably being the bottleneck. Surprisingly, I got a lower generation speed on the iGPU than on the CPU.
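
To sanity-check the "RAM bandwidth is the bottleneck" hunch, here's a back-of-envelope estimate. The numbers are assumptions: roughly 3B active parameters per token for the A3B model, ~4.5 bits per weight for a Q4-style quant, and ~90 GB/s theoretical peak for dual-channel DDR5-5600 on this chip.

```python
# Back-of-envelope estimate of the memory-bandwidth ceiling on token generation.
# All figures are rough assumptions, not measurements.
active_params = 3e9       # ~3B parameters touched per token (the "A3B" part)
bits_per_weight = 4.5     # typical for a Q4_K-style quant
bandwidth_gbs = 89.6      # dual-channel DDR5-5600 theoretical peak, GB/s

bytes_per_token = active_params * bits_per_weight / 8   # ~1.7 GB read per token
ceiling_tps = bandwidth_gbs * 1e9 / bytes_per_token

print(f"~{bytes_per_token / 1e9:.1f} GB of weights read per token")
print(f"theoretical ceiling ~ {ceiling_tps:.0f} tokens/s")
```

That puts the theoretical ceiling somewhere around 50 tokens/s, so landing at 18 tokens/s after real-world overheads looks consistent with being memory-bound. It would also explain the iGPU result: the iGPU shares the same memory bus as the CPU, so it doesn't get any extra bandwidth to work with.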

On models like Ministral 8B and GLM4.6V Flash, I got around 10 tokens/s at best. They also lose speed more quickly than smaller models, so I'll stick to models with 4B or fewer active parameters at a time.

Another limitation I found is the context window. All models lose speed around 4,000 tokens, and the drop becomes much worse at 8,000 tokens. I haven't tested contexts bigger than 10,000 tokens yet, but if generation falls below 4 tokens/s, the model is basically useless to me.
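
My rough understanding of why longer context hurts: every new token also has to read the whole KV cache, which grows with context length. Here's a toy estimate, assuming a dense 8B-class model with 32 layers, a 4096-wide hidden state, an fp16 cache, and no grouped-query attention (GQA would shrink these numbers a lot).

```python
# Toy estimate of KV-cache traffic per generated token as context grows.
# Model shape is an assumption (dense 8B-ish, no GQA); real models vary.
layers = 32
hidden = 4096
bytes_per_value = 2                                    # fp16 cache entries
kv_per_token = 2 * layers * hidden * bytes_per_value   # K and V, all layers

for context in (2_000, 4_000, 8_000, 16_000):
    cache_gb = context * kv_per_token / 1e9
    print(f"{context:>6} tokens of context -> ~{cache_gb:.1f} GB of KV cache read per token")
```

On top of the gigabytes of weights already read per token, that extra traffic piles onto the same shared memory bus, which matches the slowdown I'm seeing between 4,000 and 8,000 tokens.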

Anyway, this experience made me decide that if I want to use local AI for coding, I WILL need a dedicated GPU, or at least a specialized AI machine. I'll stick to Venice.ai's inference for coding for now... I hope these become affordable in 2026, but current trends are eating away at that hope...

So, What Do You Think?

I'd love to keep you guys updated on my AI journey. See you in another article.~


Related Threads

https://inleo.io/threads/view/ahmadmanga/re-leothreads-2mc8ooqca?referral=ahmadmanga

https://inleo.io/threads/view/ahmadmanga/re-leothreads-gyriwmnl?referral=ahmadmanga

https://inleo.io/threads/view/ahmadmanga/re-leothreads-wbayqgrf

https://inleo.io/threads/view/ahmadmanga/re-leothreads-23alz8hss?referral=ahmadmanga

Posted Using INLEO



3 comments

I have tested local LLMs in the past and they are cool, although the hardware limitation is what frustrates me, because to run something worthy you have to put up some real cash. It's been a while, almost a year, since I tried anything new, as I'm now a heavy Perplexity and Claude Code user, but I truly want to run a tailored AI locally. I've been looking at Llama Factory for fine-tuning: https://github.com/hiyouga/LlamaFactory. Not sure if 2026 will be the year I put down some cash on hardware, but I hope we get to enjoy cheap local AI in the next 5 years.


I'm betting that smaller models will become more and more useful...


Thanks for your contribution to the STEMsocial community. Feel free to join us on Discord to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

Consider setting @stemsocial as a beneficiary of this post's rewards if you would like to support the community and contribute to its mission of promoting science and education on Hive. 
 
