Local model serving - Using Ollama
July 14, 2025 • inferencing, local serving
There are several options for running Large Language Model (LLM) inference locally. Ollama is one of them and my personal favorite. It provides access to a wide range of models, has recently added support for cloud-hosted models as well, and offers both a CLI and a GUI (chat interface) for interacting with loaded models.
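As a quick illustration, here is a minimal sketch of querying a locally running Ollama server through its HTTP API. It assumes Ollama is serving on the default port 11434 and that the model named in the payload (`llama3`, used here as an example) has already been pulled with `ollama pull llama3`:

```python
# Minimal sketch: query a locally running Ollama server over its HTTP API.
# Assumes Ollama is listening on the default port 11434 and the model below
# has already been pulled (e.g. `ollama pull llama3`).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3",  # any locally available model name
    "prompt": "Explain local LLM inference in one sentence.",
    "stream": False,    # ask for a single complete JSON response
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read().decode("utf-8"))

print(result["response"])  # the model's generated text
```

The same model can also be used interactively from the terminal with `ollama run llama3`, which drops you into a chat session without writing any code.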