Hey everyone,
I recently got a MacBook Pro M5 and the first thing I asked myself was: how do I run a model locally? And how far can I actually push it?
In this article I’ll walk you through how to set up a local model and use it directly in Visual Studio Code.
LM Studio
LM studio is a free app for running AI models locally, and a solid alternative to Ollama.
After installing it, the first thing you’ll want to do is download a model. I went with Qwen3 Coder at 30 billion parameters. To download it, click the robot icon with the magnifying glass in the left sidebar,

then pick the model you want.

Image formats
On the right side you’ll see two labels next to “Format”: GGUF and MLX.
GGUF is a runtime format mainly used with llama.cpp and one of the most widely adopted out there. MLX is instead a format optimized specifically for Apple silicon (M1 -> MXd).
On Apple devices, MLX is the way to go since you’ll get noticeably better performance compared to GGUF.
Testing the model
LM Studio comes with a built-in chat so you can test your freshly downloaded model right away.
Click the colored icon at the top of the left sidebar to open the chat section.
First, select your model by clicking at the top.

The modal that opens has two sections: the top one shows which models are currently loaded in memory (you can load more than one, depending on how much RAM you have), while the bottom section, “your models”, lists everything you’ve downloaded locally.
If you don’t see any models there, you’ll need to download one first as described above.
Once you’ve picked a model, you’re ready to start chatting.

Performance
On my MacBook Pro M5 32GB, this model runs at around 55 tokens/second. It’s fast like Claude Code or GitHub Copilot.
Starting the local server
To use the model outside of LM Studio we need to spin up a local server. Click the terminal icon in the left sidebar, then toggle the checkbox to enable the server.

You’ll notice the model you used in the chat is already loaded and will be ready to go as soon as the server starts.
Once active, a local endpoint will be exposed that we’ll use to integrate with Visual Studio Code.

At this point the server is up and running and we’re ready to plug it into our IDE.
Visual studio code
There are several ways to use a local model in VS Code. Today we’ll cover one of the cleanest options: an extension called “Continue”.
Continue is a VS Code plugin that brings an AI assistant into your IDE without being tied to any specific brand or service, and it supports local models out of the box. The experience is very similar to what you’re used to with GitHub Copilot or Claude Code.
Click on the model name (it’ll be empty the first time) and then hit “Add Chat Model”.

A modal will appear where you can configure things: select LM Studio as the provider and set the model to “autodetect”. This gives you the flexibility to swap models and experiment without having to create multiple profiles.

You can also set this up directly in the config file (there’s a link to it below the connect button). The configuration looks like this:
name: Local Config
version: 1.0.0
schema: v1
models:
- name: Autodetect
provider: lmstudio
model: AUTODETECT
apiBase: http://localhost:1234/v1/
Testing the model
Let’s try it out with the following prompt:
You are a python expert with 10+ year of experience. Create a command line tool in python that accept 2 arguments as input: "lenght" (number, integer) and "useSpecialChars" (1 or 0). This utility should be used to generate a secure password.
Rules:
- Add comment to explain the code
- Use python 3
- Create a readme that explain how to run toolThe model nailed it, generating both a well-written readme and the Python tool itself.

Conclusions
In this article we saw how to set up LM Studio to run Qwen 3 Coder 30B locally and connect it to Visual Studio Code via the Continue extension, ending up with a fully integrated AI assistant right inside our IDE.
This is just the beginning. A local model opens up a much wider range of possibilities, not just for development scenarios where privacy matters, but also for situations where heavy, continuous LLM usage becomes practical precisely because it runs locally at no cost. Just think about what you could build with a local model and Ralph!
On top of that, having a local model is a real advantage whenever you’re working without an internet connection or in environments where access is restricted.
See you next time!