View and Unload Running Models
Monitor active models and free up memory resources
Overview
When you interact with models in OllaMan, they stay loaded in memory for faster responses in future conversations. The Dashboard allows you to monitor all running models and unload them when needed to free up memory.
Viewing Running Models
Open Dashboard
Click Dashboard in the sidebar to access the overview page.
Locate Running Models Section
Scroll down to find the "Running Models" section. This area displays all models currently loaded in memory.

View Model Information
For each running model, you can see:
- Model Name: The full name and tag (e.g., llama3:8b)
- Quantization Level: Compression method used (e.g., Q4_0, Q8_0)
- Memory Usage: GPU memory (VRAM) currently occupied by the model
- Disk Usage: Disk space occupied by the model
- Expiration Time: Countdown until auto-unload (if enabled)
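Under the hood, this information is available from Ollama's `/api/ps` endpoint, which a dashboard like OllaMan can poll. A minimal sketch of reading those fields, using a hard-coded sample response in the shape Ollama's API documents (the byte counts and timestamp here are illustrative, not real measurements):

```python
import json

# Illustrative /api/ps response; field names follow Ollama's API docs.
sample_response = json.loads("""
{
  "models": [
    {
      "name": "llama3:8b",
      "size": 5137025024,
      "size_vram": 5137025024,
      "details": {"quantization_level": "Q4_0"},
      "expires_at": "2024-06-04T21:38:31+00:00"
    }
  ]
}
""")

def summarize_running_models(response: dict) -> list[str]:
    """Render one summary line per loaded model, as the Dashboard does."""
    lines = []
    for m in response["models"]:
        vram_gb = m["size_vram"] / 1024**3  # bytes -> GiB
        quant = m["details"]["quantization_level"]
        lines.append(f"{m['name']}  {quant}  {vram_gb:.1f} GB VRAM")
    return lines

for line in summarize_running_models(sample_response):
    print(line)  # llama3:8b  Q4_0  4.8 GB VRAM
```

Against a live server you would fetch `http://localhost:11434/api/ps` (Ollama's default port) instead of the canned JSON above.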
Unloading Models
When you need to free up memory for other tasks or models, you can manually unload running models.
Identify Model to Unload
In the "Running Models" section, find the model you want to unload.
Click Unload Button
On the right side of the model card, click the "Unload" button.

Confirm Action
The model will be immediately unloaded from memory. You'll see:
- The model disappears from the running models list
- Memory usage statistics update
- Confirmation message appears
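The Unload button maps onto Ollama's documented eviction mechanism: a generate request with an empty prompt and `keep_alive: 0` loads nothing new and tells the server to drop the model from memory immediately. A sketch of the request a client might issue (assuming the default endpoint `http://localhost:11434`; how OllaMan performs this internally is not shown here):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default endpoint

def unload_request(model: str) -> urllib.request.Request:
    """Build the request that asks Ollama to evict `model` from memory.

    An empty prompt with keep_alive=0 generates nothing and unloads
    the model as soon as the request is handled.
    """
    payload = {"model": model, "prompt": "", "keep_alive": 0}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = unload_request("llama3:8b")
# urllib.request.urlopen(req)  # uncomment with a running Ollama server
```

The equivalent from a terminal is `ollama stop llama3:8b`.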
What Happens After Unloading?
- The model files remain installed on your disk
- You can start chatting with it again anytime
- The model will be reloaded into memory when needed
- First response after reloading may take a few seconds longer
Understanding Memory Usage
Why Models Stay Loaded
OllaMan keeps models in memory to provide:
- Faster Response Times: No reload delay for subsequent queries
- Better User Experience: Instant conversation continuation
- Efficient Resource Use: Automatic management based on activity
When to Unload Models
Consider unloading models when:
- High Memory Usage: Your system is running low on available memory
- Switching Tasks: You're done with a model and won't use it soon
- Running Large Models: You need to free space for a bigger model
- Troubleshooting: Resolving memory-related issues
Auto-Unload Feature
Ollama automatically unloads inactive models after a period of time (default: 5 minutes). The countdown timer shows when each model will be auto-unloaded.
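That countdown can be derived from the `expires_at` timestamp Ollama reports for each loaded model. A sketch of the calculation (the timestamps below are illustrative):

```python
from datetime import datetime, timezone

def seconds_until_unload(expires_at: str, now: datetime) -> int:
    """Seconds until Ollama auto-unloads the model, clamped at zero."""
    expiry = datetime.fromisoformat(expires_at)
    return max(0, int((expiry - now).total_seconds()))

# Example: 5 minutes of keep-alive remaining
now = datetime(2024, 6, 4, 21, 33, 31, tzinfo=timezone.utc)
remaining = seconds_until_unload("2024-06-04T21:38:31+00:00", now)
print(f"auto-unload in {remaining // 60}m {remaining % 60}s")  # 5m 0s
```

The default keep-alive is configurable in Ollama (the `OLLAMA_KEEP_ALIVE` environment variable, or `keep_alive` per request), so the 5-minute window above is only the out-of-the-box behavior.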
Memory Usage Statistics
The Dashboard displays real-time memory metrics:
VRAM Usage
Shows how much GPU video memory (VRAM) is currently in use
Running Models Count
Total number of models currently loaded in memory
Memory per Model
Estimated memory consumption for each loaded model
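These per-model figures are straightforward conversions of the byte counts Ollama reports (`size` and `size_vram` in the `/api/ps` response). A small helper showing the kind of formatting a dashboard applies:

```python
def human_bytes(num: float) -> str:
    """Convert a raw byte count to a readable string in binary units."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if num < 1024:
            return f"{num:.1f} {unit}"
        num /= 1024
    return f"{num:.1f} TiB"

# e.g. a size_vram value for a Q4_0 8B model (illustrative number)
print(human_bytes(5137025024))  # 4.8 GiB
```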
Quick Actions
From the Running Models section, you can also:
Start Chat
Click the "Chat" button next to any running model to immediately start a conversation without reloading.
View Model Details
Click on the model card to see more detailed information about the model's configuration and capabilities.
Best Practices
Memory Management Tips
- Monitor Regularly: Check the Dashboard periodically during heavy use
- Unload After Use: Free memory when done with large models
- Close Chats: Closing chat windows doesn't unload models; use the Unload button instead
- Plan Ahead: Unload smaller models before loading very large ones