OllaMan Docs

View and Unload Running Models

Monitor active models and free up memory resources

Overview

When you interact with models in OllaMan, they stay loaded in memory for faster responses in future conversations. The Dashboard allows you to monitor all running models and unload them when needed to free up memory.


Viewing Running Models

Open Dashboard

Click Dashboard in the sidebar to access the overview page.

Locate Running Models Section

Scroll down to find the "Running Models" section. This area displays all models currently loaded in memory.

Running Models Section

View Model Information

For each running model, you can see:

  • Model Name: The full name and tag (e.g., llama3:8b)
  • Quantization Level: Compression method used (e.g., Q4_0, Q8_0)
  • Memory Usage: GPU memory (VRAM) currently occupied by the model
  • Disk Usage: Disk space occupied by the model
  • Expiration Time: Countdown until auto-unload (if enabled)
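
OllaMan is a front end for Ollama, and the fields on each card mirror what Ollama's documented `GET /api/ps` endpoint reports for loaded models. A minimal sketch of reading those fields from a response of that shape (the values below are illustrative, not from a real server):

```python
import json

# Illustrative response in the shape of Ollama's GET /api/ps
# (field names per the Ollama API; values are made up).
sample = json.loads("""
{
  "models": [
    {
      "name": "llama3:8b",
      "size": 6654289920,
      "size_vram": 6654289920,
      "expires_at": "2024-06-04T14:38:31.000000-07:00",
      "details": {"quantization_level": "Q4_0"}
    }
  ]
}
""")

def summarize(ps_response):
    """Extract the fields shown on each running-model card."""
    rows = []
    for m in ps_response["models"]:
        rows.append({
            "name": m["name"],
            "quantization": m["details"]["quantization_level"],
            "vram_gb": round(m["size_vram"] / 1024**3, 1),
            "expires_at": m["expires_at"],
        })
    return rows

print(summarize(sample))
```

`expires_at` is the timestamp behind the expiration countdown, and `size_vram` is the number behind the memory-usage figure.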

Unloading Models

When you need to free up memory for other tasks or models, you can manually unload running models.

Identify Model to Unload

In the "Running Models" section, find the model you want to unload.

Click Unload Button

On the right side of the model card, click the "Unload" button.

Unload Button Location

Confirm Action

The model is unloaded from memory immediately. You'll see:

  • The model disappears from the running models list
  • Memory usage statistics update
  • A confirmation message appears

What Happens After Unloading?

  • The model files remain installed on your disk
  • You can start chatting with it again anytime
  • The model will be reloaded into memory when needed
  • First response after reloading may take a few seconds longer
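
Under the hood, Ollama's documented way to evict a model is a generate request with `keep_alive` set to `0`, which is presumably what the Unload button does. A sketch, assuming the default Ollama endpoint at `localhost:11434`:

```python
import json
import urllib.request

def unload_payload(model):
    # A generate request with no prompt and keep_alive=0 tells Ollama
    # to unload the model from memory immediately (per the Ollama FAQ).
    return {"model": model, "keep_alive": 0}

def unload(model, host="http://localhost:11434"):
    """Ask a local Ollama server to unload a model."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(unload_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# unload("llama3:8b")  # requires a running Ollama server
print(unload_payload("llama3:8b"))
```

Recent Ollama versions also expose this as the `ollama stop <model>` CLI command.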

Understanding Memory Usage

Why Models Stay Loaded

OllaMan keeps models in memory to provide:

  • Faster Response Times: No reload delay for subsequent queries
  • Better User Experience: Instant conversation continuation
  • Efficient Resource Use: Automatic management based on activity

When to Unload Models

Consider unloading models when:

  • High Memory Usage: Your system is running low on available memory
  • Switching Tasks: You're done with a model and won't use it soon
  • Running Large Models: You need to free space for a bigger model
  • Troubleshooting: Resolving memory-related issues

Auto-Unload Feature

Ollama automatically unloads models that have been inactive for a period of time (five minutes by default). The countdown timer shows when each model will be auto-unloaded.
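
The window is controlled by Ollama's `keep_alive` setting (or the `OLLAMA_KEEP_ALIVE` environment variable for the server default), which accepts a duration string like `"5m"`, a number of seconds, `0` (unload immediately), or a negative value (keep loaded indefinitely). A small helper, purely for illustration, that converts the common string forms to seconds:

```python
import re

# Conversion factors for the duration units Ollama's keep_alive accepts.
_UNITS = {"s": 1, "m": 60, "h": 3600}

def keep_alive_seconds(value):
    """Convert a keep_alive value ("5m", "1h30m", 300, 0) to seconds."""
    if isinstance(value, (int, float)):
        return value
    total = 0
    for amount, unit in re.findall(r"(\d+)([smh])", value):
        total += int(amount) * _UNITS[unit]
    return total

print(keep_alive_seconds("5m"))  # the default keep-alive window, 300 s
```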


Memory Usage Statistics

The Dashboard displays real-time memory metrics:

VRAM Usage

Shows how much GPU video memory (VRAM) is currently in use

Running Models Count

Total number of models currently loaded in memory

Memory per Model

Estimated memory consumption for each loaded model
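
These aggregate statistics follow directly from the per-model `size_vram` values in the `/api/ps` response. A sketch of the arithmetic, on an illustrative two-model response:

```python
def total_vram(ps_response):
    """Sum VRAM across all loaded models, in bytes."""
    return sum(m.get("size_vram", 0) for m in ps_response["models"])

def human_bytes(n):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} PiB"

# Illustrative values: one ~6.2 GiB model and one 2 GiB model.
sample = {"models": [{"size_vram": 6654289920}, {"size_vram": 2147483648}]}
print(human_bytes(total_vram(sample)))  # → 8.2 GiB
```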


Quick Actions

From the Running Models section, you can also:

Start Chat

Click the "Chat" button next to any running model to immediately start a conversation without reloading.

View Model Details

Click on the model card to see more detailed information about the model's configuration and capabilities.


Best Practices

Memory Management Tips

  • Monitor Regularly: Check the Dashboard periodically during heavy use
  • Unload After Use: Free memory when done with large models
  • Close Chats: Closing a chat window doesn't unload the model; use the Unload button
  • Plan Ahead: Unload smaller models before loading very large ones
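
The "Plan Ahead" tip can be reduced to simple arithmetic over the `/api/ps` data. A greedy sketch (my own heuristic, not OllaMan's logic) that picks which running models to unload, largest first, until enough VRAM would be freed:

```python
def pick_to_unload(ps_response, bytes_needed):
    """Pick running models to unload, largest first, until at least
    bytes_needed of VRAM would be freed."""
    models = sorted(ps_response["models"],
                    key=lambda m: m.get("size_vram", 0), reverse=True)
    chosen, freed = [], 0
    for m in models:
        if freed >= bytes_needed:
            break
        chosen.append(m["name"])
        freed += m.get("size_vram", 0)
    return chosen

# Illustrative running-model list.
sample = {"models": [
    {"name": "llama3:8b", "size_vram": 6_654_289_920},
    {"name": "phi3:mini", "size_vram": 2_147_483_648},
]}
print(pick_to_unload(sample, 4 * 1024**3))  # → ['llama3:8b']
```

Each chosen name would then be unloaded via the Unload button, or via Ollama's `keep_alive: 0` request.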
