OllaMan Docs

View and Unload Running Models

Monitor active models and free up memory resources

Overview

When you interact with models in OllaMan, they stay loaded in memory for faster responses in future conversations. The Dashboard allows you to monitor all running models and unload them when needed to free up memory.


Viewing Running Models

Open Dashboard

Click Dashboard in the sidebar to access the overview page.

Locate Running Models Section

Scroll down to find the "Running Models" section. This area displays all models currently loaded in memory.

Running Models Section

View Model Information

For each running model, you can see:

  • Model Name: The full name and tag (e.g., llama3:8b)
  • Quantization Level: Compression method used (e.g., Q4_0, Q8_0)
  • Memory Usage: GPU memory (VRAM) currently occupied by the model
  • Disk Usage: Disk space occupied by the model
  • Expiration Time: Countdown until auto-unload (if enabled)
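
OllaMan is a front end for Ollama, and the fields on each card mirror what Ollama's documented `GET /api/ps` endpoint reports for loaded models. A minimal sketch of reading those fields from a response of that shape (the values below are illustrative, not from a real server):

```python
import json

# Illustrative response in the shape of Ollama's GET /api/ps
# (field names per the Ollama API; values are made up).
sample = json.loads("""
{
  "models": [
    {
      "name": "llama3:8b",
      "size": 6654289920,
      "size_vram": 6654289920,
      "expires_at": "2024-06-04T14:38:31.000000-07:00",
      "details": {"quantization_level": "Q4_0"}
    }
  ]
}
""")

def summarize(ps_response):
    """Extract the fields shown on each running-model card."""
    rows = []
    for m in ps_response["models"]:
        rows.append({
            "name": m["name"],
            "quantization": m["details"]["quantization_level"],
            "vram_gb": round(m["size_vram"] / 1024**3, 1),
            "expires_at": m["expires_at"],
        })
    return rows

print(summarize(sample))
```

`expires_at` is the timestamp behind the expiration countdown, and `size_vram` is the number behind the memory-usage figure.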

Unloading Models

When you need to free up memory for other tasks or models, you can manually unload running models.

Identify Model to Unload

In the "Running Models" section, find the model you want to unload.

Click Unload Button

On the right side of the model card, click the "Unload" button.

Unload Button Location

Confirm Action

The model is unloaded from memory immediately. You'll see:

  • The model disappears from the running models list
  • Memory usage statistics update
  • A confirmation message appears

What Happens After Unloading?

  • The model files remain installed on your disk
  • You can start chatting with it again anytime
  • The model will be reloaded into memory when needed
  • First response after reloading may take a few seconds longer
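
Under the hood, Ollama's documented way to evict a model is a generate request with `keep_alive` set to `0`, which is presumably what the Unload button does. A sketch, assuming the default Ollama endpoint at `localhost:11434`:

```python
import json
import urllib.request

def unload_payload(model):
    # A generate request with no prompt and keep_alive=0 tells Ollama
    # to unload the model from memory immediately (per the Ollama FAQ).
    return {"model": model, "keep_alive": 0}

def unload(model, host="http://localhost:11434"):
    """Ask a local Ollama server to unload a model."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(unload_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# unload("llama3:8b")  # requires a running Ollama server
print(unload_payload("llama3:8b"))
```

Recent Ollama versions also expose this as the `ollama stop <model>` CLI command.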

Understanding Memory Usage

Why Models Stay Loaded

OllaMan keeps models in memory to provide:

  • Faster Response Times: No reload delay for subsequent queries
  • Better User Experience: Instant conversation continuation
  • Efficient Resource Use: Automatic management based on activity

When to Unload Models

Consider unloading models when:

  • High Memory Usage: Your system is running low on available memory
  • Switching Tasks: You're done with a model and won't use it soon
  • Running Large Models: You need to free space for a bigger model
  • Troubleshooting: Resolving memory-related issues

Auto-Unload Feature

Ollama automatically unloads models that have been inactive for a period of time (five minutes by default). The countdown timer shows when each model will be auto-unloaded.
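
The window is controlled by Ollama's `keep_alive` setting (or the `OLLAMA_KEEP_ALIVE` environment variable for the server default), which accepts a duration string like `"5m"`, a number of seconds, `0` (unload immediately), or a negative value (keep loaded indefinitely). A small helper, purely for illustration, that converts the common string forms to seconds:

```python
import re

# Conversion factors for the duration units Ollama's keep_alive accepts.
_UNITS = {"s": 1, "m": 60, "h": 3600}

def keep_alive_seconds(value):
    """Convert a keep_alive value ("5m", "1h30m", 300, 0) to seconds."""
    if isinstance(value, (int, float)):
        return value
    total = 0
    for amount, unit in re.findall(r"(\d+)([smh])", value):
        total += int(amount) * _UNITS[unit]
    return total

print(keep_alive_seconds("5m"))  # the default keep-alive window, 300 s
```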


Memory Usage Statistics

The Dashboard displays real-time memory metrics:

VRAM Usage

Shows how much GPU video memory (VRAM) is currently in use

Running Models Count

Total number of models currently loaded in memory

Memory per Model

Estimated memory consumption for each loaded model
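
These aggregate statistics follow directly from the per-model `size_vram` values in the `/api/ps` response. A sketch of the arithmetic, on an illustrative two-model response:

```python
def total_vram(ps_response):
    """Sum VRAM across all loaded models, in bytes."""
    return sum(m.get("size_vram", 0) for m in ps_response["models"])

def human_bytes(n):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} PiB"

# Illustrative values: one ~6.2 GiB model and one 2 GiB model.
sample = {"models": [{"size_vram": 6654289920}, {"size_vram": 2147483648}]}
print(human_bytes(total_vram(sample)))  # → 8.2 GiB
```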


Quick Actions

From the Running Models section, you can also:

Start Chat

Click the "Chat" button next to any running model to immediately start a conversation without reloading.

View Model Details

Click on the model card to see more detailed information about the model's configuration and capabilities.


Best Practices

Memory Management Tips

  • Monitor Regularly: Check the Dashboard periodically during heavy use
  • Unload After Use: Free memory when done with large models
  • Close Chats: Closing a chat window doesn't unload the model; use the Unload button
  • Plan Ahead: Unload smaller models before loading very large ones
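
The "Plan Ahead" tip can be reduced to simple arithmetic over the `/api/ps` data. A greedy sketch (my own heuristic, not OllaMan's logic) that picks which running models to unload, largest first, until enough VRAM would be freed:

```python
def pick_to_unload(ps_response, bytes_needed):
    """Pick running models to unload, largest first, until at least
    bytes_needed of VRAM would be freed."""
    models = sorted(ps_response["models"],
                    key=lambda m: m.get("size_vram", 0), reverse=True)
    chosen, freed = [], 0
    for m in models:
        if freed >= bytes_needed:
            break
        chosen.append(m["name"])
        freed += m.get("size_vram", 0)
    return chosen

# Illustrative running-model list.
sample = {"models": [
    {"name": "llama3:8b", "size_vram": 6_654_289_920},
    {"name": "phi3:mini", "size_vram": 2_147_483_648},
]}
print(pick_to_unload(sample, 4 * 1024**3))  # → ['llama3:8b']
```

Each chosen name would then be unloaded via the Unload button, or via Ollama's `keep_alive: 0` request.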
