
Getting Started: A Novice-Friendly Guide to Running Local AI With Ollama and AnythingLLM

This guide walks you through setting up a powerful, private, and user-friendly local AI environment on your computer. You'll use Ollama to run local large language models, AnythingLLM for a modern GUI with chat threads and search, and, optionally, the Nomic Embed model to power local document search (Retrieval-Augmented Generation, or RAG). All steps are designed for users with minimal technical experience.

Note: Local models can be large, from a few gigabytes to hundreds of gigabytes for the biggest releases, so check your available storage before you set up a local AI. One way to check the size of a model is to visit the Ollama library (https://ollama.com/library) and select a model (e.g., deepseek-r1, gemma3n).

This setup:

  • Is fully local and private: All data stays on your device.
  • Offers a modern, easy interface: AnythingLLM gives you a ChatGPT-like GUI.
  • Is flexible: Use local models for privacy, or connect to state-of-the-art (SOTA) cloud models via Azure if needed.
  • Offers powerful search: The Nomic embedder makes your document search smarter and faster.

Setup

Step 1: Install Ollama

Ollama lets you run large language models (LLMs) locally.

  1. Download Ollama: Go to https://ollama.com/download and choose the installer for your system (Mac/Windows/Linux).
  2. Install: Run the installer and follow the prompts.
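
To confirm the installation succeeded, you can print the version from your terminal (the exact version string varies by release):

    # Print the installed Ollama version; any version number confirms the install worked
    ollama --version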

Step 2: Run Ollama

  1. Open your terminal (Command Prompt on Windows, Terminal on Mac/Linux).
  2. Start the Ollama server by running: ollama serve.

If Ollama is already running (on Mac and Windows, the desktop app usually starts it automatically), this command will report that the address is already in use; that's expected and harmless.
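
You can also verify the server is up by sending a request to its default local address (Ollama listens on port 11434 by default; adjust if you've changed it):

    # A running Ollama server replies with the plain-text message "Ollama is running"
    curl http://localhost:11434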

Step 3: Choose and Download a Model

  • High-end computers (e.g., Mac M3/M4 with 64GB+ RAM): ollama run gemma3:27b
  • Most laptops/desktops: ollama run gemma3:4b

You can experiment with gemma3:12b or llama3:8b if your system handles the 4b model smoothly.

This command downloads the model the first time you run it, then starts a chat session. The first run is slow because of the download; subsequent runs load quickly.
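
If you'd rather download a model without immediately opening a chat session, ollama pull fetches it, and ollama list shows every installed model along with its size on disk:

    # Download the model without starting an interactive chat
    ollama pull gemma3:4b

    # Show installed models and how much disk space each uses
    ollama list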

Step 4: Install AnythingLLM (GUI)

AnythingLLM is a desktop app to chat with your models in a user-friendly way.

  1. Download AnythingLLM Desktop: Go to https://anythingllm.com/desktop and download the installer for your OS.
  2. Install and launch: Run the installer, then open AnythingLLM.

Step 5: Connect AnythingLLM to Ollama

  1. In AnythingLLM, go to Settings.
  2. Select Ollama as your model provider.
  3. Choose your preferred model (e.g., gemma3:4b or llama3:8b) from the dropdown.
  4. Save settings.
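
AnythingLLM talks to Ollama's local API, which listens at http://localhost:11434 by default. If your model doesn't appear in the dropdown, a quick sanity check is to ask Ollama directly which models it exposes; the dropdown should match this list:

    # List the models available over Ollama's local API
    curl http://localhost:11434/api/tags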

Step 6: Supercharge Your Document Search With Nomic Embedder (Optional/Advanced)

If you want smarter document search (RAG):

  1. Install Nomic Embed Model: Visit https://www.nomic.ai/embed and follow the setup instructions for your OS.
  2. Configure AnythingLLM to use Nomic Embed:
    • In AnythingLLM's settings, look for the "Embedder" section.
    • Switch from the default embedder to the Nomic embedder (you may need to specify the path or endpoint).
    • Set Chunk Size to 2000 and Overlap to 200 as a reasonable starting point for larger documents.
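
A convenient alternative to the nomic.ai download, if you'd rather keep everything inside Ollama, is the Ollama-hosted nomic-embed-text model (in that case, select Ollama as the embedder provider in AnythingLLM). The sketch below assumes that route; the reply to the second command is a JSON object whose "embedding" field holds the vector:

    # Pull the Nomic embedding model from the Ollama library
    ollama pull nomic-embed-text

    # Ask Ollama to embed a sample sentence; the JSON reply contains
    # an "embedding" array of floating-point numbers
    curl http://localhost:11434/api/embeddings \
      -d '{"model": "nomic-embed-text", "prompt": "Local AI keeps your data private."}'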

Step 7: Start Chatting & Exploring

  • Chat: Use the AnythingLLM GUI to ask questions, try threads, and manage documents.
  • Document Search: Upload PDFs or text files; AnythingLLM will help you search and summarize them using your local LLM and the powerful Nomic embedder.

Tips & Troubleshooting

  • The first time you load a model, it may take a while (especially for larger models).
  • If you get “out of memory” errors, try a smaller model (see the check below).
  • For the most privacy, use only local models. For SOTA access, AnythingLLM can be configured to use your secure Azure subscription (if you have one).
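
If you're not sure whether a model fits in memory, Ollama can report what's currently loaded and how much memory it's using:

    # Show currently loaded models, their memory footprint, and whether
    # they are running on the CPU or GPU
    ollama ps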

Need help?

Created by David Liebovitz, MD.

Follow the Institute for Artificial Intelligence in Medicine on Bluesky and LinkedIn.