---
title: Full Guide - Install Ubuntu Server and Configure Ollama with CodeGemma and Phi-3 Mini
description: AI Project
published: true
date: 2025-06-01T20:09:17.154Z
tags: ai, guide, walk-through, ubuntu server, server, ollama
editor: markdown
dateCreated: 2025-06-01T20:09:15.206Z
---

Install Ubuntu Server and Configure Ollama with CodeGemma and Phi-3 Mini

This guide provides step-by-step instructions to set up a headless Ubuntu Server 24.04 LTS on a PC with the following specs, install Ollama with CodeGemma 7B for user arti (Python coding assistance) and Phi-3 Mini (3.8B) for user phixr (system administration tasks), and restrict each user's SSH access to their respective interactive AI session:

  • GPU: Radeon RX 6600 (8 GB VRAM)
  • CPU: AMD Ryzen 7 2700 (8 cores, ~3.2 GHz)
  • RAM: 64 GB (2133 MT/s)
  • Storage: 465.8 GB NVMe SSD (nvme0n1), 2x 931.5 GB SSDs (sda, sdb)

The setup is command-line only, with no desktop environment or window manager, and assumes you're replacing any existing OS (e.g., Proxmox). Both models use Q4_K_M quantization to fit within 8 GB VRAM and <20 GB disk space, leveraging ROCm for GPU acceleration.


Step 1: Prepare for Ubuntu Server Installation

Let's prepare to install Ubuntu Server on your NVMe SSD, replacing any existing OS.

  1. Download Ubuntu Server 24.04 LTS:

    • On another computer, download the ISO:
      wget https://releases.ubuntu.com/24.04/ubuntu-24.04-live-server-amd64.iso
      
    • Verify the ISO (a check against Ubuntu's published SHA256SUMS is sketched after this list):
      sha256sum ubuntu-24.04-live-server-amd64.iso
      
  2. Create a Bootable USB Drive:

    • Use a USB drive (≥4 GB). Identify it with:
      lsblk
      
    • Write the ISO (replace /dev/sdX with your USB device):
      sudo dd if=ubuntu-24.04-live-server-amd64.iso of=/dev/sdX bs=4M status=progress && sync
      
      • Warning: Double-check /dev/sdX to avoid overwriting other drives.
    • Alternatively, use Rufus (Windows) or Etcher (cross-platform).
  3. Backup Existing Data:

    • If replacing Proxmox or another OS, back up data to an external drive or another system:
      scp -r /path/to/data user@other-machine:/destination
      
  4. Boot from USB:

    • Insert the USB, reboot, and enter the BIOS (usually Del or F2).
    • Set the USB as the first boot device.
    • Save and reboot to start the Ubuntu installer.
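
Checksum check: Ubuntu publishes a SHA256SUMS file alongside each ISO. A minimal verification sketch, run from the download directory:

  # Fetch the official checksum list for the 24.04 release
  wget https://releases.ubuntu.com/24.04/SHA256SUMS
  # Verify only the files present locally; expect "ubuntu-24.04-live-server-amd64.iso: OK"
  sha256sum -c SHA256SUMS --ignore-missing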

Step 2: Install Ubuntu Server 24.04 LTS

Let's install Ubuntu Server on the NVMe SSD (nvme0n1, 465.8 GB).

  1. Start the Installer:

    • Select “Install Ubuntu Server”.
    • Set language (English), keyboard layout, and network (DHCP or static IP).
  2. Configure Storage:

    • Choose “Custom storage layout”.
    • Partition nvme0n1:
      • EFI Partition: 1 GB, fat32, mount at /boot/efi.
      • Root Partition: 464.8 GB, ext4, mount at /.
    • Example (in installer):
      • Select nvme0n1, create partitions as above.
      • Write changes and confirm.
    • Optional: Use sda or sdb (931.5 GB SSDs) for additional storage (e.g., mount as /data; a post-install sketch follows this list).
  3. Set Up Users and SSH:

    • Set hostname (e.g., ai-server).
    • Create an admin user (e.g., admin):
      • Username: admin
      • Password: Set a secure password.
    • Enable “Install OpenSSH server”.
    • Skip importing SSH keys unless needed.
  4. Complete Installation:

    • Select no additional packages (Ollama and ROCm will be installed later).
    • Finish and reboot.
  5. Verify Boot:

    • Remove the USB, boot into Ubuntu, and log in as admin via a local terminal or SSH:
      ssh admin@<server-ip>
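
Optional data disk: if you plan to use one of the 931.5 GB SSDs as /data, you can also set it up after installation. A minimal sketch, assuming /dev/sda holds nothing you need (mkfs erases it):

  sudo mkfs.ext4 /dev/sda        # format the whole disk (destructive)
  sudo mkdir -p /data            # create the mount point
  sudo blkid /dev/sda            # copy the UUID it prints
  # <uuid-from-blkid> is a placeholder for the UUID reported above
  echo 'UUID=<uuid-from-blkid> /data ext4 defaults 0 2' | sudo tee -a /etc/fstab
  sudo mount -a                  # mount it; verify with df -h /data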
      

Step 3: Install AMD ROCm for Radeon RX 6600

Let's set up ROCm to enable GPU acceleration for Ollama.

  1. Update System:

    sudo apt update && sudo apt upgrade -y
    
  2. Add ROCm Repository:

    • Install dependencies and add the ROCm 5.7 repository (apt-key is no longer shipped on Ubuntu 24.04, so a keyring-based alternative is sketched at the end of this step):
      sudo apt install -y wget gnupg
      wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
      echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.7 ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
      
  3. Install ROCm:

    sudo apt update
    sudo apt install -y rocm-libs rocminfo
    
  4. Verify ROCm:

    • Reboot:
      sudo reboot
      
    • Check GPU:
      rocminfo
      
      • Look for “Navi 23 [Radeon RX 6600]”.
    • Check VRAM:
      rocm-smi --showmeminfo vram
      
      • Expect ~8192 MB.
  5. Troubleshooting:

    • If no GPU is detected, verify:
      lspci | grep -i vga
      
    • Try ROCm 5.6:
      echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.6 ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
      sudo apt update && sudo apt install -y rocm-libs
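
Keyring alternative: apt-key is no longer shipped on Ubuntu 24.04, so the apt-key add command above may fail. A sketch of the signed-by approach for the same repository (note that ROCm 5.x officially targets older Ubuntu releases; a current ROCm 6.x repository may suit 24.04 better):

  sudo mkdir -p /etc/apt/keyrings
  # Convert AMD's ASCII-armored key into a binary keyring apt can read
  wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
  # Reference the keyring explicitly via signed-by
  echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.7 ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
  sudo apt update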
      

Step 4: Install Ollama and Models

Let's install Ollama and download both CodeGemma 7B and Phi-3 Mini.

  1. Install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh
    
    • Verify:
      ollama --version
      
  2. Pull CodeGemma 7B:

    • Download Q4_K_M (~4.2 GB):
      ollama pull codegemma:7b
      
    • Verify:
      ollama list
      
      • Expect codegemma:7b (q4_k_m).
  3. Test CodeGemma:

    • Run:
      ollama run codegemma:7b
      
    • Prompt: “Debug: x = [1, 2]; print(x[2]).”
    • Expected: “Check the list's length with len(x).”
    • Exit: Ctrl+D.
  4. Pull Phi-3 Mini:

    • Download Q4_K_M (~2.3 GB):
      ollama pull phi3:mini
      
    • Verify:
      ollama list
      
      • Expect phi3:mini (q4_k_m).
  5. Test Phi-3 Mini:

    • Run:
      ollama run phi3:mini
      
    • Prompt: “Walk me through configuring a firewall.”
    • Expected: “Install ufw with sudo apt install ufw. Enable with sudo ufw enable.”
    • Exit: Ctrl+D.
  6. Verify GPU Usage:

    • During a session, check:
      rocm-smi
      
      • CodeGemma: ~5–6 GB VRAM.
      • Phi-3 Mini: ~3.5–4.5 GB VRAM.
  7. Enable Ollama Service:

    sudo systemctl enable ollama
    sudo systemctl start ollama
    
    • Verify (an HTTP sanity check is sketched after this list):
      systemctl status ollama
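
With the service up, Ollama also answers HTTP requests on 127.0.0.1:11434 by default. A quick one-shot sanity check against the REST API:

  # "stream": false returns one JSON object instead of a token stream
  curl http://localhost:11434/api/generate -d '{
    "model": "phi3:mini",
    "prompt": "In one sentence, what does ufw do?",
    "stream": false
  }'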
      

Step 5: Configure User arti for CodeGemma 7B

Let's restrict arti's SSH access to an interactive CodeGemma 7B session for Python coding.

  1. Create User arti:

    sudo adduser arti
    
    • Set a secure password, optional details (e.g., full name: “Artificial Intelligence”).
  2. Restrict Home Directory:

    sudo chown arti:arti /home/arti
    sudo chmod 700 /home/arti
    
    • Verify:
      ls -ld /home/arti
      
      • Expect: drwx------ arti arti
  3. Create Shell Script:

    sudo nano /usr/local/bin/ollama-shell
    
    • Add:
      #!/bin/bash
      echo "Starting CodeGemma 7B interactive session..."
      /usr/bin/ollama run codegemma:7b
      
    • Save and exit (a hardened exec variant is sketched at the end of this step).
    • Make it root-owned and executable (mode 755 already includes the execute bit):
      sudo chown root:root /usr/local/bin/ollama-shell
      sudo chmod 755 /usr/local/bin/ollama-shell
      
  4. Set Shell:

    sudo usermod -s /usr/local/bin/ollama-shell arti
    
    • Verify:
      getent passwd arti
      
      • Expect the shell field to be /usr/local/bin/ollama-shell, e.g.: arti:x:1001:1001:,,,:/home/arti:/usr/local/bin/ollama-shell (UIDs vary; 1000 is usually taken by admin).
  5. Add GPU Access:

    sudo usermod -a -G render arti
    
  6. Restrict SSH:

    sudo nano /etc/ssh/sshd_config
    
    • Add:
      Match User arti
          ForceCommand /usr/local/bin/ollama-shell
      
    • Restart SSH (Ubuntu's OpenSSH unit is named ssh, not sshd):
      sudo systemctl restart ssh
      
  7. Limit Permissions:

    sudo usermod -G nogroup,render arti
    
    • Note: -G replaces the entire supplementary group list, so render is re-listed to keep the GPU access granted above.
    
  8. Test SSH:

    ssh arti@<server-ip>
    
    • Expect: Starting CodeGemma 7B interactive session...
    • Prompt: “Debug: x = '5'; y = 3; print(x + y).”
    • Expected: “Check types with type(x).”
    • Exit: Ctrl+D (terminates SSH).
    • Try: ssh arti@<server-ip> bash (ForceCommand ignores the requested command, so the AI session starts instead of Bash).
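
Wrapper hardening (optional): launching the model with exec replaces the wrapper's Bash process instead of leaving it waiting, so exiting the model tears down the SSH session directly. A minimal variant of /usr/local/bin/ollama-shell (the same change applies to the phixr script in Step 6):

  #!/bin/bash
  echo "Starting CodeGemma 7B interactive session..."
  # exec replaces this shell with ollama; no Bash process remains behind the session
  exec /usr/bin/ollama run codegemma:7b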

Step 6: Configure User phixr for Phi-3 Mini

Let's restrict phixr's SSH access to a Phi-3 Mini session for system administration.

  1. Create User phixr:

    sudo adduser phixr
    
    • Set password, optional details (e.g., full name: “Phi-3 System Admin”).
  2. Restrict Home Directory:

    sudo chown phixr:phixr /home/phixr
    sudo chmod 700 /home/phixr
    
    • Verify:
      ls -ld /home/phixr
      
      • Expect: drwx------ phixr phixr
  3. Create Shell Script:

    sudo nano /usr/local/bin/ollama-phi3-shell
    
    • Add:
      #!/bin/bash
      echo "Starting Phi-3 Mini interactive session..."
      /usr/bin/ollama run phi3:mini
      
    • Save and exit.
    • Make it root-owned and executable:
      sudo chown root:root /usr/local/bin/ollama-phi3-shell
      sudo chmod 755 /usr/local/bin/ollama-phi3-shell
      
  4. Set Shell:

    sudo usermod -s /usr/local/bin/ollama-phi3-shell phixr
    
    • Verify:
      getent passwd phixr
      
      • Expect the shell field to be /usr/local/bin/ollama-phi3-shell, e.g.: phixr:x:1002:1002:,,,:/home/phixr:/usr/local/bin/ollama-phi3-shell.
  5. Add GPU Access:

    sudo usermod -a -G render phixr
    
  6. Restrict SSH:

    sudo nano /etc/ssh/sshd_config
    
    • Add below the Match User arti block (a hardened variant of both blocks is sketched at the end of this step):
      Match User phixr
          ForceCommand /usr/local/bin/ollama-phi3-shell
      
    • Restart SSH:
      sudo systemctl restart ssh
      
  7. Limit Permissions:

    sudo usermod -G nogroup,render phixr
    
    • Note: as with arti, render is re-listed because -G replaces the whole supplementary group list.
    
  8. Test SSH:

    ssh phixr@<server-ip>
    
    • Expect: Starting Phi-3 Mini interactive session...
    • Prompt: “Walk me through installing pfSense.”
    • Expected: “Download the ISO from pfsense.org. Create a USB with dd if=pfSense.iso of=/dev/sdX bs=4M.”
    • Exit: Ctrl+D (terminates SSH).
    • Try: ssh phixr@<server-ip> bash (the AI session starts instead of Bash).
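
SSH hardening (optional): ForceCommand pins the login command, but TCP forwarding, agent forwarding, and tunnels stay available unless disabled. A hardened version of both Match blocks, using standard sshd_config options:

  Match User arti
      ForceCommand /usr/local/bin/ollama-shell
      AllowTcpForwarding no
      AllowAgentForwarding no
      X11Forwarding no
      PermitTunnel no
  Match User phixr
      ForceCommand /usr/local/bin/ollama-phi3-shell
      AllowTcpForwarding no
      AllowAgentForwarding no
      X11Forwarding no
      PermitTunnel no

Validate before restarting (sudo sshd -t prints nothing when the config is valid), then run sudo systemctl restart ssh.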

Step 7: Optimize and Troubleshoot

Let's ensure optimal performance and address potential issues.

  1. Performance Optimization:

    • CodeGemma 7B: ~5–6 GB VRAM, ~8–12 tokens/second. Good for Python debugging.
    • Phi-3 Mini: ~3.5–4.5 GB VRAM, ~10–15 tokens/second. Ideal for system administration guidance.
    • Prompting:
      • arti: “Debug this Python code: [snippet].”
      • phixr: “Walk me through [task] step-by-step.”
    • Temperature: For precise responses, set temperature to 0.2:
      • For CodeGemma:
        nano ~/.ollama/models/codegemma-modelfile
        
        Add:
        FROM codegemma:7b
        PARAMETER temperature 0.2
        
        Create:
        ollama create codegemma-lowtemp -f ~/.ollama/models/codegemma-modelfile
        
        Update /usr/local/bin/ollama-shell to use ollama run codegemma-lowtemp.
      • For Phi-3 Mini:
        nano ~/.ollama/models/phi3-modelfile
        
        Add:
        FROM phi3:mini
        PARAMETER temperature 0.2
        
        Create:
        ollama create phi3-lowtemp -f ~/.ollama/models/phi3-modelfile
        
        Update /usr/local/bin/ollama-phi3-shell to use ollama run phi3-lowtemp.
  2. Troubleshooting:

    • No Session:
      • Check scripts:
        ls -l /usr/local/bin/ollama-shell /usr/local/bin/ollama-phi3-shell
        cat /usr/local/bin/ollama-shell
        cat /usr/local/bin/ollama-phi3-shell
        
    • GPU Issues: If generation is slow (~1–5 tokens/second), the model is likely running on the CPU. Verify ROCm (see also the RX 6600 note after this list):
      rocminfo
      rocm-smi --showmeminfo vram
      
      • Reinstall ROCm 5.6/5.7 if needed.
    • Shell Access: If arti or phixr can reach a Bash shell:
      getent passwd arti
      getent passwd phixr
      
      • Confirm shells. Re-run usermod -s.
    • SSH Errors:
      sudo systemctl status ssh
      
      • Restart: sudo systemctl restart ssh.
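
RX 6600 note: the RX 6600 (gfx1032) is not on ROCm's officially supported GPU list, so Ollama may silently fall back to the CPU. A widely used workaround is to present the card as gfx1030 through an environment override on the Ollama service (an assumption to verify on your hardware):

  sudo systemctl edit ollama
  
  Add:
  [Service]
  Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
  
  Then restart:
  sudo systemctl restart ollama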

Expected Performance

  • Hardware Fit: CodeGemma (~5–6 GB VRAM, ~4.2 GB disk) and Phi-3 Mini (~3.5–4.5 GB VRAM, ~2.3 GB disk) each fit your Radeon RX 6600, Ryzen 7 2700, 64 GB RAM, and 465.8 GB NVMe SSD; Ollama loads and unloads models on demand, so both users can share the 8 GB GPU.
  • Use Case:
    • arti: Guides Python coding/debugging (e.g., “Check your list index with len()”).
    • phixr: Provides detailed system administration instructions (e.g., “Download pfSense ISO, then use dd”).
  • Speed: CodeGemma (~8–12 tokens/second), Phi-3 Mini (~10–15 tokens/second). Responses begin in ~1–2 seconds.
  • Restriction: arti locked to CodeGemma; phixr to Phi-3 Mini. No Bash access.

Example Usage

  • For arti:
    ssh arti@<server-ip>
    >>> Debug: x = [1, 2]; print(x[2]).
    The error suggests an invalid index. Check the list's length with `len(x)`.
    
  • For phixr:
    ssh phixr@<server-ip>
    >>> Walk me through installing pfSense.
    Download the ISO from pfsense.org. Create a USB with `dd if=pfSense.iso of=/dev/sdX bs=4M`. Check with `lsblk`.