Notes on a Personal Computer Built for Calculation and AI

2025-04-28

Introduction

Ten years later, the system first described in Personal Linux R Server in a Mini-ITX Gaming Case was getting old and slow. Time for a replacement...

Hardware
Component List
Intel i9-13900KS CPU
CPU cooler
Motherboard
Memory (RAM)
Video Card (GPU)
Power supply
Power connections

Mass storage
Fans
ARGB elements
ARGB controllers
Speakers
Disk-activity LED
Watt meter
Five beeps at boot
EDID dongle
Microcode issues
Operating Systems
BIOS
Windows 11
Debian 12
AlmaLinux 9.5
ssh
Gnome setup
Linux NVIDIA driver
Uninstall NVIDIA driver
llama.cpp Setup
LLM timing tests

Hardware

Step-by-step reference: How to Build a PC

Understanding fan specs and theory: Fan airflow and static pressure

Component List

PCPartPicker: 2025 RTX 4070 Build

Case	ASUS Prime AP201 Black MicroATX Tempered Glass Front Panel (Amazon)
Power Supply	Corsair SF1000 SFX Power Supply (Black) (B&H Photo)
Motherboard	GIGABYTE Z790M AORUS ELITE AX ICE LGA 1700 Intel (Newegg)
Processor	Intel Core i9-13900KS (B&H Photo)
CPU Cooler	Cooler Master 240 Atmos White High Performance Close-Loop AIO Liquid Cooler (Amazon)
RAM	Crucial Pro 96GB DDR5 RAM Kit (2x48GB) 5600MHz (Amazon)
M.2 SSD	SAMSUNG 990 EVO Plus SSD 2TB, PCIe Gen 4x4, Gen 5x2 M.2 2280 (Amazon)
SATA SSD	Samsung 850 EVO 500GB 2.5-Inch SATA III Internal SSD
SATA Hard Drive	WD Black 2TB - 7200 RPM SATA 6 Gb/s 64MB Cache 3.5 Inch WD2003FZEX
Video Card	ASUS Dual GeForce RTX™ 4070 EVO OC Edition 12GB (Amazon)
120mm Case Fans	3x Noctua NF-A12x25 PWM chromax.Black.swap, Premium Quiet Fan, 4-Pin (120mm, Black) (Amazon)
92mm Case Fan	Noctua NF-A9 PWM chromax.Black.swap, Premium Quiet Fan, 4-Pin (92mm, Black) (Amazon)

Update: replaced WD Black 2TB drive with another M.2 drive.

M.2 SSD	WD_BLACK 8TB SN850X NVMe SSD with Heatsink - Gen4 PCIe, M.2 2280 (Amazon)

Miscellaneous Parts:

Straight SATA cables	BENFEI 3 Pack SATA Cable III 6Gbps Straight HDD SDD Data Cable 18 Inch (Amazon)
PWM Fan Splitters	JBtek All Black Sleeved PWM Fan Splitter Cable 1 to 2 Converter, 2 Pack (Amazon)
Motherboard Speaker	5 PCS Motherboard Speaker (Amazon)
External Speaker	USB Computer Speaker for Desktop PC Laptop \| Small Plug-N-Play External Speaker (Amazon)
Wireless Mouse	Logitech M196 Bluetooth Wireless Mouse - Graphite (Amazon)
Windows	New Windows 10 Professional USB 32/64-Bit and license key Sealed Retail box (eBay)
Cable Tape	XFasten Wire Harness Tape 3/4 Inch x 50 ft, Residue-Free Cloth Electrical Felt Tape (Amazon)
Watt meter	Upgraded Watt Power Meter Plug Home Electrical Usage Monitor with Backlight, Overload Protection, 7 Modes Display (Amazon)
EDID Dongle	HDMI EDID Emulator Passthrough - 1920x1200 @59hz (Amazon)

Bling Accessories:

LED Fan Halos	Addressable RGB Fan Halo, Airgoo 3 Packs Rainbow Fan Frame for 120mm Noctua PWM Fans (Amazon)
ARGB Fan Controller for Testing	Thermalright 5V 3PIN A-RGB Fan Controller, 5V Lighting Controller (Amazon)
ARGB Cables	BULIK 5V 3Pin ARGB Extension Cable, Motherboard Port to SM3P 1-to-1 Female ARGB Connector Adapter Cable (Amazon)
ARGB Light Bar	Addressable RGB LED Strip for Gaming Case, 0.98ft 30LEDs Diffused Rainbow Magnetic ARGB Strip (Amazon)
Arduino-Based ARGB Controller	Gelid Solutions CODI6 ARGB 6-Channel Programmable USB Controller Kit PWM LED Fan (eBay)
USB B Extension Cable	NFHK USB 2.0 B Type Male to Female Extension Cable Down Angled 90 Degree 20cm for Printer Scanner Disk (Amazon)

Intel i9-13900KS CPU

Discussion of i9-139xx and i9-149xx CPUs:

Intel 14th Gen vs 13th Gen Desktop CPU – Which One To Get?

Core i9-14900K ≈ i9-13900KS

At the time of purchase, the 13900KS was actually cheaper and more readily available than the 13900K.

Reviews:

Intel Core i9-13900KS Review: The World's First 6 GHz 320W CPU 2023-01-29
Intel Core i9-13900KS Review - The Empire Strikes Back 2023-04-21

CPU cooler

I opted for a smaller (240mm, 2-fan) liquid cooler; the case can actually accommodate a full-size (360mm) cooler, but that would have required moving the PSU case to a lower position, which could inhibit the ability to install a longer graphics card and/or install a 3.5" hard drive on the inside front of the case.

Review:

Cooler Master MasterLiquid 240 Atmos Review: Dynamite in a Small Package 2023-10-26

Installation video:

Cooler Master 360/240 Atmos Install Guide

I first tried a Noctua NH-D15 cooler, but it's so big I was concerned about whether it would impinge on the video card, which has to go in the first slot next to it.

Motherboard

Manual here: GIGABYTE Z790M AORUS ELITE AX ICE LGA 1700 Intel

Memory (RAM)

I initially installed two 48GB DIMMs in the recommended slots on the motherboard, for a total of 96GB.

Once everything was working, and operating under the incorrect assumption that more would be better, I installed two more 48GB DIMMs, populating all four slots for a total of 192GB. What I failed to appreciate was how much the system would slow down when this was done. The BIOS set the RAM speed at the stock value of 5600 MT/s when two DIMMs were installed; it dropped the speed to 4000 MT/s when four were installed. I tried enabling each of the XMP profiles (XMP 1 and XMP 2), but both promptly started producing memory errors when I ran a memory stress test (stressapptest):

sudo apt install stressapptest
sudo stressapptest -W -s 120

I was not interested in doing extensive tweaking and tuning to get the overclocking right, so I returned the extra DIMMs and reverted to 96GB and the stock 5600 MT/s.
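As a rough illustration of what the speed drop means, the sketch below computes theoretical peak memory bandwidth. It assumes dual-channel operation with 64-bit (8-byte) transfers per channel, which is the usual arrangement for this platform but is not stated above; the function name peak_bandwidth_gbs is made up for this example. Real-world throughput will be lower than these figures.

```shell
# Theoretical peak DRAM bandwidth in GB/s.
# Assumption: 8 bytes per transfer per channel, 2 channels by default.
# Usage: peak_bandwidth_gbs <MT/s> [channels]
peak_bandwidth_gbs() {
    awk -v mts="$1" -v ch="${2:-2}" \
        'BEGIN { printf "%.1f\n", mts * 8 * ch / 1000 }'
}

peak_bandwidth_gbs 5600   # two DIMMs at the stock speed
peak_bandwidth_gbs 4000   # four DIMMs at the reduced speed
```

Under these assumptions the four-DIMM configuration gives up a little under 30% of the peak bandwidth, which may account for part of the slowdown on memory-bound work such as running LLMs on the CPU.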

See LLM timing tests for comparative results of 96GB vs. 192GB when running actual LLMs.

Video Card (GPU)

I did initial testing of the system without a video card installed; I used the Intel internal graphics that is part of the i9-13900KS CPU. After everything else was demonstrated to be working, I unboxed the video card and installed it.

This is the one I wanted:

GeForce RTX 5090 Founders Edition
32GB VRAM

But it sold out a few seconds after it was launched. So I got this one:

ASUS Dual GeForce RTX™ 4070 EVO OC Edition 12GB GDDR6
12GB VRAM

Specification comparison here

ASUS installation instructions:

How to install the graphics card on motherboard

Power supply

Although the case can hold an ATX PSU, I opted for the smaller SFX format to maximize room. The Corsair SF1000 SFX could (just barely) handle an RTX 5090 (my first choice) and give plenty of headroom for anything else.

Review:

Corsair SF1000 ATX v3.1 PSU Review 2024-06-14

The box didn't come with a manual. I found these on the Corsair website after I had already hooked everything up, first the wrong way, then the right way:

I hooked up most of the cables I needed before installing the PSU in the case, but I found it helpful to have the following photo to refer to later on when attaching a few more.

PCPartPicker estimated the power requirement to be 713W (excluding LED bling) for the RTX 4070 build. The 4070 accounts for 200W of that. If it were replaced by an RTX 5090, the estimated power requirement would be 1088W, which could exceed the limit of the SF1000 if the CPU and GPU were both maxed out at the same time.
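The arithmetic behind the two estimates can be sketched as follows. The 575W figure for the RTX 5090 is an assumption (NVIDIA's published board power, not a number from PCPartPicker's page), chosen because it reproduces the 1088W estimate; the function names are invented for this example.

```shell
# Re-estimate total system power after swapping the GPU.
# Usage: swap_gpu_estimate <total_w> <old_gpu_w> <new_gpu_w>
swap_gpu_estimate() {
    awk -v t="$1" -v o="$2" -v n="$3" 'BEGIN { printf "%d\n", t - o + n }'
}

# Headroom (negative means over budget).
# Usage: psu_headroom <psu_rating_w> <estimated_need_w>
psu_headroom() {
    awk -v psu="$1" -v need="$2" 'BEGIN { printf "%d\n", psu - need }'
}

swap_gpu_estimate 713 200 575   # 713W build, 200W RTX 4070 out, 575W RTX 5090 in
psu_headroom 1000 713           # SF1000 with the RTX 4070
psu_headroom 1000 1088          # SF1000 with an RTX 5090
```

These are worst-case sums of component ratings; the CPU and GPU rarely peak at the same moment, so actual draw should normally be lower.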

Possible alternate PSU if RTX 5090 installed:

Corsair HX1500i Fully Modular Ultra-Low Noise ATX Power Supply

Power connections

My initial mistake was assuming that the larger power cable labelled "Motherboard" was the only one I needed to hook up to the motherboard. When I did that, the motherboard powered up, but nothing else happened, and the yellow CPU warning light on the motherboard was lit.

Apparently, I'm not the only one to have been dismayed by this, fearing an expensive CPU or other failure. The solution was provided here as well: I just needed to hook up one or two additional power cables. There was nothing wrong with any of the components.

Gigabyte Z790 is showing yellow CPU Status LED

The additional power connections are labelled ATX_12V_2X2 and ATX_12V_2X4 on the motherboard. The ATX_12V_2X4 is required. As explained here, the ATX_12V_2X2 might not be necessary in general, but given that the i9-13900KS is particularly power-hungry (as is the RTX 5090 if I ever get one), it seemed a good idea to hook that up too.

See Corsair document Which PSU cables go where?. Note that a 2x4 power cable connector can be split in half to give two 2x2 connectors.

For discussion of GPU power connections, see GPU Power Cable Guide – 6-Pin, 8-Pin, (6+2), 12-Pin PCIe.

Mass storage

The SATA SSD and the SATA HD came from the previous build.

The case came with two SATA cables with right-angle connectors, which were not useful. They can't be used with drives mounted on the case walls, hence the need to purchase additional SATA cables with straight connectors.

Fans

The case is designed to take two 120mm fans on the bottom, and has holes for mounting screws in the appropriate positions. I mounted an additional third fan, the smaller 92mm one, on the bottom with zip ties.

The standard Noctua NF-A12x25 PWM fans, in brown and tan, conveniently come with a Y-splitter cable. Stylish builders who want the all-black version are penalized twice: each fan costs two dollars more, and it doesn't come with a Y-splitter, which must be purchased separately.

ARGB elements

Noctua does not make fans that light up (because dignity), but Airgoo makes add-on LED "halos" that attach to their 120mm fans. For a good fit, I needed to slightly enlarge the halo mounting holes by running a 3/16" drill bit through them. The halos come in a set of three. They can be daisy-chained, but to run them independently, I needed the additional Bulik adapter cables.

Airgoo also makes a nice foot-long LED bar. I had to glue some rare earth magnets to the edge of the bottom fans so the bar could attach there. (The bar has weak magnetic strips along two edges.)

For immediate testing, the $8 Thermalright controller allowed for nice LED colors and pattern demos for all the available LEDs. For a permanent LED controller, the Gelid Solutions CODI6 is an Arduino-based controller for up to 6 units. It has a motherboard USB cable that allows for programming from within the system. In order for it to fit, it needed a short USB extension cable; one with a right-angle connector (in the correct direction) allows for optimum placement.

Unit	LED Count	LED Position
Light Bar	30	LED #0 is next to wire
Halo	30	LED #0 is near wire, proceeds CCW
Cooler Pump	12	LED #0 is at bottom vertex; proceeds CW
Cooler Fans	8	Point towards front of case is between LEDs 7 and 0; proceeds CW

The Arduino and LEDs get their power from a SATA-type power connector. I measured the current draw of the LEDs at various intensity levels to make sure this would be OK.

Light Bar Current (mA) at Intensity
LEDs	25%	50%	100%
R	70	100	100
G	70	100	100
B	70	100	100
RGB	125	150	150

Halo Current (mA) at Intensity
LEDs	25%	50%	100%
R	50	90	135
G	50	90	135
B	50	90	135
RGB	115	170	200
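A minimal sketch of the budget check implied by these measurements: sum the worst-case (100%, all colors) currents and compare them with what the connector can supply. Two assumptions are made here that are not in the notes: that all three halos are in use alongside the one light bar, and that the 5V rail of a SATA power connector is good for about 4.5A (three pins at 1.5A each, the commonly quoted rating). The cooler pump and cooler fan LEDs were not measured, so they are left out.

```shell
# Sum LED currents given pairs of <count> <mA each>.
# Usage: led_total_ma <count> <mA> [<count> <mA> ...]
led_total_ma() {
    total=0
    while [ "$#" -ge 2 ]; do
        total=$((total + $1 * $2))
        shift 2
    done
    echo "$total"
}

# One light bar at 150 mA plus three halos at 200 mA each (assumed counts).
led_total_ma 1 150 3 200
```

Under these assumptions the measured LEDs draw well under 1A in total, a small fraction of the assumed connector limit.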

ARGB controllers

The Thermalright controller is a little $8 package that has some canned LED display colors and patterns. I could get it right away and use it for initial testing.

The CODI6 controller is more expensive, purchased via eBay from the UK. It contains a programmable Arduino controller for up to six devices, and the cables necessary for internal connection to a motherboard USB connector. Thus it can be programmed and controlled from within the main computer itself to create arbitrary, changing LED patterns.

It works as advertised, and is fun to program. The header pins that connect to the ARGB cables do not provide a secure connection, however. I found the slightest jar to a cable could cause it to come loose, which was irritating. (If I have to open the case again to move things around, I may just want to glue the cable connectors onto the board.)

Speakers

One tiny piezo speaker mounts on a motherboard header, and is needed to hear BIOS startup and error beeps. The other, a standard USB speaker, is needed to hear beeps and boops from software and operating systems.

Disk-activity LED

The motherboard came with header pins for a disk activity LED, but the case did not have one built in, so I had to add one myself. With the glass side panel facing the front of my desk, this was fine, since I could mount the additional LED where it could be seen through the panel; no drilling necessary.

The motherboard manual did not give specs for the output, so I had to make measurements. The crucial question was whether I needed a series current-limiting resistor, or not, and if so, how big.

The no-load output of the motherboard connection was 3.236V.
With a 1kΩ load resistor across the output, the voltage was exactly the same, suggesting there was no internal current-limiting resistor (although the output could be a current source).
With a 300Ω resistor in series with a green LED I got reasonable brightness.
Current was 2.4mA, which seems reasonable for safe operation.
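The measurements above are consistent with Ohm's law, as the following sketch shows. It backs out the LED's forward voltage from the measured current, then recomputes the resistor value from that voltage. The function names are invented for this example, and the forward voltage is inferred rather than measured directly.

```shell
# Forward voltage implied by supply voltage, series resistor, and measured current.
# Usage: led_forward_v <supply_V> <series_ohms> <current_mA>
led_forward_v() {
    awk -v v="$1" -v r="$2" -v ma="$3" \
        'BEGIN { printf "%.3f\n", v - r * ma / 1000 }'
}

# Series resistor needed for a target current.
# Usage: series_resistor_ohms <supply_V> <forward_V> <current_mA>
series_resistor_ohms() {
    awk -v v="$1" -v vf="$2" -v ma="$3" \
        'BEGIN { printf "%.0f\n", (v - vf) * 1000 / ma }'
}

led_forward_v 3.236 300 2.4          # 300 ohms dropping 0.72 V at 2.4 mA
series_resistor_ohms 3.236 2.516 2.4 # round trip back to the resistor value
```

The inferred forward voltage of roughly 2.5V is plausible for a green LED, which suggests the header really is a plain 3.2V source with no hidden series resistance.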

Watt meter

I wanted some indicator for when I might be getting close to the limits of the power supply; probably not an issue for the RTX 4070, but quite possibly a problem for the RTX 5090, if I can get one. Therefore I hooked up an external power meter that continually shows the total wattage input to the SF1000 power supply. The SF1000 is rated at 1000W total output, and the input will always be greater than the output because efficiency is always less than 100%.

At best, the SF1000 is not expected to exceed about 93% efficiency. At 100% load, that drops to about 90% efficiency:

The above suggests that the 1000W output limit would correspond to about 1100W input as read on the wattmeter. It would be reasonable to use a safety factor and simply keep the input at 1000W or less.
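The conversion between wall-side and output-side power is a single division or multiplication by the efficiency. A small sketch, using the roughly 90% full-load efficiency quoted above (function names are made up for this example):

```shell
# Wall (input) watts needed to deliver a given output at a given efficiency.
# Usage: wall_watts <output_W> <efficiency 0..1>
wall_watts() {
    awk -v out="$1" -v eff="$2" 'BEGIN { printf "%.0f\n", out / eff }'
}

# Output watts delivered for a given wattmeter reading.
# Usage: output_watts <input_W> <efficiency 0..1>
output_watts() {
    awk -v in_w="$1" -v eff="$2" 'BEGIN { printf "%.0f\n", in_w * eff }'
}

wall_watts 1000 0.90     # wattmeter reading at the PSU's rated output
output_watts 1000 0.90   # PSU output when the wattmeter reads 1000W
```

So holding the wattmeter reading at or below 1000W keeps the supply at roughly 90% of its rated output.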

Five beeps at boot

In early testing, I sometimes heard the computer produce 5 beeps at boot, and show the CPU error light on the motherboard. This was alarming; 5 beeps is alleged to mean processor error. Did I have a damaged CPU?

After further research (e.g., here), I found that in this case the 5 beeps simply meant that a monitor had not been detected at boot. As long as I made sure that my KVM switch was set to the new computer before booting it, everything was fine.

EDID dongle

I wanted to use the new computer with a KVM switch that also had my Mac Mini attached. I noticed early on that the new box often presented only a blank screen when I switched to it after a minute or so away. If the delay was short, the computer seemed to remember my monitor configuration, otherwise it wouldn't. The problem was solved (in a suggestion that ChatGPT actually supplied) by the use of an EDID emulator dongle.

This also solved the "5 beeps" problem described above. The dongle specifies the exact resolution of my monitor, and makes it appear to the computer as if a monitor is always connected.

Microcode issues

In 2024 there were reports of problems with instability and damage to certain Intel processors. The problem was eventually confirmed by Intel and a microcode fix was released.

I had hoped that by the time I bought my CPU in January 2025 the fix would have been applied to the currently-shipping units. Output from the following command:

cat /proc/cpuinfo

gives information about each of the 32 processors and shows that the required microcode version (0x12b) is in place:

...
processor	: 31
vendor_id	: GenuineIntel
cpu family	: 6
model		: 183
model name	: 13th Gen Intel(R) Core(TM) i9-13900KS
stepping	: 1
microcode	: 0x12b
...
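Since the full listing repeats the same fields for all 32 logical processors, it can be handy to reduce it to just the distinct microcode revisions. A minimal sketch, with a function name invented for this example; it reads cpuinfo-style text on standard input, so it can be tried on a saved copy as well as on the live file.

```shell
# Print each distinct microcode revision found in cpuinfo-style text on stdin.
microcode_versions() {
    awk -F': *' '/^microcode/ { print $2 }' | sort -u
}

# On a live Linux system:
if [ -r /proc/cpuinfo ]; then
    microcode_versions < /proc/cpuinfo
fi
```

A single line of output means every core reports the same revision.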

Another way to get this information (and more) is via the Intel System Support Utility:

# Install these first:
sudo apt install lshw net-tools ethtool hdparm smartmontools wodim

# Then download the utility here.

# They made an error in naming the file
mv ssu_3.0.0.2_tar.gz Downloads/ssu_3.0.0.2.tar.gz

# unpack it, then run
sudo ./ssu.sh

# It writes the results to a text file with the name of computer.

Operating Systems

For initial testing, I just installed Debian, without a separate video card and with Secure Boot turned off. I intended to later install various other Linux distributions on my two larger drives, and Windows on a third separate drive that would have no other systems on it. Then I learned:

Windows does not like to be installed if there are any other systems installed. Windows wants to be first.
Windows 10 does not require Secure Boot, but Windows 11 does.
The operating systems seem to have a smoother installation if they are installed with Secure Boot on, rather than installing with Secure Boot off and trying to enable it later.
The operating systems seem to have a smoother installation if they are installed with the video card installed, rather than installing without a video card and then trying to enable it later.
All operating systems will initially be able to work with the video card even if Secure Boot is enabled, but some will not be able to load proprietary Nvidia drivers with Secure Boot on until signing issues have been dealt with.

So I did a lot of erasing, reformatting, and reinstalling.

BIOS

Summary of BIOS settings

Switch to Advanced settings.
Tweaker > Intel Default Settings: default is Extreme, changed to Performance.
Settings > Platform Power > ErP: default is Disabled, changed to Enabled.
Boot > Secure Boot: default is Disabled, changed to Enabled.
Smart Fan (F6) > default is Silent, changed to Normal.

References

Gigabyte Z790 BIOS Manual

Intel Power Delivery Profiles
What exactly is ErP in BIOS?

ErP Support determines whether to let the system consume less than 1W of power in S5 (shutdown) state. When the setting is enabled, the following four functions will become unavailable: PME Event Wake Up, Power On By Mouse, Power On By Keyboard, and Wake On LAN.
13th Generation Intel Core, Intel Core 14th Generation, Intel Core Processor (Series 1) and (Series 2), and Intel® Xeon™ E 2400 Processor Datasheet, Volume 1 of 2

Secure Boot

https://www.linux.org/threads/should-i-enable-secure-boot-after-installation-of-linux.54247/

....There is a boot shim that gets installed during the initial installation. It checks to see if secure boot is enabled or not.

Most always, if you have secure boot disabled during install, the non-secure boot shim gets installed. If you re-enable secure boot, it likely won't work.
What is "Platform is in setup mode" mean? SecureBoot disabled although TPM is enabled
Debian Wiki: Secure Boot
Debian Secure Boot: To be, or not to be, that is the question!
Overview: How UEFI Secure Boot Works in Linux
AlmaLinux: Secure Boot
FreeBSD: Secure Boot
Ubuntu Wiki: Secure Boot
Fedora: Need help to enable Secure Boot

Notes

With the Extreme setting, the mprime torture test sent CPU temp to peak of 92°C and peak input power to 490W momentarily, and then ramped down to a continuous 390W. With the Performance setting, the peak temperature was 77°C and the peak wattage was 390W.

Arduino in CODI6 ARGB controller remained powered on at all times unless ErP was enabled.

Initial Secure Boot state:

~$ mokutil --sb-state
SecureBoot disabled
Platform is in Setup Mode

BIOS steps to enable Secure Boot:

Standard -> Custom
Reinstall factory keys
Custom -> Standard
MODE = User
Enable Secure Boot

After enabling:

~$ mokutil --sb-state
SecureBoot enabled

Windows 11

Windows wants to go first, without any other systems installed. To decouple the Windows installation as much as possible from everything else, I installed it on the separate 500GB SSD dedicated solely to Windows.

Make sure Secure Boot is enabled.
Wipe other system installations.
Install Windows 10.
Download Windows 11 Installation Assistant and update.
Hope for the best.

Debian 12

Prepare installation flash drive:

Go to Debian website.
Download 64-bit PC DVD-1 iso. This is the larger complete installation, which does not require an internet connection when installing.
Download balenaEtcher from https://etcher.balena.io
Use balenaEtcher to burn ISO to flash drive

Reboot to installation drive. Setup and install to 2TB drive nvme0n1. Use Standard Partitions (not LVM). Use Guided Partitioning to make use of entire drive.

Configure the network: [choose wired interface]
Do not configure the network at this time
Hostname: dprime
Partition disks: Guided - use entire disk
Select disk: /dev/nvme0n1 2.0TB
Partitioning scheme: All files in one partition
Software selection: Uncheck all except SSH server, standard system utilities

The above should produce a working CLI system derived entirely from offline resources. The Debian GUI failed to start with the RTX 4070 unless an NVIDIA driver was installed separately.

Note: changing the BIOS setting for the internal graphics to anything but Enabled resulted in a non-bootable system that required resetting the motherboard BIOS to work again.

Prevent the system from asking for the CD to be reinserted when later installing software. Explanation.

su -
vi /etc/apt/sources.list
# comment out the line that begins with "deb cdrom:"
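The same edit can be made non-interactively. This is a sketch only: it assumes GNU sed (as shipped with Debian) and that the installer's entry begins with "deb cdrom:"; the function name is invented for this example. Try it on a copy of the file first.

```shell
# Comment out any "deb cdrom:" entries in a sources.list-style file, in place.
# Usage: disable_cdrom_sources <file>     (GNU sed assumed for -i)
disable_cdrom_sources() {
    sed -i 's/^deb cdrom:/# &/' "$1"
}

# As root, on the real file:
#   disable_cdrom_sources /etc/apt/sources.list
```

Lines that are already commented out, and all network sources, are left untouched.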

Set up static networking:

# look up name of wired interface:
ls /sys/class/net

su -
vi /etc/network/interfaces
# insert the following:
---
auto enp5s0
iface enp5s0 inet static
     address 172.16.1.20/24
     gateway 172.16.1.10
---

vi /etc/resolv.conf
# insert the following, replacing x.x.x.x with your DNS name server(s):
---
nameserver x.x.x.x
nameserver x.x.x.y
---

# bring up interface
ifup enp5s0

# test:
ping apple.com

exit
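If the same setup is repeated across reinstalls, the stanza can be generated rather than typed. A small sketch using the values from this section; the function name is made up, and it only prints text, so nothing is changed until the output is pasted or appended to /etc/network/interfaces by hand.

```shell
# Print an ifupdown static-address stanza.
# Usage: static_stanza <interface> <address/prefix> <gateway>
static_stanza() {
    printf 'auto %s\niface %s inet static\n     address %s\n     gateway %s\n' \
        "$1" "$1" "$2" "$3"
}

static_stanza enp5s0 172.16.1.20/24 172.16.1.10
```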

Retrieve previous ssh keys, if any

su -
mkdir -p /media/michael/data1
mount /dev/nvme1n1p2 /media/michael/data1

cp /media/michael/data1/ssh_host/* /etc/ssh/
reboot
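One thing worth checking before that reboot: sshd is strict about private host key permissions, and a backup filesystem may not have preserved them. The sketch below, with an invented function name, resets private keys to owner-only and public keys to world-readable. Whether this is needed depends on how the keys were stored, so treat it as a precaution rather than a required step.

```shell
# Reset permissions on SSH host keys in a directory.
# Usage: fix_host_key_perms <dir>     (normally /etc/ssh, run as root)
fix_host_key_perms() {
    for key in "$1"/ssh_host_*_key; do
        [ -e "$key" ] || continue      # glob matched nothing
        chmod 600 "$key"               # private key: owner read/write only
        if [ -e "$key.pub" ]; then
            chmod 644 "$key.pub"       # public key: readable by all
        fi
    done
    return 0
}

# As root:
#   fix_host_key_perms /etc/ssh
```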

Set up to install software from network per advice.


su -
tee /etc/apt/sources.list<<EOF
deb http://deb.debian.org/debian bookworm main contrib non-free-firmware
# deb-src http://deb.debian.org/debian bookworm main contrib non-free-firmware

deb http://deb.debian.org/debian bookworm-updates main contrib non-free-firmware
# deb-src http://deb.debian.org/debian bookworm-updates main contrib non-free-firmware

# deb http://deb.debian.org/debian bookworm-backports main contrib non-free-firmware
# deb-src http://deb.debian.org/debian bookworm-backports main contrib non-free-firmware

deb http://security.debian.org/debian-security bookworm-security main contrib non-free-firmware
# deb-src http://security.debian.org/debian-security bookworm-security main contrib non-free-firmware
EOF

Set up sudo for user.

su -
apt update
apt install sudo
usermod -aG sudo michael
exit
exit
# [login]

Install GNOME:

sudo apt install task-gnome-desktop

But make sure we don't boot into GUI yet!

sudo systemctl set-default multi-user.target

See below to install NVIDIA driver.

To switch to GUI after booting (only after installing driver):

startx

Fix issue of wired connection not being manageable in GNOME, per advice.

sudo vi /etc/NetworkManager/NetworkManager.conf
---
# change following entry to 'true':
managed=true
---

sudo service NetworkManager restart
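The edit can also be scripted. This sketch assumes GNU sed and that the file contains the stock line managed=false, as it does on a default Debian install; the function name is invented for this example.

```shell
# Flip "managed=false" to "managed=true" in a NetworkManager.conf-style file.
# Usage: set_nm_managed <file>     (GNU sed assumed for -i)
set_nm_managed() {
    sed -i 's/^managed=false/managed=true/' "$1"
}

# On the real file:
#   sudo sh -c 'sed -i "s/^managed=false/managed=true/" /etc/NetworkManager/NetworkManager.conf'
```

NetworkManager still needs to be restarted afterwards for the change to take effect.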

AlmaLinux 9.5

Prepare installation flash drive using an existing Linux system:

Go to AlmaLinux website.
Download x86-64 AlmaLinux OS 9.5 Boot ISO
Download balenaEtcher from https://etcher.balena.io
Use balenaEtcher to burn ISO to flash drive

Reboot to installation drive.

Set up networking.
Set up root and admin user passwords.
Set time zone.
Select workstation software. Select all sub-options that might be useful.

Setup and install to 8TB drive nvme1n1. Use Standard Partitions (not LVM).

Installation Partitions
Order	Path	Size
1.	/boot/efi	1024 MiB
2.	/boot	1024 MiB
3.	/	200 GiB
4.	[swap]	10 GiB

Change hostname to make clear which distro is in use:

sudo hostnamectl set-hostname aprime

ssh

If relevant files are already present on system, mount drive and copy them:

sudo cp -r /run/media/michael/data1/ssh_host ~

If relevant files are on Mac, copy via sftp:

# delete current ssh key on Mac for linux box
sudo vi ~/.ssh/known_hosts

sftp michael@prime

mkdir ssh_host
cd ssh_host
put /Users/michael/Documents/Prime\ System/ssh_host/*

sudo reboot

# delete new ssh key on Mac for linux box
sudo vi ~/.ssh/known_hosts

Use previous ssh keys, if any

su -
mkdir -p /run/media/michael/data1
mount /dev/nvme1n1p2 /run/media/michael/data1

cp /run/media/michael/data1/ssh_host/* /etc/ssh/
reboot

Gnome settings

Multitasking > Active Screen Edges: Off
Power > Power Mode: Performance
Power > Power Savings Options > Screen Blank: Never
Suspend & Power Button > Power Button Option: Power Off
Privacy > Screen Lock > Automatic Screen Lock: Off

Linux NVIDIA driver

Be sure computer boots to command line, by typing

sudo systemctl set-default multi-user.target

Reference:

AlmaLinux Wiki - NVIDIA: Installation on 9.x Variant III

If relevant files are already present on system, mount drive and copy them:

# For AlmaLinux use /run/media/michael/...
sudo cp -r /media/michael/data1/mok ~

If relevant files are on Mac, copy via sftp:

sftp michael@prime

cd ..
mkdir mok
cd mok
put /Users/michael/Documents/Prime\ System/mok/*

exit

AlmaLinux prep:

sudo dnf update

sudo dnf install epel-release
sudo dnf config-manager --enable crb
sudo dnf config-manager --set-enabled extras

sudo dnf install kernel-devel
sudo dnf install kernel-headers
sudo dnf install dkms
# sudo dnf install redhat-lsb-core	*** No match for argument: redhat-lsb-core
sudo dnf install vulkan
sudo dnf install vulkan-tools
sudo dnf install vulkan-headers
sudo dnf install vulkan-loader-devel

# this is also needed
sudo dnf install libglvnd-egl.i686

echo "blacklist nouveau" | sudo tee /etc/modprobe.d/nouveau-blacklist.conf
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/nouveau-blacklist.conf

sudo dracut --force

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

sudo reboot

Debian prep:

sudo apt install build-essential cmake git

sudo apt source linux
# E: You must put some 'deb-src' URIs in your sources.list

sudo vi /etc/apt/sources.list
#uncomment deb-src lines
sudo apt update

sudo apt source linux
sudo apt install linux-headers-$(uname -r)

sudo dpkg --add-architecture i386
sudo apt update
sudo apt install libc6:i386

sudo apt install pkg-config libglvnd-dev

Download Latest Production Branch Version from https://www.nvidia.com/en-us/drivers/unix/:

Manual driver search: NVIDIA RTX / Quadro, NVIDIA RTX Series, NVIDIA RTX 4000 Ada Generation, Linux 64-bit, English (US)

cd Downloads/
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/570.144/NVIDIA-Linux-x86_64-570.144.run
chmod +x NVIDIA-Linux-x86_64-570.144.run

# AlmaLinux:
sudo ./NVIDIA-Linux-x86_64-570.144.run --glvnd-egl-config-path=/usr/share/glvnd/egl_vendor.d/

# Debian
#   (Make sure latest kernel headers are installed:)
sudo apt install linux-headers-$(uname -r)
sudo ./NVIDIA-Linux-x86_64-570.144.run

Multiple kernel module types are available for this system. Which would you like to use?

NVIDIA Proprietary

The target kernel has CONFIG_MODULE_SIG set, which means that it supports cryptographic signatures on kernel modules. On some systems, the kernel may refuse to load modules without a valid signature from a trusted key. This system also has UEFI Secure Boot enabled; many distributions enforce module signature verification on UEFI systems when Secure Boot is enabled. Would you like to sign the NVIDIA kernel module?

Sign the kernel module

First time on computer:

Would you like to sign the NVIDIA kernel module with an existing key pair, or would you like to generate a new one?

Generate a new key pair

The NVIDIA kernel module was successfully signed with a newly generated key pair. Would you like to delete the private signing key?

An X.509 certificate containing the public signing key will be installed to /usr/share/nvidia/nvidia-modsign-crt-C3A74FC1.der. The SHA1 fingerprint of this certificate is: 2E:18:9F:2F:9D:E5:A1:39:A2:14:D5:8C:8C:FE:DE:DC:2A:AF:46:C1. This certificate must be added to a key database which is trusted by your kernel in order for the kernel to be able to verify the module signature.

The private signing key will be installed to /usr/share/nvidia/nvidia-modsign-key-C3A74FC1.key. After the public key is added to a key database which is trusted by your kernel, you may reuse the saved public/private key pair to sign additional kernel modules, without needing to re-enroll the public key. Please take some reasonable precautions to secure the private key: see the README for suggestions.

The signed kernel module failed to load. Secure boot is enabled on this system, so this is likely because the kernel does not trust any key which is capable of verifying the module signature. Would you like to install the signed kernel module anyway? Note that if this module loading failure is due to the lack of a trusted signature, you will not be able to load the installed module until after a key that can verify the module signature is added to a key database that is trusted by the kernel. This will likely require rebooting your computer.

Install signed kernel module

Save the .key and .der files on a separate computer and/or a separate partition on the Linux computer so they can be reused on subsequent Linux installations.

After installation is complete, enroll the public key:

sudo mokutil --import /usr/share/nvidia/nvidia-modsign-crt-C3A74FC1.der
sudo reboot

[michael@aprime ~]$ sudo mokutil --import /usr/share/nvidia/nvidia-modsign-crt-C3A74FC1.der
input password: 
[choose a one-time enrollment password]
sudo reboot
choose option: "Enroll MOK" (re-enter the same password when prompted)
save keys

# For AlmaLinux use /run/media/michael/...
sudo cp /usr/share/nvidia/nvidia-modsign* ~/mok
sudo cp /usr/share/nvidia/nvidia-modsign* /media/michael/data1/mok

Later installations on computer:

Sign the kernel module

Would you like to sign the NVIDIA kernel module with an existing key pair, or would you like to generate a new one?

Use an existing key pair

Please provide the path to the private key:

/home/michael/mok/nvidia-modsign-key-C3A74FC1.key

Please provide the path to the public key:

/home/michael/mok/nvidia-modsign-crt-C3A74FC1.der

Install NVIDIA's 32-bit compatibility libraries?

Yes

Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if your kernel changes later.

Yes

The initramfs will likely need to be rebuilt due to the following condition(s): * Nouveau is present in the initramfs. Would you like to rebuild the initramfs?

Rebuild initramfs

Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.

Yes

Your X configuration file has been successfully updated. Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 570.86.16) is now complete.

Reboot and verify installation:

$ nvidia-smi
Wed Apr  2 18:38:10 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070        Off |   00000000:01:00.0  On |                  N/A |
|  0%   31C    P8              7W /  200W |     110MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2148      G   /usr/libexec/Xorg                        36MiB |
|    0   N/A  N/A            2223      G   /usr/bin/gnome-shell                     54MiB |
+-----------------------------------------------------------------------------------------+
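The driver version can also be checked from a script rather than read off the table. A minimal sketch, assuming GNU `sort -V` is available; `version_ge` is a hypothetical helper name, not part of any NVIDIA tool, and the 570.00 minimum is the figure the CUDA 12.8 installer reports later in these notes:

```shell
# version_ge A B: succeed if dotted version A is >= version B.
# Hypothetical helper; relies on GNU sort's -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (needs a working driver, so it is left commented out):
# drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
# version_ge "$drv" 570.00 && echo "driver $drv is new enough for CUDA 12.8"
```

Comparing with a version sort avoids the trap of plain string comparison, where 570.86 would sort after 570.133.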

After the driver is installed successfully, log in and start the GUI:

startx

Uninstall NVIDIA driver

A kernel update sometimes requires uninstalling and reinstalling the driver. Always uninstall the driver before switching to a different video card.

(Before reinstalling the driver, make sure the kernel headers for the running kernel are installed.)

sudo ./NVIDIA-Linux-x86_64-570.144.run --uninstall

sudo apt install linux-headers-$(uname -r)

If you plan to no longer use the NVIDIA driver, you should make sure that no X screens are configured to use the NVIDIA X driver in your X configuration file. If you used nvidia-xconfig to configure X, it may have created a backup of your original configuration. Would you like to run `nvidia-xconfig --restore-original-backup` to attempt restoration of the original X configuration file?

Yes

Uninstallation of existing driver: NVIDIA Accelerated Graphics Driver for Linux-x86_64 (570.86.16) is complete.
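Before rerunning the installer, it can help to confirm that the headers for the running kernel are actually in place. A minimal sketch, assuming the usual layout in which /lib/modules/RELEASE/build points at the kernel build tree; `headers_present` and its optional root-prefix argument are hypothetical, added only so the check can be tried against a scratch directory:

```shell
# headers_present RELEASE [ROOT]: succeed if the build tree for kernel
# RELEASE contains a Makefile. ROOT is an optional path prefix (assumption:
# used only for testing; leave it empty on a real system).
headers_present() {
  [ -e "${2:-}/lib/modules/$1/build/Makefile" ]
}

# Example use before reinstalling the driver (Debian):
# headers_present "$(uname -r)" || sudo apt install "linux-headers-$(uname -r)"
```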

Build llama.cpp

Download and install the CUDA Toolkit by visiting https://developer.nvidia.com/cuda-downloads and selecting the desired platform (use the Rocky Linux version).

Installer: Linux, x86_64, Rocky, 9, runfile (local)

Copy and run the instructions shown on the download page:


cd ~/Downloads
wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda_12.8.1_570.124.06_linux.run
sudo sh cuda_12.8.1_570.124.06_linux.run

Accept the EULA, then install the following:


┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer                                                               │
│ - [ ] Driver                                                                 │
│      [ ] 570.124.06                                                          │
│ + [X] CUDA Toolkit 12.8                                                      │
│   [X] CUDA Demo Suite 12.8                                                   │
│   [X] CUDA Documentation 12.8                                                │
│ - [ ] Kernel Objects                                                         │
│      [ ] nvidia-fs                                                           │
│   Options                                                                    │
│   Install                                                                    │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

After installation, the following note appears:


===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.8/

Please make sure that
 -   PATH includes /usr/local/cuda-12.8/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.8/lib64, or, add /usr/local/cuda-12.8/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.8/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 570.00 is required for CUDA 12.8 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
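The installer's note about PATH and LD_LIBRARY_PATH can be handled with a small helper that avoids duplicate entries when a startup file is sourced more than once. A minimal sketch; `path_prepend` is a hypothetical name, and putting the lines in ~/.bashrc is an assumption rather than something the installer requires:

```shell
# path_prepend DIR LIST: print the colon-separated LIST with DIR at the
# front, unless DIR is already one of its entries.
path_prepend() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;
    *)        printf '%s\n' "$1${2:+:$2}" ;;
  esac
}

# Example use, e.g. in ~/.bashrc:
# PATH=$(path_prepend /usr/local/cuda-12.8/bin "$PATH"); export PATH
# LD_LIBRARY_PATH=$(path_prepend /usr/local/cuda-12.8/lib64 "${LD_LIBRARY_PATH:-}")
# export LD_LIBRARY_PATH
```

When the existing list is empty, the helper emits no trailing colon, so the variable does not end up with an empty entry.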

llama.cpp Setup

Download llama.cpp from https://github.com/ggerganov/llama.cpp:

mkdir -p ~/Projects
cd ~/Projects/

git clone https://github.com/ggerganov/llama.cpp

Build (AlmaLinux):


cd ~/Projects/llama.cpp

export PATH=/usr/local/cuda-12.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH

sudo dnf update
sudo dnf install cmake
sudo dnf install libcurl-devel

sudo dnf install gcc-toolset-13-gcc.x86_64 gcc-toolset-13-gcc-gfortran gcc-toolset-13-gcc-c++

# enable software collection needed for llama.cpp
source scl_source enable gcc-toolset-13

cmake -B build -DGGML_CUDA=ON -DGGML_CCACHE=OFF
cmake --build build --config Release

# exit this shell; gcc-toolset-13 stays enabled only in the session where it was sourced
exit

Mount data partition and run llama.cpp (AlmaLinux):

sudo mkdir -p /run/media/michael/data1
sudo mount /dev/nvme1n1p2 /run/media/michael/data1

~/Projects/llama.cpp/build/bin/llama-cli -m /run/media/michael/data1/models/Ministral-8B-Instruct-2410.bf16.gguf \
  -t 16 --n-gpu-layers 20

Build (Debian):


cd ~/Projects/llama.cpp

export PATH=/usr/local/cuda-12.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH

sudo apt update
sudo apt upgrade
sudo apt install cmake

sudo apt install curl libcurl4-openssl-dev

cmake -B build -DGGML_CUDA=ON -DGGML_CCACHE=OFF
cmake --build build --config Release

Mount data partition and run llama.cpp (Debian):

sudo mkdir -p /media/michael/data1
sudo mount /dev/nvme1n1p2 /media/michael/data1

~/Projects/llama.cpp/build/bin/llama-cli -m /media/michael/data1/models/Ministral-8B-Instruct-2410.bf16.gguf \
  -t 16 --n-gpu-layers 20

If necessary, convert Hugging Face models to .gguf files (Debian):

The first time through, set up a Python virtual environment:

# https://askubuntu.com/questions/320996/how-to-make-python-program-command-execute-python-3
sudo apt install python-is-python3

sudo apt install python3-venv
python -m venv mvirte
mvirte/bin/pip install transformers torch
mvirte/bin/pip install sentencepiece

# Main argument is path to folder containing files of the form model-00001-of-00007.safetensors
mvirte/bin/python ~/Projects/llama.cpp/convert_hf_to_gguf.py /media/michael/data1/models/CodeLlama-34b-Instruct-hf-BF16 \
   --outfile  /media/michael/data1/models/CodeLlama-34b-Instruct-hf-BF16.gguf

If necessary, convert Hugging Face models to .gguf files (AlmaLinux):

The first time through, set up a Python virtual environment:

# https://stackoverflow.com/questions/75608323/how-do-i-solve-error-externally-managed-environment-every-time-i-use-pip-3#75722775
python -m venv mvirte
mvirte/bin/pip install transformers torch
mvirte/bin/pip install sentencepiece

# Main argument is path to folder containing files of the form model-00001-of-00007.safetensors
mvirte/bin/python ~/Projects/llama.cpp/convert_hf_to_gguf.py /run/media/michael/data1/models/CodeLlama-34b-Instruct-hf-BF16 \
   --outfile  /run/media/michael/data1/models/CodeLlama-34b-Instruct-hf-BF16.gguf
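Since large downloads sometimes end up incomplete, it may be worth checking that every shard is present before starting a long conversion. A minimal sketch, assuming the shards follow the model-00001-of-00007.safetensors naming mentioned above; `shards_complete` is a hypothetical helper, not part of llama.cpp:

```shell
# shards_complete DIR: succeed if DIR holds as many model-*-of-N.safetensors
# files as the N in their names says there should be.
# The body runs in a subshell, so "set --" and "exit" stay local to it.
shards_complete() (
  set -- "$1"/model-*-of-*.safetensors
  if [ ! -e "$1" ]; then
    echo "no safetensors shards found" >&2
    exit 1
  fi
  total=${1##*-of-}            # e.g. 00007.safetensors
  total=${total%.safetensors}  # e.g. 00007
  total=$(expr "$total" + 0)   # e.g. 7 (drops the leading zeros)
  if [ "$#" -ne "$total" ]; then
    echo "found $# of $total shards" >&2
    exit 1
  fi
)

# Example use:
# shards_complete /run/media/michael/data1/models/CodeLlama-34b-Instruct-hf-BF16 \
#   && echo "all shards present"
```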

If necessary, merge segmented .gguf files (Debian):

# (for AlmaLinux, use /run/media/michael/...)
~/Projects/llama.cpp/build/bin/llama-gguf-split --merge \
  /media/michael/data1/models/Llama-4-Scout-17B-16E-Instruct-GGUF/Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf \
  /media/michael/data1/models/Llama-4-Scout-17B-16E-Instruct-GGUF/Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf
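The output name in the merge command is just the first shard's name with the -00001-of-00002 part removed, so it can be derived instead of typed twice. A minimal sketch; `merged_name` is a hypothetical helper, and the five-digit shard pattern is an assumption based on the file name shown above:

```shell
# merged_name FIRST_SHARD: print the shard path with its "-NNNNN-of-NNNNN"
# suffix removed, e.g. model-00001-of-00002.gguf -> model.gguf
merged_name() {
  printf '%s\n' "$1" | sed -E 's/-[0-9]{5}-of-[0-9]{5}\.gguf$/.gguf/'
}

# Example use:
# first=/media/michael/data1/models/Llama-4-Scout-17B-16E-Instruct-GGUF/Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf
# ~/Projects/llama.cpp/build/bin/llama-gguf-split --merge "$first" "$(merged_name "$first")"
```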

LLM timing tests

Parameters suggested by ChatGPT for consistent timing tests:

To make timing tests with llama-cli repeatable, use a command of the following form:

~/Projects/llama.cpp/build/bin/llama-cli -m models/llama-2-13b.Q4_K_M.gguf \
       --prompt "Benchmark test." \
       --n-predict 128 \
       --threads 16 \
       --batch-size 512 \
       --log-disable \
       --repeat-penalty 1.0 \
       --temp 0.7

Explanation of Key Options:

-m models/llama-2-13b.Q4_K_M.gguf → Selects the model file.
--prompt "Benchmark test." → Keeps the input consistent.
--n-predict 128 → Requests the same number of generated tokens in every run.
--threads 16 → Uses 16 threads for CPU processing (adjustable).
--batch-size 512 → Sets the prompt-processing batch size; can be tuned if needed.
--log-disable → Suppresses excess log output for clean results.
--repeat-penalty 1.0 & --temp 0.7 → Keep the sampling settings the same across runs.

GPU-Specific:

To disable GPU completely, add --n-gpu-layers 0.
To test with GPU assist, set --n-gpu-layers X (e.g., 30 to offload part of the model).
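A starting value for --n-gpu-layers can be guessed from the model size and the VRAM available. The following is a rough sketch only: it assumes all layers are the same size and ignores the KV cache and compute buffers, so the real limit will be lower and has to be found by trial. `estimate_gpu_layers` is a hypothetical helper, and the numbers in the example are made up for illustration:

```shell
# estimate_gpu_layers MODEL_MIB N_LAYERS VRAM_BUDGET_MIB
# Heuristic (an assumption, not how llama.cpp decides): treat each layer as
# MODEL_MIB / N_LAYERS in size and count how many fit into the budget,
# capped at N_LAYERS. Integer arithmetic, so the result is rounded down.
estimate_gpu_layers() (
  layers=$(( $3 * $2 / $1 ))
  if [ "$layers" -gt "$2" ]; then
    layers=$2
  fi
  printf '%s\n' "$layers"
)

# Example with made-up numbers: an 8000 MiB model with 40 layers and
# 4000 MiB of VRAM set aside for weights.
# estimate_gpu_layers 8000 40 4000
```

Leaving part of the VRAM out of the budget gives the desktop and the inference buffers room to fit.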

Test results in tokens per second (tps), showing the effect of 2 DIMMs (96GB) vs. 4 DIMMs (192GB) and of CPU-only vs. CPU + GPU:

CPU only (tps)	192GB RAM	96GB RAM	2-DIMM Speedup
llama-2-13b.Q4_K_M.gguf	7.04	9.84	40%
llama-2-13b.Q8_0.gguf	4.03	5.71	42%
llama-2-70b.Q4_K_M.gguf	1.35	1.92	42%
Llama-2-70B-fp16_Q8_0.gguf	0.79	1.12	42%

GPU Assist (tps)	192GB RAM	96GB RAM	2-DIMM Speedup	GPU layers
llama-2-13b.Q4_K_M.gguf	50.95	51.03	0%	100
llama-2-13b.Q8_0.gguf	9.52	12.44	31%	27
llama-2-70b.Q4_K_M.gguf	1.73	2.39	38%	21
Llama-2-70B-fp16_Q8_0.gguf	0.90	1.27	41%	12

GPU Speedup	192GB RAM	96GB RAM
llama-2-13b.Q4_K_M.gguf	7.24x	5.19x
llama-2-13b.Q8_0.gguf	2.36x	2.18x
llama-2-70b.Q4_K_M.gguf	1.28x	1.24x
Llama-2-70B-fp16_Q8_0.gguf	1.14x	1.13x
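The speedup columns follow directly from the tps figures: the 2-DIMM speedup is the 96GB value divided by the 192GB value, minus one, and the GPU speedup is the GPU-assist value divided by the CPU-only value. A minimal sketch of that arithmetic using awk; the function names are hypothetical:

```shell
# speedup_pct SLOW FAST: percentage gain of FAST over SLOW, e.g. "40%".
speedup_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f%%\n", (b / a - 1) * 100 }'
}

# speedup_x BASE NEW: ratio NEW / BASE with two decimals, e.g. "7.24x".
speedup_x() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2fx\n", b / a }'
}

# llama-2-13b.Q4_K_M.gguf, CPU only: 192GB vs. 96GB
speedup_pct 7.04 9.84
# llama-2-13b.Q4_K_M.gguf, 192GB: CPU only vs. GPU assist
speedup_x 7.04 50.95
```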
