Approx. read time: 7 min.
Post: Unlocking LLaMA 2 on MS‑DOS: A Retrocomputing Journey from 486 to Ryzen
1. Background on Meta’s LLaMA 2
1.1 The LLaMA Family
Meta released LLaMA 2 as an openly licensed foundation‑model family ranging from 7 billion to 70 billion parameters, intended for research and commercial use under a permissive (though not strictly open‑source) license GitHub. These models store weights in 16‑ or 32‑bit floating point, requiring substantial memory (tens to hundreds of gigabytes) and compute power, and are typically served on GPUs or modern CPUs.
1.2 TinyStories and Minimalism
Karpathy’s llama2.c repo demonstrates that narrow‑domain LLaMA models trained on TinyStories (roughly 260 KB to 110 MB on disk) still perform useful text completion in constrained environments GitHub. This minimalism—inference in about 700 lines of pure C—opened the door to porting LLaMA to non‑standard platforms.
2. MS‑DOS Architecture & Memory Limitations
2.1 Real Mode Constraints
MS‑DOS and FreeDOS boot in real mode, offering only 20‑bit addressing (1 MB total memory) with 640 KB of conventional memory available to applications Wikipedia. The absence of built‑in memory protection, multitasking, and extended‑memory management limits the usability of modern, large‑scale software.
2.2 The Need for Protected Mode
To break the 640 KB barrier, programs must switch the CPU into protected mode, introduced with the Intel 80286 and expanded on the 80386. Protected mode grants access to extended memory (>1 MB), 32‑bit addressing, and advanced CPU features Pikuma.
3. DOS Extenders & DPMI Hosts
3.1 DOS Extenders
A DOS extender is software that sits between DOS and an application, handling the switch to protected mode and presenting a flat 32‑bit address space to programs while maintaining DOS API compatibility Wikipedia. Key extenders include:
- DOS/4G (Rational/Tenberry): up to 64 MB of memory support Wikipedia.
- DOS/32 (an open‑source DOS/4GW replacement) Wikipedia.
3.2 DPMI Hosts
“DOS Protected Mode Interface” (DPMI) hosts provide standardized services (memory allocation, interrupts) to protected‑mode programs. The most popular:
- CWSDPMI: free 32‑bit host bundled with DJGPP, supporting up to 4 GB of virtual memory and real‑mode interrupt reflection Wikipedia.
DOSBox emulators also include DPMI servers for running extenders in a virtual environment VOGONS.
4. Key Ports of LLaMA 2 to DOS
4.1 Andrej Karpathy’s llama2.c
Karpathy’s llama2.c is a compact, single‑file C inference engine (∼700 lines) that loads model checkpoints in its own binary (.bin) format into RAM and performs forward passes using FP32 weights. It prioritizes portability (Linux, Windows, Mac, and now DOS) over raw speed GitHub.
4.2 Yeo Kheng Meng’s DOS LLaMA 2 Client
Yeo’s project implements llama2.c under MS‑DOS 6.22 using Open Watcom v2 and DOS/4G, adding custom patches for missing math functions, memory mapping, DOS timing, and 8.3 filename constraints YKM’s Corner on the Web. The GitHub repo provides precompiled binaries (dosllam2.exe) and source code.
4.3 Hackaday Verification
Tyler August’s Hackaday article, “Will It Run Llama 2? Now DOS Can”, documents Yeo’s retro demos on a ThinkPad T42 (Pentium M 735 1.7 GHz) and Toshiba Satellite 315CDT (Pentium MMX 200 MHz), as well as a benchmark on a 486 DX‑2 66 MHz Hackaday.
5. Porting Challenges & Technical Adaptations
Porting seven hundred lines of modern C to DOS demanded overcoming several API and system incompatibilities:
- Math Library Substitutions: Replace missing sqrtf, expf, and fabs with custom macros or fixed‑point approximations YKM’s Corner on the Web.
- File I/O & Memory Mapping: Windows/Linux builds use mmap for lazy loading; on DOS this was replaced by reading the entire model file into a malloc’d buffer YKM’s Corner on the Web.
- Timing Functions: Swap POSIX clock_gettime or gettimeofday for DOS’s clock() or BIOS interrupts for performance metrics YKM’s Corner on the Web.
- Filename Length Limits: Conform model filenames to the DOS 8.3 format (e.g., tiny-llama-260k.bin → TINYSTOR.BIN) YKM’s Corner on the Web.
- Linker & Extender Stubs: Embed DOS/4G or CWSDPMI stubs in the EXE so it switches to protected mode automatically at launch Wikipedia.
6. Step‑by‑Step Setup Guide
1. Hardware & OS: An i386‑class PC (486 or later, or a DOSBox session) running MS‑DOS 6.22 or FreeDOS.
2. Acquire Tools
   - Compiler: DJGPP (GCC for DOS) blogsystem5.substack.com or Open Watcom v2 open-watcom.github.io.
   - DPMI Host: Copy CWSDPMI.EXE (r7) or DOS4G.EXE into the working folder Wikipedia.
3. Download Source & Models
   - Clone Karpathy’s llama2.c repository.
   - Download a TinyStories model checkpoint (e.g., the 260 KB one) from community mirrors or Hugging Face.
4. Configure & Compile
   - DJGPP example: ensure djgpp.env, djgpp.exe, and CWSDPMI.EXE are present in the directory, then build run.c blogsystem5.substack.com.
5. Run Inference: You should see real‑time token output, at ~2.08 t/s on a 486 and scaling higher on faster CPUs Hackaday.
7. Performance Benchmarks
The following table aggregates results from Hackaday and Yeo’s demos:
| Hardware | CPU | RAM | Model Size | Tokens/sec | Source |
|---|---|---|---|---|---|
| Generic 486 | 486 DX‑2 66 MHz | 32 MB EDO | 260 KB | 2.08 | Hackaday |
| Toshiba Satellite 315CDT (1996) | Pentium MMX 200 MHz | 96 MB EDO | 260 KB | 15.32 | Hackaday |
| ThinkPad T42 (2004) | Pentium M 735 1.7 GHz | 2 GB DDR | 110 MB | 1.71 | Hackaday |
| ThinkPad X13 Gen 1 (2020) | Core i5‑10310U 1.7 GHz | 16 GB DDR4 | 42 MB | 3.89 | YKM’s Corner on the Web |
| Modern Desktop (2024) | Ryzen 5 7600 3.8 GHz | 128 GB DDR5 | 110 MB | ✗ (alloc fail) | YKM’s Corner on the Web |
Note: The 110 MB model fails to allocate on the modern desktop due to a DOS extender memory bug, letting the older ThinkPad T42 outperform it YKM’s Corner on the Web.
8. Optimizations & Troubleshooting
- Model Size Management: Use the smallest model files (<16 MB) to avoid DPMI limits.
- Compiler Flags: Enable -march=i386 (DJGPP) or Watcom’s -bt=dos386 for suitable code generation blogsystem5.substack.com, open-watcom.github.io.
- Extender Variants: If DOS/4G fails, swap in DOS/32 for open‑source flexibility Wikipedia.
- Testing in DOSBox: Configure dosbox.conf to enable an internal or external DPMI server before moving to real hardware VOGONS.
- Memory Mapping Hacks: For larger models, implement manual segmented loads to stream weights in chunks.
9. The Future: 16‑bit Ports & Beyond
While 32‑bit i386 machines can now host LLaMA 2, the next frontier is 16‑bit platforms (286, 68000), where protected‑mode support is rudimentary or nonexistent Retrocomputing Stack Exchange. Achieving LLM inference on a 286 would require innovative segmented memory management, custom allocators, and possibly FP16 or fixed‑point quantization to fit within 1 MB addressable space.
10. Conclusion
By integrating Karpathy’s llama2.c, DOS extenders (DOS/4G, DOS/32), DPMI hosts (CWSDPMI), and community ingenuity, running LLaMA 2 under DOS on hardware as old as a 486 is now viable. This retrocomputing triumph not only showcases the flexibility of DOS extenders and minimal C code but also inspires broader exploration of AI inference on unconventional platforms. Whether you’re a retro enthusiast or a developer seeking to understand low‑level memory and CPU modes, this DOS‑based LLaMA 2 port is a testament to what can be achieved by marrying modern AI with vintage computing.
References
- Andrej Karpathy, “karpathy/llama2.c: Inference Llama 2 in one file of pure C,” GitHub, accessed Apr 2025.
- “DOS/4G,” Wikipedia, accessed Apr 2025.
- “DOS extender,” Wikipedia, accessed Apr 2025.
- “DOS/32,” Wikipedia, accessed Apr 2025.
- “CWSDPMI,” Wikipedia, accessed Apr 2025.
- Julio Merino, “Running GNU on DOS with DJGPP,” Blog System/5, Feb 2024.
- “Open Watcom v2,” Open Watcom GitHub, accessed Apr 2025.
- Tyler August, “Will It Run Llama 2? Now DOS Can,” Hackaday, Apr 19 2025.
- “Llama 2 LLM on DOS,” Yeo Kheng Meng, Apr 2025.
- “DOS/4GW and Protected Mode,” Pikuma, 2021.
- “TinyStories llama2 gguf download,” Hugging Face, accessed Apr 2025.
- “(Deprecated) Llama 2,” meta-llama/llama GitHub, accessed Apr 2025.
- “Real mode,” Wikipedia, accessed Apr 2025.
- “DOSBOX and DPMI,” VOGONS, accessed Apr 2025.
- “Why did ‘protected‑mode MS‑DOS’ never happen?” Retrocomputing Stack Exchange, Jan 2023.