Approx. read time: 7 min.

Unlocking LLaMA 2 on MS‑DOS: A Retrocomputing Journey from 486 to Ryzen

1. Background on Meta’s LLaMA 2

1.1 The LLaMA Family

Meta released LLaMA 2 as an openly available foundation model family ranging from 7 billion to 70 billion parameters, intended for research and commercial use under Meta's community license [12]. These models ship with 16‑ or 32‑bit weights, requiring substantial memory (on the order of 14 GB to 140 GB for the weights alone at 16‑bit precision) and compute, and are typically served on GPUs or modern CPUs.

1.2 TinyStories and Minimalism

Karpathy's llama2.c repo demonstrates that narrow‑domain, TinyStories‑trained LLaMA models (from roughly 260 KB up to about 110 MB) still perform useful text completion within constrained environments [1]. This minimalism (an inference engine of roughly 700 lines of plain C) opened the door to porting LLaMA to non‑standard platforms.


2. MS‑DOS Architecture & Memory Limitations

2.1 Real Mode Constraints

MS‑DOS and FreeDOS boot in real mode, which offers only 20‑bit addressing (1 MB of total address space), of which 640 KB of conventional memory is available to applications [13]. With no built‑in memory protection, multitasking, or extended memory management, real‑mode DOS cannot host modern, large‑scale software directly.

2.2 The Need for Protected Mode

To break the 640 KB barrier, programs must switch the CPU into protected mode, introduced with the Intel 80286 and extended to 32 bits on the 80386. Protected mode grants access to extended memory (above 1 MB), 32‑bit addressing, and advanced CPU features [10].
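
To make the jump concrete, here is a minimal sketch, assuming the 32‑bit DJGPP + CWSDPMI toolchain described later, so the extender has already switched the CPU into protected mode before main() runs:

```c
/* bigalloc.c - sketch: confirm we are past the 640 KB barrier.
 * Built with a 32-bit DOS toolchain (e.g. DJGPP + CWSDPMI), the
 * extender enters protected mode before main() runs, so a
 * multi-megabyte malloc can succeed. A real-mode program could
 * never satisfy this request from 640 KB of conventional memory. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t want = 8UL * 1024 * 1024;      /* 8 MB, far beyond 640 KB */
    unsigned char *buf = malloc(want);
    if (!buf) {
        printf("allocation of %lu bytes failed\n", (unsigned long)want);
        return 1;
    }
    memset(buf, 0xAA, want);              /* touch every byte */
    printf("got %lu bytes above the conventional-memory limit\n",
           (unsigned long)want);
    free(buf);
    return 0;
}
```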


3. DOS Extenders & DPMI Hosts

3.1 DOS Extenders

A DOS extender is software that sits between DOS and an application, handling the switch into protected mode and presenting a flat 32‑bit address space to programs while preserving access to the DOS API [3]. Key extenders include:

  • DOS/4G (Rational Systems, later Tenberry): its freely redistributable DOS/4GW variant supports up to 64 MB of memory [2]

  • DOS/32 (DOS/32A): an open‑source replacement for DOS/4GW [4]

3.2 DPMI Hosts

“DOS Protected Mode Interface” (DPMI) hosts provide standardized services (memory allocation, interrupts) to protected‑mode programs. The most popular:

  • CWSDPMI: free 32‑bit host bundled with DJGPP, supporting up to 4 GB virtual memory and real‑mode interrupt reflection [5]

DOSBox‑style emulators also provide DPMI support, so extended DOS programs can run in a virtual environment [14].
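
As a small illustration of what a DPMI host offers, the sketch below (assuming DJGPP's <dpmi.h>; Open Watcom exposes DPMI through different headers) asks the host how much memory it can provide before a model is loaded:

```c
/* dpmifree.c - sketch: query the DPMI host (e.g. CWSDPMI) for the
 * memory it can hand out before attempting to load a model file.
 * DJGPP-specific; fields the host cannot report come back as -1. */
#include <stdio.h>
#include <dpmi.h>

int main(void) {
    __dpmi_free_mem_info info;
    if (__dpmi_get_free_memory_information(&info) != 0) {
        printf("DPMI query failed\n");
        return 1;
    }
    printf("largest free block : %lu KB\n",
           info.largest_available_free_block_in_bytes / 1024);
    printf("physical pages     : %lu (%lu KB)\n",
           info.total_number_of_physical_pages,
           info.total_number_of_physical_pages * 4);   /* 4 KB pages */
    return 0;
}
```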


4. Key Ports of LLaMA 2 to DOS

4.1 Andrej Karpathy’s llama2.c

Karpathy's llama2.c is a compact, single‑file C inference engine (~700 lines) that loads a binary model checkpoint into RAM and runs forward passes over FP32 weights. It prioritizes portability (Linux, Windows, macOS, and now DOS) over raw speed [1].

4.2 Yeo Kheng Meng’s DOS LLaMA 2 Client

Yeo's project ports llama2.c to MS‑DOS 6.22 using Open Watcom v2 and DOS/4G, adding custom patches for missing math functions, the absence of memory mapping, DOS timing, and 8.3 filename constraints [9]. The GitHub repo provides precompiled binaries (dosllam2.exe) and source code.

4.3 Hackaday Verification

Tyler August's Hackaday article, "Will It Run Llama 2? Now DOS Can", documents Yeo's retro demos on a ThinkPad T42 (Pentium M 735, 1.7 GHz) and a Toshiba Satellite 315CDT (Pentium MMX, 200 MHz), as well as a benchmark on a 486 DX‑2 66 MHz [8].


5. Porting Challenges & Technical Adaptations

Porting seven hundred lines of modern C to DOS demanded overcoming several API and system incompatibilities:

  • Math Library Substitutions: Replace missing sqrtf, expf, and fabs with custom macros or approximations [9].

  • File I/O & Memory Mapping: Windows/Linux use mmap for lazy loading; on DOS this was replaced by reading the entire model file into a malloc'd buffer [9] (a minimal sketch of this pattern appears after this list).

  • Timing Functions: Swap POSIX clock_gettime or gettimeofday for DOS's clock() or BIOS timer interrupts when computing performance metrics [9].

  • Filename Length Limits: Conform model filenames to the 8.3 format (e.g., tiny-llama-260k.gguf becomes TINYSTOR.GGU) [9].

  • Linker & Extender Stubs: Embed the DOS/4G or CWSDPMI stub in the EXE so it switches to protected mode automatically at launch [2][5].
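
The sketch below illustrates two of these adaptations in isolation; it is not taken from the dosllam2 sources, and load_checkpoint and now_ms are hypothetical helper names. The whole checkpoint is read into a malloc'd buffer in place of mmap, and elapsed time comes from ANSI clock():

```c
/* loaddos.c - sketch of two porting adaptations:
 * (1) read the whole model file into a malloc'd buffer instead of mmap(),
 * (2) measure elapsed time with ANSI clock() instead of clock_gettime(). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Read an entire checkpoint (8.3 filename, e.g. "TINYSTOR.GGU") into RAM. */
static unsigned char *load_checkpoint(const char *path, long *out_size) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    unsigned char *buf = malloc(size);
    if (buf && fread(buf, 1, size, f) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf) *out_size = size;
    return buf;
}

/* Millisecond timer built on clock(); coarse but enough for tokens/sec. */
static double now_ms(void) {
    return 1000.0 * (double)clock() / CLOCKS_PER_SEC;
}

int main(void) {
    long size = 0;
    double t0 = now_ms();
    unsigned char *model = load_checkpoint("TINYSTOR.GGU", &size);
    if (!model) {
        printf("could not load model\n");
        return 1;
    }
    printf("loaded %ld bytes in %.1f ms\n", size, now_ms() - t0);
    free(model);
    return 0;
}
```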


6. Step‑by‑Step Setup Guide

  1. Hardware & OS

    • Intel 80386+ CPU (486, Pentium, Pentium M, etc.)

    • ≥4 MB RAM recommended for tiny models, ≥128 MB for larger ones.

    • MS‑DOS 6.22 or FreeDOS 1.4 installed on real hardware or in DOSBox [8][14].

  2. Acquire Tools

    • DJGPP (GCC for DOS) with CWSDPMI, or Open Watcom v2 with its DOS/4G extender [6][7].

    • Optionally, DOSBox for testing before moving to real hardware [14].

  3. Download Source & Models

    • Clone Karpathy's llama2.c:

      ```bash
      git clone https://github.com/karpathy/llama2.c
      ```

    • Download a TinyStories model (e.g., the ~260 KB checkpoint) from community mirrors or Hugging Face [11]:

      ```bash
      wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/tiny-llama-260k.gguf
      ```
  4. Configure & Compile

    • DJGPP Example (point the DJGPP variable at djgpp.env, then compile the repo's inference source, run.c):

      ```dos
      set DJGPP=G:\DJGPP\DJGPP.ENV
      set PATH=G:\DJGPP\BIN;%PATH%
      gcc -O2 -march=i386 -o llama2.exe run.c -lm
      ```

    • Ensure djgpp.env and CWSDPMI.EXE are present (on the PATH or next to the compiled EXE) so the program can enter protected mode [6].

  5. Run Inference

    ```dos
    C:\DOSLLAMA> dos4g llama2.exe --model TINYSTOR.GGU
    ```

    You should see tokens stream in real time: roughly 2.08 tokens/s on a 486, scaling higher on faster CPUs [8].


7. Performance Benchmarks

The following table aggregates results from Hackaday and Yeo’s demos:

| Hardware | CPU | RAM | Model Size | Tokens/sec | Source |
|---|---|---|---|---|---|
| Generic 486 | 486 DX-2 66 MHz | 32 MB EDO | 260 KB | 2.08 | [8] |
| Toshiba Satellite 315CDT (1996) | Pentium MMX 200 MHz | 96 MB EDO | 260 KB | 15.32 | [8] |
| ThinkPad T42 (2004) | Pentium M 735 1.7 GHz | 2 GB DDR | 110 MB | 1.71 | [8] |
| ThinkPad X13 Gen 1 (2020) | Core i5-10310U 1.7 GHz | 16 GB DDR4 | 42 MB | 3.89 | [9] |
| Modern desktop (2024) | Ryzen 5 7600 3.8 GHz | 128 GB DDR5 | 110 MB | ✗ (alloc fail) | [9] |

Note: The 110 MB model fails to allocate on the modern desktop due to a DOS extender memory bug, letting the older ThinkPad T42 outperform it [9].
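
To put those rates in perspective: a 200-token story takes roughly 200 / 2.08 ≈ 96 seconds on the 486, about 200 / 15.32 ≈ 13 seconds on the Pentium MMX, and just under two minutes on the T42 with the far larger 110 MB model.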


8. Optimizations & Troubleshooting

  • Model Size Management: Use the smallest model files (<16 MB) to stay clear of DPMI memory limits.

  • Compiler Flags: Enable -O2 -march=i386 (DJGPP) or Open Watcom's 32‑bit DOS/4GW target (e.g., wcl386 -l=dos4g) for optimal code [6][7].

  • Extender Variants: If DOS/4G fails, swap to DOS/32 for open‑source flexibility [4].

  • Testing in DOSBox: Configure dosbox.conf to enable an internal or external DPMI server and verify the build in the emulator before moving to real hardware [14].

  • Memory Mapping Hacks: For larger models, implement manual segmented loads that stream weights in chunks (a minimal sketch follows below).
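
A minimal sketch of that segmented‑load idea, with hypothetical helpers stream_weights and sum_chunk standing in for a real weight consumer:

```c
/* streamw.c - sketch: instead of malloc'ing the whole checkpoint,
 * stream it through a small fixed-size buffer so only one chunk is
 * resident at a time. Real inference would consume each chunk
 * (e.g. one layer's weights) before fetching the next. */
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_BYTES (256L * 1024)   /* 256 KB working buffer */

static long stream_weights(const char *path,
                           void (*consume)(const unsigned char *, size_t)) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    unsigned char *chunk = malloc(CHUNK_BYTES);
    if (!chunk) { fclose(f); return -1; }
    long total = 0;
    size_t got;
    while ((got = fread(chunk, 1, CHUNK_BYTES, f)) > 0) {
        consume(chunk, got);        /* process this slice of weights */
        total += (long)got;
    }
    free(chunk);
    fclose(f);
    return total;
}

/* Trivial consumer: checksum the bytes so the demo is self-contained. */
static unsigned long checksum;
static void sum_chunk(const unsigned char *p, size_t n) {
    while (n--) checksum += *p++;
}

int main(void) {
    long n = stream_weights("TINYSTOR.GGU", sum_chunk);
    if (n < 0) { printf("stream failed\n"); return 1; }
    printf("streamed %ld bytes, checksum %lu\n", n, checksum);
    return 0;
}
```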


9. The Future: 16‑bit Ports & Beyond

While 32‑bit i386 machines can now host LLaMA 2, the next frontier is 16‑bit platforms (the 286 and 68000‑class machines), where protected‑mode support is rudimentary or nonexistent [15]. Achieving LLM inference there would require innovative segmented memory management, custom allocators, and aggressive fixed‑point quantization to fit within the tiny address space such systems expose.
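
As a flavor of what integer‑only inference kernels might look like on such machines, here is a small sketch of an 8‑bit fixed‑point dot product (q8_dot is a hypothetical helper; a real scheme would also carry per‑row scale factors to rescale the accumulator):

```c
/* q8dot.c - sketch: store weights as signed 8-bit integers, quartering
 * the footprint of FP32 and keeping the inner loop in integer
 * arithmetic, which a 286 (with no FPU guaranteed) can execute. */
#include <stdio.h>

/* Dot product of an int8 weight row and an int8 activation vector.
 * The 32-bit accumulator would later be rescaled by the quantization
 * scales of both operands. */
static long q8_dot(const signed char *w, const signed char *x, int n) {
    long acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += (long)w[i] * (long)x[i];
    return acc;
}

int main(void) {
    signed char w[4] = { 12, -7, 33, 5 };
    signed char x[4] = {  2,  4, -1, 8 };
    printf("q8 dot = %ld\n", q8_dot(w, x, 4));  /* 24 - 28 - 33 + 40 = 3 */
    return 0;
}
```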


10. Conclusion

By integrating Karpathy’s llama2.c, DOS extenders (DOS/4G, DOS/32), DPMI hosts (CWSDPMI), and community ingenuity, running LLaMA 2 under DOS on hardware as old as a 486 is now viable. This retrocomputing triumph not only showcases the flexibility of DOS extenders and minimal C code but also inspires broader exploration of AI inference on unconventional platforms. Whether you’re a retro enthusiast or a developer seeking to understand low‑level memory and CPU modes, this DOS‑based LLaMA 2 port is a testament to what can be achieved by marrying modern AI with vintage computing.


References

  1. Andrej Karpathy, "karpathy/llama2.c: Inference Llama 2 in one file of pure C," GitHub, accessed Apr 2025.

  2. "DOS/4G," Wikipedia, accessed Apr 2025.

  3. "DOS extender," Wikipedia, accessed Apr 2025.

  4. "DOS/32," Wikipedia, accessed Apr 2025.

  5. "CWSDPMI," Wikipedia, accessed Apr 2025.

  6. Julio Merino, "Running GNU on DOS with DJGPP," Blog System/5, Feb 2024.

  7. "Open Watcom v2," Open Watcom GitHub, accessed Apr 2025.

  8. Tyler August, "Will It Run Llama 2? Now DOS Can," Hackaday, Apr 19, 2025.

  9. Yeo Kheng Meng, "Llama 2 LLM on DOS," YKM's Corner on the Web, Apr 2025.

  10. "DOS/4GW and Protected Mode," Pikuma, 2021.

  11. "TinyStories llama2 gguf download," Hugging Face, accessed Apr 2025.

  12. "(Deprecated) Llama 2," meta-llama/llama GitHub, accessed Apr 2025.

  13. "Real mode," Wikipedia, accessed Apr 2025.

  14. "DOSBOX and DPMI," VOGONS, accessed Apr 2025.

  15. "Why did 'protected-mode MS-DOS' never happen?" Retrocomputing Stack Exchange, Jan 2023.



About the Author: Bernard Aybout (Virii8)

I am a dedicated technology enthusiast with over 45 years of life experience, passionate about computers, AI, emerging technologies, and their real-world impact. As the founder of my personal blog, MiltonMarketing.com, I explore how AI, health tech, engineering, finance, and other advanced fields leverage innovation—not as a replacement for human expertise, but as a tool to enhance it. My focus is on bridging the gap between cutting-edge technology and practical applications, ensuring ethical, responsible, and transformative use across industries. MiltonMarketing.com is more than just a tech blog—it's a growing platform for expert insights. We welcome qualified writers and industry professionals from IT, AI, healthcare, engineering, HVAC, automotive, finance, and beyond to contribute their knowledge. If you have expertise to share in how AI and technology shape industries while complementing human skills, join us in driving meaningful conversations about the future of innovation. 🚀