macOS-native MLX server with smart caching. Claude Code, OpenClaw, and Cursor respond in 5 seconds, not 90.
Coding agents invalidate the KV cache dozens of times per session. 4mac persists every cache block to SSD — so when the agent circles back to a previous prefix, it's restored from disk in milliseconds, not recomputed from scratch.
Cache blocks are persisted to disk in safetensors format. Two-tier architecture: hot blocks stay in RAM, cold blocks go to SSD with LRU policy. Previously seen prefixes are restored across requests and server restarts — never recomputed.
Handles concurrent requests through mlx-lm's BatchGenerator. Up to 4.14x generation speedup at 8x concurrency. No more queuing behind a single request.
Start, stop, and monitor the server from your menu bar. Web dashboard for model management, chat, and real-time metrics. Signed, notarized, with in-app auto-update. Not Electron.
LLM, VLM, embedding, and reranker models loaded simultaneously. LRU eviction when memory runs low. Browse and download models directly from the admin dashboard.
Compatible with Claude Code, OpenClaw, Cursor, and any OpenAI-compatible client. Native /v1/messages Anthropic endpoint. Web dashboard generates the exact config command for each tool.
Supports all major tool calling formats: JSON, Qwen, Gemma, GLM, MiniMax. MCP tool integration and tool result trimming for oversized outputs. Configurable per model.
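Tool result trimming can be illustrated with a small sketch: keep the head and tail of an oversized result and mark what was cut. The function name, limits, and strategy here are hypothetical; 4mac's actual behavior is configurable per model.

```python
def trim_tool_result(text: str, max_chars: int = 4000) -> str:
    """Trim an oversized tool result, preserving its head and tail.
    Illustrative only -- real limits and strategy are configurable."""
    if len(text) <= max_chars:
        return text
    marker = f"\n...[{len(text) - max_chars} chars trimmed]...\n"
    head = max_chars // 2
    tail = max_chars - head
    return text[:head] + marker + text[-tail:]
```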
All benchmarks on M3 Ultra 512GB. Single request and continuous batching across four popular models.
MiniMax-M2.5-8bit · M3 Ultra 512GB
| CONTEXT | PROMPT TPS | TOKEN TPS | PEAK MEM |
|---|---|---|---|
| 1k | 588 tok/s | 34.0 tok/s | 227 GB |
| 4k | 704 tok/s | 30.3 tok/s | 228 GB |
| 8k | 663 tok/s | 26.3 tok/s | 229 GB |
| 32k | 426 tok/s | 14.9 tok/s | 235 GB |
pp1024 / tg128 · no cache reuse
| BATCH | TOKEN TPS | SPEEDUP |
|---|---|---|
| 1x | 34.0 tok/s | 1.00x |
| 2x | 49.7 tok/s | 1.46x |
| 4x | 109.8 tok/s | 3.23x |
| 8x | 126.3 tok/s | 3.71x |
Qwen3.5-122B-A10B-4bit · M3 Ultra 512GB
| CONTEXT | PROMPT TPS | TOKEN TPS | PEAK MEM |
|---|---|---|---|
| 1k | 768 tok/s | 56.6 tok/s | 65.5 GB |
| 8k | 941 tok/s | 54.0 tok/s | 69 GB |
| 16k | 886 tok/s | 48.3 tok/s | 71 GB |
| 32k | 765 tok/s | 42.4 tok/s | 73 GB |
pp1024 / tg128 · no cache reuse
| BATCH | TOKEN TPS | SPEEDUP |
|---|---|---|
| 1x | 56.6 tok/s | 1.00x |
| 2x | 92.1 tok/s | 1.63x |
| 4x | 135.1 tok/s | 2.39x |
| 8x | 190.2 tok/s | 3.36x |
Qwen3-Coder-Next-8bit · M3 Ultra 512GB
| CONTEXT | PROMPT TPS | TOKEN TPS | PEAK MEM |
|---|---|---|---|
| 1k | 1,462 tok/s | 58.7 tok/s | 80 GB |
| 8k | 2,009 tok/s | 54.9 tok/s | 83 GB |
| 16k | 1,896 tok/s | 52.3 tok/s | 83 GB |
| 32k | 1,624 tok/s | 45.1 tok/s | 85 GB |
pp1024 / tg128 · no cache reuse
| BATCH | TOKEN TPS | SPEEDUP |
|---|---|---|
| 1x | 58.7 tok/s | 1.00x |
| 2x | 100.5 tok/s | 1.71x |
| 4x | 164.0 tok/s | 2.79x |
| 8x | 243.3 tok/s | 4.14x |
GLM-5-4bit · M3 Ultra 512GB
| CONTEXT | PROMPT TPS | TOKEN TPS | PEAK MEM |
|---|---|---|---|
| 1k | 187 tok/s | 16.7 tok/s | 392 GB |
| 4k | 180 tok/s | 13.7 tok/s | 394 GB |
| 16k | 117 tok/s | 12.0 tok/s | 403 GB |
| 32k | 78 tok/s | 10.7 tok/s | 415 GB |
pp1024 / tg128 · no cache reuse
| BATCH | TOKEN TPS | SPEEDUP |
|---|---|---|
| 1x | 16.7 tok/s | 1.00x |
| 2x | 23.7 tok/s | 1.42x |
| 4x | 47.0 tok/s | 2.81x |
| 8x | 60.3 tok/s | 3.61x |
"The Qwen3.5 models running on 4mac are so fast that they make running local AI on a Mac worthwhile. It is so much faster than LM Studio and the tool calling is so much more reliable."
4mac is built exclusively for Apple Silicon using the native MLX framework, maximizing Unified Memory bandwidth. Unlike Ollama and LM Studio (which rely on llama.cpp), 4mac provides a deeply integrated macOS native experience with Paged SSD KV caching for continuous coding agent sessions.
Any Apple Silicon Mac (M1/M2/M3/M4 series). For 70B+ models, we recommend at least 64 GB of Unified Memory.
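As a rough back-of-envelope for sizing, weight memory scales with parameter count times quantization width (this estimate covers weights only, before KV cache and activation overhead):

```python
def weight_footprint_gb(params_billions: float, bits: int) -> float:
    """Rough weight-only memory estimate: params * (bits / 8) bytes."""
    return params_billions * 1e9 * bits / 8 / 1e9

# A 70B model at 4-bit quantization needs roughly 35 GB for weights alone,
# which is why 64 GB of Unified Memory is a comfortable floor.
```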
Yes. 4mac is a native drop-in replacement for the OpenAI and Anthropic APIs. Just point your client's base URL to http://localhost:8000/v1 and it works out of the box with full tool-calling support.
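For example, most clients can be redirected with a single environment variable. The variable names below follow common client conventions (Claude Code reads `ANTHROPIC_BASE_URL`; OpenAI SDKs read `OPENAI_BASE_URL`); check your tool's documentation if it uses a different setting:

```shell
# Claude Code -> 4mac's native /v1/messages endpoint
export ANTHROPIC_BASE_URL="http://localhost:8000"

# OpenAI-compatible clients and SDKs
export OPENAI_BASE_URL="http://localhost:8000/v1"
```

The web dashboard generates the exact command for each supported tool.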
No! 4mac respects your existing LM Studio downloads folder. You can browse and load all your previously downloaded safetensors models effortlessly.
We support virtually all modern architectures uploaded to HuggingFace in MLX format, including Qwen, Llama 3, Mistral, GLM, MiniMax, and DeepSeek.
Download the DMG or install from source. Reuses your existing LM Studio model directory — no re-download needed.
Drag to Applications. The welcome screen walks you through choosing a model directory, starting the server, and downloading your first model. Signed and notarized.
Download DMG