How AirModel Works
Everything you need to understand the platform. What it does, how it protects you, and why it's built this way.
The Short Version
AirModel runs open source language models on dedicated GPU infrastructure. You connect a wallet, we verify you hold enough $AIR, and you get access. That's the entire relationship.
There are no accounts. No emails. No phone numbers. No API keys. No passwords. No usernames. Your wallet signs a single message to prove you control the address, and even that signature is discarded after verification. We don't want to know who you are. The system is designed so we can't know who you are.
When you send a message, it travels to our GPU servers alongside messages from every other person using the platform at that moment. Your query enters a shared inference pool — the same GPU processes your request and a hundred others in the same batch. The model generates a response, streams it back to you, and then both the question and the answer are gone. Not archived. Not logged. Not backed up. Gone.
Your conversation history is stored in your workspace, encrypted with AES-256-GCM. The ciphertext sits on our server, but without the decryption key, it's indistinguishable from random noise. We store your conversations because you asked us to — not because we want them.
The important thing to understand is that this architecture isn't an accident. Every decision — shared inference, token-gated access, encrypted storage, no accounts — exists because we believe the default relationship between a person and an AI should be private. Not private "with terms and conditions." Private like thinking is private.
Privacy Model
Most AI companies describe privacy as a feature. Something they added on top of their product, like an option in a settings menu. AirModel treats privacy as the foundation. Not something bolted on — something everything else is built around.
What we don't collect
This is a short list because there isn't much to say. We don't collect your name. We don't collect your email. We don't collect your phone number. We don't collect your IP address beyond what's needed for the TCP connection to function, and we don't store it. We don't collect your browser fingerprint. We don't run analytics. There are no tracking pixels on this website. There is no Google Analytics. There is no Mixpanel. There is no Segment. There is no Hotjar.
We don't store your queries. When you send a message to a model, the message travels from our server to the GPU, the model processes it, the response streams back to you, and then both the input and the output are discarded from the inference pipeline. There is no query log. There is no audit trail. There is no "anonymized" dataset of prompts being quietly assembled for future training runs.
What we do store
Your encrypted workspace. If you create conversations and save messages, those messages are encrypted with AES-256-GCM and stored as ciphertext. We store them so you can come back and continue a conversation. The plaintext never touches our disk — encryption happens before storage, decryption happens after retrieval. If someone seized our database, they'd have a collection of random-looking byte strings.
We also store your wallet address, which ties your workspace to your wallet so that when you reconnect, we know which workspace is yours. Your wallet address is already public on the blockchain — we're not learning anything new by storing it. What we don't store is anything else. No IP logs, no session history, no access patterns. The wallet address is the only identifier, and it's the same one the whole world can already see on-chain.
We don't trust ourselves with your data. That's not modesty — it's engineering. A system that never has your data in the first place can't leak it, can't be compelled to hand it over, and can't quietly start using it for something you didn't agree to. The best security policy is having nothing worth stealing.
Shared inference and anonymity
This is the part that's harder to explain but arguably more important than encryption. When your query reaches our GPU servers, it doesn't run alone. It's batched with queries from other users — sometimes dozens, sometimes hundreds, depending on load. The GPU processes them all at once. There is no per-user queue, no per-user log, no tagging system that associates a response with the person who asked for it.
Even if someone were to intercept traffic between our server and the GPU cluster, they would see a stream of interleaved prompts and completions with no way to determine which query came from which connection. This isn't a privacy policy. It's a property of the architecture. You can't extract what was never separated.
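As an illustration, the pooling idea can be sketched in a few lines of Python. This is a toy model, not AirModel's actual pipeline: the function names and batching details here are hypothetical, but it shows the key property that identity never enters the batch.

```python
import queue

# Illustrative sketch of a shared inference pool. Identity never enters the
# batch: only an in-memory reply queue maps a completion back to a connection.

def run_batch(prompts):
    # Stand-in for the GPU forward pass over a whole batch at once.
    return [f"response to: {p}" for p in prompts]

def inference_pool(requests):
    """Each request is (reply_queue, prompt)."""
    queues, prompts = zip(*requests)  # separate transport from content
    for q, completion in zip(queues, run_batch(list(prompts))):
        q.put(completion)             # route by position, not by user ID

# Three "connections" submit queries into one shared batch.
conns = [(queue.Queue(), p) for p in ["a?", "b?", "c?"]]
inference_pool(conns)
print(conns[1][0].get())  # → response to: b?
```

Once the batch is processed and the reply queues are drained, nothing remains that links a prompt to a connection.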
Threat Model
It's worth being honest about what AirModel protects against and what it doesn't. No system is perfectly anonymous, and anyone who tells you otherwise is either confused or lying.
What we protect against
- Corporate data harvesting. Your conversations are not training data. They're not being fed into a fine-tuning pipeline. They're not being reviewed by a "trust and safety" team in San Francisco. They don't exist outside your encrypted workspace.
- Subpoena compliance. We can't comply with demands for data we don't have. If a government agency requests "all conversations from user X," there is no user X. There are encrypted blobs associated with wallet addresses. The ciphertext is useless without the key, and we don't have the key.
- Internal bad actors. Even if someone on our team went rogue, the infrastructure limits what they could access. Encrypted storage means database access yields nothing. No query logs means there's nothing to exfiltrate from the inference pipeline.
- Censorship. The models we run are open source with no corporate guardrails. We don't filter inputs. We don't filter outputs. The model answers what you ask. If you're used to getting "I can't help with that" responses, you'll notice the difference.
What we don't protect against
- Your own device being compromised. If someone has access to your browser, they can see what you see. AirModel can't protect against a keylogger on your laptop or someone looking over your shoulder.
- Network-level surveillance at your end. Your ISP can see that you connected to AirModel's servers. They can't see what you said (the connection is TLS-encrypted), but they can see that you used the service. Use a VPN or Tor if this matters to you.
- Wallet tracing. Your wallet address is a public key on a public blockchain. If someone knows your wallet address and knows you use AirModel, they can infer a connection. Use a fresh wallet if this concerns you.
We are not a government-resistant anonymity tool in the same way Tor is. We are a private AI platform that minimizes what we know about you and what we store. The privacy guarantees come from architectural choices, not legal promises. If you need stronger anonymity guarantees for your specific situation, layer AirModel with other tools (VPN, fresh wallet, separate device).
$AIR Token
Access to AirModel is controlled by a single mechanism: holding $AIR tokens on Base. There are no subscriptions, no credit cards, no payment processing, no invoices. You hold the token, you have access. You sell the token, you lose access. The relationship is that simple.
The threshold is 0.1% of the total supply. Hold that much and you get full, unlimited access to every model, every feature, no rate limits, no tiers. There is no "premium" plan. There is no "enterprise" offering. Everyone who holds enough $AIR gets the same platform. The token is the membership.
Why token-gated?
Because every other payment method requires identity. Credit cards require your name and billing address. PayPal requires an email and phone number. Even cryptocurrency payment processors typically require KYC. By making the token itself the access credential, we eliminate the need for any payment infrastructure that collects personal information.
There's a second reason, which is alignment. When access is controlled by a token, the incentives change. We don't need to maximize "monthly active users" or "time on platform" to hit revenue targets. We need the platform to be good enough that people want to hold $AIR. If we build something valuable, the token appreciates. If we don't, it doesn't. The feedback loop is direct and honest.
Connecting Your Wallet
AirModel uses MetaMask on the Base network. When you click "Connect Wallet," the following happens:
MetaMask opens
The app requests your wallet address. If you're not on Base, MetaMask will prompt you to switch networks (chain ID 8453). If Base hasn't been added to your wallet yet, it will be added automatically.
You sign a message
This is an EIP-191 personal signature. It costs zero gas and creates no transaction. It simply proves you control the wallet. The message is a plaintext string that says when the signature was created — nothing hidden, nothing sneaky.
We verify on-chain
The server checks the $AIR token contract on Base to confirm your wallet holds at least 0.1% of the total supply. This is a read-only call to the blockchain — we call balanceOf and totalSupply and compare. No transaction. No gas.
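The comparison itself is simple integer arithmetic. In this sketch the two numbers would come from read-only balanceOf and totalSupply calls to the $AIR contract; here they are plain integers for illustration, and the helper name is hypothetical:

```python
# Sketch of the 0.1% access check. In production, balance and total_supply
# come from read-only balanceOf/totalSupply calls to the $AIR contract on Base.

THRESHOLD_BPS = 10  # 0.1% expressed in basis points (10 / 10_000)

def has_access(balance: int, total_supply: int) -> bool:
    # Integer math avoids floating-point rounding on 18-decimal token amounts.
    return balance * 10_000 >= total_supply * THRESHOLD_BPS

total = 1_000_000_000 * 10**18               # example supply, 18 decimals
print(has_access(1_000_000 * 10**18, total))  # exactly 0.1% → True
print(has_access(999_999 * 10**18, total))    # just under   → False
```

Working in raw token units (basis points over integers) matters here: ERC-20 balances are 18-decimal integers, and a floating-point comparison could misclassify a wallet sitting exactly at the threshold.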
Session token issued
If verification passes, the server issues a JWT that lasts 7 days. You're redirected to the chat interface. When the JWT expires, you reconnect your wallet and the process repeats. The JWT is the only thing that identifies your session — and it contains no personally identifiable information.
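A minimal HS256 JWT sketch shows what such a session token can look like when it carries no personally identifiable information: only the (already public) wallet address and an expiry. The claim names and secret handling here are assumptions for illustration, not AirModel's exact token layout.

```python
import base64
import hashlib
import hmac
import json
import time

# Minimal HS256 JWT sketch. Claim names and secret handling are
# illustrative assumptions, not AirModel's exact implementation.

SECRET = b"server-side-secret"

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_session(wallet_address: str, days: int = 7) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    # No PII: just the public wallet address and an expiry timestamp.
    payload = {"sub": wallet_address, "exp": int(time.time()) + days * 86400}
    signing_input = (
        b64url(json.dumps(header).encode())
        + "."
        + b64url(json.dumps(payload).encode())
    )
    sig = hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

token = issue_session("0x1234abcd...")
print(token.count("."))  # → 2  (header.payload.signature)
```

In practice you would use a maintained JWT library rather than hand-rolling the encoding, but the structure is the same: anyone can decode the payload, and only the server can forge the signature.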
Until the $AIR token is deployed, the platform runs in dev mode: token verification is skipped, and you can connect with any wallet address. This is temporary — once $AIR is live on Base, the token gate will be active.
Models
AirModel runs open source models. Not wrappers around proprietary APIs — actual model weights running on GPU hardware we control. This matters for two reasons. First, open source models can't be recalled or restricted. The weights exist in the world. Second, nobody can insert a logging layer between you and the model, because we run the full stack.
The current model lineup is designed to cover different use cases without overwhelming you with choices. Five models, each with a clear purpose.
| Model | Parameters | Best For | Speed |
|---|---|---|---|
| Llama 3 8B | 8 billion | General conversation, quick answers, everyday tasks. The workhorse. | Fast |
| Mistral 7B | 7 billion | Similar to Llama 3 8B with a slightly different reasoning style. Good for variety. | Fast |
| Llama 3 70B | 70 billion | Complex reasoning, analysis, long-form writing. Near GPT-4 quality on many benchmarks. | Moderate |
| Dolphin-Mistral | 8 billion | Uncensored variant with no refusal training. Answers everything. No guardrails, no apologies. | Fast |
| DeepSeek Coder | varies | Programming, debugging, code generation. Trained specifically on code. | Moderate |
These are the launch models. As the platform grows and GPU capacity expands, we'll add more — including larger models, specialized models for math and science, and multimodal models that can process images. The architecture supports loading new models without any changes to the app.
One thing worth noting: these models are uncensored. Not "less censored than ChatGPT" — actually uncensored. The open source community has produced model variants that have had RLHF safety training specifically removed. Dolphin-Mistral is the most notable example, but even the base Llama and Mistral weights are significantly less restricted than anything from OpenAI or Anthropic. The model answers what you ask. Whether that's a good thing depends on you. We think adults can handle unrestricted access to a thinking tool.
Workspaces
A workspace is your personal space on AirModel. Each workspace has its own conversation history, its own system prompt, and its own context. You can have multiple workspaces for different purposes — one for coding, one for writing, one for research, whatever makes sense for how you work.
System prompts
Every workspace has a configurable system prompt. This is a set of instructions that gets prepended to every message you send. If you want the AI to behave a certain way — always respond in Spanish, always write in a particular style, always assume you're a senior engineer — you put that in the system prompt. The system prompt is encrypted at rest along with everything else in your workspace.
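Mechanically, "prepended to every message" means the system prompt leads the messages array on each request. A sketch of that assembly (the helper name is illustrative, based on the description above):

```python
# Sketch of how a workspace system prompt is prepended to each request.
# Assumed behavior based on the description above; names are illustrative.

def build_messages(system_prompt, history, user_message):
    messages = []
    if system_prompt:  # the system prompt, if set, leads every request
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history)  # prior turns in this conversation
    messages.append({"role": "user", "content": user_message})
    return messages

msgs = build_messages("Always answer in Spanish.", [], "What is entanglement?")
print(msgs[0]["role"])  # → system
```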
Conversations
Within a workspace, conversations work the way you'd expect. Each conversation maintains its own message history. Start a new conversation for a new topic, or continue an existing one. Conversations auto-title themselves based on the first exchange so you can find them later. You can delete any conversation, and deletion means deletion — the encrypted data is removed from our storage, not "soft deleted" or "archived."
Encryption
All messages and system prompts are encrypted at rest using AES-256-GCM — the same encryption standard used by banks and governments for classified data. Each piece of data is encrypted with a unique initialization vector (IV), producing ciphertext and an authentication tag that detects any tampering.
The encrypted data is stored in the format iv:ciphertext:authTag, all hex-encoded. Without the encryption key, this is computationally infeasible to decrypt. "Computationally infeasible" here means it would take all the computers on Earth longer than the age of the universe to brute-force a single message.
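The serialization of that record can be sketched with standard-library Python. Note what this sketch does and doesn't show: the actual encryption is AES-256-GCM via a crypto library; the three fields below are stand-in random bytes, used only to demonstrate the hex-encoded iv:ciphertext:authTag layout.

```python
import os

# Sketch of the iv:ciphertext:authTag storage format. The real cipher is
# AES-256-GCM (via a crypto library); these fields are stand-in bytes that
# only demonstrate the hex serialization.

def pack_record(iv: bytes, ciphertext: bytes, auth_tag: bytes) -> str:
    return f"{iv.hex()}:{ciphertext.hex()}:{auth_tag.hex()}"

def unpack_record(record: str):
    iv_hex, ct_hex, tag_hex = record.split(":")
    return bytes.fromhex(iv_hex), bytes.fromhex(ct_hex), bytes.fromhex(tag_hex)

# Typical GCM sizes: 96-bit IV, 128-bit authentication tag.
iv, ct, tag = os.urandom(12), os.urandom(48), os.urandom(16)
stored = pack_record(iv, ct, tag)
assert unpack_record(stored) == (iv, ct, tag)
```

Storing the authentication tag alongside the ciphertext is what makes tampering detectable: GCM decryption fails outright if a single bit of the record has been altered.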
Future: wallet-derived keys
The current implementation uses a server-side encryption key. This is pragmatic — it works, it's secure against external attackers, and it means you don't lose your data if you switch devices. The tradeoff is that we technically have access to the key (even though we have no interest in using it).
The planned upgrade is wallet-derived encryption using PBKDF2. Your wallet signature will generate a unique encryption key that only you can produce. This means even if someone compromised our entire server, they still couldn't read your conversations — because the key lives in your wallet, not on our infrastructure. This is the endgame: true zero-knowledge encryption where we are mathematically incapable of reading your data.
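The key-derivation step can be sketched with the standard library's PBKDF2. The salt choice, iteration count, and signature handling here are assumptions for illustration; the essential property is that the same wallet signature always yields the same 256-bit key, and nothing else does. (This requires the wallet to produce deterministic signatures, which ECDSA per RFC 6979 does.)

```python
import hashlib

# Sketch of the planned wallet-derived key scheme: a deterministic wallet
# signature is stretched with PBKDF2 into a 256-bit AES key. Salt choice,
# iteration count, and message are illustrative assumptions.

def derive_key(signature: bytes, wallet_address: str) -> bytes:
    return hashlib.pbkdf2_hmac(
        "sha256",
        signature,                # secret input: only the wallet can produce it
        wallet_address.encode(),  # per-user salt (illustrative choice)
        600_000,                  # iteration count (OWASP-order magnitude)
        dklen=32,                 # 32 bytes = AES-256 key
    )

sig = b"\x01" * 65  # stand-in for a 65-byte ECDSA signature
key = derive_key(sig, "0xabc")
assert key == derive_key(sig, "0xabc")  # deterministic: same signature, same key
assert len(key) == 32
```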
API
AirModel has a straightforward chat completions API. Send messages, get responses. It uses the standard chat completions format that most AI tools already understand, so integrating with existing workflows is trivial.
Endpoint
POST https://app.airmodel.io/api/v1/chat/completions
Request format
Send a JSON body with the model you want and an array of messages. Each message has a role (system, user, or assistant) and content. The system message is optional — use it to set the model's behavior for the conversation.
```json
{
  "model": "llama-3-8b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum entanglement simply."}
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 4096
}
```
Available model IDs
- llama-3-8b — fast general purpose
- mistral-7b — fast general purpose (alternative)
- llama-3-70b — high quality reasoning
- dolphin-mistral — uncensored
- deepseek-coder — code generation
Streaming
Set "stream": true to receive Server-Sent Events (SSE). Each event contains a delta with the next chunk of the response, so you can display tokens as they arrive. The stream ends with a [DONE] event. This is the same streaming format used across the industry, so any SSE client will work.
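A client consumes the stream by accumulating each event's delta until the [DONE] sentinel. A minimal parser sketch, assuming the common chat-completions delta layout (choices[0].delta.content):

```python
import json

# Sketch of parsing the SSE stream described above. The delta field layout
# follows the common chat-completions convention.

def collect_stream(lines):
    """Accumulate content chunks from 'data: ...' SSE lines until [DONE]."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore blank keep-alive lines, comments, etc.
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

events = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(events))  # → Hello
```

In a real integration you would feed this the response body line by line as it arrives, rendering each chunk immediately instead of joining at the end.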
Non-streaming
Set "stream": false (or omit it) to receive the complete response as a single JSON object. Useful for batch processing or simple integrations where you don't need real-time output.
The AirModel API follows the standard chat completions format. If you've built against any chat completions API before, you already know how this works. Same message structure, same streaming protocol, same response shape.
Integration Examples
Python
```python
import requests

response = requests.post(
    "https://app.airmodel.io/api/v1/chat/completions",
    json={
        "model": "llama-3-8b",
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
data = response.json()
print(data["choices"][0]["message"]["content"])
```
JavaScript
```javascript
const res = await fetch("https://app.airmodel.io/api/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "dolphin-mistral",
    messages: [{ role: "user", content: "Hello" }]
  })
});
const data = await res.json();
console.log(data.choices[0].message.content);
```
cURL
```bash
curl -X POST https://app.airmodel.io/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-70b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```
The API is currently open to allow easy integration and testing. Once the $AIR token is live, API access will be token-gated — the same wallet-based verification used by the web app will be available for programmatic access.