research · in-house models

We build the models
that power Turborg.

xShellz runs an AI research effort focused on one thing: models that do real work fast, efficiently, and cheaply. We train our own compact models, build on the best open-weight models where they fit, fine-tune both on our own data, quantize them to run lean, and wrap them in agent harnesses. The AI features in Turborg run on this work, and we contribute back to the open-source projects we build on.

train · fine-tune · quantize · distill · serve · evaluate

What we work on

From a raw dataset to an agent that runs in a second.

Our work spans the whole stack, from the data a model learns on to the harness that turns it into a working agent. Here is what that looks like.

pretraining

Training our own models

We train compact language models from the ground up, sized for the jobs Turborg actually does. Smaller, task-focused models can beat a giant general-purpose model on the work that matters here: reading IRC, summarizing channels, and acting on commands.

open-weight

Built on open weights

We are not precious about training everything ourselves. We are big fans of open source, so we build on the best open-weight models out there, fine-tune them for our tasks, and contribute fixes and findings back to the projects we lean on. The right tool for the job, whether we trained it or the community did.

fine-tuning

Fine-tuning for IRC tasks

We take strong base models, ours or open weights, and fine-tune them on the exact tasks our users run. Supervised fine-tuning plus preference tuning teaches a model to write a clean summary or a polished message the way our users expect, not the way a generic chatbot would.

dataset

Our own training data

Fifteen years of running IRC infrastructure means we understand chat at scale. We build our own curated datasets for the tasks we care about, labelled and filtered in-house, so our models learn from data that reflects real conversations instead of scraped noise.
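To make "labelled and filtered" concrete, here is a minimal sketch of one filtering pass: normalize whitespace, drop very short lines, and remove case-insensitive duplicates. The function name `clean_dataset`, the `min_words` threshold, and the sample lines are illustrative assumptions, not our production pipeline.

```python
import re

def clean_dataset(lines, min_words=3):
    """Normalize whitespace, drop short lines, and remove duplicates."""
    seen = set()
    kept = []
    for line in lines:
        # Collapse runs of whitespace so near-identical lines compare equal.
        text = re.sub(r"\s+", " ", line).strip()
        if len(text.split()) < min_words:
            continue  # too short to carry a useful training signal
        key = text.lower()
        if key in seen:
            continue  # duplicate of a line we already kept
        seen.add(key)
        kept.append(text)
    return kept

# Hypothetical sample data for illustration only.
raw = [
    "how do I  register a nick?",
    "How do I register a nick?",
    "lol",
    "the server restarts at noon today",
]
cleaned = clean_dataset(raw)
```

A real curation pass would add labelling and quality scoring on top; the point is only that each filter is a small, testable rule.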

quantize

Quantization and distillation

A model is only useful if it runs cheap. We quantize to 8-bit and 4-bit and distill large models into small ones, cutting memory and latency while keeping the quality our users feel. That is how an AI summary comes back in a second, not a minute.
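As a rough illustration of what 8-bit quantization does, the sketch below applies symmetric per-tensor quantization to a short list of weights. It is a toy under stated assumptions: the helper names and sample weights are invented for the example, and real quantizers usually work per channel or per group on tensors.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

# Hypothetical weights for illustration only.
weights = [0.5, -1.27, 0.0, 0.31]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now fits in one byte instead of the four a 32-bit float needs, and the rounding error per weight is at most half the scale.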

harness

AI agent harnesses

Models are one piece. The harness around them is what turns a model into a working agent: tool calling, memory, retrieval over your backlog, and guardrails that keep it on task. We build the scaffolding that lets a Turborg agent read a channel, run a skill, and answer on command.
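A minimal sketch of the tool-calling part of a harness, assuming the model emits a small dictionary naming a tool and its input. The registry, the `summarize` placeholder, and the call format are hypothetical; they show the shape of the idea, not Turborg's actual interface.

```python
from typing import Callable, Dict

# Registry of tools the agent is allowed to call. Anything not here is refused.
TOOLS: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator that registers a function as a callable tool."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("summarize")
def summarize(text: str) -> str:
    # Placeholder: a real skill would call a model here.
    return f"{len(text.splitlines())} messages"

def run_tool_call(call: dict) -> str:
    """Guardrail plus dispatch: validate the request, then run the tool."""
    name = call.get("tool")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    arg = call.get("input", "")
    if not isinstance(arg, str):
        return "error: input must be a string"
    return TOOLS[name](arg)

result = run_tool_call({"tool": "summarize", "input": "hi\nhello"})
refused = run_tool_call({"tool": "delete_everything", "input": ""})
```

Keeping the allowlist in the harness rather than in the prompt means a model that asks for an unregistered tool gets an error back instead of an action.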

inference

Fast, cheap inference

We optimize the full inference path: batching, caching, speculative decoding, and serving our quantized models on hardware we already operate. The result is AI that stays responsive under load and stays affordable enough to include in every plan.
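Of the techniques listed, caching is the easiest to show in a few lines. The sketch below keys responses on a hash of the input so an identical request never reaches the model twice. `SummaryCache` and `fake_model` are hypothetical stand-ins for the example.

```python
import hashlib

class SummaryCache:
    """Cache model outputs keyed by a hash of the exact input text."""

    def __init__(self, model_fn):
        self._model_fn = model_fn
        self._store = {}
        self.misses = 0  # how many times the model was actually called

    def get(self, text: str) -> str:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._model_fn(text)
        return self._store[key]

def fake_model(text: str) -> str:
    # Stand-in for an expensive model call.
    return text.upper()

cache = SummaryCache(fake_model)
first = cache.get("hello channel")
second = cache.get("hello channel")  # served from the cache
```

A production cache would also need eviction and invalidation when the underlying channel history changes.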

evals

Evals we trust

We do not ship on vibes. Every model and prompt change runs against task-specific evals: does the summary capture the thread, does AI Polish keep your meaning, does the agent pick the right tool. We measure quality, latency, and cost before anything reaches a user.
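A task eval can be as simple as a list of inputs with a check on each output, plus a timer. The sketch below is a toy version under stated assumptions: `run_evals`, the case format, and the stand-in model are invented for illustration.

```python
import time

def run_evals(model_fn, cases):
    """Run each case through the model and report pass rate and worst latency."""
    passed = 0
    latencies = []
    for case in cases:
        start = time.perf_counter()
        output = model_fn(case["input"])
        latencies.append(time.perf_counter() - start)
        if case["check"](output):
            passed += 1
    return {
        "pass_rate": passed / len(cases),
        "max_latency_s": max(latencies),
    }

def first_sentence_model(text: str) -> str:
    # Stand-in "summarizer": keeps only the first sentence.
    return text.split(".")[0]

# Hypothetical eval cases for illustration only.
cases = [
    {"input": "Deploy failed. Bob is on it.", "check": lambda out: "Deploy" in out},
    {"input": "Lunch at noon. Bring cash.", "check": lambda out: "cash" in out},
]
report = run_evals(first_sentence_model, cases)
```

Here the stand-in model passes the first case and fails the second, so the report flags that it drops detail from later sentences.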

rag

Retrieval and memory

Long channels and weeks of backlog do not fit in a prompt. We use retrieval to pull the right context and give agents persistent memory across sessions, so an agent can reason over history it was never shown all at once.
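To show the retrieval idea in miniature, the sketch below ranks backlog lines against a query by bag-of-words cosine similarity and keeps the top matches. This is a deliberately simple stand-in: the function names and sample backlog are assumptions, and a real system would more likely use learned embeddings.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> Counter:
    """Lowercase word counts for a line of chat."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, backlog: list, k: int = 2) -> list:
    """Return the k backlog lines most similar to the query."""
    q = tokens(query)
    ranked = sorted(backlog, key=lambda line: cosine(q, tokens(line)), reverse=True)
    return ranked[:k]

# Hypothetical backlog for illustration only.
backlog = [
    "alice: the deploy failed on the staging server",
    "bob: anyone up for lunch?",
    "carol: staging deploy fixed, restarting the server now",
    "dave: new keyboard arrived",
]
top = retrieve("what happened with the staging deploy", backlog)
```

Only the retrieved lines go into the prompt, which is how an agent can answer about weeks of history without seeing all of it at once.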

How we build

Every AI feature follows the same path.

  1. Collect and curate

    Build a clean, labelled dataset for the task from data we own and understand.

  2. Train and fine-tune

    Pretrain a compact model or fine-tune a base model on the task with supervised and preference tuning.

  3. Compress

    Quantize and distill until it runs in a fraction of the memory and time, with the quality held steady.

  4. Wrap in a harness

    Add tool calling, retrieval, memory, and guardrails so the model becomes a dependable agent.

  5. Evaluate

    Score it against task evals for quality, latency, and cost. Iterate until it clears the bar.

  6. Ship and watch

    Serve it on our own hardware, behind the features you use, and keep measuring in production.

What we optimize for

Fast, efficient, cheap. In that order, every time.

Fast

A summary or a polished message has to land while you are still looking at the screen. We train and serve small models and cut every millisecond we can out of the inference path so the answer feels instant.

Efficient

Quantization, distillation, and the right model size for the task mean we get good output from a fraction of the compute a general-purpose model would burn. Less waste, more throughput on the hardware we already run.

Cheap

Efficient models on our own infrastructure keep the cost per request low, which is why AI is part of every Turborg plan rather than a premium add-on you pay extra for.

See the research running in Turborg.

AI Polish, channel summaries, and agents, all on models we train or fine-tune ourselves.