research · in-house models

We build the models that power Turborg.

xShellz runs an AI research effort focused on one thing: models that do real work quickly, efficiently, and cheaply. We build on the best open-weight models (DeepSeek and Gemma), fine-tune them on our own data, quantize them to run lean, and wrap them in agent harnesses. We don't pretrain from scratch. The AI features in Turborg run on this work, and we contribute back to the open-source projects we build on.

Try Turborg Free → See the product

train · fine-tune · quantize · distill · serve · evaluate

What we work on

From a raw dataset to an agent that runs in a second.

Our work spans the whole stack, from the data a model learns on to the harness that turns it into a working agent. Here is what that looks like.

foundations

Built on open weights

We don't pretrain language models from scratch, and we don't pretend to. We build on open-weight foundation models (DeepSeek and Gemma) and tune them for the jobs Turborg actually does. Smaller, task-focused models beat a giant general one on the work that matters here: reading IRC, summarizing channels, writing code, and acting on commands.

open-source

Honest about it

We are big fans of open source. We build on the best open-weight models out there, fine-tune them for our tasks, and contribute fixes and findings back to the projects we lean on. You always know what's under the hood, and support for bringing your own open-weight model is coming, so you're never locked in.

fine-tuning

Fine-tuning for IRC tasks

We take strong base models, whether open weights or our own earlier fine-tunes, and fine-tune them on the exact tasks our users run. Supervised fine-tuning plus preference tuning teaches a model to write a clean summary or a polished message the way our users expect, not the way a generic chatbot would.
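To make "preference tuning" concrete, here is a minimal sketch of one widely used objective, the Direct Preference Optimization (DPO) loss, for a single pair of answers. This is an illustration only: the text above does not say which preference method is used, and the function name, argument names, and the `beta` value are assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical strength of the preference signal
) -> float:
    """DPO loss for one (chosen, rejected) pair of summed log-probabilities."""
    # How much more the tuned model favors the chosen answer than the
    # frozen reference model does, relative to the rejected answer.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

The loss falls below log(2) once the tuned model prefers the answer users chose more strongly than the reference model does, and rises above it in the opposite case.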

dataset

Our own training data

Fifteen years of running IRC infrastructure means we understand chat at scale. We build our own curated datasets for the tasks we care about, labelled and filtered in-house, so our models learn from data that reflects real conversations instead of scraped noise.
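As a rough picture of what "labelled and filtered" can involve, the sketch below drops very short lines and exact repeats from a list of chat lines. It is a toy example under stated assumptions: the `ChatLine` type, the `curate` function, and the three-word threshold are hypothetical and do not describe the actual pipeline.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatLine:
    nick: str
    text: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical lines compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def curate(lines: list[ChatLine], min_words: int = 3) -> list[ChatLine]:
    """Keep lines that are long enough and not a repeat of an earlier line."""
    seen: set[str] = set()
    kept: list[ChatLine] = []
    for line in lines:
        key = normalize(line.text)
        if len(key.split()) < min_words:
            continue  # too short to carry signal (hypothetical threshold)
        if key in seen:
            continue  # duplicate of an earlier line after normalization
        seen.add(key)
        kept.append(line)
    return kept
```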

quantize

Quantization and distillation

A model is only useful if it runs cheaply. We quantize to 8-bit and 4-bit and distill large models into small ones, cutting memory and latency while preserving the quality our users notice. That is how an AI summary comes back in a second, not a minute.
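For readers new to quantization, here is a minimal sketch of symmetric 8-bit quantization of a list of weights. It shows the general idea only: one shared scale per tensor, integers in the range -127 to 127, and a rounding error of at most half a scale step. The function names are illustrative, and production schemes (per-channel or per-group scales, 4-bit formats) are more involved.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quantized]
```

Each weight then needs one byte instead of the four bytes of a 32-bit float, at the price of a small, bounded rounding error.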

harness

AI agent harnesses

Models are one piece. The harness around them is what turns a model into a working agent: tool calling, memory, retrieval over your backlog, and guardrails that keep it on task. We build the scaffolding that lets a Turborg agent read a channel, run a skill, and answer on command.
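One small piece of such a harness can be sketched as a dispatcher that parses a tool call emitted by the model and refuses anything outside an allowlist. This is a simplified illustration: the JSON shape, the `summarize_channel` tool, and `run_tool_call` are hypothetical names, not Turborg's actual interface.

```python
import json
from typing import Callable

Tool = Callable[[dict], str]


def summarize_channel(args: dict) -> str:
    """Stand-in tool; a real one would call a model over the channel backlog."""
    return f"summary of {args.get('channel', '?')}"


# Only tools listed here can ever be executed (the guardrail).
TOOLS: dict[str, Tool] = {"summarize_channel": summarize_channel}


def run_tool_call(model_output: str, tools: dict[str, Tool]) -> str:
    """Parse a JSON tool call and dispatch it, rejecting malformed or unknown calls."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: tool call must be a JSON object"
    name = call.get("tool")
    if name not in tools:
        return f"error: tool {name!r} is not allowed"
    args = call.get("args", {})
    if not isinstance(args, dict):
        return "error: tool arguments must be a JSON object"
    return tools[name](args)
```

Returning an error string instead of raising lets the harness feed the failure back to the model so it can retry with a valid call.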

inference

Fast, cheap inference

We optimize the full inference path: batching, caching, speculative decoding, and serving our quantized models on hardware we already operate. The result is AI that stays responsive under load and stays affordable enough to include in every plan.
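Caching can mean several things in an inference stack. The sketch below shows the simplest form, a least-recently-used cache of whole responses keyed by prompt, so that an identical request never reaches the model twice. The `ResponseCache` class and its parameters are assumptions for illustration; the text above does not specify which caching layers are used.

```python
from collections import OrderedDict
from typing import Callable


class ResponseCache:
    """Wrap a generate function with a small least-recently-used cache."""

    def __init__(self, generate: Callable[[str], str], max_entries: int = 1024):
        self._generate = generate
        self._max_entries = max_entries
        self._store = OrderedDict()  # prompt -> response, oldest first
        self.hits = 0
        self.misses = 0

    def __call__(self, prompt: str) -> str:
        if prompt in self._store:
            self.hits += 1
            self._store.move_to_end(prompt)  # mark as recently used
            return self._store[prompt]
        self.misses += 1
        response = self._generate(prompt)
        self._store[prompt] = response
        if len(self._store) > self._max_entries:
            self._store.popitem(last=False)  # evict the least recently used
        return response
```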

evals

Evals we trust

We do not ship on vibes. Every model and prompt change runs against task-specific evals: does the summary capture the thread, does AI Polish keep your meaning, does the agent pick the right tool. We measure quality, latency, and cost before anything reaches a user.
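A task eval of this kind can be sketched as a loop that runs each case through the model and records a pass rate, the mean latency, and a crude cost proxy. Everything here is illustrative: `EvalCase`, `EvalReport`, the keyword check, and the use of output word count as a stand-in for cost are assumptions, and real quality checks are richer than keyword matching.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_include: list[str]  # keywords a good answer should mention


@dataclass
class EvalReport:
    pass_rate: float
    mean_latency_s: float
    mean_output_words: float  # crude proxy for cost


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> EvalReport:
    """Run every case once and aggregate quality, latency, and cost."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = 0
    total_latency = 0.0
    total_words = 0
    for case in cases:
        start = time.perf_counter()
        output = model(case.prompt)
        total_latency += time.perf_counter() - start
        total_words += len(output.split())
        if all(k.lower() in output.lower() for k in case.must_include):
            passed += 1
    n = len(cases)
    return EvalReport(passed / n, total_latency / n, total_words / n)
```

Comparing two reports, one from before a change and one from after, turns "does it feel better" into three numbers that can gate a release.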

rag

Retrieval and memory

Long channels and weeks of backlog do not fit in a prompt. We use retrieval to pull the right context and give agents persistent memory across sessions, so an agent can reason over history it was never shown all at once.
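The retrieval step can be sketched with nothing more than word counts and cosine similarity: score every backlog message against the question and keep the best few. This is a deliberately simple stand-in. Production systems typically use learned embeddings and an index, and the function names below are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, backlog: list[str], k: int = 2) -> list[str]:
    """Return up to k backlog messages most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(m))), i) for i, m in enumerate(backlog)]
    # Highest score first; ties go to the earlier message.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    return [backlog[i] for score, i in top if score > 0]
```

Only the retrieved messages go into the prompt, which is how an agent can answer questions about weeks of history without ever seeing it all at once.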

How we build

Every AI feature follows the same path.

01

Collect and curate

Build a clean, labelled dataset for the task from data we own and understand.

02

Fine-tune

Fine-tune an open-weight base model on the task with supervised and preference tuning.

03

Compress

Quantize and distill until it runs in a fraction of the memory and time, with the quality held steady.

04

Wrap in a harness

Add tool calling, retrieval, memory, and guardrails so the model becomes a dependable agent.

05

Evaluate

Score it against task evals for quality, latency, and cost. Iterate until it clears the bar.

06

Ship and watch

Serve it on our own hardware, behind the features you use, and keep measuring in production.

What we optimize for

Fast, efficient, cheap. In that order, every time.

Fast

A summary or a polished message has to land while you are still looking at the screen. We fine-tune and serve small models and cut every millisecond we can out of the inference path so the answer feels instant.

Efficient

Quantization, distillation, and the right model size for the task mean we get good output from a fraction of the compute a general-purpose model would burn. Less waste, more throughput on the hardware we already run.

Cheap

Efficient models on our own infrastructure keep the cost per request low, which is why AI is part of every Turborg plan rather than a paid add-on.

See the research running in Turborg.

AI Polish, channel summaries, and agents, all on open models we fine-tune ourselves.

Try Turborg Free → What is Turborg
