
Collaboration Real-time

The Three Primitives Every Collaboration App Needs

Presence, ordered message delivery, and per-document access control. Everything else is product polish.

February 3, 2026
6 min read
By CloudSignal Team

What “collaboration” actually requires

Most posts about building collaboration apps start with CRDTs or operational transform. Those are real, hard, and well-studied, but they are the merge layer, not the substrate. They tell you how to reconcile two conflicting edits to the same paragraph. They do not tell you how the second editor knew the first one was there, how the change got from one machine to the other, or how the broker decided either of them was allowed to be in the document at all. Picking a CRDT before you have those answers is picking a roof before you have walls.

Underneath any collaborative feature, there are three primitives that have to work. You need to know who is in the room, identified by an account you can audit. You need a delivery channel that preserves the order of changes within a scope, so the second client sees the same sequence the first client published. And you need access control evaluated by the server, not the client, so a curious user cannot just point their tab at someone else’s document. Get those three right and the rest is product. Get any of them wrong and the rest does not matter.

Primitive 1: Presence with identity

Presence is the easy one to fake and the hard one to do honestly. Plenty of products show “5 people editing” by counting WebSocket connections in a Redis set. It looks alive, but the dots are anonymous, the audit log is empty, and you have no way to answer the question your customer will eventually ask: “Who saw this document on the 14th?” Real presence ties every cursor and avatar to a user_id that is authenticated, scoped to an organization, and revocable.

We covered the mechanics of presence in detail earlier, so we will not relitigate them here. The short version is that MQTT-based brokers have a last-will-and-testament primitive that fires a goodbye message automatically when a connection dies, so you do not have to count heartbeats and you do not accumulate ghost users on flaky mobile networks. What matters for collaboration is that the topic carries identity:

will: {
  topic: `presence/${orgShortId}/${userId}`,
  payload: '{"status":"offline"}',
  retain: true,
}

The userId in that topic is the same id your authentication layer issued. Subscribers see who is here, not how many connections are open.
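
A minimal sketch of how that will might be wired into a connection, assuming an MQTT.js-style client. The helper names (presenceSession, assertTopicLevel), the "online" payload, and the qos value are illustrative assumptions, not part of any CloudSignal API. The idea is to derive the will and the matching retained "online" message from the same topic so the two cannot drift apart, and to refuse ids that would change the shape of the topic.

```javascript
// Reject ids that would change the shape of the topic: "/" adds a level,
// and "+" / "#" are MQTT wildcards, which are not valid in topic names.
function assertTopicLevel(value, name) {
  if (typeof value !== "string" || value.length === 0 || /[\/+#]/.test(value)) {
    throw new Error(`${name} is not a valid single topic level: ${String(value)}`);
  }
  return value;
}

// Hypothetical helper. orgShortId and userId are assumed to be the ids
// issued by your authentication layer.
function presenceSession(orgShortId, userId) {
  const org = assertTopicLevel(orgShortId, "orgShortId");
  const user = assertTopicLevel(userId, "userId");
  const topic = `presence/${org}/${user}`;

  return {
    // Passed to the client at connect time; the broker stores the will
    // and publishes it if the connection dies without a clean disconnect.
    connectOptions: {
      will: {
        topic,
        payload: JSON.stringify({ status: "offline" }),
        qos: 1, // assumption: at-least-once for the goodbye message
        retain: true,
      },
    },
    // Published by the client itself once the connection is up.
    online: {
      topic,
      payload: JSON.stringify({ status: "online" }),
      options: { qos: 1, retain: true },
    },
  };
}

// Usage with MQTT.js (not executed here; brokerUrl is a placeholder):
//   const session = presenceSession(orgShortId, userId);
//   const client = mqtt.connect(brokerUrl, session.connectOptions);
//   client.on("connect", () =>
//     client.publish(session.online.topic, session.online.payload, session.online.options));
```

Because both messages are retained on the same topic, a late subscriber sees the most recent state for each user without replaying history.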

Primitive 2: Ordered delivery within a document

The second primitive is delivery order. Every change to a document needs a stable sequence on the way out, or undo stacks lie, cursors jump, and selection-sharing turns into a flicker. The cheap way to get this is to give each document its own set of topics. Every edit, every cursor move, every comment publishes to docs/{org}/{doc_id}/{channel}, and the broker delivers each publisher's messages on a topic to all subscribers of that topic in publish order. There is one ordered stream per document channel, and that stream does the work that a custom event bus would otherwise do badly.

It is worth being precise about what this guarantees. The broker preserves the order of messages from any single publisher on a given topic, at a given QoS level, to all subscribers of that topic. It does not magically resolve concurrent edits from two different publishers, and it does not give you a globally consistent total order across documents. If Alice and Bob both type at the same instant, both edits reach every subscriber, in some order, and the merge layer is still your job. What you get for free is enough ordering to make local features feel correct. Cursors arrive in the order they moved. A comment posted after an edit on the same channel shows up after the edit. Undo on the originating client matches what other clients saw. Across channels or QoS levels, treat ordering as best effort.
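
One way to make the per-publisher guarantee visible in application code is to stamp each message with a sequence number and check it on receipt. The sketch below assumes a JSON payload with hypothetical from and seq fields; nothing in it is a broker feature, and docTopic, createPublisher, and createOrderTracker are illustrative names.

```javascript
// Build the per-document topic described above.
function docTopic(org, docId, channel) {
  return `docs/${org}/${docId}/${channel}`;
}

// Each publishing client stamps its own messages with a counter.
function createPublisher(clientId) {
  let seq = 0;
  return function encode(change) {
    seq += 1;
    return JSON.stringify({ from: clientId, seq, change });
  };
}

// A subscriber tracks the last sequence number seen per publisher.
// It only checks order within one publisher; interleaving between
// publishers is expected and is left to the merge layer.
function createOrderTracker() {
  const lastSeq = new Map();
  return function accept(rawPayload) {
    const msg = JSON.parse(rawPayload);
    const prev = lastSeq.get(msg.from);
    // The first message seen from a publisher sets the baseline, so a
    // late joiner does not flag a false gap.
    const inOrder = prev === undefined || msg.seq === prev + 1;
    if (prev === undefined || msg.seq > prev) lastSeq.set(msg.from, msg.seq);
    return { from: msg.from, seq: msg.seq, change: msg.change, inOrder };
  };
}
```

If Alice's and Bob's messages interleave, every message still reports inOrder as true, because each publisher's own sequence is intact. A false value signals a gap or reordering within one publisher's stream, which is the case worth logging.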

Primitive 3: Per-document access control

The third primitive is the one most homegrown collab stacks botch. ACL rules have to live on the server and they have to be scoped narrowly enough that a single rule cannot leak a whole workspace. We scope them by topic prefix. A user with access to one document gets a rule allowing publish and subscribe on docs/{org}/{doc_id}/+. A user with access to a workspace gets the wildcard one level up. Anything else is denied by default.

The reason the topic-prefix model is worth caring about is that it is enforced where it cannot be bypassed. ACL evaluation happens server-side, at subscribe time and at publish time. A client that ignores its own permission check, or that ships a tampered build, still gets rejected at the broker. Per-organization mountpoints make this a hard isolation boundary rather than a soft one. A leaked credential in one tenant cannot subscribe to topics in another, because the topic spaces are physically separate inside the broker. You do not have to trust the client to enforce anything.
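
To make the deny-by-default idea concrete, here is a small sketch of a topic-prefix check using standard MQTT wildcard semantics ("+" matches one level, "#" matches the rest). It is not CloudSignal's implementation: the rule shape, documentRule, and isAllowed are assumptions for illustration, and mountpoint isolation is not modeled.

```javascript
// Standard MQTT filter matching against a concrete topic name.
function matchesFilter(filter, topic) {
  const f = filter.split("/");
  const t = topic.split("/");
  for (let i = 0; i < f.length; i++) {
    if (f[i] === "#") return i === f.length - 1; // "#" is only valid as the last level
    if (i >= t.length) return false;             // filter is longer than the topic
    if (f[i] !== "+" && f[i] !== t[i]) return false;
  }
  return f.length === t.length;
}

// Hypothetical rule for a user who has access to a single document.
function documentRule(org, docId) {
  return { filter: `docs/${org}/${docId}/+`, actions: ["publish", "subscribe"] };
}

// Deny by default: a request passes only if some rule explicitly covers it.
// A requested topic that itself contains wildcards is accepted only when it
// is identical to an allowed filter, which keeps the sketch conservative.
function isAllowed(rules, action, requested) {
  const hasWildcard = /[+#]/.test(requested);
  return rules.some(
    (rule) =>
      rule.actions.includes(action) &&
      (rule.filter === requested ||
        (!hasWildcard && matchesFilter(rule.filter, requested)))
  );
}
```

With a single documentRule("acme", "d1"), publishing to docs/acme/d1/edits passes, while subscribing to docs/acme/d2/edits or to docs/acme/# is rejected. An empty rule list rejects everything.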

What we don’t ship and why

We deliberately do not ship a CRDT, an operational-transform engine, an undo stack, or a conflict-resolution UI. Those are application concerns, and the right shape for them depends on what you are building. A text editor wants Yjs or Automerge. A diagram tool wants something tree-aware. A form builder might not need a merge layer at all, because edits are field-scoped and rarely overlap. Picking one for you would be making your product decision in our roadmap.

The earlier generation of “all-in-one collab platforms” tried to do this and ended up too opinionated to adopt. You inherited their data model, their offline strategy, their auth, and their pricing in one bundle. If any one of those was wrong for your app, the whole thing was wrong. Splitting the substrate from the merge layer is the move that lets a team adopt real-time without adopting a worldview. We carry presence, delivery, and access control. You carry the part that makes your product yours.

What we want to add next

What we want to publish next is a reference collaboration app on top of these three primitives, with the merge layer pluggable so you can swap in Yjs, Automerge, or your own. Alongside it, recipes for the patterns that sit one level above the substrate: cursor sharing on a separate topic with QoS 0, selection broadcasting with short-lived retained messages, typing indicators with a TTL of a few seconds. None of these need new infrastructure. They are all the same primitive in different shapes, and we want to make that obvious by showing it rather than describing it. We are not committing to a date. We are committing to keeping the substrate small enough that the recipes stay short.
