Attention Labs

Attention Labs Opens SAA, an SDK That Gives Voice Agents Selective Hearing

Voice models can now hold fluid conversations, use tools and respond in real time. Put one in a room with two people, and a more basic failure can appear before any of that intelligence matters: the agent may assume every clear sentence is meant for it.

Attention Labs is releasing an open-source SDK called SAA, short for Selective Auditory Attention, to handle that decision before the rest of a voice pipeline starts working. Developers can learn more and access the full SDK at Attention Labs. SAA evaluates each utterance and determines whether the speaker was addressing the device. Speech accepted by the gate continues to speech-to-text, the language model and text-to-speech. Side conversations, background media and the agent’s own playback can be stopped before they become user turns.

The launch demonstration uses the same Grok voice model on two laptops in the same room. The untreated agent repeatedly enters a conversation between two people. The laptop running SAA stays quiet until it is addressed. In a stress test, the speaker looks directly toward the laptop’s camera while continuing to talk to the person beside him, and the SAA-equipped system still stays out of the exchange.

That comparison is important because it isolates the layer Attention Labs is selling. The language model stays the same. The voice stays the same. The behavior changes because one system can make an addressee decision and the other cannot.

SAA ships with JavaScript and Python clients, adapters and runnable examples for LiveKit, Pipecat on Daily, ElevenLabs Conversational AI and Twilio Media Streams, plus a direct streaming interface for custom stacks. The repository is licensed under Apache 2.0. Developers can install the packages through npm or PyPI, connect with an Attention Labs API key and place the gate in front of the speech services they already use.

The open-source scope is the integration layer, not the trained classifier. The public SDK connects to Attention Labs’ hosted service for inference. The company offers a separate path for on-device and embedded deployment, and says it is working with hardware manufacturers to bring the model into local edge environments.

The architecture addresses a gap between several components that already exist in voice products. Voice activity detection can tell whether speech is present. Speaker diarization can separate or label speakers. Turn detection can estimate when a person has finished. A wake word can require an explicit trigger phrase. None of those signals, on its own, answers whether a normal utterance was intended for the device.

SAA is designed to make that answer available as a routing primitive. The highest-level client event returns a completed, device-directed audio turn that an application can forward to its recognizer or realtime model. Lower-level events expose predictions, state, interruptions and connection statistics. The SDK also lets an application mark when the agent is speaking, which helps prevent its own text-to-speech output from re-entering the pipeline as fresh human input.

The immediate use cases are products with microphones in shared spaces: laptops, robots, kiosks, meeting tools, phone agents and drive-through systems. Those environments contain valid speech that is often meant for someone else. A false activation can create an unwanted reply, unnecessary downstream processing or, for an agent with tool access, an action based on a conversation it was never part of.

Attention Labs is making a focused infrastructure bet. Better foundation models will improve what an agent can do after it receives a turn. SAA is intended to decide whether that turn belongs to the agent in the first place.

The SDK is available now at attentionlabs ai , with hosted access through Attention Labs and on-device deployment available separately for OEM and enterprise projects.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *