Aura AI — Living, talking experiences

01 · EXECUTIVE SUMMARY

What Aura AI is, in three paragraphs.

Aura AI is Sumeru’s AI Concierge — an enterprise-grade conversational presence layer that turns a business’s existing content into a face-driven, voice-led representative. It sells, supports, onboards, and interviews. It is, by design, far beyond a chatbot: an immersive conversational video experience that adapts in real time to who is on the other end.

It exists because customers and internal teams today lose 20–30% of their time navigating fragmented information — websites, PDFs, internal docs, disconnected tools. Aura collapses that into a single conversation. Powered by next-gen agentic AI, it instantly retrieves the most relevant answer from your knowledge and delivers it through a lifelike avatar — the same way a great human concierge would.

The platform is a six-layer stack — knowledge, conversation, voice, avatar, visual intelligence, and deployment — designed to clear a sub-second time-to-first-response budget and to render a 30 fps photoreal head on commodity GPUs. It is built for teams who need control, performance, and presence at scale: enterprises that demand SOC 2, GDPR, and HIPAA-aware data handling, and platform teams who want a self-hosted GPU option for sovereign deployments.

02 · ARCHITECTURE

Six layers, one representative.

Each layer is independently scalable, observable, and replaceable. The orchestrator binds them with a single representative manifest.

Knowledge

Vectorised content store. URL, PDF, video, and internal-doc sources, refreshed on a schedule or on demand.

Conversation

Agentic routing. A small, fast model handles routing and clarifications; a larger model handles substantive replies.

Voice

Streaming TTS with phoneme timestamps for lip-sync. Sub-300 ms first-audio after token start.

Avatar

GPU-rendered photoreal head, 30 fps, lip-sync locked to voice phonemes, gaze and blink driven by conversation state.

Visual Intelligence

Detects visitor appearance, emotion, and surroundings in real time to hyper-personalize each turn.

Deployment

Hosted page, widget embed, WordPress, Shopify, Flutter SDK, or self-hosted GPU pod.

03 · KNOWLEDGE

Your content becomes the representative’s memory.

A representative is grounded by between one and eight sources. URLs are crawled to a depth you control. PDFs and Markdown are parsed with structure preserved — headings, lists, tables. Plain prompts are accepted for cases where the knowledge is the brand voice itself. Each source can be tagged for routing — a sales rep doesn’t read the engineering wiki by default.

Content is chunked at semantic boundaries, embedded into a 1024-dimension vector index, and attributed at retrieval time so every reply can carry a citation. Indexes refresh hourly by default, with on-demand reindex for time-sensitive launches. Pricing pages, inventory, and docs that change daily are first-class sources, not afterthoughts.
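To make the retrieval step concrete, here is a minimal sketch of tag-filtered retrieval with citations. It is an illustration under stated assumptions, not Aura's implementation: the `Chunk` type, the `retrieve` function, the example sources, and the 3-dimension vectors (standing in for the 1024-dimension embeddings) are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str       # URL or file name the chunk came from (used as the citation)
    tags: frozenset   # routing tags, e.g. {"sales"}
    vector: list      # embedding; 3 dims here for brevity, 1024 in the document

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunks, tag, k=2):
    """Return the top-k chunks visible to `tag`, each paired with its source."""
    visible = [c for c in chunks if tag in c.tags]
    ranked = sorted(visible, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return [(c.text, c.source) for c in ranked[:k]]

# Hypothetical index: the engineering wiki is not tagged for the sales rep.
chunks = [
    Chunk("Pro plan is billed annually.", "https://example.com/pricing",
          frozenset({"sales"}), [1.0, 0.0, 0.0]),
    Chunk("Deploys run through CI.", "wiki/engineering.md",
          frozenset({"engineering"}), [0.9, 0.1, 0.0]),
    Chunk("Refunds within 30 days.", "policies.pdf",
          frozenset({"sales", "support"}), [0.0, 1.0, 0.0]),
]
hits = retrieve([1.0, 0.0, 0.0], chunks, tag="sales")
```

Filtering by tag before ranking is one way to keep a sales representative away from the engineering wiki by default; carrying `source` alongside each chunk is what lets a reply cite where its answer came from.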

Source types: URL · PDF · MARKDOWN · STRUCTURED · PROMPT · RSS / SITEMAP

04 · CONVERSATION

Two models, one conversation.

Every user turn first hits a routing pass on a 7B-parameter model. It classifies the turn — clarification, factual question, objection, action request — and selects the slice of the knowledge index to retrieve. A typical routing pass completes in under 80 ms.

The substantive reply runs on a 70B-parameter model with the retrieved context, the representative’s persona prompt, and the live conversation history. Output is streamed token-by-token to the voice layer so audio synthesis can begin before the reply finishes generating.
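The two-pass flow can be sketched as follows. This is a toy illustration only: `route` is a keyword heuristic standing in for the routing model, `reply_stream` is a stub standing in for the reply model, and keying the knowledge slice by turn type is a simplifying assumption.

```python
from typing import Dict, Iterator, List, Tuple

TURN_TYPES = ("clarification", "factual_question", "objection", "action_request")

def route(turn: str) -> str:
    """Stand-in for the routing pass: classify the turn with keywords."""
    t = turn.lower()
    if any(w in t for w in ("book", "schedule", "sign me up")):
        return "action_request"
    if any(w in t for w in ("too expensive", "not sure", "but ")):
        return "objection"
    if any(w in t for w in ("what do you mean", "sorry?", "repeat")):
        return "clarification"
    return "factual_question"

def reply_stream(turn: str, context: List[str]) -> Iterator[str]:
    """Stand-in for the reply model: yields the answer one token at a time."""
    answer = f"Based on {len(context)} source(s): {context[0]}" if context else "Let me check."
    for token in answer.split():
        yield token

def handle_turn(turn: str, index: Dict[str, List[str]]) -> Tuple[str, str]:
    """Route the turn, fetch its slice, and consume the reply as a stream."""
    kind = route(turn)
    context = index.get(kind, [])
    spoken = []
    for token in reply_stream(turn, context):
        spoken.append(token)  # a voice layer could begin synthesis at this point
    return kind, " ".join(spoken)
```

Because `reply_stream` is a generator, the consumer sees each token as soon as it is produced, which is the property that lets audio synthesis start before the reply has finished generating.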

The orchestrator enforces representative scope. A sales rep cannot answer engineering questions; a support rep cannot quote sales discounts. Out-of-scope turns are redirected gracefully with copy you control. Structured output, such as booking a demo or capturing a qualified lead, is emitted as a JSON tool call that downstream systems can consume.
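A minimal sketch of scope enforcement and a structured tool call might look like this. The function names, the redirect wording, the `book_demo` tool name, and the argument fields are all hypothetical; the actual tool-call schema is not specified here.

```python
import json

# Hypothetical redirect copy; in practice this text would be customer-controlled.
REDIRECT_COPY = "That is outside what I can help with here. Let me point you to the right team."

def enforce_scope(rep_scope: set, turn_topic: str, draft_reply: str) -> str:
    """Return the draft reply if the topic is in scope, else the redirect copy."""
    return draft_reply if turn_topic in rep_scope else REDIRECT_COPY

def demo_tool_call(name: str, email: str, slot: str) -> str:
    """Serialize a hypothetical book_demo action as a JSON tool call."""
    call = {"tool": "book_demo", "arguments": {"name": name, "email": email, "slot": slot}}
    return json.dumps(call, sort_keys=True)
```

For example, `enforce_scope({"sales"}, "engineering", draft)` returns the redirect copy, while `demo_tool_call("Ada", "ada@example.com", "2026-06-02T10:00Z")` produces a JSON string a CRM or calendar integration could parse.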

05 · VOICE, AVATAR & VISION

The face — and the eyes — on top of the model.

Aura is a conversational video experience, not a chat thread with extra steps. Three distinct subsystems give it presence: a streaming voice that speaks in 28 languages, a photoreal avatar that lip-syncs to that voice, and a perception layer that watches the visitor in real time so the conversation can adapt to who is on the other end.

Voice synthesis

Streaming TTS with phoneme timestamps. The first audio chunk leaves the server within 280 ms of the first token from the reply model. We support 28 languages out of the box, with real-time translation between any pair; voice cloning is available for Studio and Enterprise plans with verified consent.

FIRST AUDIO: ≤ 280 ms
SAMPLE RATE: 22 kHz · 16-bit PCM
LANGUAGES: 28 · LIVE TRANSLATE
VOICE CLONING: Studio +

Avatar rendering

A photoreal head is rendered on a single GPU pod at 30 frames per second and 720p. Lip-sync is driven by the voice layer’s phoneme timestamps; gaze and blink are driven by conversation state. The rendered avatar is delivered to the client as a WebRTC stream, with a fallback to MJPEG for restricted networks.

FRAME BUDGET: ≤ 33 ms
RESOLUTION: 720p (1080p β)
TRANSPORT: WebRTC, MJPEG fallback
GPU PER REP: 1 · autoscaled
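The link between phoneme timestamps and the 30 fps frame budget can be sketched as a simple timeline lookup. The `(phoneme, start_ms, end_ms)` tuple format, the ARPAbet-style labels, and the "rest" mouth shape are assumptions for illustration; the real timestamp format is not described here.

```python
FPS = 30  # frame rate stated for the avatar; 1000 / 30 gives the ~33 ms frame budget

def phonemes_to_frames(phonemes, fps=FPS):
    """Map (phoneme, start_ms, end_ms) tuples to one mouth shape per video frame.

    Frames with no active phoneme are labelled "rest".
    """
    if not phonemes:
        return []
    total_ms = max(end for _, _, end in phonemes)
    n_frames = -(-total_ms * fps // 1000)  # ceiling division
    frames = []
    for i in range(n_frames):
        t = i * 1000 / fps  # frame start time in milliseconds
        active = next((p for p, s, e in phonemes if s <= t < e), "rest")
        frames.append(active)
    return frames

# Hypothetical timeline for the word "hello".
timeline = [("HH", 0, 80), ("EH", 80, 200), ("L", 200, 260), ("OW", 260, 400)]
frames = phonemes_to_frames(timeline)
```

For this 400 ms timeline the function yields 12 frames. A production renderer would likely blend between mouth shapes rather than switch per frame, but the lookup shows why the voice layer has to emit timestamps alongside audio.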

Visual intelligence

When the visitor consents to camera, Aura perceives presence, emotion, and ambient context — and tunes the conversation accordingly. Reading the room is what separates a concierge from a script. The signals are processed on-device or in your tenant; raw video never leaves the perimeter you choose.

SIGNALS: presence · emotion · env
CONSENT: explicit, per-session
PROCESSING: on-device or tenant
RAW VIDEO: never persisted

06 · LATENCY

A second of presence, broken down.

Time-to-first-audio is measured from the user’s last word. Targets are p95 on production traffic.

[Figure: latency bar segmented by stage (ASR, ROUTE, FETCH, LLM, TTS, NET); p95 time-to-first-audio, total 800 ms]

Latency budget per layer (p95)
Stage	Budget (ms)
ASR	90
ROUTE	80
FETCH	70
LLM	220
TTS	280
NET	60
Total	800
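The arithmetic behind the budget is easy to check, and the same table can drive a simple overrun report. The stage budgets below are taken from the table; the `over_budget` helper and the sample measurements are hypothetical.

```python
# Per-stage p95 budgets in milliseconds, copied from the table above.
BUDGET_MS = {"ASR": 90, "ROUTE": 80, "FETCH": 70, "LLM": 220, "TTS": 280, "NET": 60}
TOTAL_MS = 800

def over_budget(measured_p95):
    """Return {stage: overrun_ms} for stages whose measured p95 exceeds its budget."""
    return {
        stage: measured_p95[stage] - budget
        for stage, budget in BUDGET_MS.items()
        if measured_p95.get(stage, 0) > budget
    }

budget_sum = sum(BUDGET_MS.values())  # 90 + 80 + 70 + 220 + 280 + 60

# Hypothetical measurements, for illustration only.
sample = {"ASR": 85, "ROUTE": 95, "FETCH": 70, "LLM": 240, "TTS": 270, "NET": 55}
overruns = over_budget(sample)
```

One caveat when using a table like this: the p95 of the end-to-end path is not, in general, equal to the sum of the per-stage p95 values, so the 800 ms total is worth measuring directly as well as stage by stage.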

07 · DEPLOYMENT

Seven surfaces. One representative manifest.

A representative is described by a single manifest — its sources, persona, voice, avatar, and guardrails — and that manifest is the only thing that needs to travel between surfaces. Every surface listed below pulls from the same backend; switching surfaces does not require re-training, re-indexing, or re-uploading anything.
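As a rough picture of what such a manifest could contain, here is a sketch built from the five fields named above. The class, field types, and example values are assumptions; the real manifest format is not published in this document. The one-to-eight source limit comes from the Knowledge section.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Manifest:
    """Hypothetical representative manifest: one object that travels between surfaces."""
    name: str
    sources: list            # between one and eight, per the Knowledge section
    persona: str
    voice: str
    avatar: str
    guardrails: dict = field(default_factory=dict)

    def __post_init__(self):
        if not 1 <= len(self.sources) <= 8:
            raise ValueError("a representative takes between one and eight sources")

    def to_json(self) -> str:
        """Serialize the manifest so any surface can load the same definition."""
        return json.dumps(asdict(self), sort_keys=True)

manifest = Manifest(
    name="sales-rep",
    sources=["https://example.com/pricing"],
    persona="Friendly, concise, never quotes discounts.",
    voice="en-US-1",        # hypothetical voice identifier
    avatar="default",       # hypothetical avatar identifier
    guardrails={"scope": ["sales"]},
)
```

Keeping everything in one serializable object is what makes the surface a deployment detail: the hosted page and the Flutter SDK would both load the same JSON.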

Hosted page

A standalone URL we host. Zero-integration option.

Website widget

A floating bubble that opens to a panel on any site.

Inline embed

An iframe-free div that lives inside your existing page.

WordPress plugin

Drop-in plugin, single-shortcode placement.

Shopify app

Storefront app for product Q&A and assisted sale.

Flutter SDK

Mobile-native rendering for iOS and Android apps.

Self-hosted GPU pod

Run the entire stack inside your own VPC. Bring-your-own model weights for Enterprise. Sovereign deployments by request.


08 · SECURITY

Built for teams that have to answer to a CISO.

Data is encrypted in transit (TLS 1.3) and at rest (AES-256). Conversation transcripts are tenant-isolated and retained only as long as your policy requires — Studio defaults to 30 days, Enterprise is configurable down to zero retention. Voice and avatar streams are never recorded by default. PII fields can be redacted before they reach the reply model. We are SOC 2 Type II audited; HIPAA-aware data handling and GDPR data residency in EU and US regions ship today; FedRAMP is on the 2026 roadmap.

SOC 2 Type II

AUDITED · ANNUAL

GDPR

EU & US RESIDENCY

HIPAA-aware

BAA AVAILABLE

AES-256 / TLS 1.3

TRANSIT + REST

SSO / SAML / SCIM

ENTERPRISE

PII Redaction

PRE-MODEL

Self-host option

BYO INFRASTRUCTURE

Configurable Retention

0 — UNLIMITED

09 · INTEGRATIONS

Where the conversation goes.

Aura emits structured events for every conversation milestone — turn started, intent detected, lead captured, demo booked, escalation requested. Events fan out to your existing systems without polling. The Intelligent Agent Analytics dashboard gives you live engagement, conversion, and resolution metrics across every representative.

WEBHOOKS

SLACK

SEGMENT

HUBSPOT

SALESFORCE

INTERCOM

ZAPIER

PIPEDRIVE

ZENDESK

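Each integration is either shipping or on the roadmap; section 10 gives timing for Slack, Segment, Zapier, Pipedrive, and Zendesk.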

Event schema

representative.session.started
representative.turn.completed
representative.intent.detected
representative.lead.captured
representative.demo.booked
representative.escalation.requested
representative.transcript.archived
representative.session.ended
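A webhook consumer for these events could be sketched as a small dispatcher. The event names are the ones listed above; the payload shape (`type` and `data` keys, an `email` field on lead events) is an assumption, since the payload schema is not shown here.

```python
import json

HANDLERS = {}

def on(event_type):
    """Decorator that registers a handler for one event type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

leads = []  # stand-in for a CRM write

@on("representative.lead.captured")
def store_lead(payload):
    leads.append(payload.get("email"))

def dispatch(raw_body: str) -> bool:
    """Parse a webhook body and call the matching handler; False if unhandled."""
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return False
    handler(event.get("data", {}))
    return True
```

Returning `False` for unregistered event types lets a receiver acknowledge events it does not care about without raising, which matters when new event types are added upstream.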

10 · ROADMAP

What’s shipping, and what’s next.

A non-binding view of the next two quarters. Dates are intent, not contract.

NOW · Q2 2026

1080p avatar streams
Voice cloning, GA
BYO weights, Enterprise
Slack & Segment integrations
EU residency, GA

NEXT · Q3 2026

On-device avatar (Apple Silicon)
Multilingual voice cloning
Self-host operator UI
Zapier, Pipedrive, Zendesk
APAC residency

LATER · Q4 2026+

FedRAMP Moderate
Real-time translation, 28 ↔ 28
Multi-rep orchestration
Direct CRM sync (bidirectional)
Sovereign cloud partnerships

11 · FAQ

Engineering questions, answered short.

Can we run Aura in our own VPC?

Yes. The Self-hosted GPU pod option ships the entire stack — knowledge, conversation, voice, avatar — as a Kubernetes operator. You bring the GPUs, we bring the operator. Air-gapped deployments are supported for Enterprise.

Do you train on our conversations?

No. Customer conversations are never used to train shared models. Per-tenant fine-tuning is opt-in and isolated to your tenant.

What happens if a model goes down?

The orchestrator falls back to a smaller backup model and emits a degradation event. The avatar continues to render; the voice layer keeps streaming. End-users see a slightly slower response, not a broken experience.
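The fallback behaviour can be sketched in a few lines. This is an illustration only: the function name, the callable-based model interface, and the `degradation` event shape are hypothetical.

```python
def reply_with_fallback(prompt, primary, backup, emit):
    """Try the primary reply model; on failure, emit a degradation event and use the backup."""
    try:
        return primary(prompt)
    except Exception as exc:  # a real system would catch narrower error types
        emit({"type": "degradation", "reason": str(exc)})  # hypothetical event shape
        return backup(prompt)

# Usage with stand-in models: the primary fails, the backup answers.
events = []

def failing_primary(prompt):
    raise TimeoutError("primary model unavailable")

def small_backup(prompt):
    return f"(backup) {prompt}"

answer = reply_with_fallback("What does Pro cost?", failing_primary, small_backup, events.append)
```

The caller gets a reply either way, and the emitted event gives operators a signal that quality or latency may have dropped.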

How do you handle PII?

PII fields can be redacted at the orchestrator before any text reaches the reply model. Redaction patterns are configurable per representative. Transcripts can be configured to exclude PII at archive time.
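A minimal sketch of pattern-based redaction before text reaches the reply model is shown below. The two regular expressions and the placeholder labels are illustrative assumptions, not Aura's configured patterns, and simple regexes like these will miss many real-world PII formats.

```python
import re

# Hypothetical per-representative redaction patterns.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text, patterns=PATTERNS):
    """Replace each pattern match with a bracketed label such as [EMAIL]."""
    for label, pattern in patterns.items():
        text = pattern.sub(f"[{label}]", text)
    return text

clean = redact("Reach me at ada@example.com or +1 415 555 0100.")
```

Here `clean` is `"Reach me at [EMAIL] or [PHONE]."`. Passing a different `patterns` dictionary per representative mirrors the per-representative configuration described above.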

Can we use our own LLM?

Enterprise customers can bring their own model weights for the reply layer. The routing layer remains Aura-managed for orchestration consistency.

What's the smallest deployment?

One representative, one source, the Free tier. Indexing typically completes in under two minutes for a single-page URL. The hosted page is live the same minute the index completes.

Aura AI, under the hood.