Features

QALITA Studio offers a powerful conversational interface for working with your data quality. This page walks through all of its capabilities.

Conversational Chat

Streaming Mode

Streaming mode displays model responses in real-time using Server-Sent Events (SSE).

Advantages:

  • ✅ Immediate feedback
  • ✅ Better user experience
  • ✅ Real-time token delivery
  • ✅ Optimized for long responses

API Endpoint:

curl -X POST https://your-platform/api/v1/studio/chat/stream \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain data quality",
    "conversation_id": "my-conv-123"
  }'

SSE Response Format:

data: {"content": "Data quality refers to"}
data: {"content": " the accuracy, completeness"}
data: {"content": ", and consistency of data."}
data: [DONE]

Fallback Behavior:

  • If agent streaming fails, Studio falls back to direct LLM streaming
  • If that also fails, Studio returns the complete response in chunks to preserve the visual streaming effect

Non-Streaming Mode

Traditional request/response mode: the complete answer is returned in a single payload.

API Endpoint:

curl -X POST https://your-platform/api/v1/studio/chat \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello",
    "conversation_id": "my-conv-123"
  }'

Response:

{
  "ok": true,
  "response": "Hello! How can I help you with your data quality needs?",
  "conversation_id": "my-conv-123",
  "analysis_completed": false,
  "analysis_result": null,
  "scripts_generated": null
}

Conversation Management

Each conversation is identified by a unique conversation_id.

Format:

  • Alphanumeric with hyphens and underscores
  • Example: conv_20241020_143022_a1b2c3d4

Auto-generation:

If no conversation_id is provided, one is automatically generated based on date/time and a random suffix.
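
As an illustration, an identifier of this shape can be generated like so (a minimal sketch; the actual server-side generation logic may differ):

import secrets
from datetime import datetime

def generate_conversation_id() -> str:
    """Build an ID like conv_20241020_143022_a1b2c3d4 (illustrative only)."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    suffix = secrets.token_hex(4)  # 8 hex characters, e.g. "a1b2c3d4"
    return f"conv_{timestamp}_{suffix}"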

Contextual Enrichment

Studio automatically enriches your prompts with context from the QALITA Platform.

Enriched Context Structure

The frontend sends enriched context data that the agent uses to personalize responses:

Request with context:

{
  "message": "Explain this problem",
  "context": {
    "issue_id": 123,
    "source_id": 456,
    "project_id": 1,
    "enriched": {
      "issue": {
        "id": 123,
        "title": "High null rate in customer_email",
        "status": "open",
        "description": "45% null values detected"
      },
      "source": {
        "id": 456,
        "name": "customers",
        "type": "postgresql"
      },
      "schema": [
        {"key": "column_name", "value": "customer_email"},
        {"key": "data_type", "value": "varchar"}
      ],
      "recommendations": [
        {"content": "Add NOT NULL constraint", "severity": "high"}
      ],
      "metrics": [
        {"key": "score", "value": 55.0}
      ],
      "dataSample": {
        "headers": ["id", "customer_email", "name"],
        "rows": [["1", "john@example.com", "John"], ["2", null, "Jane"]],
        "totalRows": 10000
      }
    }
  }
}

Context Components

| Component | Description |
| --- | --- |
| Issue | Title, status, description, scope, due date |
| Source | Name, type, description |
| Schema | Column names and data types |
| Recommendations | Quality improvement suggestions |
| Metrics | Quality scores and counts |
| Data Sample | Preview of actual data |

System Prompt Construction

The agent constructs a rich system prompt including:

You are an expert assistant in data quality...

# Current context

## Current issue
- **Title**: High null rate in customer_email
- **Status**: open
- **Description**: 45% null values detected

## Data source
- **Name**: customers
- **Type**: postgresql

## Source schema (15 columns)
**Columns**: id, customer_email, name, ...

## Quality recommendations (3 suggestions)
1. 🔴 Add NOT NULL constraint
...

## Quality metrics
- **Average score**: 55.0%

## Data preview (10000 rows total)
**Columns**: id, customer_email, name
**Sample** (first 3 rows):
1. 1 | john@example.com | John
2. 2 | | Jane
...

This enrichment provides the LLM with full awareness of the user's working context.
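
As a rough sketch, assembling such a prompt from the enriched context could look like the following (illustrative only; the actual construction logic lives in the Studio agent):

def build_system_prompt(enriched: dict) -> str:
    """Assemble a context-aware system prompt (illustrative sketch)."""
    parts = ["You are an expert assistant in data quality...",
             "",
             "# Current context"]
    if issue := enriched.get("issue"):
        parts += ["", "## Current issue",
                  f"- **Title**: {issue['title']}",
                  f"- **Status**: {issue['status']}"]
    if source := enriched.get("source"):
        parts += ["", "## Data source",
                  f"- **Name**: {source['name']}",
                  f"- **Type**: {source['type']}"]
    return "\n".join(parts)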

Conversation Storage

Conversations in Studio are stored in the Platform database and linked to issues, sources, and projects.

Conversation Model

Each conversation record includes:

| Field | Description |
| --- | --- |
| id | Database ID |
| conv_id | Unique conversation identifier |
| filename | Original filename |
| partner_id | Organization ID |
| issue_id | Linked issue (optional) |
| source_id | Linked source (optional) |
| project_id | Linked project (optional) |
| line_count | Number of messages |
| created_at | Creation timestamp |
| updated_at | Last update timestamp |

Contextual Conversations

Retrieve conversations based on current context:

Endpoint: GET /api/v1/studio/conversations/context

Query Parameters:

  • source_id: Get conversations from issues linked to this source
  • issue_id: Get conversations from this specific issue
  • project_id: Get conversations from all project sources and issues
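
Example request (filtering by issue):

curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://your-platform/api/v1/studio/conversations/context?issue_id=123"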

Response:

{
  "source_conversations": [
    {
      "id": 1,
      "conv_id": "conv_20241020_143022",
      "filename": "conv_20241020_143022.jsonl",
      "project_id": 1,
      "project_name": "E-Commerce",
      "issue_id": 123,
      "issue_title": "High null rate",
      "source_id": 456,
      "source_name": "customers",
      "line_count": 12,
      "created_at": "2024-10-20T14:30:22Z",
      "updated_at": "2024-10-20T14:35:00Z"
    }
  ],
  "issue_conversations": [...],
  "project_conversations": [...],
  "total_count": 5
}

Conversation Grouping

Conversations are grouped by context:

  1. Issue Conversations: Directly linked to the selected issue
  2. Source Conversations: From other issues linked to the same source
  3. Project Conversations: From all sources and issues in the project

This allows users to see related conversations when working on similar data quality topics.

Data Tools

Studio's agent can interact directly with your data sources through the Worker infrastructure using LangChain tools.

Available Tools

| Tool | Description |
| --- | --- |
| execute_sql_query | Run SQL queries on database sources |
| read_source_data | Read data from files or tables |
| describe_source | Get metadata, schema, row count |
| sample_source_data | Get random samples |
| count_rows | Count rows with optional condition |
| filter_data | Filter data by condition |
| aggregate_data | Group by and aggregate |
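
As a rough illustration, a tool such as execute_sql_query could be exposed to the agent with LangChain's @tool decorator. This is a minimal sketch, not the actual Studio implementation; WorkerClient stands in for the real gRPC client:

from langchain_core.tools import tool

# Hypothetical stand-in for Studio's gRPC client to the Worker.
class WorkerClient:
    def execute_sql(self, source_id: int, query: str) -> str:
        raise NotImplementedError("forwarded to a connected Worker over gRPC")

worker_client = WorkerClient()

@tool
def execute_sql_query(source_id: int, query: str) -> str:
    """Run a SQL query on a database source through the connected Worker."""
    return worker_client.execute_sql(source_id=source_id, query=query)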

How It Works

┌─────────────┐    gRPC     ┌─────────────┐    Local    ┌─────────────┐
│   Studio    │ ──────────→ │   Worker    │ ──────────→ │   Source    │
│   (Agent)   │ ←────────── │             │ ←────────── │  (DB/File)  │
└─────────────┘  Response   └─────────────┘    Data     └─────────────┘
  1. Agent decides to use a data tool based on user request
  2. Tool sends gRPC request to connected Worker
  3. Worker executes the operation locally
  4. Results are formatted and returned to the agent
  5. Agent incorporates results into its response

Example Usage

User asks: "Show me the columns with null values"

Agent uses: describe_source tool → gets schema, then execute_sql_query → runs SQL to check nulls

Response includes: Actual data from your source

Tool Responses

Tool results are formatted for LLM consumption:

Execution time: 45ms
Columns: id, name, email, created_at

Rows (20 shown, 1000 total):
id | name | email | created_at
------------------------------------
1 | John Doe | john@ex.com | 2024-01-15
2 | Jane Smith | null | 2024-01-16
...

Data Preview

Preview source data directly through Studio.

Preview Endpoint

GET /api/v1/studio/sources/{source_id}/preview

Query Parameters:

  • limit: Maximum rows (default 1000, max 10000)
  • source_version_id: Specific version
  • query: Custom SQL query (for databases)
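
Example request (previewing the first 100 rows of source 456):

curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://your-platform/api/v1/studio/sources/456/preview?limit=100"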

Response:

{
  "ok": true,
  "data_type": "table",
  "headers": ["id", "name", "email"],
  "rows": [
    {"values": ["1", "John", "john@example.com"]},
    {"values": ["2", "Jane", "jane@example.com"]}
  ],
  "total_rows": 10000
}

Data Types:

  • table: Structured data with headers/rows
  • text: Plain text content
  • json: JSON content
  • image: Base64 encoded image
  • pdf: Base64 encoded PDF
  • error: Error response
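
When consuming the preview API, clients typically branch on data_type. Here is a minimal sketch, assuming binary payloads (image, pdf) arrive base64-encoded in a content field (an assumption; verify the exact field names against actual responses):

import base64

def handle_preview(preview: dict) -> None:
    """Dispatch on the preview data_type (illustrative sketch only)."""
    data_type = preview.get("data_type")
    if data_type == "table":
        print(preview["headers"])
        for row in preview["rows"]:
            print(row["values"])
    elif data_type in ("image", "pdf"):
        # Assumption: base64 payload in a "content" field.
        with open(f"preview.{data_type}", "wb") as f:
            f.write(base64.b64decode(preview["content"]))
    elif data_type == "error":
        raise RuntimeError(preview.get("error", "Preview failed"))
    else:  # "text" or "json"
        print(preview.get("content"))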

API Endpoints Reference

All endpoints require authentication via Bearer token.

Chat

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/studio/chat | Send message to agent |
| POST | /api/v1/studio/chat/stream | Streaming SSE response |

Status & Capabilities

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/studio/status | Worker connection status |
| GET | /api/v1/studio/agent/capabilities | Agent and LLM status |
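
Example request:

curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://your-platform/api/v1/studio/status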

Data Preview

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/studio/sources/{id}/preview | Preview source data |
| POST | /api/v1/studio/sources/{id}/preview | Preview with body params |

Conversations

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/studio/conversations/context | Get contextual conversations |

Advanced Use Cases

1. Issue Analysis with Complete Context

import requests

headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers=headers,
    json={
        "message": "Analyze this problem and suggest solutions",
        "context": {
            "issue_id": 123,
            "source_id": 456,
            "enriched": {
                "issue": {
                    "id": 123,
                    "title": "Duplicate customer records",
                    "status": "open"
                },
                "source": {
                    "id": 456,
                    "name": "customers",
                    "type": "postgresql"
                }
            }
        },
        "conversation_id": "analysis_issue123"
    }
)

print(response.json()["response"])

2. Streaming Response Handling

import requests
import json

def stream_chat(message, context=None):
    headers = {
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json"
    }

    response = requests.post(
        'https://your-platform/api/v1/studio/chat/stream',
        headers=headers,
        json={"message": message, "context": context},
        stream=True
    )

    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data = line[6:]
                if data == '[DONE]':
                    break
                chunk = json.loads(data)
                if 'content' in chunk:
                    print(chunk['content'], end='', flush=True)
    print()

stream_chat("Describe the schema of this source", {"source_id": 456})

3. Multi-Step Investigation with History

conversation_id = "investigation_202410"
messages_history = []

def chat_with_history(message, context):
    global messages_history

    response = requests.post(
        'https://your-platform/api/v1/studio/chat',
        headers={"Authorization": "Bearer YOUR_TOKEN"},
        json={
            "message": message,
            "context": context,
            "conversation_id": conversation_id,
            "messages_history": messages_history
        }
    )

    result = response.json()

    # Update history
    messages_history.append({"role": "user", "content": message})
    messages_history.append({"role": "assistant", "content": result["response"]})

    return result["response"]

# Multi-step conversation
context = {"issue_id": 123, "source_id": 456}

print(chat_with_history("Summarize this issue", context))
print(chat_with_history("What are the potential causes?", context))
print(chat_with_history("Propose a step-by-step action plan", context))

4. Data Exploration via Agent

# Ask the agent to explore data - it will use data tools automatically
response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "message": "Show me the top 10 customers with the most null email addresses and suggest how to fix them",
        "context": {"source_id": 456}
    }
)

# The agent will:
# 1. Use describe_source to understand the schema
# 2. Use execute_sql_query to find customers with null emails
# 3. Provide analysis and recommendations
print(response.json()["response"])

Error Handling

Common Errors

400 - Agent not available:

{
  "detail": "Agent module not available. Please install langchain and langgraph."
}

503 - LLM not configured:

{
  "detail": "No LLM configuration found for partner 1. Please configure an LLM provider in Settings > AI."
}

503 - No worker available:

{
  "detail": "No worker available. Please ensure a worker is connected."
}

504 - Worker timeout:

{
  "detail": "Worker did not respond in time"
}

Chat Response Errors

Errors during chat are returned in the response body:

{
  "ok": false,
  "response": null,
  "conversation_id": "conv_xyz",
  "error": "Agent error: Connection to Ollama refused"
}
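
Clients should therefore check both the HTTP status code and the ok flag. A minimal sketch:

import requests

response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"message": "Hello"}
)
response.raise_for_status()  # surfaces HTTP-level errors (400, 503, 504, ...)

result = response.json()
if not result["ok"]:
    raise RuntimeError(result.get("error", "Unknown agent error"))
print(result["response"])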

Streaming Errors

In SSE mode, errors are sent as data events:

data: {"error": "LLM timeout after 60 seconds"}
data: [DONE]
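
The streaming handler shown earlier can be extended to surface these events; a minimal sketch of the chunk-handling branch:

chunk = json.loads(data)
if 'error' in chunk:
    raise RuntimeError(chunk['error'])
if 'content' in chunk:
    print(chunk['content'], end='', flush=True)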

Performance

Optimizations

  • SSE Streaming: Reduces perceived latency for long responses
  • Automatic Fallbacks: Graceful degradation when streaming unavailable
  • Worker Connection Pool: Reuse of gRPC connections
  • Data Tool Caching: Results cached per request

Timeouts

| Operation | Timeout |
| --- | --- |
| Chat (non-streaming) | 60s |
| Chat (streaming) | No timeout (SSE) |
| Data preview | 60s |
| Data tools | 60s per operation |
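
On the client side, it is worth aligning HTTP timeouts with these values, for example with requests (the exact margin is a judgment call):

# Allow slightly more than the server-side 60s budget.
response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"message": "Hello"},
    timeout=(10, 90)  # (connect, read) timeouts in seconds
)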

Limits

  • Data preview rows: Max 10,000 rows
  • Tool result rows: Max 20 rows displayed in LLM context
  • Context size: Depends on LLM provider

Next Steps