# Features

QALITA Studio offers a powerful conversational interface for working with your data quality. Discover all its capabilities.
## Conversational Chat

### Streaming Mode
Streaming mode displays model responses in real-time using Server-Sent Events (SSE).
**Advantages:**
- ✅ Immediate feedback
- ✅ Better user experience
- ✅ Real-time token delivery
- ✅ Optimized for long responses
**API Endpoint:**

```bash
curl -X POST https://your-platform/api/v1/studio/chat/stream \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain data quality",
    "conversation_id": "my-conv-123"
  }'
```
**SSE Response Format:**

```
data: {"content": "Data quality refers to"}
data: {"content": " the accuracy, completeness"}
data: {"content": ", and consistency of data."}
data: [DONE]
```
**Fallback Behavior:**
- If streaming fails, Studio falls back to direct LLM streaming
- If that also fails, the response is returned in chunks to preserve the visual streaming effect
### Non-Streaming Mode

Traditional request/response mode: the complete answer is returned in a single payload.
**API Endpoint:**

```bash
curl -X POST https://your-platform/api/v1/studio/chat \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello",
    "conversation_id": "my-conv-123"
  }'
```
**Response:**

```json
{
  "ok": true,
  "response": "Hello! How can I help you with your data quality needs?",
  "conversation_id": "my-conv-123",
  "analysis_completed": false,
  "analysis_result": null,
  "scripts_generated": null
}
```
### Conversation Management

Each conversation is identified by a unique `conversation_id`.
**Format:**
- Alphanumeric with hyphens and underscores
- Example: `conv_20241020_143022_a1b2c3d4`
**Auto-generation:**

If no `conversation_id` is provided, one is generated automatically from the current date/time plus a random suffix, as sketched below.
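For illustration, a minimal sketch of that generation scheme, assuming a timestamp plus an 8-character hex suffix (the exact server-side logic may differ):

```python
import secrets
from datetime import datetime

def new_conversation_id() -> str:
    """Build an ID in the conv_<date>_<time>_<suffix> shape shown above."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    suffix = secrets.token_hex(4)  # 8 hex characters, e.g. "a1b2c3d4"
    return f"conv_{stamp}_{suffix}"

print(new_conversation_id())  # e.g. conv_20241020_143022_a1b2c3d4
```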
## Contextual Enrichment
Studio automatically enriches your prompts with context from the QALITA Platform.
### Enriched Context Structure
The frontend sends enriched context data that the agent uses to personalize responses:
**Request with context:**

```json
{
  "message": "Explain this problem",
  "context": {
    "issue_id": 123,
    "source_id": 456,
    "project_id": 1,
    "enriched": {
      "issue": {
        "id": 123,
        "title": "High null rate in customer_email",
        "status": "open",
        "description": "45% null values detected"
      },
      "source": {
        "id": 456,
        "name": "customers",
        "type": "postgresql"
      },
      "schema": [
        {"key": "column_name", "value": "customer_email"},
        {"key": "data_type", "value": "varchar"}
      ],
      "recommendations": [
        {"content": "Add NOT NULL constraint", "severity": "high"}
      ],
      "metrics": [
        {"key": "score", "value": 55.0}
      ],
      "dataSample": {
        "headers": ["id", "customer_email", "name"],
        "rows": [["1", "john@example.com", "John"], ["2", null, "Jane"]],
        "totalRows": 10000
      }
    }
  }
}
```
### Context Components
| Component | Description |
|---|---|
| Issue | Title, status, description, scope, due date |
| Source | Name, type, description |
| Schema | Column names and data types |
| Recommendations | Quality improvement suggestions |
| Metrics | Quality scores and counts |
| Data Sample | Preview of actual data |
### System Prompt Construction
The agent constructs a rich system prompt including:
```
You are an expert data quality assistant...

# Current context

## Current ticket
- **Title**: High null rate in customer_email
- **Status**: open
- **Description**: 45% null values detected

## Data source
- **Name**: customers
- **Type**: postgresql

## Source schema (15 columns)
**Columns**: id, customer_email, name, ...

## Quality recommendations (3 suggestions)
1. 🔴 Add NOT NULL constraint
...

## Quality metrics
- **Average score**: 55.0%

## Data preview (10,000 rows total)
**Columns**: id, customer_email, name
**Sample** (first 3 rows):
1. 1 | john@example.com | John
2. 2 |  | Jane
...
```
This enrichment provides the LLM with full awareness of the user's working context.
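As a rough sketch, the assembly could look like the following; `build_system_prompt` and its exact layout are illustrative, not the agent's actual template:

```python
def build_system_prompt(enriched: dict) -> str:
    """Fold the enriched payload into a context block like the one above."""
    parts = ["You are an expert data quality assistant...", "", "# Current context"]
    if issue := enriched.get("issue"):
        parts += ["", "## Current ticket",
                  f"- **Title**: {issue['title']}",
                  f"- **Status**: {issue['status']}"]
    if source := enriched.get("source"):
        parts += ["", "## Data source",
                  f"- **Name**: {source['name']}",
                  f"- **Type**: {source['type']}"]
    if schema := enriched.get("schema"):
        # Schema entries arrive as {"key": ..., "value": ...} pairs
        cols = [e["value"] for e in schema if e["key"] == "column_name"]
        parts += ["", f"## Source schema ({len(cols)} columns)",
                  "**Columns**: " + ", ".join(cols)]
    return "\n".join(parts)
```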
## Conversation Management
Conversations in Studio are stored in the Platform database and linked to issues, sources, and projects.
### Conversation Model
Each conversation record includes:
| Field | Description |
|---|---|
| `id` | Database ID |
| `conv_id` | Unique conversation identifier |
| `filename` | Original filename |
| `partner_id` | Organization ID |
| `issue_id` | Linked issue (optional) |
| `source_id` | Linked source (optional) |
| `project_id` | Linked project (optional) |
| `line_count` | Number of messages |
| `created_at` | Creation timestamp |
| `updated_at` | Last update timestamp |
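For readers who prefer code, the same record as a Python dataclass; this is a mirror of the table above, not the Platform's actual ORM model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Conversation:
    id: int                      # database ID
    conv_id: str                 # unique conversation identifier
    filename: str                # original filename, e.g. a .jsonl file
    partner_id: int              # organization ID
    issue_id: Optional[int]      # linked issue
    source_id: Optional[int]     # linked source
    project_id: Optional[int]    # linked project
    line_count: int              # number of messages
    created_at: datetime         # creation timestamp
    updated_at: datetime         # last update timestamp
```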
### Contextual Conversations

Retrieve conversations based on the current context:

**Endpoint:** `GET /api/v1/studio/conversations/context`

**Query Parameters:**
- `source_id`: Get conversations from issues linked to this source
- `issue_id`: Get conversations from this specific issue
- `project_id`: Get conversations from all project sources and issues
**Response:**

```json
{
  "source_conversations": [
    {
      "id": 1,
      "conv_id": "conv_20241020_143022",
      "filename": "conv_20241020_143022.jsonl",
      "project_id": 1,
      "project_name": "E-Commerce",
      "issue_id": 123,
      "issue_title": "High null rate",
      "source_id": 456,
      "source_name": "customers",
      "line_count": 12,
      "created_at": "2024-10-20T14:30:22Z",
      "updated_at": "2024-10-20T14:35:00Z"
    }
  ],
  "issue_conversations": [...],
  "project_conversations": [...],
  "total_count": 5
}
```
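A minimal client call, assuming the `requests` library and placeholder token/URL:

```python
import requests

resp = requests.get(
    "https://your-platform/api/v1/studio/conversations/context",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    params={"issue_id": 123, "source_id": 456},
)
data = resp.json()
print(f"{data['total_count']} related conversations")
for conv in data["source_conversations"]:
    print(conv["conv_id"], "-", conv["issue_title"])
```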
### Conversation Grouping

Conversations are grouped by context:
- **Issue Conversations**: Directly linked to the selected issue
- **Source Conversations**: From other issues linked to the same source
- **Project Conversations**: From all sources and issues in the project
This allows users to see related conversations when working on similar data quality topics.
## Data Tools
Studio's agent can interact directly with your data sources through the Worker infrastructure using LangChain tools.
### Available Tools

| Tool | Description |
|---|---|
| `execute_sql_query` | Run SQL queries on database sources |
| `read_source_data` | Read data from files or tables |
| `describe_source` | Get metadata, schema, and row count |
| `sample_source_data` | Get random samples |
| `count_rows` | Count rows, with an optional condition |
| `filter_data` | Filter data by condition |
| `aggregate_data` | Group by and aggregate |
### How It Works

```
┌─────────────┐   gRPC    ┌─────────────┐   Local   ┌─────────────┐
│   Studio    │ ────────→ │   Worker    │ ────────→ │   Source    │
│   (Agent)   │ ←──────── │             │ ←──────── │  (DB/File)  │
└─────────────┘  Response └─────────────┘   Data    └─────────────┘
```
1. The agent decides to use a data tool based on the user's request
2. The tool sends a gRPC request to a connected Worker
3. The Worker executes the operation locally
4. The results are formatted and returned to the agent
5. The agent incorporates the results into its response
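As a sketch, a tool of this kind could be declared with LangChain's `@tool` decorator as below; `_call_worker` stands in for the gRPC round-trip and is not QALITA's actual implementation:

```python
from langchain_core.tools import tool

def _call_worker(operation: str, params: dict) -> str:
    """Placeholder for the gRPC call handled by the Studio agent runtime."""
    raise NotImplementedError("wired up by the agent at runtime")

@tool
def execute_sql_query(source_id: int, query: str) -> str:
    """Run a SQL query on a database source and return formatted rows."""
    # The Worker executes the query locally, next to the source
    return _call_worker("execute_sql_query",
                        {"source_id": source_id, "query": query})
```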
### Example Usage

**User asks:** "Show me the columns with null values"

**Agent uses:** `describe_source` to get the schema, then `execute_sql_query` to check for nulls

**Response includes:** actual data from your source
### Tool Responses

Tool results are formatted for LLM consumption:

```
Execution time: 45ms
Columns: id, name, email, created_at
Rows (20 shown, 1000 total):

id | name       | email       | created_at
-------------------------------------------
1  | John Doe   | john@ex.com | 2024-01-15
2  | Jane Smith | null        | 2024-01-16
...
```
## Data Preview

Preview source data directly through Studio.

### Preview Endpoint

`GET /api/v1/studio/sources/{source_id}/preview`

**Query Parameters:**
- `limit`: Maximum rows (default 1000, max 10000)
- `source_version_id`: Specific version
- `query`: Custom SQL query (for databases)
**Response:**

```json
{
  "ok": true,
  "data_type": "table",
  "headers": ["id", "name", "email"],
  "rows": [
    {"values": ["1", "John", "john@example.com"]},
    {"values": ["2", "Jane", "jane@example.com"]}
  ],
  "total_rows": 10000
}
```
**Data Types:**
- `table`: Structured data with headers/rows
- `text`: Plain text content
- `json`: JSON content
- `image`: Base64-encoded image
- `pdf`: Base64-encoded PDF
- `error`: Error response
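A small client sketch that requests a preview and branches on `data_type` (token and URL are placeholders):

```python
import requests

resp = requests.get(
    "https://your-platform/api/v1/studio/sources/456/preview",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    params={"limit": 100},
)
preview = resp.json()

if preview["data_type"] == "table":
    print(" | ".join(preview["headers"]))
    for row in preview["rows"]:
        print(" | ".join(str(v) for v in row["values"]))
    print(f"({preview['total_rows']} rows total)")
elif preview["data_type"] == "error":
    print("Preview failed:", preview)
```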
## API Endpoints Reference
All endpoints require authentication via Bearer token.
### Chat
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/studio/chat | Send message to agent |
| POST | /api/v1/studio/chat/stream | Streaming SSE response |
### Status & Capabilities
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/studio/status | Worker connection status |
| GET | /api/v1/studio/agent/capabilities | Agent and LLM status |
### Data Preview
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/studio/sources/{id}/preview | Preview source data |
| POST | /api/v1/studio/sources/{id}/preview | Preview with body params |
### Conversations
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/studio/conversations/context | Get contextual conversations |
## Advanced Use Cases

### 1. Issue Analysis with Complete Context
```python
import requests

headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers=headers,
    json={
        "message": "Analyze this problem and suggest solutions",
        "context": {
            "issue_id": 123,
            "source_id": 456,
            "enriched": {
                "issue": {
                    "id": 123,
                    "title": "Duplicate customer records",
                    "status": "open"
                },
                "source": {
                    "id": 456,
                    "name": "customers",
                    "type": "postgresql"
                }
            }
        },
        "conversation_id": "analysis_issue123"
    }
)

print(response.json()["response"])
```
### 2. Streaming Response Handling
```python
import requests
import json

def stream_chat(message, context=None):
    headers = {
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json"
    }
    response = requests.post(
        'https://your-platform/api/v1/studio/chat/stream',
        headers=headers,
        json={"message": message, "context": context},
        stream=True
    )
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data = line[6:]
                if data == '[DONE]':
                    break
                chunk = json.loads(data)
                if 'content' in chunk:
                    print(chunk['content'], end='', flush=True)
    print()

stream_chat("Describe the schema of this source", {"source_id": 456})
```
### 3. Multi-Step Investigation with History
```python
import requests

conversation_id = "investigation_202410"
messages_history = []

def chat_with_history(message, context):
    response = requests.post(
        'https://your-platform/api/v1/studio/chat',
        headers={"Authorization": "Bearer YOUR_TOKEN"},
        json={
            "message": message,
            "context": context,
            "conversation_id": conversation_id,
            "messages_history": messages_history
        }
    )
    result = response.json()
    # Update history
    messages_history.append({"role": "user", "content": message})
    messages_history.append({"role": "assistant", "content": result["response"]})
    return result["response"]

# Multi-step conversation
context = {"issue_id": 123, "source_id": 456}
print(chat_with_history("Summarize this issue", context))
print(chat_with_history("What are the potential causes?", context))
print(chat_with_history("Propose a step-by-step action plan", context))
```
### 4. Data Exploration via Agent
```python
import requests

# Ask the agent to explore data - it will use data tools automatically
response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "message": "Show me the top 10 customers with the most null email addresses and suggest how to fix them",
        "context": {"source_id": 456}
    }
)

# The agent will:
# 1. Use describe_source to understand the schema
# 2. Use execute_sql_query to find customers with null emails
# 3. Provide analysis and recommendations
print(response.json()["response"])
```
## Error Handling

### Common Errors
**400 - Agent not available:**

```json
{
  "detail": "Agent module not available. Please install langchain and langgraph."
}
```

**503 - LLM not configured:**

```json
{
  "detail": "No LLM configuration found for partner 1. Please configure an LLM provider in Settings > AI."
}
```

**503 - No worker available:**

```json
{
  "detail": "No worker available. Please ensure a worker is connected."
}
```

**504 - Worker timeout:**

```json
{
  "detail": "Worker did not respond in time"
}
```
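One way a client could react to these statuses; the retry policy below is a suggestion, not part of the API contract:

```python
import requests

resp = requests.post(
    "https://your-platform/api/v1/studio/chat",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"message": "Hello"},
)
if resp.status_code in (503, 504):
    # Transient: no worker / LLM unavailable, or worker timeout - retry later
    print("Temporarily unavailable:", resp.json()["detail"])
elif resp.status_code == 400:
    # Missing agent dependencies: needs operator action, do not retry
    print("Configuration problem:", resp.json()["detail"])
```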
### Chat Response Errors
Errors during chat are returned in the response body:
```json
{
  "ok": false,
  "response": null,
  "conversation_id": "conv_xyz",
  "error": "Agent error: Connection to Ollama refused"
}
```
### Streaming Errors
In SSE mode, errors are sent as data events:
data: {"error": "LLM timeout after 60 seconds"}
data: [DONE]
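To handle these in a client, the SSE loop from example 2 above could treat error payloads as a third case alongside content chunks and `[DONE]` (a sketch):

```python
import json

def handle_sse_data(data: str) -> bool:
    """Process one SSE data payload; return False once the stream is done."""
    if data == "[DONE]":
        return False
    chunk = json.loads(data)
    if "error" in chunk:
        print(f"\n[stream error] {chunk['error']}")
    elif "content" in chunk:
        print(chunk["content"], end="", flush=True)
    return True
```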
## Performance

### Optimizations
- SSE Streaming: Reduces perceived latency for long responses
- Automatic Fallbacks: Graceful degradation when streaming unavailable
- Worker Connection Pool: Reuse of gRPC connections
- Data Tool Caching: Results cached per request
### Timeouts
| Operation | Timeout |
|---|---|
| Chat (non-streaming) | 60s |
| Chat (streaming) | No timeout (SSE) |
| Data preview | 60s |
| Data tools | 60s per operation |
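Client-side, these server limits can be mirrored with `requests` timeouts; the values below are suggestions set slightly above the server's own:

```python
import requests

# Non-streaming chat: bound the whole call just above the 60s server timeout
requests.post("https://your-platform/api/v1/studio/chat",
              headers={"Authorization": "Bearer YOUR_TOKEN"},
              json={"message": "Hello"}, timeout=65)

# Streaming chat: bound only connect and per-read delays,
# since the stream itself has no overall timeout
requests.post("https://your-platform/api/v1/studio/chat/stream",
              headers={"Authorization": "Bearer YOUR_TOKEN"},
              json={"message": "Hello"}, stream=True, timeout=(5, 120))
```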
### Limits
- Data preview rows: Max 10,000 rows
- Tool result rows: Max 20 rows displayed in LLM context
- Context size: Depends on LLM provider
## Next Steps
- 💬 Conversation Management - Organize your conversations
- 🔧 Platform Integration - Deep dive into Platform integration