Features

QALITA Studio offers a powerful conversational interface for working with your data quality. This page walks through all of its capabilities.

Conversational Chat

Streaming Mode

Streaming mode displays model responses in real-time using Server-Sent Events (SSE).

Advantages:

  • ✅ Immediate feedback
  • ✅ Better user experience
  • ✅ Real-time token delivery
  • ✅ Optimized for long responses

API Endpoint:

curl -X POST https://your-platform/api/v1/studio/chat/stream \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain data quality",
    "conversation_id": "my-conv-123"
  }'

SSE Response Format:

data: {"content": "Data quality refers to"}
data: {"content": " the accuracy, completeness"}
data: {"content": ", and consistency of data."}
data: [DONE]

Fallback Behavior:

  • If agent streaming fails, Studio falls back to direct LLM streaming
  • If that also fails, Studio returns the complete response in chunks to preserve the visual streaming effect

Non-Streaming Mode

Traditional request/response mode: the complete answer is returned in a single payload.

API Endpoint:

curl -X POST https://your-platform/api/v1/studio/chat \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello",
    "conversation_id": "my-conv-123"
  }'

Response:

{
  "ok": true,
  "response": "Hello! How can I help you with your data quality needs?",
  "conversation_id": "my-conv-123",
  "analysis_completed": false,
  "analysis_result": null,
  "scripts_generated": null
}

Conversation Management

Each conversation is identified by a unique conversation_id.

Format:

  • Alphanumeric with hyphens and underscores
  • Example: conv_20241020_143022_a1b2c3d4

Auto-generation:

If no conversation_id is provided, one is automatically generated based on date/time and a random suffix.
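
As an illustration, an identifier of this shape can be generated like so (a minimal sketch; the actual server-side generation logic may differ):

import secrets
from datetime import datetime

def generate_conversation_id() -> str:
    """Build an ID like conv_20241020_143022_a1b2c3d4 (illustrative only)."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    suffix = secrets.token_hex(4)  # 8 hex characters, e.g. "a1b2c3d4"
    return f"conv_{timestamp}_{suffix}"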

Contextual Enrichment

Studio automatically enriches your prompts with context from the QALITA Platform.

Enriched Context Structure

The frontend sends enriched context data that the agent uses to personalize responses:

Request with context:

{
  "message": "Explain this problem",
  "context": {
    "issue_id": 123,
    "source_id": 456,
    "project_id": 1,
    "enriched": {
      "issue": {
        "id": 123,
        "title": "High null rate in customer_email",
        "status": "open",
        "description": "45% null values detected"
      },
      "source": {
        "id": 456,
        "name": "customers",
        "type": "postgresql"
      },
      "schema": [
        {"key": "column_name", "value": "customer_email"},
        {"key": "data_type", "value": "varchar"}
      ],
      "recommendations": [
        {"content": "Add NOT NULL constraint", "severity": "high"}
      ],
      "metrics": [
        {"key": "score", "value": 55.0}
      ],
      "dataSample": {
        "headers": ["id", "customer_email", "name"],
        "rows": [["1", "john@example.com", "John"], ["2", null, "Jane"]],
        "totalRows": 10000
      }
    }
  }
}

Context Components

| Component | Description |
| --- | --- |
| Issue | Title, status, description, scope, due date |
| Source | Name, type, description |
| Schema | Column names and data types |
| Recommendations | Quality improvement suggestions |
| Metrics | Quality scores and counts |
| Data Sample | Preview of actual data |

System Prompt Construction

The agent constructs a rich system prompt including:

You are an expert assistant in data quality...

# Current context

## Current issue
- **Title**: High null rate in customer_email
- **Status**: open
- **Description**: 45% null values detected

## Data source
- **Name**: customers
- **Type**: postgresql

## Source schema (15 columns)
**Columns**: id, customer_email, name, ...

## Quality recommendations (3 suggestions)
1. 🔴 Add NOT NULL constraint
...

## Quality metrics
- **Average score**: 55.0%

## Data preview (10000 rows total)
**Columns**: id, customer_email, name
**Sample** (first 3 rows):
1. 1 | john@example.com | John
2. 2 | | Jane
...

This enrichment provides the LLM with full awareness of the user's working context.
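
As a rough sketch, assembling such a prompt from the enriched context could look like the following (illustrative only; the actual construction logic lives in the Studio agent):

def build_system_prompt(enriched: dict) -> str:
    """Assemble a context-aware system prompt (illustrative sketch)."""
    parts = ["You are an expert assistant in data quality...",
             "",
             "# Current context"]
    if issue := enriched.get("issue"):
        parts += ["", "## Current issue",
                  f"- **Title**: {issue['title']}",
                  f"- **Status**: {issue['status']}"]
    if source := enriched.get("source"):
        parts += ["", "## Data source",
                  f"- **Name**: {source['name']}",
                  f"- **Type**: {source['type']}"]
    return "\n".join(parts)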

Conversation Storage

Conversations in Studio are stored in the Platform database and linked to issues, sources, and projects.

Conversation Model

Each conversation record includes:

| Field | Description |
| --- | --- |
| id | Database ID |
| conv_id | Unique conversation identifier |
| filename | Original filename |
| partner_id | Organization ID |
| issue_id | Linked issue (optional) |
| source_id | Linked source (optional) |
| project_id | Linked project (optional) |
| line_count | Number of messages |
| created_at | Creation timestamp |
| updated_at | Last update timestamp |

Contextual Conversations

Retrieve conversations based on current context:

Endpoint: GET /api/v1/studio/conversations/context

Query Parameters:

  • source_id: Get conversations from issues linked to this source
  • issue_id: Get conversations from this specific issue
  • project_id: Get conversations from all project sources and issues
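
Example request (filtering by issue):

curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://your-platform/api/v1/studio/conversations/context?issue_id=123"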

Response:

{
  "source_conversations": [
    {
      "id": 1,
      "conv_id": "conv_20241020_143022",
      "filename": "conv_20241020_143022.jsonl",
      "project_id": 1,
      "project_name": "E-Commerce",
      "issue_id": 123,
      "issue_title": "High null rate",
      "source_id": 456,
      "source_name": "customers",
      "line_count": 12,
      "created_at": "2024-10-20T14:30:22Z",
      "updated_at": "2024-10-20T14:35:00Z"
    }
  ],
  "issue_conversations": [...],
  "project_conversations": [...],
  "total_count": 5
}

Conversation Grouping

Conversations are grouped by context:

  1. Issue Conversations: Directly linked to the selected issue
  2. Source Conversations: From other issues linked to the same source
  3. Project Conversations: From all sources and issues in the project

This allows users to see related conversations when working on similar data quality topics.

Data Tools

Studio's agent can interact directly with your data sources through the Worker infrastructure using LangChain tools.

Available Tools

| Tool | Description |
| --- | --- |
| execute_sql_query | Run SQL queries on database sources |
| read_source_data | Read data from files or tables |
| describe_source | Get metadata, schema, row count |
| sample_source_data | Get random samples |
| count_rows | Count rows with optional condition |
| filter_data | Filter data by condition |
| aggregate_data | Group by and aggregate |
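
As a rough illustration, a tool such as execute_sql_query could be exposed to the agent with LangChain's @tool decorator. This is a minimal sketch, not the actual Studio implementation; WorkerClient stands in for the real gRPC client:

from langchain_core.tools import tool

# Hypothetical stand-in for Studio's gRPC client to the Worker.
class WorkerClient:
    def execute_sql(self, source_id: int, query: str) -> str:
        raise NotImplementedError("forwarded to a connected Worker over gRPC")

worker_client = WorkerClient()

@tool
def execute_sql_query(source_id: int, query: str) -> str:
    """Run a SQL query on a database source through the connected Worker."""
    return worker_client.execute_sql(source_id=source_id, query=query)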

How It Works

┌─────────────┐    gRPC     ┌─────────────┐    Local    ┌─────────────┐
│   Studio    │ ──────────→ │   Worker    │ ──────────→ │   Source    │
│   (Agent)   │ ←────────── │             │ ←────────── │  (DB/File)  │
└─────────────┘  Response   └─────────────┘    Data     └─────────────┘
  1. Agent decides to use a data tool based on user request
  2. Tool sends gRPC request to connected Worker
  3. Worker executes the operation locally
  4. Results are formatted and returned to the agent
  5. Agent incorporates results into its response

Example Usage

User asks: "Show me the columns with null values"

Agent uses: describe_source tool → gets schema, then execute_sql_query → runs SQL to check nulls

Response includes: Actual data from your source

Tool Responses

Tool results are formatted for LLM consumption:

Execution time: 45ms
Columns: id, name, email, created_at

Rows (20 shown, 1000 total):
id | name | email | created_at
------------------------------------
1 | John Doe | john@ex.com | 2024-01-15
2 | Jane Smith | null | 2024-01-16
...

Data Preview

Preview source data directly through Studio.

Preview Endpoint

GET /api/v1/studio/sources/{source_id}/preview

Query Parameters:

  • limit: Maximum rows (default 1000, max 10000)
  • source_version_id: Specific version
  • query: Custom SQL query (for databases)
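
Example request (previewing the first 100 rows of source 456):

curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://your-platform/api/v1/studio/sources/456/preview?limit=100"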

Response:

{
  "ok": true,
  "data_type": "table",
  "headers": ["id", "name", "email"],
  "rows": [
    {"values": ["1", "John", "john@example.com"]},
    {"values": ["2", "Jane", "jane@example.com"]}
  ],
  "total_rows": 10000
}

Data Types:

  • table: Structured data with headers/rows
  • text: Plain text content
  • json: JSON content
  • image: Base64 encoded image
  • pdf: Base64 encoded PDF
  • error: Error response
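
When consuming the preview API, clients typically branch on data_type. Here is a minimal sketch, assuming binary payloads (image, pdf) arrive base64-encoded in a content field (an assumption; verify the exact field names against actual responses):

import base64

def handle_preview(preview: dict) -> None:
    """Dispatch on the preview data_type (illustrative sketch only)."""
    data_type = preview.get("data_type")
    if data_type == "table":
        print(preview["headers"])
        for row in preview["rows"]:
            print(row["values"])
    elif data_type in ("image", "pdf"):
        # Assumption: base64 payload in a "content" field.
        with open(f"preview.{data_type}", "wb") as f:
            f.write(base64.b64decode(preview["content"]))
    elif data_type == "error":
        raise RuntimeError(preview.get("error", "Preview failed"))
    else:  # "text" or "json"
        print(preview.get("content"))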

API Endpoints Reference

All endpoints require authentication via Bearer token.

Chat

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/studio/chat | Send message to agent |
| POST | /api/v1/studio/chat/stream | Streaming SSE response |

Status & Capabilities

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/studio/status | Worker connection status |
| GET | /api/v1/studio/agent/capabilities | Agent and LLM status |
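
Example request:

curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://your-platform/api/v1/studio/status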

Data Preview

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/studio/sources/{id}/preview | Preview source data |
| POST | /api/v1/studio/sources/{id}/preview | Preview with body params |

Conversations

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/studio/conversations/context | Get contextual conversations |

Advanced Use Cases

1. Issue Analysis with Complete Context

import requests

headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers=headers,
    json={
        "message": "Analyze this problem and suggest solutions",
        "context": {
            "issue_id": 123,
            "source_id": 456,
            "enriched": {
                "issue": {
                    "id": 123,
                    "title": "Duplicate customer records",
                    "status": "open"
                },
                "source": {
                    "id": 456,
                    "name": "customers",
                    "type": "postgresql"
                }
            }
        },
        "conversation_id": "analysis_issue123"
    }
)

print(response.json()["response"])

2. Streaming Response Handling

import requests
import json

def stream_chat(message, context=None):
    headers = {
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json"
    }

    response = requests.post(
        'https://your-platform/api/v1/studio/chat/stream',
        headers=headers,
        json={"message": message, "context": context},
        stream=True
    )

    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data = line[6:]
                if data == '[DONE]':
                    break
                chunk = json.loads(data)
                if 'content' in chunk:
                    print(chunk['content'], end='', flush=True)
    print()

stream_chat("Describe the schema of this source", {"source_id": 456})

3. Multi-Step Investigation with History

conversation_id = "investigation_202410"
messages_history = []

def chat_with_history(message, context):
    global messages_history

    response = requests.post(
        'https://your-platform/api/v1/studio/chat',
        headers={"Authorization": "Bearer YOUR_TOKEN"},
        json={
            "message": message,
            "context": context,
            "conversation_id": conversation_id,
            "messages_history": messages_history
        }
    )

    result = response.json()

    # Update history
    messages_history.append({"role": "user", "content": message})
    messages_history.append({"role": "assistant", "content": result["response"]})

    return result["response"]

# Multi-step conversation
context = {"issue_id": 123, "source_id": 456}

print(chat_with_history("Summarize this issue", context))
print(chat_with_history("What are the potential causes?", context))
print(chat_with_history("Propose a step-by-step action plan", context))

4. Data Exploration via Agent

# Ask the agent to explore data - it will use data tools automatically
response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "message": "Show me the top 10 customers with the most null email addresses and suggest how to fix them",
        "context": {"source_id": 456}
    }
)

# The agent will:
# 1. Use describe_source to understand the schema
# 2. Use execute_sql_query to find customers with null emails
# 3. Provide analysis and recommendations
print(response.json()["response"])

Error Handling

Common Errors

400 - Agent not available:

{
  "detail": "Agent module not available. Please install langchain and langgraph."
}

503 - LLM not configured:

{
  "detail": "No LLM configuration found for partner 1. Please configure an LLM provider in Settings > AI."
}

503 - No worker available:

{
  "detail": "No worker available. Please ensure a worker is connected."
}

504 - Worker timeout:

{
  "detail": "Worker did not respond in time"
}

Chat Response Errors

Errors during chat are returned in the response body:

{
  "ok": false,
  "response": null,
  "conversation_id": "conv_xyz",
  "error": "Agent error: Connection to Ollama refused"
}
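
Clients should therefore check both the HTTP status code and the ok flag. A minimal sketch:

import requests

response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"message": "Hello"}
)
response.raise_for_status()  # surfaces HTTP-level errors (400, 503, 504, ...)

result = response.json()
if not result["ok"]:
    raise RuntimeError(result.get("error", "Unknown agent error"))
print(result["response"])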

Streaming Errors

In SSE mode, errors are sent as data events:

data: {"error": "LLM timeout after 60 seconds"}
data: [DONE]
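
The streaming handler shown earlier can be extended to surface these events; a minimal sketch of the chunk-handling branch:

chunk = json.loads(data)
if 'error' in chunk:
    raise RuntimeError(chunk['error'])
if 'content' in chunk:
    print(chunk['content'], end='', flush=True)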

Performance

Optimizations

  • SSE Streaming: Reduces perceived latency for long responses
  • Automatic Fallbacks: Graceful degradation when streaming unavailable
  • Worker Connection Pool: Reuse of gRPC connections
  • Data Tool Caching: Results cached per request

Timeouts

| Operation | Timeout |
| --- | --- |
| Chat (non-streaming) | 60s |
| Chat (streaming) | No timeout (SSE) |
| Data preview | 60s |
| Data tools | 60s per operation |
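
On the client side, it is worth aligning HTTP timeouts with these values, for example with requests (the exact margin is a judgment call):

# Allow slightly more than the server-side 60s budget.
response = requests.post(
    'https://your-platform/api/v1/studio/chat',
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"message": "Hello"},
    timeout=(10, 90)  # (connect, read) timeouts in seconds
)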

Limits

  • Data preview rows: Max 10,000 rows
  • Tool result rows: Max 20 rows displayed in LLM context
  • Context size: Depends on LLM provider

Next Steps