OpenAI-Compatible API
MedBackend provides an OpenAI-compatible API layer that allows you to integrate LLM capabilities into your healthcare applications while automatically storing all conversations in FHIR Communication resources. This enables seamless AI integration with full audit trails, RBAC compliance, and FHIR-native conversation history.
Overview
The OpenAI API layer acts as a middleware that:
- Accepts requests in the OpenAI `/v1/chat/completions` format
- Stores user prompts as FHIR Communication resources with the `ai-prompt` category
- Forwards requests to your configured LLM provider via LiteLLM
- Stores AI responses as FHIR Communication resources with the `ai-response` category
- Returns responses in OpenAI-compatible format (streaming and non-streaming)
- Maintains conversation threading via `partOf` and `inResponseTo` references
Key Benefits
| Benefit | Description |
|---|---|
| Drop-in Replacement | Use existing OpenAI SDKs with minimal code changes |
| FHIR-Native Storage | All conversations stored as Communication resources |
| Audit Trail | Complete history of all AI interactions |
| RBAC Integration | AI access respects your existing permission rules |
| Multi-Provider | Switch between OpenAI, Anthropic, Azure, Ollama, and more |
| Streaming Support | Full SSE streaming for real-time responses |
Architecture
┌──────────────────────────────────────────────────────────────────────┐
│ Your Application │
│ (OpenAI SDK / HTTP Client) │
└──────────────────────┬────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────── ─────────────────────────────┐
│ MedBackend Backbone │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ /v1/chat/completions │ │
│ │ │ │
│ │ 1. Authenticate user (JWT validation) │ │
│ │ 2. Store prompt → Communication (ai-prompt) │ │
│ │ 3. Load conversation history (if conversation_id provided) │ │
│ │ 4. Forward to LLM (via LiteLLM) │ │
│ │ 5. Stream response back to client │ │
│ │ 6. Store response → Communication (ai-response) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ FHIR Server │
└──────────────────────────────────────────────────────────────────────┘
Request Flow
The following diagram illustrates the complete request lifecycle for both streaming and non-streaming requests:
Client MedBackend FHIR Server
│ │ │
│ 1. POST /v1/chat/completions │ │
│ Authorization: Bearer <jwt> │ │
│ X-Project-ID: <project-id> │ │
│ { │ │
│ "messages": [...], │ │
│ "stream": true, │ │
│ "conversation_id": "conv-001" │ │
│ } │ │
├──────────────────────────────────────────▶│ │
│ │ │
│ │ 2. Validate JWT token │
│ │ Extract user identity │
│ │ Check RBAC permissions │
│ │ │
│ │ 3. Create Communication (ai-prompt) │
│ ├──────────────────────────────────────▶│
│ │ { resourceType: "Communication", │
│ │ category: "ai-prompt", │
│ │ partOf: "conv-001", │
│ │ payload: [...messages] } │
│ │◀──────────────────────────────────────│
│ │ { id: "comm-prompt-123" } │
│ │ │
│ │ 4. Load conversation history │
│ ├──────────────────────────────────────▶│
│ │ GET Communications │
│ │ where partOf = "conv-001" │
│ │◀──────────────────────────────────────│
│ │ [previous messages...] │
│ │ │
│ │ 5. Forward to LLM (via LiteLLM) │
│ │ ┌─────────────────────────────┐ │
│ │ │ OpenAI / Azure / Anthropic │ │
│ │ │ Ollama / Bedrock / etc. │ │
│ │ └─────────────────────────────┘ │
│ │ │
│ 6. SSE Stream (if stream: true) │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":"Based"}}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":" on"}}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":" your"}}]} │ │
│ │ │
│ ... (streaming continues) ... │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta":{}, │ │
│ "finish_reason":"stop"}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: [DONE] │ │
│ │ │
│ │ 7. Create Communication (ai-response) │
│ ├──────────────────────────────────────▶│
│ │ { resourceType: "Communication", │
│ │ category: "ai-response", │
│ │ partOf: "conv-001", │
│ │ inResponseTo: "comm-prompt-123", │
│ │ payload: "Based on your..." } │
│ │◀────── ────────────────────────────────│
│ │ { id: "comm-response-456" } │
│ │ │
│ 8. Continue conversation... │ │
│ (new request with same │ │
│ conversation_id) │ │
│ │ │
Non-Streaming Flow
For non-streaming requests (stream: false), the flow is similar but the response is returned as a single JSON object after the LLM completes:
Client MedBackend
│ │
│ POST /v1/chat/completions │
│ { "stream": false, ... } │
├──────────────────────────────────────────▶│
│ │
│ (steps 2-5 same as above) │
│ │
│ Complete JSON response │
│◀─────────── ───────────────────────────────│
│ { │
│ "id": "chatcmpl-abc123", │
│ "choices": [{ │
│ "message": { │
│ "role": "assistant", │
│ "content": "Based on your..." │
│ }, │
│ "finish_reason": "stop" │
│ }], │
│ "conversation_id": "conv-001", │
│ "communication_id": "comm-response-456" │
│ } │
│ │
Key Points
| Step | Description | FHIR Impact |
|---|---|---|
| 1 | Client sends request | - |
| 2 | JWT validation & RBAC check | User identity extracted |
| 3 | Store user prompt | Communication created (ai-prompt) |
| 4 | Load history | Query existing Communications in thread |
| 5 | Forward to LLM | External API call via LiteLLM |
| 6 | Stream response | Real-time SSE delivery |
| 7 | Store AI response | Communication created (ai-response) |
| 8 | Continue conversation | Use same conversation_id |
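To make the flow concrete, here is a minimal non-streaming round trip using plain HTTP with Python's `requests` library. The base URL, token, project ID, and patient ID are placeholders; substitute your own values.
import requests
resp = requests.post(
    "https://your-medbackend.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_JWT_TOKEN",
        "Content-Type": "application/json",
        "X-Project-ID": "your-project-id",
    },
    json={
        "messages": [{"role": "user", "content": "What medications am I taking?"}],
        "stream": False,
        "patient_id": "patient-123",
    },
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])
# MedBackend adds these fields on top of the standard OpenAI response (steps 3 and 7):
conversation_id = body["conversation_id"]    # reuse on the next turn to stay in the thread
communication_id = body["communication_id"]  # FHIR id of the stored ai-response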
Endpoints
POST /v1/chat/completions
Send a chat completion request. Compatible with the OpenAI API format.
Headers:
Authorization: Bearer <your-jwt-token>
Content-Type: application/json
X-Project-ID: <your-project-id>
Request Body:
{
"messages": [
{"role": "system", "content": "You are a helpful medical assistant."},
{"role": "user", "content": "What medications am I currently taking?"}
],
"stream": true,
"model": "gpt-4o",
"conversation_id": "comm-001",
"patient_id": "patient-123",
"max_tokens": 1000,
"temperature": 0.7
}
| Field | Type | Required | Description |
|---|---|---|---|
| `messages` | array | Yes | Array of message objects in OpenAI format |
| `stream` | boolean | No | Enable SSE streaming (default: false) |
| `model` | string | No | Model override (uses configured default if not specified) |
| `conversation_id` | string | No | Continue an existing conversation |
| `patient_id` | string | No | Patient context for FHIR queries |
| `max_tokens` | integer | No | Maximum tokens in response |
| `temperature` | float | No | Sampling temperature (0-2) |
Response (Non-Streaming):
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1705312800,
"model": "openai/gpt-4o",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Based on your records, you are currently taking..."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 50,
"completion_tokens": 100,
"total_tokens": 150
},
"conversation_id": "comm-001",
"communication_id": "comm-response-456"
}
Response (Streaming):
When stream: true, responses are delivered as Server-Sent Events (SSE):
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Based"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" on"},"finish_reason":null}]}
...
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
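If you are not using an OpenAI SDK, the stream can be consumed with any HTTP client that reads the body incrementally. The sketch below uses Python's `requests` library with placeholder credentials: it strips the `data: ` prefix, stops at `[DONE]`, and prints each content delta as it arrives.
import json
import requests
resp = requests.post(
    "https://your-medbackend.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_JWT_TOKEN",
        "X-Project-ID": "your-project-id",
        "Accept": "text/event-stream",
    },
    json={
        "messages": [{"role": "user", "content": "Explain my lab results."}],
        "stream": True,
        "patient_id": "patient-123",
    },
    stream=True,
    timeout=300,
)
resp.raise_for_status()
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank keep-alive lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        print(delta["content"], end="", flush=True)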
GET /v1/models
List available models configured for your project.
Response:
{
"object": "list",
"data": [{
"id": "openai/gpt-4o",
"object": "model",
"owned_by": "medbackend"
}]
}
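Because the endpoint mirrors the OpenAI API, the SDK's standard models call can be used once the client is pointed at MedBackend (see the integration examples below):
from openai import OpenAI
client = OpenAI(
    base_url="https://your-medbackend.com/v1",
    api_key="your-jwt-token",
    default_headers={"X-Project-ID": "your-project-id"},
)
for model in client.models.list():
    print(model.id)  # e.g. "openai/gpt-4o"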
Integration Examples
Python (OpenAI SDK)
from openai import OpenAI
# Point the SDK to MedBackend
client = OpenAI(
base_url="https://your-medbackend.com/v1",
api_key="your-jwt-token",
default_headers={
"X-Project-ID": "your-project-id"
}
)
# Non-streaming request
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful medical assistant."},
{"role": "user", "content": "What are my recent lab results?"}
],
extra_body={
"patient_id": "patient-123",
"conversation_id": "existing-conv-id" # Optional: continue conversation
}
)
print(response.choices[0].message.content)
Python (Streaming)
from openai import OpenAI
client = OpenAI(
base_url="https://your-medbackend.com/v1",
api_key="your-jwt-token",
default_headers={"X-Project-ID": "your-project-id"}
)
# Streaming request
stream = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "user", "content": "Explain my diagnosis in simple terms."}
],
stream=True,
extra_body={"patient_id": "patient-123"}
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
JavaScript/TypeScript
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://your-medbackend.com/v1',
apiKey: 'your-jwt-token',
defaultHeaders: {
'X-Project-ID': 'your-project-id'
}
});
// Non-streaming
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'user', content: 'What medications am I taking?' }
],
// MedBackend-specific parameters
patient_id: 'patient-123',
conversation_id: 'conv-001'
} as any);
console.log(response.choices[0].message.content);
JavaScript (Streaming)
const stream = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Summarize my health history.' }],
stream: true,
patient_id: 'patient-123'
} as any);
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
}
}
cURL
# Non-streaming
curl -X POST https://your-medbackend.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-H "X-Project-ID: your-project-id" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What are my allergies?"}
],
"patient_id": "patient-123"
}'
# Streaming
curl -X POST https://your-medbackend.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-H "X-Project-ID: your-project-id" \
-N \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Explain my lab results."}
],
"stream": true,
"patient_id": "patient-123"
}'
Conversation Threading
MedBackend automatically manages conversation history using FHIR Communication resources. When you provide a conversation_id, the system:
- Loads all previous messages in the conversation
- Includes them in the context sent to the LLM
- Links new messages using `partOf` and `inResponseTo` references
Starting a New Conversation
Omit conversation_id to start a new conversation. The response will include the new conversation_id:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
# Save this for continuing the conversation
conversation_id = response.conversation_id
Continuing a Conversation
Include the conversation_id to continue:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "What did I ask about earlier?"}],
extra_body={"conversation_id": conversation_id}
)
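Putting the two together, a multi-turn exchange only needs to capture the `conversation_id` from the first response and send it back on each later turn. A minimal sketch, reusing the `client` from the integration examples above:
conversation_id = None
for question in ["What medications am I taking?", "Are there interactions between them?"]:
    extra = {"patient_id": "patient-123"}
    if conversation_id:
        extra["conversation_id"] = conversation_id
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
        extra_body=extra,
    )
    conversation_id = response.conversation_id  # returned on every turn
    print(response.choices[0].message.content)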
FHIR Communication Storage
All messages are stored as FHIR Communication resources:
User Prompt (ai-prompt)
{
"resourceType": "Communication",
"id": "comm-prompt-123",
"status": "completed",
"category": [{
"coding": [{
"system": "http://medbackend.com/fhir/communication-category",
"code": "ai-prompt",
"display": "AI Prompt"
}]
}],
"subject": {
"reference": "Patient/patient-123"
},
"sender": {
"reference": "Practitioner/user-456"
},
"sent": "2025-01-13T10:30:00Z",
"payload": [{
"contentString": "What medications am I currently taking?"
}],
"partOf": [{
"reference": "Communication/conversation-root"
}]
}
AI Response (ai-response)
{
"resourceType": "Communication",
"id": "comm-response-789",
"status": "completed",
"category": [{
"coding": [{
"system": "http://medbackend.com/fhir/communication-category",
"code": "ai-response",
"display": "AI Response"
}]
}],
"subject": {
"reference": "Patient/patient-123"
},
"sender": {
"reference": "Device/ai-agent-default"
},
"sent": "2025-01-13T10:30:05Z",
"payload": [{
"contentString": "Based on your records, you are currently taking..."
}],
"partOf": [{
"reference": "Communication/conversation-root"
}],
"inResponseTo": [{
"reference": "Communication/comm-prompt-123"
}]
}
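Because each turn is an ordinary Communication resource, a thread's audit trail can be retrieved with a standard FHIR search. The sketch below assumes a typical deployment: it presumes the FHIR API is exposed at `/fhir` and that the standard `part-of` and `sent` search parameters are available; adjust the path and authentication to your setup.
import requests
resp = requests.get(
    "https://your-medbackend.com/fhir/Communication",  # assumed FHIR base path
    headers={
        "Authorization": "Bearer YOUR_JWT_TOKEN",
        "X-Project-ID": "your-project-id",
    },
    params={
        "part-of": "Communication/conversation-root",  # the conversation thread
        "_sort": "sent",                               # chronological order
    },
    timeout=30,
)
resp.raise_for_status()
for entry in resp.json().get("entry", []):
    comm = entry["resource"]
    code = comm["category"][0]["coding"][0]["code"]  # "ai-prompt" or "ai-response"
    print(code, "-", comm["payload"][0]["contentString"][:80])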
Configuration
Environment Variables
Configure the LLM provider in your MedBackend settings:
| Variable | Description | Example |
|---|---|---|
| `AI_PROVIDER` | LLM provider | openai, azure_openai, anthropic, ollama |
| `AI_MODEL` | Model identifier | gpt-4o, claude-3-5-sonnet, llama3 |
| `AI_API_KEY` | Provider API key | sk-... |
| `AI_API_BASE` | Custom API base URL (optional) | https://your-azure.openai.azure.com |
| `AI_MAX_TOKENS` | Default max tokens | 4096 |
LiteLLM Model Strings
MedBackend uses LiteLLM for multi-provider support. Model strings follow the format provider/model:
| Provider | Model String Example |
|---|---|
| OpenAI | openai/gpt-4o |
| Azure OpenAI | azure/gpt-4o-deployment |
| Anthropic | anthropic/claude-3-5-sonnet-20241022 |
| Ollama | ollama/llama3 |
| AWS Bedrock | bedrock/anthropic.claude-3-sonnet |
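For reference, the call MedBackend issues through LiteLLM has roughly the shape shown below. This is an illustrative sketch only; you do not call LiteLLM yourself, and the exact parameters passed may differ.
import os
import litellm
response = litellm.completion(
    model="openai/gpt-4o",                           # provider/model string, see table above
    messages=[{"role": "user", "content": "Hello"}],
    api_key=os.environ.get("AI_API_KEY"),
    api_base=os.environ.get("AI_API_BASE"),          # optional custom endpoint
    max_tokens=int(os.environ.get("AI_MAX_TOKENS", "4096")),
)
print(response.choices[0].message.content)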
Error Handling
The API returns standard HTTP status codes with detailed error messages:
| Status | Description |
|---|---|
| 400 | Bad Request - Invalid request body or parameters |
| 401 | Unauthorized - Invalid or missing JWT token |
| 403 | Forbidden - RBAC permission denied |
| 404 | Not Found - Conversation or resource not found |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - LLM or server error |
Error Response Format:
{
"error": {
"message": "Invalid conversation_id: conversation not found",
"type": "invalid_request_error",
"code": "conversation_not_found"
}
}
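With the OpenAI Python SDK, these statuses surface as the SDK's standard exception types, so existing error-handling code carries over. A minimal sketch, reusing the `client` from the integration examples:
import openai
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What are my allergies?"}],
        extra_body={"conversation_id": "conv-does-not-exist"},
    )
except openai.AuthenticationError:
    print("401: refresh the JWT and check the X-Project-ID header")
except openai.PermissionDeniedError:
    print("403: RBAC denied access for this user")
except openai.NotFoundError as e:
    print("404:", e.message)  # e.g. conversation not found
except openai.RateLimitError:
    print("429: back off and retry")
except openai.APIStatusError as e:
    print(e.status_code, "unexpected error")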
Best Practices
Security
- Always use HTTPS in production
- Rotate JWT tokens regularly
- Use patient-scoped queries when accessing patient data
- Review audit logs for AI interactions
Performance
- Use streaming for long responses to improve perceived latency
- Batch related queries when possible
- Consider conversation length - very long histories may impact response time
Conversation Management
- Store the `conversation_id` for multi-turn conversations
- Use descriptive system prompts for consistent AI behavior
- Consider implementing conversation summarization for very long threads (one possible approach is sketched below)
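One possible client-side approach (a sketch only, not a built-in feature) is to ask the model for a summary of the current thread and then seed a fresh conversation with it; `client` and `old_conversation_id` are assumed from the earlier examples.
summary = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize our conversation so far in a few sentences."}],
    extra_body={"conversation_id": old_conversation_id},
).choices[0].message.content
fresh = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Context from an earlier conversation: {summary}"},
        {"role": "user", "content": "Let's continue from there."},
    ],
)
new_conversation_id = fresh.conversation_id  # omitting conversation_id started a new thread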
RBAC Integration
- The AI agent respects your existing RBAC rules
- Patient queries are scoped to the authenticated user's permissions
- Configure appropriate validation rules for Communication resources
Troubleshooting
Common Issues
"Unauthorized" errors:
- Verify your JWT token is valid and not expired
- Ensure the `X-Project-ID` header is included
- Check that your user has the required permissions
"Conversation not found" errors:
- Verify the `conversation_id` exists and belongs to your project
- Ensure the conversation hasn't been deleted
Streaming not working:
- Check that your client supports SSE
- Verify there are no proxies buffering the response
- Use an `Accept: text/event-stream` header
Slow responses:
- Check LLM provider status
- Reduce conversation history length
- Consider using a faster model for simple queries