
OpenAI-Compatible API

MedBackend provides an OpenAI-compatible API layer that allows you to integrate LLM capabilities into your healthcare applications while automatically storing all conversations in FHIR Communication resources. This enables seamless AI integration with full audit trails, RBAC compliance, and FHIR-native conversation history.

Overview

The OpenAI-compatible API layer acts as middleware that:

  • Accepts requests in the OpenAI /v1/chat/completions format
  • Stores user prompts as FHIR Communication resources with ai-prompt category
  • Forwards requests to your configured LLM provider via LiteLLM
  • Stores AI responses as FHIR Communication resources with ai-response category
  • Returns responses in OpenAI-compatible format (streaming and non-streaming)
  • Maintains conversation threading via partOf and inResponseTo references

Key Benefits

Benefit              Description
Drop-in Replacement  Use existing OpenAI SDKs with minimal code changes
FHIR-Native Storage  All conversations stored as Communication resources
Audit Trail          Complete history of all AI interactions
RBAC Integration     AI access respects your existing permission rules
Multi-Provider       Switch between OpenAI, Anthropic, Azure, Ollama, and more
Streaming Support    Full SSE streaming for real-time responses

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│ Your Application │
│ (OpenAI SDK / HTTP Client) │
└──────────────────────┬────────────────────────────────────────────────┘


┌──────────────────────────────────────────────────────────────────────┐
│ MedBackend Backbone │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ /v1/chat/completions │ │
│ │ │ │
│ │ 1. Authenticate user (JWT validation) │ │
│ │ 2. Store prompt → Communication (ai-prompt) │ │
│ │ 3. Load conversation history (if conversation_id provided) │ │
│ │ 4. Forward to LLM (via LiteLLM) │ │
│ │ 5. Stream response back to client │ │
│ │ 6. Store response → Communication (ai-response) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ FHIR Server │
└──────────────────────────────────────────────────────────────────────┘

Request Flow

The following diagram illustrates the complete request lifecycle for both streaming and non-streaming requests:

Client                                    MedBackend                              FHIR Server
│ │ │
│ 1. POST /v1/chat/completions │ │
│ Authorization: Bearer <jwt> │ │
│ X-Project-ID: <project-id> │ │
│ { │ │
│ "messages": [...], │ │
│ "stream": true, │ │
│ "conversation_id": "conv-001" │ │
│ } │ │
├──────────────────────────────────────────▶│ │
│ │ │
│ │ 2. Validate JWT token │
│ │ Extract user identity │
│ │ Check RBAC permissions │
│ │ │
│ │ 3. Create Communication (ai-prompt) │
│ ├──────────────────────────────────────▶│
│ │ { resourceType: "Communication", │
│ │ category: "ai-prompt", │
│ │ partOf: "conv-001", │
│ │ payload: [...messages] } │
│ │◀──────────────────────────────────────│
│ │ { id: "comm-prompt-123" } │
│ │ │
│ │ 4. Load conversation history │
│ ├──────────────────────────────────────▶│
│ │ GET Communications │
│ │ where partOf = "conv-001" │
│ │◀──────────────────────────────────────│
│ │ [previous messages...] │
│ │ │
│ │ 5. Forward to LLM (via LiteLLM) │
│ │ ┌─────────────────────────────┐ │
│ │ │ OpenAI / Azure / Anthropic │ │
│ │ │ Ollama / Bedrock / etc. │ │
│ │ └─────────────────────────────┘ │
│ │ │
│ 6. SSE Stream (if stream: true) │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":"Based"}}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":" on"}}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":" your"}}]} │ │
│ │ │
│ ... (streaming continues) ... │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta":{}, │ │
│ "finish_reason":"stop"}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: [DONE] │ │
│ │ │
│ │ 7. Create Communication (ai-response) │
│ ├──────────────────────────────────────▶│
│ │ { resourceType: "Communication", │
│ │ category: "ai-response", │
│ │ partOf: "conv-001", │
│ │ inResponseTo: "comm-prompt-123", │
│ │ payload: "Based on your..." } │
│ │◀──────────────────────────────────────│
│ │ { id: "comm-response-456" } │
│ │ │
│ 8. Continue conversation... │ │
│ (new request with same │ │
│ conversation_id) │ │
│ │ │

Non-Streaming Flow

For non-streaming requests (stream: false), the flow is similar, but the response is returned as a single JSON object once the LLM completes (a raw-HTTP sketch follows the diagram):

Client                                    MedBackend
│ │
│ POST /v1/chat/completions │
│ { "stream": false, ... } │
├──────────────────────────────────────────▶│
│ │
│ (steps 2-5 same as above) │
│ │
│ Complete JSON response │
│◀──────────────────────────────────────────│
│ { │
│ "id": "chatcmpl-abc123", │
│ "choices": [{ │
│ "message": { │
│ "role": "assistant", │
│ "content": "Based on your..." │
│ }, │
│ "finish_reason": "stop" │
│ }], │
│ "conversation_id": "conv-001", │
│ "communication_id": "comm-response-456" │
│ } │
│ │
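
If you are not using an OpenAI SDK, the same non-streaming call works with any HTTP client. The sketch below uses Python's httpx (any HTTP library works the same way); the base URL, token, and project ID are placeholders for your deployment, and the MedBackend-specific conversation_id and communication_id fields are read from the response body exactly as shown in the diagram above.

import httpx

# Minimal non-streaming request without an OpenAI SDK.
# The base URL, JWT, and project ID are placeholders for your deployment.
response = httpx.post(
    "https://your-medbackend.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_JWT_TOKEN",
        "X-Project-ID": "your-project-id",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "What are my allergies?"}],
        "stream": False,
        "patient_id": "patient-123",
    },
    timeout=60,
)
response.raise_for_status()
body = response.json()

print(body["choices"][0]["message"]["content"])
# MedBackend-specific fields returned alongside the OpenAI-compatible payload
print(body["conversation_id"], body["communication_id"])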

Key Points

Step  Description                  FHIR Impact
1     Client sends request         -
2     JWT validation & RBAC check  User identity extracted
3     Store user prompt            Communication created (ai-prompt)
4     Load history                 Query existing Communications in thread
5     Forward to LLM               External API call via LiteLLM
6     Stream response              Real-time SSE delivery
7     Store AI response            Communication created (ai-response)
8     Continue conversation        Use same conversation_id

Endpoints

POST /v1/chat/completions

Send a chat completion request. Compatible with the OpenAI API format.

Headers:

Authorization: Bearer <your-jwt-token>
Content-Type: application/json
X-Project-ID: <your-project-id>

Request Body:

{
  "messages": [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What medications am I currently taking?"}
  ],
  "stream": true,
  "model": "gpt-4o",
  "conversation_id": "comm-001",
  "patient_id": "patient-123",
  "max_tokens": 1000,
  "temperature": 0.7
}

Field            Type     Required  Description
messages         array    Yes       Array of message objects in OpenAI format
stream           boolean  No        Enable SSE streaming (default: false)
model            string   No        Model override (uses configured default if not specified)
conversation_id  string   No        Continue an existing conversation
patient_id       string   No        Patient context for FHIR queries
max_tokens       integer  No        Maximum tokens in response
temperature      float    No        Sampling temperature (0-2)

Response (Non-Streaming):

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705312800,
  "model": "openai/gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Based on your records, you are currently taking..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  },
  "conversation_id": "comm-001",
  "communication_id": "comm-response-456"
}

Response (Streaming):

When stream: true, responses are delivered as Server-Sent Events (SSE); a sketch for consuming the stream without an SDK follows the example:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Based"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" on"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
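
If you consume the stream without an OpenAI SDK, each event arrives as a data: line containing a JSON chunk, and the stream ends with data: [DONE]. The sketch below parses the stream with Python's httpx; the URL, token, and project ID are placeholders and mirror the request headers documented above.

import json
import httpx

# Sketch: consume the SSE stream without an OpenAI SDK.
# URL, JWT, and project ID are placeholders for your deployment.
headers = {
    "Authorization": "Bearer YOUR_JWT_TOKEN",
    "X-Project-ID": "your-project-id",
    "Accept": "text/event-stream",
}
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Explain my lab results."}],
    "stream": True,
    "patient_id": "patient-123",
}

with httpx.stream(
    "POST",
    "https://your-medbackend.com/v1/chat/completions",
    json=payload,
    headers=headers,
    timeout=None,
) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)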

GET /v1/models

List available models configured for your project.

Response:

{
  "object": "list",
  "data": [{
    "id": "openai/gpt-4o",
    "object": "model",
    "owned_by": "medbackend"
  }]
}
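
With the OpenAI SDK pointed at MedBackend (see the integration examples below), the same endpoint is available through the SDK's models helper. A minimal sketch; the base URL, token, and project ID are placeholders.

from openai import OpenAI

# List the models configured for your project via the SDK.
# Base URL, JWT, and project ID are placeholders for your deployment.
client = OpenAI(
    base_url="https://your-medbackend.com/v1",
    api_key="your-jwt-token",
    default_headers={"X-Project-ID": "your-project-id"},
)

for model in client.models.list():
    print(model.id)  # e.g. "openai/gpt-4o"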

Integration Examples

Python (OpenAI SDK)

from openai import OpenAI

# Point the SDK to MedBackend
client = OpenAI(
    base_url="https://your-medbackend.com/v1",
    api_key="your-jwt-token",
    default_headers={
        "X-Project-ID": "your-project-id"
    }
)

# Non-streaming request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": "What are my recent lab results?"}
    ],
    extra_body={
        "patient_id": "patient-123",
        "conversation_id": "existing-conv-id"  # Optional: continue conversation
    }
)

print(response.choices[0].message.content)

Python (Streaming)

from openai import OpenAI

client = OpenAI(
    base_url="https://your-medbackend.com/v1",
    api_key="your-jwt-token",
    default_headers={"X-Project-ID": "your-project-id"}
)

# Streaming request
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain my diagnosis in simple terms."}
    ],
    stream=True,
    extra_body={"patient_id": "patient-123"}
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

JavaScript/TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://your-medbackend.com/v1',
  apiKey: 'your-jwt-token',
  defaultHeaders: {
    'X-Project-ID': 'your-project-id'
  }
});

// Non-streaming
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What medications am I taking?' }
  ],
  // MedBackend-specific parameters
  patient_id: 'patient-123',
  conversation_id: 'conv-001'
} as any);

console.log(response.choices[0].message.content);

JavaScript (Streaming)

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize my health history.' }],
  stream: true,
  patient_id: 'patient-123'
} as any);

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

cURL

# Non-streaming
curl -X POST https://your-medbackend.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-H "X-Project-ID: your-project-id" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What are my allergies?"}
],
"patient_id": "patient-123"
}'

# Streaming
curl -X POST https://your-medbackend.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-H "X-Project-ID: your-project-id" \
-N \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Explain my lab results."}
],
"stream": true,
"patient_id": "patient-123"
}'

Conversation Threading

MedBackend automatically manages conversation history using FHIR Communication resources. When you provide a conversation_id, the system:

  1. Loads all previous messages in the conversation
  2. Includes them in the context sent to the LLM
  3. Links new messages using partOf and inResponseTo references

Starting a New Conversation

Omit conversation_id to start a new conversation. The response will include the new conversation_id:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Save this for continuing the conversation
conversation_id = response.conversation_id

Continuing a Conversation

Include the conversation_id to continue:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What did I ask about earlier?"}],
    extra_body={"conversation_id": conversation_id}
)
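
Putting the two together, a multi-turn exchange only needs to capture the conversation_id from the first response and send it back on every later call; because MedBackend loads the earlier messages server-side, each request carries only the new user message. A minimal sketch, reusing the client from the integration examples and the conversation_id attribute shown above:

# Sketch: carry the conversation_id across turns so MedBackend threads the messages.
conversation_id = None

for question in [
    "What medications am I currently taking?",
    "Are any of them known to interact with ibuprofen?",
]:
    extra = {"patient_id": "patient-123"}
    if conversation_id:
        extra["conversation_id"] = conversation_id

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
        extra_body=extra,
    )
    # MedBackend returns the conversation_id with every response (see above)
    conversation_id = response.conversation_id
    print(response.choices[0].message.content)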

FHIR Communication Storage

All messages are stored as FHIR Communication resources:

User Prompt (ai-prompt)

{
  "resourceType": "Communication",
  "id": "comm-prompt-123",
  "status": "completed",
  "category": [{
    "coding": [{
      "system": "http://medbackend.com/fhir/communication-category",
      "code": "ai-prompt",
      "display": "AI Prompt"
    }]
  }],
  "subject": {
    "reference": "Patient/patient-123"
  },
  "sender": {
    "reference": "Practitioner/user-456"
  },
  "sent": "2025-01-13T10:30:00Z",
  "payload": [{
    "contentString": "What medications am I currently taking?"
  }],
  "partOf": [{
    "reference": "Communication/conversation-root"
  }]
}

AI Response (ai-response)

{
  "resourceType": "Communication",
  "id": "comm-response-789",
  "status": "completed",
  "category": [{
    "coding": [{
      "system": "http://medbackend.com/fhir/communication-category",
      "code": "ai-response",
      "display": "AI Response"
    }]
  }],
  "subject": {
    "reference": "Patient/patient-123"
  },
  "sender": {
    "reference": "Device/ai-agent-default"
  },
  "sent": "2025-01-13T10:30:05Z",
  "payload": [{
    "contentString": "Based on your records, you are currently taking..."
  }],
  "partOf": [{
    "reference": "Communication/conversation-root"
  }],
  "inResponseTo": [{
    "reference": "Communication/comm-prompt-123"
  }]
}
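
Because every turn is an ordinary Communication resource, a conversation can be read back with a standard FHIR search on the part-of reference. A minimal sketch, assuming your deployment exposes a FHIR REST endpoint (the /fhir base path and the auth headers below are placeholders):

import httpx

# Sketch: read a conversation back as FHIR Communication resources.
# FHIR base path and headers are placeholders; adjust for your deployment.
FHIR_BASE = "https://your-medbackend.com/fhir"
headers = {
    "Authorization": "Bearer YOUR_JWT_TOKEN",
    "X-Project-ID": "your-project-id",
}

bundle = httpx.get(
    f"{FHIR_BASE}/Communication",
    params={
        "part-of": "Communication/conversation-root",  # the conversation thread
        "_sort": "sent",                               # chronological order
    },
    headers=headers,
).json()

for entry in bundle.get("entry", []):
    comm = entry["resource"]
    category = comm["category"][0]["coding"][0]["code"]  # ai-prompt / ai-response
    print(f"[{category}] {comm['payload'][0]['contentString']}")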

Configuration

Environment Variables

Configure the LLM provider in your MedBackend settings:

Variable       Description                      Example
AI_PROVIDER    LLM provider                     openai, azure_openai, anthropic, ollama
AI_MODEL       Model identifier                 gpt-4o, claude-3-5-sonnet, llama3
AI_API_KEY     Provider API key                 sk-...
AI_API_BASE    Custom API base URL (optional)   https://your-azure.openai.azure.com
AI_MAX_TOKENS  Default max tokens               4096

LiteLLM Model Strings

MedBackend uses LiteLLM for multi-provider support. Model strings follow the format provider/model:

Provider      Model String Example
OpenAI        openai/gpt-4o
Azure OpenAI  azure/gpt-4o-deployment
Anthropic     anthropic/claude-3-5-sonnet-20241022
Ollama        ollama/llama3
AWS Bedrock   bedrock/anthropic.claude-3-sonnet
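
One way to pick a provider per request is to list the configured models (their ids are LiteLLM strings in the format above) and pass one of them as the model override. A hedged sketch, reusing the client from the integration examples; whether your deployment accepts arbitrary provider-prefixed strings or only the models it has credentials for depends on your server configuration.

# Sketch: select a configured model and use it as the per-request override.
available = [m.id for m in client.models.list()]  # e.g. ["openai/gpt-4o"]

response = client.chat.completions.create(
    model=available[0],  # LiteLLM provider/model string
    messages=[{"role": "user", "content": "Summarize my latest visit."}],
    extra_body={"patient_id": "patient-123"},
)
print(response.choices[0].message.content)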

Error Handling

The API returns standard HTTP status codes with detailed error messages:

Status  Description
400     Bad Request - Invalid request body or parameters
401     Unauthorized - Invalid or missing JWT token
403     Forbidden - RBAC permission denied
404     Not Found - Conversation or resource not found
429     Too Many Requests - Rate limit exceeded
500     Internal Server Error - LLM or server error

Error Response Format:

{
  "error": {
    "message": "Invalid conversation_id: conversation not found",
    "type": "invalid_request_error",
    "code": "conversation_not_found"
  }
}
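
With the official OpenAI SDKs, these statuses surface as the SDK's usual exception types, so they can be handled in the ordinary way. A minimal sketch (the exception classes come from the openai Python package, not from MedBackend, and the client is the one configured in the integration examples):

import openai

# Sketch: map the status codes above onto the OpenAI SDK's exception types.
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What are my allergies?"}],
        extra_body={"conversation_id": "conv-001"},
    )
except openai.AuthenticationError:      # 401 - invalid or missing JWT
    raise
except openai.PermissionDeniedError:    # 403 - RBAC permission denied
    raise
except openai.NotFoundError as err:     # 404 - e.g. conversation_not_found
    print(err.body)                     # parsed error payload, when available
except openai.RateLimitError:           # 429 - back off and retry
    raise
except openai.APIStatusError as err:    # any other non-2xx response
    print(err.status_code, err.message)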

Best Practices

Security

  • Always use HTTPS in production
  • Rotate JWT tokens regularly
  • Use patient-scoped queries when accessing patient data
  • Review audit logs for AI interactions

Performance

  • Use streaming for long responses to improve perceived latency
  • Batch related queries when possible
  • Consider conversation length - very long histories may impact response time

Conversation Management

  • Store conversation_id for multi-turn conversations
  • Use descriptive system prompts for consistent AI behavior
  • Consider implementing conversation summarization for very long threads

RBAC Integration

  • The AI agent respects your existing RBAC rules
  • Patient queries are scoped to the authenticated user's permissions
  • Configure appropriate validation rules for Communication resources

Troubleshooting

Common Issues

"Unauthorized" errors:

  • Verify your JWT token is valid and not expired
  • Ensure the X-Project-ID header is included
  • Check that your user has the required permissions

"Conversation not found" errors:

  • Verify the conversation_id exists and belongs to your project
  • Ensure the conversation hasn't been deleted

Streaming not working:

  • Check that your client supports SSE
  • Verify there are no proxies buffering the response
  • Use the Accept: text/event-stream header (see the sketch below)
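
If a proxy or client library needs the header set explicitly, the OpenAI Python SDK can attach it per request via extra_headers; a small sketch using the client from the integration examples:

# Sketch: explicitly request an SSE response for a streaming call.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain my lab results."}],
    stream=True,
    extra_headers={"Accept": "text/event-stream"},  # per-request header override
    extra_body={"patient_id": "patient-123"},
)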

Slow responses:

  • Check LLM provider status
  • Reduce conversation history length
  • Consider using a faster model for simple queries