
OpenAI-Compatible API

MedBackend provides an OpenAI-compatible API layer that allows you to integrate LLM capabilities into your healthcare applications while automatically storing all conversations in FHIR Communication resources. This enables seamless AI integration with full audit trails, RBAC compliance, and FHIR-native conversation history.

Overview

The OpenAI-compatible API layer acts as middleware that:

  • Accepts requests in the OpenAI /v1/chat/completions format
  • Stores user prompts as FHIR Communication resources with ai-prompt category
  • Forwards requests to your configured LLM provider via LiteLLM
  • Stores AI responses as FHIR Communication resources with ai-response category
  • Returns responses in OpenAI-compatible format (streaming and non-streaming)
  • Maintains conversation threading via partOf and inResponseTo references

Key Benefits

Benefit              Description
Drop-in Replacement  Use existing OpenAI SDKs with minimal code changes
FHIR-Native Storage  All conversations stored as Communication resources
Audit Trail          Complete history of all AI interactions
RBAC Integration     AI access respects your existing permission rules
Multi-Provider       Switch between OpenAI, Anthropic, Azure, Ollama, and more
Streaming Support    Full SSE streaming for real-time responses

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│ Your Application │
│ (OpenAI SDK / HTTP Client) │
└──────────────────────┬────────────────────────────────────────────────┘


┌──────────────────────────────────────────────────────────────────────┐
│ MedBackend Backbone │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ /v1/chat/completions │ │
│ │ │ │
│ │ 1. Authenticate user (JWT validation) │ │
│ │ 2. Store prompt → Communication (ai-prompt) │ │
│ │ 3. Load conversation history (if conversation_id provided) │ │
│ │ 4. Forward to LLM (via LiteLLM) │ │
│ │ 5. Stream response back to client │ │
│ │ 6. Store response → Communication (ai-response) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ FHIR Server │
└──────────────────────────────────────────────────────────────────────┘

Request Flow

The following diagram illustrates the complete request lifecycle for both streaming and non-streaming requests:

Client                                    MedBackend                              FHIR Server
│ │ │
│ 1. POST /v1/chat/completions │ │
│ Authorization: Bearer <jwt> │ │
│ X-Project-ID: <project-id> │ │
│ { │ │
│ "messages": [...], │ │
│ "stream": true, │ │
│ "conversation_id": "conv-001" │ │
│ } │ │
├──────────────────────────────────────────▶│ │
│ │ │
│ │ 2. Validate JWT token │
│ │ Extract user identity │
│ │ Check RBAC permissions │
│ │ │
│ │ 3. Create Communication (ai-prompt) │
│ ├──────────────────────────────────────▶│
│ │ { resourceType: "Communication", │
│ │ category: "ai-prompt", │
│ │ partOf: "conv-001", │
│ │ payload: [...messages] } │
│ │◀──────────────────────────────────────│
│ │ { id: "comm-prompt-123" } │
│ │ │
│ │ 4. Load conversation history │
│ ├──────────────────────────────────────▶│
│ │ GET Communications │
│ │ where partOf = "conv-001" │
│ │◀──────────────────────────────────────│
│ │ [previous messages...] │
│ │ │
│ │ 5. Forward to LLM (via LiteLLM) │
│ │ ┌─────────────────────────────┐ │
│ │ │ OpenAI / Azure / Anthropic │ │
│ │ │ Ollama / Bedrock / etc. │ │
│ │ └─────────────────────────────┘ │
│ │ │
│ 6. SSE Stream (if stream: true) │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":"Based"}}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":" on"}}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta": │ │
│ {"content":" your"}}]} │ │
│ │ │
│ ... (streaming continues) ... │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: {"choices":[{"delta":{}, │ │
│ "finish_reason":"stop"}]} │ │
│ │ │
│◀──────────────────────────────────────────│ │
│ data: [DONE] │ │
│ │ │
│ │ 7. Create Communication (ai-response) │
│ ├──────────────────────────────────────▶│
│ │ { resourceType: "Communication", │
│ │ category: "ai-response", │
│ │ partOf: "conv-001", │
│ │ inResponseTo: "comm-prompt-123", │
│ │ payload: "Based on your..." } │
│ │◀──────────────────────────────────────│
│ │ { id: "comm-response-456" } │
│ │ │
│ 8. Continue conversation... │ │
│ (new request with same │ │
│ conversation_id) │ │
│ │ │

Non-Streaming Flow

For non-streaming requests (stream: false), the flow is similar, but the response is returned as a single JSON object once the LLM completes (a raw-HTTP sketch follows the diagram):

Client                                    MedBackend
│ │
│ POST /v1/chat/completions │
│ { "stream": false, ... } │
├──────────────────────────────────────────▶│
│ │
│ (steps 2-5 same as above) │
│ │
│ Complete JSON response │
│◀──────────────────────────────────────────│
│ { │
│ "id": "chatcmpl-abc123", │
│ "choices": [{ │
│ "message": { │
│ "role": "assistant", │
│ "content": "Based on your..." │
│ }, │
│ "finish_reason": "stop" │
│ }], │
│ "conversation_id": "conv-001", │
│ "communication_id": "comm-response-456" │
│ } │
│ │
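
If you are not using an OpenAI SDK, the same non-streaming call works with any HTTP client. The sketch below uses Python's httpx (any HTTP library works the same way); the base URL, token, and project ID are placeholders for your deployment, and the MedBackend-specific conversation_id and communication_id fields are read from the response body exactly as shown in the diagram above.

import httpx

# Minimal non-streaming request without an OpenAI SDK.
# The base URL, JWT, and project ID are placeholders for your deployment.
response = httpx.post(
    "https://your-medbackend.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_JWT_TOKEN",
        "X-Project-ID": "your-project-id",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "What are my allergies?"}],
        "stream": False,
        "patient_id": "patient-123",
    },
    timeout=60,
)
response.raise_for_status()
body = response.json()

print(body["choices"][0]["message"]["content"])
# MedBackend-specific fields returned alongside the OpenAI-compatible payload
print(body["conversation_id"], body["communication_id"])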

Key Points

Step  Description                  FHIR Impact
1     Client sends request         -
2     JWT validation & RBAC check  User identity extracted
3     Store user prompt            Communication created (ai-prompt)
4     Load history                 Query existing Communications in thread
5     Forward to LLM               External API call via LiteLLM
6     Stream response              Real-time SSE delivery
7     Store AI response            Communication created (ai-response)
8     Continue conversation        Use same conversation_id

Endpoints

POST /v1/chat/completions

Send a chat completion request. Compatible with the OpenAI API format.

Headers:

Authorization: Bearer <your-jwt-token>
Content-Type: application/json
X-Project-ID: <your-project-id>

Request Body:

{
  "messages": [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What medications am I currently taking?"}
  ],
  "stream": true,
  "model": "gpt-4o",
  "conversation_id": "comm-001",
  "patient_id": "patient-123",
  "max_tokens": 1000,
  "temperature": 0.7
}

Field            Type     Required  Description
messages         array    Yes       Array of message objects in OpenAI format
stream           boolean  No        Enable SSE streaming (default: false)
model            string   No        Model override (uses configured default if not specified)
conversation_id  string   No        Continue an existing conversation
patient_id       string   No        Patient context for FHIR queries
max_tokens       integer  No        Maximum tokens in response
temperature      float    No        Sampling temperature (0-2)

Response (Non-Streaming):

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705312800,
  "model": "openai/gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Based on your records, you are currently taking..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  },
  "conversation_id": "comm-001",
  "communication_id": "comm-response-456"
}

Response (Streaming):

When stream: true, responses are delivered as Server-Sent Events (SSE); a sketch for consuming the stream without an SDK follows the example:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Based"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" on"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
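
If you consume the stream without an OpenAI SDK, each event arrives as a data: line containing a JSON chunk, and the stream ends with data: [DONE]. The sketch below parses the stream with Python's httpx; the URL, token, and project ID are placeholders and mirror the request headers documented above.

import json
import httpx

# Sketch: consume the SSE stream without an OpenAI SDK.
# URL, JWT, and project ID are placeholders for your deployment.
headers = {
    "Authorization": "Bearer YOUR_JWT_TOKEN",
    "X-Project-ID": "your-project-id",
    "Accept": "text/event-stream",
}
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Explain my lab results."}],
    "stream": True,
    "patient_id": "patient-123",
}

with httpx.stream(
    "POST",
    "https://your-medbackend.com/v1/chat/completions",
    json=payload,
    headers=headers,
    timeout=None,
) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)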

GET /v1/models

List available models configured for your project.

Response:

{
  "object": "list",
  "data": [{
    "id": "openai/gpt-4o",
    "object": "model",
    "owned_by": "medbackend"
  }]
}
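
With the OpenAI SDK pointed at MedBackend (see the integration examples below), the same endpoint is available through the SDK's models helper. A minimal sketch; the base URL, token, and project ID are placeholders.

from openai import OpenAI

# List the models configured for your project via the SDK.
# Base URL, JWT, and project ID are placeholders for your deployment.
client = OpenAI(
    base_url="https://your-medbackend.com/v1",
    api_key="your-jwt-token",
    default_headers={"X-Project-ID": "your-project-id"},
)

for model in client.models.list():
    print(model.id)  # e.g. "openai/gpt-4o"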

Integration Examples

Python (OpenAI SDK)

from openai import OpenAI

# Point the SDK to MedBackend
client = OpenAI(
    base_url="https://your-medbackend.com/v1",
    api_key="your-jwt-token",
    default_headers={
        "X-Project-ID": "your-project-id"
    }
)

# Non-streaming request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": "What are my recent lab results?"}
    ],
    extra_body={
        "patient_id": "patient-123",
        "conversation_id": "existing-conv-id"  # Optional: continue conversation
    }
)

print(response.choices[0].message.content)

Python (Streaming)

from openai import OpenAI

client = OpenAI(
    base_url="https://your-medbackend.com/v1",
    api_key="your-jwt-token",
    default_headers={"X-Project-ID": "your-project-id"}
)

# Streaming request
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain my diagnosis in simple terms."}
    ],
    stream=True,
    extra_body={"patient_id": "patient-123"}
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

JavaScript/TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://your-medbackend.com/v1',
  apiKey: 'your-jwt-token',
  defaultHeaders: {
    'X-Project-ID': 'your-project-id'
  }
});

// Non-streaming
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What medications am I taking?' }
  ],
  // MedBackend-specific parameters
  patient_id: 'patient-123',
  conversation_id: 'conv-001'
} as any);

console.log(response.choices[0].message.content);

JavaScript (Streaming)

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize my health history.' }],
  stream: true,
  patient_id: 'patient-123'
} as any);

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

cURL

# Non-streaming
curl -X POST https://your-medbackend.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-H "X-Project-ID: your-project-id" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What are my allergies?"}
],
"patient_id": "patient-123"
}'

# Streaming
curl -X POST https://your-medbackend.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-H "X-Project-ID: your-project-id" \
-N \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Explain my lab results."}
],
"stream": true,
"patient_id": "patient-123"
}'

Conversation Threading

MedBackend automatically manages conversation history using FHIR Communication resources. When you provide a conversation_id, the system:

  1. Loads all previous messages in the conversation
  2. Includes them in the context sent to the LLM
  3. Links new messages using partOf and inResponseTo references

Starting a New Conversation

Omit conversation_id to start a new conversation. The response will include the new conversation_id:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Save this for continuing the conversation
conversation_id = response.conversation_id

Continuing a Conversation

Include the conversation_id to continue:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What did I ask about earlier?"}],
    extra_body={"conversation_id": conversation_id}
)
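
Putting the two together, a multi-turn exchange only needs to capture the conversation_id from the first response and send it back on every later call; because MedBackend loads the earlier messages server-side, each request carries only the new user message. A minimal sketch, reusing the client from the integration examples and the conversation_id attribute shown above:

# Sketch: carry the conversation_id across turns so MedBackend threads the messages.
conversation_id = None

for question in [
    "What medications am I currently taking?",
    "Are any of them known to interact with ibuprofen?",
]:
    extra = {"patient_id": "patient-123"}
    if conversation_id:
        extra["conversation_id"] = conversation_id

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
        extra_body=extra,
    )
    # MedBackend returns the conversation_id with every response (see above)
    conversation_id = response.conversation_id
    print(response.choices[0].message.content)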

FHIR Communication Storage

All messages are stored as FHIR Communication resources:

User Prompt (ai-prompt)

{
  "resourceType": "Communication",
  "id": "comm-prompt-123",
  "status": "completed",
  "category": [{
    "coding": [{
      "system": "http://medbackend.com/fhir/communication-category",
      "code": "ai-prompt",
      "display": "AI Prompt"
    }]
  }],
  "subject": {
    "reference": "Patient/patient-123"
  },
  "sender": {
    "reference": "Practitioner/user-456"
  },
  "sent": "2025-01-13T10:30:00Z",
  "payload": [{
    "contentString": "What medications am I currently taking?"
  }],
  "partOf": [{
    "reference": "Communication/conversation-root"
  }]
}

AI Response (ai-response)

{
  "resourceType": "Communication",
  "id": "comm-response-789",
  "status": "completed",
  "category": [{
    "coding": [{
      "system": "http://medbackend.com/fhir/communication-category",
      "code": "ai-response",
      "display": "AI Response"
    }]
  }],
  "subject": {
    "reference": "Patient/patient-123"
  },
  "sender": {
    "reference": "Device/ai-agent-default"
  },
  "sent": "2025-01-13T10:30:05Z",
  "payload": [{
    "contentString": "Based on your records, you are currently taking..."
  }],
  "partOf": [{
    "reference": "Communication/conversation-root"
  }],
  "inResponseTo": [{
    "reference": "Communication/comm-prompt-123"
  }]
}
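
Because every turn is an ordinary Communication resource, a conversation can be read back with a standard FHIR search on the part-of reference. A minimal sketch, assuming your deployment exposes a FHIR REST endpoint (the /fhir base path and the auth headers below are placeholders):

import httpx

# Sketch: read a conversation back as FHIR Communication resources.
# FHIR base path and headers are placeholders; adjust for your deployment.
FHIR_BASE = "https://your-medbackend.com/fhir"
headers = {
    "Authorization": "Bearer YOUR_JWT_TOKEN",
    "X-Project-ID": "your-project-id",
}

bundle = httpx.get(
    f"{FHIR_BASE}/Communication",
    params={
        "part-of": "Communication/conversation-root",  # the conversation thread
        "_sort": "sent",                               # chronological order
    },
    headers=headers,
).json()

for entry in bundle.get("entry", []):
    comm = entry["resource"]
    category = comm["category"][0]["coding"][0]["code"]  # ai-prompt / ai-response
    print(f"[{category}] {comm['payload'][0]['contentString']}")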

Configuration

Environment Variables

Configure the LLM provider in your MedBackend settings:

Variable       Description                      Example
AI_PROVIDER    LLM provider                     openai, azure_openai, anthropic, ollama
AI_MODEL       Model identifier                 gpt-4o, claude-3-5-sonnet, llama3
AI_API_KEY     Provider API key                 sk-...
AI_API_BASE    Custom API base URL (optional)   https://your-azure.openai.azure.com
AI_MAX_TOKENS  Default max tokens               4096

LiteLLM Model Strings

MedBackend uses LiteLLM for multi-provider support. Model strings follow the format provider/model:

Provider      Model String Example
OpenAI        openai/gpt-4o
Azure OpenAI  azure/gpt-4o-deployment
Anthropic     anthropic/claude-3-5-sonnet-20241022
Ollama        ollama/llama3
AWS Bedrock   bedrock/anthropic.claude-3-sonnet
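
One way to pick a provider per request is to list the configured models (their ids are LiteLLM strings in the format above) and pass one of them as the model override. A hedged sketch, reusing the client from the integration examples; whether your deployment accepts arbitrary provider-prefixed strings or only the models it has credentials for depends on your server configuration.

# Sketch: select a configured model and use it as the per-request override.
available = [m.id for m in client.models.list()]  # e.g. ["openai/gpt-4o"]

response = client.chat.completions.create(
    model=available[0],  # LiteLLM provider/model string
    messages=[{"role": "user", "content": "Summarize my latest visit."}],
    extra_body={"patient_id": "patient-123"},
)
print(response.choices[0].message.content)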

Error Handling

The API returns standard HTTP status codes with detailed error messages:

Status  Description
400     Bad Request - Invalid request body or parameters
401     Unauthorized - Invalid or missing JWT token
403     Forbidden - RBAC permission denied
404     Not Found - Conversation or resource not found
429     Too Many Requests - Rate limit exceeded
500     Internal Server Error - LLM or server error

Error Response Format:

{
  "error": {
    "message": "Invalid conversation_id: conversation not found",
    "type": "invalid_request_error",
    "code": "conversation_not_found"
  }
}
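
With the official OpenAI SDKs, these statuses surface as the SDK's usual exception types, so they can be handled in the ordinary way. A minimal sketch (the exception classes come from the openai Python package, not from MedBackend, and the client is the one configured in the integration examples):

import openai

# Sketch: map the status codes above onto the OpenAI SDK's exception types.
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What are my allergies?"}],
        extra_body={"conversation_id": "conv-001"},
    )
except openai.AuthenticationError:      # 401 - invalid or missing JWT
    raise
except openai.PermissionDeniedError:    # 403 - RBAC permission denied
    raise
except openai.NotFoundError as err:     # 404 - e.g. conversation_not_found
    print(err.body)                     # parsed error payload, when available
except openai.RateLimitError:           # 429 - back off and retry
    raise
except openai.APIStatusError as err:    # any other non-2xx response
    print(err.status_code, err.message)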

Best Practices

Security

  • Always use HTTPS in production
  • Rotate JWT tokens regularly
  • Use patient-scoped queries when accessing patient data
  • Review audit logs for AI interactions

Performance

  • Use streaming for long responses to improve perceived latency
  • Batch related queries when possible
  • Consider conversation length - very long histories may impact response time

Conversation Management

  • Store conversation_id for multi-turn conversations
  • Use descriptive system prompts for consistent AI behavior
  • Consider implementing conversation summarization for very long threads

RBAC Integration

  • The AI agent respects your existing RBAC rules
  • Patient queries are scoped to the authenticated user's permissions
  • Configure appropriate validation rules for Communication resources

Troubleshooting

Common Issues

"Unauthorized" errors:

  • Verify your JWT token is valid and not expired
  • Ensure the X-Project-ID header is included
  • Check that your user has the required permissions

"Conversation not found" errors:

  • Verify the conversation_id exists and belongs to your project
  • Ensure the conversation hasn't been deleted

Streaming not working:

  • Check that your client supports SSE
  • Verify there are no proxies buffering the response
  • Use the Accept: text/event-stream header (see the sketch below)
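
If a proxy or client library needs the header set explicitly, the OpenAI Python SDK can attach it per request via extra_headers; a small sketch using the client from the integration examples:

# Sketch: explicitly request an SSE response for a streaming call.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain my lab results."}],
    stream=True,
    extra_headers={"Accept": "text/event-stream"},  # per-request header override
    extra_body={"patient_id": "patient-123"},
)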

Slow responses:

  • Check LLM provider status
  • Reduce conversation history length
  • Consider using a faster model for simple queries