Architecting in Azure

Explore effective Azure architecture strategies, learn what works and what breaks, and understand the reasons behind success and failure in real-world applications.

Token-Level Chargeback for Azure OpenAI Batch API Using APIM, Event Hub, and Durable Functions

Batch APIs are fantastic for high-volume, asynchronous workloads—but they come with a challenge every architect eventually runs into:

How do you track and charge back token consumption when thousands of completions are bundled into a single batch job?

If you’re building enterprise-grade AI systems in Azure, you know the drill. Finance wants transparency. Product teams want usage attribution. Engineering wants something that doesn’t require duct tape and a prayer.

That’s exactly where the [PTUBatchChargeback](https://github.com/sbray779/PTUBatchChargeback) repository comes in.

In this post, we’ll walk through how this solution works and how you can deploy it into your own Azure OpenAI batch workflows to get clean, auditable chargeback data—without rewriting your pipeline.

Why Chargeback Is Hard with Azure OpenAI Batch

The [Azure OpenAI Global Batch API](https://learn.microsoft.com/azure/ai-services/openai/how-to/batch) is optimized for throughput, not accounting. You submit a JSONL file with hundreds or thousands of chat completion requests, and the API processes them asynchronously. When the job completes, you get a JSONL output file with results—but no built-in mechanism to attribute token consumption back to the team or product that submitted the work.

If your organization needs to track:

– Which APIM subscription (i.e., which team or product) submitted which batch

– How many prompt and completion tokens were consumed

– How to allocate cost per batch across business units

– How to feed that data into KQL dashboards or FinOps pipelines

…you need a pattern that bridges the gap between batch completion and cost attribution.

What the Repository Does

The repository ([github.com/sbray779/PTUBatchChargeback](https://github.com/sbray779/PTUBatchChargeback)) contains an end-to-end implementation of a chargeback tracking pipeline designed specifically for Azure OpenAI Global Batch jobs routed through API Management (APIM).

At a high level, it gives you:

1. Automatic detection of completed batch jobs

An APIM policy fragment intercepts `GET /batches/{id}` responses. When a batch returns `status: completed`, the policy emits an event to Event Hub containing:

– Batch ID

– APIM Subscription ID (identifies the consuming team/product)

– APIM Product ID

– User ID

– Output file ID and request counts

2. Reliable, deduplicated event processing

An Azure Durable Function consumes events from Event Hub and uses the fan-out/fan-in pattern to process multiple batches in parallel. Table Storage provides atomic deduplication—since APIM emits an event on every poll of a completed batch, the function ensures each batch is processed exactly once.

3. Streaming token extraction

The function streams the batch output JSONL file through APIM in 8 KB chunks, extracting token usage (`prompt_tokens`, `completion_tokens`, `total_tokens`) from each line without buffering the full response body in memory. This keeps the function lightweight even for large batch jobs.

4. Aggregated chargeback records in Log Analytics

Token usage is aggregated per batch and ingested into a custom Log Analytics table (`BatchTokenUsage_CL`) via the Azure Monitor Ingestion API (DCE/DCR). The repo includes KQL queries for chargeback reports, daily consumption trends, per-model breakdowns, and monthly summaries.

Architecture

How the Solution Works

Let’s walk through the workflow step by step.

1. Batch Job Completes — APIM Emits an Event

When a client polls `GET /batches/{batch-id}` through APIM and the response contains `"status": "completed"`, the APIM policy fragment fires. It constructs a JSON event from two sources:

**From APIM context** (the key to chargeback):

– `subscriptionId` — the APIM subscription that submitted the job

– `productId` — the APIM product associated with that subscription

– `userId` — the authenticated user

**From the batch response body:**

– `batchId`, `outputFileId`, `errorFileId`, `requestCounts`, `model`

This event is sent to Event Hub via a fire-and-forget `send-one-way-request`. Two policy variants are provided:

**`batch-completion-policy.xml`** — uses a shared access key (simpler setup)

**`batch-completion-policy-msi.xml`** — uses APIM’s Managed Identity (recommended for production)

> **Important:** The `productId` field is populated from `context.Product?.Id`. This requires the APIM subscription to be associated with a Product. Standalone subscriptions will produce `"unknown"`.
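To make the event shape concrete, here is a minimal Python sketch of how a consumer might decode it. The field names come from the lists above; the `BatchCompletionEvent` class, the `parse_event` helper, the sample values, and the shape of `requestCounts` are assumptions for illustration, not code from the repository.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatchCompletionEvent:
    """Hypothetical typed view of the event the APIM policy emits."""
    batch_id: str
    subscription_id: str
    product_id: str
    user_id: str
    output_file_id: Optional[str]
    error_file_id: Optional[str]
    request_counts: dict
    model: str


def parse_event(raw: bytes) -> BatchCompletionEvent:
    """Decode one Event Hub message body into a typed event."""
    data = json.loads(raw)
    return BatchCompletionEvent(
        batch_id=data["batchId"],  # required: nothing to process without it
        subscription_id=data.get("subscriptionId", "unknown"),
        product_id=data.get("productId", "unknown"),
        user_id=data.get("userId", "unknown"),
        output_file_id=data.get("outputFileId"),
        error_file_id=data.get("errorFileId"),
        request_counts=data.get("requestCounts") or {},
        model=data.get("model", "unknown"),
    )


# Sample payload with made-up values; the requestCounts keys are an assumption.
sample = json.dumps({
    "batchId": "batch_123",
    "subscriptionId": "team-a-sub",
    "productId": "unknown",
    "userId": "user-1",
    "outputFileId": "file-out-1",
    "errorFileId": None,
    "requestCounts": {"total": 3, "completed": 3, "failed": 0},
    "model": "gpt-4o-mini-2024-07-18",
}).encode("utf-8")

event = parse_event(sample)
print(event.subscription_id, event.output_file_id)
```

Defaulting the identity fields to `"unknown"` mirrors what the policy does for standalone subscriptions, so a missing field never blocks token accounting.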

2. Durable Function Picks Up the Event

The `BatchCompletionTrigger` function consumes events from the `function-processor` consumer group, filters for `status: completed`, and starts the `BatchProcessingOrchestrator`.

The orchestrator fans out one `ProcessSingleBatch` activity per batch event, enabling parallel processing when multiple batches complete simultaneously.
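The filter-then-fan-out shape can be sketched in plain Python. This is an analogue only: it uses a thread pool from the standard library in place of the Durable Functions orchestration API, and `process_single_batch` here is a placeholder rather than the repository's activity. It also assumes each event carries a `status` field.

```python
from concurrent.futures import ThreadPoolExecutor


def process_single_batch(event: dict) -> dict:
    """Placeholder activity; the real one would stream the output file."""
    return {"batchId": event["batchId"], "processed": True}


def orchestrate(events: list[dict]) -> list[dict]:
    # Filter: only completed batches are worth processing.
    completed = [e for e in events if e.get("status") == "completed"]
    if not completed:
        return []
    # Fan out: one task per batch event.
    # Fan in: map() collects the results in input order.
    with ThreadPoolExecutor(max_workers=min(8, len(completed))) as pool:
        return list(pool.map(process_single_batch, completed))


events = [
    {"batchId": "batch_1", "status": "completed"},
    {"batchId": "batch_2", "status": "in_progress"},
    {"batchId": "batch_3", "status": "completed"},
]
results = orchestrate(events)
print([r["batchId"] for r in results])
```

In a real orchestrator the fan-in step is also where you would collect per-batch failures, so one bad output file does not discard the results of the others.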

3. Deduplication via Table Storage

Because APIM emits an event on *every* poll of a completed batch (not just the first), deduplication is critical. The function uses an atomic `create_entity()` call to Table Storage. If the insert fails with `ResourceExistsError`, another invocation already claimed that batch, and processing is skipped.
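The claim-by-insert idea looks roughly like this. To keep the sketch runnable without Azure dependencies, `InMemoryTable` and the local `ResourceExistsError` are stand-ins for the Table Storage client and `azure.core.exceptions.ResourceExistsError`; the `PartitionKey` value and the `try_claim` helper are hypothetical.

```python
class ResourceExistsError(Exception):
    """Stand-in for azure.core.exceptions.ResourceExistsError."""


class InMemoryTable:
    """Stand-in for a Table Storage client.

    The real service makes the insert atomic across processes; this
    in-memory version only imitates the "fail if the key exists" contract.
    """

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], dict] = {}

    def create_entity(self, entity: dict) -> None:
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._rows:
            raise ResourceExistsError(f"entity {key} already exists")
        self._rows[key] = entity


def try_claim(table, batch_id: str) -> bool:
    """Return True if this caller won the right to process the batch."""
    try:
        # Hypothetical key layout: one fixed partition, batch ID as row key.
        table.create_entity({"PartitionKey": "batch", "RowKey": batch_id})
        return True
    except ResourceExistsError:
        # Another invocation already claimed this batch; skip it.
        return False


table = InMemoryTable()
first = try_claim(table, "batch_123")
second = try_claim(table, "batch_123")
print(first, second)
```

The important design point is that the insert itself is the lock. A read-then-write check would leave a window in which two invocations both see "not processed yet".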

4. Stream and Extract Token Usage

The function calls APIM to retrieve the batch metadata (`GET /batches/{id}`) and then streams the output JSONL file (`GET /files/{output_file_id}/content`) in 8 KB chunks.

As each JSONL line is parsed, only the `usage` fields are extracted:

```json
{
  "prompt_tokens": 127,
  "completion_tokens": 245,
  "total_tokens": 372
}
```

The full completion text is never accumulated in memory—only the small usage record is kept. This is important for large batches where the output file could be many megabytes.
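A minimal sketch of the chunked parsing follows. It assumes each output line nests usage under `response.body.usage`, and that chunks arrive as an iterable of `bytes` (for example from `response.iter_content(chunk_size=8192)` in the `requests` library). The `iter_usage` and `_usage_from_line` names are illustrative, not taken from the repository.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_usage(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield the usage dict from each JSONL line, holding one partial line at most."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            usage = _usage_from_line(line)
            if usage is not None:
                yield usage
    # The file may not end with a newline, so flush the final partial line.
    usage = _usage_from_line(buffer)
    if usage is not None:
        yield usage


def _usage_from_line(line: bytes) -> Optional[dict]:
    line = line.strip()
    if not line:
        return None
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip malformed lines rather than fail the whole batch
    if not isinstance(record, dict):
        return None
    # Assumed layout: {"custom_id": ..., "response": {"body": {"usage": {...}}}}
    body = (record.get("response") or {}).get("body") or {}
    usage = body.get("usage")
    if not isinstance(usage, dict):
        return None  # e.g. a failed request with no usage block
    return {
        "prompt_tokens": int(usage.get("prompt_tokens", 0)),
        "completion_tokens": int(usage.get("completion_tokens", 0)),
        "total_tokens": int(usage.get("total_tokens", 0)),
    }


def _line(custom_id: str, body: dict) -> str:
    return json.dumps({"custom_id": custom_id, "response": {"status_code": 200, "body": body}})


payload = "\n".join([
    _line("req-1", {"usage": {"prompt_tokens": 127, "completion_tokens": 245, "total_tokens": 372}}),
    _line("req-2", {"usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}}),
    _line("req-3", {"error": {"message": "made-up failure"}}),
]).encode("utf-8")

# Tiny 16-byte chunks force line breaks to fall mid-chunk, as they would at 8 KB.
chunks = [payload[i:i + 16] for i in range(0, len(payload), 16)]
usages = list(iter_usage(chunks))
print(len(usages), sum(u["total_tokens"] for u in usages))
```

Splitting on the newline byte before decoding is safe for UTF-8, because the byte `0x0A` never appears inside a multi-byte character.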

5. Aggregate and Ingest

Token usage is aggregated across all requests in the batch into a single record:

| Field | Description |
|-------|-------------|
| `BatchId` | Batch job identifier |
| `SubscriptionId` | APIM subscription (cost attribution key) |
| `ProductId` | APIM product |
| `PromptTokens` | Total prompt tokens across all requests |
| `CompletionTokens` | Total completion tokens |
| `TotalTokens` | Combined total |
| `ModelName` | Model used (e.g., `gpt-4o-mini-2024-07-18`) |
| `RequestCount` | Number of individual requests in the batch |

This record is ingested into `BatchTokenUsage_CL` via the Azure Monitor Ingestion API using Managed Identity—no shared keys.
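The aggregation step might look something like the sketch below. The output keys match the table above; the `aggregate_batch` helper, the event field names on the input side, and the added `TimeGenerated` timestamp are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Iterable


def aggregate_batch(event: dict, usages: Iterable[dict]) -> dict:
    """Fold per-request usage dicts into one chargeback record for the batch."""
    prompt = completion = total = count = 0
    for usage in usages:
        prompt += usage.get("prompt_tokens", 0)
        completion += usage.get("completion_tokens", 0)
        total += usage.get("total_tokens", 0)
        count += 1
    return {
        # Assumption: the ingestion stream expects an ISO 8601 timestamp column.
        "TimeGenerated": datetime.now(timezone.utc).isoformat(),
        "BatchId": event["batchId"],
        "SubscriptionId": event.get("subscriptionId", "unknown"),
        "ProductId": event.get("productId", "unknown"),
        "PromptTokens": prompt,
        "CompletionTokens": completion,
        "TotalTokens": total,
        "ModelName": event.get("model", "unknown"),
        "RequestCount": count,
    }


event = {
    "batchId": "batch_123",
    "subscriptionId": "team-a-sub",
    "productId": "ai-batch",
    "model": "gpt-4o-mini-2024-07-18",
}
usages = [
    {"prompt_tokens": 127, "completion_tokens": 245, "total_tokens": 372},
    {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
]
record = aggregate_batch(event, usages)
print(record["TotalTokens"], record["RequestCount"])
```

Because the function accepts any iterable, it can consume a generator that is still reading the output file, so the running totals are the only state held in memory.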

Querying for Chargeback

The repository includes ready-to-use KQL queries in `queries/chargeback-queries.kql`. Here are a few examples:

> **Note:** Column names may include type suffixes (`_s`, `_d`) depending on your table configuration. Adjust the column names in these queries to match your deployed schema.

**Token usage by subscription (chargeback report):**

```kql
BatchTokenUsage_CL
| where TimeGenerated > ago(30d)
| summarize
    TotalPromptTokens = sum(PromptTokens),
    TotalCompletionTokens = sum(CompletionTokens),
    TotalTokens = sum(TotalTokens),
    BatchCount = dcount(BatchId)
  by SubscriptionId
| order by TotalTokens desc
```

**Monthly chargeback summary:**

```kql
BatchTokenUsage_CL
| where TimeGenerated > ago(90d)
| extend Month = startofmonth(TimeGenerated)
| summarize
    TotalTokens = sum(TotalTokens),
    BatchCount = dcount(BatchId)
  by Month, SubscriptionId, ProductId
| order by Month desc, TotalTokens desc
```

**Token usage by model:**

```kql
BatchTokenUsage_CL
| where TimeGenerated > ago(30d)
| summarize TotalTokens = sum(TotalTokens), BatchCount = count() by ModelName
| order by TotalTokens desc
```

From Log Analytics, you can wire these into Power BI dashboards, Azure Workbooks, or export to external FinOps tools.
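Token counts still have to become currency before Finance can use them. The sketch below shows one way to do that arithmetic over exported query rows. The rates are placeholders, not real Azure prices, and the `allocate_cost` helper is hypothetical.

```python
from decimal import Decimal

# Hypothetical rates in USD per 1M tokens; substitute your contracted prices.
RATES_PER_MILLION = {
    "prompt": Decimal("0.075"),
    "completion": Decimal("0.30"),
}
ONE_MILLION = Decimal(1_000_000)


def allocate_cost(rows: list[dict]) -> dict[str, Decimal]:
    """Sum the token cost per APIM subscription across chargeback rows."""
    costs: dict[str, Decimal] = {}
    for row in rows:
        cost = (
            Decimal(row["PromptTokens"]) * RATES_PER_MILLION["prompt"]
            + Decimal(row["CompletionTokens"]) * RATES_PER_MILLION["completion"]
        ) / ONE_MILLION
        key = row["SubscriptionId"]
        costs[key] = costs.get(key, Decimal(0)) + cost
    return costs


rows = [
    {"SubscriptionId": "team-a", "PromptTokens": 2_000_000, "CompletionTokens": 1_000_000},
    {"SubscriptionId": "team-b", "PromptTokens": 1_000_000, "CompletionTokens": 0},
    {"SubscriptionId": "team-a", "PromptTokens": 0, "CompletionTokens": 1_000_000},
]
costs = allocate_cost(rows)
for subscription, cost in sorted(costs.items()):
    print(subscription, cost)
```

`Decimal` is used instead of `float` so that small per-batch amounts add up without binary rounding drift. If your rates differ by model, key the rate table on `ModelName` as well.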

Security Design

The solution follows zero-trust principles throughout:

| Component | Auth Method | Details |
|-----------|-------------|---------|
| Function → Event Hub | Managed Identity | `Azure Event Hubs Data Receiver` role, no SAS connection string |
| Function → Log Analytics | Managed Identity | `Monitoring Metrics Publisher` role on the DCR |
| Function → Key Vault | Managed Identity | `Key Vault Secrets User`, resolves `@Microsoft.KeyVault(…)` references |
| Function → APIM | Subscription key | Stored in Key Vault, resolved automatically at runtime |
| APIM → Event Hub (MSI variant) | Managed Identity | `Azure Event Hubs Data Sender` role |
| APIM → Event Hub (SAS variant) | Shared access key | Stored in Key Vault, referenced via APIM Named Value |

All role assignments are managed by Terraform.

A Note on the Batch API’s `metadata` Field

The Azure OpenAI Batch API supports a `metadata` property—a key-value map you can set when creating a batch:

```json
POST /openai/batches
{
  "input_file_id": "file-abc",
  "endpoint": "/v1/chat/completions",
  "completion_window": "24h",
  "metadata": {"cost_center": "engineering", "team": "platform"}
}
```

This metadata is returned in the batch object when you `GET /batches/{id}`. However, this solution does **not** currently extract the batch’s `metadata` field in the APIM policy. Cost attribution is derived from the APIM subscription and product context instead—which is often more reliable because it’s enforced at the API gateway level rather than relying on the caller to set metadata correctly.

If your organization needs to pass batch-level metadata through to chargeback records, you can extend the APIM policy to include `batchResp["metadata"]` in the Event Hub event payload.
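On the function side, an extended pipeline would then need to carry that metadata into the chargeback record. The sketch below assumes the event already contains a `metadata` object; the allow-list, the `Meta_` column prefix, and the `attach_metadata` helper are all hypothetical.

```python
# Hypothetical allow-list: only these caller-supplied keys reach the record.
ALLOWED_METADATA_KEYS = ("cost_center", "team")


def attach_metadata(record: dict, event: dict) -> dict:
    """Return a copy of the record with allow-listed batch metadata added."""
    metadata = event.get("metadata") or {}
    enriched = dict(record)
    for key in ALLOWED_METADATA_KEYS:
        value = metadata.get(key)
        # Caller-supplied values are untrusted, so coerce to str and default.
        enriched[f"Meta_{key}"] = str(value) if value is not None else "unset"
    return enriched


record = {"BatchId": "batch_123", "SubscriptionId": "team-a-sub", "TotalTokens": 387}
event = {
    "batchId": "batch_123",
    "metadata": {"cost_center": "engineering", "team": "platform", "note": "ignored"},
}
enriched = attach_metadata(record, event)
print(enriched["Meta_cost_center"], enriched["Meta_team"])
```

An allow-list keeps the Log Analytics schema stable. Arbitrary caller keys would otherwise require a new column, and a DCR update, every time someone invents a tag.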

Deployment

Prerequisites

– Azure CLI authenticated (`az login`)

– Terraform >= 1.5.0

– Azure Functions Core Tools v4

– Python 3.11+

1. Deploy Infrastructure

```bash
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your APIM details, resource group, etc.
terraform init && terraform apply
```

Terraform provisions: Event Hub namespace/hub/consumer group, Key Vault with secrets, Function App with Managed Identity, all RBAC role assignments, Log Analytics workspace with DCE/DCR/custom table, and Application Insights.

2. Apply the APIM Policy

Apply the policy fragment from `apim/batch-completion-policy-msi.xml` to your batch API GET operation in APIM. If using MSI auth, grant APIM’s managed identity the `Azure Event Hubs Data Sender` role on the Event Hub.

3. Deploy the Function App

```bash
cd function-app
func azure functionapp publish func-batch-chargeback-dev --python
```

When You Should Use This Pattern

This solution is a good fit when:

– You route Azure OpenAI Global Batch API calls through APIM

– You need to attribute token consumption to specific teams, products, or cost centers

– You want aggregated chargeback data in Log Analytics

– You need deduplication for reliable exactly-once processing

– You’re building a FinOps-aligned architecture for AI workloads

If you need **item-level** cost attribution (per individual request within a batch), you can extend the function to ingest per-line records instead of the batch aggregate.
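A per-request variant could emit one record per output line instead of one per batch. This sketch assumes each line carries a `custom_id` and nests usage under `response.body.usage`; the `per_request_records` helper and the `CustomId` column are illustrative names.

```python
import json
from typing import Iterable, Iterator


def per_request_records(event: dict, jsonl_lines: Iterable[str]) -> Iterator[dict]:
    """Yield one chargeback record per request in the batch output."""
    for line in jsonl_lines:
        if not line.strip():
            continue
        item = json.loads(line)
        # Assumed layout: {"custom_id": ..., "response": {"body": {"usage": {...}}}}
        body = (item.get("response") or {}).get("body") or {}
        usage = body.get("usage") or {}
        yield {
            "BatchId": event["batchId"],
            "SubscriptionId": event.get("subscriptionId", "unknown"),
            "CustomId": item.get("custom_id", ""),
            "PromptTokens": usage.get("prompt_tokens", 0),
            "CompletionTokens": usage.get("completion_tokens", 0),
            "TotalTokens": usage.get("total_tokens", 0),
        }


event = {"batchId": "batch_123", "subscriptionId": "team-a-sub"}
lines = [
    json.dumps({"custom_id": "req-1", "response": {"body": {"usage": {
        "prompt_tokens": 127, "completion_tokens": 245, "total_tokens": 372}}}}),
    "",
    json.dumps({"custom_id": "req-2", "response": {"body": {"usage": {
        "prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}}}}),
]
records = list(per_request_records(event, lines))
print([(r["CustomId"], r["TotalTokens"]) for r in records])
```

Keep in mind that this multiplies ingestion volume by the number of requests per batch, so it is worth uploading records in groups rather than one call per line.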

Final Thoughts

As organizations scale their use of Azure OpenAI batch processing, token-level cost attribution becomes essential. The PTUBatchChargeback repository gives you a production-ready pipeline that turns opaque batch completions into transparent, easy-to-query chargeback data. It uses APIM for cost identity, Event Hub for reliable message handling, Durable Functions for scalable processing, and Log Analytics for reporting.

It’s focused. It’s extensible. And it fits naturally into enterprise Azure architectures where APIM is already your front door for AI services.

**Get started:** [github.com/sbray779/PTUBatchChargeback](https://github.com/sbray779/PTUBatchChargeback)
