+### Integration with Aperture TypeScript SDK
+
+Import and set up the Aperture client:
+
+```typescript
+import { ApertureClient, Flow, FlowStatusEnum } from "@fluxninja/aperture-js";
+import grpc from "@grpc/grpc-js";
+
+// Create an Aperture client pointed at the local Aperture Agent
+const apertureClient = new ApertureClient({
+  address: "localhost:8080",
+  channelCredentials: grpc.credentials.createSsl(),
+});
+```
+
+Wrap the OpenAI API call with Aperture Client's `StartFlow` and `End` methods:
+
+```typescript
+const PRIORITIES: Record<string, number> = {
+  paid_user: 10000,
+  trial_user: 1000,
+  free_user: 100,
+}
+
+let flow: Flow | undefined = undefined
+
+if (this.apertureClient) {
+  // Estimate the request size in characters: system message, user message, and role names
+  const charCount =
+    this.systemMessage.length +
+    message.length +
+    String("system" + "user").length
+  const labels: Record<string, string> = {
+    api_key: CryptoES.SHA256(api.apiKey).toString(),
+    estimated_tokens: (
+      Math.ceil(charCount / 4) + responseTokens
+    ).toString(),
+    model_variant: modelVariant,
+    priority: String(PRIORITIES[userType]),
+  }
+
+  flow = await this.apertureClient.StartFlow("openai", {
+    labels: labels,
+    grpcCallOptions: {
+      deadline: Date.now() + 1200000, // 20-minute deadline for the gRPC call
+    },
+  })
+}
+
+// As we use Aperture as a queue, send the message regardless of whether it was accepted or rejected
+try {
+  const { data: chatCompletion, response: raw } = await api.chat.completions
+    .create({
+      model: modelVariant,
+      temperature: temperature,
+      top_p: topP,
+      max_tokens: responseTokens,
+      messages: messages,
+    })
+    .withResponse()
+    .catch(err => {
+      logger.error(`openai chat error: ${JSON.stringify(err)}`)
+      throw err
+    })
+  return chatCompletion.choices[0]?.message?.content ?? ""
+} catch (e) {
+  flow?.SetStatus(FlowStatusEnum.Error)
+  throw e // throw the error to be caught by the chat function
+} finally {
+  flow?.End()
+}
+```
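+
+In the snippet, the OpenAI call is made regardless of the scheduler's decision
+because Aperture is used as a queue: with a scheduler-based policy, `StartFlow`
+holds the request until it is admitted or the gRPC deadline expires. When
+Aperture is instead used as a plain gate, the usual pattern is to check the
+flow before doing the work. Below is a minimal, illustrative sketch of that
+alternative, assuming the same `apertureClient` as above and a flow object that
+exposes a `ShouldRun` check, as in the aperture-js examples:
+
+```typescript
+// Gate-style usage (illustrative sketch, not part of the queue-based example above)
+const flow = await apertureClient.StartFlow("openai", { labels: {} });
+try {
+  if (flow.ShouldRun()) {
+    // ...perform the OpenAI call here...
+  } else {
+    // The policy rejected the flow; skip the call and record the outcome
+    flow.SetStatus(FlowStatusEnum.Error);
+  }
+} finally {
+  flow.End();
+}
+```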
+
+Let's walk through the integration snippet: we create a control point named
+`openai` and set labels that the policy uses to identify and schedule each
+request. Before calling OpenAI, the request is gated by the Aperture Agent
+through the `StartFlow` method. To give Aperture more context, we attach the
+following labels to each request:
+
+- `model_variant`: This specifies the model variant being used (`gpt-4`,
+  `gpt-3.5-turbo`, or `gpt-3.5-turbo-16k`). Requests-per-minute and
+  tokens-per-minute rate limit policies are set individually for each model
+  variant.
+- `api_key`: A cryptographic hash of the OpenAI API key; rate limits are
+  enforced on a per-key basis.
+- `estimated_tokens`: As the tokens per minute quota limit is enforced based on
+ the
+ [estimated tokens for the completion request](https://platform.openai.com/docs/guides/rate-limits/reduce-the-max_tokens-to-match-the-size-of-your-completions),
+ we need to provide this number for each request to Aperture for metering.
+ Following OpenAI's
+ [guidance](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them),
+  we calculate `estimated_tokens` as `(character_count / 4) + max_tokens` (see
+  the worked example after this list). Note
+ that OpenAI's rate limiter doesn't tokenize the request using the model's
+ specific tokenizer but relies on a character count-based heuristic.
+- `priority`: Requests are ranked according to the priority number provided in
+  this label. In the example code, requests from `paid_user` take precedence
+  over those from `trial_user` and `free_user`.
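+
+To make the token-estimation heuristic concrete, here is a small, purely
+illustrative calculation (the numbers are made up for this example): a request
+with 2,000 characters of prompt text and `max_tokens` of 500 is metered as
+1,000 estimated tokens.
+
+```typescript
+// Illustrative numbers only; shows the heuristic used for the `estimated_tokens` label
+const characterCount = 2000; // characters across system message, user message, and role names
+const maxTokens = 500; // max_tokens requested for the completion
+const estimatedTokens = Math.ceil(characterCount / 4) + maxTokens; // 500 + 500 = 1000
+console.log(estimatedTokens.toString()); // "1000", the value sent as the `estimated_tokens` flow label
+```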
+
+### Policies
+
+To generate a policy using the quota scheduler blueprint, first generate a
+`values` file specific to the policy. The values file can be generated with the
+following command:
+
+```mdx-code-block
+