Latitude.sh AI Inference provides access to AI models through a unified, OpenAI-compatible API. Run inference on text generation, vision, code, and reasoning models from providers like Qwen, Meta, Google, Mistral, DeepSeek, and Moonshot AI.
AI Inference is currently available to select customers. Contact support to request access.

Features

  • API Keys: Generate and manage keys to authenticate your API requests
  • Models: Browse available models with pricing, context length, and capabilities
  • Playground: Test models interactively before integrating them
  • Overview: Monitor usage including requests, tokens, and costs

Managing API keys

1. Access API keys: Log in to the dashboard, select a project, and navigate to AI > API Keys.
2. Create a key: Click Create Key and provide a name for your key.
3. Copy your key: Copy the generated API key immediately. For security, the full key is only shown once.
API keys are prefixed with lat_ and can be deleted at any time from the API Keys page.
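Since every key shares the lat_ prefix, a quick client-side check can catch a truncated or mispasted key before a request fails. A minimal sketch; only the lat_ prefix is documented, so the non-empty-suffix check is a defensive assumption:

```python
def looks_like_latitude_key(key: str) -> bool:
    """Rough sanity check for a Latitude.sh AI Inference API key.

    Only the lat_ prefix is documented; requiring a non-empty
    suffix is an assumption, not a documented format rule.
    """
    return key.startswith("lat_") and len(key) > len("lat_")

print(looks_like_latitude_key("lat_abc123"))  # True
print(looks_like_latitude_key("sk-abc123"))   # False
```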

Browsing models

1. Access the models page: Log in to the dashboard, select a project, and navigate to AI > Models.
2. Filter models: Use the search bar to find models by name, or filter by provider or capability (text, vision, code, reasoning).
3. View model details: Each model shows its context length, input/output pricing per million tokens, and supported capabilities. Click a model to open a side panel with a ready-to-use curl command.
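Because pricing is quoted per million tokens, a request's cost can be estimated directly from its token counts. A sketch of that arithmetic; the prices below are placeholders, not real rates for any listed model:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate a request's cost in dollars from per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: 1,200 input and 300 output tokens at hypothetical
# rates of $0.50 (input) and $1.50 (output) per million tokens.
cost = estimate_cost(1200, 300, 0.50, 1.50)
print(f"${cost:.6f}")  # $0.001050
```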

Using the playground

1. Access the playground: Log in to the dashboard, select a project, and navigate to AI > Playground.
2. Configure your request: Select a model from the dropdown in the chat header. In the sidebar, enter your API key and optionally adjust the system prompt, temperature, and max tokens.
3. Send a message: Type your prompt and send it to see the model's response in real time.
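The playground settings map onto standard OpenAI-style chat-completion parameters, so a playground session can be reproduced in code. A sketch of that mapping; the default values and model ID here are illustrative, not documented defaults:

```python
def playground_to_request(model: str, system_prompt: str, user_prompt: str,
                          temperature: float = 0.7, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions payload mirroring the
    playground's model, system prompt, temperature, and max-tokens
    settings. The defaults are illustrative assumptions."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = playground_to_request(
    "moonshotai/Kimi-K2.5",
    system_prompt="You are a concise assistant.",
    user_prompt="Hello!",
)
print(payload["messages"][0]["role"])  # system
```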

Making API requests

The AI Inference API is available at https://api.lsh.ai and is fully compatible with the OpenAI SDK. You can also make direct HTTP requests.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lsh.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Replace YOUR_API_KEY with your API key. You can find available model IDs on the Models page.
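For a direct HTTP request, the OpenAI-compatible shape means you only need a Bearer token header and a JSON body. A standard-library sketch that builds (but does not send) such a request; the /v1/chat/completions path is assumed from the OpenAI convention:

```python
import json
import urllib.request

def build_chat_request(api_key: str, model: str, content: str) -> urllib.request.Request:
    """Build a chat-completions HTTP request without sending it.
    The /v1/chat/completions path follows the OpenAI convention."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode()
    return urllib.request.Request(
        "https://api.lsh.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("lat_example", "moonshotai/Kimi-K2.5", "Hello!")
print(req.get_header("Authorization"))  # Bearer lat_example
# To actually send it: response = urllib.request.urlopen(req)
```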

Viewing metrics

The Overview page shows your AI Inference usage for the last 30 days:
  • Total Requests: Number of API requests made
  • Tokens Used: Total input and output tokens consumed
  • Total Cost: Cumulative cost of all API usage
Bar charts display requests and tokens by model, and a table shows usage metrics and costs for each model you’ve used.
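The per-model breakdown above amounts to grouping raw usage records by model ID and summing requests, tokens, and cost. A sketch of that aggregation with made-up records; real figures come from the Overview page:

```python
from collections import defaultdict

def usage_by_model(records: list[dict]) -> dict:
    """Aggregate request counts, token totals, and cost per model,
    mirroring the per-model table on the Overview page."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost": 0.0})
    for r in records:
        t = totals[r["model"]]
        t["requests"] += 1
        t["tokens"] += r["input_tokens"] + r["output_tokens"]
        t["cost"] += r["cost"]
    return dict(totals)

# Hypothetical usage records, for illustration only.
records = [
    {"model": "moonshotai/Kimi-K2.5", "input_tokens": 1200, "output_tokens": 300, "cost": 0.0010},
    {"model": "moonshotai/Kimi-K2.5", "input_tokens": 800, "output_tokens": 200, "cost": 0.0008},
]
print(usage_by_model(records)["moonshotai/Kimi-K2.5"]["tokens"])  # 2500
```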