Changelog
Follow up on the latest improvements and updates.
With Prompt Caching, DigitalOcean automatically recognizes repeated token prefixes and currently offers an 80% discount on cached input tokens compared to standard input pricing across supported models. See the pricing page for current rates, applicable terms, and any pricing updates. The result is lower costs and faster time-to-first-token, with no changes to your application code. Caching applies automatically, and every API response includes a cached_tokens count so you can verify your savings in real time. It is available today across a broad set of leading open models, with more models coming soon.
Teams can now validate any model or inference router configuration on their own data before production. Run structured LLM-as-a-Judge evaluations across catalog models, fine-tuned models, BYOM imports, and router setups without stitching together a separate evaluation stack.
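As a rough illustration of the Prompt Caching discount described in this changelog, the sketch below estimates the input cost of a single request from its token counts. The function name, the per-million-token price, and the example token counts are hypothetical; the sketch also assumes that cached_tokens is counted as part of the total input tokens, which the announcement does not state. Check the pricing page for actual rates.

```python
def estimate_input_cost(input_tokens, cached_tokens, price_per_million, cached_discount=0.80):
    """Estimate input cost in dollars for one request.

    Assumptions (not confirmed by the changelog):
    - cached_tokens is a subset of input_tokens
    - price_per_million is a placeholder standard input rate
    - cached tokens are billed at (1 - cached_discount) of the standard rate
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    cached_rate = price_per_million * (1 - cached_discount)
    total = uncached_tokens * price_per_million + cached_tokens * cached_rate
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens, 8,000 of them served from cache,
# at a placeholder rate of $1.00 per million input tokens.
with_cache = estimate_input_cost(10_000, 8_000, price_per_million=1.00)
without_cache = estimate_input_cost(10_000, 0, price_per_million=1.00)
savings = 1 - with_cache / without_cache
print(f"with cache: ${with_cache:.4f}, without: ${without_cache:.4f}, saved: {savings:.0%}")
```

Under these assumptions, a request whose prompt is 80% cached costs roughly 64% less on input tokens than the same request with no cache hits, because only the cached portion receives the 80% discount.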
Claude Sonnet 5, Anthropic’s latest model for coding, autonomous agents, and professional work at scale, is now available through DigitalOcean Serverless Inference. Designed for production AI applications, Sonnet 5 can plan, use tools, and complete complex tasks while delivering near-Opus 4.8 performance on leading agentic benchmarks, including SWE-bench, BrowseComp, and OSWorld-Verified, at Sonnet pricing.
DOKS clusters can now authenticate users through an external OpenID Connect provider. Each cluster has its own independent configuration managed via doctl, so dev, staging, and production environments can each enforce distinct access policies. Token issuance and revocation are handled directly from the IdP, so deactivating a user there removes their cluster access without manual credential rotation.
GLM-5.1 and GLM-5.2, the latest open-source models from Z.ai for autonomous software engineering and repository-scale reasoning, are now available through DigitalOcean Inference Engine. GLM-5.1 is optimized for long-running agentic coding workflows and can sustain complex autonomous tasks for up to 8 hours, while GLM-5.2 introduces a 1M-token context window and dual reasoning modes for large-scale code analysis and refactoring. Both models support tool calling, structured output, and multilingual applications.
Simplify how AI applications and agents are built by embedding tool use directly into the Inference Engine and removing the need to stitch together separate APIs, credentials, and orchestration layers. You can enable models to use web search, web fetch, web mode, knowledge bases, MCP servers, and existing Anthropic/OpenAI tools directly within inference using your existing DigitalOcean access key.
Claude Fable 5 is now available through DigitalOcean Serverless Inference, bringing Anthropic's most advanced generally available model for autonomous knowledge work, coding, and complex agentic workflows. It delivers frontier reasoning, stronger first-shot accuracy, enhanced debugging and self-correction capabilities, advanced vision understanding, and enterprise-ready safety features.
NVIDIA Nemotron 3 Ultra, a 550B MoE open model built for long-running autonomous agents, is now available through the DigitalOcean Inference Engine. Designed for orchestration and complex reasoning across coding, deep research, and enterprise workflows, Nemotron 3 Ultra delivers up to 5x faster inference and up to 30% lower cost for agentic workloads while helping agents complete tasks faster and more efficiently in production environments.
Anthropic Claude Opus 4.8 is now available through DigitalOcean’s Serverless Inference. Built for high-reliability execution, Opus 4.8 delivers stronger instruction following, lower output variance, and more predictable behavior across complex workflows. With support for dynamic workflows, mid-conversation system updates, and high-speed inference modes, teams can run larger autonomous tasks with greater consistency and confidence.
DeepSeek-V4-Flash is now available through DigitalOcean’s Inference Engine, bringing high-throughput, cost-efficient intelligence for autonomous workflows, long-context reasoning, and multi-agent orchestration. Built on a large-scale Mixture-of-Experts architecture with a 1M-token context window and dynamic reasoning modes, it delivers strong production performance while maintaining inference efficiency at scale.