
SageMaker Async Inference Now Supports Inline Payloads

Dan Ferguson · June 17, 2026


Summary

Amazon SageMaker AI Async Inference now allows customers to send inference payloads directly within the request body of the InvokeEndpointAsync API. This enhancement eliminates the previous requirement of uploading input data to Amazon S3 before each invocation, streamlining the process.

Amazon has added support for inline request payloads to its SageMaker AI Async Inference service. Previously, users had to upload their input data to an Amazon S3 bucket before starting an asynchronous inference request. Input data can now be included directly in the request body of the InvokeEndpointAsync API call. The update aims to reduce the complexity and overhead of managing temporary S3 storage for inference inputs. Inline payloads are most useful for smaller inputs or for scenarios where direct API interaction is preferred, and they shorten the path from deploying a model on SageMaker to testing it.
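For context, the two-step flow that the announcement removes can be sketched as follows. This is a minimal illustration, not code from the original post: the function name `invoke_via_s3`, the `async-inputs/` key layout, and the bucket and endpoint names are hypothetical. The clients are passed in as arguments; with boto3 they would typically be `boto3.client("s3")` and `boto3.client("sagemaker-runtime")`.

```python
import uuid


def invoke_via_s3(s3_client, runtime_client, endpoint_name, bucket, payload,
                  content_type="application/json"):
    """Upload the payload to S3, then start an async inference that points at it."""
    # Hypothetical key layout; any unique key in a bucket the endpoint can read works.
    key = f"async-inputs/{uuid.uuid4()}.json"

    # Step 1: stage the input in S3 (the step inline payloads make unnecessary).
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload,
                         ContentType=content_type)

    # Step 2: start the async invocation, referencing the staged object.
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=f"s3://{bucket}/{key}",
        ContentType=content_type,
    )

    # The call returns before inference finishes; the result is written to
    # the S3 location reported here.
    return response["OutputLocation"]
```

Every invocation in this flow costs one extra S3 write and leaves a temporary object to clean up, which is the overhead the inline option is meant to avoid.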

Why it matters

This update simplifies and accelerates the deployment and testing of AI models on Amazon SageMaker, reducing operational overhead and improving developer productivity for AI engineering teams.

How to implement this in your domain

1. Update your SageMaker Async Inference client to use the latest API version.
2. Modify your inference invocation code to include payloads directly in the request body, bypassing S3 uploads.
3. Evaluate existing workflows to identify opportunities to leverage inline payloads for efficiency gains.
4. Test the performance and reliability of inline payloads with your specific AI models and data sizes.
5. Train your AI engineering team on the new streamlined inference process.
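As a rough sketch of steps 2 through 4, the helper below builds the keyword arguments for an `invoke_endpoint_async` call and chooses between an inline payload and the existing S3 path based on payload size. Two things here are assumptions and should be checked against the current SageMaker documentation: the announcement excerpt names neither the request parameter that carries the inline payload (the sketch uses `Body` as a placeholder) nor any size limit (the 64 KiB threshold is invented for illustration). `EndpointName`, `ContentType`, and `InputLocation` are existing parameters of the API.

```python
# Hypothetical values: the source states neither a size limit nor a parameter name.
ASSUMED_INLINE_LIMIT_BYTES = 64 * 1024
ASSUMED_INLINE_PARAM = "Body"


def build_async_request(endpoint_name, payload, s3_fallback_uri,
                        content_type="application/json",
                        inline_limit=ASSUMED_INLINE_LIMIT_BYTES):
    """Return keyword arguments for an invoke_endpoint_async call.

    Payloads at or under the limit are sent inline (assumed parameter name).
    Larger payloads keep using InputLocation, which must point at an object
    the caller has already uploaded to S3.
    """
    request = {"EndpointName": endpoint_name, "ContentType": content_type}
    if len(payload) <= inline_limit:
        request[ASSUMED_INLINE_PARAM] = payload
    else:
        request["InputLocation"] = s3_fallback_uri
    return request
```

Keeping the decision in one small function makes it easy to adjust the threshold while testing different data sizes, and to retain the S3 path for inputs that are too large to send inline.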

Who benefits

AI/ML Development, Cloud Computing, Software Engineering, Data Science

Key takeaways

Amazon SageMaker Async Inference now supports inline request payloads.
This eliminates the need for prior S3 uploads for inference data.
The update streamlines AI model deployment and testing workflows.
It improves efficiency and reduces operational complexity for developers.

Original post by Dan Ferguson

"Today, we’re announcing inline payload support for Amazon SageMaker AI Async Inference. Customers can now send inference payloads directly in the request body of the InvokeEndpointAsync API, removing the need to upload input data to Amazon Simple Storage Service (Amazon S3) befor…"


Originally posted by Dan Ferguson on X
