vllm.inputs.engine ¶
Schema and utilities for inputs to the engine client (LLMEngine/AsyncLLM).
DecoderEngineInput module-attribute ¶
DecoderEngineInput: TypeAlias = (
    TokensInput | MultiModalInput
)
A rendered DecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
DecoderOnlyEngineInput module-attribute ¶
DecoderOnlyEngineInput: TypeAlias = (
    TokensInput | EmbedsInput | MultiModalInput
)
A rendered DecoderOnlyPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
EncoderInput module-attribute ¶
EncoderInput: TypeAlias = (
    TokensInput | MultiModalEncDecInput
)
A rendered EncoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
EngineInput module-attribute ¶
EngineInput: TypeAlias = (
    DecoderOnlyEngineInput | EncoderDecoderInput
)
A rendered PromptType which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
MultiModalHashes module-attribute ¶
A dictionary containing per-item hashes for each modality.
MultiModalPlaceholders module-attribute ¶
A dictionary containing per-item placeholder ranges for each modality.
SingletonInput module-attribute ¶
SingletonInput: TypeAlias = (
    DecoderOnlyEngineInput | MultiModalEncDecInput
)
A rendered SingletonPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
EmbedsInput ¶
Bases: _InputOptions
Represents embeddings-based input to the engine.
Source code in vllm/inputs/engine.py
EncoderDecoderInput ¶
Bases: TypedDict
A rendered EncoderDecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
arrival_time instance-attribute ¶
arrival_time: NotRequired[float]
The time when the input was received (before rendering).
decoder_prompt instance-attribute ¶
decoder_prompt: DecoderEngineInput
The inputs for the decoder portion.
encoder_prompt instance-attribute ¶
encoder_prompt: EncoderInput
The inputs for the encoder portion.
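As a rough illustration of the shape described above, the sketch below mirrors the three documented fields with local stand-in types. It does not import vLLM: `StandInTokens`, `StandInEncoderDecoder`, and `is_encoder_decoder` are hypothetical names, and the key-based check is an assumption drawn only from the fields listed on this page.

```python
import time
from typing import TypedDict


class StandInTokens(TypedDict):
    """Hypothetical stand-in for a token-based singleton input."""
    prompt_token_ids: list[int]


class _EncDecRequired(TypedDict):
    # The two fields documented without NotRequired.
    encoder_prompt: StandInTokens
    decoder_prompt: StandInTokens


class StandInEncoderDecoder(_EncDecRequired, total=False):
    # total=False makes this key optional, like NotRequired[float].
    arrival_time: float


def is_encoder_decoder(inputs) -> bool:
    # Assumption: only the encoder-decoder variant carries an
    # "encoder_prompt" key, so its presence identifies that variant.
    return "encoder_prompt" in inputs


enc_dec: StandInEncoderDecoder = {
    "encoder_prompt": {"prompt_token_ids": [101, 7592, 102]},  # made-up IDs
    "decoder_prompt": {"prompt_token_ids": [0]},
    "arrival_time": time.time(),  # recorded before rendering
}

print(is_encoder_decoder(enc_dec))                    # encoder-decoder input
print(is_encoder_decoder(enc_dec["decoder_prompt"]))  # singleton input
```

Checking for a key rather than a class is used here because TypedDict values are plain dictionaries at runtime, so `isinstance` checks against the TypedDict class are not available.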
MultiModalEncDecInput ¶
Bases: MultiModalInput
Represents multi-modal input to the engine for encoder-decoder models.
Note
Even text-only encoder-decoder models are currently implemented as multi-modal models for convenience. (Example: https://github.com/vllm-project/bart-plugin)
MultiModalInput ¶
Bases: _InputOptions
Represents multi-modal input to the engine.
mm_kwargs instance-attribute ¶
Keyword arguments to be directly passed to the model after batching.
mm_placeholders instance-attribute ¶
mm_placeholders: MultiModalPlaceholders
For each modality, information about the placeholder tokens in prompt_token_ids.
prompt instance-attribute ¶
prompt: NotRequired[str]
The prompt text corresponding to the token IDs, if available.
prompt_token_ids instance-attribute ¶
The processed token IDs, which include placeholder tokens.
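To make the relationship between mm_placeholders and prompt_token_ids more concrete, here is a minimal sketch. It assumes each placeholder range can be described by a start offset and a length; the `Span` type, the `IMAGE_TOKEN` value, and the `placeholder_positions` helper are all hypothetical and are not taken from vLLM.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical placeholder range: start index and token count."""
    offset: int
    length: int


IMAGE_TOKEN = 32000  # made-up placeholder token ID

# Token IDs after processing: two runs of image placeholder tokens.
prompt_token_ids = [1, 32000, 32000, 32000, 15, 16, 32000, 32000, 2]

# Per-modality ranges pointing back into prompt_token_ids.
mm_placeholders = {"image": [Span(offset=1, length=3), Span(offset=6, length=2)]}


def placeholder_positions(placeholders, modality):
    """Expand every range of one modality into explicit token indices."""
    positions = []
    for span in placeholders.get(modality, []):
        positions.extend(range(span.offset, span.offset + span.length))
    return positions


positions = placeholder_positions(mm_placeholders, "image")
print(positions)
```

In this toy data, every index returned for the image modality points at an `IMAGE_TOKEN` entry in prompt_token_ids, which is the invariant the two fields are meant to maintain together.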
TokensInput ¶
Bases: _InputOptions
Represents token-based input to the engine.
_InputOptions ¶
Bases: TypedDict
Additional options available to all SingletonInput types.
arrival_time instance-attribute ¶
arrival_time: NotRequired[float]
The time when the input was received (before rendering).
cache_salt instance-attribute ¶
cache_salt: NotRequired[str]
Optional cache salt to be used for prefix caching.
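The sketch below shows one plausible way a salt can keep cached prefixes from being shared between requests: the salt is mixed into the key derived from the token IDs. This is an illustration of the general idea only. The `prefix_key` helper and the SHA-256 scheme are assumptions, not a description of how vLLM computes its cache keys.

```python
import hashlib


def prefix_key(token_ids, cache_salt=None):
    """Hypothetical cache key: hash of the optional salt plus the token IDs."""
    h = hashlib.sha256()
    if cache_salt is not None:
        # Mix the salt in first, with a separator so it cannot blend
        # into the token data that follows.
        h.update(cache_salt.encode("utf-8"))
        h.update(b"\x00")
    h.update(",".join(str(t) for t in token_ids).encode("utf-8"))
    return h.hexdigest()


tokens = [1, 15, 16, 2]
key_a = prefix_key(tokens, cache_salt="tenant-a")
key_a_again = prefix_key(tokens, cache_salt="tenant-a")
key_b = prefix_key(tokens, cache_salt="tenant-b")
key_unsalted = prefix_key(tokens)

print(key_a == key_a_again)  # same salt, same tokens: key is reusable
print(key_a == key_b)        # different salts: keys do not collide
```

Under this scheme, identical prompts reuse a cache entry only when their salts match, and unsalted requests get a key distinct from any salted one.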
_prepare_decoder_input_ids_for_generation ¶
_prepare_decoder_input_ids_for_generation(
    decoder_input_ids: list[int],
    decoder_start_token_id: int,
) -> list[int]
Prepare decoder_input_ids for generation with encoder-decoder models, according to GenerationMixin._prepare_decoder_input_ids_for_generation().
Source: https://github.com/huggingface/transformers/blob/v5.1.0/src/transformers/generation/utils.py
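For orientation, the following sketch shows the convention this kind of helper typically follows: decoding begins from the decoder start token, so the token is supplied when the list is empty and prepended when it is missing. The function name `prepare_decoder_input_ids` and the exact branching are assumptions based on the signature above, not a copy of the vLLM or Transformers source.

```python
def prepare_decoder_input_ids(
    decoder_input_ids: list[int],
    decoder_start_token_id: int,
) -> list[int]:
    """Hypothetical sketch: ensure the sequence begins with the start token."""
    if not decoder_input_ids:
        # No decoder prompt given: start from the start token alone.
        return [decoder_start_token_id]
    if decoder_input_ids[0] != decoder_start_token_id:
        # Start token missing: prepend it, leaving the input untouched.
        return [decoder_start_token_id] + decoder_input_ids
    # Already well-formed: return a copy so callers can mutate safely.
    return list(decoder_input_ids)


print(prepare_decoder_input_ids([], 2))
print(prepare_decoder_input_ids([5, 6], 2))
print(prepare_decoder_input_ids([2, 5], 2))
```

Returning a new list in every branch keeps the helper free of side effects on the caller's list.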
embeds_input ¶
embeds_input(
    prompt_embeds: Tensor,
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> EmbedsInput
Construct an EmbedsInput from prompt_embeds and the optional prompt and cache_salt values.
tokens_input ¶
tokens_input(
    prompt_token_ids: list[int],
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> TokensInput
Construct a TokensInput from prompt_token_ids and the optional prompt and cache_salt values.
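A constructor with this signature can be sketched as follows. The sketch assumes that optional values are stored only when they are provided, so that absent keys stay absent in the resulting dictionary. `StandInTokensInput` and `make_tokens_input` are hypothetical local stand-ins, and the real TokensInput may carry additional fields not shown here.

```python
from typing import Optional, TypedDict


class _TokensRequired(TypedDict):
    prompt_token_ids: list[int]


class StandInTokensInput(_TokensRequired, total=False):
    # total=False makes these keys optional, like NotRequired[str].
    prompt: str
    cache_salt: str


def make_tokens_input(
    prompt_token_ids: list[int],
    *,
    prompt: Optional[str] = None,
    cache_salt: Optional[str] = None,
) -> StandInTokensInput:
    """Hypothetical constructor: add optional keys only when given."""
    inputs = StandInTokensInput(prompt_token_ids=prompt_token_ids)
    if prompt is not None:
        inputs["prompt"] = prompt
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt
    return inputs


minimal = make_tokens_input([1, 15, 16, 2])
full = make_tokens_input([1, 15, 16, 2], prompt="hi there", cache_salt="tenant-a")

print(sorted(minimal))
print(sorted(full))
```

Omitting unset keys, rather than storing `None`, lets consumers distinguish "not provided" with a simple `in` check and keeps the dictionary consistent with optional TypedDict fields.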