vllm.config.kernel ¶
IrOpPriorityConfig ¶
Configuration for vLLM IR op priority for dispatching/lowering during the forward pass. Each member is a list of implementation names (strings) that will be passed to the corresponding op in vllm.ir.ops.
If specified manually, platform defaults will be appended to the lists. See KernelConfig.set_platform_defaults().
Source code in vllm/config/kernel.py
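The idea behind a per-op priority list can be sketched as a simple first-match dispatch; the registry contents and implementation names below are illustrative assumptions, not vLLM's real registry:

```python
# Standalone sketch of priority-list dispatch: the first implementation
# name in the list that has a registered kernel wins. The registry and
# its entries are assumptions for illustration only.
AVAILABLE_IMPLS = {"triton": "triton_rms_norm_kernel"}  # hypothetical

def dispatch(priority: list[str]) -> str:
    for name in priority:
        if name in AVAILABLE_IMPLS:
            return AVAILABLE_IMPLS[name]
    raise RuntimeError(f"no available implementation among {priority}")

print(dispatch(["cuda", "triton"]))  # falls through 'cuda' to 'triton'
```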
rms_norm class-attribute instance-attribute ¶
Priority list for vllm.ir.ops.rms_norm
compute_hash ¶
compute_hash() -> str
Produces a hash unique to the pass configuration. Any new fields that affect compilation should be added to the hash. Any future fields that don't affect compilation should be excluded.
Also, manually add IR op impl UUIDs to make sure they affect the compile cache.
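A minimal sketch of a hash with these properties, assuming the compilation-relevant state boils down to the priority lists plus the impl UUIDs the docstring mentions (the argument names here are assumptions):

```python
import hashlib
import json

# Illustrative only: a stable digest over compilation-relevant fields.
# impl_uuids stands in for the "IR op impl UUIDs"; sorting makes the
# hash independent of discovery order.
def compute_hash(priority_lists: dict[str, list[str]],
                 impl_uuids: list[str]) -> str:
    payload = json.dumps(
        {"priority": priority_lists, "impl_uuids": sorted(impl_uuids)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```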
set_priority ¶
Context manager to set the IR op priority for all op members. It also imports vllm.kernels to ensure all implementations are made available.
with_default classmethod ¶
with_default(
default: list[str], /, **kwargs: list[str]
) -> IrOpPriorityConfig
A helper to create an IrOpPriorityConfig where fields not specified in kwargs use the given default list.
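The semantics can be sketched with a plain dataclass: any op field not named in kwargs falls back to a copy of the shared default list. The silu_mul field is a hypothetical second op added only to show the fallback:

```python
from dataclasses import dataclass, field, fields

# Sketch of with_default semantics; not the real vLLM class.
@dataclass
class PrioritySketch:
    rms_norm: list[str] = field(default_factory=list)
    silu_mul: list[str] = field(default_factory=list)  # hypothetical op

    @classmethod
    def with_default(cls, default: list[str], /, **kwargs: list[str]):
        # Copy per field so instances never share one default list.
        values = {f.name: list(kwargs.get(f.name, default))
                  for f in fields(cls)}
        return cls(**values)

cfg = PrioritySketch.with_default(["triton"], rms_norm=["cuda", "triton"])
```

Here cfg.rms_norm keeps the explicit list while cfg.silu_mul gets its own copy of the default.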
KernelConfig ¶
Configuration for kernel selection and warmup behavior.
Source code in vllm/config/kernel.py
enable_flashinfer_autotune class-attribute instance-attribute ¶
enable_flashinfer_autotune: bool | None = None
If True, run FlashInfer autotuning during kernel warmup.
ir_op_priority class-attribute instance-attribute ¶
ir_op_priority: IrOpPriorityConfig = Field(
default_factory=IrOpPriorityConfig
)
vLLM IR op priority for dispatching/lowering during the forward pass. Platform defaults are appended automatically during VllmConfig.post_init.
moe_backend class-attribute instance-attribute ¶
Backend for MoE expert computation kernels. Available options:
- "auto": Automatically select the best backend based on model and hardware
- "triton": Use Triton-based fused MoE kernels
- "deep_gemm": Use DeepGEMM kernels (FP8 block-quantized only)
- "cutlass": Use vLLM CUTLASS kernels
- "flashinfer_trtllm": Use FlashInfer with TRTLLM-GEN kernels
- "flashinfer_cutlass": Use FlashInfer with CUTLASS kernels
- "flashinfer_cutedsl": Use FlashInfer with CuteDSL kernels (FP4 only)
- "marlin": Use Marlin kernels (weight-only quantization)
- "aiter": Use AMD AITer kernels (ROCm only)
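A setting like this is usually guarded by validating the string against the documented options before dispatch; a minimal sketch (the function name is an assumption, and the real selection logic for "auto" also inspects model and hardware):

```python
# The set mirrors the documented moe_backend options above.
MOE_BACKENDS = {
    "auto", "triton", "deep_gemm", "cutlass", "flashinfer_trtllm",
    "flashinfer_cutlass", "flashinfer_cutedsl", "marlin", "aiter",
}

def validate_moe_backend(name: str) -> str:
    # Fail fast on typos instead of falling back silently.
    if name not in MOE_BACKENDS:
        raise ValueError(
            f"unknown moe_backend {name!r}; expected one of "
            f"{sorted(MOE_BACKENDS)}")
    return name
```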
_skip_none_validation classmethod ¶
Skip validation if the value is None when initialization is delayed.
compute_hash ¶
compute_hash() -> str
Produces a hash unique to the pass configuration. Any new fields that affect compilation should be added to the hash. Any future fields that don't affect compilation should be excluded.
set_platform_defaults ¶
set_platform_defaults(vllm_config: VllmConfig) -> None
Set platform-specific defaults for the kernel config.
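Per the IrOpPriorityConfig docstring, manually specified entries keep their priority and platform defaults are appended after them. That appending step can be sketched as (the function operating on plain dicts rather than the real config objects is an assumption):

```python
# Sketch of appending platform defaults to each op's priority list:
# user-specified entries stay first, defaults follow, duplicates skipped.
def append_platform_defaults(
    priority: dict[str, list[str]],
    platform_defaults: dict[str, list[str]],
) -> None:
    for op, defaults in platform_defaults.items():
        lst = priority.setdefault(op, [])
        for impl in defaults:
            if impl not in lst:
                lst.append(impl)
```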