Daily AI Updates: January 8, 2026

Overview

VS Code 1.108 released, Qwen3-VL-Embedding and Reranker models launched, vLLM adds KV Offloading, Cerebras integrates GLM-4.7.

Main Content

  • Visual Studio Code 1.108 Release: An internal housekeeping effort closed nearly 6,000 issues. Agent Skills moves to stable release, and the terminal IntelliSense experience is significantly improved.

  • Qwen3-VL-Embedding Model: Built on the Qwen3-VL foundation model, it supports text, image, and video inputs across 30+ languages. Reports state-of-the-art results on multimodal retrieval benchmarks. Open-sourced on Hugging Face with configurable embedding dimensions and quantization support.

  • Qwen3-VL-Reranker Model: Computes fine-grained relevance scores for retrieved candidates to improve retrieval accuracy. Supports multimodal inputs and multiple languages for image-text retrieval, video search, and multimodal RAG scenarios. Open-sourced, with an API coming soon.

  • vLLM KV Offloading Connector: New asynchronous KV cache offloading to CPU RAM improves concurrency. Achieves up to 9x throughput improvement on H100 and reduces time to first token (TTFT) by 2-22x.

  • vLLM Qwen3-VL Integration: Adds support for Qwen3-VL-Embedding and Qwen3-VL-Reranker, with a quick-start guide for efficient multimodal inference.

  • Cerebras GLM-4.7 Integration: The newly added GLM-4.7 model supports coding, tool-using agents, and multi-turn reasoning, running at about 1,000 tokens/s for coding and up to 1,700 tokens/s in other scenarios. Claimed price-performance is 10x better than Sonnet 4.5.
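
For readers new to the embedding-plus-reranker pattern mentioned in the Qwen3-VL items, here is a minimal sketch of two-stage retrieval in plain Python. It is not the Qwen3-VL API. The function names are hypothetical, `rerank_fn` is a stand-in for a call to a reranker model, and shortening an embedding by keeping its leading components is only an assumption about how "configurable embedding dimensions" might be used; check the model card before relying on it.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head
    return [x / norm for x in head]


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(query_vec, doc_vecs, rerank_fn, dim, top_k):
    """Stage 1: cheap similarity search. Stage 2: rerank the short list.

    `rerank_fn(doc_index) -> float` is a hypothetical placeholder for a
    reranker model that scores one (query, document) pair.
    """
    q = truncate_and_normalize(query_vec, dim)
    scored = [
        (dot(q, truncate_and_normalize(v, dim)), i)
        for i, v in enumerate(doc_vecs)
    ]
    scored.sort(reverse=True)
    candidates = [i for _, i in scored[:top_k]]
    # Only the top_k candidates pay the cost of the more expensive reranker.
    return sorted(candidates, key=rerank_fn, reverse=True)


if __name__ == "__main__":
    docs = [[1, 0, 0, 0], [0.9, 0.1, 0, 0.5], [0, 1, 0, 0]]
    fake_scores = {0: 0.2, 1: 0.9, 2: 0.5}  # made-up reranker scores
    order = retrieve_then_rerank([1, 0, 0, 0], docs, fake_scores.get, dim=2, top_k=2)
    print(order)
```

The design point is that the embedding stage narrows a large corpus cheaply, while the reranker spends its heavier per-pair computation only on the short list.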
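
The idea behind KV cache offloading can be illustrated with a toy two-tier cache. This sketch is not vLLM's connector or its API: the class and method names are hypothetical, the LRU eviction policy is an assumption, and it runs synchronously, whereas the vLLM feature is described as asynchronous. It only shows why keeping evicted blocks in CPU RAM can help: a later request that reuses the same prefix can copy the block back instead of recomputing it, which is where a TTFT saving would come from.

```python
from collections import OrderedDict


class TwoTierKVCache:
    """Toy model: a small 'GPU' tier backed by a larger 'CPU' tier."""

    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()  # block_hash -> block, least recently used first
        self.cpu = {}             # offloaded blocks

    def put(self, block_hash, block):
        """Insert a block into the GPU tier, offloading LRU blocks if full."""
        self.cpu.pop(block_hash, None)
        self.gpu[block_hash] = block
        self.gpu.move_to_end(block_hash)
        while len(self.gpu) > self.gpu_capacity:
            old_hash, old_block = self.gpu.popitem(last=False)
            self.cpu[old_hash] = old_block  # offload instead of discarding

    def get(self, block_hash):
        """Return (block, tier), where tier is 'gpu', 'cpu', or 'miss'."""
        if block_hash in self.gpu:
            self.gpu.move_to_end(block_hash)
            return self.gpu[block_hash], "gpu"
        if block_hash in self.cpu:
            block = self.cpu.pop(block_hash)
            self.put(block_hash, block)  # promote back to the GPU tier
            return block, "cpu"
        return None, "miss"  # caller would have to recompute the prefill


if __name__ == "__main__":
    cache = TwoTierKVCache(gpu_capacity=2)
    for name in ("a", "b", "c"):
        cache.put(name, name.upper())
    print(cache.get("a"))
    print(cache.get("zzz"))
```

Without the CPU tier, the first lookup in the usage example would be a miss and the block would need to be recomputed.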