Sliding Window Attention, SWA

Sliding Window Attention (SWA)

A mechanism proposed to address the context-memory and computational-efficiency problems that Transformer models face when processing long text.

Sliding Window Attention (SWA) is a technique used in Transformer models that limits each token's attention span to a fixed-size window around it, so a token attends only to its nearby tokens rather than to the entire sequence.
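To make the windowing concrete, here is a minimal pure-Python sketch of single-head attention restricted to a window. It is an illustration under stated assumptions, not a reference implementation: the function name `sliding_window_attention` and the parameter `half_width` are hypothetical, the window is assumed to be symmetric (token `i` attends to tokens `j` with `|i - j| <= half_width`), and details such as causal masking, multiple heads, and batching are left out.

```python
import math


def sliding_window_attention(q, k, v, half_width):
    """Single-head attention where token i attends only to tokens j
    with |i - j| <= half_width (assumed symmetric window).

    q, k, v are lists of vectors (lists of floats), one per token.
    """
    n = len(q)
    d = len(q[0])
    outputs = []
    for i in range(n):
        # Window bounds, clipped at the sequence edges.
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)

        # Scaled dot-product scores, computed only inside the window.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]

        # Numerically stable softmax over the window.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Weighted sum of the value vectors inside the window.
        out = [
            sum(w * v[j][c] for w, j in zip(weights, range(lo, hi)))
            for c in range(len(v[0]))
        ]
        outputs.append(out)
    return outputs


# Example: four tokens with identical (zero) queries and keys, so the
# attention weights are uniform within each window.
q = [[0.0, 0.0] for _ in range(4)]
k = [[0.0, 0.0] for _ in range(4)]
v = [[1.0], [2.0], [3.0], [4.0]]
result = sliding_window_attention(q, k, v, half_width=1)
```

In this sketch each token scores at most `2 * half_width + 1` neighbours, so the total work grows with sequence length times window size rather than with the square of the sequence length.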

