Machine Learning Systems Intern
BrainChip, United States

This job posting is no longer available.


About

Hybrid SSM‑Transformer models have a unique advantage for on‑chip memory efficiency: SSM layers compress sequence history into a fixed‑size recurrent state, while attention layers store key‑value caches that grow with context length. This leads to an important design question: for a given model configuration and maximum context length, can on‑chip SRAM be sized so that inference runs entirely on chip, eliminating the need for slower off‑chip HBM or DRAM?

What the intern will work on: the intern will model and analyze memory behavior during inference of hybrid SSM‑Transformer models, with a focus on avoiding off‑chip memory accesses. Key responsibilities include:
  • Modeling data movement between SRAM and HBM/DRAM during inference
  • Sweeping parameters such as SRAM capacity
  • Mapping the feasibility boundary where inference can be performed fully on chip
  • Breaking down per‑layer memory working sets
  • Identifying when and why memory spills occur
  • Exploring tiling and scheduling strategies to extend the no‑spill region
  • Validating analytical results through simulation
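To illustrate the kind of analysis described above, here is a minimal first-order sketch of the memory model: a KV cache that grows linearly with context length versus a fixed-size SSM state, and the resulting feasibility boundary for a given SRAM budget. All model dimensions, byte widths, and the SRAM capacities below are illustrative assumptions, not BrainChip hardware or model specifications.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV cache grows linearly with context length (keys + values per layer)."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def ssm_state_bytes(n_ssm_layers, d_inner, d_state, bytes_per_elem=2):
    """SSM recurrent state is fixed-size, independent of context length."""
    return n_ssm_layers * d_inner * d_state * bytes_per_elem

def max_onchip_context(sram_bytes, weights_bytes, n_attn_layers, n_kv_heads,
                       head_dim, n_ssm_layers, d_inner, d_state):
    """Largest context length whose inference working set still fits in SRAM.

    First-order model only: ignores activations, tiling overheads, and
    scheduling effects, which a real analysis would have to include.
    """
    fixed = weights_bytes + ssm_state_bytes(n_ssm_layers, d_inner, d_state)
    free = sram_bytes - fixed
    if free <= 0:
        return 0  # even the context-independent working set spills off chip
    per_token = kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len=1)
    return free // per_token

# Example sweep of SRAM capacity (weights ignored for simplicity):
# maps where the feasibility boundary falls for a toy hybrid config.
for mib in (8, 16, 32):
    ctx = max_onchip_context(mib * 2**20, 0, n_attn_layers=4, n_kv_heads=8,
                             head_dim=64, n_ssm_layers=4, d_inner=2048, d_state=16)
    print(f"{mib} MiB SRAM -> max fully on-chip context: {ctx} tokens")
```

Sweeping the other axes listed above (context length, head counts, SSM state size) works the same way, and the point where `max_onchip_context` reaches zero marks where spills to HBM/DRAM become unavoidable under this simplified model.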

Languages

  • English
Note for users

This job posting was published by one of our partners. You can view the original posting here.