Memory-Centric AI: Sandisk’s High Bandwidth Flash Will Redefine AI Infrastructure

Alper Ilkbahar

August 11, 2025

[8 min read]

This week at the Future of Memory and Storage (FMS) conference, I presented our latest updates on the memory technology we have been developing at Sandisk for nearly two years: High Bandwidth Flash (HBF).

We invented HBF to address the widening compute-memory gap, or “memory wall” problem: the ever-growing discrepancy between the exponential growth of AI models and the comparatively slow improvement of DRAM capacity and bandwidth.

Addressing the memory wall

The memory wall problem is creating significant challenges in the datacenter and for edge AI applications. In the datacenter, the industry is primarily trying to solve this problem by applying more GPU compute power. And at the edge, quite frankly, there are no good solutions.

HBF is our answer to this problem. This NAND-based architecture offers 8 to 16x the capacity of High Bandwidth Memory (HBM), while delivering the same read bandwidth at the same price points.

In the datacenter, we see HBF augmenting HBM with the ability to attach terabytes of memory to GPUs.

At the edge, our vision for HBF is to enable sophisticated AI models that are currently not possible due to cost, power, or space limitations.
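
To put those numbers in perspective, here is a back-of-envelope sketch of per-GPU capacity. The stack size and stacks-per-package figures below are our own illustrative assumptions, not Sandisk specifications.

    # Back-of-envelope: memory attachable to one GPU package.
    # Assumptions (illustrative, not Sandisk specs): 36 GB per HBM
    # stack and 8 stack sites per GPU package.
    HBM_STACK_GB = 36             # assumed capacity of one HBM stack
    STACKS_PER_GPU = 8            # assumed stack sites per package
    CAPACITY_MULTIPLES = (8, 16)  # HBF capacity multiple vs. HBM

    hbm_total = HBM_STACK_GB * STACKS_PER_GPU
    print(f"All-HBM package: {hbm_total} GB")  # 288 GB

    for m in CAPACITY_MULTIPLES:
        hbf_total = hbm_total * m
        print(f"All-HBF package at {m}x: {hbf_total} GB "
              f"(~{hbf_total / 1024:.1f} TB)")
    # At 8x to 16x, the same footprint reaches roughly 2.2 to 4.5 TB,
    # which is how terabytes of memory per GPU pencils out.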


Graph Title: AI and Memory Wall
Source: www.ayarlabs.com/glossary/memory-wall; AI and Memory Wall, Amir Gholami et al., IEEE Micro Journal, March 2024

Overview: The graph illustrates the evolution of AI models and hardware from 2014 to 2024, divided into two distinct eras:

  1. Compute Limited Era (2014–2020)
  2. Memory Capacity/Bandwidth Limited Era (2021–2024)

Y-Axis: Logarithmic scale representing the number of parameters in AI models, ranging from 10 million (10M) to 10 trillion (10T).

X-Axis: Timeline from 2014 to 2024

AI Models Plotted:

  • Inception
  • ResNet-50
  • ResNeXt-101
  • BERT
  • GPT-2
  • Turing-NLG
  • GPT-3
  • PaLM
  • GPT-4

Hardware Plotted:

  • P100 (12GB)
  • TPU V2 (16GB)
  • TPU V3 (32GB)
  • V100 (32GB)
  • A100 (40GB)
  • TPU v4 / Inferentia-2 (32GB)
  • A100 (80GB)
  • H100 (80GB)
  • B300 (288GB)

Additional Visual Element:

  • A red arrow labeled "Memory Wall" marks the transition point where memory capacity and bandwidth become the primary limiting factors in scaling AI models.

The new AI paradigm

Frontier LLMs released in recent months show clear trends in AI technology requirements.

Model sizes and context lengths are increasing with every new generation, driving the need for higher memory capacity. Meanwhile, architectural innovations like Mixture of Experts (MoE) are pushing per-token compute requirements downward.

At Sandisk, we love these trends. This combination of more memory and less compute gives rise to a new paradigm we call Memory-Centric AI, and it is optimal for HBF-based systems.

Recent AI model trends

Title: Recent AI Model Trends
Subtitle: The New AI Paradigm: Memory-Centric Inference

The image contains a table comparing five AI models across four categories: Total Parameters, Mixture of Experts, Active Parameters, and Context Length.

| Model            | Total Parameters | Mixture of Experts | Active Parameters | Context Length |
|------------------|------------------|--------------------|-------------------|----------------|
| Llama 4 Behemoth | 2 trillion       | 16 experts         | 288 billion       | 256 thousand   |
| Llama 4 Maverick | 400 billion      | 128 experts        | 17 billion        | 1 million      |
| Llama 4 Scout    | 109 billion      | 16 experts         | 17 billion        | 10 million     |
| Kimi K2          | 1 trillion       | 384 experts        | 32 billion        | 128 thousand   |
| Grok 4           | 1.7 trillion     | Data unavailable   | Data unavailable  | 256 thousand   |

Summary Statement Below the Table:

Higher Memory Capacity (Model Size, Context Lengths) plus Lower Compute Requirements (Mixture of Experts architecture) equals Memory-Centric AI - HBF Technology to Unlock Scalable, Efficient AI
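
To make the more-memory, less-compute point concrete, here is a minimal sketch, using figures from the table above, that compares each model's resident weight footprint with the weights actually touched per decoded token. The 2-bytes-per-parameter (FP16/BF16) figure is our assumption for illustration; quantized deployments would shrink both numbers.

    # Minimal sketch: an MoE model must keep ALL expert weights in
    # memory but only computes with the ACTIVE subset per token, so
    # capacity needs grow faster than compute needs.
    BYTES_PER_PARAM = 2  # assumed FP16/BF16 weights

    # (total parameters, active parameters per token), from the table
    models = {
        "Llama 4 Behemoth": (2e12, 288e9),
        "Llama 4 Maverick": (400e9, 17e9),
        "Llama 4 Scout":    (109e9, 17e9),
        "Kimi K2":          (1e12, 32e9),
    }

    for name, (total, active) in models.items():
        resident_gb = total * BYTES_PER_PARAM / 1e9  # must sit in memory
        touched_gb = active * BYTES_PER_PARAM / 1e9  # read per token
        print(f"{name}: {resident_gb:,.0f} GB resident, "
              f"{touched_gb:,.0f} GB touched per token "
              f"({active / total:.1%} of weights active)")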

NAND reimagined

Of course, new paradigms are not easy to establish. When we first introduced HBF as a concept, we met significant skepticism. Some in the industry did not believe a NAND-based technology could meet the requirements of AI: we commonly heard that NAND latency is too high, that write speeds do not match DRAM’s, and that endurance is a concern.

The skeptics, of course, were thinking of the same NAND found in a standard SSD. HBF, however, is NAND reimagined with extreme performance, a very different beast.

So to demonstrate how HBF would perform compared to HBM, we created a simulation model that ran our HBF technology on the 405-billion-parameter Llama 3.1 model, alongside an HBM model on the same GPU.

An HBM-based GPU cannot run such a large model on a single device. To address this, we decided to give HBM an unfair advantage by assuming it can hypothetically scale to unlimited capacity.

As such, our simulation showed the effect of performance differences between HBF and HBM on the workload while ignoring capacity differences. Observing various stages of the inference flow, we found the overall performance of the two systems to be within a mere 2.2% of each other.
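
The simulation itself is Sandisk's; as a rough illustration of why weight read bandwidth is the figure of merit during decode, the sketch below estimates the read rate a dense 405B-parameter model demands. The precision and token rates are our assumptions.

    # Rough model of decode-time weight traffic for a dense LLM:
    # each generated token reads essentially every weight once, so
    # required bandwidth ~= params * bytes_per_param * tokens_per_sec.
    # Assumptions (ours): FP16/BF16 weights, a single decode stream.
    PARAMS = 405e9       # Llama 3.1 405B
    BYTES_PER_PARAM = 2  # FP16/BF16

    for tokens_per_sec in (1, 5, 10):
        gb_per_sec = PARAMS * BYTES_PER_PARAM * tokens_per_sec / 1e9
        print(f"{tokens_per_sec:>2} tok/s -> "
              f"{gb_per_sec:,.0f} GB/s of weight reads")
    # Decode is dominated by sustained sequential READS of weights,
    # which is the access pattern a read-optimized NAND can serve.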

High Bandwidth Flash simulation chart

The image is a bar chart comparing the weight read bandwidth (GB/s) of HBF (High Bandwidth Flash) and HBM (High Bandwidth Memory). It highlights that HBF performance is within 2.2% of hypothetical unlimited-capacity HBM, based on simulations using the Llama 3.1 model with 405B parameters.

  • X-axis: Model operations (Attention QKV Projection, Attention Output Projection, FFN Up-Projection, FFN Down-Projection, Final Linear, and Average over LLM Decode Pass).
  • Y-axis (left): Bandwidth in GB/s (0–1000).
  • Y-axis (right): Ratio of performance (0–1).
  • Gray bars: Unlimited size HBM
  • Brown bars: HBF
  • Red line: Performance ratio (HBF/HBM)

The chart visually demonstrates that HBF closely matches the performance of HBM across various model operations.

Finding allies

In equal measure to the skeptics, Sandisk has found collaborative partners and innovators who are eager to build on HBF, to apply their expertise, and to bring new ideas.

One such example comes from Professor Joungho Kim at the Korea Advanced Institute of Science and Technology (KAIST). Professor Kim’s group proposed an architecture where 100GB of HBM could act as a caching layer in front of 1TB of HBF, which would deliver all the advantages of HBF without the performance degradation. KAIST calls Professor Kim the ‘father of HBM,’ and we were delighted to see the professor and his team adopt HBF as a great innovation platform.
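
To see why a modest HBM tier can hide most of a slower backing store's latency, here is a minimal sketch of the effective access time of such a two-tier arrangement. The latencies and hit rates below are placeholder assumptions, not figures from Professor Kim's proposal.

    # Minimal sketch of a two-tier memory: HBM caching a larger HBF pool.
    # Average access time = hit_rate * t_hbm + (1 - hit_rate) * t_hbf.
    # Latencies are placeholder assumptions, not measured values.
    T_HBM_NS = 100.0   # assumed HBM access time, ns
    T_HBF_NS = 1000.0  # assumed HBF access time, ns

    for hit_rate in (0.50, 0.90, 0.99):
        t_eff = hit_rate * T_HBM_NS + (1 - hit_rate) * T_HBF_NS
        print(f"hit rate {hit_rate:.0%}: effective {t_eff:,.0f} ns "
              f"({t_eff / T_HBM_NS:.2f}x pure-HBM latency)")
    # With a high hit rate in the HBM tier, the system approaches HBM
    # speed while exposing the full HBF capacity (e.g., 1 TB).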

During the Sandisk investor day event in February, we announced that Sandisk would form a Technical Advisory Board for HBF.

We are pleased and excited to announce two members of the HBF Technical Advisory Board: Professor David Patterson, Pardee Professor of Computer Science, Emeritus at the University of California at Berkeley, and a Google distinguished engineer—and Raja Koduri, Founder and CEO of Oxmiq Labs, and a computer engineer and business executive known for his leadership in graphics architecture.

We are deeply honored to have both Professor Patterson and Raja Koduri guide us in developing HBF and building an ecosystem around it.

In addition to the advisory board, we also announced that we would create a standards-based ecosystem around HBF. Any breakthrough technology needs infrastructure, alignment, and a foundation on which others can build and innovate.

Sandisk is pleased to announce a collaboration with SK hynix to define a comprehensive industry standard around HBF. To commemorate the signing of our Memorandum of Understanding with SK hynix, I was joined on stage at FMS by Dr. Woopyo Jeong, Executive Vice President and Head of NAND Development at SK hynix.

The image captures a moment on stage at FMS where Alper Ilkbahar and Dr. Woopyo Jeong, Executive Vice President and Head of NAND Development at SK hynix, are shaking hands in front of a large screen displaying the Sandisk logo. The stage is illuminated with red lighting, and side panels frame the screen. On the right side of the image, logos for Sandisk and SK hynix are visible.
 

As many already know, SK hynix was instrumental in driving the adoption of HBM to great success within the industry. Together, our companies are aligning to define specs and requirements that will ensure HBF will scale both technically and commercially and exceed our respective customers’ expectations.

The partnership will lay the groundwork for a robust HBF ecosystem, and we cannot wait to get started on this vital work.

Looking forward

We at Sandisk were honored to receive a Best of Show, Most Innovative Technology award from FMS. This is our second such award in a row, and while the recognition is validating, we look forward to the innovative work still ahead.

The image features a glass award engraved with the following text: FMS - Most Innovative Technology: NAND Flash Solution, SanDisk, 2025
 

To that end, the most frequent question we have received since introducing HBF is “When are we going to see HBF?”

Sandisk is pleased to announce a targeted delivery of the first samples of HBF memory in the second half of 2026, with expectations for the first inference devices powered by HBF to sample in early 2027.

We believe these devices will redefine AI memory with massive capacity and minimal footprint. We foresee HBF enabling use cases from datacenter-scale models to edge-persistent personalization. It will establish new power, performance, and cost paradigms in AI inferencing.

We are thrilled to include HBF as part of our DNA of innovation at Sandisk 2.0 and look forward to the future it will power.
