HSTU streaming training #178

WangPengGG · 2025-01-23T10:31:03Z

I understand that this code does not currently include a streaming training implementation, but I wanted to raise a concern about this topic for discussion.

In industrial streaming training settings, user behavior data typically arrives incrementally over several days, e.g., [d1, d2, ..., dn], with new data continuously appended, e.g., [d1, d2, ..., dn, dn+1]. A key issue in such scenarios is that historical data is gradually forgotten over time.

As a result, historical data may be trained on multiple times before it is forgotten, while newly arrived data, which is often the most critical for model updates, receives roughly the same loss weight as the historical data. This imbalance can lead to suboptimal model performance, because the model may fail to prioritize the most recent and relevant user behavior patterns.
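
To make the concern concrete, here is a minimal sketch. It assumes a sliding-window schedule in which, on each day, the model trains on the most recent `window` days of data. That schedule, the helper names `training_exposure` and `recency_weights`, and the exponential `decay` factor are all hypothetical illustrations, not part of this repository. The first function counts how many times each day's data has been trained on; the second shows one possible way to weight the loss toward recent days.

```python
from collections import Counter


def training_exposure(num_days, window):
    """Count how often each day's data is trained on.

    Assumed (hypothetical) schedule: on day t, train on days
    max(1, t - window + 1) .. t, so older days drop out of the window.
    """
    counts = Counter()
    for t in range(1, num_days + 1):
        start = max(1, t - window + 1)
        for d in range(start, t + 1):
            counts[d] += 1
    return counts


def recency_weights(current_day, window, decay=0.5):
    """Per-day loss weights that shrink exponentially with age.

    Weights are normalized to sum to 1. The decay value is an
    illustrative assumption, not a recommended setting.
    """
    start = max(1, current_day - window + 1)
    raw = {d: decay ** (current_day - d) for d in range(start, current_day + 1)}
    total = sum(raw.values())
    return {d: w / total for d, w in raw.items()}


# After 5 days with a 3-day window, the newest day has been seen once,
# while days still in the window from earlier have been seen more often.
exposure = training_exposure(num_days=5, window=3)

# Weights for the days currently in the window (days 3, 4, 5).
weights = recency_weights(current_day=5, window=3)
```

Under these assumptions, uniform loss weighting treats a day that has already been trained on three times the same as a day seen for the first time. The decayed weights are only one option for shifting emphasis to new data; whether that helps for HSTU is the open question raised here.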
