Fresh Hacker News
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
(jchandra.com)
14 points by jchandra 2 days ago | 2 comments
vivahir215 2 days ago
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
jchandra 2 days ago [dead]
jchandra 2 days ago [dead]