graham1@gekinzuku.com to Technology@beehaw.org • Google's Bard Urges Google to Drop Web Environment Integrity
Large language models literally do subspace projections on text to break it into contextual chunks, and then memorize the chunks. That's how they're defined.
Source: the paper that defined the transformer architecture and formulas for large language models, which alone has been cited 85,000 times in academic work: https://arxiv.org/abs/1706.03762
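For concreteness, here's a minimal NumPy sketch of the single-head scaled dot-product attention that paper defines (the matrix names, toy sizes, and random data are illustrative assumptions, not anything from the paper's experiments): the learned W_q/W_k/W_v matrices are the linear (subspace) projections, and the softmaxed Q·Kᵀ scores are the weights saying which parts of the text matter to each position.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(X, W_q, W_k, W_v):
    # X: (seq_len, d_model) token embeddings.
    # W_q, W_k, W_v: (d_model, d_k) learned projection matrices --
    # these are the "subspace projections" of the input.
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = Q.shape[-1]
    # Attention weights: how strongly each position attends to every other.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # Each output row is a weighted mix of the value vectors.
    return weights @ V, weights

# Toy example: 4 tokens, d_model=8, d_k=4 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = scaled_dot_product_attention(X, W_q, W_k, W_v)
print(attn.round(2))  # each row sums to 1: relative importance of each token
```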
I believe your “They use attention mechanisms to figure out which parts of the text are important” is just a restatement of my “break it into contextual chunks”, no?