graham1@gekinzuku.com to Technology@beehaw.org • Google's Bard Urges Google to Drop Web Environment Integrity
Large language models literally do subspace projections on text to break it into contextual chunks, and then memorize the chunks. That's how they're defined.
Source: the paper that defined the transformer architecture and formulas for large language models, which alone has been cited 85,000 times in academic work: https://arxiv.org/abs/1706.03762
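For concreteness, here's a minimal NumPy sketch of the single-head scaled dot-product attention that paper defines (the matrix names, toy sizes, and random data are illustrative assumptions, not anything from the paper's experiments): the learned W_q/W_k/W_v matrices are the linear (subspace) projections, and the softmaxed Q·Kᵀ scores are the weights saying which parts of the text matter to each position.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(X, W_q, W_k, W_v):
    # X: (seq_len, d_model) token embeddings.
    # W_q, W_k, W_v: (d_model, d_k) learned projection matrices --
    # these are the "subspace projections" of the input.
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = Q.shape[-1]
    # Attention weights: how strongly each position attends to every other.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # Each output row is a weighted mix of the value vectors.
    return weights @ V, weights

# Toy example: 4 tokens, d_model=8, d_k=4 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = scaled_dot_product_attention(X, W_q, W_k, W_v)
print(attn.round(2))  # each row sums to 1: relative importance of each token
```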
I believe your “They use attention mechanisms to figure out which parts of the text are important” is just a restatement of my “break it into contextual chunks”, no?