Wow, I can't believe this doesn't have more views! Well-written, clear, thoughtful... and morbidly fascinating!
Thanks for helping me learn something new!
Thanks Phil! Glad you enjoyed!
Super helpful overview, thanks!
Much more accessible than the usual attempts to communicate about these issues to the general public. Earned a subscription, in the hopes of the algorithm promoting your piece more widely.
Thanks for the kind words and the subscribe. Really appreciate it!
Learning with you as always!
Brilliant essay with lots of great points! I particularly like your insight that humans do information processing and pattern recognition first, with language occurring after that, and how that's similar to what LLMs do. It's something I have also noted before. I also like your distinction between neuralese and thinkish.
I have a question about improving neuralese training efficiency.
What if we construct a training dataset of neuralese the same way tokenized text datasets are constructed for LLMs: by recording the internal neural representations, over time, of an LLM that was trained on standard text?
Could such a dataset then be used to train a model that performs next-neuralese prediction in parallel, thereby recovering the parallelism advantages of token-level language modeling?
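To make the idea concrete, here's a minimal sketch of the dataset construction I have in mind. Everything here is hypothetical: the "model" is just a fixed random projection standing in for a text-trained LLM's hidden layer, and the point is only the shape of the pipeline — record hidden states per token, then pair each state with its successor so a next-neuralese predictor could be trained on all positions in parallel, like next-token teacher forcing.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 50, 8

# Stand-in for a text-trained LLM's internals (illustrative only):
embed = rng.normal(size=(vocab_size, hidden_dim))    # token embeddings
W = rng.normal(size=(hidden_dim, hidden_dim)) * 0.3  # recurrent mixing

def record_hidden_states(token_ids):
    """Run tokens through the stand-in model, recording each hidden state."""
    h = np.zeros(hidden_dim)
    states = []
    for t in token_ids:
        h = np.tanh(W @ h + embed[t])  # one hidden-state update per token
        states.append(h.copy())
    return np.stack(states)            # shape: (seq_len, hidden_dim)

# Record "neuralese" trajectories for a small toy corpus.
corpus = [rng.integers(0, vocab_size, size=20) for _ in range(10)]
neuralese = [record_hidden_states(seq) for seq in corpus]

# Next-neuralese prediction pairs, analogous to next-token pairs:
# inputs are states[:-1], targets are states[1:], so a predictor sees
# every position at once (parallel training, as with text tokens).
X = np.concatenate([s[:-1] for s in neuralese])
Y = np.concatenate([s[1:] for s in neuralese])
print(X.shape, Y.shape)  # (190, 8) (190, 8): 10 sequences x 19 pairs each
```

The open question, of course, is whether a model trained on such recorded state pairs would learn anything transferable, but the dataset-construction step itself seems mechanically straightforward.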