LLM Lab

This is where I collect resources, ideas, papers, concepts, and anything else related to LLMs.

To beginners, a word of caution: this field is new, so it's fine if you take longer to build your basics and let the "world get ahead". We aren't getting AGI soon, and a solid foundation will benefit you in the long run. I have written this as a rough roadmap, so check it out.

My primary source is Twitter, so kindly just follow Andrej Karpathy to know what's what in this field, and get started from there. (I write this because 90% of the people I meet in person aren't on Twitter and don't know who Karpathy is, and neither did I a few months ago.)

Some blogs to read to understand, at a high level, what LLMs/AI are and what they aren't:

One of the best ways to understand how LLMs can be used in practice is to read about how the pros do it, or just look at various cookbooks:

The foremost tech stack that I know of is

for building anything; of these, only instructor has a JS version.
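To make it concrete, here is a minimal sketch of what instructor is for: coercing a model's reply into a validated Pydantic object instead of raw text. This assumes the instructor and openai Python packages and an OPENAI_API_KEY in the environment; the model name is only an example.

```python
# A minimal sketch of structured output with instructor: the reply is
# parsed and validated into a Pydantic model (with retries on failure).
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Paper(BaseModel):
    title: str
    year: int

# wrap the OpenAI client so create() accepts a response_model
client = instructor.from_openai(OpenAI())

paper = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; substitute your own
    response_model=Paper,
    messages=[{"role": "user",
               "content": "Name one landmark LLM paper and its year."}],
)
print(paper.title, paper.year)  # typed fields, not a raw string
```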

For frameworks, we have the popular ones:

There are pros and cons to each (I have yet to hear the pros of TensorFlow), so any is fine. PyTorch is, well… more Pythonic, JAX is more functional (if you don't know what that means, get better at programming), and TensorFlow has a .js version, which brings me to the web. ML on the web has:

Programming Languages:

Now, for those who are more interested in the theoretical aspects (maybe for research, as a hobby, or just to learn), we have to solidify the mathematical foundation through my own blog or any of Andrew Ng's courses. (Personally, I find courses exhausting and slow, hence I just prefer blogs.) There are also YouTubers, like:

Reading papers will help, a LOT. Go through arXiv, download some papers (kindly go for math-heavy ones), and work through them; maybe implement some too.
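If you want a feel for what implementing a paper looks like at its smallest, here is a minimal NumPy sketch of scaled dot-product attention from "Attention Is All You Need" (Vaswani et al., 2017); the shapes and data are toy values.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)                     # (4, 8)
```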

Go through this thread on Twitter to get a better handle on the research side of LLMs. As of now, these are some of the "cutting edge" sub-fields that I know of in LLMs/AI. There may be more research papers on these topics, but I have chosen the more foundational ones. My aim would be to work on building some of them in Mojo. New language, new architecture.

While we are in the theoretical part, we cannot ignore quantum AI. Most of the research is done by Google, IBM, and Microsoft.

The above content mostly deals with the biggest paradigm of deep learning: autoregressive models. To explore models beyond next-token prediction, there is Reinforcement Learning, with some introductory papers [1, 2, 3, 4]. Reading more about DeepMind's Alpha series, the most famous application of RL, would also benefit you a lot. You can go deeper into Deep Reinforcement Learning with the help of this OpenAI blog post.
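To make the core RL loop concrete (act, observe a reward, update a value estimate), here is a minimal sketch of tabular Q-learning on a toy five-state chain; the environment is invented for illustration and is not taken from the papers above.

```python
# Tabular Q-learning on a 5-state chain: the agent starts at the left,
# and only the rightmost (terminal) state gives a reward.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def pick_action(s):
    # epsilon-greedy with random tie-breaking so early episodes still explore
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    best = np.flatnonzero(Q[s] == Q[s].max())
    return int(rng.choice(best))

for episode in range(500):
    s = 0
    while s != n_states - 1:        # rightmost state is terminal
        a = pick_action(s)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0  # reward only at the goal
        # Q-learning update: nudge Q[s, a] toward the bootstrapped target
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))                   # the "go right" column should dominate
```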

Which leads us to diffusion models. Content about these models is spread across the internet, with a high barrier to entry: their lower popularity means few experts bother to distill the knowledge for laypeople or novice learners. Reading some landmark papers may help, but I happened upon an excellent blog post on diffusion (which refers to other posts and papers, so it could be the entrance to a rabbit hole), so that would be a great place to start. Here is another great video on the same.
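As a taste of what that blog post builds up to, here is a minimal sketch of the forward (noising) process from DDPM (Ho et al., 2020): given clean data x_0, a noisy x_t can be sampled in closed form, and the learning problem is to reverse this corruption.

```python
# Forward diffusion in closed form:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # the linear noise schedule from the paper
alpha_bar = np.cumprod(1.0 - betas)    # cumulative product of (1 - beta_t)

def noise(x0, t, rng):
    # sample x_t given x_0 directly, without iterating through t steps
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = np.ones(4)                        # a toy "image" of four pixels
for t in (0, 500, 999):
    print(t, noise(x0, t, rng).round(3))  # signal fades as noise takes over
```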