I was looking into embeddings recently and realized you can embed your personal knowledge base and a book’s sections/chapters/pages, then use cosine similarity to estimate what you’re likely to learn. mindblown.gif

The result is the ability to diff a book against your current knowledge base. You can glance at a technical book and see whether you'll learn something new, deepen your understanding of a topic, or simply review material you already know.

It works like this: the book text is extracted page by page and chunked into 100–800 word segments, each labeled with the nearest table of contents entry so even an unread book is easy to contextualize. Personal notes under 500 words embed as a single chunk, while larger ones are split on headings. Embeddings run locally via sentence-transformers, and the tool compares each book chunk against the note embeddings to classify it as novel, a depth gap, or review.
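The classification step boils down to nearest-neighbor cosine similarity with a couple of cutoffs. Here's a minimal sketch of that idea; the threshold values and the `classify_chunk` helper are my own illustrative assumptions rather than the tool's actual code, and toy 3-D vectors stand in for real sentence-transformers embeddings:

```python
import numpy as np

# Hypothetical cutoffs -- illustrative assumptions, not the tool's real values.
NOVEL_BELOW = 0.3    # nothing in my notes is even loosely similar
REVIEW_ABOVE = 0.7   # my notes already cover this closely

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_chunk(chunk_vec: np.ndarray, note_vecs: list) -> str:
    """Label a book chunk by its closest match among the note embeddings."""
    best = max(cosine(chunk_vec, n) for n in note_vecs)
    if best < NOVEL_BELOW:
        return "novel"
    if best >= REVIEW_ABOVE:
        return "review"
    return "depth gap"

# Toy 3-D vectors standing in for real sentence embeddings.
notes = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
print(classify_chunk(np.array([0.9, 0.1, 0.0]), notes))   # → review
print(classify_chunk(np.array([0.5, 0.0, 0.87]), notes))  # → depth gap
print(classify_chunk(np.array([0.0, 0.0, 1.0]), notes))   # → novel
```

In practice you'd take the max similarity over every note chunk, not just whole notes, which is why the splitting rules above matter.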

I’ve packaged this up as a small project for Obsidian on GitHub. I’d love it if you gave it a try and let me know what you think.

One example from my own knowledge base: reading Pete Hodgson’s Continuous Delivery in the Wild, the tool predicted two novel pieces, both around correlating changes with impact, and ten pieces that would deepen my understanding of concepts I’d already documented.