Deep neural networks (DNNs), the machine learning algorithms that underpin large language models (LLMs) and other artificial intelligence (AI) models, learn to make accurate predictions by analyzing large amounts of data. These networks are structured in layers, each of which transforms its input data into 'features' that guide the analysis performed by the next layer.
The process through which DNNs learn features has been the subject of numerous research studies and is ultimately the key to these models' strong performance across a variety of tasks. Recently, some computer scientists have started exploring the possibility of modeling feature learning in DNNs using frameworks and approaches rooted in physics.
Researchers at the University of Basel and the University of Science and Technology of China discovered a phase diagram that represents how DNNs learn features under various conditions, a graph resembling the diagrams used in thermodynamics to delineate the liquid, gaseous and solid phases of water. Their paper, published in Physical Review Letters, models a DNN as a spring-block chain, a simple mechanical system often used to study the interplay between linear (spring) and nonlinear (friction) forces.
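The paper's exact formulation is not given in this excerpt, but the basic spring-block picture is easy to sketch: blocks connected by springs rest on a rough surface, one end is pulled slowly, and each block sticks in place until the spring force on it exceeds a friction threshold, at which point it slips. The Python toy below is purely illustrative; the parameters and the crude slip rule are assumptions, not the authors' model.

```python
import numpy as np

# Illustrative spring-block chain (NOT the authors' model): n blocks joined by
# linear springs sit on a frictional surface; the right end is pulled slowly.
# Blocks "stick" until the net spring force exceeds a friction threshold,
# then "slip" -- a toy interplay of linear (spring) and nonlinear (friction) forces.

n = 10            # number of blocks (assumed value)
k = 1.0           # spring stiffness
rest = 1.0        # spring rest length
mu = 0.5          # static friction threshold (force units)
pull_speed = 0.01 # driver displacement per step
steps = 2000

x = np.arange(n, dtype=float) * rest   # initial block positions
driver = x[-1] + rest                  # driver spring starts at its rest length

for t in range(steps):
    driver += pull_speed
    # Net spring force on each block from its left/right neighbours and the driver.
    stretch = np.diff(x) - rest
    right = np.zeros(n)
    left = np.zeros(n)
    right[:-1] = k * stretch               # pull from the spring on the right
    left[1:] = -k * stretch                # pull from the spring on the left
    right[-1] += k * (driver - x[-1] - rest)
    force = left + right
    # A block slips only where the net force exceeds static friction.
    slipping = np.abs(force) > mu
    slipping[0] = False                    # keep the first block pinned (fixed end)
    # Crude relaxation step: nudge slipping blocks by the excess force over k.
    x[slipping] += (force[slipping] - np.sign(force[slipping]) * mu) / k

# Per-spring stretch along the chain -- the kind of layer-by-layer quantity
# a spring-block analogy of a deep network would keep track of.
print("spring stretches:", np.round(np.diff(x) - rest, 2))
```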
"Cheng and I were at a workshop where there was an inspiring talk on 'a law of data separation,'" Ivan Dokmanić, the researcher who led the study, told Phys.org. "The layers of a deep neural network (but also of biological neural networks such as the human visual cortex) process inputs by progressively distilling and simplifying them.
"The deeper you are in the network, the more regular, more geometric these representations become, which means that representations of different classes of objects (e.g., representations of cats and dogs) become more separate and easier to distinguish. There's a way to measure this separation.