liamdgray
|
12 days ago
|
on: Chrome DevTools MCP (2025)
Please do!
liamdgray
|
4 months ago
|
on: Denmark's government aims to ban access to social media for children under 15
Some students even wish for a ban to reduce the pressure to keep up with social media.
That reminded me of Warren Buffett asking for his kind to be taxed more.
liamdgray
|
7 months ago
|
on: ISR: Invertible Symbolic Regression (2024)
Abstract: "We introduce an Invertible Symbolic Regression (ISR) method. It is a machine learning technique that generates analytical relationships between inputs and outputs of a given dataset via invertible maps (or architectures). The proposed ISR method naturally combines the principles of Invertible Neural Networks (INNs) and Equation Learner (EQL), a neural network-based symbolic architecture for function learning. In particular, we transform the affine coupling blocks of INNs into a symbolic framework, resulting in an end-to-end differentiable symbolic invertible architecture that allows for efficient gradient-based learning. The proposed ISR framework also relies on sparsity promoting regularization, allowing the discovery of concise and interpretable invertible expressions. We show that ISR can serve as a (symbolic) normalizing flow for density estimation tasks. Furthermore, we highlight its practical applicability in solving inverse problems, including a benchmark inverse kinematics problem, and notably, a geoacoustic inversion problem in oceanography aimed at inferring posterior distributions of underlying seabed parameters from acoustic signals."
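The affine-coupling construction the abstract mentions is easy to see in code. A minimal sketch of my own (in ISR the functions s and t would be learned EQL-style symbolic expressions; here they are arbitrary stand-in functions):

```python
import numpy as np

def coupling_forward(x, s, t):
    # Split the input; transform the second half conditioned on the first.
    x1, x2 = np.split(x, 2)
    y2 = x2 * np.exp(s(x1)) + t(x1)     # affine map, invertible by construction
    return np.concatenate([x1, y2])

def coupling_inverse(y, s, t):
    y1, y2 = np.split(y, 2)
    x2 = (y2 - t(y1)) * np.exp(-s(y1))  # exact closed-form inverse
    return np.concatenate([y1, x2])

# Stand-ins for the learned symbolic expressions:
s = lambda u: np.sin(u)
t = lambda u: u ** 2

x = np.array([0.3, -1.2, 0.7, 2.0])
y = coupling_forward(x, s, t)
x_rec = coupling_inverse(y, s, t)
print(np.allclose(x, x_rec))  # True: the map inverts exactly
```

Invertibility never depends on what s and t are, which is what lets the paper swap neural sub-networks for symbolic ones.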
liamdgray
|
7 months ago
|
on: Neural Symbolic Regression that scales (2021)
Abstract: "Symbolic equations are at the core of scientific discovery. The task of discovering the underlying equation from a set of input-output pairs is called symbolic regression. Traditionally, symbolic regression methods use hand-designed strategies that do not improve with experience. In this paper, we introduce the first symbolic regression method that leverages large scale pre-training. We procedurally generate an unbounded set of equations, and simultaneously pre-train a Transformer to predict the symbolic equation from a corresponding set of input-output-pairs. At test time, we query the model on a new set of points and use its output to guide the search for the equation. We show empirically that this approach can re-discover a set of well-known physical equations, and that it improves over time with more data and compute."
liamdgray
|
7 months ago
|
on: Recasting Self-Attention with Holographic Reduced Representations (2023)
Abstract: "In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the O(T^2) memory and O(T^2H) compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of T≥100,000 are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a “Hrrformer” we obtain several benefits including O(T H log H) time complexity, O(TH) space complexity, and convergence in 10× fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280× faster to train on the Long Range Arena benchmark."
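For anyone unfamiliar with HRR: the core binding operation is circular convolution, and unbinding is convolution with an involution of the key. A rough sketch of that mechanism (my own illustration, not the Hrrformer code):

```python
import numpy as np

def bind(a, b):
    # Circular convolution via FFT: the HRR binding operation.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # Approximate inverse: convolve with the involution of a,
    # i.e. (a[0], a[-1], a[-2], ...).
    inv = np.concatenate(([a[0]], a[1:][::-1]))
    return bind(c, inv)

rng = np.random.default_rng(0)
d = 1024
key, value = rng.normal(0, 1 / np.sqrt(d), (2, d))  # approximately unit vectors
trace = bind(key, value)
retrieved = unbind(trace, key)
# Retrieval is noisy but strongly correlated with the stored value.
sim = retrieved @ value / (np.linalg.norm(retrieved) * np.linalg.norm(value))
print(round(sim, 2))  # high cosine similarity despite the noisy decode
```

Because binding and unbinding are O(d log d) regardless of how many pairs are superimposed in a trace, swapping this in for query-key matching is what buys the paper its complexity reduction.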
liamdgray
|
7 months ago
|
on: Composing Linear Layers from Irreducibles
Abstract: "Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors -- geometric objects encoding oriented planes -- and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models."
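Not the paper's Clifford-algebra construction, but the analogous idea is visible in plain matrix form: compose an orthogonal transform from a handful of plane rotations, each costing one angle parameter instead of d^2 matrix entries. A sketch of my own, with arbitrary planes and angles:

```python
import numpy as np

def plane_rotation(d, i, j, theta):
    # Rotation in the (i, j) coordinate plane -- the matrix form of a
    # rotor acting on a single oriented plane.
    R = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    R[i, i], R[j, j] = c, c
    R[i, j], R[j, i] = -s, s
    return R

d = 8
rng = np.random.default_rng(1)
# A few plane rotations compose into a richer orthogonal map, using one
# parameter per plane rather than d*d dense-matrix entries.
planes = [(0, 1), (2, 3), (4, 5), (6, 7), (1, 2), (5, 6)]
thetas = rng.uniform(0, 2 * np.pi, len(planes))
M = np.eye(d)
for (i, j), th in zip(planes, thetas):
    M = plane_rotation(d, i, j, th) @ M

print(np.allclose(M @ M.T, np.eye(d)))  # True: the composition stays orthogonal
```

The paper's contribution is doing this differentiably in the rotor representation and showing O(log^2 d) parameters suffice for competitive projections.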
liamdgray
|
7 months ago
|
on: Introduction to latent variable energy-based models (2024)
Abstract: "Current automated systems have crucial limitations that need to be addressed before artificial intelligence can reach human-like levels and bring new technological revolutions. Among others, our societies still lack level-5 self-driving cars, domestic robots, and virtual assistants that learn reliable world models, reason, and plan complex action sequences. In these notes, we summarize the main ideas behind the architecture of autonomous intelligence of the future proposed by Yann LeCun. In particular, we introduce energy-based and latent variable models and combine their advantages in the building block of LeCun’s proposal, that is, in the hierarchical joint-embedding predictive architecture."
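A toy illustration of the energy-based latent-variable idea these notes build on: score a pair (x, y) by minimizing an energy over a latent variable z. Everything below (the quadratic energy, the grid search, the regularizer) is my own stand-in, not LeCun's architecture:

```python
import numpy as np

def energy(x, y, z):
    # Toy energy: y should match a z-dependent transform of x.
    # Low energy = the pair (x, y) is compatible for some latent z.
    return (y - z * x) ** 2 + 0.1 * z ** 2  # small penalty on the latent

def free_energy(x, y, z_grid):
    # Marginalize the latent by minimization (the zero-temperature limit).
    return min(energy(x, y, z) for z in z_grid)

z_grid = np.linspace(-3, 3, 601)
# (1.0, 2.0) is explained well by z near 2; (1.0, 10.0) cannot be reached
# with a modest latent, so its free energy is much higher.
print(free_energy(1.0, 2.0, z_grid) < free_energy(1.0, 10.0, z_grid))  # True
```

Real systems replace the grid search with gradient-based inference and learn the energy function, but the compatibility-scoring structure is the same.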
liamdgray
|
7 months ago
|
on: Modern Methods in Associative Memory
Abstract:
"Associative Memories like the famous Hopfield Networks are elegant models for describing fully recurrent neural networks whose fundamental job is to store and retrieve information. In the past few years they experienced a surge of interest due to novel theoretical results pertaining to their information storage capabilities, and their relationship with SOTA AI architectures, such as Transformers and Diffusion Models. These connections open up possibilities for interpreting the computation of traditional AI networks through the theoretical lens of Associative Memories. Additionally, novel Lagrangian formulations of these networks make it possible to design powerful distributed models that learn useful representations and inform the design of novel architectures. This tutorial provides an approachable introduction to Associative Memories, emphasizing the modern language and methods used in this area of research, with practical hands-on mathematical derivations and coding notebooks."
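The Transformer connection mentioned here is compact enough to show directly: the update rule of a modern continuous Hopfield network is softmax attention of the query state over the stored patterns. A rough sketch of my own (the inverse temperature beta is an arbitrary choice):

```python
import numpy as np

def hopfield_update(query, memories, beta=8.0):
    # One update of a modern continuous Hopfield network: softmax
    # attention of the query state over the stored patterns.
    scores = beta * memories @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ memories

rng = np.random.default_rng(2)
memories = rng.normal(size=(5, 16))
memories /= np.linalg.norm(memories, axis=1, keepdims=True)

# Corrupt one stored pattern, then iterate the update to retrieve it.
noisy = memories[3] + 0.3 * rng.normal(size=16)
xi = noisy
for _ in range(3):
    xi = hopfield_update(xi, memories)
print(int(np.argmax(memories @ xi)))  # index of the retrieved pattern
```

With beta large the update converges in very few steps to a stored pattern, which is the storage-capacity result the tutorial covers.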
liamdgray
|
7 months ago
|
on: Dense Associative Memory for Pattern Recognition (2016)
Abstract: "A model of associative memory is studied, which stores and reliably retrieves many more patterns than the number of neurons in the network. We propose a simple duality between this dense associative memory and neural networks commonly used in deep learning. On the associative memory side of this duality, a family of models that smoothly interpolates between two limiting cases can be constructed. One limit is referred to as the feature-matching mode of pattern recognition, and the other one as the prototype regime. On the deep learning side of the duality, this family corresponds to feedforward neural networks with one hidden layer and various activation functions, which transmit the activities of the visible neurons to the hidden layer. This family of activation functions includes logistics, rectified linear units, and rectified polynomials of higher degrees. The proposed duality makes it possible to apply energy-based intuition from associative memory to analyze computational properties of neural networks with unusual activation functions - the higher rectified polynomials which until now have not been used in deep learning. The utility of the dense memories is illustrated for two test cases: the logical gate XOR and the recognition of handwritten digits from the MNIST data set."
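The feature-matching/prototype interpolation can be seen numerically: with the rectified-polynomial interaction F(x) = max(x, 0)^n, small n spreads the retrieval update across many stored patterns, while large n concentrates it on the best match. A toy sketch with my own numbers, not the paper's:

```python
import numpy as np

def dam_weights(query, memories, n):
    # Dense associative memory with interaction F(x) = relu(x)^n.
    # The normalized derivative weights F'(overlap) show which stored
    # patterns drive the update for this query.
    overlaps = memories @ query
    w = np.maximum(overlaps, 0.0) ** (n - 1)  # F'(x) up to a constant
    return w / w.sum()

memories = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
query = np.array([1.0, 0.1])

w2 = dam_weights(query, memories, n=2)    # low n: several patterns contribute
w20 = dam_weights(query, memories, n=20)  # high n: winner-take-most
print(round(w2[0], 2), round(w20[0], 2))  # → 0.51 0.95
```

The same knob is what controls capacity: sharper interactions separate more stored patterns per neuron.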
liamdgray
|
7 months ago
|
on: Hypertokens: Holographic Associative Memory in Tokenized LLMs
Is this intended to run on a quantum computer? You mention Grover's search algorithm, for example.
liamdgray
|
7 months ago
|
on: How to put algorithms into neural networks? (2019) [video]
Recorded at the ML in PL 2019 Conference, the University of Warsaw, 22-24 November 2019.
Anton Osokin (Higher School of Economics, Moscow), https://aosokin.github.io/
Slides available at https://docs.mlinpl.org/conference/2019/slides/anton_osokin_...
Abstract:
Recently, deep neural nets have shown amazing results in such fields as computer vision, natural language processing, etc. To build such networks, we usually use layers from a relatively small dictionary of available modules (fully-connected, convolutional, recurrent, etc.). Being restricted with this set of modules complicates transferring technology to new tasks. On the other hand, many important applications already have a long history and successful algorithmic solutions. Is it possible to use existing methods to construct better networks? In this talk, we will cover several ways of putting algorithms into networks and discuss their pros and cons. Specifically, we will touch using optimization algorithms as structured pooling, unrolling of algorithm iterations into network layers and direct differentiation of the output w.r.t. the input. We will illustrate these approaches on applications from structured-output prediction and computer vision.
liamdgray
|
7 months ago
|
on: Improved Analytic Learned Iterative Shrinkage Thresholding Algorithm
Abstract: "Tomographic Synthetic Aperture Radar (TomoSAR) building object height inversion is a sparse reconstruction problem that utilizes the data obtained from several spacecraft passes to invert the scatterer position in the height direction. In practical applications, the number of passes is often small, and the observation data are also small due to the objective conditions, so this study focuses on the inversion under the restricted observation data conditions. The Analytic Learned Iterative Shrinkage Thresholding Algorithm (ALISTA) is a kind of deep unfolding network algorithm, which is a combination of the Iterative Shrinkage Thresholding Algorithm (ISTA) and deep learning technology, and it has the advantages of both. The ALISTA is one of the representative algorithms for TomoSAR building object height inversion. However, the structure of the ALISTA algorithm is simple, which has neither the excellent connection structure of a deep learning network nor the acceleration format combined with the ISTA algorithm. Therefore, this study proposes two directions of improvement for the ALISTA algorithm: firstly, an improvement in the inter-layer connection of the network by introducing a connection similar to residual networks obtains the Extragradient Analytic Learned Iterative Shrinkage Thresholding Algorithm (EALISTA) and further proves that the EALISTA achieves linear convergence; secondly, there is an improvement in the iterative format of the intra-layer iteration of the network by introducing the Nesterov momentum acceleration, which obtains the Fast Analytic Learned Iterative Shrinkage Thresholding Algorithm (FALISTA). We first performed inversion experiments on simulated data, which verified the effectiveness of the two proposed algorithms. Then, we conducted TomoSAR building object height inversion experiments on limited measured data and used the deviation metric P to measure the robustness of the algorithms to invert under restricted observation data. The results show that both proposed algorithms have better robustness, which verifies the superior performance of the two algorithms. In addition, we further analyze how to choose the most suitable algorithms for inversion in engineering practice applications based on the results of the experiments on measured data."
Keywords: ALISTA; residual structure; Nesterov acceleration; height inversion
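For readers outside the TomoSAR context: the Nesterov-accelerated shrinkage iteration behind the second improvement looks like this in the generic sparse-recovery setting. This is a plain FISTA sketch of my own, with untuned toy dimensions, not the paper's FALISTA layers:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 norm: the shrinkage-thresholding step.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista(A, y, lam, iters=500):
    # ISTA with Nesterov momentum (FISTA): the acceleration idea the
    # paper injects into ALISTA's iterations.
    L = np.linalg.norm(A, 2) ** 2         # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    z, t = x, 1.0
    for _ in range(iters):
        x_new = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(3)
A = rng.normal(size=(30, 60))             # underdetermined system
x_true = np.zeros(60)
x_true[[5, 17, 42]] = [1.0, -2.0, 1.5]    # sparse ground truth
x_hat = fista(A, A @ x_true, lam=0.01)
print(np.flatnonzero(np.abs(x_hat) > 0.5))  # indices of the recovered support
```

The unfolding papers keep exactly this iteration structure but learn the matrices and thresholds per layer instead of deriving them from A.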
liamdgray
|
7 months ago
|
on: In-context denoising with one-layer transformers
Keywords: associative memory, Hopfield networks, transformers, attention, in-context learning, denoising
Abstract:
We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.
liamdgray
|
7 months ago
|
on: Algorithm Unrolling: Interpretable, Efficient Deep Learning for Sig&Img (2019)
Abstract: "Deep neural networks provide unprecedented performance gains in many real world problems in signal and image processing. Despite these gains, future development and practical deployment of deep networks is hindered by their blackbox nature, i.e., lack of interpretability, and by the need for very large training sets. An emerging technique called algorithm unrolling or unfolding offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are used widely in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention and is rapidly growing both in theoretic investigations and practical applications. The growing popularity of unrolled deep networks is due in part to their potential in developing efficient, high-performance and yet interpretable network architectures from reasonable size training sets. In this article, we review algorithm unrolling for signal and image processing. We extensively cover popular techniques for algorithm unrolling in various domains of signal and image processing including imaging, vision and recognition, and speech processing. By reviewing previous works, we reveal the connections between iterative algorithms and neural networks and present recent theoretical results. Finally, we provide a discussion on current limitations of unrolling and suggest possible future research directions."
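The sparse-coding example that started this line of work (LISTA) shows the unrolling recipe in a few lines: truncate ISTA to K iterations and treat each iteration's matrices and threshold as layer parameters. A structural sketch of my own, initialized to classical ISTA values rather than trained:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

class UnrolledISTA:
    # ISTA truncated to K iterations ("layers") with per-layer parameters.
    # In LISTA these would be learned; here they are fixed at the values
    # that reproduce the classical algorithm.
    def __init__(self, A, lam, K=16):
        L = np.linalg.norm(A, 2) ** 2
        self.W = [np.eye(A.shape[1]) - A.T @ A / L] * K  # per-layer weight
        self.B = [A.T / L] * K                           # per-layer encoder
        self.theta = [lam / L] * K                       # per-layer threshold

    def forward(self, y):
        x = np.zeros(self.W[0].shape[0])
        for W, B, th in zip(self.W, self.B, self.theta):
            x = soft_threshold(W @ x + B @ y, th)        # one layer = one iteration
        return x

rng = np.random.default_rng(4)
A = rng.normal(size=(20, 40))
y = rng.normal(size=20)
net = UnrolledISTA(A, lam=0.1)
x = net.forward(y)
obj = 0.5 * np.sum((A @ x - y) ** 2) + 0.1 * np.sum(np.abs(x))
print(obj < 0.5 * np.sum(y ** 2))  # True: 16 layers already lower the lasso objective
```

Training the per-layer parameters end-to-end is what makes unrolled networks match hundreds of classical iterations in a handful of layers, which is the interpretability-plus-efficiency pitch of the article.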
liamdgray
|
7 months ago
|
on: Hypertokens: Holographic Associative Memory in Tokenized LLMs
In Section 2.8, you write "full implementation details and extended results are provided in the appendix." Which appendix?
I imagine you may be withholding some of the details until after the conference at which, it seems, you will present this week. I wish you well!
Meanwhile, you may not have intended to nerd-snipe, but that has been the effect for me. Now I have Manus trying to implement the paper for me, because why not? I envision a future in which publishing a conceptual paper often results in working code provided by a reader, a la "stone soup."
liamdgray
|
7 months ago
|
on: Hypertokens: Holographic Associative Memory in Tokenized LLMs
Abstract: "Large Language Models (LLMs) exhibit remarkable capabilities but suffer from apparent precision loss, reframed here as information spreading. This reframing shifts the problem from computational precision to an information-theoretic communication issue. We address the K:V and V:K memory problem in LLMs by introducing HDRAM (Holographically Defined Random Access Memory), a symbolic memory framework treating transformer latent space as a spread-spectrum channel. Built upon hypertokens, structured symbolic codes integrating classical error-correcting codes (ECC), holographic computing, and quantum-inspired search, HDRAM recovers distributed information through principled despreading. These phase-coherent memory addresses enable efficient key-value operations and Grover-style search in latent space. By combining ECC grammar with compressed sensing and Krylov subspace alignment, HDRAM significantly improves associative retrieval without architectural changes, demonstrating how Classical-Holographic-Quantum-inspired (CHQ) principles can fortify transformer architectures."
liamdgray
|
9 months ago
|
on: Universal pre-training by iterated random computation
Abstract: "We investigate the use of randomly generated data for the sake of pre-training a model. We justify this approach theoretically from the perspective of algorithmic complexity, building on recent research that shows that sequence models can be trained to approximate Solomonoff induction. We derive similar, but complementary theoretical results. We show empirically that synthetically generated data can be used to pre-train a model before the data is seen. We replicate earlier results that models trained this way show zero-shot in-context learning across a variety of datasets, and that this performance improves with scale. We extend earlier results to real-world data, and show that finetuning a model after pre-training offers faster convergence and better generalization."
liamdgray
|
11 months ago
|
on: The Deep Learning Model of Higher-Lower-Order Cognition, Memory, and Affection
Abstract:
We firstly simulated disease dynamics by KAN (Kolmogorov-Arnold Networks) nearly 4 years ago, but the kernel functions in the edge include the exponential number of infected and discharged people and is also in line with the Kolmogorov-Arnold representation theorem, and the shared weights in the edge are the infection rate and cure rate, and used activation function by tanh at the node of edge. And this Arxiv preprint version 1 of March 2022 is an upgraded version of KAN, considering the invariant coarse-grained which calculated by residual or gradient of MSE loss. The improved KAN is PNN (Plasticity Neural Networks) or ELKAN (Edge Learning KNN), in addition to edge learning, it also considered the trimming of the edge. We not inspired by the Kolmogorov-Arnold representation theorem but inspired by the brain science. The ELKAN to explain brain, the variables correspond to different types of neurons, the learning edge can be explained by rebalance of synaptic strength and glial cells phagocytose synapses, and the kernel function means the discharge of neurons and synapses, different neurons and edges mean brain regions. Through testing by cosine, the ELKAN or ORPNN (Optimized Range PNN) is better than the KAN or CRPNN (Constant Range PNN). The ELKAN is more general to explore brain, such as mechanism of consciousness, interactions of natural frequencies in brain regions, synaptic and neuronal discharge frequencies, and data signal frequencies; mechanism of Alzheimer's disease, the Alzheimer's patients has more high frequencies in the upstream brain regions; long short-term relatively good and inferior memory which means gradient of architecture and architecture; turbulent energy flow in different brain regions, turbulence critical conditions need to be met; heart-brain of the quantum entanglement may occur between the emotions of heartbeat and the synaptic strength of brain potentials.
liamdgray
|
11 months ago
|
on: Digital recording system with time-bracketed authentication by on-line challenge
Bennett, Charles Henry, David Peter DiVincenzo, and Ralph Linsker. "Digital recording system with time-bracketed authentication by on-line challenges and method of authenticating recordings." U.S. Patent No. 5,764,769. 9 Jun. 1998.
Abstract
An apparatus and method produce a videotape or other recording that cannot be pre- or post-dated, nor altered, nor easily fabricated by electronically combining pre-recorded material. In order to prevent such falsification, the camera or other recording apparatus periodically receives certifiably unpredictable signals ("challenges") from a trusted source, causes these signals to influence the scene being recorded, then periodically forwards a digest of the ongoing digital recording to a trusted repository. The unpredictable challenges prevent pre-dating of the recording before the time of the challenge, while the storage of a digest prevents post-dating of the recording after the time the digest was received by the repository. Meanwhile, the interaction of the challenge with the evidence being recorded presents a formidable obstacle to real-time falsification of the scene or system, forcing the would-be falsifier to simulate or render the effects of this interaction in the brief time interval between arrival of the challenge and archiving of the digest at the repository.
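The protocol shape is simple to sketch: an unpredictable challenge lower-bounds the recording time, and a digest deposited with a repository upper-bounds it. A toy rendering of my own (real systems would sign and timestamp; the class names and frame mixing are made up for illustration):

```python
import hashlib
import secrets
import time

class TrustedSource:
    # Issues certifiably unpredictable challenges (here: random nonces).
    def challenge(self):
        return secrets.token_hex(16)

class TrustedRepository:
    # Archives digests with arrival times, preventing post-dating.
    def __init__(self):
        self.log = []
    def deposit(self, digest):
        self.log.append((time.time(), digest))

def record_segment(frames, challenge):
    # The challenge must influence the recorded scene; here we simply
    # mix it into each frame before digesting.
    h = hashlib.sha256()
    for frame in frames:
        h.update(challenge.encode())
        h.update(frame)
    return h.hexdigest()

source, repo = TrustedSource(), TrustedRepository()
ch = source.challenge()                   # recording cannot pre-date this
digest = record_segment([b"frame1", b"frame2"], ch)
repo.deposit(digest)                      # recording cannot post-date this

# Verification: recomputing the digest from the claimed frames must match.
print(record_segment([b"frame1", b"frame2"], ch) == digest)  # True
```

The hard part the patent addresses is physical, not cryptographic: the challenge has to visibly influence the scene fast enough that rendering a fake in the challenge-to-digest window is infeasible.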