Most of today’s Large Language Models (LLMs) operate as generalists, including current-generation Mixture-of-Experts (MoE) models. This often causes them to suffer from being “the averages of the average” of all underlying training data.
However, we’re observing significant growth in the effectiveness of domain-specific language models. A major advantage of these models is that, when trained on carefully curated and controlled data sets, they don’t require limiting heuristics, extensive fine-tuning, or human validation to generate reliable, responsible outputs.
Oxide is actively developing specialized language models (SLMs) for financial domains. This process is highly automated thanks to our large-scale data repository, specific to various verticals and collected over many years.
Enhancing language models with the ability to create plans broadens their applicability to more complex requests. This is especially relevant given constraints on context size and the need to retain focus over longer input sequences.
To formulate more exact plans suitable for use by computational AI, Oxide trains and fine-tunes SLMs and LLMs on highly detailed explanations and reasoning chains. Our Oxogen AI service runs many reasoning agents that produce extensive explanations, down to the mathematical computational models and every piece of information used in the process. We leverage this combination of comprehensive explanations and formal computational plans to train our language models effectively.
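To make this concrete, a training record pairing a request with its reasoning chain and formal plan might look roughly as follows. The field names and contents are purely hypothetical illustrations, not Oxogen’s actual schema:

```python
# Hypothetical shape of one training record: a request, the agent's
# reasoning chain, a formal computational plan, and an explanation.
# All field names and values are illustrative only.
record = {
    "request": "Estimate next-quarter revenue growth for sector X",
    "reasoning_chain": [
        "Collect quarterly revenue series for the sector constituents.",
        "Adjust for currency effects and one-off items.",
        "Fit a seasonal growth model and extrapolate one quarter.",
    ],
    "computational_plan": {
        "steps": ["load_series", "adjust", "fit_model", "extrapolate"],
        "model": "seasonal_growth_v1",  # hypothetical model identifier
    },
    "explanation": "Growth is estimated from adjusted series via a seasonal model.",
}
```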
Nearly all machine learning methodologies depend on some form of optimization when dealing with high-dimensional spaces. While Gradient Descent Search (GDS) is the most common strategy, there is a vast spectrum of optimization methods available. For very large optimization problems, varying degrees of randomness are used to explore the space, since an exhaustive search becomes computationally intractable.
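When gradients are unavailable or the landscape is rugged, randomized exploration is a common fallback. Below is a minimal sketch of stochastic local search with random restarts on a toy objective; the function, parameters, and settings are illustrative, not a production optimizer:

```python
import random

def random_restart_search(f, dim, bounds, restarts=20, steps=200, step_size=0.1, seed=0):
    """Stochastic local search: many random starting points, each followed by
    greedy local steps. A crude gradient-free alternative to gradient descent."""
    rng = random.Random(seed)
    lo, hi = bounds
    best_x, best_val = None, float("inf")
    for _ in range(restarts):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        val = f(x)
        for _ in range(steps):
            # Propose a random nearby point, clipped to the bounds.
            cand = [min(max(xi + rng.gauss(0, step_size), lo), hi) for xi in x]
            cval = f(cand)
            if cval < val:  # accept only improvements (greedy hill climbing)
                x, val = cand, cval
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy objective: the sphere function, with global minimum 0 at the origin.
sphere = lambda p: sum(pi * pi for pi in p)
x, v = random_restart_search(sphere, dim=2, bounds=(-5, 5))
```

The random restarts address exactly the trade-off described above: randomness buys broad coverage of the space at the cost of any optimality guarantee.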
The game-changer lies in being able to compute an approximate global maximum or minimum, or to quickly identify multiple sub-optimal regions within the space, similar to Pareto fronts. Knowing specific properties of the search space is a huge advantage in determining search directionality (where to go) and, especially, the point of convergence (when to stop exploring). This not only refines our approach, but also significantly alters the foundations of ML.
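The idea of keeping several good-but-incomparable candidates rather than a single optimum can be illustrated with a small Pareto-front filter; this is a generic textbook sketch, not Oxide’s implementation:

```python
def pareto_front(points):
    """Return the non-dominated subset of points, minimizing all objectives.
    A point q dominates p if q is <= p in every objective and < in at least one."""
    front = []
    for p in points:
        dominated = any(
            all(qi <= pi for qi, pi in zip(q, p))
            and any(qi < pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Two objectives to minimize; (3, 4) is dominated by (2, 3),
# and (5, 5) is dominated by every other point.
pts = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(pts)
```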
At Oxide, we’re researching optimization techniques from several perspectives, spanning from the classification of fitness landscapes and the determination of problem hardness all the way to the future possibility of using Quantum Computers for optimization.
The first generation of LLMs had the huge benefit of being trained on human-generated source data. This will change for future generations of models, which may be influenced by AI-generated data and thereby inherit its biases and errors. The issue is accentuated for LLMs that represent facts and relations about the real world, although for some forms of training this approach may still be viable.
As a company committed to understanding real-world dynamics in financial markets, Oxide places great value on fully understanding evidence and facts. Our research and development efforts focus on detecting AI-generated content, giving us an extended perspective on data. This complements our existing criteria for authority, coverage, and timing, among others. To train AI models to recognize AI-generated content, we leverage our large data sets from the time before LLMs became broadly available. By comparing our refined historical data with incoming source data, we are able to detect changes in data with high precision.
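As a toy illustration of comparing a trusted pre-LLM baseline against new material, one could score distributional drift between corpora. The function below is a deliberately naive, hypothetical sketch using a symmetric KL-style divergence over word frequencies, not our actual detection pipeline:

```python
from collections import Counter
import math

def distribution_shift_score(baseline_texts, new_text):
    """Crude drift score: symmetric KL-style divergence between the word
    distribution of a trusted baseline corpus and a new document.
    Purely illustrative; real detection needs far richer features."""
    base = Counter(w for t in baseline_texts for w in t.lower().split())
    new = Counter(new_text.lower().split())
    vocab = set(base) | set(new)
    nb, nn = sum(base.values()), sum(new.values())
    score = 0.0
    for w in vocab:
        p = (base[w] + 1) / (nb + len(vocab))  # add-one smoothing
        q = (new[w] + 1) / (nn + len(vocab))
        score += (p - q) * math.log(p / q)    # each term is non-negative
    return score
```

A score near zero means the new text is statistically close to the baseline; larger scores flag material worth closer inspection.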
Next-token prediction has certainly transformed the field by showcasing amazing language capabilities. Some decades ago, we witnessed a similar trend in the chatbot landscape with the emergence of statistical language models, including Hidden Markov Models. These used a similar next-word prediction method, often incorporating controlled randomness to create intriguing textual variations.
Today’s transformer models are more sophisticated, employing semantic embeddings and attention mechanisms rather than simple probabilistic state transitions to predict the next word. However, it’s worth noting that LLMs entail a significant computational load for both training and inference.
Autoregressive learning, also referred to as sequence intelligence in the numerical statistical domain, is similar to transformers. Differences exist in the use of convolution rather than attention, and in normally forgoing embeddings produced by an encoder. Yet recognizing “next token prediction” as a universal property of a learning system can ignite interesting new solutions beyond transformers and LLMs as we know them today. Oxide AI is working on this generalization.
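The convolutional flavor of autoregression can be illustrated with a causal 1D convolution, where each output depends only on current and past inputs. This is a generic sketch of the operation, not a specific architecture:

```python
def causal_conv1d(seq, kernel):
    """Causal 1D convolution: output at position t depends only on inputs
    at positions <= t. Left-pads with zeros to preserve sequence length."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(seq)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(seq))
    ]

# With kernel [0, 1] the output reproduces the input (identity);
# with kernel [1, 0] each output equals the *previous* input, i.e.
# the simplest possible "predict the next value" structure.
identity = causal_conv1d([1, 2, 3], [0, 1])
shifted = causal_conv1d([1, 2, 3], [1, 0])
```

Stacking such causal layers with learned kernels gives an attention-free autoregressive model over numerical sequences.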
LLMs tend to be slow, with the output token stream often measured on a scale of seconds for a few hundred generated tokens. For AI models to explore large spaces of potential computational models, the generate-and-test processes need to be many orders of magnitude faster.
Another issue with LLMs producing code or models is that they are confined within the boundaries of the code constructs they were originally trained on. Although some random variation can induce a certain level of “innovation”, it falls short of the creativity we need to create value for financial markets.
At Oxide, we actively use fast code generation techniques, combining them with slower-paced LLMs. This approach points towards a promising direction in AI development.
Even small and medium-sized deep neural networks possess the attractive property of approximating any function (known as universal approximation). The potential to architect entire networks from modular units, or “neural circuits”, has demonstrated great results, since we don’t need to train extensive networks all at once.
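The modular idea can be illustrated with hand-set “neural circuits”: a single sigmoid unit whose weights realize NAND, reused without any retraining to compose XOR. This is a classic textbook construction, shown here as a toy sketch:

```python
import math

def neuron(weights, bias, inputs):
    """A single sigmoid unit: the smallest possible neural module."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def nand(a, b):
    """A 'neural circuit' for NAND: weights are set by hand, not trained."""
    return neuron([-20.0, -20.0], 30.0, [a, b])

def xor(a, b):
    """Compose the reusable NAND module into XOR without retraining anything."""
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))
```

Because NAND is functionally complete, this one frozen module is enough to build any Boolean circuit, which is the sense in which pre-built units can be assembled instead of training a monolithic network.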
Oxide uses evolutionary computation to generate deep neural networks with unrestricted topology. With the weights optimized simultaneously, we are researching the potential for achieving highly compressed networks.
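A minimal sketch of the evolutionary flavor follows: weights of a tiny network are optimized purely by mutation and selection, with no gradients. For simplicity the topology here is fixed, unlike our unrestricted-topology setting, and all settings are illustrative:

```python
import math
import random

def forward(weights, x):
    """Tiny fixed-topology net: 1 input, 2 tanh hidden units, 1 linear output.
    weights = [w1, b1, w2, b2, v1, v2, c]"""
    w1, b1, w2, b2, v1, v2, c = weights
    return v1 * math.tanh(w1 * x + b1) + v2 * math.tanh(w2 * x + b2) + c

def evolve(target_fn, xs, pop=30, gens=300, sigma=0.3, seed=1):
    """(mu + lambda)-style evolution: keep the fittest weight vectors as
    parents, fill the population with mutated children, repeat."""
    rng = random.Random(seed)

    def fitness(w):
        # Negative sum of squared errors on the sample points.
        return -sum((forward(w, x) - target_fn(x)) ** 2 for x in xs)

    population = [[rng.gauss(0, 1) for _ in range(7)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 5]          # elitism: survivors kept as-is
        children = [
            [w + rng.gauss(0, sigma) for w in rng.choice(parents)]
            for _ in range(pop - len(parents))     # Gaussian weight mutation
        ]
        population = parents + children
    return max(population, key=fitness)

# Fit f(x) = x^2 on [-1, 1] with the 2-hidden-unit network.
xs = [i / 10 for i in range(-10, 11)]
best = evolve(lambda x: x * x, xs)
```

Extending the mutation operator to also add or remove units and connections is what turns this weight-only scheme into topology-evolving neuroevolution.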