Why Is Traditional Machine Learning Still Relevant in the LLM Era?


Every day we see wider adoption of LLMs in academia and industry. Name any use case, and the answer seems to be an LLM. While I’m happy about this, I’m concerned that traditional machine learning and deep learning models (logistic regression, SVM, MLP, LSTMs, autoencoders, and so on) are being overlooked even when they fit the use case. Just as we start a machine learning project with a baseline model and build on top of it, if a small model gives the best solution for a use case, we should not be using an LLM for it. This article is a sincere attempt to offer some guidance on when to choose traditional methods over LLMs, or a combination of the two.

“It’s better to kill a mosquito with a clap than with a sword.”

Data:

  • LLMs are data-hungry, so it is important to strike a balance between model complexity and the available data. For smaller datasets, we should try traditional methods first, as they get the job done with that amount of data; for example, classifying sentiment in a low-resource language like Telugu. However, when the use case has little data but is in English, we can use LLMs to generate synthetic data for building our own model. This overcomes the old problem of the data not being comprehensive enough to cover complex variations. A minimal small-data baseline is sketched below.
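
For illustration, here is a minimal sketch of such a small-data baseline using scikit-learn, a TF-IDF plus logistic regression pipeline. The handful of texts and labels are invented placeholders, not a real dataset:

```python
# Hypothetical small-data sentiment baseline: TF-IDF features + logistic regression.
# The example texts and labels below are placeholders, not real data.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "the service was great",
    "very disappointed with the product",
    "absolutely loved it",
    "worst purchase I have made",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
baseline.fit(texts, labels)

print(baseline.predict(["loved the experience", "not happy at all"]))
```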

Interpretability:

  • When it comes to real-world use cases, interpreting the results given by models holds considerable importance, especially in domains like healthcare where consequences are significant, and regulations are stringent. In such critical scenarios, traditional methods like decision trees and techniques such as SHAP (SHapley Additive exPlanations) offer a simpler means of interpretation. However, the interpretability of Large Language Models (LLMs) poses a challenge, as they often operate as black boxes, hindering their adoption in domains where transparency is crucial. Ongoing research, including approaches like probing and attention visualization, holds promise, and we may soon reach a better place than we are right now.
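
To make the contrast concrete, here is a minimal sketch of explaining a tree model with SHAP. It assumes the shap and scikit-learn packages are installed; the diabetes dataset is just a convenient healthcare-flavoured stand-in:

```python
# Hedged sketch: explaining a tree model's predictions with SHAP.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes()
X, y = data.data, data.target

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes per-feature contributions for each prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])

# The summary plot shows which features drive predictions the most.
shap.summary_plot(shap_values, X[:100], feature_names=data.feature_names)
```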

Computational Efficiency:

  • Traditional machine learning techniques demonstrate superior computational efficiency in both training and inference compared to their Large Language Model (LLM) counterparts. This efficiency translates into faster development cycles and reduced costs, making traditional methods suitable for a wide range of applications.
  • Let’s consider the example of classifying the sentiment of a customer care executive’s message. For the same use case, training a BERT base model versus a feed-forward neural network (FFNN) with 12 layers of 100 nodes each (~0.1 million parameters) leads to very different energy use and cost.
  • The BERT base model, with its 12 layers, 12 attention heads, and 110 million parameters, typically requires substantial energy for training, on the order of 1,000 to 10,000 kWh according to available data. With best practices for optimization and a moderate training setup, training within 200–800 kWh is feasible, an energy saving of roughly a factor of 5. In the USA, where each kWh costs about $0.165, the lower ends of those ranges translate to roughly $165 (1,000 kWh × $0.165) versus $33 (200 kWh × $0.165), a saving of about $132 per training run. It’s essential to note that these figures are ballpark estimates resting on certain assumptions; a back-of-the-envelope version of this arithmetic is sketched after this list.
  • This efficiency extends to inference, where smaller models, such as the FFNN, facilitate faster deployment for real-time use cases.
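
Here is the back-of-the-envelope sketch of the comparison above. The parameter-count formula, input size, and kWh figures are rough assumptions lifted from the text, not measured values:

```python
# Ballpark comparison of a 12-layer, 100-unit FFNN vs BERT-base.
# All numbers are rough assumptions taken from the text above.

def ffnn_params(n_layers: int, width: int, n_inputs: int, n_outputs: int = 2) -> int:
    """Weights + biases for a plain fully connected stack."""
    sizes = [n_inputs] + [width] * n_layers + [n_outputs]
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in range(len(sizes) - 1))

bert_params = 110_000_000                            # BERT-base, from the text
small_params = ffnn_params(12, 100, n_inputs=300)    # assumes 300-dim input features
print(f"FFNN parameters: ~{small_params:,} vs BERT-base: {bert_params:,}")

# Energy/cost estimate at the lower end of the ranges quoted above.
price_per_kwh = 0.165          # USD per kWh, assumed US average
bert_kwh, small_kwh = 1_000, 200
print(f"BERT-scale training:  ~${bert_kwh * price_per_kwh:,.0f}")
print(f"Small-model training: ~${small_kwh * price_per_kwh:,.0f}")
print(f"Estimated saving:     ~${(bert_kwh - small_kwh) * price_per_kwh:,.0f}")
```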

Specific Tasks:

  • There are use cases, such as time series forecasting, characterized by intricate statistical patterns, calculations, and historical performance. In this domain, traditional machine learning techniques have demonstrated superior results compared to sophisticated Transformer-based models. The paper [Are Transformers Effective for Time Series Forecasting?, Zeng et al.] conducted a comprehensive analysis on nine real-life datasets and, surprisingly, concluded that simple traditional techniques consistently outperformed Transformer models in all cases, often by a substantial margin. For those interested in delving deeper, check out https://arxiv.org/pdf/2205.13504.pdf. A minimal linear baseline in that spirit is sketched below.
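
In that spirit, here is a minimal sketch of a simple linear baseline for forecasting: a linear regression fit on lagged values. The sine-wave series is synthetic and purely illustrative:

```python
# Simple linear autoregressive baseline for time series forecasting.
# The sine-wave series below is synthetic, purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

series = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * np.random.randn(1000)

lookback = 24  # use the previous 24 points to predict the next one
X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
y = series[lookback:]

split = int(0.8 * len(X))
model = LinearRegression().fit(X[:split], y[:split])

pred = model.predict(X[split:])
mae = np.mean(np.abs(pred - y[split:]))
print(f"Test MAE of the linear baseline: {mae:.4f}")
```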

Hybrid Models:

  • There are numerous use cases where combining Large Language Models (LLMs) with traditional machine learning methods proves more effective than using either in isolation. Personally, I’ve observed this synergy in semantic search: combining the encoded representations of a model like BERT with the keyword-based matching algorithm BM25 has surpassed the results achieved by BERT or BM25 individually.
  • BM25, being a keyword-based matching algorithm, tends to excel at avoiding false positives. BERT, on the other hand, focuses on semantic matching, offering accuracy but with a higher potential for false positives. To harness the strengths of both, I employed BM25 as a retriever to obtain the top 10 results and used BERT to rank and refine them. This hybrid approach provides the best of both worlds, addressing the limitations of each method and enhancing overall performance; a minimal sketch of the pattern follows this list.
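
Here is a minimal sketch of that retrieve-then-rerank pattern. It assumes the rank_bm25 and sentence-transformers packages; the corpus, query, and model name are placeholders rather than my exact setup:

```python
# Hybrid search sketch: BM25 retrieves candidates, a BERT-style encoder re-ranks them.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "How do I reset my account password?",
    "Steps to update billing information",
    "Troubleshooting login failures on mobile",
    # ... many more documents in practice
]
query = "cannot log in to my account"

# Stage 1: keyword retrieval with BM25 (few false positives, but misses paraphrases).
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)
scores = bm25.get_scores(query.lower().split())
top_k = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:10]

# Stage 2: semantic re-ranking of the BM25 candidates with a sentence encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
query_emb = encoder.encode(query, convert_to_tensor=True)
cand_embs = encoder.encode([corpus[i] for i in top_k], convert_to_tensor=True)
sims = util.cos_sim(query_emb, cand_embs)[0]

reranked = [top_k[i] for i in sims.argsort(descending=True).tolist()]
for idx in reranked:
    print(corpus[idx])
```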

In conclusion, depending on your use case it might be a good idea to experiment with traditional machine learning models or hybrid models, keeping in mind interpretability, the available data, and energy and cost savings, along with the possible benefits of combining them with LLMs. Have a good day. Happy learning!!

Thanks to all the blogs and my generative AI friends Bard and ChatGPT for helping me :)

Until next time, cheers!
