Presented by Dell
As generative AI burst onto the scene a year ago, technologists quickly became enthralled by the power of large language models (LLMs), which generate human-like answers to queries.
Yet as is typical in technology, big things gradually come in smaller packages. Mainframes begat client-server computing. PCs eventually shared workloads with tablets and cell phones as the demand for mobile computing soared.
Some of the same trends are happening with gen AI software. The biggest driver? The ability to put lightweight yet powerful gen AI services on smaller devices, akin to how organizations rolled out mobile versions of their applications more than a decade ago.
Naturally, the rush to shrink models has also added to the confusion IT leaders already face over which model to choose. But there is a playbook for choosing a small language model (SLM).
The LLM vs. SLM bakeoff
First, a crash course in the differences between LLMs and SLMs. A disclosure up front: this is an imprecise exercise, as there is no universally accepted threshold that separates the two.
Most LLMs span hundreds of billions of parameters, the weights and biases a model learns during training. SLMs typically range from a hundred million to tens of billions of parameters.
LLMs can create a wide array of content, from text and images to audio and video, with multimodal systems emerging that handle several of these formats at once. They process massive amounts of information to perform natural language processing (NLP) tasks that approximate human speech in response to prompts. As such, they are well suited to drawing on vast amounts of data to generate a wide range of content, as well as to conversational AI tasks.
This requires significant server capacity, storage and the all-too-scarce GPUs that power the models, at a cost some organizations are unwilling or unable to bear. It's also tough to satisfy ESG requirements when LLMs hog compute resources for the training, augmentation, fine-tuning and other tasks organizations require to hone their models.
In contrast, SLMs consume fewer computing resources than their larger brethren and provide surprisingly good performance, in some cases on par with LLMs on certain benchmarks. They're also more customizable, allowing organizations to target specific tasks. For instance, SLMs may be trained on curated data sets and paired with retrieval-augmented generation (RAG), which refines responses by retrieving relevant context at query time. For many organizations, SLMs may be ideal for running on premises.
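To make the RAG idea concrete, here is a minimal sketch of the pattern in Python. Everything in it is illustrative: the corpus, the keyword-overlap scorer (a stand-in for embedding similarity) and the generate() stub representing a locally hosted SLM.

```python
# Minimal RAG sketch: retrieve relevant documents, then prepend them
# to the prompt before calling a small language model.
# The corpus, scorer and generate() below are illustrative stand-ins.

CORPUS = [
    "Refunds are available within 30 days of purchase.",
    "Support hours are 9am to 5pm ET, Monday through Friday.",
    "Premium subscribers get priority routing to live agents.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: count of shared lowercase words.
    A production pipeline would compare dense embeddings instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(CORPUS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Assemble the augmented prompt: retrieved context plus the question."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Stand-in for a call to a locally hosted SLM (hypothetical)."""
    return f"[SLM answer grounded in {len(prompt)} chars of context]"

print(generate(build_prompt("What are your support hours on Monday?")))
```

The point of the pattern: the model never has to memorize your documents. Retrieval supplies the relevant context at query time, which is why curated data sets and RAG pair so well with smaller models.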
Downsizing models is catching on among hyperscalers and startups alike, many of which have rolled out compact models intended to run on devices such as laptops and even smartphones.
Within a single week in December, Google unveiled its Gemini class of products, including the compact Nano model, while Mistral AI and Microsoft followed suit with their Mixtral 8x7B and Phi-2 models, respectively. In February, Google unveiled its compact Gemma models.
Choosing the right model requires the right approach
Choosing between an LLM and an SLM may come down to how many parameters you need to cover your use cases and how much money you can spend to operate the model. How do you know if an SLM is right for your organization? These steps can help.
Evaluate business needs: What business problems are you solving for? Maybe you need a new chatbot for customer care. Maybe you want to help your sales and marketing teams create content more efficiently. Or perhaps your software developers can use an extra pair of virtual hands (and algorithmic brains) as they write code. Identifying use cases is a critical first step.
Research the market: Assess your options for fit against your current resources, including people, process and technology. Consider model size and performance relative to the business tasks you want to execute, as well as the quality of the data you expect to use to augment, train and fine-tune the SLM. Will your prospective solution let you scale up while preserving your security requirements?
Conduct a model bakeoff: Run test-and-learn pilots with the SLMs you favor to see how they stack up. How do the resulting benchmarks look for model accuracy, generalization, interpretability and speed? Which models step up and which fall short across those dimensions? A minimal test harness sketch follows this list.
Assess resource requirements: This is a hard one for many organizations that may not have implemented an AI system of this scale before, especially on premises. But you must gauge server and storage needs, GPU options and the costs associated with all of them. Also: should you implement observability and AIOps to analyze outputs relative to desired business outcomes? A back-of-the-envelope sizing sketch also follows this list.
Craft a deployment strategy: Develop a cohesive deployment strategy for the SLM you choose, considering such critical details as integration with existing systems, security and data privacy, as well as maintenance and support. If you picked a public model, how will you ensure it's supported? If you chose an open-source model, how will you keep up with changes to it?
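For the bakeoff step above, here is a minimal sketch of a test harness, assuming each candidate model is exposed as a Python callable that takes a prompt and returns text. The candidate names, stub responses and two-example eval set are hypothetical placeholders for real endpoints and a real labeled test suite.

```python
import time

# Minimal bakeoff harness sketch: run each candidate SLM over the same
# labeled eval set and compare accuracy and latency. Candidate callables
# and eval examples are hypothetical stand-ins for real model endpoints.

EVAL_SET = [
    ("Is shipping free over $50? Answer yes or no.", "yes"),
    ("Are refunds offered after 90 days? Answer yes or no.", "no"),
]

CANDIDATES = {
    "slm-a": lambda prompt: "yes",   # stand-in for, e.g., an HTTP call
    "slm-b": lambda prompt: "no",    # to a locally served model
}

def evaluate(name: str, model) -> None:
    correct, start = 0, time.perf_counter()
    for prompt, expected in EVAL_SET:
        if model(prompt).strip().lower() == expected:
            correct += 1
    avg_latency = (time.perf_counter() - start) / len(EVAL_SET)
    print(f"{name}: accuracy={correct / len(EVAL_SET):.0%}, "
          f"avg latency={avg_latency:.4f}s")

for name, model in CANDIDATES.items():
    evaluate(name, model)
```

In practice you would swap the lambdas for calls to your deployed candidates and grow the eval set to hundreds of examples, but the shape of the comparison stays the same.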
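For the resource-assessment step, a rough rule of thumb can anchor the conversation with your infrastructure team: serving memory is approximately parameter count times bytes per parameter at the chosen precision, plus headroom for activations and the KV cache. The 20% overhead factor below is an assumption, not a vendor figure.

```python
# Back-of-the-envelope GPU memory estimate for serving a model.
# Weights take (parameters x bytes per parameter); the 20% overhead
# factor for activations and KV cache is an assumption, not a spec.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def estimate_vram_gb(params_billions: float, precision: str = "fp16",
                     overhead: float = 0.20) -> float:
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return weights_gb * (1 + overhead)

# A 7B-parameter SLM at fp16 needs roughly 14 GB for weights alone,
# about 17 GB with headroom; a hundreds-of-billions-parameter LLM
# needs hundreds of GB spread across multiple accelerators.
for size_b in (3, 7, 13, 70):
    print(f"{size_b}B @ fp16 ≈ {estimate_vram_gb(size_b):.1f} GB")
```

Run at fp16, a 7B-parameter SLM fits on a single mid-range GPU, while the largest LLMs demand multiple high-end accelerators, which is the cost gap described above.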
One more thing
This space is moving incredibly fast. If you don't stop to look around once in a while, you might miss something.
Fortunately, a growing ecosystem of partners can help you pick the right model and land on the right infrastructure, adoption frameworks and reference designs for your business. The right partner can help you build the best possible gen AI service for your employees and customers.
What are you waiting for? It’s time to partner up and get building.
Learn more about how Dell APEX for Generative AI can help you bring AI to your data.
Clint Boulton is Senior Advisor, Portfolio Marketing, APEX at Dell Technologies.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked.