
Enterprise large language models — small and nimble is smart


Presented by Intel


The astonishing natural language capabilities of generative AI, built on large language models (LLMs), brought AI decisively into the public consciousness. These mammoth models may be the most consequential invention of our generation. Yet, paradoxically, the future of AI is smaller: dozens of open-source LLMs are already giving rise to tens of thousands of domain-specific LLMs in the enterprise.

LLM-based AI services can automate routine tasks and act as our productivity assistants. However, for AI to address the most complex problems, advance an organization’s core mission, and enrich and personalize consumer services, LLMs will need to become specialized. We agree with other industry experts¹ that, for most organizations, most AI will be delivered by nimble expert models running on existing enterprise IT, edge and client infrastructure.

Do LLMs give a competitive advantage? 

LLMs with hundreds of billions of parameters are trained on web-scale data using data-center-scale clusters. Such models are used to build versatile AI platforms for general queries and are deployed by a cloud provider or AI services company. They cost hundreds of millions of dollars to develop² and tens of millions to run in perpetuity. These large models are generalists, well suited to non-proprietary results derived from public data. Since most organizations will consume the same generative AI services through an API call, the only advantage they confer is keeping up with the competition.

For organizations to enrich distinctive products and services, enhance customer engagement and improve cost structure faster than the competition, they need highly accurate and timely models trained on domain-specific private data to avoid mistakes, bias and loss of reputation. The more sophisticated the use case, the more precise the model needs to be, and the more critical it becomes to incorporate your own data. Large models are cumbersome and inefficient to keep current for mission-critical enterprise applications. Small and nimble is better.

Fortunately, there are open-source, pretrained small LLMs that are 10x to 100x smaller than the foundation models with hundreds of billions of parameters, yet still highly accurate.

These models can be further fine-tuned and combined with retrieval-augmented generation (RAG) to incorporate your private data in minutes to hours, producing reliable expert models customized for your business. Instead of spending tens of millions of dollars and months of time, organizations can develop a model over lunch and deploy it on the servers already running the application. This is the only economical and sustainable approach to scaling AI into every application.
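To make that concrete, below is a minimal sketch of the RAG pattern in Python, assuming the open-source sentence-transformers and transformers libraries; the documents and model names are illustrative placeholders, not specific Intel recommendations.

```python
# Minimal RAG sketch: index private documents, retrieve the most relevant
# ones for a query, and hand them to a small open-source LLM as context.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Stand-ins for your private, domain-specific documents.
docs = [
    "Our return policy allows exchanges within 45 days with a receipt.",
    "Premium-tier customers receive free expedited shipping on all orders.",
    "Warehouse B handles all orders shipped to the EU region.",
]

# 1. Index: embed every document once (in production, store these in a vector database).
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model
doc_vectors = embedder.encode(docs, convert_to_tensor=True)

# 2. Retrieve: embed the query and find the closest documents.
query = "How long do customers have to exchange an item?"
query_vector = embedder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_vector, doc_vectors, top_k=2)[0]
context = "\n".join(docs[hit["corpus_id"]] for hit in hits)

# 3. Generate: ask a small instruction-tuned LLM, grounded in the retrieved context.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```

The three steps (index, retrieve, generate) stay the same as you swap the in-memory list for a production vector store and the toy model for a fine-tuned domain expert.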

A foundation LLM knows about business, but a nimble, domain-specific LLM knows about your business.

Large foundation AI models and services

Advantages:
  • Incredible versatility
  • Compelling results
  • Fast integration via APIs
  • Web-scale data sets

Disadvantages:
  • Complex management
  • Expensive to train, run and update
  • Hallucination, bias and black-box results
  • Security is not transparent
  • Data sources often unknown

Small language model ecosystem

Advantages:
  • Smaller, with improved accuracy
  • Preserve data privacy and security
  • Source attribution and explainability
  • More economical to fine-tune and deploy

Disadvantages:
  • Models require few-shot fine-tuning
  • Retrieval models need indexing of all source data
  • Reduced range of tasks

Why enterprises will manage their own LLMs

Ultimately, most organizations will use API services for routine, non-differentiated tasks and private AI models for use cases specific to their business. When choosing which AI models to self-manage, consider the following:

  • Data privacy — Protect sensitive information and competitive advantage; comply with data governance regulations
  • Accuracy — Reliably run mission critical applications and protect reputation
  • Explainability — Trace results back to the source data before making major decisions, and monitor outputs over time for consistency
  • Cost — Less expensive to self-operate persistent models on existing IT infrastructure
  • Proximity — Co-locate the model with the application to ensure natural, human response times
  • Integration — Deploy within the existing business logic and IT decision making systems

Understand your requirements and model options

AI is erroneously marketed as dedicated applications on segregated clusters, locked in a performance race to be #1 on a leaderboard. We believe AI will eventually be a function infused into every application and deployed on existing IT infrastructure, but it is important to understand your data, your use-case requirements, and your AI model and deployment options. Some enterprises, with large amounts of data and differentiated business tasks, will want to build and train their own large language models. The majority of enterprises, however, will use many nimble, more agile open-source models for specific tasks such as customer service or order-taking chat agents.

AI everywhere implies that everything that computes will need to accelerate AI up to the level of work the application requires. Models will be obtained from the open-source ecosystem and fine-tuned with private data, or delivered already integrated with commercial software. Most of the work to build a production-ready use case is not in the LLM itself; it is in the pipeline and orchestration, from data ingestion, storage and processing through inference serving, validation and monitoring. For these reasons, you need a compute platform that can handle data preparation, model building and deployment alike.
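As a rough illustration of how much of that work sits around the model rather than inside it, here is a minimal inference-serving sketch with simple validation and monitoring hooks. It assumes FastAPI and an illustrative small Hugging Face model; the endpoint name, limits and metrics are hypothetical placeholders.

```python
# Minimal inference-serving sketch: input validation before the model,
# latency monitoring after it. Save as app.py and run: uvicorn app:app
import time
from fastapi import FastAPI
from pydantic import BaseModel, Field
from transformers import pipeline

app = FastAPI()
# Illustrative small model; swap in your fine-tuned domain expert.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

class Query(BaseModel):
    # Validation: reject empty or oversized prompts before they reach the model.
    prompt: str = Field(min_length=1, max_length=2000)

@app.post("/generate")
def generate(query: Query):
    start = time.perf_counter()
    completion = generator(query.prompt, max_new_tokens=128)[0]["generated_text"]
    latency_ms = (time.perf_counter() - start) * 1000
    # Monitoring: in production, ship latency and token counts to a metrics
    # system rather than returning them to the caller.
    return {"completion": completion, "latency_ms": round(latency_ms, 1)}
```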

Intel has an end-to-end AI platform that includes the Intel® Gaudi® accelerator for cost-effective performance (Intel reports that Gaudi 2 delivers 4x better performance per dollar than Nvidia’s H100), and the 5th Gen Intel® Xeon® general-purpose CPU with built-in AI features for smaller LLMs, other AI workloads such as machine learning, and the AI data pipeline.

  • Models: Automated model recipes and optimizations for tens of thousands of models on AI platforms like Hugging Face, code-sharing repositories like GitHub, and the Gaudi Developer Hub
  • Software: Intel® Gaudi® software and the Intel AI software suite, tested with over 400 AI models across industry-standard frameworks and libraries
  • Enterprise-ready: Validated fine-tuning and production AI models using VMware Private AI and Red Hat OpenShift on Xeon-based OEM servers
  • Trial: Experiment with LLMs and generative AI with a free trial on Intel Developer Cloud

Your generative AI journey begins now

The enterprise journey begins by identifying the business use case — whether that is saving costs through streamlined operations, increasing revenue through better customer experiences and new products and services, or removing mundane, repetitive tasks to make employees happier and more productive.

The developer journey starts with an open-source LLM or use case specific model, data requirements, software tools and the appropriate AI platform for maximum cost performance and ease of use. Try it out below.

Download Intel open source generative AI models on Hugging Face.
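For example, one way to pull a published Intel model from the Hugging Face Hub; the repo id below is illustrative, so browse Intel’s Hugging Face organization and substitute the model you need:

```python
# Download a snapshot of an Intel-published model from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("Intel/neural-chat-7b-v3-1")  # illustrative repo id
print(f"Model files downloaded to: {local_dir}")
```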

Interested in learning more about AI models and generative AI? Sign up for a free trial on Intel Developer Cloud.

Learn more about generative AI in our monthly GenAI webinar series.


Jordan Plawner is Senior Director, Global AI Product at Intel.

Susan Lansing is Senior Director, Gaudi Accelerator at Intel.

Sancha Norris is AI Product Marketing Strategist at Intel.


Footnotes:
1. Large language models: OpenAI CEO sees ‘end of an era’ in number of parameters — https://the-decoder.com/large-language-models-openai-ceo-sees-end-of-an-era-in-number-of-parameters/
2. OpenAI’s ChatGPT is costing the company a shocking amount of money — https://www.tweaktown.com/news/91375/openais-chatgpt-is-costing-the-company-shocking-amount-of-money/index.html
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.