Unleashing the Potential: Exploring the Generative AI Value Chain - Foundation Models

Generative AI has shifted how we look at business and technology as a whole. This has a clear impact on how any new business use case will be taken up - quite often it will take a lot less time to develop and integrate technologies together, as most of the heavy lifting will be done by the new breed of generative AI models, or what we technologists call Foundation Models.

In this article, I would like to highlight some of the basics about Foundation Models. Before we dive in, let's first understand where foundation models fit into the entire value chain of generative AI.

Generative AI value chain:

As outlined in the diagram, the value chain of generative AI involves multiple parties:

  • Hardware manufacturers: NVIDIA GPUs are widely used here, but others such as Google's TPUs and Amazon's own Trainium and Inferentia accelerators are also gaining traction in the market.

  • Cloud providers who offer hardware at scale include Amazon Web Services (AWS), Google Cloud Platform (GCP), and Azure. Being an Amazonian, I suggest starting with AWS, but you can also consider other providers. Alternatively, if you have sufficient hardware in your own data center, you can begin building the model there.

  • Platform tooling plays a crucial role in enhancing the overall model creation process. While I may be biased towards Amazon SageMaker, there are various other platforms available that offer an easy starting point for model creators.

  • Foundation model: As a model creator, once you have gathered all the required data and have enough GPUs/accelerators, you start training the model. You evaluate the model for the accuracy you need, and in a few weeks (or so you think!) you have a highly performant model.

  • Once you have the model, you need a hub that allows your consumers to start experimenting with and consuming the model - either through zero-shot or few-shot learning, or in some cases by creating custom models. Quite often, hubs like the Amazon SageMaker JumpStart Foundation Model Hub or the Hugging Face Model Hub help accelerate adoption.

  • Applications: once the model you want to use is available for consumption - either as model weights, an endpoint, or via some API mechanism - application builders will start building a wide range of applications using these models. There may be many providers offering capabilities for application developers to consume models, but I really love the idea of making API calls to multiple foundation models, as different applications require different types of foundation models. One model may excel at processing a large corpus of text, another at image creation or upscaling, and another may be best at conversational question answering. So instead of application teams having to work with multiple providers, how about going to one provider that gives you access to multiple models via a single API call? Welcome, Amazon Bedrock! Amazon Bedrock is the easiest way to build and scale generative AI-based applications using multiple foundation models, democratizing access for all builders. (A minimal sketch of calling a model this way appears right after this list.)

  • It is highly likely that a single application will not solve your problem, and you will need multiple applications to collaborate and support the actual business use case. This scenario resembles a standard microservice architecture in which multiple services form a larger application. This is also where the toolsets come in for chaining things together or making your foundation model more effective, for example with LangChain or RAG. More on that later.

  • Above all, you want to operate your AI workload responsibly. There are multiple areas of responsible AI, but pay attention to 1/Privacy & Security, 2/Fairness, 3/Explainability, 4/Robustness, 5/Transparency, and 6/Governance.
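
Since I just mentioned consuming multiple models through a single API, here is a minimal sketch of what that can look like with the boto3 "bedrock-runtime" client. The model ID and request/response JSON shown here assume the Amazon Titan Text format; every model family on Bedrock has its own schema, so treat these as illustrative assumptions and check the Bedrock documentation for the model you pick.

```python
# Minimal sketch: calling a foundation model on Amazon Bedrock via boto3.
# Assumptions: the "amazon.titan-text-express-v1" model ID and the Titan Text
# request/response JSON schema. Other model families use different schemas.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = "Summarize why a single API for multiple foundation models helps application teams."

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",   # assumed model ID; swap for the model you need
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.2},
    }),
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```

Swapping in a different model is then largely a matter of changing the model ID and the request body, which is the appeal of the single-API approach.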

In different articles, I will touch upon different aspects of the value chain, but what is unique here is the foundation model - so let's dive a bit deeper into what a foundation model is and how it differs from the other kinds of models we have been building over the last few years as part of the overall AI/ML work.

Foundation Model:

Foundation models are inspired by the way the human brain connects billions of neurons. In a similar fashion, a large corpus of unlabeled data is fed to artificial neural networks, and foundation models are built. Foundation models are part of what we call deep learning, a term that refers to the many layers within neural networks. Deep learning has powered many of the recent advances in AI, but foundation models are an evolution within deep learning. Unlike previous deep learning models, they can process extremely large and varied sets of unstructured data and perform multiple tasks, from creating text, summarizing text, and answering questions to analyzing medical images and much more. Foundation models have enabled new capabilities and vastly improved existing ones across a broad range of modalities, including images, video, audio, and computer code.

Quite often these foundation models are trained on public datasets sourced from the Internet - including platforms like Wikipedia, Twitter, and Reddit - and through web crawling and scraping. As a result, a model can only provide information it learned from these datasets. However, there are other mechanisms available, such as RAG (Retrieval-Augmented Generation), which retrieves up-to-date information and feeds it to the model at query time, and lightweight fine-tuning techniques such as LoRA (Low-Rank Adaptation), which adapt the model to new data, thereby improving the quality of the answers.
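
To make the RAG idea concrete, here is a minimal sketch of the pattern: retrieve the most relevant documents for a question, then pass them to the model as extra context in the prompt. The keyword-overlap retriever and the generate() placeholder below are illustrative stand-ins; a real system would typically use embeddings with a vector store and an actual foundation model endpoint.

```python
# Minimal sketch of Retrieval-Augmented Generation (RAG).
# retrieve() uses naive keyword overlap and generate() is a placeholder;
# both stand in for a vector store and a hosted foundation model.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Premium support is available 24/7 for enterprise customers.",
    "The 2024 product line adds multilingual summarization features.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share, return the top k."""
    query_terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(query_terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Placeholder for a call to your foundation model (Bedrock, SageMaker endpoint, etc.)."""
    return f"[model answer based on a {len(prompt)}-character prompt]"

question = "What is the refund policy?"
context = "\n".join(retrieve(question, documents))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(generate(prompt))
```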

Alternatively, there is the option of creating a domain-specific foundation model. I had the privilege of collaborating with Bloomberg, who developed a bespoke foundation model called BloombergGPT specifically for the financial services domain. As documented in their arxiv.org paper, they meticulously trained their model using a combination of datasets. While a portion of the data originated from the public domain, such as Wikipedia, the majority of the datasets utilized were finance-oriented documents. As a result, their model demonstrated superior performance compared to generic foundation models when handling various financial tasks.

In addition to the discussion on foundation models, it is important to note the distinction between proprietary and open-source models. Proprietary foundation models, like GPT-4 from OpenAI, Claude from Anthropic, or our own Titan models from Amazon, typically do not allow direct interaction with the underlying "neurons" by default; instead, additional fine-tuning capabilities are provided by the model creators. In contrast, there are publicly available models, commonly referred to as open-source models - such as Falcon (7B, 40B), Stable Diffusion, and the BLOOM models - where you can "interact" with the neural network, customize it, and even own the intellectual property, if their licenses are permissive enough (such as Apache 2.0 or MIT).
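
For the open-source route, here is a minimal sketch of pulling a model's weights from the Hugging Face Hub and running a prompt locally. It assumes the publicly available tiiuae/falcon-7b-instruct checkpoint and a GPU with enough memory (roughly 16 GB in half precision); whichever model you choose, its license terms still apply.

```python
# Minimal sketch: loading an open-source model from the Hugging Face Hub and
# generating text. Assumes the tiiuae/falcon-7b-instruct checkpoint and a GPU
# with sufficient memory; adjust the model ID and generation settings as needed.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",       # spread layers across available GPUs
)

output = generator(
    "Explain the difference between proprietary and open-source foundation models.",
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```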

So with all these foundation models available, one may ask: as an enterprise, should I build my own foundation model?

Like any other business decision, the answer is, it depends. Let's understand when you want to build your own foundation model vs. when you want to consume a model vs. when you want to customize an existing model.

1. Should I build my own?: Building a foundation model requires a combination of science, art, and $$$ - along with time! Yes, it takes tens of millions to hundreds of millions of dollars to build a foundation model from scratch, in terms of GPUs, human workforce, and potentially acquiring data. It also takes time (from a few weeks to a few months, or even years!) to build a very good model. You can shrink the timeline by bringing in additional resources! Building your own is helpful for companies that want to differentiate themselves, stay ahead of the competition, reduce toxicity, and potentially monetize the model by either making it available in the marketplace or building a new set of products on top of it.

2. Should I consume a foundation model?: When considering the adoption of a foundation model, it's important to evaluate its suitability for your specific use cases. While many proprietary foundation models have achieved impressive performance, it's crucial to recognize that they are trained on "generic" datasets rather than "domain-specific" ones. Consequently, they excel at providing general answers but may lack the necessary "knowledge" or understanding of industry-specific nuances, such as healthcare vocabulary in the case of the healthcare industry, and the accuracy of their responses may suffer in such specialized contexts. Additionally, operating responsibly is paramount, as foundation models may occasionally generate toxic or hallucinated answers. In such instances, it becomes your responsibility to filter out and eliminate any inappropriate or misleading responses.

3. Should I customize an existing model?: This is a valid consideration and can be seen as a middle-ground solution, especially when you possess a domain-specific dataset and want the model to align with your specific requirements. In such cases, it is recommended to leverage an existing pre-trained model (proprietary or open-source, depending on your IP retention policy) and perform fine-tuning using your own datasets. The amount of data required for fine-tuning will depend on the size and complexity of the pre-trained model, and the result is your personalized version of a fine-tuned foundation model. (A minimal fine-tuning sketch follows this list.)
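
To give a feel for option 3, here is a minimal sketch of parameter-efficient fine-tuning with LoRA adapters using the Hugging Face peft and transformers libraries. The base model (gpt2), the tiny in-memory dataset, and the hyperparameters are illustrative placeholders, not recommendations; you would swap in a model and a domain dataset that match your requirements and IP policy.

```python
# Minimal sketch: LoRA fine-tuning of a pre-trained causal language model.
# gpt2, the two-sentence dataset, and all hyperparameters are placeholders.
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model_id = "gpt2"  # stand-in; pick a base model suited to your domain and license needs
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Attach small trainable LoRA matrices; the base model's weights stay frozen.
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# A tiny in-memory "domain" dataset, purely for illustration.
texts = ["Domain-specific example sentence one.", "Domain-specific example sentence two."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128), batched=True
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-finetune", num_train_epochs=1,
                           per_device_train_batch_size=2, logging_steps=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-finetune/adapter")  # saves only the small adapter weights
```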

In terms of the time it takes to use any of these options, the quickest is option 2 (consuming a foundation model). The middle ground of option 3 is where many enterprises will land; it does involve some $$ spend and some time (a few days to weeks). And I am sure you know by now that option 1 (building your own) is the most expensive and time-consuming effort.

To summarize, foundation models are the heart of generative AI. They are the driving force behind the adoption of generative AI. There is a multitude of foundation models available in the marketplace, proprietary and open-source alike. As more research happens in this domain and more money is infused, we should expect better-quality, more accurate models in the near future.
