26 weeks of AI marathon

Marathons are 26.2 miles, or 42.2 km (for my friends using the metric system), and over the last 26 weeks (more or less) we have witnessed an array of significant announcements in the realm of artificial intelligence and foundation models.

The release of ChatGPT in late November 2022 is often likened to an "iPhone moment," and the six months since have been characterized by an overwhelming influx of generative AI announcements from research labs and technology companies worldwide.

This article dives into the pre-ChatGPT era, the developments following its release, and the profound impact it has had on the world of generative AI.

The research conducted in recent years has made significant contributions to the current state of generative AI. Most notable are:

  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford et al. (2016)

  • Attention Is All You Need, Vaswani et al. (2017)

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. (2018)

  • Language Models are Few-Shot Learners, Brown et al. (2020)

Before ChatGPT, many companies had developed and released foundation models, from the very popular Google BERT in 2018 to OpenAI's GPT-3 in 2020, DALL-E in 2021, and many others. The pace accelerated in 2022 with OpenAI's DALL-E 2, Google's Imagen, PaLM-540B, and Chinchilla-70B. When Stability AI launched its open-source text-to-image model, Stable Diffusion, in August 2022, it captured the public's attention. Still, the release of ChatGPT on November 30, 2022, marked an "iPhone moment" in many ways: it revealed the power of foundation models to the tech-savvy consumer.

After ChatGPT, a lot of research got published. Notably, in February, Amazon's work on multimodal chain-of-thought reasoning in language models demonstrated that a model with under 1 billion parameters outperforms the previous state-of-the-art LLM (GPT-3.5) by about 16 percentage points (75.17% -> 91.68% accuracy) on benchmarks such as ScienceQA. This was reinforced by Meta's demonstration of LLaMA models ranging from 7B to 65B parameters, where LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.

These results suggested that bigger models may not be necessary, as long as you train decent-sized models on a large number (trillions) of tokens.

Fast forward a couple of weeks, and we got an avalanche of generative models in March:

  • Anthropic's Claude: Positioned as an alternative to ChatGPT for text processing and conversational tasks, with less toxicity, hallucination, and bias. First-hand experience confirmed that Claude's output is delightful to read.

  • Google released Bard, an AI-powered chatbot for simulating human conversations, since upgraded to run on top of PaLM 2: a much-needed interface to compete against ChatGPT.

  • Adobe launched Firefly for creative professionals, making it easier for them to create content.

  • Bloomberg debuted a first-of-its-kind domain-specific model: BloombergGPT, a 50B-parameter model trained from scratch on AWS for financial services, which outperforms many open-source models on financial-domain tasks. This further showed that smaller models trained on the right datasets (in this case, financial-domain data) can outperform larger models trained on generic datasets. P.S. I was fortunate enough to work with the Bloomberg team on orchestrating BloombergGPT on AWS.

The real excitement began in April with a couple of open-model launches: Databricks made Dolly 2.0 open source, and Stability AI released StableVicuna, a large-scale open-source chatbot trained via reinforcement learning from human feedback (RLHF). But the biggest news of April was the announcement of Amazon Bedrock and the Amazon Titan models.

  • Amazon Bedrock: A brand-new service that gives AWS customers access to multiple foundation models, Amazon's own Titan models as well as models from third-party partners such as AI21 Labs, Anthropic, and Stability AI, all through a single API. This allows customers to build and scale generative AI applications on top of these foundation models easily and securely.

Though I may be biased as an Amazonian, one thing that clearly separates Amazon Bedrock from other providers is the security and privacy of customer data. Security is job zero at Amazon, and I see that with Bedrock as well: by default, Amazon Bedrock does not use customer data to train the underlying base models. I also like the idea of bringing the model to where the data is, in the customer's AWS account, instead of taking your data to where the model is hosted.
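To give a sense of what "a single API" looks like in practice, here is a minimal sketch of calling a text model through Amazon Bedrock with boto3. The region, model ID, and request/response shapes are assumptions based on the Titan Text examples; check the current Bedrock documentation for what is enabled in your account.

```python
import json
import boto3

# Bedrock runtime client (region below is an assumption; pick one where Bedrock is enabled).
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Request body shape assumed from the Titan Text examples; other providers
# (AI21, Anthropic, Stability AI) each expect their own body format.
body = json.dumps({
    "inputText": "Summarize the key ideas behind foundation models in two sentences.",
    "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.5},
})

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # assumed model ID for illustration
    body=body,
    contentType="application/json",
    accept="application/json",
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```

Swapping in a different model is mostly a matter of changing the model ID and the body format, which is the appeal of having one API in front of many providers.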

  • Amazon Titan: At launch, Amazon Titan offers two models: text and embeddings. Titan Text is a generative large language model, similar to GPT-3.5, while Titan Embeddings translates text inputs into numerical representations (vectors) that capture the semantic meaning of the text. Titan Embeddings is useful for applications like personalization and search.

Again, I may be biased, but I truly appreciate the responsible AI aspects of the Titan models: they are built to detect and remove harmful content in the data, reject inappropriate content in user input, and filter model outputs that contain inappropriate content (such as hate speech, profanity, and violence).
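To make the embeddings use case concrete, here is a hedged sketch of semantic search with Titan Embeddings: embed a few documents and a query, then rank the documents by cosine similarity. The model ID and the response field name are assumptions; verify them against the Bedrock documentation.

```python
import json
import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> np.ndarray:
    # Model ID and response field ("embedding") are assumptions for illustration.
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return np.array(json.loads(response["body"].read())["embedding"])

docs = [
    "Bedrock gives API access to multiple foundation models.",
    "Titan Embeddings turns text into semantic vectors.",
    "Stable Diffusion generates images from text prompts.",
]
doc_vectors = [embed(d) for d in docs]
query_vec = embed("Which service converts text into vectors?")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query and print the best match.
ranked = sorted(zip(docs, doc_vectors), key=lambda p: cosine(query_vec, p[1]), reverse=True)
print(ranked[0][0])
```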

Are you still with me? We have just crossed April!

All the excitement of the last three to four months generated enough noise that in May, regulators wanted to hear from some of the AI leaders and began considering rules for AI. Regulation takes time to take shape, but these hearings did raise a few questions: Who owns the data generated by the models? How much influence will generative AI have on the 2024 US presidential election?

While we were listening to Senate hearings, Anthropic's science team was hard at work: they launched the largest context window of any model, 100,000 tokens, roughly equivalent to 75,000 words. By comparison, GPT-4's largest variant offers 32,768 tokens. For those who don't know, the context window is the maximum number of tokens an LLM can take into account when generating its next token.

Anthropic demonstrated this by loading The Great Gatsby, in which they had modified one line, into Claude Instant. They then directed the model to identify the new sentence, which it did in 22 seconds. By comparison, it takes a person five to six hours to read a book of that size. Just as impressive, 100K tokens is enough for Claude to ingest the transcript of roughly six hours of audio, opening it up for all sorts of use cases.
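To make the idea of a context window concrete, here is a small sketch that counts tokens with the open-source tiktoken tokenizer and checks whether a text fits in a given window. Claude uses its own tokenizer, so these counts are only an approximation, and the local file path is hypothetical.

```python
import tiktoken

CONTEXT_WINDOW = 100_000  # Claude's window at the time; GPT-4's largest variant was 32,768

# tiktoken's cl100k_base encoding is an approximation here; Anthropic's
# tokenizer differs, so real counts will not match exactly.
enc = tiktoken.get_encoding("cl100k_base")

with open("great_gatsby.txt") as f:  # hypothetical local copy of the novel
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens")
print("fits in window" if n_tokens <= CONTEXT_WINDOW else "needs truncation or chunking")
```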

As if that were not enough for May, the Technology Innovation Institute (TII) out of Abu Dhabi made its Falcon family of models (7B and 40B) completely open source. These models were trained on AWS with 1 trillion tokens and have stayed at the top of the Hugging Face leaderboard for a few weeks now (as of this writing). The exciting part of this news is that, with the models being open source under the Apache 2.0 license, anyone can take them, customize them to fit their organization's needs, and own the rights to the resulting model. Given that this is a top-performing model, a version fine-tuned on your own domain data should perform even better on your domain-specific tasks. Amazon SageMaker makes fine-tuning even easier by making Falcon available via JumpStart. P.S. I was fortunate enough to work with the TII team in helping them build the Falcon models on AWS.
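For readers wondering what the JumpStart path looks like, here is a hedged sketch of fine-tuning a Falcon model with the SageMaker Python SDK's JumpStartEstimator. The model ID, instance type, hyperparameter names, and training channel name are assumptions for illustration; check the JumpStart model card for the exact values.

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Model ID, instance type, hyperparameters, and channel name below are
# assumptions; consult the JumpStart model card for your chosen Falcon variant.
estimator = JumpStartEstimator(
    model_id="huggingface-llm-falcon-7b-bf16",
    instance_type="ml.g5.12xlarge",
    hyperparameters={"epochs": "1", "learning_rate": "1e-5"},
)

# Point the training channel at your own domain data in S3.
estimator.fit({"training": "s3://my-bucket/my-domain-corpus/"})

# The fine-tuned model can then be hosted with estimator.deploy() for inference.
```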

June and July brought even more excitement when Stability AI made Stable Diffusion XL (SDXL 0.9) available, which can run on a consumer GPU: Windows 10 or 11, or Linux, with an AMD or NVIDIA GPU. It can generate hyper-realistic creations for films, television, music, and instructional videos, and offers advancements for design and industrial use cases. In fact, the title image of this post was created with SDXL 0.9 via Clipdrop.
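The "runs on a consumer GPU" claim can be tried locally with the Hugging Face diffusers library; a minimal sketch follows. The weight repository name is an assumption (the SDXL 0.9 weights were gated at release, and the later public release is stabilityai/stable-diffusion-xl-base-1.0), so treat this as illustrative.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Repository name is an assumption; substitute the SDXL weights you have access to.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9",
    torch_dtype=torch.float16,  # half precision to fit on a consumer GPU
)
pipe.to("cuda")  # NVIDIA GPU; AMD requires a ROCm build of PyTorch

image = pipe(prompt="a runner crossing a marathon finish line, cinematic lighting").images[0]
image.save("title_image.png")
```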

Anthropic has been hard at work: a mere five months after the debut of Claude, its successor, Claude 2, launched in the first week of July, boasting longer responses, more nuanced reasoning, and superior performance, including impressive scores on the GRE reading and writing exams.

While I may have neglected many other advancements around these generative AI models, there is plenty of research happening and being productionized:

  • LangChain: a framework for developing applications powered by these foundation models.

  • ChatGPT plugins: let developers build plugins so ChatGPT can access live data from the plugin provider.

  • Prompt engineering: mastering interaction with large foundation models by crafting the right prompts.

  • RAG (Retrieval-Augmented Generation): retrieving relevant content from an external knowledge source and including it in the prompt, so fact-based answers can be given to the end user (see the sketch after this list).
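Here is a minimal, self-contained sketch of the RAG idea: embed documents, retrieve the ones most similar to a question, and place them in the prompt as context. The toy bag-of-words embedding stands in for a real embedding model (such as Titan Embeddings above), and the final LLM call is omitted.

```python
import numpy as np

# Toy knowledge base; in practice these would be chunks from your documents.
docs = [
    "Amazon Bedrock provides API access to multiple foundation models.",
    "Falcon models from TII are open source under the Apache 2.0 license.",
    "Claude supports a 100,000-token context window.",
]

# Toy bag-of-words embedding; a real system would use an embedding model.
vocab = sorted({w.lower().strip(".,") for d in docs for w in d.split()})

def embed(text: str) -> np.ndarray:
    words = [w.lower().strip(".,") for w in text.split()]
    return np.array([words.count(v) for v in vocab], dtype=float)

def retrieve(question: str, k: int = 2):
    q = embed(question)
    def cosine(v: np.ndarray) -> float:
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        return float(q @ v / denom) if denom else 0.0
    return sorted(docs, key=lambda d: cosine(embed(d)), reverse=True)[:k]

question = "Which license do the Falcon models use?"
context = "\n".join(retrieve(question))

# The retrieved context is prepended to the prompt; the LLM call itself
# (e.g., via Bedrock as sketched earlier) is left out here.
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)
```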

All this hard work by researchers, technology companies, and many individuals is making a real business impact. As predicted by McKinsey:

Generative AI’s impact on productivity could add trillions of dollars in value to the global economy. Our latest research estimates that generative AI could add the equivalent of $2.6 trillion to $4.4 trillion annually across the 63 use cases we analyzed—by comparison, the United Kingdom’s entire GDP in 2021 was $3.1 trillion. This would increase the impact of all artificial intelligence by 15 to 40 percent. This estimate would roughly double if we include the impact of embedding generative AI into software that is currently used for other tasks beyond those use cases.

While we are hardly six months (or 26 weeks) into 2023, we have already seen so much advancement in AI, in particular generative AI. What I am most excited about is the upcoming 26 weeks of 2023 and beyond: we will see many industrial use cases where customers solve real business problems and realize gains in productivity, efficiency, or cost.

The AI journey has just begun. What are you excited about?

(originally published by the author on LinkedIn)
