Model Garden of Vertex AI:

Unlocking the Power of Google’s Vertex AI: Exploring the World of Pre-Built Models for AI Tasks

Introduction:

Artificial intelligence (AI) has transformed numerous industries, from healthcare and finance to e-commerce, logistics, education, and entertainment. But the complexity of developing machine learning models often poses a challenge. As demand for AI-powered solutions continues to rise, data scientists seek efficient ways to leverage pre-trained models or build custom models for specific tasks. Google’s Vertex AI addresses this need with an extensive selection of pre-built models for a wide range of AI tasks.

Vertex AI combines large language models (LLMs) with prompt engineering to make complex machine learning tasks easier to perform. Data scientists can use state-of-the-art language models, such as PaLM 2, to accelerate their ML development process, while prompt engineering lets them communicate with the models effectively and guide them toward precise, accurate results. From computer vision and natural language processing to speech processing and structured tabular data analysis, Vertex AI’s Model Garden includes over 100 models catering to diverse application domains. This article surveys those models and prompts, grouped into foundation models, fine-tunable models, task-specific solutions, and task-specific LLM prompts.

Foundation models:

Pre-trained multi-task models that can be further tuned or customized for specific tasks.

| Sno. | Name | Details | Task Name | Vision/Language | Input DataType | Model Name |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 for Text | Fine-tuned to follow natural language instructions and suitable for a variety of language tasks, such as classification, extraction, summarization, and content generation. | Text Gen. | Language | Text | text-bison@001 |
| 2 | PaLM 2 for Chat | Fine-tuned to conduct natural conversation. Use this model to build and customize your own chatbot application. | Text Gen. | Language | Text | chat-bison@001 |
| 3 | Embeddings for text | Text embedding is an important NLP technique that converts textual data into numerical vectors that can be processed by machine learning algorithms, especially large models. These vector representations are designed to capture the semantic meaning and context of the words they represent. | Embedding | Language | Text | textembedding-gecko@001 |
| 4 | Codey for Code Completion | Generates code based on code prompts. Good for code suggestions and minimizing bugs in code. | Code Gen. | Language | Text | code-gecko@001 |
| 5 | Codey for Code Generation | Generates code based on natural language input. Good for writing functions, classes, unit tests, and more. | Code Gen. | Language | Text | code-bison@001 |
| 6 | Codey for Code Chat | Get code-related assistance through natural conversation. Good for questions about an API, syntax in a supported language, and more. | Code Gen. | Language | Text | codechat-bison@001 |
| 7 | BERT | Neural network-based technique for natural language processing. Use it to train your own question answering system and more. | Text Gen. | Language | Text | google/bert-base-001 |
| 8 | InstructPix2Pix | Given an input image and a text prompt that tells the model what to do, the instruct-pix2pix model follows the prompt to edit the image by generating a new one. | Image Gen. | Vision, Language | Text+Image | timbrooks/instruct-pix2pix |
| 9 | ControlNet | Control image generation with a text prompt and a control image. | Image Gen. | Vision, Language | Text | lllyasviel/ControlNet |
| 10 | BLIP2 | BLIP2 handles image captioning and visual question answering tasks. | Text Gen. | Vision, Language | Image | Salesforce/blip2-opt-2.7b |
| 11 | Stable Diffusion 1.4 (Keras) | KerasCV implementation of stability.ai’s text-to-image model, Stable Diffusion 1.4. | Image Gen. | Vision, Language | Text | keras/stable-diffusion-v1-4 |
| 12 | Embeddings for Image | Generates vectors based on images, which can be used for downstream tasks like image classification, image search, and so on. | Embedding | Vision | Image | imageembedding-001 |
| 13 | Label detector (PaLI zero-shot) | Label Detector Zero-shot classifies images based on labels, represented as a list of text prompt strings provided by the user, and calculates the confidence score of each label’s presence in the image. | Classification | Vision | Image | imagezeroshot-001 |
| 14 | Stable Diffusion v1-5 | Latent text-to-image diffusion model capable of generating photo-realistic images given a text input. | Image Gen. | Vision | Text | runwayml/stable-diffusion-v1-5 |
| 15 | Stable Diffusion Inpainting | Stable Diffusion Inpainting is a latent diffusion model capable of inpainting images given any text input and a mask image. | Image Gen. | Vision | Text | runwayml/stable-diffusion-inpainting |
| 16 | BLIP image captioning | A Vision-Language Pre-training (VLP) framework for image captioning. | Text Gen. | Vision | Image | Salesforce/blip-image-captioning-base |
| 17 | BLIP VQA | A Vision-Language Pre-training (VLP) framework for visual question answering (VQA). | Text Gen. | Vision | Image | Salesforce/blip-vqa-base |
| 18 | CLIP | Neural network capable of classifying images without prior training on the classes. | Classification | Vision | Image | openai/clip-vit-base-patch32 |
| 19 | OWL-ViT | Zero-shot, text-conditioned object detection model that can query an image with one or multiple text queries. | Text Gen. | Vision | Text+Image | google/owlvit-base-patch32 |
| 20 | ViT GPT2 | Image captioning model. | Text Gen. | Vision | Image | nlpconnect/vit-gpt2-image-captioning |
| 21 | ViLT VQA | Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2. | Text Gen. | Vision | Image | dandelin/vilt-b32-finetuned-vqa |
| 22 | LayoutLM for VQA | Fine-tuned for document understanding and information extraction tasks like form and receipt understanding. | Info. Extraction | Vision | Scan Doc | impira/layoutlm-document-qa |
| 23 | T5-FLAN | T5 (Text-To-Text Transfer Transformer) model with the T5-FLAN checkpoint. | Text Gen. | Language | Text | google/t5-flan-001 |
| 24 | Sec-PaLM2 | The sec-palm model is a foundational model that has been pretrained on a variety of security-specific tasks. The model has broad security understanding across a number of topics, such as threat intelligence, security operations, and malware analysis. It is ideal for analyzing, summarizing, and aggregating information across multiple security data sources, as well as generating rules and search queries from natural language input. | Info. Extraction | Language | Text | google/sec-palm-000 |
| 25 | Chirp | Chirp is a version of a Universal Speech Model that has over 2B parameters and can transcribe in over 100 languages in a single model. | Speech Gen. |  | Speech | chirp-rnnt1 |
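
The embedding entries above describe vectors that capture semantic meaning and can feed downstream tasks such as search. A minimal sketch of that downstream step is shown here, assuming the vectors have already been obtained. The three-dimensional vectors and the names `TOY_EMBEDDINGS` and `rank_by_similarity` are made up for illustration; they are not output from any Vertex AI model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up 3-dimensional vectors standing in for real embeddings (assumption).
TOY_EMBEDDINGS = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.2, 0.8, 0.1],
    "gluten free recipes": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a query such as "how do I return an item?"
query = [0.8, 0.2, 0.1]
for name, score in rank_by_similarity(query, TOY_EMBEDDINGS):
    print(f"{name}: {score:.3f}")
```

In practice the vectors would come from an embedding model such as textembedding-gecko@001 and have far more dimensions, but the ranking step would look much the same.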

Fine-tunable models:

Models that data scientists can further fine-tune through a custom notebook or pipeline.

| Sno. | Name | Details | Task Name | Vision/Language | Input DataType | Model Name |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Stable Diffusion Inpainting | Stable Diffusion Inpainting is a latent diffusion model capable of inpainting images given any text input and a mask image. | Image Gen. | Vision, Language | Text | runwayml/stable-diffusion-inpainting |
| 2 | ControlNet | Control image generation with a text prompt and a control image. | Image Gen. | Vision | Text+Image | lllyasviel/ControlNet |
| 3 | tfhub/EfficientNetV2 | EfficientNetV2 is a family of image classification models that achieve better parameter efficiency and faster training speed than prior work. | Classification | Vision | Image | tensorflow-hub/efficientnetv2 |
| 4 | tfvision/vit | The Vision Transformer (ViT) is a transformer-based architecture for image classification. | Classification | Vision | Image | tfvision/vit-s16 |
| 5 | tfvision/SpineNet | SpineNet is an image object detection model generated using Neural Architecture Search. | Detection | Vision | Image | tfvision/spinenet49 |
| 6 | tfvision/YOLO | YOLO is a one-stage object detection algorithm that can achieve real-time performance on a single GPU. | Detection | Vision | Image | tfvision/scaled-yolo |
| 7 | DeepLabv3+ (with checkpoint) | Semantic segmentation is the task of assigning a label to each pixel in an image, where each label corresponds to a specific class of object or scene element. | Segmentation | Vision | Image | deeplabv3plus-cityscapes-20230315 |
| 8 | ResNet (with checkpoint) | Image classification model as described in the paper “Deep Residual Learning for Image Recognition”. | Classification | Vision | Image | resnet50 |
| 9 | ResNet-RS (with checkpoint) | Image classification model as described in the paper “Revisiting ResNets: Improved Training and Scaling Strategies”. | Classification | Vision | Image | ResNet-RS-50 |
| 10 | Faster R-CNN (Detectron2) | Faster R-CNN is a deep convolutional network used for image object detection. | Detection | Vision | Image | detectron2/faster-r-cnn |
| 11 | MobileNet (TIMM) | Small but powerful models optimized for mobile and embedded vision applications. | Classification | Vision | Image | timm/mobilenetv2_100 |
| 12 | EfficientNet (TIMM) | A family of convolutional neural networks (CNNs) designed to be both accurate and efficient. | Classification | Vision | Image | timm/efficientnetv2_rw_s |
| 13 | DeiT | A convolution-free transformer for image classification. | Classification | Vision | Image | timm/deit_base_patch16_224 |
| 14 | BEiT | A self-supervised learning framework for image representation learning inspired by BERT. | Classification | Vision | Image | timm/beit_base_patch16_224 |
| 15 | ViT (TIMM) | Transformer-like architecture for image classification. | Classification | Vision | Image | timm/vit_base_patch16_224 |
| 16 | RetinaNet (Detectron2) | RetinaNet is a one-stage object detection model that utilizes a feature pyramid network (FPN) on top of a ResNet and adds a focal loss function to address class imbalance during training. | Detection | Vision | Image | detectron2/retinanet |
| 17 | Mask R-CNN (Detectron2) | Mask R-CNN is an instance segmentation model which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. | Detection | Vision | Image | detectron2/mask-r-cnn |
| 18 | ResNet (TIMM) | A type of artificial neural network that is made up of residual blocks with skip connections. | Classification | Vision | Image | timm/resnet50 |
| 19 | ResNeSt (TIMM) | An extension of the ResNet architecture that uses a new attention mechanism called split-attention. | Classification | Vision | Image | timm/resnest50d |
| 20 | ConvNeXt (TIMM) | A pure convolutional model that modernizes the ResNet architecture with design choices borrowed from the Swin Transformer. | Classification | Vision | Image | timm/convnext_base |
| 21 | CspNet (TIMM) | A deep neural network that uses cross stage partial connections to reduce the number of parameters and computation cost without sacrificing accuracy. | Classification | Vision | Image | timm/cspdarknet53 |
| 22 | Inception (TIMM) | Inception network is a deep neural network with an architectural design that consists of repeating components referred to as Inception modules. | Classification | Vision | Image | timm/inception_v4 |
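
The RetinaNet entry above mentions a focal loss that addresses class imbalance. As a rough illustration of the idea, here is a minimal scalar sketch of the binary focal loss from the RetinaNet paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names are hypothetical and this is not the Detectron2 implementation; the defaults gamma=2 and alpha=0.25 are the values reported in the paper.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction p = P(y=1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction.

    p is the predicted probability of the positive class, y the true label.
    The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    """
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Compare an easy positive (p=0.9) with a hard positive (p=0.1).
for p in (0.9, 0.1):
    print(f"p={p}: cross-entropy={cross_entropy(p, 1):.4f}, focal={focal_loss(p, 1):.4f}")
```

The easy example is down-weighted far more than the hard one, so the many easy background regions in a detection task contribute less to the total loss.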

Task-specific solutions:

Most of these pre-built models are ready to use off the shelf, and many can be customized using your own data.

| Sno. | Name | Details | Task Name | Vision/Language | Input DataType | Model Name |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Entity analysis | Inspect text to identify and label persons, organizations, locations, events, products, and more. | Classification | Language | Text | google/language_v1-analyze_entities |
| 2 | Content classification | Use Google’s state-of-the-art language technology to analyze text content and return content categories for it. The latest version of Content Classification supports over 1,000 categories. | Classification | Language | Text | google/language_v1-classify_text_v1 |
| 3 | Sentiment analysis | Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical score and magnitude values. | Classification | Language | Text | google/language_v1-analyze_sentiment |
| 4 | Entity sentiment analysis | Entity Sentiment Analysis inspects the given text for known entities (proper nouns and common nouns), returns information about those entities, and identifies the prevailing emotional opinion of the entity within the text, especially to determine a writer’s attitude toward the entity as positive, negative, or neutral. | Classification | Language | Text | google/language_v1-analyze_entity_sentiment |
| 5 | Syntax analysis | Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens. | Extraction | Language | Text | google/language_v1-analyze_syntax |
| 6 | Text Moderation | Text moderation analyzes a document and returns a list of harmful and sensitive categories that apply to the text found in the document. | Classification | Language | Text | google/language_v1-moderate_text |
| 7 | Text Translation | Use Google’s proven pre-trained text model to get text translations for 100+ languages. | Translation | Language | Text | Text Translation |
| 8 | Occupancy analytics | Detect people and vehicles in a video or image, plus zone detection, dwell time, and more. | Detection | Vision | Image, Video | google/occupancy-analytics-001 |
| 9 | Person/vehicle detector | Detects and counts people and vehicles in video. | Detection | Vision | Video | People/vehicle detector |
| 10 | Object detector | Identify and locate objects in video. | Detection | Vision | Video | Object detector |
| 11 | PPE detector | Identify people and personal protective equipment (PPE). | Detection | Vision | Image | PPE detector |
| 12 | Person blur | Mask or blur a person’s appearance in video. | Detection | Vision | Video | People blur |
| 13 | Product recognizer | Identify products at the GTIN or UPC level. | Recognition | Vision | Image | Product recognizer |
| 14 | Tag recognizer | Extract text in product and price tags. | Recognition | Vision | Scan Doc | Tag recognizer |
| 15 | Content moderation (Vision) | Content Moderator (Vision) detects objectionable or unwanted content across predefined content labels (e.g., adult, violence, spoof) or custom labels provided by the user. | Classification | Vision | Scan Doc | Content Moderation |
| 16 | Face detector (Vision API) | Face detector is a prebuilt Vision API model that detects multiple faces in media (images, video) and provides bounding polygons for the face and other facial “landmarks” along with their corresponding confidence values. | Detection | Vision | Image, Video | Face Detector |
| 17 | Watermark detector | Watermark detector is a prebuilt model that detects watermarks in the input image. | Detection | Vision | Scan Doc | imagewatermarkdetector-001 |
| 18 | Text detector (Vision API) | Text detector detects and extracts text from images. It uses optical character recognition (OCR) to recognize text in an image and convert it to machine-coded text. | Detection | Vision, Language | Scan Doc | Text Detector |
| 19 | AutoML E2E | Tabular Workflow for End-to-End AutoML is the complete AutoML pipeline for classification and regression tasks. | Classification |  | Tabular | AutoML E2E |
| 20 | Document AI OCR processor | Document OCR can identify and extract text from documents in over 200 printed languages and 50 handwritten languages. | Extraction |  | Document | pretrained-ocr-v1.2-2022-11-10 |
| 21 | Form Parser | Document AI Form Parser applies advanced machine learning technologies to extract key-value pairs, checkboxes, and tables from documents in 200+ languages. | Extraction |  | Document | pretrained-form-parser-v1.0-2020-09-23 |
| 22 | TabNet | TabNet is a general model which performs well on a wide range of classification and regression tasks. | Classification |  | Tabular | TabNet |
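
The sentiment analysis entry above notes that sentiment is returned as a score and a magnitude. One way to turn that pair into a readable label is sketched below. It assumes the score lies between -1.0 and 1.0 and the magnitude is non-negative, as the Natural Language API documents them; the function name and both thresholds are illustrative assumptions, not values prescribed by the API.

```python
def interpret_sentiment(score, magnitude, score_threshold=0.25, mixed_magnitude=2.0):
    """Map a (score, magnitude) pair to a coarse label.

    score: overall polarity, assumed to lie in [-1.0, 1.0].
    magnitude: overall emotional strength, assumed to be >= 0.
    Both thresholds are hypothetical and should be tuned per use case.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # A near-zero score with a high magnitude suggests positive and negative
    # statements cancelling out; a low magnitude suggests little emotion at all.
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


# Hand-written example pairs, not real API responses.
for score, magnitude in [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]:
    print(score, magnitude, interpret_sentiment(score, magnitude))
```

The magnitude matters mainly when the score is close to zero, which is why the sketch only consults it in that branch.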

Task-specific LLM Prompts:

Customize language model outputs to meet specific needs. Prompts help to refine or enrich the outputs of the large language model selected.

| Sno. | Name | Details | Task Name | Vision/Language | Input DataType | Model Name |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Object classification | Classify an object using a small number of examples (few-shot prompting). | Classification | Vision | Structured | LLM Prompt |
| 2 | Kindergarten Science Teacher | Your name is Miles. You are an astronomer who is knowledgeable about the solar system. Respond in short sentences. Shape your response as if talking to a 10-year-old. | Text Gen. | Language | Freeform | LLM Prompt |
| 3 | Online Return Customer Service | A customer service chatbot that provides basic customer support and makes decisions on simple tasks. | Text Gen. | Language | Freeform | LLM Prompt |
| 4 | Gluten Free Advisor | A chatbot that provides gluten free cooking recipes and diet plans. | Text Gen. | Language | Freeform | LLM Prompt |
| 5 | Company Information Guide | An informative chatbot that has a simple company background and allows customers to ask questions about its products. | Text Gen. | Language | Freeform | LLM Prompt |
| 6 | Fictional Captain from the 1700s | Chat with a fictional character from the 1700s without any modern knowledge. | Text Gen. | Language | Freeform | LLM Prompt |
| 7 | Support rep chat summarization | You are a customer support manager and would like to quickly see what your team’s support calls are about. | Summarization | Language | Freeform | LLM Prompt |
| 8 | Summarize news article | News takes too much time to read. You want a quicker way to get the summary. Let Vertex help you. | Summarization | Language | Freeform | LLM Prompt |
| 9 | Chat agent summarization | You are a customer service center manager and you need to quickly see what your agents are talking about. | Summarization | Language | Freeform | LLM Prompt |
| 10 | Chat agent follow up | You are a customer service center manager. Sometimes your agents forget to note down follow-ups. You want to automate follow-up lists. | Info. Extraction | Language | Freeform | LLM Prompt |
| 11 | Transcript summarization | Summarize a block of text. | Summarization | Language | Structured | LLM Prompt |
| 12 | Dialog summarization | Summarize a conversation. | Summarization | Language | Structured | LLM Prompt |
| 13 | Hashtag tokenization | Create and tokenize hashtags based on the provided text. | Text Gen. | Language | Structured | LLM Prompt |
| 14 | Title generation | Generate a title based on the provided text. | Classification | Language | Structured | LLM Prompt |
| 15 | Sentiment analysis about a person | You would like to see how reporters write about certain people. You have articles and would like to see if a certain person is written about positively or negatively. | Classification | Language | Freeform | LLM Prompt |
| 16 | Customer request classification, few-shot | Based on your customer’s answer, you want to automate routing of your customer to the proper service queue. Use few-shot learning. | Classification | Language | Structured | LLM Prompt |
| 17 | Text classification few-shot | You are an intern at a library and your job is to classify hundreds of articles every day. You’d rather automate this and do something else. | Classification | Language | Structured | LLM Prompt |
| 18 | Article classification | You are an intern at a library and your job is to classify hundreds of articles every day. You’d rather automate this and do something else. | Classification | Language | Freeform | LLM Prompt |
| 19 | Classification headline | Few-shot classification on a given topic. | Classification | Language | Structured | LLM Prompt |
| 20 | Sentiment analysis | Explain the sentiment expressed in a body of text. | Classification | Language | Structured | LLM Prompt |
| 21 | Pixel Technical Specifications, one-shot | Generate technical specifications as JSON from text about a Pixel phone, one-shot. | Info. Extraction | Language | Structured | LLM Prompt |
| 22 | Wifi troubleshooting | Given a description of the different status lights on the Google WiFi router, determine what the troubleshooting step should be. | Text Gen. | Language | Freeform | LLM Prompt |
| 23 | Contract analysis | You are a partner of a law firm. Your associates are bored of reading contracts to find specific provisions when they can work on more intellectually challenging tasks. | Info. Extraction | Language | Freeform | LLM Prompt |
| 24 | Extractive Question Answering | Answer questions from given background texts. | Text Gen. | Language | Structured | LLM Prompt |
| 25 | Marketing generation Pixel | You work in Google’s device marketing team and you need to create a marketing pitch for the new Pixel 7 Pro. You have writer’s block and need help. | Text Gen. | Language | Freeform | LLM Prompt |
| 26 | Ad copy generation | You are a marketer and want to create different versions of the same ad to target different audiences. You would like some suggestions. | Text Gen. | Language | Freeform | LLM Prompt |
| 27 | Essay outline | Generate an outline for an essay on a particular topic. | Text Gen. | Language | Freeform | LLM Prompt |
| 28 | Correct grammar | Correct grammar in the text. | Text Gen. | Language | Freeform | LLM Prompt |
| 29 | Ad copy from description | Write ad copy for something based on a description. | Text Gen. | Language | Freeform | LLM Prompt |
| 30 | Write emails and letters | Write an email or letter based on the specified content. | Text Gen. | Language | Freeform | LLM Prompt |
| 31 | Reading comprehension test | Your child is preparing for the SAT verbal exam and needs more practice in reading comprehension. | Summarization | Language | Freeform | LLM Prompt |
| 32 | Generate memes | Generate memes based on a certain topic. | Text Gen. | Language | Freeform | LLM Prompt |
| 33 | Interview questions | Generate a list of interview questions targeting a specific position. | Text Gen. | Language | Freeform | LLM Prompt |
| 34 | Naming | Generate ideas for names of a specified entity. | Text Gen. | Language | Freeform | LLM Prompt |
| 35 | General tips and advice | Get tips and advice on general topics. | Text Gen. | Language | Freeform | LLM Prompt |
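
Several of the prompts above rely on few-shot prompting, where a handful of labeled examples precede the item to classify. A minimal sketch of how such a structured prompt could be assembled as plain text follows. The helper name, the field labels, and the example requests and queue names are all hypothetical; the actual prompt templates in Vertex AI may be laid out differently.

```python
def build_few_shot_prompt(instruction, examples, query,
                          input_label="Text", output_label="Category"):
    """Assemble an instruction, labeled examples, and a final unlabeled query."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"{input_label}: {text}")
        lines.append(f"{output_label}: {label}")
        lines.append("")
    # The query repeats the input field and leaves the output field open,
    # so the model is expected to continue with a label.
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


# Hypothetical customer requests and service queues, for illustration only.
EXAMPLES = [
    ("My package never arrived.", "shipping"),
    ("I was charged twice for one order.", "billing"),
    ("How do I reset my password?", "account"),
]

prompt = build_few_shot_prompt(
    "Route each customer request to the proper service queue.",
    EXAMPLES,
    "The tracking link shows my order is stuck in transit.",
)
print(prompt)
```

The resulting string could then be sent to a text model such as text-bison@001. Keeping the examples in a list makes it easy to swap them or add more without touching the prompt layout.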

Conclusion:

AI has advanced remarkably, helped by platforms like Google’s Vertex AI. By providing a vast array of pre-built models spanning computer vision, natural language processing, speech processing, and ML tasks on structured tabular data, Vertex AI has simplified the development of AI solutions for a multitude of tasks. The platform’s comprehensive selection of models empowers data scientists to efficiently tackle image classification, object detection, sentiment analysis, speech recognition, and much more. Whether it’s creating voice assistants, automating customer support, analyzing visual data, or making data-driven predictions, Vertex AI’s models offer the versatility and performance required to succeed in today’s AI-driven landscape. As AI continues to transform industries, Google’s Vertex AI stands as a powerful tool that unlocks the potential of AI, enabling innovation and driving real-world impact across diverse domains.

By harnessing the power of Vertex AI and its pre-built models, businesses and developers can pave the way for intelligent applications that enhance efficiency, accuracy, and user experiences. With a commitment to ongoing research and development, Google’s Vertex AI is poised to continuously expand its model offerings, ensuring that users have access to cutting-edge AI capabilities and enabling them to push the boundaries of what is possible in the world of artificial intelligence.