Model Garden of Vertex AI:

Unlocking the Power of Google’s Vertex AI: Exploring the World of Pre-Built Models for AI Tasks

Introduction:

Artificial Intelligence (AI) has transformed numerous industries, from healthcare and finance to e-commerce, logistics, education, and entertainment. But developing machine learning models is complex, and as demand for AI-powered solutions rises, data scientists look for efficient ways to reuse pre-trained models or build custom models for specific tasks. Google’s Vertex AI addresses this need with Model Garden, an extensive selection of pre-built models for a wide range of AI tasks.

Vertex AI also brings large language models (LLMs) and prompt engineering into the ML workflow. Data scientists can use state-of-the-art language models, such as PaLM 2, to accelerate development, and prompt engineering lets them guide those models toward precise and accurate results.

From computer vision and natural language processing to speech processing and structured tabular data analysis, Vertex AI’s repertoire includes over 100 models across diverse application domains. This article surveys that catalog: foundation models, fine-tunable models, task-specific solutions, and task-specific LLM prompts.

Foundation models:

Pre-trained multi-task models that can be further tuned or customized for specific tasks.

Sno.	Name	Details	Task Name	Vision/ Language	Input DataType	Model Name
1	PaLM 2 for Text	Fine-tuned to follow natural language instructions and is suitable for a variety of language tasks, such as: classification, extraction, summarization and content generation.	Text Gen.	Language	Text	text-bison@001
2	PaLM 2 for Chat	Fine-tuned to conduct natural conversation. Use this model to build and customize your own chatbot application.	Text Gen.	Language	Text	chat-bison@001
3	Embeddings for text	Text embedding is an important NLP technique that converts textual data into numerical vectors that can be processed by machine learning algorithms, especially large models. These vector representations are designed to capture the semantic meaning and context of the words they represent.	Embedding	Language	Text	textembedding-gecko@001
4	Codey for Code Completion	Generates code based on code prompts. Good for code suggestions and minimizing bugs in code.	Code Gen.	Language	Text	code-gecko@001
5	Codey for Code Generation	Generates code based on natural language input. Good for writing functions, classes, unit tests, and more.	Code Gen.	Language	Text	code-bison@001
6	Codey for Code Chat	Get code-related assistance through natural conversation. Good for questions about an API, syntax in a supported language, and more.	Code Gen.	Language	Text	codechat-bison@001
7	BERT	Neural network-based technique for natural language processing. Use it to train your own question answering system and more.	Text Gen.	Language	Text	google/bert-base-001
8	InstructPix2Pix	Given an input image and a text prompt that tells the model what to do, the instruct-pix2pix model follows the prompt to edit the image by generating a new one.	Image Gen.	Vision, Language	Text+Image	timbrooks/instruct-pix2pix
9	ControlNet	Control image generation with a text prompt and a control image.	Image Gen.	Vision, Language	Text+Image	lllyasviel/ControlNet
10	BLIP2	BLIP2 is for the image captioning and visual-question-answering tasks.	Text Gen.	Vision, Language	Image	Salesforce/blip2-opt-2.7b
11	Stable Diffusion 1.4 (Keras)	KerasCV implementation of stability.ai’s text-to-image model, Stable Diffusion 1.4.	Image Gen.	Vision, Language	Text	keras/stable-diffusion-v1-4
12	Embeddings for Image	Generates vectors based on images, which can be used for downstream tasks like image classification, image search, and so on.	Embedding	Vision	Image	imageembedding-001
13	Label detector (PaLI zero-shot)	Label Detector Zero-shot classifies images based on labels, represented as a list of text prompt strings, which are provided by the user, and calculates the confidence score of each label’s presence in the image.	Classification	Vision	Image	imagezeroshot-001
14	Stable Diffusion v1-5	Latent text-to-image diffusion model capable of generating photo-realistic images given a text input.	Image Gen.	Vision	Text	runwayml/stable-diffusion-v1-5
15	Stable Diffusion Inpainting	Stable Diffusion Inpainting is a latent diffusion model capable of inpainting images given any text input and a mask image.	Image Gen.	Vision, Language	Text	runwayml/stable-diffusion-inpainting
16	BLIP image captioning	A Vision-Language Pre-training (VLP) framework for image captioning.	Text Gen.	Vision	Image	Salesforce/blip-image-captioning-base
17	BLIP VQA	A Vision-Language Pre-training (VLP) framework for visual question answering (VQA).	Text Gen.	Vision	Image	Salesforce/blip-vqa-base
18	CLIP	Neural network capable of classifying images without prior training on the classes.	Classification	Vision	Image	openai/clip-vit-base-patch32
19	OWL-ViT	Zero-shot, text-conditioned object detection model that can query an image with one or multiple text queries.	Detection	Vision	Text+Image	google/owlvit-base-patch32
20	ViT GPT2	Image captioning model.	Text Gen.	Vision	Image	nlpconnect/vit-gpt2-image-captioning
21	ViLT VQA	Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2.	Text Gen.	Vision	Image	dandelin/vilt-b32-finetuned-vqa
22	LayoutLM for VQA	Fine-tuned for document understanding and information extraction tasks like form and receipt understanding.	Info. Extraction	Vision	Scan Doc	impira/layoutlm-document-qa
23	T5-FLAN	T5 (Text-To-Text Transfer Transformer) model with the T5-FLAN checkpoint.	Text Gen.	Language	Text	google/t5-flan-001
24	Sec-PaLM2	The sec-palm model is a foundational model that has been pretrained on a variety of security-specific tasks. The model has broad security understanding across a number topics, such as threat intelligence, security operations, and malware analysis. It is ideal for analyzing, summarizing, and aggregating information across multiple security data sources, as well as generating rules and search queries from natural language input.	Info. Extraction	Language	Text	google/sec-palm-000
25	Chirp	Chirp is a version of a Universal Speech Model that has over 2B parameters and can transcribe in over 100 languages in a single model.	Speech Recog.		Speech	chirp-rnnt1
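
The embedding rows above describe vectors that capture semantic meaning. A minimal sketch of how such vectors are typically compared is shown here, using cosine similarity. The 4-dimensional vectors and the helper names (`cosine_similarity`, `most_similar`, `toy_embeddings`) are invented for illustration; they are not output from textembedding-gecko@001 or any other model listed in the table.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for this example only.
# A real embedding model returns vectors with many more dimensions.
toy_embeddings = {
    "refund my order": [0.9, 0.1, 0.0, 0.2],
    "I want my money back": [0.8, 0.2, 0.1, 0.3],
    "weather forecast today": [0.0, 0.9, 0.7, 0.1],
}


def most_similar(query, embeddings):
    """Return the other text whose vector is closest to the query's vector."""
    query_vec = embeddings[query]
    candidates = [
        (cosine_similarity(query_vec, vec), text)
        for text, vec in embeddings.items()
        if text != query
    ]
    return max(candidates)[1]


print(most_similar("refund my order", toy_embeddings))
```

With these toy vectors, the two refund-related sentences score close to 1.0 against each other, while the weather sentence scores near 0.1, which is the behavior semantic search and clustering rely on.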

Fine-tunable models:

Models that data scientists can further fine-tune through a custom notebook or pipeline.

Sno.	Name	Details	Task Name	Vision/ Language	Input DataType	Model Name
1	Stable Diffusion Inpainting	Stable Diffusion Inpainting is a latent diffusion model capable of inpainting images given any text input and a mask image.	Image Gen.	Vision, Language	Text	runwayml/stable-diffusion-inpainting
2	ControlNet	Control image generation with a text prompt and a control image.	Image Gen.	Vision, Language	Text+Image	lllyasviel/ControlNet
3	tfhub/EfficientNetV2	EfficientNet V2 is a family of image classification models that achieve better parameter efficiency and faster training speed than prior models.	Classification	Vision	Image	tensorflow-hub/efficientnetv2
4	tfvision/vit	The Vision Transformer (ViT) is a transformer-based architecture for image classification.	Classification	Vision	Image	tfvision/vit-s16
5	tfvision/SpineNet	SpineNet is an image object detection model generated using Neural Architecture Search.	Detection	Vision	Image	tfvision/spinenet49
6	tfvision/YOLO	YOLO is a one-stage object detection algorithm that can achieve real-time performance on a single GPU.	Detection	Vision	Image	tfvision/scaled-yolo
7	DeepLabv3+ (with checkpoint)	Semantic segmentation is the task of assigning a label to each pixel in an image, where each label corresponds to a specific class of object or scene element.	Segmentation	Vision	Image	deeplabv3plus-cityscapes-20230315
8	ResNet (with checkpoint)	Image classification model as described in the paper “Deep Residual Learning for Image Recognition”.	Classification	Vision	Image	resnet50
9	ResNet-RS (with checkpoint)	Image classification model as described in the paper “Revisiting ResNets: Improved Training and Scaling Strategies”.	Classification	Vision	Image	ResNet-RS-50
10	Faster R-CNN (Detectron2)	Faster R-CNN is a deep convolutional network used for image object detection.	Detection	Vision	Image	detectron2/faster-r-cnn
11	MobileNet (TIMM)	Small but powerful models optimized for mobile and embedded vision applications.	Classification	Vision	Image	timm/mobilenetv2_100
12	EfficientNet (TIMM)	A family of convolutional neural networks (CNNs) designed to be both accurate and efficient.	Classification	Vision	Image	timm/efficientnetv2_rw_s
13	DeiT	A convolution-free transformer for image classification.	Classification	Vision	Image	timm/deit_base_patch16_224
14	BEiT	A self-supervised learning framework for image representation learning inspired by BERT.	Classification	Vision	Image	timm/beit_base_patch16_224
15	ViT (TIMM)	Transformer-like architecture for image classification.	Classification	Vision	Image	timm/vit_base_patch16_224
16	RetinaNet (Detectron2)	RetinaNet is a one-stage object detection model that utilizes a feature pyramid network (FPN) on top of a ResNet and adds a focal loss function to address class imbalance during training.	Detection	Vision	Image	detectron2/retinanet
17	Mask R-CNN (Detectron2)	Mask R-CNN is an instance segmentation model which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.	Detection	Vision	Image	detectron2/mask-r-cnn
18	ResNet (TIMM)	A type of artificial neural network that is made up of residual blocks with skip connections.	Classification	Vision	Image	timm/resnet50
19	ResNeSt (TIMM)	An extension of the ResNet architecture that uses a new attention mechanism called split-attention.	Classification	Vision	Image	timm/resnest50d
20	ConvNeXt (TIMM)	A pure convolutional model that modernizes the ResNet architecture with design choices borrowed from vision transformers such as Swin Transformer; it uses no attention mechanism.	Classification	Vision	Image	timm/convnext_base
21	CspNet (TIMM)	A deep neural network design that uses cross stage partial connections to reduce the number of parameters and computation cost without sacrificing accuracy.	Classification	Vision	Image	timm/cspdarknet53
22	Inception (TIMM)	Inception network is a deep neural network with an architectural design that consists of repeating components referred to as Inception modules.	Classification	Vision	Image	timm/inception_v4
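
The RetinaNet row mentions a focal loss that addresses class imbalance. As a rough illustration, the binary focal loss from the RetinaNet paper can be written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below is a plain-Python version for a single prediction; the function names are made up for this example, and the defaults gamma=2.0 and alpha=0.25 follow the values reported in that paper rather than anything stated in this article or in the Detectron2 configuration.

```python
import math


def cross_entropy(p, y):
    """Standard binary cross-entropy for one prediction.

    p: predicted probability of the positive class (0 < p < 1)
    y: true label, 1 or 0
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    The (1 - p_t) ** gamma factor shrinks the loss of well-classified
    examples, so the many easy background examples contribute less.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) versus a hard positive (p = 0.1).
for p in (0.9, 0.1):
    print(f"p={p}: CE={cross_entropy(p, 1):.4f}  FL={focal_loss(p, 1):.4f}")
```

Under cross-entropy the hard example costs about 22 times as much as the easy one; under focal loss with gamma=2 the gap grows by a further factor of 81, which is how the loss keeps easy examples from dominating training.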

Task-specific solutions:

Most of these pre-built models are ready to use off the shelf, and many can be customized using your own data.

Sno.	Name	Details	Task Name	Vision/ Language	Input DataType	Model Name
1	Entity analysis	Inspect text to identify and label persons, organizations, locations, events, products and more.	Classification	Language	Text	google/language_v1-analyze_entities
2	Content classification	Uses Google’s state-of-the-art language technology to analyze text and return its content categories. The latest version of Content Classification supports over 1,000 categories.	Classification	Language	Text	google/language_v1-classify_text_v1
3	Sentiment analysis	Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical score and magnitude values.	Classification	Language	Text	google/language_v1-analyze_sentiment
4	Entity sentiment analysis	Entity Sentiment Analysis inspects the given text for known entities (proper nouns and common nouns), returns information about those entities, and identifies the prevailing emotional opinion of the entity within the text, especially to determine a writer’s attitude toward the entity as positive, negative, or neutral.	Classification	Language	Text	google/language_v1-analyze_entity_sentiment
5	Syntax analysis	Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens.	Extraction	Language	Text	google/language_v1-analyze_syntax
6	Text Moderation	Text moderation analyzes a document and returns a list of harmful and sensitive categories that apply to the text found in the document.	Classification	Language	Text	google/language_v1-moderate_text
7	Text Translation	Use Google’s proven pre-trained text model to get text translations for 100+ languages.	Translation	Language	Text	Text Translation
8	Occupancy analytics	Detect people and vehicles in a video or image, plus zone detection, dwell time, and more.	Detection	Vision	Image, Video	google/occupancy-analytics-001
9	Person/vehicle detector	Detects and counts people and vehicles in video.	Detection	Vision	Video	People/vehicle detector
10	Object detector	Identify and locate objects in video.	Detection	Vision	Video	Object detector
11	PPE detector	Identify people and personal protective equipment (PPE).	Detection	Vision	Image	PPE detector
12	Person blur	Mask or blur a person’s appearance in video.	Detection	Vision	Video	People blur
13	Product recognizer	Identify products at the GTIN or UPC level.	Recognition	Vision	Image	Product recognizer
14	Tag recognizer	Extract text in product and price tags.	Recognition	Vision	Scan Doc	Tag recognizer
15	Content moderation (Vision)	Content Moderator (Vision) detects objectionable or unwanted content across predefined content labels (e.g., adult, violence, spoof) or custom labels provided by the user.	Classification	Vision	Scan Doc	Content Moderation
16	Face detector (Vision API)	Face detector is a prebuilt Vision API model that detects multiple faces in media (images, video) and provides bounding polygons for the face and other facial “landmarks” along with their corresponding confidence values.	Detection	Vision	Image, Video	Face Detector
17	Watermark detector	Watermark detector is a prebuilt model that detects watermarks in the input image.	Detection	Vision	Scan Doc	imagewatermarkdetector-001
18	Text detector (Vision API)	Text detector detects and extracts text from images. It uses optical character recognition (OCR) for an image to recognize text and convert it to machine coded text.	Detection	Vision, Language	Scan Doc	Text Detector
19	AutoML E2E	Tabular Workflow for End-to-End AutoML is the complete AutoML pipeline for classification and regression tasks.	Classification		Tabular	AutoML E2E
20	Document AI OCR processor	Document OCR can identify and extract text from documents in over 200 printed languages and 50 handwritten languages.	Extraction		Document	pretrained-ocr-v1.2-2022-11-10
21	Form Parser	Document AI Form Parser applies advanced machine learning technologies to extract key-value pairs, checkboxes, and tables from documents in over 200 languages.	Extraction		Document	pretrained-form-parser-v1.0-2020-09-23
22	TabNet	TabNet is a general model which performs well on a wide range of classification and regression tasks.	Classification		Tabular	TabNet
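
The sentiment analysis row notes that sentiment is reported as a numerical score and a magnitude. In the Natural Language API the score ranges from -1.0 (negative) to 1.0 (positive), and the magnitude is a non-negative number that grows with the amount of emotional content. The sketch below shows one possible way to turn the two values into a label; the function name and both cutoff values are assumptions chosen for illustration, not thresholds defined by Google, and they would need tuning for a real use case.

```python
def describe_sentiment(score, magnitude, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Map a (score, magnitude) pair to a coarse label.

    score_cutoff and magnitude_cutoff are hypothetical defaults,
    picked only to make the example concrete.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")

    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # A score near zero with a high magnitude suggests strong positive and
    # negative statements cancelling out, rather than an absence of emotion.
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"


print(describe_sentiment(0.8, 3.0))
print(describe_sentiment(0.0, 4.0))
print(describe_sentiment(0.1, 0.0))
```

The point of keeping both values is the last branch: a score near zero alone cannot distinguish a flat, factual text from one that contains strong opinions in both directions.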

Task-specific LLM Prompts:

Customize language model outputs to meet specific needs. Prompts help to refine or enrich the outputs of the large language model selected.

Sno.	Name	Details	Task Name	Vision/ Language	Input DataType	Model Name
1	Object classification	Classify an object using a small number of examples (few-shot prompting).	Classification	Vision	Structured	LLM Prompt
2	Kindergarten Science Teacher	Your name is Miles. You are an astronomer who is knowledgeable about the solar system. Respond in short sentences. Shape your response as if talking to a 10-year-old.	Text Gen.	Language	Freeform	LLM Prompt
3	Online Return Customer Service	A customer service chatbot that provides basic customer support and makes decisions on simple tasks	Text Gen.	Language	Freeform	LLM Prompt
4	Gluten Free Advisor	A chatbot that provides gluten free cooking recipes and diet plans.	Text Gen.	Language	Freeform	LLM Prompt
5	Company Information Guide	An informative chatbot that has a simple company background and allows customers to ask questions about the company’s products.	Text Gen.	Language	Freeform	LLM Prompt
6	Fictional Captain from the 1700s	Chat with a fictional character from the 1700s without any modern knowledge.	Text Gen.	Language	Freeform	LLM Prompt
7	Support rep chat summarization	You are a customer support manager and would like to quickly see what your team’s support calls are about.	Summarization	Language	Freeform	LLM Prompt
8	Summarize news article	News takes too much time to read. You want a quicker way to get the summary. Let Vertex help you.	Summarization	Language	Freeform	LLM Prompt
9	Chat agent summarization	You are a customer service center manager and you need to quickly see what your agents are talking about.	Summarization	Language	Freeform	LLM Prompt
10	Chat agent follow up	You are a customer service center manager. Sometimes your agents forget to note down follow ups. You want to automate follow up lists.	Info. Extraction	Language	Freeform	LLM Prompt
11	Transcript summarization	Summarize a block of text.	Summarization	Language	Structured	LLM Prompt
12	Dialog summarization	Summarize a conversation.	Summarization	Language	Structured	LLM Prompt
13	Hashtag tokenization	Create and tokenize hashtags based on the provided text.	Text Gen.	Language	Structured	LLM Prompt
14	Title generation	Generate a title based on the provided text.	Text Gen.	Language	Structured	LLM Prompt
15	Sentiment analysis about a person	You would like to see how reporters write about certain people. You have articles and would like to see if a certain person is written about positively or negatively.	Classification	Language	Freeform	LLM Prompt
16	Customer request classification, few-shot	Based on your customer’s answer, you want to automate routing of your customer to the proper service queue. Use few-shot learning.	Classification	Language	Structured	LLM Prompt
17	Text classification few-shot	You are an intern at a library and your job is to classify hundreds of articles every day. You’d rather automate this and do something else.	Classification	Language	Structured	LLM Prompt
18	Article classification	You are an intern at a library and your job is to classify hundreds of articles every day. You’d rather automate this and do something else.	Classification	Language	Freeform	LLM Prompt
19	Classification headline	Few shot classification on a given topic.	Classification	Language	Structured	LLM Prompt
20	Sentiment analysis	Explain the sentiment expressed in a body of text.	Classification	Language	Structured	LLM Prompt
21	Pixel Technical Specifications, one-shot	Generate technical specifications in JSON from text about a Pixel phone, one-shot.	Info. Extraction	Language	Structured	LLM Prompt
22	Wifi troubleshooting	Given a description of the different status lights on the Google WiFi router, suggest the troubleshooting step.	Text Gen.	Language	Freeform	LLM Prompt
23	Contract analysis	You are a partner of a law firm. Your associates are bored of reading contracts to find specific provisions when they can work on more intellectually challenging tasks.	Info. Extraction	Language	Freeform	LLM Prompt
24	Extractive Question Answering	Answer questions from given background texts.	Text Gen.	Language	Structured	LLM Prompt
25	Marketing generation Pixel	You work in Google’s device marketing team and you need to create a marketing pitch for the new Pixel 7 Pro. You have writer’s block and need help.	Text Gen.	Language	Freeform	LLM Prompt
26	Ad copy generation	You are a marketer and want to create different versions of the same ad to target different audiences. You would like some suggestions.	Text Gen.	Language	Freeform	LLM Prompt
27	Essay outline	Generate an outline for an essay on a particular topic.	Text Gen.	Language	Freeform	LLM Prompt
28	Correct grammar	Correct grammar in the text.	Text Gen.	Language	Freeform	LLM Prompt
29	Ad copy from description	Write ad copy for something based on a description.	Text Gen.	Language	Freeform	LLM Prompt
30	Write emails and letters	Write an email or letter based on the specified content.	Text Gen.	Language	Freeform	LLM Prompt
31	Reading comprehension test	Your child is preparing for the SAT verbal exam and needs more practice in reading comprehension.	Summarization	Language	Freeform	LLM Prompt
32	Generate memes	Generate memes based on a certain topic.	Text Gen.	Language	Freeform	LLM Prompt
33	Interview questions	Generate a list of interview questions targeting a specific position.	Text Gen.	Language	Freeform	LLM Prompt
34	Naming	Generate ideas for names of a specified entity.	Text Gen.	Language	Freeform	LLM Prompt
35	General tips and advice	Get tips and advice on general topics.	Text Gen.	Language	Freeform	LLM Prompt
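
Several prompts in the table rely on few-shot prompting, where a handful of labelled examples are placed in the prompt ahead of the new input. The sketch below shows one way such a prompt could be assembled as plain text. The function name, the "Text:" / "Category:" layout, and the sample tickets are assumptions made for illustration; they are not the templates that ship with Vertex AI.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot classification prompt.

    instruction: one-line task description
    examples: list of (text, label) pairs shown to the model
    query: the new text the model should classify
    """
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Category: {label}")
        lines.append("")
    # The prompt ends with an open "Category:" slot for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Category:")
    return "\n".join(lines)


# Hypothetical support tickets, invented for this example.
examples = [
    ("My package never arrived.", "Shipping"),
    ("I was charged twice for one order.", "Billing"),
    ("The app crashes when I log in.", "Technical"),
]

prompt = build_few_shot_prompt(
    "Classify each customer message into Shipping, Billing, or Technical.",
    examples,
    "Where is my refund for the duplicate charge?",
)
print(prompt)
```

The resulting string would then be sent to a text model such as text-bison@001; that call is left out here because it depends on project and credential setup.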

Conclusion:

AI has advanced remarkably, thanks in part to platforms like Google’s Vertex AI. By providing a vast array of pre-built models spanning computer vision, natural language processing, speech processing, and ML tasks on structured tabular data, Vertex AI has simplified the development of AI solutions for a multitude of tasks. The platform’s selection of models lets data scientists efficiently tackle image classification, object detection, sentiment analysis, speech recognition, and much more. Whether the goal is creating voice assistants, automating customer support, analyzing visual data, or making data-driven predictions, Vertex AI’s models offer the versatility and performance these tasks require. As AI continues to transform industries, Vertex AI gives teams a practical way to apply it across diverse domains.

By harnessing the power of Vertex AI and its pre-built models, businesses and developers can pave the way for intelligent applications that enhance efficiency, accuracy, and user experiences. With a commitment to ongoing research and development, Google’s Vertex AI is poised to continuously expand its model offerings, ensuring that users have access to cutting-edge AI capabilities and enabling them to push the boundaries of what is possible in the world of artificial intelligence.

Dr. Hari Thapliyaal

Dr. Hari Thapliyal is a seasoned professional and prolific blogger with a multifaceted background that spans the realms of Data Science, Project Management, and Advait-Vedanta Philosophy. Holding a Doctorate in AI/NLP from SSBM (Geneva, Switzerland), Hari has earned Master's degrees in Computers, Business Management, Data Science, and Economics, reflecting his dedication to continuous learning and a diverse skill set. With over three decades of experience in management and leadership, Hari has proven expertise in training, consulting, and coaching within the technology sector. His extensive 16+ years in all phases of software product development are complemented by a decade-long focus on course design, training, coaching, and consulting in Project Management. In the dynamic field of Data Science, Hari stands out with more than three years of hands-on experience in software development, training course development, training, and mentoring professionals. His areas of specialization include Data Science, AI, Computer Vision, NLP, complex machine learning algorithms, statistical modeling, pattern identification, and extraction of valuable insights. Hari's professional journey showcases his diverse experience in planning and executing multiple types of projects. He excels in driving stakeholders to identify and resolve business problems, consistently delivering excellent results. Beyond the professional sphere, Hari finds solace in long meditation, often seeking secluded places or immersing himself in the embrace of nature.
