visual-chatgpt - Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.

Created at: 2023-03-02 17:04:28

Language: Python

URL: https://github.com/microsoft/visual-chatgpt

License: MIT

Visual ChatGPT

Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.

See our paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

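Updates:

Add custom GPU/CPU assignment
Add Windows support
Merge Hugging Face ControlNet, remove download.sh
Add Prompt Decorator
Add Hugging Face and Colab demos
Clean up requirements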

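Insight & Goal:

On the one hand, ChatGPT (or LLMs) serves as a general interface that provides a broad and diverse understanding of a wide range of topics. On the other hand, Foundation Models serve as domain experts by providing deep knowledge in specific domains. By leveraging both general and deep knowledge, we aim to build an AI that can handle a wide variety of tasks.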

Demo

System Architecture

Quick Start

# clone the repo
git clone https://github.com/microsoft/visual-chatgpt.git

# Go to directory
cd visual-chatgpt

# create a new environment
conda create -n visgpt python=3.8

# activate the new environment
conda activate visgpt

#  prepare the basic environments
pip install -r requirements.txt

# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}

# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}

# Start Visual ChatGPT !
# You can specify the GPU/CPU assignment by "--load", the parameter indicates which 
# Visual Foundation Model to use and where it will be loaded to
# The model and device are separated by an underscore '_', and different models are separated by a comma ','
# The available Visual Foundation Models can be found in the following table
# For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0
# You can use: "ImageCaptioning_cpu,Text2Image_cuda:0"

# Advice for CPU Users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu

# Advice for 1 Tesla T4 15GB  (Google Colab)                       
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"
                                
# Advice for 4 Tesla V100 32GB                            
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,ImageEditing_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    Image2Seg_cpu,SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
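The `--load` value described in the comments above is a comma-separated list of `<Model>_<device>` entries. The following is a minimal sketch of how such a string can be parsed into a model-to-device mapping; it is an illustration only, not the repository's own parsing code, and the function name `parse_load` is hypothetical. It strips whitespace around each entry so that the multi-line form in the 4-GPU example parses the same way as the single-line form.

```python
from typing import Dict


def parse_load(load: str) -> Dict[str, str]:
    """Parse a --load style string (hypothetical helper, not the repo's code).

    Example input: "ImageCaptioning_cpu,Text2Image_cuda:0".
    Entries are separated by ',' and each entry is "<Model>_<device>".
    Surrounding whitespace and newlines are ignored.
    """
    assignment: Dict[str, str] = {}
    for entry in load.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma or an empty entry
        # Split on the first underscore: the model name comes before it,
        # the device ("cpu", "cuda:0", ...) comes after it.
        model, sep, device = entry.partition("_")
        if not sep or not model or not device:
            raise ValueError(f"expected <Model>_<device>, got {entry!r}")
        assignment[model] = device
    return assignment


if __name__ == "__main__":
    print(parse_load("ImageCaptioning_cpu,Text2Image_cuda:0"))
```

Splitting on the first underscore works here because none of the model names listed in this README contain an underscore, while device strings such as `cuda:0` contain only a colon.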

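GPU Memory Usage

Here we list the GPU memory usage of each Visual Foundation Model, so you can choose which ones to load:

Foundation Model	GPU Memory (MB)
ImageEditing	3981
InstructPix2Pix	2827
Text2Image	3385
ImageCaptioning	1209
Image2Canny	0
CannyText2Image	3531
Image2Line	0
LineText2Image	3529
Image2Hed	0
HedText2Image	3529
Image2Scribble	0
ScribbleText2Image	3531
Image2Pose	0
PoseText2Image	3529
Image2Seg	919
SegText2Image	3529
Image2Depth	0
DepthText2Image	3531
Image2Normal	0
NormalText2Image	3529
VisualQuestionAnswering	1495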
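As a rough planning aid, the per-model figures in this table can be summed per device for a given `--load` string to check whether an assignment fits your GPUs. The sketch below assumes the numbers add linearly and treats any device whose name starts with `cuda` as a GPU; real usage may be higher. The function name `estimate_gpu_memory` is hypothetical and not part of the repository.

```python
from collections import defaultdict
from typing import Dict

# GPU memory (MB) per Visual Foundation Model, copied from the table above.
GPU_MEMORY_MB: Dict[str, int] = {
    "ImageEditing": 3981, "InstructPix2Pix": 2827, "Text2Image": 3385,
    "ImageCaptioning": 1209, "Image2Canny": 0, "CannyText2Image": 3531,
    "Image2Line": 0, "LineText2Image": 3529, "Image2Hed": 0,
    "HedText2Image": 3529, "Image2Scribble": 0, "ScribbleText2Image": 3531,
    "Image2Pose": 0, "PoseText2Image": 3529, "Image2Seg": 919,
    "SegText2Image": 3529, "Image2Depth": 0, "DepthText2Image": 3531,
    "Image2Normal": 0, "NormalText2Image": 3529,
    "VisualQuestionAnswering": 1495,
}


def estimate_gpu_memory(load: str) -> Dict[str, int]:
    """Sum the tabulated memory of all models placed on each CUDA device.

    Hypothetical helper: assumes "<Model>_<device>" entries separated by ','.
    Models assigned to the CPU are skipped.
    """
    totals: Dict[str, int] = defaultdict(int)
    for entry in load.split(","):
        entry = entry.strip()
        if not entry:
            continue
        model, _, device = entry.partition("_")
        if device.startswith("cuda"):
            totals[device] += GPU_MEMORY_MB[model]
    return dict(totals)


if __name__ == "__main__":
    print(estimate_gpu_memory("ImageCaptioning_cuda:0,Text2Image_cuda:0"))
```

For the single Tesla T4 suggestion in Quick Start, this gives 1209 + 3385 = 4594 MB on `cuda:0`, comfortably below 15 GB.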

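Acknowledgement

We appreciate the open-source work of the following projects:

Hugging Face, LangChain, Stable Diffusion, ControlNet, InstructPix2Pix, CLIPSeg, BLIP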

Contact Information

For help or issues using Visual ChatGPT, please submit a GitHub issue.

For other communications, please contact Chenfei WU (chewu@microsoft.com) or Nan DUAN (nanduan@microsoft.com).