BakLLaVA-1: A Free GPT-4V Alternative - Installation Guide


BakLLaVA-1 is a multimodal vision model similar to GPT-4V, but open-source and free. It is based on the LLaVA 1.5 project and replaces the default Vicuna 13B language backbone with Mistral 7B. Since Mistral 7B outperforms Vicuna 13B on several benchmarks despite being roughly half the size, BakLLaVA-1 is both more efficient and more capable than the stock LLaVA 1.5.

[Screenshot: BakLLaVA-1 at work]


How to Install BakLLaVA-1?

Installing BakLLaVA-1 is straightforward and can be done in two ways: with our prebuilt Docker image, or manually with Conda.


1. Installing using Docker image

We have created a Docker image to make it easier for our readers to run and test this model.

The image is available on Docker Hub: securecortex/bakllava - Docker Image | Docker Hub

If you do not have Docker installed yet, instructions are available in the official documentation: Install Docker Engine | Docker Docs
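
Since the container reserves an NVIDIA GPU, it is worth confirming beforehand that Docker can actually see it. A quick check, assuming the NVIDIA Container Toolkit is already installed (the CUDA image tag is just an example):

# verify that containers can access the GPU; this should print the nvidia-smi table
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi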

a) Copy and save the configuration below into a dockercompose.yml file:

version: '3'
services:
  bakllava:
    image: securecortex/bakllava:latest
    ports:
      - "10000:10000"        # Gradio web UI
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia # reserve an NVIDIA GPU for the container
              capabilities: [gpu]
    command: /workspace/BakLLaVA/start.sh
    ipc: host                # share the host IPC namespace (needed for PyTorch shared memory)
    ulimits:
      memlock: -1            # allow unlimited locked memory


b) Create and start the container by running this command in your shell:

docker compose -f dockercompose.yml up

On the first run the application will download the necessary models from Hugging Face, which take around 25 GB of disk space. After that the web UI can be reached at http://localhost:10000
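
If you would rather keep the container running in the background, the standard Compose flags work as usual:

# start in detached mode
docker compose -f dockercompose.yml up -d

# follow the logs (useful for watching the initial model download)
docker compose -f dockercompose.yml logs -f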


2. Installing using Conda

a) Install Conda:

To install Conda, download and run the appropriate installer from the Anaconda site: Free Download | Anaconda. On Ubuntu, Conda can be installed with these commands:

wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh

chmod +x Anaconda3-2023.09-0-Linux-x86_64.sh

./Anaconda3-2023.09-0-Linux-x86_64.sh
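
After the installer finishes, open a new shell (or reload your shell profile) and confirm that Conda is on your PATH:

# reload the shell profile so the conda command becomes available
source ~/.bashrc

# should print something like: conda 23.x.x
conda --version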


b) Download the code from the GitHub repository:

git clone https://github.com/SkunkworksAI/BakLLaVA.git

cd BakLLaVA


c) Install the repository:

conda create -n llava python=3.10 -y

conda activate llava

pip install --upgrade pip  # enable PEP 660 support

pip install -e .

pip install --upgrade transformers
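
At this point a quick sanity check can confirm that the package and its CUDA-enabled PyTorch dependency installed correctly (the import name llava comes from the repository's package layout):

# should print True if a CUDA-capable GPU is visible to PyTorch
python -c "import llava, torch; print(torch.cuda.is_available())"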


d) Install Ninja and Flash Attention (needed for training):

pip install ninja

pip install flash-attn --no-build-isolation
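
The flash-attn build can fail on some setups, so a quick import check is worthwhile:

# a successful import confirms the wheel built against your CUDA toolkit
python -c "import flash_attn; print(flash_attn.__version__)"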


e) To run the application, use these commands (all three processes must be running at the same time):

# controller: keeps track of the available model workers
python -m llava.serve.controller --host 0.0.0.0 --port 9000 &

# Gradio web UI, served on port 10000
python -m llava.serve.gradio_web_server --controller http://localhost:9000 --port 10000 --model-list-mode reload &

# model worker: downloads and serves the model, quantized to 4-bit
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:9000 --port 2000 --worker http://localhost:2000 --model-path liuhaotian/llava-v1.5-13b --load-4bit &

As with the Docker image, on the first run the application will download the necessary models from Hugging Face, which take around 25 GB of disk space. After that the web UI can be reached at http://localhost:10000
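
For convenience, the three processes can be wrapped in a single launcher script. A minimal sketch, assuming the same ports as above (run_bakllava.sh is a hypothetical helper name; the short sleeps give the controller time to come up before the other processes register with it):

#!/usr/bin/env bash
# run_bakllava.sh - hypothetical helper that starts all three BakLLaVA services
set -e

# start the controller first; the worker and the web UI register with it
python -m llava.serve.controller --host 0.0.0.0 --port 9000 &
sleep 5

# start the model worker (downloads ~25 GB of models on the first run)
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:9000 \
    --port 2000 --worker http://localhost:2000 \
    --model-path liuhaotian/llava-v1.5-13b --load-4bit &
sleep 5

# start the web UI in the foreground so the script keeps running
python -m llava.serve.gradio_web_server --controller http://localhost:9000 \
    --port 10000 --model-list-mode reload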

In conclusion, BakLLaVA-1 is a powerful multimodal AI vision model that offers strong performance and efficiency. Its open-source nature and modest resource requirements make it an accessible and cost-effective solution for many users and developers. We are excited about its potential and look forward to seeing how it is used to advance the field of computer vision.
