Qwen VL
A large vision-language model from Alibaba Cloud
Deploy Qwen VL behind an API endpoint in seconds.
Example usage
The model requires two inputs:
prompt: The instruction given to the model, such as what information to extract from the image
image: The input image, encoded as a base64 string
Input
import requests
import base64
import os
from PIL import Image
from io import BytesIO

# Place model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Convert a PIL image to a base64-encoded PNG string
def pil_to_b64(pil_img):
    buffered = BytesIO()
    pil_img.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return img_str


data = {
    "image": pil_to_b64(Image.open("/path/to/image/dog.jpg")),
    "prompt": "Generate the caption in English with grounding"
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

print(res.json())
JSON output
{
  "output": "Picture 1: <img>/tmp/tmpw6m_zmbk.png</img>\nGenerate the caption in English with grounding<ref> A maltese dog</ref><box>(385,361),(783,934)</box> in a flower garden<|endoftext|>"
}
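The grounded caption embeds each detected region inline as <ref>...</ref><box>(x1,y1),(x2,y2)</box> tags. Below is a minimal sketch of pulling those label/box pairs out of the output string with a regular expression. It assumes the tag format shown above and that the box coordinates use Qwen VL's 0-1000 normalized scale; verify both against your deployment before converting to pixel coordinates.

import re

def parse_grounding(output):
    # Match <ref>label</ref><box>(x1,y1),(x2,y2)</box> pairs as shown in the sample output
    pattern = r"<ref>(.*?)</ref><box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
    results = []
    for label, x1, y1, x2, y2 in re.findall(pattern, output):
        # Coordinates are assumed to be normalized to a 0-1000 range over the image
        results.append((label.strip(), (int(x1), int(y1), int(x2), int(y2))))
    return results

output = res.json()["output"]
for label, box in parse_grounding(output):
    print(label, box)  # e.g. "A maltese dog" (385, 361, 783, 934)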