Qwen VL
A large vision-language model from Alibaba Cloud
Deploy Qwen VL behind an API endpoint in seconds.
Example usage
The model requires two inputs:
prompt: The instruction given to the model, such as what information to extract from the image
image: The input image, encoded as a base64 string
Input
import requests
import base64
import os
from PIL import Image
from io import BytesIO

# Place model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Convert a PIL image to a base64-encoded PNG string
def pil_to_b64(pil_img):
    buffered = BytesIO()
    pil_img.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return img_str


data = {
    "image": pil_to_b64(Image.open("/path/to/image/dog.jpg")),
    "prompt": "Generate the caption in English with grounding"
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

print(res.json())
JSON output
{
  "output": "Picture 1: <img>/tmp/tmpw6m_zmbk.png</img>\nGenerate the caption in English with grounding<ref> A maltese dog</ref><box>(385,361),(783,934)</box> in a flower garden<|endoftext|>"
}
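The grounded caption embeds each detected region inline as <ref>...</ref><box>(x1,y1),(x2,y2)</box> tags. Below is a minimal sketch of pulling those label/box pairs out of the output string with a regular expression. It assumes the tag format shown above and that the box coordinates use Qwen VL's 0-1000 normalized scale; verify both against your deployment before converting to pixel coordinates.

import re

def parse_grounding(output):
    # Match <ref>label</ref><box>(x1,y1),(x2,y2)</box> pairs as shown in the sample output
    pattern = r"<ref>(.*?)</ref><box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
    results = []
    for label, x1, y1, x2, y2 in re.findall(pattern, output):
        # Coordinates are assumed to be normalized to a 0-1000 range over the image
        results.append((label.strip(), (int(x1), int(y1), int(x2), int(y2))))
    return results

output = res.json()["output"]
for label, box in parse_grounding(output):
    print(label, box)  # e.g. "A maltese dog" (385, 361, 783, 934)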