Run your own Compute Community server to provide compute resources
Compute Community allows you to run your own server to provide AI compute resources to your network. After setting up your server, you'll have an API key and gateway URL that you can share with friends to let them access your GPU resources.
To run the Compute Community server, you'll need:
- Docker installed and configured with NVIDIA GPU support (the NVIDIA Container Toolkit)
- An NVIDIA GPU with enough free memory to hold your chosen model
The server uses vLLM for LLM inference.
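You can confirm that Docker can see your GPU before pulling the vLLM image; the CUDA image tag below is only an example, so substitute any CUDA base image you already have:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
If the command prints your GPU details, the NVIDIA runtime is working.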
Use Docker to run the vLLM server with your chosen model:
docker run --runtime nvidia --gpus all ^
-p 8000:8000 ^
--ipc=host ^
vllm/vllm-openai:latest ^
--model Qwen/Qwen2.5-14B-Instruct-AWQ ^
--gpu-memory-utilization 0.90 ^
--max-model-len 16384 ^
--api-key YOUR_API_KEY
Note: Replace YOUR_API_KEY with a secure API key of your choice. You can also replace the model with any model supported by vLLM.
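Once the container is up, it's worth sanity-checking the server locally before exposing it. vLLM serves an OpenAI-compatible API, so listing the loaded model should succeed (the command is a single line and runs the same in cmd and bash):
curl http://localhost:8000/v1/models -H "Authorization: Bearer YOUR_API_KEY"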
To expose your local server to the internet, you'll need to set up ngrok:
Create an ngrok account
Authenticate with your ngrok authtoken
Find your authtoken in your ngrok dashboard and add it to your local ngrok configuration:
ngrok config add-authtoken YOUR_AUTH_TOKEN
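If you're on ngrok v3, you can check that the token was written to a valid config file:
ngrok config check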
Create a static domain (recommended)
Create a static domain in your ngrok dashboard, such as your-domain.ngrok-free.app
Start the ngrok tunnel
ngrok http --url=YOUR_STATIC_DOMAIN 8000
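The same models endpoint should now be reachable through the tunnel, which is a quick way to verify the whole chain end to end (replace the domain with your own static domain):
curl https://your-domain.ngrok-free.app/v1/models -H "Authorization: Bearer YOUR_API_KEY"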
After setting up your server, you can share the following details with your friends:
Gateway URL: https://your-domain.ngrok-free.app
API key: the YOUR_API_KEY value you chose when starting the server
Model name: Qwen/Qwen2.5-14B-Instruct-AWQ
They can add these details in the Compute Community settings page to connect to your server.
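Because the gateway is just vLLM's OpenAI-compatible server behind ngrok, anyone with these details can also test the connection directly with a chat completion request; the prompt here is arbitrary and the example uses bash-style line continuations and quoting:
curl https://your-domain.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "Qwen/Qwen2.5-14B-Instruct-AWQ", "messages": [{"role": "user", "content": "Hello!"}]}'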