# AI Service (Ollama)
LocalCloud uses Ollama to run large language models locally. Ollama provides an OpenAI-compatible API, making it easy to integrate with existing applications.

## Available Models
LocalCloud includes several popular models by default:

- `llama3.2` - Meta’s latest Llama model (3B parameters)
- `mistral` - Mistral AI’s 7B model
- `qwen2.5` - Alibaba’s Qwen 2.5 model
- `phi3` - Microsoft’s lightweight model
- `deepseek-coder` - Code-focused model
**Any Ollama Model Available:** You’re not limited to the pre-configured models! You can use any model available in the Ollama library by typing the model name manually when prompted during setup, or by pulling it later with `lc models pull <model-name>`.

## Using Custom Models
### During Setup
When running `lc setup`, you can type any Ollama model name:
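For example, a minimal session might look like this (the prompt behavior is described above; the exact wording is illustrative, not verbatim CLI output):

```bash
lc setup
# At the model selection prompt, type any name from the
# Ollama library, e.g. gemma2:9b (illustrative example)
```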
### After Setup
Pull any Ollama model at any time:
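For example (`gemma2:9b` stands in for any model name from the Ollama library):

```bash
# Pull one of the pre-configured models
lc models pull mistral

# Or pull any other model from the Ollama library
lc models pull gemma2:9b
```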
## API Endpoints

The Ollama service runs on `http://localhost:11434` with these endpoints:
### Generate Text
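This is Ollama's native completion endpoint; a minimal non-streaming request:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```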
### Chat Completion (OpenAI Compatible)
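Ollama exposes an OpenAI-compatible route under `/v1`, so existing OpenAI clients work unchanged:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```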
### List Models
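Installed models are listed on the tags endpoint:

```bash
curl http://localhost:11434/api/tags
```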
## Using with Popular Libraries
### OpenAI Python SDK
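Point the official client at the local endpoint. Ollama ignores the API key, but the client requires a non-empty value:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # any non-empty string works; Ollama does not check it
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```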
### LangChain
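A minimal sketch using the `langchain-ollama` integration package (`pip install langchain-ollama`):

```python
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.2",
    base_url="http://localhost:11434",
)

# invoke() returns an AIMessage; .content holds the generated text
response = llm.invoke("Explain Docker volumes in one sentence.")
print(response.content)
```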
### Direct API Usage
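If you prefer no SDK at all, a plain `requests` call against Ollama's chat endpoint works:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,  # return one JSON object instead of a stream
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```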
## Model Management
### View Installed Models
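Querying Ollama directly always works; a matching `lc models list` subcommand is an assumption based on the `lc models pull` pattern, so verify the exact name against your CLI's help output:

```bash
# Query the Ollama API directly
curl http://localhost:11434/api/tags

# Assumed LocalCloud equivalent (check `lc models --help`)
lc models list
```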
### Remove Models
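Removal goes through Ollama's delete endpoint (older Ollama versions expect `name` instead of `model` in the request body):

```bash
curl -X DELETE http://localhost:11434/api/delete -d '{
  "model": "mistral"
}'
```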
### Model Storage

Models are stored in Docker volumes and persist between restarts:

- Volume: `localcloud_ollama_data`
- Typical size: 3-50GB per model, depending on parameter count
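To inspect the volume (for example, to find its mount point on disk):

```bash
docker volume inspect localcloud_ollama_data
```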
## Performance Tips
- Start with smaller models if you have limited resources:
  - `qwen2.5:3b` - Fast and efficient
  - `phi3:3.8b` - Great for basic tasks
  - `llama3.2:3b` - Good balance of speed and capability
- For better performance, use models that fit in your available RAM:
  - 8GB RAM: Use 3B-7B parameter models
  - 16GB RAM: Can handle 7B-13B models comfortably
  - 32GB+ RAM: Can run larger models (20B+)
- GPU Acceleration: If you have a compatible GPU, Ollama will automatically use it for faster inference.
## Troubleshooting
### Model Download Issues

If a model download fails, check your network connection and available disk space, then retry the pull:
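```bash
# Re-running the pull is safe; Ollama resumes partial downloads
lc models pull <model-name>
```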
### Out of Memory

If you get memory errors:

- Try a smaller model
- Close other applications
- Increase Docker memory allocation
### Slow Response Times
- Use smaller models for faster responses
- Ensure no other heavy processes are running
- Consider upgrading your hardware
## Popular Model Recommendations
### For General Use

- `llama3.2:3b` - Fast, good quality
- `mistral:7b` - Excellent general purpose
- `mixtral:8x7b` - High quality but requires more resources
### For Coding

- `deepseek-coder:6.7b` - Specialized for code
- `codellama:7b` - Meta’s code model
- `qwen-coder:7b` - Good for multiple languages
### For Creative Writing

- `starling-lm:7b` - Good creative capabilities
- `neural-chat:7b` - Conversational focus
### For Limited Resources

- `phi3:3.8b` - Microsoft’s efficient model
- `qwen2.5:3b` - Fast and capable
- `tinyllama:1.1b` - Ultra-light option