AI & LLM Integration

Embed AI into your products and workflows — shipped in weeks, not quarters.

What this looks like in production

Most AI projects fail not because the model is wrong, but because no one set up the eval loop, observability, and release gates that let you iterate confidently. We treat AI integration as a software-engineering problem first.

A typical engagement includes:

  • A retrieval layer tuned for your corpus — chunking strategy, embedding model, reranking, hybrid search where it helps.
  • An inference pipeline that handles prompt versioning, model routing, tool calls, and structured output.
  • A review interface for the cases the model isn’t confident on — human-in-the-loop review where the cost of a wrong answer matters.
  • Evals as code, run on every prompt change, model upgrade, or corpus update. Without these, you’re flying blind.
  • Cost + latency observability so you catch a 10× regression before your CFO does.
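To make "evals as code" concrete, here is a minimal sketch of the idea: a fixed set of cases scored on every prompt change, model upgrade, or corpus update, with a release gate that fails CI when quality drops. The `generate` stub, the cases, and the threshold are all illustrative placeholders, not a real pipeline.

```python
# Minimal evals-as-code sketch. generate() stands in for the real
# inference pipeline; cases and threshold are illustrative.

def generate(question: str) -> str:
    # Placeholder for the real model call.
    canned = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
    }
    return canned.get(question, "I don't know.")

EVAL_CASES = [
    # (input, substring the answer must contain)
    ("What is our refund window?", "30 days"),
]

def run_evals(threshold: float = 1.0) -> float:
    passed = 0
    for question, expected in EVAL_CASES:
        answer = generate(question)
        if expected.lower() in answer.lower():
            passed += 1
    score = passed / len(EVAL_CASES)
    # Release gate: fail the build if the score drops below the threshold.
    assert score >= threshold, f"eval score {score:.0%} below gate {threshold:.0%}"
    return score

print(f"eval pass rate: {run_evals():.0%}")
```

In practice the cases live in version control next to the prompts, so a regression shows up in the same diff that caused it.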

When this fits

You have a use case where a smart-enough generator over your context could replace a slow manual workflow — and you need it to work reliably, not just demo well. Common shapes: support copilots, internal knowledge search, document extraction, agentic workflows over your tools.

Questions about AI & LLM Integration

Isn’t LLM integration just calling an API?

Calling an API is the easy part. The hard part is retrieval quality, evals that catch regressions, latency under load, cost control as you scale, and a human review loop for cases the model gets wrong. We build the system around the API call.

Can you work with more than one model provider?

Yes. We’ve shipped systems on GPT-4o, Claude, Llama 3, Mistral, and Mixtral. We pick based on quality requirements, latency budget, and data residency — not vendor preference.
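Model routing by declared constraints can be sketched roughly as follows. The model names, latency figures, and routing policy here are assumptions for illustration, not benchmarks or recommendations.

```python
# Illustrative model-routing sketch: choose a model from declared
# properties rather than hard-coding a vendor. All specs are made up.

from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str
    quality_tier: int      # 1 = highest quality
    p95_latency_ms: int    # rough latency budget (assumed, not measured)
    self_hostable: bool    # matters for data-residency requirements

MODELS = [
    ModelSpec("gpt-4o", quality_tier=1, p95_latency_ms=1200, self_hostable=False),
    ModelSpec("claude", quality_tier=1, p95_latency_ms=1400, self_hostable=False),
    ModelSpec("llama-3-70b", quality_tier=2, p95_latency_ms=900, self_hostable=True),
    ModelSpec("mixtral-8x7b", quality_tier=3, p95_latency_ms=400, self_hostable=True),
]

def route(max_latency_ms: int, require_residency: bool) -> ModelSpec:
    candidates = [
        m for m in MODELS
        if m.p95_latency_ms <= max_latency_ms
        and (m.self_hostable or not require_residency)
    ]
    if not candidates:
        raise ValueError("no model satisfies the constraints")
    # Among models that fit the budget, prefer the best quality tier.
    return min(candidates, key=lambda m: m.quality_tier)

print(route(max_latency_ms=1000, require_residency=True).name)  # llama-3-70b
```

The point is that the routing decision is explicit and testable, so swapping a provider is a config change rather than a rewrite.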

How long does a typical engagement take?

A focused RAG implementation over a defined corpus is typically 4–8 weeks from kickoff to production: one week of discovery, 2–4 weeks of build and eval iteration, and 1–2 weeks of deployment and handover.

Do you fine-tune models?

We start with a strong prompt + retrieval baseline, because that solves most use cases. If evals show fine-tuning is the right call, we run it as a separate workstream — synthetic data generation, the training run, eval gates, and deployment.

Ready to talk about AI & LLM Integration?

Tell us about your project. We respond within 24 hours.

[email protected]