Forty percent of executives admit they greenlight AI projects without fully understanding the terminology. This beginner's glossary of AI concepts cuts through the noise: every term is defined plainly, with a concrete example, so you can evaluate vendors, read proposals, and make decisions without nodding along at words you don't know.
Terms are grouped by theme, not alphabetically, because understanding how concepts relate to each other matters more than alphabetical tidiness.
The Foundation: Models and Intelligence
Large Language Model (LLM)
A large language model is a statistical system trained on billions of text samples to predict and generate human-like text. It learns patterns — grammar, reasoning styles, factual associations — from that data.
Examples: GPT-4o (OpenAI), Claude 3.5 Sonnet (Anthropic), Gemini 1.5 Pro (Google), Llama 3.3 (Meta).
Plain-language take: An LLM is not a search engine and not a database. It does not "look things up." It generates probable continuations of text based on patterns it absorbed during training. That distinction explains most of its limitations.
Foundation Model
A foundation model is a large AI model trained on broad, general data that can be fine-tuned or prompted for specific tasks. LLMs are one type; multimodal models (text + image + audio) are another.
Why it matters: When a vendor says they "built an AI," they almost always mean they configured a foundation model — not that they trained one from scratch. Training a frontier model costs tens of millions of dollars and requires thousands of GPUs.
Multimodal Model
A model that processes more than one type of input — text, images, audio, video, or code — within a single system.
Example: GPT-4o can accept a photo of a whiteboard and a spoken question, then return a written or spoken answer. That is multimodality in practice.
Parameters
Parameters are the numerical weights adjusted during training that determine how a model responds. GPT-3 had 175 billion parameters; OpenAI has not published a figure for GPT-4, though unofficial estimates put it above 1 trillion. More parameters generally (not always) mean higher capability and higher compute cost.
Beginner shortcut: Parameters ≈ the model's "memory of patterns." More parameters can hold more nuance.
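One way to see why parameter count drives compute cost is to work out how much memory the weights alone occupy. The sketch below is a rough back-of-the-envelope calculation, assuming each parameter is stored as a 16-bit or 32-bit number; the function name is illustrative, and real deployments need additional memory beyond the weights.

```python
def weight_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Approximate memory needed to hold the weights alone, in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


gpt3_parameters = 175e9  # figure quoted in this glossary

print(weight_memory_gb(gpt3_parameters))     # weights stored as 16-bit numbers
print(weight_memory_gb(gpt3_parameters, 4))  # weights stored as 32-bit numbers
```

At 16 bits per parameter, 175 billion parameters come to about 350 GB, which is more than a single GPU typically holds. That is part of why large models are served across many GPUs.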
How You Communicate with a Model
Prompt
A prompt is the input you send to an AI model — the instruction, question, or context that shapes its output. Prompt quality directly affects output quality.
Bad prompt: "Write about marketing."
Better prompt: "Write a 150-word LinkedIn post for a B2B SaaS CFO explaining why AI reduces month-end close time. Tone: direct, no jargon."
Prompt Engineering
The practice of deliberately designing prompts to get reliable, high-quality outputs from an AI model. It includes techniques like chain-of-thought prompting (asking the model to reason step by step), few-shot prompting (providing examples inside the prompt), and role assignment.
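The three techniques named above can be combined in a single prompt. The following is a minimal sketch, assuming a plain-text prompt format; the function name, the "Input/Output" labels, and the sample data are illustrative and not tied to any vendor's API.

```python
def build_few_shot_prompt(role, examples, question, step_by_step=True):
    """Assemble a prompt from a role, worked examples, and the new question."""
    lines = [f"You are {role}."]  # role assignment
    for sample_input, sample_output in examples:  # few-shot examples
        lines.append(f"Input: {sample_input}")
        lines.append(f"Output: {sample_output}")
    lines.append(f"Input: {question}")
    if step_by_step:  # chain-of-thought instruction
        lines.append("Reason step by step, then give the final answer.")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    role="a support agent who classifies tickets as 'billing' or 'technical'",
    examples=[
        ("I was charged twice this month.", "billing"),
        ("The app crashes when I log in.", "technical"),
    ],
    question="My invoice shows the wrong company name.",
)
print(prompt)
```

The resulting string is what gets sent to the model as the prompt.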
Context Window
The maximum amount of text — measured in tokens — that a model can "see" at once during a single interaction. This includes your prompt, the conversation history, and the model's response.
Typical limits at the time of writing: GPT-4o supports ~128,000 tokens (~96,000 words). Gemini 1.5 Pro supports up to 1 million tokens. Claude 3.5 Sonnet supports 200,000 tokens.
Why it matters for software: If your product needs to analyze a 500-page legal contract in one shot, context window size is a hard technical constraint, not a marketing detail.
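To make the constraint concrete, here is a rough feasibility check for the 500-page contract. It is a sketch under stated assumptions: 500 words per page and a 4,000-token allowance for the model's response are both illustrative guesses, and the 0.75 words-per-token ratio is this glossary's rule of thumb rather than an exact tokenizer count.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English text


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from a word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_window: int, reserved_for_answer: int = 4000) -> bool:
    """True if the document plus room for the response fits in the window."""
    return estimate_tokens(word_count) + reserved_for_answer <= context_window


contract_words = 500 * 500  # 500 pages at an assumed 500 words per page

for window in (128_000, 200_000, 1_000_000):
    print(window, fits_in_context(contract_words, window))
```

Under these assumptions the contract comes to roughly 333,000 tokens, so it only fits in the 1-million-token window. With smaller windows the document would have to be split up or handled with retrieval.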
Token
The unit models use to process text. A token is roughly 0.75 words in English. "Catalizadora builds software" is about 5 tokens, though the exact count depends on the model's tokenizer. API pricing is typically quoted per million tokens, so token counting directly affects cost forecasting.
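Because pricing is quoted per million tokens, a cost forecast is simple arithmetic. The sketch below assumes input and output tokens are billed at separate rates; every number in the example (request volume, token counts, prices) is a placeholder, not a real vendor's price.

```python
def monthly_cost_usd(requests_per_month, input_tokens, output_tokens,
                     input_price_per_million, output_price_per_million):
    """Forecast monthly spend from average tokens per request and per-million prices."""
    input_cost = requests_per_month * input_tokens / 1_000_000 * input_price_per_million
    output_cost = requests_per_month * output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder figures for illustration only.
cost = monthly_cost_usd(
    requests_per_month=100_000,
    input_tokens=1_500,   # average prompt size per request
    output_tokens=500,    # average response size per request
    input_price_per_million=2.50,
    output_price_per_million=10.00,
)
print(f"${cost:,.2f} per month")
```

In this example the shorter responses still account for more than half the bill, because the assumed output rate is four times the input rate.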
Making Models Smarter and More Accurate
RAG (Retrieval-Augmented Generation)
RAG is a technique that connects an LLM to an external knowledge source — a database, document library, or live data feed — at query time. Instead of relying solely on training data, the model retrieves relevant chunks of real information and uses them to generate its answer.
Example: A customer-support chatbot built on GPT-4 alone knows nothing about your proprietary return policy. With RAG, it fetches the relevant policy document before answering, so the response is accurate and current.
RAG vs. fine-tuning: RAG is faster to implement and keeps knowledge updatable. Fine-tuning bakes knowledge into the model weights and requires retraining when information changes.
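The retrieve-then-generate flow can be sketched in a few lines. This is a simplified illustration, not a production design: it scores documents by shared keywords as a stand-in for embedding search, the policy texts are invented, and the final call to an LLM is left out.

```python
import re

# Invented policy snippets standing in for a real document library.
DOCUMENTS = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 5 business days.",
}


def words(text):
    """Lowercased set of words and numbers in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Return the top_k documents sharing the most words with the question."""
    ranked = sorted(documents.values(),
                    key=lambda doc: len(words(doc) & words(question)),
                    reverse=True)
    return ranked[:top_k]


def build_rag_prompt(question, documents):
    """Put the retrieved text in front of the question before calling the model."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


print(build_rag_prompt("Can I return items after 30 days without a receipt?", DOCUMENTS))
```

Updating the knowledge means editing the entries in DOCUMENTS; the model itself is untouched, which is the practical difference from fine-tuning.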
Embeddings
Embeddings are numerical representations (vectors) of text, images, or other data that capture semantic meaning. Two sentences that mean the same thing — even if worded differently — produce similar vectors.
Why they power RAG: Search in a RAG system works by finding documents whose embedding vectors are closest to the query's embedding vector. That is why a search for "contract termination clause" can return a document that uses the phrase "agreement cancellation terms."
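"Closest" is usually measured with cosine similarity, which compares the direction of two vectors. The sketch below uses hand-made three-number vectors purely for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example.
query = [0.9, 0.1, 0.0]         # "contract termination clause"
cancellation = [0.8, 0.2, 0.1]  # "agreement cancellation terms"
catering = [0.0, 0.1, 0.9]      # "office catering menu"

print(cosine_similarity(query, cancellation))
print(cosine_similarity(query, catering))
```

The cancellation vector scores far higher against the query than the catering vector, even though the two phrases share no words.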
Fine-Tuning
Fine-tuning takes a pre-trained foundation model and trains it further on a smaller, domain-specific dataset to improve performance on a particular task.
When to use it: Fine-tuning makes sense when you have thousands of high-quality labeled examples and need consistent formatting, tone, or specialized domain accuracy. It is overkill — and expensive — when prompt engineering or RAG can achieve the same result.
Vector Database
A database optimized to store and search embeddings. It performs nearest-neighbor search, finding the vectors most similar to a query vector in milliseconds, even across millions of records.
Common tools: Pinecone, Weaviate, Qdrant, pgvector (Postgres extension).
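Conceptually, a vector database exposes two operations: store a vector under an ID, and return the IDs nearest to a query vector. The toy class below shows that interface with a brute-force scan. It is not how the tools above work internally; they generally rely on specialized indexes to stay fast at scale. The class name, record IDs, and vectors are all invented.

```python
import heapq
import math


class TinyVectorIndex:
    """In-memory stand-in for a vector database, using a brute-force scan."""

    def __init__(self):
        self._records = {}  # record id -> vector

    def add(self, record_id, vector):
        self._records[record_id] = vector

    def search(self, query_vector, top_k=2):
        """Return the ids of the top_k vectors closest to the query (Euclidean distance)."""
        return heapq.nsmallest(
            top_k,
            self._records,
            key=lambda record_id: math.dist(query_vector, self._records[record_id]),
        )


index = TinyVectorIndex()
index.add("termination-clause", [0.9, 0.1])
index.add("pricing-schedule", [0.1, 0.9])
index.add("cancellation-terms", [0.8, 0.2])

print(index.search([0.9, 0.12]))
```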
AI Agents: Where the Field Is Heading
AI Agent
An AI agent is a system that uses an LLM as its reasoning core but can take actions — browse the web, run code, call APIs, send emails, write to databases — to complete a multi-step task autonomously.
Key difference from a chatbot: A chatbot responds. An agent acts and adapts based on results.
Example: An agent assigned to "research our top 10 competitors and populate this CRM" will search the web, extract data, format it, and push it to your CRM — without a human clicking through each step.
Agentic Loop
The cycle an AI agent runs through: receive goal → plan steps → execute action → observe result → revise plan → repeat. This loop continues until the task is complete or the agent hits a failure condition.
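The loop translates almost directly into code. In the sketch below the planner is a hard-coded stub standing in for the LLM, and the two tools are trivial placeholders; all names are hypothetical. The step limit is one simple example of a failure condition.

```python
def run_agent(goal, plan_next_action, tools, max_steps=10):
    """Run the loop until the planner reports completion or the step limit is hit."""
    history = []  # (tool_name, argument, result) tuples the planner can inspect
    for _ in range(max_steps):
        action = plan_next_action(goal, history)  # plan, or revise the plan
        if action is None:  # planner signals that the goal is met
            return history
        tool_name, argument = action
        result = tools[tool_name](argument)  # execute the action
        history.append((tool_name, argument, result))  # observe the result
    raise RuntimeError("Agent stopped: step limit reached before the goal was met")


saved_records = []


def save_record(record):
    saved_records.append(record)
    return "saved"


tools = {"search": lambda query: f"results for {query}", "save": save_record}


def scripted_planner(goal, history):
    """Stub for the LLM: search first, then save what was found, then stop."""
    if not history:
        return ("search", goal)
    if len(history) == 1:
        return ("save", history[-1][2])
    return None


history = run_agent("competitors", scripted_planner, tools)
print(history)
```

In a real agent, the planner would be a model call that receives the goal and the history and decides the next action.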
Tool Use / Function Calling
The mechanism that lets an LLM trigger external functions — APIs, scripts, database queries — during a conversation. The model decides when and which tool to call based on the user's request.
Example: A user asks an internal assistant, "What was last month's revenue?" The model calls a function connected to your ERP, retrieves the figure, and answers, without anyone having to run the query by hand.
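On the application side, function calling usually means the model emits a structured request and your code executes it. The sketch below assumes the request arrives as a JSON string with "name" and "arguments" fields, which is a common shape but varies by provider. The revenue function, its figure, and the hard-coded model output are placeholders.

```python
import json


def get_monthly_revenue(month: str) -> float:
    """Placeholder for a real ERP query; the figure is invented."""
    fake_erp = {"2024-05": 120000.0}
    return fake_erp[month]


# Only functions listed here may be triggered by the model.
TOOLS = {"get_monthly_revenue": get_monthly_revenue}


def execute_tool_call(raw_call: str):
    """Parse the model's structured request and run the matching function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Model requested an unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Hard-coded stand-in for what a model might return for the revenue question.
model_output = '{"name": "get_monthly_revenue", "arguments": {"month": "2024-05"}}'
print(execute_tool_call(model_output))
```

The explicit TOOLS registry matters: the model chooses which function to call, but your code decides which functions exist to be called.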
Orchestration
When multiple AI agents or model calls need to work together — one agent researching, another writing, another reviewing — orchestration manages the workflow, dependencies, and data passing between them.
Frameworks: LangGraph, CrewAI, AutoGen, and custom-built orchestrators are common choices in production systems.
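Stripped of framework features, the simplest form of orchestration is a pipeline that passes each step's output to the next. The sketch below uses three stub functions in place of real agents and does not use any of the frameworks named above; all names are illustrative.

```python
def researcher(topic):
    return f"notes on {topic}"


def writer(notes):
    return f"draft based on {notes}"


def reviewer(draft):
    return f"approved: {draft}"


def run_pipeline(steps, initial_input):
    """Run each step in order, feeding its output to the next, and keep a trace."""
    data = initial_input
    trace = []
    for name, step in steps:
        data = step(data)
        trace.append((name, data))  # record what each step produced
    return data, trace


result, trace = run_pipeline(
    [("research", researcher), ("write", writer), ("review", reviewer)],
    "pricing",
)
print(result)
```

Production orchestrators add what this sketch lacks, such as branching, retries, parallel steps, and shared state.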
Guardrails
Rules, filters, and validation layers added around an AI system to constrain its behavior — preventing harmful outputs, off-topic responses, hallucinated data, or policy violations.
In production software, guardrails are not optional. Any AI feature deployed to real users needs input validation, output filtering, confidence thresholds, and human escalation paths.
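The four layers just listed can be sketched as a wrapper around the model call. This is a simplified illustration: it assumes the generate function returns a confidence score alongside the text, which many models do not provide directly, and the thresholds and email pattern are arbitrary placeholders.

```python
import re

MAX_INPUT_CHARS = 2000      # assumed limit for this example
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff for this example
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def guarded_reply(user_input, generate):
    """Wrap a model call with input validation, escalation, and output filtering."""
    # Input validation: reject empty or oversized requests before calling the model.
    if not user_input.strip() or len(user_input) > MAX_INPUT_CHARS:
        return {"status": "rejected", "text": "Please send a shorter, non-empty message."}

    text, confidence = generate(user_input)

    # Confidence threshold with human escalation.
    if confidence < CONFIDENCE_THRESHOLD:
        return {"status": "escalated", "text": "A human agent will follow up."}

    # Output filtering: redact email addresses before the text reaches the user.
    return {"status": "ok", "text": EMAIL_PATTERN.sub("[redacted]", text)}


reply = guarded_reply("Who handles refunds?", lambda q: ("Contact bob@example.com", 0.9))
print(reply)
```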
Evaluating and Trusting AI Output
Hallucination
When an AI model generates text that sounds confident and fluent but is factually wrong or entirely fabricated. The model is not "lying" — it is producing a statistically plausible continuation that happens to be incorrect.
Mitigation strategies: RAG with verified sources, output validation layers, confidence scoring, and human review for high-stakes decisions.
Inference
Inference is the act of running a trained model to generate an output. Training happens once (or periodically); inference happens every time a user interacts with the system. Inference cost and latency are the primary operational variables in production AI systems.
Latency
The time between sending a prompt and receiving a complete response. For user-facing applications, latency above 3 to 4 seconds tends to hurt engagement. Streaming responses (displaying text as it is generated) is a common UX technique to manage perceived latency.
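Streaming does not shorten total latency; it shortens the wait before the user sees anything. The sketch below measures both numbers against a fake stream that sleeps between chunks. The generator is a stand-in for a real streaming API, and the delay is arbitrary.

```python
import time


def fake_stream(chunks, delay_seconds):
    """Stand-in for a streaming model response: yields one chunk at a time."""
    for chunk in chunks:
        time.sleep(delay_seconds)
        yield chunk


def measure_stream(stream):
    """Return the full text, time to first chunk, and total time in seconds."""
    start = time.perf_counter()
    time_to_first_chunk = None
    parts = []
    for chunk in stream:
        if time_to_first_chunk is None:
            time_to_first_chunk = time.perf_counter() - start
        parts.append(chunk)
    total_time = time.perf_counter() - start
    return "".join(parts), time_to_first_chunk, total_time


text, first, total = measure_stream(fake_stream(["Hello", " ", "world"], 0.01))
print(f"first chunk after {first:.3f}s, complete after {total:.3f}s")
```

Tracking time to first chunk separately from total time shows whether a slow-feeling feature needs a faster model or just streaming.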
From Concepts to Custom Software
Understanding this glossary is step one. The harder question is knowing which combination of these concepts — RAG, agents, fine-tuning, guardrails — actually solves your specific business problem, and which are over-engineered for your use case.
At Catalizadora, we build AI-native software that applies exactly the right layer of these technologies — no more, no less. Our Core program delivers production-ready AI systems in 12 weeks. Clients own 100% of the code and IP, with no recurring license fees attached to our work.
If you have read this glossary and have a concrete problem in mind, the next step is seeing how we think about building.