
AI & Machine Learning Glossary

Clear, practical definitions of the AI and ML terms that matter for product teams and engineers building production AI systems.

A

AI Agent

Agent

An AI system that can take sequences of actions to complete a goal, typically by using tools (web search, code execution, database queries, API calls) and reasoning about intermediate results. Agents differ from simple LLM calls in that they can plan, execute multiple steps, and adapt their approach based on intermediate outputs. Multi-agent systems consist of multiple specialized agents that coordinate to complete complex tasks.

Example

A research agent given the task 'find the top 5 competitors to Acme Corp and summarize their pricing' autonomously searches the web, navigates to competitor websites, extracts pricing information, and synthesizes a structured report — taking 15+ tool-call steps without human intervention.
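The plan-act-observe loop behind an agent can be sketched in a few lines. This is a minimal illustration, not a real agent framework: the "model" is a hard-coded stub standing in for an LLM call, and the single web_search tool returns canned text.

```python
# Minimal agent loop sketch: the "model" picks a tool, the runtime executes
# it, and the observation is fed back until the model decides to finish.
# fake_model is a stub standing in for a real LLM call.

def fake_model(history):
    # A real agent would send `history` to an LLM and parse its reply.
    if not any(step[0] == "web_search" for step in history):
        return ("web_search", "Acme Corp competitors")
    return ("finish", "Report: top competitors identified.")

TOOLS = {
    "web_search": lambda q: f"Search results for: {q}",  # stubbed tool
}

def run_agent(max_steps=5):
    history = []
    for _ in range(max_steps):
        action, arg = fake_model(history)
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)   # execute the chosen tool
        history.append((action, observation))
    return "Stopped: step budget exhausted."

print(run_agent())
```

The step budget (`max_steps`) is the safety valve that keeps a looping agent from running tools indefinitely.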

AML/KYC

Anti-Money Laundering (AML) and Know Your Customer (KYC) — regulatory compliance requirements for financial institutions to verify customer identity and monitor for suspicious financial activity. AI applications in AML/KYC include transaction monitoring for suspicious patterns, customer risk scoring, identity document verification, and adverse media screening. Models must be explainable for regulatory examination.

Example

An AI-powered AML system monitors 2 million transactions daily — scoring each for risk based on counterparty networks, geographic patterns, transaction velocity, and behavioral anomalies — generating a prioritized alert queue for compliance analysts to review, with SHAP-based explanations for every high-risk flagging decision.

C

Computer Vision

A field of AI that enables computers to interpret and understand visual information from images and video. Modern computer vision uses convolutional neural networks (CNNs) and vision transformers (ViTs) for tasks including image classification, object detection, semantic segmentation, optical character recognition (OCR), and video analysis.

Example

A manufacturing quality control system uses computer vision to inspect 500 units per minute on a production line — identifying surface defects, dimensional variations, and assembly errors that human inspectors would miss at that speed, with a detection accuracy that exceeds the 99.2% target specified for the customer's quality SLA.

E

Embeddings

Numerical vector representations of text, images, audio, or other data that encode semantic meaning. Items with similar meaning have vectors that are close in the embedding space. Embeddings are produced by embedding models and are used as inputs to vector databases, recommendation systems, and classification models.

Example

The sentences 'the patient presented with chest pain' and 'the patient complained of thoracic discomfort' have embeddings that are close together, while 'the dog barked loudly' has an embedding that is far from both — capturing semantic similarity without keyword matching.
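The "close in embedding space" idea is usually measured with cosine similarity. The 4-dimensional vectors below are invented toy values (real embedding models produce hundreds or thousands of dimensions), but the comparison works the same way.

```python
import math

# Toy 4-d "embeddings"; values are invented for illustration only.
chest_pain = [0.9, 0.8, 0.1, 0.0]     # 'patient presented with chest pain'
thoracic   = [0.85, 0.75, 0.15, 0.05] # 'patient complained of thoracic discomfort'
dog_barked = [0.0, 0.1, 0.9, 0.8]     # 'the dog barked loudly'

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Semantically similar sentences score near 1.0; unrelated ones score low.
print(cosine_similarity(chest_pain, thoracic))
print(cosine_similarity(chest_pain, dog_barked))
```

No keyword overlaps between the two medical sentences, yet their vectors score far higher than the unrelated one — which is exactly what semantic search exploits.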

Edge AI

Artificial intelligence inference that runs on edge devices — hardware at or near the data source (sensors, cameras, industrial controllers, mobile phones) rather than in the cloud. Edge AI reduces latency, enables operation without internet connectivity, and improves privacy by keeping sensitive data on-device. It requires model optimization (quantization, pruning) to fit within edge hardware constraints.

Example

An industrial robot uses edge AI to perform real-time visual inspection at 60fps without cloud connectivity — the AI model runs on an NVIDIA Jetson device mounted on the robot, making pass/fail decisions in under 10ms to avoid slowing the production line.

F

Fine-Tuning

The process of continuing to train a pre-trained foundation model on a smaller, domain-specific dataset so the model learns to produce outputs that match a specific format, style, or domain vocabulary. Fine-tuning adjusts the model weights rather than providing information at inference time.

Example

A legal AI company fine-tunes a base LLM on 10,000 contract clauses so the model consistently extracts structured data in the format their application requires, outperforming prompting approaches on their specific task.

Foundation Model

A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks through fine-tuning, prompting, or retrieval augmentation. Foundation models (GPT-4, Claude, Llama, Stable Diffusion) represent a paradigm shift from task-specific models to general-purpose base models that are adapted to specific use cases.

Example

Instead of training a separate NLP model for contract analysis, customer service, code generation, and summarization, a company uses Claude as a foundation model — adapting it to each task through prompting and RAG, rather than training and maintaining four separate specialized models.

Feature Engineering

The process of transforming raw data into representations (features) that improve ML model performance. Feature engineering includes selecting relevant variables, creating derived features (ratios, aggregations, time windows), encoding categorical variables, handling missing values, and normalizing numerical features. For tabular ML problems, feature engineering often has more impact on model performance than algorithm selection.

Example

For a credit risk model, raw data includes transaction timestamps and amounts. Feature engineering derives features like 'number of transactions in last 7 days,' 'average transaction amount in last 30 days,' and 'days since last transaction' — capturing behavioral patterns the raw data doesn't express directly.
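The three derived features above can be computed with plain datetime arithmetic. The transaction records and feature names below are invented to mirror the example; production pipelines would compute the same aggregations in SQL or a feature store.

```python
from datetime import datetime, timedelta

# Raw data: (timestamp, amount) transaction records; values invented.
now = datetime(2024, 6, 30)
transactions = [
    (now - timedelta(days=1),  120.0),
    (now - timedelta(days=3),   45.0),
    (now - timedelta(days=12),  80.0),
    (now - timedelta(days=40), 200.0),
]

def derive_features(txns, as_of):
    last_7d  = [amt for ts, amt in txns if as_of - ts <= timedelta(days=7)]
    last_30d = [amt for ts, amt in txns if as_of - ts <= timedelta(days=30)]
    most_recent = max(ts for ts, _ in txns)
    return {
        "txn_count_7d": len(last_7d),
        "avg_amount_30d": sum(last_30d) / len(last_30d) if last_30d else 0.0,
        "days_since_last_txn": (as_of - most_recent).days,
    }

print(derive_features(transactions, now))
```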

FHIR

Fast Healthcare Interoperability Resources — the HL7 standard for exchanging electronic health records (EHR) between systems. FHIR defines data formats (Resources) and APIs for clinical data including patients, observations, conditions, medications, and procedures. Compliance with FHIR R4 is required for CMS interoperability rules and is the standard integration point for EHR systems including Epic, Cerner, and Allscripts.

Example

A clinical AI application integrates with a hospital's Epic EHR system via the FHIR R4 API — pulling patient observations, lab results, and problem lists to provide real-time sepsis risk scores to the care team, without requiring any custom integration work or EHR modification.

Function Calling

A capability in modern LLMs (GPT-4, Claude, Gemini) that allows the model to specify when to call a predefined function and what arguments to pass — rather than generating free-form text. The application executes the function and returns results to the model, which then generates a response incorporating the result. Function calling is the foundation for tool-using AI agents.

Example

A travel booking assistant uses function calling to allow GPT-4 to invoke flight_search(origin, destination, date), hotel_search(city, checkin, checkout), and book_itinerary(flights, hotels) — the model decides when to call each function, constructs the arguments from conversation context, and processes the results to complete the booking.
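The application side of this flow can be sketched without any LLM at all: the model's reply is a structured tool call, and the app dispatches it to a matching function. The JSON shape and the flight_search stub below are simplified illustrations, not any provider's exact wire format.

```python
import json

# The app registers callable tools; flight_search is a stub for illustration.
def flight_search(origin, destination, date):
    return [{"flight": "AA100", "price": 320}]  # canned search result

FUNCTIONS = {"flight_search": flight_search}

# What a model's function-call response might look like, as JSON
# (simplified; real providers each have their own response schema):
model_output = json.dumps({
    "name": "flight_search",
    "arguments": {"origin": "SFO", "destination": "JFK", "date": "2024-09-01"},
})

call = json.loads(model_output)                      # parse the tool call
result = FUNCTIONS[call["name"]](**call["arguments"])  # dispatch to the function
print(result)  # the app returns this result to the model for the next turn
```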

H

Hallucination

A phenomenon where a large language model generates text that is fluent and confident-sounding but factually incorrect or entirely fabricated. Hallucinations occur because LLMs generate the most statistically probable next token, not the most factually accurate one. Mitigation strategies include retrieval augmentation (RAG), output validation, citation enforcement, and human review for high-stakes outputs.

Example

When asked about a specific legal case, an LLM without grounding may generate a plausible-sounding but entirely fabricated case citation, complete with judge name, jurisdiction, and ruling — because it has learned the pattern of how legal citations look.

Hugging Face

The leading open-source platform for ML models, datasets, and deployment infrastructure. Hugging Face hosts over 500,000 pre-trained models including BERT, T5, Llama, Stable Diffusion, and Whisper. The Transformers library provides a unified API for fine-tuning and inference. Hugging Face Inference Endpoints and Spaces provide hosting infrastructure for model deployment.

Example

A team building a clinical NLP pipeline uses Hugging Face to access a pre-trained BioBERT model specifically fine-tuned on biomedical text, reducing the labeled data requirement for their clinical entity extraction task compared to starting from a general-purpose BERT model.

I

Inference

The process of running a trained ML model to produce predictions on new data. Inference is distinct from training — training adjusts model weights on a large dataset, while inference applies the fixed weights to new inputs to generate outputs. Production AI systems spend most of their compute time on inference, making inference optimization (latency, throughput, cost) a critical engineering concern.

Example

When a user submits a photo to a skin condition analysis app, the model performs inference — applying its learned weights to the new photo to produce a classification and confidence score — without any learning or weight updates occurring.
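The training/inference distinction is just "weights frozen, input varies." A one-layer sketch with invented weights makes it concrete — the only thing that changes between requests is the input.

```python
import math

# Inference applies fixed, already-learned weights to a new input —
# no gradient updates occur. These weights are invented for illustration.
WEIGHTS = [0.4, -1.2, 0.7]
BIAS = 0.1

def predict(features):
    # Logistic-regression inference: weighted sum, then sigmoid.
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))   # confidence score in (0, 1)

score = predict([1.0, 0.5, 2.0])   # a new input; WEIGHTS stay untouched
print(round(score, 3))
```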

L

Large Language Model

LLM

A neural network trained on large amounts of text data that learns to predict the next token in a sequence. At sufficient scale, LLMs develop emergent capabilities including reasoning, code generation, translation, and instruction following. Modern LLMs (GPT-4, Claude, Gemini, Llama) serve as general-purpose reasoning engines for a wide range of AI applications.

Example

GPT-4 is an LLM that can analyze a legal contract, extract key obligations, identify risks, and generate a summary — capabilities that emerge from training on internet-scale text data rather than being explicitly programmed.

LangChain

An open-source framework for building LLM-powered applications. LangChain provides abstractions for chaining LLM calls with tools, memory, and retrieval systems. LangGraph extends LangChain with stateful multi-agent graph execution. LangSmith provides observability and evaluation for production LLM applications. The framework simplifies building RAG pipelines, agents, and multi-step LLM workflows.

Example

A legal research tool built with LangChain chains together: a vector database retrieval step (find relevant case law), a reranking step (select the most relevant cases), and an LLM generation step (synthesize an argument supported by the cases) — a three-step workflow that would require significant custom code without the framework.

Latency

The time delay between when an AI inference request is initiated and when the result is returned. Latency is measured as p50 (median), p95, and p99 percentile response times. Production AI applications have strict latency requirements: real-time fraud scoring under 100ms, conversational AI under 2 seconds for first token, image classification under 50ms for production line inspection. Latency optimization involves model quantization, batching, caching, and hardware selection.

Example

A fraud detection model's p99 latency requirement is 80ms — meaning 99% of inference requests must return within 80ms. Achieving this requires INT8 quantization (4x speed improvement), in-memory feature caching (eliminates database lookup latency), and deployment on a GPU inference server rather than CPU.
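Checking a p99 budget against observed latencies is a small computation. The nearest-rank method below is one common percentile convention among several, and the sample latencies are synthetic.

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: one common convention among several.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = list(range(1, 101))  # synthetic sample: 1ms .. 100ms
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)}ms")

# SLO check: do 99% of requests return within the 80ms budget?
meets_slo = percentile(latencies_ms, 99) <= 80
print(meets_slo)
```

Note why percentiles, not averages, drive latency SLOs: a mean of 40ms can hide a tail of multi-second requests that users still experience.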

M

MLOps

Machine Learning Operations — the set of practices, tools, and workflows that enable organizations to reliably build, deploy, monitor, and maintain ML models in production. MLOps covers the full ML lifecycle: data pipelines, feature engineering, model training, evaluation, deployment, monitoring, and retraining. Analogous to DevOps for software, but with the additional complexity of managing data and model artifacts.

Example

An MLOps platform automatically retrains a fraud detection model weekly on new transaction data, evaluates it against a holdout set, promotes it to production if performance improves, and pages the ML team if the model's precision falls below a threshold.

Model Drift

The degradation of a deployed ML model's performance over time as the statistical properties of real-world data diverge from the training data distribution. Data drift refers to changes in input feature distributions; concept drift refers to changes in the relationship between inputs and the target variable. Monitoring for drift and triggering retraining are core MLOps responsibilities.

Example

A fraud detection model trained on pre-pandemic transaction patterns experiences concept drift when fraud tactics change post-pandemic — the relationship between transaction features and fraud probability shifts in ways the model has not learned. The model's precision falls from 94% to 87% over six months before monitoring alerts trigger retraining.
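One widely used drift score for a feature's input distribution is the Population Stability Index (PSI). The bin fractions below are invented, and the ~0.2 alert threshold is a common convention rather than a standard.

```python
import math

# Population Stability Index: compares a feature's binned distribution at
# training time against the distribution seen in production traffic.
def psi(expected, actual):
    # Both inputs are bin fractions that each sum to 1; all bins non-empty.
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

train_bins = [0.25, 0.25, 0.25, 0.25]  # feature distribution at training time
live_bins  = [0.10, 0.20, 0.30, 0.40]  # distribution in production (drifted)

score = psi(train_bins, live_bins)
print(round(score, 3))
drift_alert = score > 0.2   # conventional "significant drift" threshold
print(drift_alert)
```

PSI catches data drift (input distributions shifting); concept drift, where the input-to-label relationship changes, additionally requires monitoring model performance against fresh labels.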

Model Serving

The infrastructure and processes for making trained ML models available for real-time or batch inference. Model serving includes the inference server (TF Serving, Triton, FastAPI), model versioning and A/B testing, scaling and load balancing, monitoring, and optimization for latency and throughput. Production model serving is distinct from development inference and requires careful engineering for reliability.

Example

A recommendation model serving infrastructure handles 50,000 inference requests per second at peak load — auto-scaling GPU instances, routing requests across model versions for A/B testing, caching frequent predictions, and maintaining p99 latency under 80ms with 99.95% uptime.

MLflow

An open-source MLOps platform for tracking ML experiments, packaging models, and managing the model lifecycle. MLflow has four components: Tracking (log parameters, metrics, and artifacts for every training run), Projects (package ML code for reproducibility), Models (a standard format for saving and loading models), and Registry (a model store with staging/production lifecycle management). It integrates with all major cloud ML platforms.

Example

A data science team uses MLflow to track 200+ training experiments for a customer churn model — comparing hyperparameters, features, and evaluation metrics across runs. The best model is registered in the MLflow Model Registry, where it moves from Staging to Production after sign-off, triggering an automated deployment to the serving infrastructure.

N

NLP

Natural Language Processing — the field of AI concerned with enabling computers to understand, interpret, and generate human language. Modern NLP is dominated by transformer-based models (BERT, GPT, T5, RoBERTa) pre-trained on large text corpora. Applications include text classification, named entity recognition, sentiment analysis, question answering, summarization, and machine translation.

Example

An NLP pipeline for a healthcare company processes clinical notes — extracting diagnoses (ICD codes), medications, procedures, and vital signs mentioned in unstructured text — transforming free-form physician documentation into structured data for analytics and quality reporting.

O

OCR

Optical Character Recognition — technology that converts images of text (scanned documents, photos, screenshots) into machine-readable text. Modern AI-powered OCR (also called Intelligent Document Processing or IDP) goes beyond character recognition to extract structured data from complex document layouts including tables, forms, invoices, and contracts.

Example

An insurance company processes 10,000 claims per day — each containing a mix of handwritten forms, typed correspondence, and scanned medical records. AI-powered OCR extracts structured data (claim amounts, dates, diagnoses, policy numbers) from the document images, reducing manual data entry from 8 minutes per claim to 30 seconds for review.

P

Prompt Engineering

The practice of designing and optimizing the input text (prompt) sent to a language model to elicit specific, reliable, and high-quality outputs. Includes techniques such as few-shot prompting (providing examples), chain-of-thought (asking the model to reason step by step), and system prompt design (establishing role and constraints). Prompt engineering is a complement to, not a replacement for, model fine-tuning or retrieval augmentation for production systems.

Example

Instead of asking an LLM to 'summarize this document,' a prompt engineer writes a system prompt that defines the output format, length constraints, required sections, and tone — producing consistent, structured summaries that downstream systems can reliably parse.
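A structured system prompt is ultimately just carefully assembled text. The section names and rules below are invented for illustration; the message-list shape mirrors the system/user pairing most chat APIs accept, though exact schemas vary by provider.

```python
# A structured system prompt vs. a bare 'summarize this' instruction.
# Section names and constraints are invented; the point is that explicit
# rules yield outputs that downstream code can parse reliably.
SYSTEM_PROMPT = """You are a document summarizer for an internal tool.
Output rules:
- Return exactly three sections: SUMMARY, KEY_RISKS, ACTION_ITEMS.
- SUMMARY must be under 100 words.
- Use a neutral, factual tone; no first-person phrasing.
"""

def build_messages(document_text):
    # Most chat APIs accept a system/user message pair shaped like this.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Summarize this document:\n{document_text}"},
    ]

messages = build_messages("Q3 revenue rose 12% on strong renewals...")
print(messages[0]["role"], "+", messages[1]["role"])
```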

Q

Quantization

A model compression technique that reduces the numerical precision of model weights from 32-bit or 16-bit floating point to 8-bit integers or lower. Quantization reduces model size (typically by 4x for INT8) and improves inference speed with minimal accuracy loss on most tasks. Essential for deploying ML models on edge devices with limited memory and compute.

Example

A computer vision model for industrial defect detection originally requires 2GB of GPU memory. After INT8 quantization, the model fits in 500MB and runs 3x faster on the edge device — enabling real-time inference at the production line without cloud connectivity.
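The core of symmetric INT8 quantization fits in a few lines: map floats into the [-127, 127] integer range with a single scale factor. This is a sketch with invented weights; real toolchains add calibration datasets, per-channel scales, and quantization-aware fine-tuning.

```python
# Symmetric INT8 quantization sketch: one scale for the whole tensor.
weights = [0.82, -1.35, 0.07, 2.0, -0.5]    # invented float32 weights

scale = max(abs(w) for w in weights) / 127   # largest weight maps to +/-127

def quantize(w):
    return round(w / scale)                  # int8 value in [-127, 127]

def dequantize(q):
    return q * scale                         # approximate original float

q_weights = [quantize(w) for w in weights]
print(q_weights)                             # small integers, 1 byte each

# Rounding error is bounded by half the scale step.
max_err = max(abs(dequantize(q) - w) for q, w in zip(q_weights, weights))
print(max_err <= scale / 2 + 1e-12)
```

The 4x size reduction comes directly from storing 1 byte per weight instead of 4; the bounded per-weight error is why accuracy loss is usually small.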

R

Retrieval-Augmented Generation

RAG

A technique that combines a large language model with a retrieval system — a vector database or search index — so the model can access up-to-date, domain-specific information at inference time rather than relying solely on knowledge encoded during training. The model retrieves relevant documents, then generates an answer grounded in the retrieved content.

Example

A customer support chatbot uses RAG to retrieve relevant knowledge base articles before answering, ensuring responses are accurate and up-to-date without requiring the model to be retrained when the knowledge base changes.
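The retrieve-then-generate pattern can be sketched end to end minus the LLM call. Real systems rank by embedding similarity; simple word overlap stands in here, and the knowledge-base articles are invented.

```python
import re

# Toy knowledge base standing in for a vector index of support articles.
KNOWLEDGE_BASE = [
    "To reset your password, open Settings and choose Reset Password.",
    "Refunds are processed within 5 business days of approval.",
    "The API rate limit is 100 requests per minute per key.",
]

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, docs, k=1):
    # Rank documents by word overlap with the query (a stand-in for
    # embedding similarity) and keep the top k.
    return sorted(docs, key=lambda d: len(words(query) & words(d)), reverse=True)[:k]

def build_prompt(query):
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    return ("Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

print(build_prompt("How do I reset my password?"))
```

The "using only the context below" instruction is what grounds the generation step: when the knowledge base changes, answers change too, with no retraining.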

RLHF

Reinforcement Learning from Human Feedback — a training technique used to align language models with human preferences. Human raters compare model outputs and indicate which is better; these preferences train a reward model, which then guides further model training via reinforcement learning. RLHF is how models like GPT-4 and Claude are made helpful, harmless, and honest rather than merely statistically likely.

Example

Without RLHF, an LLM asked to write a medical disclaimer might produce a technically accurate but poorly formatted response. With RLHF, human raters train the model to produce clear, appropriately formatted, and user-friendly outputs — not just statistically likely ones.

RAG

Retrieval-Augmented Generation — see Retrieval-Augmented Generation for the full definition. RAG is the dominant architecture for production LLM applications that need accurate, grounded, and auditable outputs over private or frequently updated knowledge.

Example

See the Retrieval-Augmented Generation entry for a detailed example.

S

SaMD

Software as a Medical Device — software intended to be used for one or more medical purposes without being part of a hardware medical device. The FDA, IMDRF, and EU MDR/IVDR all have specific regulatory frameworks for SaMD. AI/ML-based SaMD that continuously learns from post-market data requires a Predetermined Change Control Plan (PCCP) to receive FDA clearance.

Example

A mobile app that analyzes photos of skin lesions to detect melanoma is SaMD — it performs a diagnostic function (identify potential melanoma) using software that is not part of a physical medical device. It requires FDA clearance as a Class II medical device under the De Novo or 510(k) pathway before commercial distribution in the US.

T

Token

The unit of text that language models process. Tokens are subword units — roughly 3/4 of a word on average in English. LLM API pricing, context window limits, and generation speed are all measured in tokens. Understanding token economics is critical for designing cost-efficient LLM applications at scale.

Example

The sentence 'The quick brown fox jumps over the lazy dog' consists of approximately 10 tokens in GPT-4's tokenizer. At $0.01 per 1,000 input tokens, processing 1 million such sentences costs $100 in API fees — a calculation that shapes architecture decisions for high-volume applications.
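The cost arithmetic generalizes into a back-of-envelope estimator. The 4-characters-per-token heuristic and the price constant below are rough assumptions — always check your provider's tokenizer and current pricing before relying on the numbers.

```python
# Back-of-envelope token cost model. Both constants are assumptions:
# ~4 characters per token is a rough English average, and the price is
# an example input rate, not any provider's actual pricing.
CHARS_PER_TOKEN = 4
PRICE_PER_1K_TOKENS = 0.01   # USD per 1,000 input tokens

def estimate_cost(text, n_requests):
    tokens_per_request = len(text) / CHARS_PER_TOKEN
    return tokens_per_request * n_requests / 1000 * PRICE_PER_1K_TOKENS

sentence = "The quick brown fox jumps over the lazy dog"
print(f"${estimate_cost(sentence, 1_000_000):.2f}")
```

Estimates like this land within a few percent of the true figure, which is enough to decide between, say, per-request LLM calls and a cached or batched architecture.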

V

Vector Database

Vector DB

A database designed to store and query high-dimensional vector representations (embeddings). Unlike traditional databases that find exact matches, vector databases find semantically similar items using approximate nearest neighbor (ANN) algorithms. Used in RAG systems, semantic search, recommendation engines, and anomaly detection.

Example

A vector database stores embeddings of 500,000 product descriptions. When a user types a search query, the query is converted to an embedding and the vector database returns the 20 most semantically similar products — finding relevant results even when no keywords match.
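Conceptually, the query above is a k-nearest-neighbor search. Brute force is shown below with invented 3-d vectors and labels; real vector databases replace the exhaustive scan with approximate nearest neighbor (ANN) indexes to scale to millions of items.

```python
import math

# Toy "index": product label -> embedding. Vectors and labels invented.
CATALOG = {
    "running shoes":  [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker":   [0.0, 0.1, 0.9],
}

def top_k(query, k=2):
    # Brute-force nearest neighbors by Euclidean distance; ANN indexes
    # approximate this ranking without touching every stored vector.
    def distance(item):
        return math.dist(query, CATALOG[item])
    return sorted(CATALOG, key=distance)[:k]

# The user's search query, already converted to an embedding:
print(top_k([0.88, 0.12, 0.02]))
```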
