Glossary of AI Video Analytics and Physical Threat Detection

A

Access control

A system that determines who is allowed to enter a physical space and when. Modern access control increasingly combines credentials (cards, mobile keys, PINs) with biometric verification and AI-assisted analytics such as tailgating detection.

Aggressive behavior detection

AI video analytics that identifies physical altercations, raised arms in attack patterns, group escalation, and similar pre-violence indicators. Used in hospitals, schools, transit, and retail to alert security before an incident becomes critical.

AICPA SOC 2

An attestation report governed by the American Institute of Certified Public Accountants that evaluates a service organization's controls for security, availability, processing integrity, confidentiality, and privacy. A common procurement requirement for video analytics vendors.

Alarm fatigue

Operator desensitization caused by repeated false or low-value alerts. A leading cause of missed real incidents in security operations centers and a primary justification for AI-based false alarm filtering.

Alarm Hub

A Scylla module that aggregates, manages, and routes alarms from multiple sites and detection types into a single operational interface.

Alarm verification

The process — manual, AI-assisted, or both — of confirming an alarm is real before dispatching a response. Reducing time-to-verify is a primary KPI for monitoring centers.

Alyssa's Law

A category of U.S. state laws requiring public schools to install silent panic alarms that connect directly to law enforcement. First enacted in New Jersey (2019) and adopted by several states since. Often paired with AI weapon detection in school safety procurement.

Anomaly detection

Identification of behavior or events that deviate from a learned baseline rather than matching a fixed signature. Used when threats are too varied to enumerate (loitering near sensitive zones, unusual crowd movement, abnormal vehicle paths).
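As a minimal illustration of "deviating from a learned baseline", the sketch below flags an hourly person count for a zone that falls far outside the historical mean, using a simple z-score. This is only one of many possible techniques (production systems typically learn much richer baselines over trajectories and appearance), and the counts, the `z_limit` of 3.0, and the function names are hypothetical.

```python
from statistics import mean, stdev

def fit_baseline(history: list[float]) -> tuple[float, float]:
    """Learn a simple baseline: mean and sample standard deviation of past observations."""
    return mean(history), stdev(history)

def is_anomalous(value: float, baseline: tuple[float, float], z_limit: float = 3.0) -> bool:
    """Flag a value whose z-score against the baseline exceeds z_limit."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # perfectly flat history: any change is a deviation
    return abs(value - mu) / sigma > z_limit

# Hypothetical hourly person counts near a sensitive zone.
history = [4, 6, 5, 7, 5, 6, 4, 5]
baseline = fit_baseline(history)
print(is_anomalous(6, baseline))   # → False (within the normal range)
print(is_anomalous(25, baseline))  # → True (far above the learned baseline)
```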

Anti-spoofing

Techniques that detect attempts to fool biometric systems using photos, video replay, 3D masks, or deepfakes. Critical for any face-recognition or liveness-checked access flow.

API (Application Programming Interface)

A defined way for one software system to call another. Video analytics APIs typically expose detection events, alerts, configuration, and search.

Asteria

Scylla's smart edge monitoring appliance, designed to run detection workloads locally on a customer's premises rather than in the cloud.

Attribute-based search

A method of searching video surveillance footage by describing the visual characteristics of a person or vehicle of interest — such as clothing color, garment type, carried objects, hair color, vehicle make, or body type — rather than relying on facial recognition or precise timestamps. Attribute-based search enables investigators to locate a subject across multi-camera networks without a face match, making it effective in environments with partial camera coverage, poor facial visibility, or privacy compliance constraints that restrict biometric data use. In Scylla Forensics Pro, attribute-based search is powered by the Event Raptor Vision-Language Model, which interprets plain-language descriptive queries and matches them against a continuously updated semantic index of archived and live camera footage.

B

Behavior recognition

AI analysis of human or vehicle behavior over time (as opposed to single-frame object detection). Includes activities such as fighting, falling, loitering, and crowd formation.

BIPA (Biometric Information Privacy Act)

An Illinois state law (2008) that imposes strict consent, notice, and retention requirements on the collection of biometric identifiers, including facial templates. Carries a private right of action; the basis for many high-value class actions against face-recognition deployments.

Bounding box

The rectangular coordinates a detection model outputs around a detected object. The basic geometric unit of object detection.
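The same rectangle can be encoded in more than one way; two common conventions are corner form (x1, y1, x2, y2) and top-left-plus-size form (x, y, width, height), the latter being the form COCO annotations use. A minimal sketch of converting between them, in pixel coordinates with the origin at the top-left of the frame (the type and function names are illustrative):

```python
from typing import NamedTuple

class BoxXYXY(NamedTuple):
    """Corner form: top-left (x1, y1) and bottom-right (x2, y2), in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

class BoxXYWH(NamedTuple):
    """Top-left corner (x, y) plus width and height, in pixels."""
    x: float
    y: float
    w: float
    h: float

def xywh_to_xyxy(b: BoxXYWH) -> BoxXYXY:
    return BoxXYXY(b.x, b.y, b.x + b.w, b.y + b.h)

def xyxy_to_xywh(b: BoxXYXY) -> BoxXYWH:
    return BoxXYWH(b.x1, b.y1, b.x2 - b.x1, b.y2 - b.y1)

box = BoxXYWH(100, 50, 40, 80)
print(xywh_to_xyxy(box))  # → BoxXYXY(x1=100, y1=50, x2=140, y2=130)
```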

C

CCPA / CPRA (California Consumer Privacy Act / California Privacy Rights Act)

California's consumer privacy framework. Imposes disclosure and opt-out obligations on the collection of personal information, including biometric data and video.

Central monitoring station (CMS)

A 24/7 facility that receives and verifies alarms from many customer sites and dispatches a response. Also called a monitoring center, or in Europe an alarm receiving center (ARC).

Chain of custody

The documented, unbroken record of who collected, handled, transferred, and accessed a piece of evidence — in this context, security camera footage — from the moment of capture through to its presentation in legal proceedings or law enforcement handoff. Maintaining a defensible chain of custody for video evidence requires precise timestamping, location metadata, access logging, and tamper-evident export formats. AI-powered investigation platforms such as Scylla Forensics Pro support chain of custody requirements by generating structured PDF evidence reports containing AI-generated narrative summaries, captured video frames, and immutable location and timestamp metadata — formatted for immediate legal or law enforcement use.

Charon

Scylla's smart decision-making algorithm that verifies detections in real time and reduces false alerts before they reach the operator. Named after the ferryman of Greek myth.

CJIS Security Policy

The U.S. FBI's Criminal Justice Information Services security framework. Required for any system that processes, transmits, or stores criminal justice information; relevant for law enforcement integrations.

COCO

A large-scale benchmark dataset widely used to train and evaluate computer vision and object detection models. COCO contains over 330,000 images annotated across 80 object categories, and performance on the COCO benchmark — measured by mean Average Precision (mAP) — is the industry-standard metric for comparing the accuracy of AI detection models. Security AI vendors frequently reference COCO benchmark scores when validating the accuracy of their detection capabilities. Scylla AI's detection models are evaluated against COCO benchmarks as part of its certification and accuracy validation process.

Command Center

Scylla's video management and operations interface that consolidates live camera feeds, alarms, and detection events for security operators.

Computer vision (CV)

The branch of AI concerned with interpreting visual information from images and video. Includes object detection, segmentation, tracking, pose estimation, and recognition.

Confidence threshold

The minimum score a detection model must produce before an event is treated as a positive. Lower thresholds increase recall and false positives; higher thresholds increase precision and missed detections.
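The trade-off can be seen by sweeping the threshold over a small set of scored detections. The sketch below assumes each detection carries a model score and a ground-truth flag saying whether it was a real event, and that `total_real` counts every real event, including any the model never scored; all values are made up for illustration.

```python
def precision_recall_at(threshold, detections, total_real):
    """Return (precision, recall) when only detections scoring >= threshold become alerts."""
    alerts = [is_real for score, is_real in detections if score >= threshold]
    tp = sum(alerts)  # alerts backed by a real event
    # With no alerts there are no false alerts; treat precision as 1.0 by convention here.
    precision = tp / len(alerts) if alerts else 1.0
    recall = tp / total_real
    return precision, recall

# (score, is_real) pairs from a hypothetical detector; 4 real events in total.
detections = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
              (0.55, False), (0.40, False), (0.35, True)]

for t in (0.3, 0.6, 0.9):
    p, r = precision_recall_at(t, detections, total_real=4)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold from 0.9 to 0.3 raises recall from 0.50 to 1.00 while precision falls from 1.00 to 0.57, which is the pattern the definition describes.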

Cooperative vs. non-cooperative recognition

In face recognition, "cooperative" describes subjects who deliberately face the camera at a controlled distance (e.g., kiosk enrollment). "Non-cooperative" or "in the wild" describes recognition of subjects who are unaware of the camera, in motion, or viewed at an angle or from a distance. The latter is significantly harder.

Cyber-physical convergence

The progressive integration of digital cybersecurity systems and physical security infrastructure into a single, unified operational environment. As physical security devices — including IP cameras, access control systems, building automation controllers, and visitor management platforms — become network-connected, the boundary between IT security and physical security dissolves. A breach in one domain increasingly enables lateral movement into the other: a compromised email account can lead to unlocked doors; a vulnerable camera network can provide an entry point into corporate IT systems. Cyber-physical convergence is identified by the Security Industry Association (SIA) as one of the defining forces reshaping enterprise security architecture in 2025–2026.

D

Deep learning

A subset of machine learning that uses multi-layered neural networks to learn hierarchical representations from data. The foundation of modern computer vision.

Detection latency

The elapsed time between a threat appearing in the video frame and an alert being issued. Made up of camera latency, network transit, inference time, and decision-logic time.
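Because these stages run in sequence for a given frame, end-to-end latency is their sum, which makes it easy to see which stage dominates. A short sketch; the per-stage figures are placeholders, not measurements of any product.

```python
# Hypothetical per-stage latencies in milliseconds for one detection.
stages_ms = {
    "camera (capture + encode)": 120,
    "network transit": 40,
    "inference": 35,
    "decision logic": 15,
}

total_ms = sum(stages_ms.values())
print(f"end-to-end detection latency: {total_ms} ms")
for name, ms in stages_ms.items():
    print(f"  {name}: {ms} ms ({ms / total_ms:.0%} of total)")
```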

Dwell time

How long a person or vehicle remains in a defined area. Used as an input to loitering detection and retail analytics.

E

Edge inference (edge analytics)

Running AI detection on a device near the camera (an appliance, NVR, or the camera itself) rather than streaming video to the cloud. Reduces bandwidth, lowers latency, and supports privacy and compliance requirements.

Edge-cloud hybrid architecture

A deployment model for AI video analytics in which processing workloads are distributed between on-premise edge hardware, located physically at or near the camera network, and cloud-based infrastructure, according to the latency, privacy, and computational requirements of each task. In physical security applications, latency-sensitive real-time detection tasks such as weapon detection or perimeter breach alerts are processed at the edge to minimize response time and keep sensitive video data local. Storage-intensive or computationally demanding tasks such as cross-site forensic search or analytics aggregation are handled in the cloud. Edge-cloud hybrid architecture has become a widely adopted deployment pattern for enterprise AI video analytics, combining the low latency and data sovereignty advantages of edge processing with the scalability of cloud infrastructure. Scylla AI supports edge-cloud hybrid deployment via its Asteria edge appliance alongside cloud and on-premise options.

Embedding (face / object)

A numeric vector that represents the identity-relevant features of a face or object. Recognition is performed by comparing embeddings, not raw images.
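Comparing embeddings usually means computing a similarity or distance between vectors; cosine similarity is a common choice. A minimal sketch with toy four-dimensional vectors (real face embeddings have far more dimensions and come from a trained model, which is not shown; the values here are hypothetical):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embeddings: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (hypothetical values).
probe = [0.9, 0.1, 0.3, 0.2]
same_person = [0.85, 0.15, 0.35, 0.25]
other_person = [0.1, 0.9, 0.2, 0.7]

print(round(cosine_similarity(probe, same_person), 3))   # high: likely the same identity
print(round(cosine_similarity(probe, other_person), 3))  # much lower: likely different
```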

Event Raptor

A Scylla module that captures, classifies, and surfaces specific event types from live or recorded video for review and response.

F

F1 score

The harmonic mean of precision and recall; a single number that balances false positives and false negatives.
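With precision P and recall R, F1 = 2PR / (P + R). A small sketch computing all three from true-positive, false-positive, and false-negative counts (the counts are hypothetical):

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts, guarding against zero division."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# e.g. 80 correct alerts, 20 false alerts, 40 missed events
p, r, f1 = f1_from_counts(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")  # → precision=0.800 recall=0.667 F1=0.727
```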

Face recognition (1:1 vs. 1:N)

1:1 verification confirms whether a face matches a single claimed identity (e.g., unlocking a phone). 1:N identification searches a face against a gallery of many enrolled identities (e.g., watchlist matching).
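The two modes differ in what the probe is compared against. A minimal sketch, assuming embeddings have already been extracted and are compared with cosine similarity; the gallery contents, identity labels, and the 0.8 threshold are hypothetical, and real systems calibrate thresholds against measured FAR and FRR.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def verify(probe, claimed_template, threshold=0.8):
    """1:1 verification: does the probe match the single claimed identity?"""
    return cosine(probe, claimed_template) >= threshold

def identify(probe, gallery, threshold=0.8):
    """1:N identification: best-matching enrolled identity, or None if nothing clears the threshold."""
    best_id, best_score = None, threshold
    for identity, template in gallery.items():
        score = cosine(probe, template)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id

gallery = {
    "id_001": [0.9, 0.1, 0.3],
    "id_002": [0.1, 0.9, 0.2],
    "id_003": [0.2, 0.3, 0.9],
}
probe = [0.88, 0.12, 0.28]
unknown = [0.5, 0.5, -0.5]
print(verify(probe, gallery["id_001"]))  # → True (1:1 against one claimed identity)
print(identify(probe, gallery))          # → id_001 (1:N search of the whole gallery)
print(identify(unknown, gallery))        # → None (not enrolled)
```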

Face recognition in the wild

Identification of subjects under uncontrolled conditions — variable lighting, motion, angle, partial occlusion, distance — as opposed to controlled enrollment-style capture. The operating regime for most security deployments.

False acceptance rate (FAR)

The rate at which a biometric system incorrectly matches an unknown subject to an enrolled identity. Reported alongside FRR.

False alarm filtering

AI-based suppression of nuisance alerts (animals, weather, shadows, reflections, sensor noise) before they reach operators. Often quoted as a percentage reduction against a baseline.
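The reduction figure is typically (baseline alerts - alerts after filtering) / baseline alerts, measured over the same cameras and period. A short sketch with hypothetical counts; a percentage reduction alone does not show whether any real events were filtered out, so it is best read alongside the false negative rate.

```python
def percent_reduction(baseline_alerts: int, filtered_alerts: int) -> float:
    """Percentage drop in alert volume relative to a baseline period."""
    return 100.0 * (baseline_alerts - filtered_alerts) / baseline_alerts

# Hypothetical week of alerts from the same cameras, before and after filtering.
print(f"{percent_reduction(1200, 84):.1f}% fewer alerts")  # → 93.0% fewer alerts
```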

False negative

A real event the system missed. The most operationally dangerous error class in threat detection.

False positive

An alert raised for an event that did not actually occur, or for the wrong class. Drives alarm fatigue.

False rejection rate (FRR)

The rate at which a biometric system fails to match a subject who is in fact enrolled.
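FAR and FRR both depend on the match threshold, so they are usually measured together from two sets of comparison scores: genuine pairs (same person) and impostor pairs (different people). A minimal sketch with made-up scores; raising the threshold lowers FAR and raises FRR.

```python
def far_frr(genuine: list[float], impostor: list[float], threshold: float) -> tuple[float, float]:
    """Return (FAR, FRR) for a match threshold; a score >= threshold counts as a match."""
    false_accepts = sum(s >= threshold for s in impostor)  # impostors wrongly matched
    false_rejects = sum(s < threshold for s in genuine)    # enrolled subjects wrongly rejected
    return false_accepts / len(impostor), false_rejects / len(genuine)

genuine = [0.91, 0.88, 0.95, 0.72, 0.84]   # hypothetical same-person scores
impostor = [0.30, 0.55, 0.81, 0.42, 0.25]  # hypothetical different-person scores

for t in (0.7, 0.8, 0.9):
    far, frr = far_frr(genuine, impostor, t)
    print(f"threshold={t}: FAR={far:.0%} FRR={frr:.0%}")
```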

FedRAMP

The U.S. Federal Risk and Authorization Management Program. A standardized authorization process for cloud services used by federal agencies.

FIPS 140-2 / FIPS 140-3

U.S. federal standards governing cryptographic modules. Often required for sales into government and regulated infrastructure.

Forensics Pro

Scylla's post-incident video search module that uses AI to find people, objects, and events across recorded footage.

FPS (frames per second)

The rate at which a camera captures or transmits video. Higher FPS provides better motion fidelity but increases bandwidth and processing load.

G

Gallery

In face recognition, the set of enrolled identities against which probe faces are searched.

Geofencing

Definition of a virtual boundary in physical space such that crossings, entries, or exits trigger detection logic.
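One common way to test whether a tracked object is inside a polygonal zone is the ray-casting (even-odd) rule. A minimal sketch in image-plane pixel coordinates, assuming the zone is a simple polygon and the object's position is reduced to a single point, such as the bottom-center of its bounding box (a common but not universal choice); the zone coordinates are hypothetical.

```python
def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge straddles the ray's y value, and the crossing point lies to the right of x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

# Hypothetical restricted zone drawn on a camera view (pixels).
zone = [(100, 100), (400, 120), (380, 300), (120, 280)]
print(point_in_polygon(250, 200, zone))  # → True (inside)
print(point_in_polygon(50, 50, zone))    # → False (outside)
```

Entry and exit events then fall out of comparing the result for the same track in consecutive frames.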

Gun detection

AI video analytics that identifies visible firearms in camera streams in real time, enabling faster lockdown and law-enforcement notification than human observation alone. Operates on visible weapons only; does not see concealed firearms.

H

H.264 / H.265 (HEVC)

Widely used video compression codecs. H.265 can cut bitrate by up to roughly half relative to H.264 at similar quality, but it is more compute-intensive to encode and decode.

HDR / WDR

High dynamic range / wide dynamic range. Imaging techniques that preserve detail in scenes with both bright and dark areas — important for entrance cameras facing daylight.

HIPAA

The U.S. Health Insurance Portability and Accountability Act. Governs protected health information; relevant when video analytics in healthcare settings could incidentally capture patient data.

I

IAHSS (International Association for Healthcare Security and Safety)

The professional body for healthcare security. Publishes industry guidelines on workplace violence prevention, infant protection, and patient elopement.

Inference

Running a trained model on new data to produce predictions. Distinct from training, which is the process of creating the model.

Intersection over Union (IoU)

A geometric measure of overlap between a predicted bounding box and a ground-truth box. Used to score detection accuracy.
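IoU is the area of the intersection of the two boxes divided by the area of their union, giving 1.0 for a perfect match and 0.0 for no overlap. A detection is often counted as correct when its IoU with a ground-truth box reaches a chosen threshold (0.5 is a common one; COCO-style mAP averages over several thresholds). A minimal sketch for axis-aligned boxes in corner form (x1, y1, x2, y2):

```python
def iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)
print(round(iou(predicted, ground_truth), 4))  # → 0.1429
```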

IP camera

A network-connected camera that streams video over standard IP protocols rather than analog cabling. The default modern surveillance endpoint.

ISO 27001

An international standard for information security management systems. A common enterprise procurement requirement.

K

Knife detection

AI video analytics specialized for edged-weapon recognition. Generally more challenging than firearm detection due to size, reflectivity, and frequent partial visibility.

L

Latency

The time delay introduced by a stage of processing. End-to-end detection latency is the sum of camera, network, inference, and alerting latencies.

Line crossing

A detection rule that fires when a tracked object crosses a defined virtual line. Useful for entrance counting and perimeter triggers.
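A simple way to implement the rule is to check on which side of the virtual line a track's reference point falls in consecutive frames, using the sign of a 2-D cross product. A sign change counts as a crossing only if the movement also intersects the drawn segment (not just its extension), and the direction of the sign change gives the direction of travel. A minimal sketch under those assumptions; coordinates and names are hypothetical, and degenerate cases such as a point landing exactly on the line are ignored.

```python
def side(p, a, b):
    """Cross product (b - a) x (p - a): its sign says which side of line a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crossed(prev, curr, a, b):
    """Return 'pos_to_neg', 'neg_to_pos', or None for a track moving from prev to curr.

    Which physical direction each label means depends on how the line a->b is drawn.
    """
    s_prev, s_curr = side(prev, a, b), side(curr, a, b)
    if s_prev * s_curr >= 0:
        return None  # same side (or touching the line): no crossing
    # The line's endpoints must lie on opposite sides of the movement segment,
    # so the crossing happens within the drawn line rather than beyond its ends.
    if side(a, prev, curr) * side(b, prev, curr) > 0:
        return None
    return "pos_to_neg" if s_prev > 0 else "neg_to_pos"

line_a, line_b = (0, 0), (0, 10)  # hypothetical vertical tripwire at x = 0
print(crossed((-2, 5), (3, 5), line_a, line_b))    # → pos_to_neg
print(crossed((-2, 15), (3, 15), line_a, line_b))  # → None (passes beyond the end of the line)
```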

Liveness detection

A technique that confirms a biometric sample is from a real, present person rather than a photo, video, or mask.

Loitering detection

Detection of a person or vehicle dwelling in a defined area beyond a configured threshold.
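Dwell time can be accumulated from a tracker's timestamped observations, and loitering is then a threshold on it. A minimal sketch, assuming each track yields time-ordered (timestamp_seconds, inside_zone) samples, that the zone-membership test (for example a point-in-polygon check) was done upstream, and that dwell means the longest continuous stay, measured until the first sample outside the zone; the 60-second threshold and the track are hypothetical.

```python
def longest_dwell(samples: list[tuple[float, bool]]) -> float:
    """Longest continuous time (seconds) a track spends inside the zone."""
    longest = 0.0
    entered_at = None
    for t, inside in samples:
        if inside and entered_at is None:
            entered_at = t                          # track enters the zone
        elif not inside and entered_at is not None:
            longest = max(longest, t - entered_at)  # first sample after leaving
            entered_at = None
    if entered_at is not None:                      # still inside at the last sample
        longest = max(longest, samples[-1][0] - entered_at)
    return longest

def is_loitering(samples: list[tuple[float, bool]], threshold_s: float = 60.0) -> bool:
    return longest_dwell(samples) > threshold_s

# Hypothetical track sampled every 10 s: inside the zone from t=10 until t=90.
track = [(t, 10 <= t < 90) for t in range(0, 120, 10)]
print(longest_dwell(track), is_loitering(track))  # → 80 True
```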

LPR / ANPR (License Plate Recognition / Automatic Number Plate Recognition)

AI-based reading of vehicle license plates from camera feeds.

M

Machine learning (ML)

A class of methods in which systems learn patterns from data rather than being explicitly programmed. The parent discipline of deep learning.

MTTD (mean time to detect)

Average elapsed time from incident onset to its detection by the security system. A core operational KPI.

MTTR (mean time to respond)

Average elapsed time from detection to response action.
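Both MTTD and MTTR are plain averages over a set of incidents. A sketch assuming each incident record holds onset, detection, and response timestamps in seconds (the record layout and values are hypothetical):

```python
from statistics import mean

# Hypothetical incident log: (onset, detected, responded), seconds since midnight.
incidents = [
    (3600, 3630, 3900),
    (7200, 7215, 7380),
    (9000, 9090, 9450),
]

mttd = mean(detected - onset for onset, detected, _ in incidents)
mttr = mean(responded - detected for _, detected, responded in incidents)
print(f"MTTD = {mttd:.0f} s, MTTR = {mttr:.0f} s")  # → MTTD = 45 s, MTTR = 265 s
```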

N

Natural language video search

A capability that enables security investigators to search surveillance footage by typing plain-language descriptions of what they are looking for — such as "person in a red jacket near the loading dock after 10 PM" — rather than manually scrubbing through footage camera by camera. Powered by Vision-Language Models (VLMs), natural language video search works by continuously indexing video content into a searchable semantic representation, then matching investigator queries against that index to return ranked, timestamped results in real time. The technology supports both text-based descriptive queries and image recognition search — where a reference photo is uploaded to find matching individuals or vehicles across the camera network. In Scylla Forensics Pro, natural language video search is delivered through the Event Raptor VLM, enabling post-incident investigations to be completed in minutes rather than hours, with results exportable as structured PDF evidence reports.

See also: VLM, Event Raptor, Forensics Pro

NDAA Section 889

A provision of the U.S. National Defense Authorization Act for Fiscal Year 2019 that prohibits federal agencies from procuring certain Chinese-manufactured telecommunications and video surveillance equipment, and from contracting with entities that use it. A baseline procurement filter for federal, defense, and critical-infrastructure work.

NERC CIP

North American Electric Reliability Corporation's Critical Infrastructure Protection standards. Mandatory for bulk-power-system operators; relevant for video and access-control systems at electric utilities.

Neural network

A model architecture composed of interconnected computational units arranged in layers. The basis of deep-learning approaches in computer vision.

NIST FRVT (Face Recognition Vendor Test)

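The U.S. National Institute of Standards and Technology's ongoing benchmark of commercial face-recognition algorithms across multiple use cases (1:1 verification, 1:N identification, demographic effects). The most widely cited independent face-recognition benchmark. In 2023 NIST reorganized FRVT into the Face Recognition Technology Evaluation (FRTE) and the Face Analysis Technology Evaluation (FATE).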

Non-maximum suppression (NMS)

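A post-processing step that removes redundant, overlapping bounding boxes from a detector's output: it keeps the highest-confidence box, suppresses other boxes whose IoU with it exceeds a set threshold, and repeats, so that each object ideally ends up with a single detection.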
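The classic greedy form sorts detections by confidence, keeps the top one, discards every remaining box whose IoU with it exceeds a threshold, and repeats until no boxes remain. A minimal single-class sketch; the 0.5 IoU threshold and the boxes are illustrative, and production detectors often run an optimized, per-class variant.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes that overlap the kept box more than the threshold allows.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 110, 110), (15, 12, 112, 115), (300, 300, 380, 380)]
scores = [0.92, 0.85, 0.60]
print(nms(boxes, scores))  # → [0, 2]: box 1 duplicates box 0 and is suppressed
```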

NVR (Network Video Recorder)

A dedicated appliance that records video from IP cameras. Increasingly hosts AI analytics at the edge alongside recording.

O

Object detection

A computer-vision task that locates and classifies objects in an image or video frame, returning bounding boxes and labels. The foundational task underneath gun, knife, vehicle, and behavior analytics.

ONVIF

An industry standard for the interoperability of IP cameras, recorders, and video software. Profiles include S (basic streaming), G (recording), T (advanced video), and M (metadata and analytics).

Organized retail crime (ORC)

Coordinated, profit-motivated theft from retailers, typically across multiple stores. Distinct from opportunistic shoplifting; drives loss-prevention budgets at major chains.

P

Perimeter intrusion detection system (PIDS)

A system that identifies unauthorized crossing of a site boundary. AI-based PIDS uses video analytics — often paired with fence sensors and radar — to detect and classify intruders while filtering false alarms.

Piggybacking

An unauthorized person passing through an access-controlled door with the knowledge and cooperation of an authorized user. Distinct from tailgating, which is non-cooperative.

PoE (Power over Ethernet)

Power delivered over the same network cable used for data. Standard for IP camera installations.

Post-incident investigation

The process of reviewing, analyzing, and documenting security camera footage and related data after a security incident has occurred — such as theft, unauthorized access, assault, or policy violation — in order to establish what happened, identify those responsible, and build an evidence record for law enforcement, legal proceedings, or insurance claims. Traditional post-incident investigation relies on manual footage review, which is slow, cognitively demanding, and structurally prone to missing critical evidence. AI-powered post-incident investigation platforms such as Scylla Forensics Pro replace manual scrubbing with natural language search, automated timeline building, and one-click evidence report export — compressing investigation timelines from hours to minutes.

PPM / PPF (pixels per meter / pixels per foot)

A measure of how many pixels a camera resolves per unit of real-world distance at a given range. One of the strongest predictors of analytic accuracy for identification and detection tasks; a minimum of about 100 PPM is commonly cited for reliable firearm detection.
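For a camera aimed roughly perpendicular to the scene, horizontal PPM at distance d is approximately the horizontal pixel count divided by the scene width, where scene width = 2 · d · tan(HFOV / 2) for a horizontal field of view HFOV. A sketch under that flat-scene, no-distortion (pinhole) assumption; the resolution, field of view, and distances are hypothetical.

```python
import math

def pixels_per_meter(h_pixels: int, hfov_deg: float, distance_m: float) -> float:
    """Approximate horizontal PPM at a given distance (pinhole model, no lens distortion)."""
    scene_width_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return h_pixels / scene_width_m

# Hypothetical 1080p camera (1920 px wide) with a 90-degree horizontal field of view.
for d in (5, 10, 20):
    print(f"{d:>2} m: {pixels_per_meter(1920, 90, d):.0f} PPM")
```

Under this model, doubling the distance halves the PPM, so a camera that clears a PPM target near an entrance may fall well short of it at the far end of a corridor.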

Precision

Of the items the model flagged as positive, the fraction that are truly positive. Tells you how trustworthy an alert is.

Probe

In face recognition, the face being searched against the gallery.

PSIM (Physical Security Information Management)

Software that aggregates and correlates events from many physical-security systems — video, access control, intrusion, fire — into a single operational picture.

PTZ camera

A pan-tilt-zoom camera that can be moved and zoomed remotely or autonomously to follow detected events.

R

Recall

Of the items that are truly positive, the fraction the model flagged. Tells you how complete the detection coverage is.

Resolution

Pixel dimensions of a video frame (e.g., 1080p, 4K, 8K). Combined with focal length, determines PPM at a given range.

RTSP (Real-Time Streaming Protocol)

A standard network protocol for streaming video from IP cameras to recorders and analytics platforms.

S

ScyllaNet

The proprietary deep learning framework developed by Scylla AI that powers its computer vision and AI video analytics capabilities. ScyllaNet serves as the neural network backbone underlying Scylla's detection models — including weapon detection, behavior analytics, and thermal analysis — enabling high-accuracy, real-time inference on edge hardware and cloud infrastructure. All Scylla AI products, including Forensics Pro and Event Raptor, are built on ScyllaNet.

SIoU (SCYLLA-IoU)

A loss function and bounding box regression metric developed by Scylla AI researchers to improve the training efficiency and detection accuracy of object detection models. SIoU extends the standard Intersection over Union (IoU) metric, which measures the overlap between a predicted bounding box and the ground truth, by adding angle, distance, and shape cost terms that account for the direction of misalignment between the boxes; these terms accelerate model convergence and improve localization precision. SIoU has since been adopted and referenced in the broader computer vision research community.
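The sketch below shows how the cost terms fit together for a single pair of boxes, following the published SIoU formulation as we understand it: loss = 1 - IoU + (Δ + Ω) / 2, where an angle cost Λ (0 when the center offset is aligned with an axis, 1 at 45 degrees) modulates a distance cost Δ between box centers normalized by the smallest enclosing box, and a shape cost Ω penalizes width and height mismatch. It is an illustrative re-derivation, not Scylla's reference implementation; the shape exponent `theta=4` and the `eps` guard are assumptions worth checking against the original paper.

```python
import math

def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """SIoU-style loss for two (x1, y1, x2, y2) boxes: 1 - IoU + (distance + shape) / 2."""
    # Plain IoU term.
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    w1, h1 = pred[2] - pred[0], pred[3] - pred[1]
    w2, h2 = gt[2] - gt[0], gt[3] - gt[1]
    inter = iw * ih
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # Center offset and the sine of its smaller angle to an axis (at most 45 degrees).
    dx = (gt[0] + gt[2] - pred[0] - pred[2]) / 2
    dy = (gt[1] + gt[3] - pred[1] - pred[3]) / 2
    sigma = math.hypot(dx, dy) + eps
    sin_alpha = abs(dy) / sigma
    if sin_alpha > math.sqrt(2) / 2:
        sin_alpha = abs(dx) / sigma
    # Angle cost: 0 when the offset lies along an axis, 1 at 45 degrees.
    angle_cost = math.cos(2 * math.asin(sin_alpha) - math.pi / 2)

    # Distance cost, modulated by the angle cost through gamma = 2 - angle_cost.
    gamma = 2 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: relative width and height mismatch.
    omega_w = abs(w1 - w2) / (max(w1, w2) + eps)
    omega_h = abs(h1 - h2) / (max(h1, h2) + eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2

a = (0, 0, 10, 10)
print(siou_loss(a, a))               # close to 0 for a perfect match
print(siou_loss(a, (5, 5, 15, 15)))  # above 1 - IoU because the centers are offset
```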

SOC (Security Operations Center)

A facility where operators monitor video, alarms, and incidents in real time. Includes both in-house corporate SOCs and third-party central monitoring stations.

STOP School Violence Act

A U.S. federal grant program administered by the Department of Justice that funds school safety technology and training, including weapon detection. Often paired with state-level safety grants.

Suspicious shopping behavior detection

AI video analytics that flags actions associated with shoplifting and organized retail crime — concealment motions, booster-bag patterns, unusual aisle dwell — for loss-prevention review.

T

Tailgating

An unauthorized person passing through an access-controlled door immediately behind an authorized user, without that user's cooperation.

Thermal imaging

Sensing via infrared radiation (typically long-wave) rather than visible light. Used for perimeter detection in darkness, smoke, and adverse weather; generally unsuitable for identifying individuals.

Training data

The labeled images or video used to teach a detection model. The quality, diversity, and representativeness of training data are among the largest determinants of real-world model performance.

True positive / true negative

Correctly detected real events / correctly ignored non-events. Paired with false positives and false negatives to form the confusion matrix.
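The four counts can be tallied directly from per-event ground truth and system output, and precision and recall follow from them. A minimal sketch with hypothetical labels, where True means "threat present" in the ground truth or "alert raised" in the predictions:

```python
from collections import Counter

def confusion_matrix(actual: list[bool], predicted: list[bool]) -> Counter:
    """Count TP, FP, FN, TN from paired ground-truth and predicted labels."""
    counts = Counter()
    for truth, alert in zip(actual, predicted):
        if alert and truth:
            counts["TP"] += 1  # real event, alerted
        elif alert:
            counts["FP"] += 1  # alert, but nothing happened
        elif truth:
            counts["FN"] += 1  # real event, missed
        else:
            counts["TN"] += 1  # non-event, correctly ignored
    return counts

actual = [True, True, False, False, True, False, False, True]
predicted = [True, False, True, False, True, False, False, True]
cm = confusion_matrix(actual, predicted)
precision = cm["TP"] / (cm["TP"] + cm["FP"])
recall = cm["TP"] / (cm["TP"] + cm["FN"])
print(dict(cm), f"precision={precision:.2f}", f"recall={recall:.2f}")
```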

U

Uptime Institute Tier classification

A four-tier rating system (I through IV) for data center reliability. Tier III and IV facilities have stringent physical-security requirements that influence video, access-control, and analytic spend.

V

Vehicle identification and tracking

AI analytics that detect, classify (car, truck, van), read plates, and follow vehicles across multiple cameras.

Video analytics

Any automated extraction of meaningful information from video — object detection, behavior recognition, counting, anomaly detection. Modern video analytics is predominantly AI-based.

Vigilance decrement

The well-documented decline in an operator's ability to detect and respond to target events during sustained, monotonous monitoring. In security operations, vigilance decrement is a critical performance limitation: a widely cited 2002 industry study reported that CCTV operators miss approximately 45% of on-screen activity after 12 minutes of continuous viewing, rising to approximately 95% after 22 minutes, even when monitoring only a small number of cameras. Vigilance decrement differs from inattentional blindness: it stems from sustained cognitive fatigue, whereas inattentional blindness arises when attention is focused elsewhere. In real-world security monitoring the two compound each other. AI video analytics counteracts vigilance decrement by monitoring all camera feeds continuously, without the fatigue-driven decline in detection performance that affects human operators.

VLM (Vision Language Model)

A class of artificial intelligence model trained simultaneously on visual data and natural language, enabling it to understand and connect what it sees in images or video with how humans describe it in words. Unlike earlier computer vision systems that could only detect predefined object categories, a VLM interprets context, spatial relationships, attributes, and actions — and can match those interpretations against open-ended natural language descriptions. In physical security and video surveillance applications, VLMs power natural language video search, attribute-based forensic investigation, and AI-generated evidence summaries. Scylla AI's proprietary VLM — Event Raptor — is the engine behind Forensics Pro's natural language search, image recognition search, and contextual analysis capabilities, built on the ScyllaNet framework.

VMS (Video Management System)

Software that aggregates, records, displays, and manages video from IP cameras. AI analytics typically integrate with — or are embedded into — a VMS. Common platforms include Genetec, Milestone, Avigilon, and Hanwha.

VSaaS (Video Surveillance as a Service)

Cloud-hosted video surveillance and recording, generally subscription-priced.

W

Watchlist

A curated list of identities or vehicles of interest. A face- or plate-recognition system raises alerts when matches occur. Watchlist design, governance, and legal basis are first-order compliance considerations.

Weapon detection

The broader category covering AI detection of firearms, knives, and other identifiable weapons in video streams.

Workplace violence prevention (WVP)

A program area combining policy, training, environmental design, and technology to reduce violence against employees. Healthcare and retail are the highest-risk verticals; in the U.S., OSHA, the Joint Commission, and IAHSS publish the relevant standards and guidance.

X

XactID

Scylla's face-recognition module designed for non-cooperative ("in-the-wild") identification across CCTV, body cameras, and drones in operational security environments.

Y

YOLO (You Only Look Once)

A family of real-time object detection algorithms widely used in AI video analytics and computer vision. YOLO processes entire image frames in a single neural network pass — hence "you only look once" — enabling fast, accurate detection of objects, people, and vehicles in live video streams. YOLO-based architectures are a foundational component of modern physical security AI systems, including those used for weapon detection, perimeter intrusion, and crowd analysis. Scylla AI leverages YOLO-based detection models within its ScyllaNet framework to achieve real-time threat identification across camera networks.