swarmsort package

Submodules

Module contents

SwarmSort Standalone - Multi-Object Tracker with GPU-Accelerated Embeddings

SwarmSort is a high-performance, standalone multi-object tracking library that combines advanced computer vision techniques with deep learning embeddings for robust real-time object tracking applications.

Key Features:

Real-time multi-object tracking with motion prediction via Kalman filtering
GPU-accelerated embedding extraction using CuPy (optional)
Advanced distance scaling with 11 different normalization methods
Hungarian algorithm for optimal detection-to-track assignment
Re-identification (ReID) capabilities for recovering lost tracks
Probabilistic and non-probabilistic cost computation methods
Comprehensive configuration system with sensible defaults
Extensive test suite with 200+ tests for reliability

Class Aliases:

SwarmSort: Convenience alias for AdaptiveSwarmSortTracker. Use this when you want automatic input/output format detection. Recommended for integration with other packages. Auto-detects: swarmtracker pipeline, input format, output format.
SwarmSortTracker: Core tracker class for standalone use. Use this when you control the input format directly. Always uses Detection objects and TrackedObject outputs.

Usage Guide:

    # For most users - use SwarmSortTracker directly:
    from swarmsort import SwarmSortTracker, Detection
    tracker = SwarmSortTracker(config)
    detections = [Detection(position=…, confidence=…)]
    tracked_objects = tracker.update(detections)

    # For integration with other packages (auto-adapts formats):
    from swarmsort import SwarmSort
    tracker = SwarmSort(config)
    # Accepts various input formats, adapts output to caller's expectations

Installation:

    # Clone from repository
    git clone https://github.com/cfosseprez/swarmsort.git
    cd swarmsort
    poetry install

    # Development installation with testing tools
    poetry install --with dev

Basic Usage:

    import numpy as np
    from swarmsort import SwarmSortTracker, Detection

    # Initialize tracker with default settings
    tracker = SwarmSortTracker()

    # Create detections for current frame
    detections = [
        Detection(
            position=np.array([100.0, 150.0], dtype=np.float32),
            confidence=0.9,
        ),
        Detection(
            position=np.array([300.0, 200.0], dtype=np.float32),
            confidence=0.8,
        ),
    ]

    # Update tracker and get current tracked objects
    tracked_objects = tracker.update(detections)

    # Process results
    for obj in tracked_objects:
        print(f"Track ID: {obj.id}, Position: {obj.position}")
        print(f" Velocity: {obj.velocity}, Age: {obj.age}")

Advanced Configuration:

    from swarmsort import SwarmSortTracker, SwarmSortConfig

    # Configure tracker for embedding-based tracking
    config = SwarmSortConfig(
        do_embeddings=True,
        embedding_weight=0.4,
        reid_enabled=True,
        max_distance=100.0,
    )
    tracker = SwarmSortTracker(config)

GPU Acceleration:

    from swarmsort import is_gpu_available, SwarmSortTracker

    if is_gpu_available():
        print("GPU acceleration available for embeddings")
        # GPU will be used automatically for embedding operations
        tracker = SwarmSortTracker()
    else:
        print("Using CPU mode")
        tracker = SwarmSortTracker()

class swarmsort.AdaptiveSwarmSortTracker(config=None)[source]

Bases: object

Adaptive wrapper for SwarmSortTracker that automatically handles both standalone and swarmtracker integration modes.

property config: Access tracker configuration.

get_statistics()[source]: Get tracker statistics.

reset()[source]: Reset tracker state.

update(detections)[source]

Update tracker with detections in any supported format.

Parameters:: detections – List of detection objects (any supported format)
Returns:: List of tracked objects in the appropriate format

class swarmsort.CupyShapeEmbedding(use_gpu=True, hog_cells=3, hog_bins=9)[source]

Bases: EmbeddingExtractor

Shape-focused embedding optimized for microscopy organism tracking.

Key design principles (from embedding_recommendation.md):
- NO RESIZE: Processes original patch resolution to preserve fine details
- SHAPE-FOCUSED: Prioritizes contour/boundary features over texture
- ROTATION-INVARIANT: Handles organisms at any angle

Feature composition (96 dimensions):
- HOG features (36 dims): Histogram of Oriented Gradients for local shape
- Fourier descriptors (16 dims): Contour shape in frequency domain
- Radial signature (32 dims): Distance from centroid to boundary
- Hu moments (7 dims): Classic rotation/scale invariant moments
- Shape stats (5 dims): Area, perimeter, circularity, aspect ratio, solidity

This embedding is specifically designed for grayscale microscopy images where organisms have similar internal textures but distinct boundary shapes.

property embedding_dim: int: Return embedding dimensionality.

extract(frame, bbox)[source]

Extract shape-focused embedding for a single patch.

Return type:: ndarray

extract_batch(frame, bboxes)[source]

Extract batch of shape embeddings.

Return type:: List[ndarray]

class swarmsort.CupyTextureColorEmbedding(patch_size=32, use_gpu=True)[source]

Bases: EmbeddingExtractor

CuPy-accelerated texture embedding with rich color features.

Combines the robust texture analysis of CupyTextureEmbedding with comprehensive color-based features for improved tracking and re-identification performance.

Feature composition (84 total features):
- Grayscale texture features (32) - from base CupyTextureEmbedding
- RGB color histograms (24) - 8 bins per channel
- HSV color histograms (24) - 8 bins per channel
- Color moments (4) - mean and std of RGB channels

This embedding is particularly effective for scenarios where both texture and color information are important for distinguishing between objects.

property embedding_dim: int: Return embedding dimensionality.

extract(frame, bbox)[source]

Extract features for a single patch.

Return type:: ndarray

extract_batch(frame, bboxes)[source]

Extract batch of embeddings with GPU/CPU selection.

Return type:: List[ndarray]

extract_batch_cpu(frame, bboxes)[source]

CPU batch extraction.

Return type:: List[ndarray]

extract_batch_gpu(frame, bboxes)[source]

GPU batch extraction.

Return type:: List[ndarray]

class swarmsort.CupyTextureEmbedding(patch_size=32, use_gpu=True)[source]

Bases: EmbeddingExtractor

Fast GPU-accelerated embedding for microorganism tracking.

Features (36 dimensions):
- Basic statistics (4): mean, std, min, max
- Gradient features (4): gradient magnitudes and variations
- Shape features (8): center of mass, moments, eccentricity, orientation
- Radial features (8): intensity at different radii from center
- Color features (6): HSV statistics
- Texture features (6): Local Binary Pattern approximation

property embedding_dim: int: Return embedding dimensionality.

extract(frame, bbox)[source]

Extract features for a single patch.

Return type:: ndarray

extract_batch(frame, bboxes)[source]

Extract batch of embeddings with GPU/CPU selection.

Return type:: List[ndarray]

extract_batch_cpu(frame, bboxes)[source]

CPU batch extraction.

Return type:: List[ndarray]

extract_batch_gpu(frame, bboxes)[source]

GPU batch extraction using CuPy.

Return type:: List[ndarray]

class swarmsort.Detection(position, confidence, bbox=None, embedding=None, class_id=None, id=None)[source]

Bases: object

Input detection representing a single object observation.

A Detection represents a single object detection from a computer vision model, containing its position, confidence score, and optional features like bounding box and embedding vector for appearance-based matching.

position

2D position [x, y] in world/image coordinates

Type:: np.ndarray

confidence

Detection confidence score, typically in range [0, 1]

Type:: float

bbox

Bounding box as [x1, y1, x2, y2]

Type:: np.ndarray, optional

embedding

Feature vector for appearance matching

Type:: np.ndarray, optional

class_id

Object class identifier

Type:: int, optional

id

Unique detection identifier

Type:: str, optional

Example

>>> import numpy as np
>>> detection = Detection(
...     position=np.array([100.0, 150.0], dtype=np.float32),
...     confidence=0.95,
...     bbox=np.array([90, 140, 110, 160], dtype=np.float32)
... )

bbox: ndarray | None = None

class_id: int | None = None

confidence: float

embedding: ndarray | None = None

id: str | None = None

position: ndarray

class swarmsort.EmbeddingDistanceScaler(method='robust_minmax', update_rate=0.05, min_samples=200, update_interval=3)[source]

Bases: object

Enhanced embedding scaler with multiple scaling methods for comparison

get_statistics()[source]

Get current scaler statistics

Return type:: dict

reset()[source]

Full reset of all statistics.

Use this when:
- Starting tracking on a new video/scene
- The embedding distribution has changed significantly
- Scene changes detected (camera switch, dramatic lighting change)

After reset, the scaler must collect min_samples distance samples again before it is ready.

restore_update_rate(rate=None)[source]

Restore the update rate after a soft reset.

Parameters:: rate (float) – Rate to restore to. If None, uses 0.05 (default).

scale_distances(distances)[source]

Scale distances using the selected method

Return type:: ndarray

soft_reset(faster_update_rate=0.2)[source]

Soft reset - keep statistics but increase learning rate temporarily.

Use this when:
- Embedding distribution may be shifting gradually
- You want to adapt faster without losing all history

Parameters:: faster_update_rate (float) – Temporary update rate (default 0.2, 4x faster than default 0.05)

update_statistics(distances)[source]: Update running statistics with new distance samples - OPTIMIZED VERSION

warmup(n_samples=None)[source]

Pre-populate scaler with synthetic data to avoid mode transition spike.

When the scaler transitions from simple scaling (sample_count < min_samples) to percentile-based scaling, there can be a performance spike as statistics are computed for the first time with real data sizes.

This method pre-populates the scaler with synthetic data that approximates typical embedding distance distributions, avoiding the cold-start spike.

Parameters:: n_samples (int) – Number of synthetic samples to generate. Defaults to min_samples. Should be >= min_samples to enable percentile-based scaling immediately.

swarmsort.FastMultiHypothesisTracker: alias of RawTrackerSwarmSORT

class swarmsort.MegaCupyTextureEmbedding(patch_size=32, use_gpu=True)[source]

Bases: EmbeddingExtractor

Mega Microbe Embedding: Combines the best of shape, color, and texture analysis.
- Dims 0-7: Basic stats and gradients
- Dims 8-17: Advanced shape features (moments, eccentricity, Hu)
- Dims 18-23: HSV color statistics (including circular stats for hue)
- Dims 24-33: Rotation-invariant Local Binary Patterns (LBP)
- Dims 34-49: Robust radial & polar features (intensity, variation, DFT)
- Dims 50-55: Multi-scale wavelet and entropy texture features
- Dims 56-63: Reserved / Other advanced features

property embedding_dim: int: Return embedding dimensionality.

extract(frame, bbox)[source]

Extract single embedding.

Return type:: ndarray

extract_batch(frame, bboxes)[source]

Extract batch of embeddings with GPU/CPU selection.

Return type:: List[ndarray]

extract_batch_cpu(frame, bboxes)[source]

Return type:: List[ndarray]

extract_batch_gpu(frame, bboxes)[source]

Return type:: List[ndarray]

class swarmsort.RawTrackerSwarmSORT(tracker_config=None, runtime_config=None, yaml_config_location=None, **kwargs)[source]

Bases: object

Raw SwarmSORT tracker - tracking only, compatible with SwarmTracker pipeline

__init__(tracker_config=None, runtime_config=None, yaml_config_location=None, **kwargs)[source]: Initialize with SwarmTracker pipeline compatibility

get_statistics()[source]

Get tracker statistics - compatibility method

Return type:: Dict

reset()[source]: Reset tracker state - compatibility method

track(detections, frame, verbose=True, **kwargs)[source]

Track objects using SwarmSORT - SwarmTracker pipeline compatible

Parameters:

detections – Detection results (can be Detection objects, lists, etc.)
frame (ndarray) – Input frame (for compatibility - not used in tracking)
verbose (bool) – Enable verbose output
**kwargs – Additional arguments for compatibility

Returns:

Standardized tracking result

Return type:

TrackingResult

swarmsort.StandaloneSwarmSort: alias of _StandaloneSwarmSortWrapper

swarmsort.SwarmSort: alias of AdaptiveSwarmSortTracker

class swarmsort.SwarmSortConfig(max_distance=80.0, detection_conf_threshold=0, max_track_age=30, kalman_type='simple', uncertainty_weight=0.0, uncertainty_window=10, local_density_radius=-1.0, collision_freeze_embeddings=True, embedding_freeze_density=1, deduplication_distance=0.0, collision_safety_distance=-1.0, do_embeddings=True, embedding_weight=1.0, embedding_threshold_adjustment=1.0, max_embeddings_per_track=7, embedding_function='cupytexture', embedding_matching_method='median', store_embedding_scores=False, embedding_score_history_length=5, sparse_computation_threshold=300, use_probabilistic_costs=False, assignment_strategy='hungarian', greedy_threshold=-1.0, greedy_confidence_boost=1.0, hungarian_fallback_threshold=1.0, reid_enabled=True, reid_max_distance=-1.0, reid_embedding_threshold=0.5, reid_min_frames_lost=3, reid_embedding_weight_boost=1.5, init_conf_threshold=0.0, min_consecutive_detections=10, max_detection_gap=1, pending_detection_distance=-1.0, embedding_scaling_method='min_robustmax', embedding_scaling_update_rate=0.05, embedding_scaling_min_samples=200, default_embedding_dimension=128, debug_embeddings=False, plot_embeddings=False, debug_timings=False, kalman_velocity_damping=0.95, kalman_update_alpha=0.7, kalman_velocity_weight=0.2, prediction_miss_threshold=3, mahalanobis_normalization=5.0, probabilistic_gating_multiplier=1.5, time_covariance_inflation=0.2, base_position_variance=25.0, velocity_variance_scale=0.5, velocity_isotropic_threshold=2.0, singular_covariance_threshold=1e-06)[source]

Bases: BaseConfig

Configuration for SwarmSort multi-object tracker.

SwarmSort is a real-time tracking algorithm that follows multiple objects across video frames. It combines motion prediction (where objects will be) with appearance matching (what objects look like) to maintain consistent identity assignments even when objects temporarily disappear or cross paths.

This configuration controls all aspects of the tracking behavior. Default values are tuned for general tracking scenarios but should be adjusted based on your specific use case.

Quick Start Guide:

For tracking fast-moving objects: increase max_distance
For crowded scenes: decrease max_distance, increase min_consecutive_detections
For high-quality detections: increase detection_conf_threshold
For appearance-based tracking: ensure do_embeddings=True, adjust embedding_weight

assignment_strategy: Literal['hungarian', 'greedy', 'hybrid'] = 'hungarian'

Algorithm for matching detections to tracks.

“hungarian”: Globally optimal assignment (best accuracy, O(n³) complexity). Use when: <50 objects, accuracy is critical.
“greedy”: Fast local assignment (good accuracy, O(n²) complexity). Use when: >100 objects, speed is critical.
“hybrid”: Greedy for obvious matches, Hungarian for ambiguous ones (recommended). Best balance of speed and accuracy for most scenarios.
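
The difference between the greedy and globally optimal strategies can be seen on a tiny cost matrix. The sketch below is illustrative only and is not SwarmSort's implementation: `greedy_assign`, `optimal_assign`, and `total` are hypothetical helpers, and an exhaustive search stands in for the Hungarian algorithm (both find the minimum-total-cost assignment, but the exhaustive version is only usable for very small inputs).

```python
from itertools import permutations


def greedy_assign(cost):
    """Repeatedly take the cheapest remaining (track, detection) pair."""
    pairs = sorted(
        (cost[t][d], t, d)
        for t in range(len(cost))
        for d in range(len(cost[0]))
    )
    used_tracks, used_dets, result = set(), set(), {}
    for _, t, d in pairs:
        if t not in used_tracks and d not in used_dets:
            result[t] = d
            used_tracks.add(t)
            used_dets.add(d)
    return result


def optimal_assign(cost):
    """Exhaustive minimum-total-cost assignment for a small square matrix.

    Stand-in for the Hungarian algorithm, which reaches the same optimum
    without enumerating every permutation.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[t][perm[t]] for t in range(n)),
    )
    return dict(enumerate(best))


def total(cost, assignment):
    """Sum the cost of all assigned (track, detection) pairs."""
    return sum(cost[t][d] for t, d in assignment.items())


# Rows are tracks, columns are detections.
cost = [
    [1.0, 2.0],
    [2.0, 10.0],
]
greedy = greedy_assign(cost)
optimal = optimal_assign(cost)
print("greedy:", greedy, "total", total(cost, greedy))
print("optimal:", optimal, "total", total(cost, optimal))
```

Here the greedy pass grabs the single cheapest pair first and is then forced into an expensive leftover match, which is the kind of error the global assignment avoids.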

base_position_variance: float = 25.0

Base position variance for covariance estimation in probabilistic mode.

This is the minimum uncertainty in position regardless of velocity. Higher values make the tracker more tolerant of position errors.

5-10 = Tight uncertainty (good for high-quality detections)
15-25 = Moderate uncertainty (default, good for typical tracking)
25-50 = Loose uncertainty (for noisy detections or fast motion)

collision_freeze_embeddings: bool = True

Freeze appearance updates when objects are too close together.

When enabled, prevents appearance features from being corrupted during collisions or severe occlusions. The tracker “remembers” what each object looked like before they got too close.

True = Safer tracking in crowds (recommended)
False = Always update appearance (may cause ID switches in crowds)

collision_safety_distance: float = -1.0

Distance at which to consider objects in collision for embedding freezing.

When objects are closer than this, their embeddings stop updating to prevent appearance confusion during occlusion. This is typically larger than deduplication_distance but smaller than max_distance.

Set to -1.0 for auto-compute (max_distance * 0.4).

20-30 pixels = Early freeze for safety
30-50 pixels = Standard collision distance
50+ pixels = Late freeze, allows more updates
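
As a minimal sketch of the auto-compute rule above, the helper below resolves the sentinel value. `resolve_collision_safety_distance` is a hypothetical name, and treating any negative value (not only -1.0) as "auto" is an assumption.

```python
def resolve_collision_safety_distance(collision_safety_distance, max_distance):
    """Return the effective collision distance in pixels.

    Assumption: any negative value means "auto-compute" as max_distance * 0.4.
    """
    if collision_safety_distance < 0:
        return max_distance * 0.4
    return collision_safety_distance


auto_value = resolve_collision_safety_distance(-1.0, max_distance=80.0)
explicit_value = resolve_collision_safety_distance(45.0, max_distance=80.0)
print(auto_value, explicit_value)
```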

debug_embeddings: bool = False

Print detailed embedding information for debugging.

Outputs embedding statistics, distances, and scaling information. Useful for tuning embedding parameters but very verbose.

debug_timings: bool = False

Print detailed timing information for performance analysis.

Shows time spent in each component of the tracking pipeline. Use to identify performance bottlenecks.

deduplication_distance: float = 0.0

Minimum distance between detections to be considered separate objects.

Detections closer than this are merged to prevent duplicate tracks. Should be set based on your object size and detector characteristics.

5-10 pixels = Tight deduplication for small objects
10-20 pixels = Standard deduplication (note: the signature default is 0.0, not a value in this range)
20-50 pixels = Loose deduplication for large objects

default_embedding_dimension: int = 128

Default embedding dimension when embeddings are not yet available.

Used for array pre-allocation and Numba JIT compilation warmup. Should match your embedding extractor’s output dimension.

128 = Standard (most re-identification networks)
256 = High-dimensional embeddings
512+ = Very high-dimensional (more memory overhead)

detection_conf_threshold: float = 0

Minimum confidence score to accept a detection (0.0 to 1.0).

Filters out low-confidence detections from your detector (YOLO, etc.) before tracking. This is a GLOBAL filter applied to ALL detections.

0.0 = Accept all detections (default - let tracker handle noise)
0.3-0.5 = Moderate filtering (good for decent detectors)
0.7-0.9 = Aggressive filtering (only very confident detections)

Note: Set based on your detector’s confidence distribution. Check histogram of confidence scores to pick appropriate threshold.

do_embeddings: bool = True

Enable appearance-based matching using visual features.

When True: Uses both motion AND appearance to match objects
When False: Uses only motion/position (faster but less accurate)

Embeddings help when:
- Objects look different (clothing, color, size)
- Objects cross paths or occlude each other
- Multiple similar objects need to be distinguished

Disable only if all objects look identical or for maximum speed.

embedding_freeze_density: int = 1

Number of nearby tracks to trigger embedding freeze.

When this many (or more) tracks are within local_density_radius, stop updating appearance features to prevent corruption.

1 = Freeze when ANY other track is nearby (most conservative)
2-3 = Freeze only in moderately crowded areas
5+ = Freeze only in very dense crowds
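
The density rule can be sketched as a neighbor count. This is an illustration under assumptions, not the library's code: `count_neighbors` and `should_freeze` are hypothetical helpers, and counting a track at exactly `radius` as "nearby" is an assumption.

```python
import math


def count_neighbors(positions, index, radius):
    """Count other tracks within `radius` pixels of the track at `index`."""
    px, py = positions[index]
    return sum(
        1
        for i, (x, y) in enumerate(positions)
        if i != index and math.hypot(x - px, y - py) <= radius
    )


def should_freeze(positions, index, radius, embedding_freeze_density=1):
    """Freeze embedding updates when enough tracks are nearby."""
    return count_neighbors(positions, index, radius) >= embedding_freeze_density


positions = [(0.0, 0.0), (10.0, 0.0), (100.0, 100.0)]
print(should_freeze(positions, 0, radius=20.0))   # one neighbor within 20 px
print(should_freeze(positions, 2, radius=20.0))   # isolated track
```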

embedding_function: str = 'cupytexture'

Algorithm for extracting appearance features.

Built-in options (require CuPy for GPU):
- “cupytexture”: GPU-accelerated texture features (fast, good quality)
- “cupytexture_color”: Texture + color histogram features
- “cupytexture_mega”: Advanced features (slower, best quality)
- “cupyshape”: Shape-based features

External embeddings:
- “external” or None: Use embeddings provided in the Detection.embedding field. This allows using custom models (MobileNet, ResNet, etc.). You must attach embeddings to each Detection before calling tracker.update().

Example with external embeddings:

    detection = Detection(
        position=np.array([x, y]),
        confidence=score,  # required field, e.g. your detector's score
        bbox=np.array([x1, y1, x2, y2]),
        embedding=my_model.extract_features(crop),  # Your custom embedding
    )

embedding_matching_method: Literal['last', 'average', 'weighted_average', 'best_match', 'median'] = 'median'

How to match current appearance against track history.

“last”: Use only the most recent embedding (fastest, 2-3x speedup). Best for real-time applications where appearance is consistent.
“average”: Compare against the mean of all stored appearances. Simple and stable, but slow to adapt to changes.
“weighted_average”: Recent appearances count more. Good balance of stability and adaptability.
“best_match”: Find the best matching historical appearance. Handles appearance changes well but is more computationally expensive.
“median”: Median distance to all stored appearances (default). More robust to outliers than average, good for noisy embeddings.

embedding_scaling_method: str = 'min_robustmax'

Method for normalizing embedding distances to [0,1] range.

Raw embedding distances have arbitrary scale. These methods normalize them to match the scale of spatial distances:

“min_robustmax”: Asymmetric scaling with true min and robust max
“robust_minmax”: Symmetric robust scaling using percentiles
Others: Various statistical methods (see embedding_scaler.py)

Most users should keep the default.

embedding_scaling_min_samples: int = 200

Minimum samples before embedding scaling is activated.

The scaler needs to see enough embedding distances to compute reliable statistics. Before this, a simple fallback scaling is used.

100-200 = Quick activation (may be less accurate initially)
500-1000 = Careful activation (more accurate but takes longer)

embedding_scaling_update_rate: float = 0.05

Learning rate for updating scaling statistics (0.0 to 1.0).

Controls how quickly the scaler adapts to changing embedding distributions.

0.01 = Very slow adaptation (stable but slow to adjust)
0.05 = Moderate adaptation (default)
0.1-0.2 = Fast adaptation (responsive but may be unstable)
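
To give a feel for what the update rate means, the sketch below assumes the statistics are updated as an exponential moving average, which this page does not confirm. `ema_update` and `steps_to_reach` are hypothetical helpers that count how many updates a statistic needs to move halfway toward a new value.

```python
def ema_update(current, observed, update_rate=0.05):
    """One exponential-moving-average step (assumed update rule)."""
    return (1.0 - update_rate) * current + update_rate * observed


def steps_to_reach(target, update_rate, start=0.0, observed=1.0):
    """Number of updates until the running value reaches `target`."""
    if not 0.0 < update_rate <= 1.0:
        raise ValueError("update_rate must be in (0, 1]")
    value, steps = start, 0
    while value < target:
        value = ema_update(value, observed, update_rate)
        steps += 1
    return steps


for rate in (0.05, 0.2):
    print(rate, steps_to_reach(0.5, rate))
```

Under this assumption, the soft-reset rate of 0.2 closes half the gap in far fewer updates than the default 0.05.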

embedding_score_history_length: int = 5

Number of recent embedding match scores to keep per track.

Only used when store_embedding_scores=True. Higher values give more stable averages but use slightly more memory.

embedding_threshold_adjustment: float = 1.0

Threshold adjustment factor for embedding contribution to assignment gating.

With additive cost formula, total cost can exceed max_distance. This parameter adjusts the effective assignment threshold to account for embedding contribution:

effective_max_distance = max_distance × (1 + embedding_weight × embedding_threshold_adjustment)

Example with max_distance=80, embedding_weight=1.0, embedding_threshold_adjustment=1.0:
- effective_max_distance = 80 × (1 + 1.0 × 1.0) = 160
- Allows costs up to 160 (80 position + 80 max embedding contribution)

0.0 = No threshold adjustment (may reject valid matches with high embedding cost)
1.0 = Full adjustment for embedding contribution (default, recommended)
0.5 = Partial adjustment (more strict on appearance)
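
The threshold formula above translates directly into a few lines of Python. `effective_max_distance` here is a hypothetical free function written for illustration; it only restates the documented formula.

```python
def effective_max_distance(max_distance, embedding_weight, embedding_threshold_adjustment):
    """Assignment threshold after accounting for the embedding contribution."""
    return max_distance * (1.0 + embedding_weight * embedding_threshold_adjustment)


for adjustment in (0.0, 0.5, 1.0):
    print(adjustment, effective_max_distance(80.0, 1.0, adjustment))
```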

embedding_weight: float = 1.0

Relative importance of appearance penalty (0.0 to ~2.0).

Controls how much embedding distance adds to the position-based cost:

Cost = position_distance + embedding_weight × embedding_distance × max_distance

0.0 = Position only (appearance ignored even if do_embeddings=True)
0.5 = Embedding adds up to 50% of max_distance as penalty
1.0 = Embedding adds up to 100% of max_distance as penalty (default)
2.0 = Embedding adds up to 200% of max_distance as penalty

Increase for distinct-looking objects, decrease for similar-looking objects.
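
A small sketch of the additive cost, assuming embedding_distance has already been scaled to [0, 1]. `assignment_cost` is a hypothetical helper that restates the documented formula, not a function exported by the package.

```python
def assignment_cost(position_distance, embedding_distance,
                    embedding_weight=1.0, max_distance=80.0):
    """Position cost plus an appearance penalty scaled by max_distance."""
    return position_distance + embedding_weight * embedding_distance * max_distance


# Same geometry (20 px apart), same appearance mismatch (0.5), different weights.
for weight in (0.0, 0.5, 1.0, 2.0):
    print(weight, assignment_cost(20.0, 0.5, embedding_weight=weight))
```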

greedy_confidence_boost: float = 1.0: Confidence multiplier for greedy matches (not currently used).

greedy_threshold: float = -1.0

Distance threshold for confident matches in hybrid/greedy mode.

Matches closer than this distance are assigned immediately without considering other possibilities. Should be much smaller than max_distance.

max_distance/6 = Very conservative (only super obvious matches)
max_distance/4 = Balanced (default)
max_distance/2 = Aggressive (may cause errors in crowds)

hungarian_fallback_threshold: float = 1.0

Multiplier for max_distance in Hungarian phase of hybrid assignment.

After greedy assignment, Hungarian considers matches up to max_distance * hungarian_fallback_threshold.

1.0 = Same threshold (consistent behavior)
1.5 = More permissive in Hungarian (catches difficult matches)
0.8 = More restrictive (fewer but more confident matches)

init_conf_threshold: float = 0.0

Minimum confidence to start tracking an object (0.0 to 1.0).

This is a SECOND filter specifically for track creation. Detections must pass BOTH detection_conf_threshold (to be processed) AND init_conf_threshold (to create new tracks).

0.0 = Create tracks from any detection (after min_consecutive_detections)
0.3-0.5 = Only track reasonably confident detections
0.7+ = Only track very confident detections

Use higher values to reduce false positive tracks.

kalman_type: Literal['simple', 'oc'] = 'simple'

Motion prediction algorithm type.

“simple”: Classic Kalman filter with constant velocity model
    Pros: Smooth, predictable motion, good for linear movement
    Cons: Overshoots on sudden stops/turns
“oc”: OC-SORT style observation-centric (no prediction during occlusion)
    Pros: Better for erratic motion, no drift during occlusion
    Cons: Less smooth, may lose fast-moving occluded objects

Use “simple” for vehicles/pedestrians, “oc” for animals/sports/erratic motion.

kalman_update_alpha: float = 0.7

Measurement weight in Kalman update (0.0 to 1.0).

Controls the blend between prediction and measurement:

new_pos = alpha * measurement + (1 - alpha) * prediction

1.0 = Trust measurement completely (no filtering)
0.7 = Balanced (default, good for noisy detections)
0.5 = Heavy filtering (smoother but slower response)
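
The blend can be tried out on a single 2D point. `blend_position` is a hypothetical helper that applies the documented formula per coordinate.

```python
def blend_position(prediction, measurement, alpha=0.7):
    """new_pos = alpha * measurement + (1 - alpha) * prediction, per axis."""
    return tuple(alpha * m + (1.0 - alpha) * p for p, m in zip(prediction, measurement))


prediction = (0.0, 0.0)
measurement = (10.0, 20.0)
for alpha in (1.0, 0.7, 0.5):
    print(alpha, blend_position(prediction, measurement, alpha))
```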

kalman_velocity_damping: float = 0.95

Velocity damping factor applied during Kalman prediction (0.0 to 1.0).

Controls how quickly velocity decays when no measurement is available.

1.0 = No damping (velocity persists indefinitely)
0.95 = Slight damping (default, handles noise)
0.8 = Strong damping (velocity decays quickly)

Lower values help prevent overshoot during occlusions.
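
The effect of damping during an occlusion can be sketched in one dimension. `coast` is a hypothetical helper, and applying the damping before the position step (rather than after) is an assumption about ordering.

```python
def coast(position, velocity, damping=0.95, frames=1):
    """Predict forward without measurements, damping velocity each frame."""
    for _ in range(frames):
        velocity *= damping      # assumed order: damp first, then move
        position += velocity
    return position, velocity


# A track moving 10 px/frame that goes unobserved for 10 frames.
for damping in (1.0, 0.95, 0.8):
    pos, vel = coast(0.0, 10.0, damping=damping, frames=10)
    print(damping, round(pos, 2), round(vel, 2))
```

With no damping the prediction keeps travelling at full speed, while lower values make the predicted position level off, which is why they limit overshoot.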

kalman_velocity_weight: float = 0.2

Weight for velocity consistency term in OC-SORT cost (0.0 to 1.0).

Controls how much velocity consistency affects the assignment cost. Higher values prefer matches that maintain velocity direction.

local_density_radius: float = -1.0

Radius in pixels to check for nearby tracks (for density computation).

Used to detect crowded areas where ID switches are more likely. When many tracks are within this radius, the tracker becomes more conservative.

Typically set to max_distance/2 or max_distance/3.

mahalanobis_normalization: float = 5.0

Normalization factor for Mahalanobis distance (used in probabilistic mode).

Scales Mahalanobis distance to be comparable with max_distance. LOWER values make the probabilistic gating more permissive.

The normalized distance = mahal_dist * mahalanobis_normalization.

3-5 = Permissive (good for new tracks with low velocity estimates)
5-10 = Moderate (default, balances precision and tolerance)
10-20 = Strict (requires good velocity model)

max_detection_gap: int = 1

Maximum frame gap allowed during track initialization.

While building confidence for a new track, detections can be missing for up to this many frames without resetting the count.

0 = No gaps allowed (very strict)
1-2 = Allow brief gaps (default, handles detector flickering)
3-5 = Allow longer gaps (for difficult detection scenarios)

max_distance: float = 80.0

Maximum pixel distance for matching detections to tracks.

This is THE most important parameter. It defines how far an object can move between frames and still be considered the same object.

INCREASE (200-300) for: fast-moving objects, low frame rates, zoomed-out views
DECREASE (50-100) for: slow objects, high frame rates, crowded scenes, zoomed-in views

Rule of thumb: Set to the maximum pixels an object typically moves per frame.

Example: If objects move 100 pixels/frame, set to 150 to handle variations.

max_embeddings_per_track: int = 7

Maximum number of appearance samples to store per track.

Each track keeps a history of appearance features to handle appearance changes (rotation, lighting, partial occlusion).

1 = Only most recent (fast, no appearance history)
5-10 = Short history (good for stable appearance)
15-30 = Long history (handles appearance variation, uses more memory)

More samples = better appearance model but slower matching.

max_track_age: int = 30

Maximum frames a track can exist without any detection before deletion.

Controls how long to keep tracking an object after it disappears (occlusion, leaving frame, detection failure).

10-20 frames = Quick deletion (good for fast-changing scenes)
30-50 frames = Balanced (default, handles brief occlusions)
60-120 frames = Persistent (good for long occlusions, more false positives)

At 30 FPS: 30 frames = 1 second of occlusion tolerance.

min_consecutive_detections: int = 10

Number of consecutive detections required to confirm a track.

New tracks start as “tentative” and become “confirmed” after being detected in this many consecutive frames. Prevents tracking noise/artifacts.

1 = Immediate tracking (fast response, more false positives)
2-3 = Quick confirmation (balanced)
5-10 = Careful confirmation (slow response, very few false positives)

Increase in noisy environments or with unreliable detectors.
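
How min_consecutive_detections interacts with max_detection_gap can be sketched as a small counter. The exact bookkeeping is an assumption: in this hypothetical `frame_confirmed` helper, a miss within the allowed gap leaves the hit count untouched, and a longer run of misses resets it.

```python
def frame_confirmed(detected, min_consecutive_detections=10, max_detection_gap=1):
    """Return the frame index at which a tentative track is confirmed, or None."""
    count = 0
    gap = 0
    for frame, hit in enumerate(detected):
        if hit:
            count += 1
            gap = 0
            if count >= min_consecutive_detections:
                return frame
        else:
            gap += 1
            if gap > max_detection_gap:
                count = 0  # gap too long: start over
    return None


flicker = [True, True, False, True]
print(frame_confirmed(flicker, min_consecutive_detections=3, max_detection_gap=1))
print(frame_confirmed(flicker, min_consecutive_detections=3, max_detection_gap=0))
```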

pending_detection_distance: float = -1.0

Maximum distance to associate detections during initialization.

Before a track is confirmed, detections must be within this distance to be considered the same pending object.

Usually same as max_distance, but can be smaller for stricter initialization.

plot_embeddings: bool = False

Generate embedding visualization plots (requires matplotlib).

Creates visual representations of embedding space and distances. Helpful for understanding embedding behavior but slows tracking.

prediction_miss_threshold: int = 3

Number of missed frames before using last position instead of prediction.

When a track has missed this many frames, the tracker uses its last known position rather than the predicted position for matching. This prevents prediction drift during extended occlusions.
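
A minimal sketch of that switch, assuming the comparison is "missed frames greater than or equal to the threshold". `matching_position` is a hypothetical helper name.

```python
def matching_position(last_position, predicted_position, missed_frames,
                      prediction_miss_threshold=3):
    """Pick which position to use when matching a track to detections."""
    if missed_frames >= prediction_miss_threshold:
        return last_position       # prediction no longer trusted
    return predicted_position


last_seen = (100.0, 100.0)
predicted = (130.0, 100.0)
print(matching_position(last_seen, predicted, missed_frames=1))
print(matching_position(last_seen, predicted, missed_frames=3))
```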

probabilistic_gating_multiplier: float = 1.5

Multiplier for max_distance in probabilistic gating.

Euclidean pre-filter threshold = max_distance * this value. Allows Mahalanobis gating to consider matches beyond strict max_distance.

reid_embedding_threshold: float = 0.5

Maximum embedding distance for ReID (0.0 to 1.0, lower = stricter).

Lost tracks are only matched if appearance similarity is better than this.

0.1-0.2 = Very strict (only nearly identical appearance)
0.3-0.4 = Balanced (some appearance change allowed)
0.5-0.7 = Permissive (allows significant appearance change; the signature default of 0.5 is at the low end of this range)

Lower values = fewer but more accurate re-identifications.

reid_embedding_weight_boost: float = 1.5

Multiplier for embedding_weight during ReID (1.0 to 2.0).

During ReID, appearance matching is more important than spatial matching because lost tracks may have moved far. This boosts embedding weight.

Final ReID embedding weight = min(embedding_weight * boost, 0.95)

1.0 = Same as normal matching (no boost)
1.5 = 50% more emphasis on appearance (default, recommended)
2.0 = Double emphasis on appearance (very appearance-focused)
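
The capped boost formula is easy to check numerically. `reid_embedding_weight` is a hypothetical helper restating the formula above; note that with the default embedding_weight of 1.0, any boost of 1.0 or more lands on the 0.95 cap.

```python
def reid_embedding_weight(embedding_weight, reid_embedding_weight_boost=1.5, cap=0.95):
    """Final ReID embedding weight = min(embedding_weight * boost, 0.95)."""
    return min(embedding_weight * reid_embedding_weight_boost, cap)


for weight in (0.2, 0.4, 1.0):
    print(weight, reid_embedding_weight(weight))
```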

reid_enabled: bool = True

Enable re-identification of lost tracks.

ReID attempts to re-connect tracks that were lost (due to occlusion, detection failure) with new detections using appearance matching.

Helps maintain consistent IDs through temporary disappearances. Disable if objects never reappear or appearance is unreliable.

reid_max_distance: float = -1.0

Maximum distance for re-identification matching.

Lost tracks can be matched to detections up to this distance away. Larger than max_distance because objects may have moved far during occlusion.

max_distance * 1.0 = Conservative (only nearby reappearances)
max_distance * 1.5 = Balanced (default)
max_distance * 2.0 = Aggressive (may cause false re-identifications)

reid_min_frames_lost: int = 3

Minimum frames a track must be lost before attempting ReID.

Prevents immediate re-identification that can cause ID swaps. Allows the tracker to wait and see if the object reappears naturally.

0 = Immediate ReID (may cause ID swaps)
1 = Wait one frame (minimal delay)
2-3 = Wait a few frames (recommended, prevents most ID swaps)
5+ = Conservative delay (very safe but may miss quick reappearances)
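
One way to picture how the ReID parameters interact is as a chain of gates. The sketch below is an assumption about the order and the exact comparisons, not the library's implementation. is_reid_candidate is a hypothetical helper, and the reid_max_distance value of 150.0 is an illustrative number.

```python
import math


def is_reid_candidate(frames_lost, last_pos, det_pos, emb_distance,
                      reid_min_frames_lost=3,
                      reid_max_distance=150.0,      # illustrative value
                      reid_embedding_threshold=0.5):
    """Hypothetical gate: may a lost track be re-identified with this detection?"""
    # Gate 1: the track must have been lost long enough.
    if frames_lost < reid_min_frames_lost:
        return False
    # Gate 2: the detection must be within the ReID search radius.
    if math.dist(last_pos, det_pos) > reid_max_distance:
        return False
    # Gate 3: appearance must be better (lower distance) than the threshold.
    return emb_distance < reid_embedding_threshold
```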

singular_covariance_threshold: float = 1e-06

Threshold for detecting singular (degenerate) covariance matrices.

If the covariance matrix determinant is below this value, the tracker falls back to Euclidean distance instead of Mahalanobis distance. This prevents numerical instability from matrix inversion.

1e-6 = Default, works for most cases
1e-8 = More aggressive, may have numerical issues
1e-4 = More conservative, falls back to Euclidean more often
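
The fallback described above can be sketched for a 2x2 position covariance in plain Python. gated_distance is a hypothetical name, and the closed-form 2x2 inverse stands in for whatever matrix routine the tracker actually uses.

```python
import math


def gated_distance(delta, cov, singular_covariance_threshold=1e-6):
    """Mahalanobis distance for a 2x2 covariance, Euclidean if it is near-singular.

    delta is (dx, dy); cov is ((a, b), (c, d)).
    """
    dx, dy = delta
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det < singular_covariance_threshold:
        # Inverting a near-singular matrix is numerically unstable: fall back.
        return math.hypot(dx, dy)
    # delta^T * inv(cov) * delta, using inv(cov) = [[d, -b], [-c, a]] / det
    squared = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(max(squared, 0.0))
```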

sparse_computation_threshold: int = 300

Threshold for switching to sparse computation mode.

When both the number of detections AND tracks exceed this threshold, the tracker uses grid-based spatial indexing to compute costs only for nearby detection-track pairs. This reduces complexity from O(n*m) to O(n*k) where k is the average number of nearby tracks per detection.

Benefits:

Significant speedup for high-density scenarios (300+ objects)
Grid size = max_distance * 1.5; only pairs in the 3x3 neighborhood are considered
Automatically disabled if sparse pairs > 50% of full matrix (not beneficial)

Note: Sparse mode is not compatible with use_probabilistic_costs=True.
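
The grid-based indexing described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: sparse_candidate_pairs and sparse_is_beneficial are hypothetical names, and positions are plain (x, y) tuples.

```python
from collections import defaultdict


def sparse_candidate_pairs(det_positions, track_positions, max_distance):
    """Return (detection_index, track_index) pairs that share a 3x3 grid neighborhood."""
    cell = max_distance * 1.5  # grid size from the documentation
    grid = defaultdict(list)
    for t_idx, (x, y) in enumerate(track_positions):
        grid[(int(x // cell), int(y // cell))].append(t_idx)

    pairs = []
    for d_idx, (x, y) in enumerate(det_positions):
        cx, cy = int(x // cell), int(y // cell)
        for ox in (-1, 0, 1):
            for oy in (-1, 0, 1):
                for t_idx in grid.get((cx + ox, cy + oy), ()):
                    pairs.append((d_idx, t_idx))
    return pairs


def sparse_is_beneficial(pairs, n_detections, n_tracks):
    """Sparse mode is only worth it if it covers at most 50% of the full matrix."""
    return len(pairs) <= 0.5 * n_detections * n_tracks
```

Costs then only need to be computed for the returned pairs; every other entry of the cost matrix can be treated as out of range.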

store_embedding_scores: bool = False

Store embedding match scores for each track update.

When True: Stores the cosine similarity scores from recent matches. Useful for debugging, visualization, and confidence estimation.

This has minimal performance impact as scores are computed anyway during matching.

time_covariance_inflation: float = 0.2

Rate at which covariance inflates per missed frame (0.0 to 1.0).

Each missed frame, covariance is multiplied by (1 + this * misses). Higher values make uncertainty grow faster during occlusion.
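
A minimal sketch of the inflation rule, assuming the factor (1 + rate * misses) is applied to a base covariance rather than compounded frame by frame (the text above leaves this open). inflate_covariance is a hypothetical helper operating on nested lists.

```python
def inflate_covariance(cov, misses, time_covariance_inflation=0.2):
    """Scale every covariance entry by (1 + rate * misses)."""
    factor = 1.0 + time_covariance_inflation * misses
    return [[value * factor for value in row] for row in cov]
```

With the default rate of 0.2, five consecutive misses double the covariance.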

uncertainty_weight: float = 0.0

Weight for uncertainty penalties in cost computation (0.0 to 1.0).

Adds a cost penalty based on recent miss ratio, making unreliable tracks less likely to steal matches from reliable ones.

Formula: cost = base_cost * (1 + uncertainty_weight * recent_miss_ratio)

0.0 = Disabled (no overhead, treat all tracks equally)
0.2-0.3 = Light uncertainty (small preference for reliable tracks)
0.5-0.7 = Strong uncertainty (heavily favor reliable tracks)

uncertainty_window: int = 10

Number of recent frames to consider for uncertainty calculation.

Tracks the hit/miss ratio over the last N frames.

5-10 = Short memory (quick adaptation)
10-20 = Medium memory (balanced)
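
The sliding window and the penalty formula given for uncertainty_weight can be combined in a short sketch. MissHistory and penalized_cost are hypothetical names; only the window length and the formula cost = base_cost * (1 + uncertainty_weight * recent_miss_ratio) come from the documentation.

```python
from collections import deque


class MissHistory:
    """Keeps the last N hit/miss outcomes of a track (N = uncertainty_window)."""

    def __init__(self, uncertainty_window=10):
        self._outcomes = deque(maxlen=uncertainty_window)

    def record(self, matched):
        self._outcomes.append(bool(matched))

    def recent_miss_ratio(self):
        if not self._outcomes:
            return 0.0  # assumption: a brand-new track is treated as reliable
        return self._outcomes.count(False) / len(self._outcomes)


def penalized_cost(base_cost, recent_miss_ratio, uncertainty_weight=0.0):
    """Apply the documented uncertainty penalty to a base matching cost."""
    return base_cost * (1.0 + uncertainty_weight * recent_miss_ratio)
```

A shorter window forgets old misses sooner, so a track that recovers regains its full matching priority more quickly.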

use_probabilistic_costs: bool = False

Use probabilistic fusion for cost computation.

False: Simple distance-based costs (faster, usually sufficient)
True: Bayesian fusion considering uncertainties (more sophisticated)

Probabilistic costs can help in complex scenarios but add computation overhead. Most users should keep this False.

validate()[source]

Validate configuration parameters.

Checks that all parameters are within valid ranges and compatible with each other. Raises ValueError if configuration is invalid.

Return type:: None

velocity_isotropic_threshold: float = 2.0

Velocity threshold (pixels/frame) below which covariance is isotropic.

When a track’s velocity magnitude is below this threshold, the covariance is circular (same uncertainty in all directions). Above this threshold, the covariance becomes elliptical (more uncertainty in direction of motion).

0.1 = Only very slow tracks get isotropic covariance
1.0 = Tracks moving < 1 pixel/frame get isotropic covariance
2.0 = Tracks moving < 2 pixels/frame get isotropic covariance (default)
5.0 = Only very fast tracks get anisotropic covariance

velocity_variance_scale: float = 0.5

Scale factor for velocity contribution to position uncertainty.

In probabilistic mode, tracks moving faster have higher uncertainty in their predicted position. This controls how much velocity affects the covariance: variance_along_motion = base + scale * velocity_magnitude.

0.0 = Velocity doesn’t affect uncertainty
1.0-2.0 = Moderate velocity effect
3.0+ = Strong velocity effect
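
To make the interaction between velocity_isotropic_threshold and velocity_variance_scale concrete, the sketch below builds a 2x2 position covariance from a velocity vector. It is an illustration under assumptions: position_covariance and base_variance are hypothetical, and the rank-one construction is one standard way to obtain variance_along_motion = base + scale * velocity_magnitude.

```python
import math


def position_covariance(velocity, base_variance=1.0,
                        velocity_isotropic_threshold=2.0,
                        velocity_variance_scale=0.5):
    """Return a 2x2 covariance: circular for slow tracks, elongated along motion otherwise."""
    vx, vy = velocity
    speed = math.hypot(vx, vy)
    if speed < velocity_isotropic_threshold:
        return [[base_variance, 0.0], [0.0, base_variance]]
    ux, uy = vx / speed, vy / speed           # unit vector along the motion
    extra = velocity_variance_scale * speed   # added variance along the motion
    # cov = base * I + extra * u u^T, so the variance along u is base + extra
    return [[base_variance + extra * ux * ux, extra * ux * uy],
            [extra * ux * uy, base_variance + extra * uy * uy]]
```

The variance perpendicular to the motion stays at base_variance, which gives the elliptical shape described for fast tracks.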

class swarmsort.SwarmSortTracker(config=None, embedding_type=None, use_gpu=None, **kwargs)[source]

Bases: object

Main SwarmSort multi-object tracker implementation.

This class implements a tracking algorithm that combines:

Kalman filtering for motion prediction
Hungarian/Greedy assignment for detection-track matching
Deep learning embeddings for appearance-based matching
Re-identification for recovering lost tracks

__init__(config=None, embedding_type=None, use_gpu=None, **kwargs)[source]

Initialize the SwarmSort tracker with configuration.

Parameters:

config (Union[SwarmSortConfig, dict, None]) – Configuration object or dictionary
embedding_type (Optional[str]) – Type of embedding extractor to use (overrides config)
use_gpu (Optional[bool]) – Whether to use GPU for embeddings (overrides config)
**kwargs – Additional keyword arguments (ignored for compatibility)

property frame_count: Get current frame count.

get_state()[source]

Get current tracker state for debugging.

Return type:: dict

get_statistics()[source]

Get tracker statistics for benchmarking and analysis.

Returns:

Dictionary containing:

next_id: Next track ID to be assigned (= total tracks created)
active_tracks: Number of currently active tracks
confirmed_tracks: Number of confirmed tracks
pending_detections: Number of pending detections
frame_count: Total frames processed

Return type:

dict

reset()[source]

Reset the tracker to initial state.

This clears all tracks, pending detections, and resets the embedding scaler statistics. Use when starting a new video or scene.

update(detections)[source]

Main update function for the tracker.

Parameters:

detections (List[Detection]) – List of Detection objects for current frame

Return type:

List[TrackedObject]

Returns:

List of TrackedObject instances representing current tracks

Raises:

TypeError – If detections is not a list
ValueError – If any detection has invalid data

class swarmsort.TrackedObject(id, position, velocity, confidence, age, hits, time_since_update, state, bbox=None, class_id=None, predicted_position=None, embedding_score=None)[source]

Bases: object

Output tracked object representing a track’s current state.

A TrackedObject represents the current state of a tracked object, including its position, motion, confidence, and tracking statistics. This is the main output type returned by SwarmSortTracker.update().

id

Unique track identifier, assigned when track is created

Type:: int

position

Current 2D position [x, y] estimate

Type:: np.ndarray

velocity

Current velocity [vx, vy] estimate from Kalman filter

Type:: np.ndarray

confidence

Most recent detection confidence associated with this track

Type:: float

age

Number of frames since track was created

Type:: int

hits

Total number of successful detection associations

Type:: int

time_since_update

Frames since last successful detection association

Type:: int

state

Track state (0: Tentative, 1: Confirmed, 2: Deleted)

Type:: int

bbox

Most recent bounding box [x1, y1, x2, y2]

Type:: np.ndarray, optional

class_id

Object class identifier

Type:: int, optional

Example

>>> # TrackedObject is typically created by the tracker
>>> tracked_objects = tracker.update(detections)
>>> for obj in tracked_objects:
...     print(f"Track {obj.id} at position {obj.position}")
...     print(f"  Velocity: {obj.velocity}")
...     print(f"  Age: {obj.age}, Hits: {obj.hits}")

age: int

bbox: ndarray | None = None

class_id: int | None = None

confidence: float

embedding_score: float | None = None

hits: int

id: int

position: ndarray

predicted_position: ndarray | None = None

state: int

time_since_update: int

velocity: ndarray

class swarmsort.TrackingResult(tracked_objects: List[Any], bounding_boxes: ndarray | None = None, result_image: ndarray | None = None)[source]

Bases: NamedTuple

Standardized return type for tracking operations - SwarmTracker compatible.

bounding_boxes: ndarray | None: Alias for field number 1

result_image: ndarray | None: Alias for field number 2

tracked_objects: List[Any]: Alias for field number 0

swarmsort.compute_embedding_distance(emb1, emb2, metric='cosine')[source]

Compute distance between two embeddings.

Parameters:

emb1 (ndarray) – First embedding vector
emb2 (ndarray) – Second embedding vector
metric (str) – Distance metric to use:
“cosine” (default): Cosine distance, standard for L2-normalized embeddings
“correlation”: Pearson correlation distance, robust to offsets

Return type:

float

Returns:

Distance in [0, 1] where 0 = identical, 1 = maximally different

swarmsort.compute_embedding_distances_batch(emb, embs, metric='cosine')[source]

Compute distances from one embedding to multiple embeddings.

Parameters:

emb (ndarray) – Query embedding vector
embs (List[ndarray]) – List of embedding vectors to compare against
metric (str) – Distance metric (“cosine” or “correlation”)

Return type:

ndarray

Returns:

Array of distances

swarmsort.correlation_distance_jit(emb1, emb2)[source]

JIT-optimized correlation (Pearson) distance computation.

Correlation distance = (1 - Pearson_correlation) / 2

Returns a value in [0, 1], where 0 = identical pattern and 1 = opposite pattern.

More robust to offset differences than cosine, but may discard useful mean information from well-normalized embeddings.

Return type:: float
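
For reference, the correlation distance formula can be written in plain Python as below. This is an unoptimized sketch, not the JIT-compiled implementation. The function name correlation_distance and the handling of constant vectors are assumptions.

```python
import math


def correlation_distance(emb1, emb2):
    """(1 - Pearson correlation) / 2 for two equal-length sequences of floats."""
    n = len(emb1)
    mean1 = sum(emb1) / n
    mean2 = sum(emb2) / n
    centered1 = [a - mean1 for a in emb1]
    centered2 = [b - mean2 for b in emb2]
    numerator = sum(a * b for a, b in zip(centered1, centered2))
    denominator = math.sqrt(sum(a * a for a in centered1) *
                            sum(b * b for b in centered2))
    if denominator == 0.0:
        # Assumption: correlation is undefined for a constant vector,
        # so treat the pair as uncorrelated (mid-range distance).
        return 0.5
    return (1.0 - numerator / denominator) / 2.0
```

Because both vectors are mean-centered first, adding a constant offset to every component of one embedding leaves the distance unchanged, which is the robustness to offsets mentioned above.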

swarmsort.cosine_distance_jit(emb1, emb2)[source]

JIT-optimized cosine distance computation.

Cosine distance = 1 - cosine_similarity

Returns a value in [0, 1], where 0 = identical and 1 = opposite.

This is the standard distance metric for L2-normalized embeddings.

Return type:: float

swarmsort.create_swarmsort_tracker(runtime_config=None, yaml_config_location=None)[source]

Create a SwarmSort tracker with full config handling (SwarmTracker compatible).

Config priority (highest to lowest):

1. runtime_config - parameters from the SwarmTracker pipeline
2. YAML config - default_config.yaml in swarmsort/data/
3. Python defaults - SwarmSortConfig dataclass defaults

This function maintains the exact same interface as the old integration, ensuring seamless compatibility with the SwarmTracker pipeline.
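
The priority order can be pictured as layered dictionary updates on top of dataclass defaults. The sketch below uses a hypothetical ExampleConfig stand-in with illustrative default values rather than the real SwarmSortConfig, and silently dropping unknown keys is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for SwarmSortConfig; default values are illustrative."""
    max_distance: float = 150.0
    reid_enabled: bool = True
    uncertainty_weight: float = 0.0


def merge_config(yaml_values=None, runtime_config=None):
    """Layer YAML values over defaults, then runtime values over YAML."""
    merged = {}
    merged.update(yaml_values or {})      # priority 2: YAML file
    merged.update(runtime_config or {})   # priority 1: runtime parameters win
    known = {f.name for f in fields(ExampleConfig)}
    # Assumption: keys the config does not know are ignored.
    return ExampleConfig(**{k: v for k, v in merged.items() if k in known})
```

Any field set in neither source keeps its dataclass default, which is the lowest-priority layer.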

swarmsort.create_tracked_object_fast(track_id, position, bbox=None, confidence=1.0, **kwargs)[source]: Create tracked object compatible with SwarmTracker pipeline

swarmsort.create_tracker(config=None, force_standalone=False)[source]

Factory function to create the appropriate tracker instance.

Parameters:

config – Configuration (SwarmSortConfig, dict, or path to YAML file)
force_standalone – If True, always create standalone tracker

Returns:

Tracker instance (adaptive or standalone)

swarmsort.get_embedding_extractor(name, **kwargs)[source]

Get an embedding extractor by name.

Return type:: EmbeddingExtractor

swarmsort.is_gpu_available()[source]

Check if GPU acceleration is available.

Return type:: bool

swarmsort.is_within_swarmtracker()[source]

Detect if SwarmSort is being used within the swarmtracker pipeline.

Returns:: True if running within swarmtracker, False if standalone
Return type:: bool

swarmsort.list_available_embeddings()[source]

List all available embeddings.

Return type:: List[str]

swarmsort.load_config(config_path=None)[source]

Load SwarmSort configuration from a YAML file.

Parameters:: config_path (Optional[str]) – Path to YAML configuration file. If None, uses defaults.
Return type:: SwarmSortConfig
Returns:: SwarmSortConfig instance

swarmsort.numpy_to_detections(boxes, confidences=None, embeddings=None, format='xyxy')[source]

Convert numpy arrays to SwarmSort Detection format.

Ultra-fast conversion for custom detection pipelines.

Parameters:

boxes (ndarray) – Array of bounding boxes. Shape (N, 4)
confidences (Optional[ndarray]) – Array of confidence scores. Shape (N,)
embeddings (Optional[ndarray]) – Optional embedding vectors. Shape (N, embedding_dim)
format (str) – Box format - ‘xyxy’, ‘xywh’, or ‘cxcywh’

Return type:

List[Detection]

Returns:

List of Detection objects

Example

>>> boxes = np.array([[100, 100, 200, 200], [300, 300, 400, 400]])
>>> confs = np.array([0.9, 0.85])
>>> detections = numpy_to_detections(boxes, confs)
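
The three box formats differ only in how the four numbers are interpreted. The sketch below assumes the common conventions: 'xywh' is top-left corner plus width/height, and 'cxcywh' is center plus width/height. to_xyxy is a hypothetical helper, not a swarmsort function.

```python
def to_xyxy(box, format="xyxy"):
    """Convert one box to (x1, y1, x2, y2) under the assumed format conventions."""
    a, b, c, d = box
    if format == "xyxy":
        return (a, b, c, d)
    if format == "xywh":
        # (x, y) is the top-left corner; c and d are width and height.
        return (a, b, a + c, b + d)
    if format == "cxcywh":
        # (cx, cy) is the box center; c and d are width and height.
        return (a - c / 2, b - d / 2, a + c / 2, b + d / 2)
    raise ValueError(f"Unsupported box format: {format!r}")
```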

swarmsort.prepare_detections(raw_detections, source_format='auto', **kwargs)[source]

Universal detection preparation function.

Automatically converts and verifies detections from various formats.

Parameters:

raw_detections (Union[List[Detection], ndarray, Any]) – Raw detection data in any supported format
source_format (str) – Format hint - ‘auto’, ‘yolo’, ‘numpy’, ‘detection’
**kwargs – Additional arguments passed to conversion functions

Return type:

List[Detection]

Returns:

List of verified Detection objects ready for tracking

Example

>>> # Auto-detect format and convert
>>> detections = prepare_detections(yolo_results)
>>> tracked = tracker.update(detections)

swarmsort.validate_config(config)[source]

Validate configuration at runtime with detailed error messages.

This is a convenience function for runtime validation that collects all errors instead of raising on the first one.

Parameters:: config (SwarmSortConfig) – SwarmSortConfig instance to validate
Return type:: Tuple[bool, List[str]]
Returns:: Tuple of (is_valid, list_of_error_messages)

Example

>>> config = SwarmSortConfig(max_distance=-10)
>>> is_valid, errors = validate_config(config)
>>> if not is_valid:
...     for error in errors:
...         print(f"Config error: {error}")
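
The "collect all errors" pattern used by validate_config, as opposed to raising on the first problem, looks roughly like the sketch below. collect_config_errors is hypothetical, and the specific rules are assumptions based on the ranges stated in this reference, not the library's actual checks.

```python
def collect_config_errors(max_distance, reid_embedding_threshold, uncertainty_window):
    """Return (is_valid, errors) after checking every rule, not just the first failure."""
    errors = []
    if max_distance <= 0:
        errors.append(f"max_distance must be positive, got {max_distance}")
    if not 0.0 <= reid_embedding_threshold <= 1.0:
        errors.append(
            f"reid_embedding_threshold must be in [0, 1], got {reid_embedding_threshold}"
        )
    if uncertainty_window < 1:
        errors.append(f"uncertainty_window must be >= 1, got {uncertainty_window}")
    return (not errors, errors)
```

Returning the full list lets a caller report every misconfigured parameter in one pass instead of fixing them one exception at a time.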

swarmsort.verify_detections(detections, image_shape=None, auto_fix=False, raise_on_error=False)[source]

Verify and optionally fix detection inputs.

Checks for common issues and can auto-fix them.

Parameters:

detections (List[Detection]) – List of Detection objects to verify
image_shape (Optional[Tuple[int, int]]) – (height, width) to check if detections are within bounds
auto_fix (bool) – If True, attempts to fix issues (clip coords, normalize, etc.)
raise_on_error (bool) – If True, raises exception on critical errors

Return type:

Tuple[List[Detection], List[str]]

Returns:

Tuple of (verified_detections, list_of_warnings)

Example

>>> detections, warnings = verify_detections(detections, image_shape=(720, 1280))
>>> if warnings:
...     print(f"Found {len(warnings)} issues")

swarmsort.yolo_to_detections(yolo_results, image_shape=None, confidence_threshold=0.0, class_filter=None, extract_embeddings=False)[source]

Convert YOLO v8/v11 detection results to SwarmSort Detection format.

Works with both stream=True and stream=False YOLO predictions.

Parameters:

yolo_results (Any) – YOLO results object from model.predict() or model.track():
Single Results object (one frame)
NOT a generator/list of results (call this function inside a loop for that)
image_shape (Optional[Tuple[int, int]]) – (height, width) of the image. If None, extracts from results
confidence_threshold (float) – Minimum confidence to include detection
class_filter (Optional[List[int]]) – List of class IDs to include. None means all classes
extract_embeddings (bool) – If True, attempts to extract visual features if available

Return type:

List[Detection]

Returns:

List of Detection objects ready for SwarmSort tracking

Examples

>>> from ultralytics import YOLO
>>> from swarmsort import SwarmSortTracker, yolo_to_detections
>>> model = YOLO('yolov8n.pt')
>>> tracker = SwarmSortTracker()
>>>
>>> # Option 1: stream=False (loads all frames to memory)
>>> results = model.predict('video.mp4', stream=False)
>>> for result in results:  # Iterate over pre-loaded results
...     detections = yolo_to_detections(result)
...     tracked = tracker.update(detections)
>>>
>>> # Option 2: stream=True (memory efficient, processes one at a time)
>>> results = model.predict('video.mp4', stream=True)
>>> for result in results:  # Generator, loads one frame at a time
...     detections = yolo_to_detections(result)
...     tracked = tracker.update(detections)

swarmsort.yolo_to_detections_batch(yolo_results_list, confidence_threshold=0.0, class_filter=None)[source]

Convert a pre-loaded list of YOLO results to SwarmSort format.

NOTE: This is mainly useful for offline analysis where you want to process all detections first before tracking. For real-time tracking, use yolo_to_detections() in your frame loop instead.

Parameters:

yolo_results_list (List[Any]) – List of YOLO results (pre-loaded, not a generator)
confidence_threshold (float) – Minimum confidence threshold
class_filter (Optional[List[int]]) – Optional class filter

Return type:

List[List[Detection]]

Returns:

List of detection lists (one list per frame)

Example (offline analysis):

>>> # Load all results first (uses more memory)
>>> from ultralytics import YOLO
>>> model = YOLO('yolov8n.pt')
>>> results = model.predict('video.mp4', stream=False)  # All frames in memory
>>> all_detections = yolo_to_detections_batch(results)
>>>
>>> # Now you can analyze detections before tracking
>>> print(f"Total detections: {sum(len(d) for d in all_detections)}")
>>>
>>> # Then track
>>> for frame_detections in all_detections:
...     tracked = tracker.update(frame_detections)

For real-time/streaming, just use yolo_to_detections() directly:

>>> for result in model.predict('video.mp4', stream=True):
...     detections = yolo_to_detections(result)
...     tracked = tracker.update(detections)