Filtering datasets
Use argus-cv filter to create a filtered copy of a dataset containing only specified classes.
Relative output paths are resolved under the dataset root. For example,
argus-cv filter /datasets/coco -o filtered --classes person writes to
/datasets/coco/filtered.
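The path rule above can be illustrated with a short sketch. It is not argus-cv's actual code: `resolve_output` is a hypothetical helper, and leaving absolute paths unchanged is an assumption, since the text only describes relative paths.

```python
from pathlib import PurePosixPath


def resolve_output(dataset_root: str, output: str) -> PurePosixPath:
    """Hypothetical helper: anchor a relative output path at the dataset root."""
    out = PurePosixPath(output)
    # Assumption: an absolute output path is used as given.
    if out.is_absolute():
        return out
    # Relative paths are resolved under the dataset root.
    return PurePosixPath(dataset_root) / out


print(resolve_output("/datasets/coco", "filtered"))
```

With the values from the example command, the helper returns `/datasets/coco/filtered`.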
Basic usage
Filtering to the person and car classes creates a new dataset that contains only those two classes. Class IDs are automatically remapped to sequential values (0, 1, 2, ...).
Filter to a single class
Pass a single class name to --classes. For example, argus-cv filter /datasets/coco -o filtered --classes person keeps only the person class.
Exclude background images
By default, images that have no annotations left after filtering are kept as background images. Pass --no-background to exclude them.
This is useful when you want a dataset with only images that contain your target class.
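The effect of the flag can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the tool's implementation; `filter_annotations` and its `keep_background` parameter are hypothetical names, and annotations are simplified to lists of class names per image.

```python
def filter_annotations(annotations, keep_classes, keep_background=True):
    """Hypothetical sketch: keep only the wanted classes per image.

    annotations maps an image name to the list of class names annotated in it.
    """
    keep = set(keep_classes)
    result = {}
    for image, classes in annotations.items():
        kept = [c for c in classes if c in keep]
        # An image with no remaining annotations is a background image.
        # It is kept by default and dropped when keep_background is False.
        if kept or keep_background:
            result[image] = kept
    return result


data = {"a.jpg": ["person", "dog"], "b.jpg": ["dog"], "c.jpg": []}
with_background = filter_annotations(data, ["person"])
without_background = filter_annotations(data, ["person"], keep_background=False)
```

Here `with_background` still lists all three images (two of them with empty annotation lists), while `without_background` contains only `a.jpg`.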
Use symlinks for faster filtering
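For large datasets, pass --symlinks to create symbolic links to the source images instead of copying them.
Because no image data is duplicated, this saves disk space and makes filtering faster. The links point back to the source dataset, so the source images must stay in place.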
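The difference between the two modes can be sketched with the standard library. `place_image` and `use_symlinks` are hypothetical names chosen for this illustration, which does not claim to mirror how argus-cv writes its output.

```python
import os
import shutil
from pathlib import Path


def place_image(src: Path, dst: Path, use_symlinks: bool = False) -> None:
    """Hypothetical sketch: copy an image or link to it."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    if use_symlinks:
        # Link to the absolute source path so the link resolves
        # regardless of the current working directory.
        os.symlink(src.resolve(), dst)
    else:
        # copy2 duplicates the file contents and preserves metadata.
        shutil.copy2(src, dst)
```

A symlink stores only a path, so creating it costs the same for a 10 KB image as for a 10 MB one. The trade-off is that the link breaks if the source file is moved or deleted.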
Supported formats
The filter command works with all dataset formats:
| Format | Supported | Notes |
|---|---|---|
| YOLO Detection | Yes | Labels remapped to new class IDs |
| YOLO Segmentation | Yes | Polygon annotations preserved |
| YOLO Classification | Yes | Only selected class directories copied |
| YOLO (Roboflow layout) | Yes | Reads {split}/images/ layout, writes standard layout |
| COCO | Yes | Annotations and category IDs remapped |
| Mask | Yes | Pixel values remapped to new class IDs |
Output layout
The output always uses the standard dataset layout, even when the source uses a different directory structure (e.g. Roboflow YOLO). This normalises the output so it can be used directly with training frameworks.
YOLO output:
output/
├── data.yaml
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/
COCO output:
output/
├── annotations/
│   ├── instances_train.json
│   ├── instances_val.json
│   └── instances_test.json
└── images/
    ├── train/
    ├── val/
    └── test/
Class ID remapping
When filtering, class IDs are remapped to start from 0 and be sequential. For example:
| Original | Filtered |
|---|---|
| 0: person | (removed) |
| 1: car | 0: car |
| 2: dog | 1: dog |
| 3: cat | (removed) |
If you filter to keep only car and dog, the new dataset will have car as class 0 and dog as class 1.
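The remapping in the table can be reproduced with a small sketch. It assumes that new IDs follow the original class order (as the table suggests) rather than the order in which classes are passed on the command line. `build_id_map` and `remap_yolo_label` are hypothetical helpers, shown here for YOLO detection label text of the form `class_id x y w h`.

```python
def build_id_map(class_names, keep):
    """Map original class IDs to new sequential IDs, preserving original order."""
    keep_set = set(keep)
    kept_ids = [i for i, name in enumerate(class_names) if name in keep_set]
    return {old: new for new, old in enumerate(kept_ids)}


def remap_yolo_label(text, id_map):
    """Drop lines whose class was removed and renumber the remaining ones."""
    out = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        old_id = int(parts[0])
        if old_id in id_map:
            out.append(" ".join([str(id_map[old_id]), *parts[1:]]))
    return "\n".join(out)


names = ["person", "car", "dog", "cat"]
id_map = build_id_map(names, ["car", "dog"])
label = "0 0.5 0.5 0.2 0.2\n2 0.1 0.1 0.3 0.3\n1 0.4 0.4 0.1 0.1"
remapped = remap_yolo_label(label, id_map)
```

With these inputs, `id_map` maps 1 to 0 and 2 to 1. The person line is dropped from `remapped`, and the dog and car lines are renumbered to 1 and 0.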
Common errors
- "No classes specified": You must provide at least one class name with --classes.
- "Classes not found in dataset": Check the class names match exactly (case-sensitive). Use argus-cv stats to see available classes.
- "Output directory already exists": The output directory must be empty or non-existent.
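The three conditions above amount to a short pre-flight check, sketched below. `check_filter_args` is a hypothetical function, and the order of the checks is an assumption. Only the quoted message prefixes come from the list above.

```python
from pathlib import Path


def check_filter_args(classes, available, output_dir):
    """Hypothetical pre-flight check mirroring the three errors listed above."""
    if not classes:
        raise ValueError("No classes specified")
    # Comparison is exact, so "Person" does not match "person".
    missing = [c for c in classes if c not in available]
    if missing:
        raise ValueError("Classes not found in dataset: " + ", ".join(missing))
    out = Path(output_dir)
    # The output directory must be empty or not exist yet.
    if out.exists() and (not out.is_dir() or any(out.iterdir())):
        raise ValueError(f"Output directory already exists: {out}")
```

Running such a check before any files are written avoids leaving a partially populated output directory behind.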