A DeepSeek Moment in Computer Vision

Zhora Gevorgyan

Lead Computer Vision Engineer & Co-Founder, Scylla Technologies Inc.

Scylla Technologies Inc. is proud to announce a major milestone: our flagship AI model, ScyllaNet, has secured 2nd place on the globally respected COCO (Common Objects in Context) test-dev leaderboard with its submission on September 24, 2025. Achieving a mean Average Precision (mAP) of 0.66, ScyllaNet now stands shoulder-to-shoulder with the world’s top contenders.

This accomplishment marks a defining moment for lightweight AI in security—echoing the disruptive impact of DeepSeek—by proving that compact, efficient models can match or exceed heavyweight systems. It reinforces Scylla’s leadership in real-time object detection and AI-powered security, paving the way for next-generation computer vision deployments across critical infrastructure worldwide.

Precision and Efficiency Redefined

Benchmarked on COCO—the gold standard for object detection—ScyllaNet delivers strong performance across all object scales:

● Large objects: 0.79 mAP, 0.92 AR
● Medium objects: 0.69 mAP
● Small objects: 0.50 mAP, 0.81 AR (max=100)

These results translate directly to real-world impact: ScyllaNet accurately detects intruders, weapons, and abnormal behaviors even in crowded and complex scenes.

Despite its accuracy, ScyllaNet contains just ~40 million parameters and runs 90–200× faster than models like Co-DETR ViT-L (300M–3B parameters). This leap in efficiency shows how lightweight architectures can deliver both speed and accuracy, unlocking broader deployment for security operations at scale.

At the core of this advancement is Scylla’s proprietary SIoU (Scylla-IoU) loss function, which improves bounding box regression by factoring in distance, angle, and shape. This innovation accelerates convergence and boosts accuracy, outperforming models like YOLOv8 in both speed and precision.
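To make the roles of distance, angle, and shape concrete, here is a minimal sketch of an SIoU-style loss for a single pair of boxes. It follows the publicly described SIoU formulation (an IoU term plus angle-aware distance and shape costs). The function name, the (x1, y1, x2, y2) box layout, the exponent theta=4, and the eps guard are assumptions made for illustration; the article does not specify how ScyllaNet's production loss is implemented.

```python
import math

def siou_loss(pred, target, theta=4.0, eps=1e-9):
    """SIoU-style loss for two boxes in (x1, y1, x2, y2) format.

    Illustrative sketch only: theta and eps are assumed values.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offset between box centers and its length.
    dx = (tx1 + tx2) / 2.0 - (px1 + px2) / 2.0
    dy = (ty1 + ty2) / 2.0 - (py1 + py2) / 2.0
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 0 when the centers are axis-aligned, 1 at 45 degrees.
    sin_alpha = min(abs(dy) / sigma, 1.0)
    angle = 1.0 - 2.0 * math.sin(math.asin(sin_alpha) - math.pi / 4.0) ** 2

    # Distance cost, with the angle cost modulating its steepness.
    gamma = 2.0 - angle
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance = (1.0 - math.exp(-gamma * rho_x)) + (1.0 - math.exp(-gamma * rho_y))

    # Shape cost: relative mismatch in width and height.
    omega_w = abs(pw - tw) / (max(pw, tw) + eps)
    omega_h = abs(ph - th) / (max(ph, th) + eps)
    shape = (1.0 - math.exp(-omega_w)) ** theta + (1.0 - math.exp(-omega_h)) ** theta

    return 1.0 - iou + (distance + shape) / 2.0
```

In this sketch a perfectly matched pair of boxes yields a loss close to zero, and the loss grows as the predicted box drifts away from the target or its aspect ratio diverges. A training implementation would typically vectorize the same arithmetic over batches of boxes.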

A Turning Point for AI in Security

Earning 2nd place on the COCO leaderboard is more than a technical success: it is proof that lightweight, efficient AI can lead in an AI video surveillance market expected to reach $28.8 billion by 2030. ScyllaNet challenges traditional thinking around edge AI, demonstrating that real-time monitoring at scale can be achieved for military bases, smart cities, and critical infrastructure.

This milestone positions Scylla Technologies at the forefront of global security innovation.

Benchmark and Evaluation Protocol

We evaluated ScyllaNet on the COCO test-dev 2025 benchmark using the official CodaLab evaluation server. All results are for bounding-box detection and follow the COCO evaluation protocol, computing mAP across IoU thresholds from 0.50 to 0.95 and Average Recall (AR) across multiple scales and detection limits. The model was trained without external data and evaluated with standard test-time augmentation (TTA).
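As a rough illustration of the AP@[.50:.95] averaging described above, the sketch below lists the ten IoU thresholds (0.50 to 0.95 in steps of 0.05) and averages one AP value per threshold. The helper names are hypothetical and the per-threshold AP values in the usage lines are placeholders, not ScyllaNet results. The official evaluation additionally averages over object categories and computes each AP from a precision-recall curve, which is omitted here.

```python
def coco_iou_thresholds():
    """Return the ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]

def ap_50_95(ap_per_threshold):
    """Average per-threshold AP values into a single AP@[.50:.95] score.

    ap_per_threshold must hold one AP value per threshold, in the same
    order as coco_iou_thresholds().
    """
    thresholds = coco_iou_thresholds()
    if len(ap_per_threshold) != len(thresholds):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(thresholds)

# Placeholder values only: AP usually falls as the IoU threshold tightens.
example_aps = [0.80, 0.78, 0.75, 0.72, 0.68, 0.63, 0.56, 0.47, 0.34, 0.15]
example_score = ap_50_95(example_aps)
```

Because the strict thresholds (0.85 and above) count equally with the lenient ones, a detector needs tightly localized boxes, not just correct classifications, to score well on this metric.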

Overall Leaderboard Standing

Rank	Team / Model	AP@[.50:.95]	Parameters / Notes
1	CW_Detection	0.66	Top-ranked overall AP; fewer metric leads than ScyllaNet
2	Scylla Technologies Inc. (ScyllaNet)	0.6612	Leads 6 of 12 metrics; highest average rank across all metrics (~40M params)
3	zongzhuofan (SenseTime Basemodel)	0.66	~300M; Co-DETR-style transformer
4	UBTECH Thinker	0.66	Unknown architecture
5	InternImage-DCNv3 (PJLab & Tsinghua)	0.655	InternImage-H variant; best AP_small & AR_large (~1B params)

Detailed Metric Comparison

ScyllaNet ranks first in AP_medium, AR@10, AR@100, AR_medium, AR_small, and AR@1 (tied), and second in AR_large and AP_large.
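One way to read the "average rank across all metrics" comparison is to rank every model on each metric separately and then average those ranks, where a lower mean rank is better. The sketch below shows that calculation under stated assumptions: the function name is hypothetical, ties use competition ranking, and the model names and scores are toy values chosen for illustration, not leaderboard data.

```python
def mean_ranks(scores):
    """Return {model: mean rank} for scores given as {model: [metric values]}.

    Higher metric values are better. Rank on each metric is 1 plus the
    number of models that score strictly higher, so ties share a rank.
    """
    models = list(scores)
    n_metrics = len(next(iter(scores.values())))
    totals = {m: 0 for m in models}
    for k in range(n_metrics):
        for m in models:
            better = sum(scores[other][k] > scores[m][k] for other in models)
            totals[m] += 1 + better
    return {m: totals[m] / n_metrics for m in models}

# Toy data: three hypothetical models scored on three metrics.
toy_scores = {
    "model_a": [0.70, 0.60, 0.80],
    "model_b": [0.71, 0.58, 0.79],
    "model_c": [0.69, 0.59, 0.78],
}
toy_ranks = mean_ranks(toy_scores)
```

In the toy data, model_b wins the first metric yet model_a ends up with the best mean rank because it leads the other two. This is the kind of situation in which the top entry on a single headline metric and the entry with the best average standing can differ.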

Comparison of COCO test-dev performance: the first row shows ScyllaNet, the second row shows InternImage-H, and the fourth row shows Co-DETR.

Comparative Insights

Compared to CW_Detection:

● Nearly identical overall AP@[.50:.95] (difference < 0.01)
● Slightly lower AP@.50 (–0.01), reflecting a stricter precision regime
● Higher recall across most metrics (+0.01–0.02 on average)

Compared to InternImage and Co-DETR, ScyllaNet achieves similar or better accuracy with 6–8× higher parameter efficiency, thanks to adaptive receptive fields and attention-based feature aggregation—not brute-force scaling.


Conclusion

ScyllaNet’s submission ranks 2nd overall with mAP@[.50:.95] = 66.12%, achieving the highest mean rank across all 12 COCO metrics while using only ~40M parameters. These results underscore ScyllaNet’s architectural efficiency and balanced precision-recall trade-off, establishing it as a new benchmark for lightweight, high-performance object detection.