
Scylla Tops the COCO Detection Challenge: A Milestone in Modern Object Detection

Zhora Gevorgyan
Lead Computer Vision Engineer & Co-Founder, Scylla Technologies Inc.
In a field defined by incremental gains and fierce global competition, achieving first place on a benchmark like the COCO Detection Challenge is not just a technical success but a statement about the maturity and originality of an entire research direction. Zhora Gevorgyan, Lead Computer Vision Engineer & Co-Founder of Scylla Technologies Inc., has secured 1st place in the COCO Detection Challenge (Bounding Box) hosted on CodaLab.
What is CodaLab and Why It Matters
CodaLab is a widely used piece of infrastructure in modern AI research. It is an open-source platform for hosting scientific competitions, in which researchers submit models or predictions that are then evaluated on standardized benchmarks.
CodaLab enables:
● Reproducible evaluation pipelines
● Blind testing on hidden datasets
● Public leaderboards that reflect real performance
This structure makes it a trusted arbiter of progress in machine learning, particularly in domains like computer vision, NLP, and multimodal AI. Winning a CodaLab-hosted challenge is therefore not marketing but peer-validated technical leadership.
Understanding the COCO Bounding Box Challenge
The COCO (Common Objects in Context) dataset is the de facto standard for object detection. The bounding box challenge specifically evaluates how accurately an algorithm can localize objects using rectangular boxes.
Key characteristics of the challenge:
● Models must detect and classify multiple objects per image
● Performance is measured using mean Average Precision (mAP), averaged across IoU thresholds from 0.50 to 0.95 in steps of 0.05
● The dataset includes complex, real-world scenes with occlusion, scale variation, and clutter
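To make the metric concrete, here is a minimal sketch of the IoU computation and the COCO-style threshold sweep. It assumes boxes in (x1, y1, x2, y2) corner format; the helper names `box_iou` and `thresholds_passed` are hypothetical, and the sketch covers only the matching criterion, not the full precision-recall integration performed by the official evaluation.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO's primary metric averages AP over these ten IoU thresholds.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Thresholds at which `pred` would count as a match for `gt`."""
    iou = box_iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    pred = (1, 0, 11, 10)  # shifted one unit to the right
    print(box_iou(pred, gt))
    print(thresholds_passed(pred, gt))
```

A prediction that is only slightly misplaced still fails the strictest thresholds, which is why precise localization matters so much for the final score.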
Bounding box detection is fundamental: it is the backbone of surveillance, autonomous systems, retail analytics, and defense applications. Excelling here means mastering localization, classification, and generalization simultaneously.
ScyllaNet: Engineering for Real-World Performance
At the core of this achievement lies ScyllaNet, Scylla’s proprietary object detection architecture. Unlike many academic models optimized purely for benchmarks, ScyllaNet is designed for production-grade video analytics, where latency, robustness, and false-positive control are critical.
ScyllaNet’s distinguishing characteristics include:
● High inference speed while maintaining strong accuracy
● Generalization across diverse environments, including unusual perspectives and crowded scenes
● Tight integration with real-time security workflows
This balance between efficiency and precision is essential. Many models perform well in lab conditions but degrade in deployment; ScyllaNet explicitly addresses this gap.

Comparison of COCO test-dev performance: the first row shows ScyllaNet, the second row shows InternImage-H, and the third row shows Co-DETR.
SIoU (Scylla-IoU): Rethinking Bounding Box Regression
A major contributor to ScyllaNet’s success is the SIoU (Scylla-IoU) loss function.
Traditional IoU-based loss functions (e.g., IoU, GIoU, CIoU) optimize overlap, center distance, and aspect ratio, but they ignore one critical factor: directionality.
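For reference, the earlier losses can be sketched as follows. This is an illustrative sketch of the standard GIoU and CIoU definitions, assuming (x1, y1, x2, y2) boxes with positive width and height; the function names are hypothetical. Note that neither loss contains an explicit angle term.

```python
import math


def _iou_and_enclosure(a, b):
    """Return (IoU, union area, enclosing-box width, enclosing-box height)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union, union, cw, ch


def giou_loss(pred, gt):
    """GIoU loss: IoU penalized by the empty share of the enclosing box."""
    iou, union, cw, ch = _iou_and_enclosure(pred, gt)
    enclose = cw * ch
    return 1.0 - (iou - (enclose - union) / enclose)


def ciou_loss(pred, gt):
    """CIoU loss: IoU + normalized center distance + aspect-ratio term."""
    iou, _, cw, ch = _iou_and_enclosure(pred, gt)
    # Squared center distance over the squared enclosing-box diagonal.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    rho2, c2 = dx * dx + dy * dy, cw * cw + ch * ch
    # Aspect-ratio consistency term and its trade-off weight.
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1.0 - iou + rho2 / c2 + alpha * v
```

Both functions return 0 for a perfect match and grow as the boxes separate, but the center offset enters only through its length, never through an explicit angle.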

SIoU introduces a key innovation:
It incorporates the angle of the vector connecting the centers of the predicted and ground-truth boxes, effectively encoding in which direction the prediction should move toward the target, not just how far.
This leads to:
● Faster convergence during training
● Reduced “wandering” of bounding boxes
● Higher localization accuracy
Empirically, SIoU has demonstrated measurable gains on COCO benchmarks, improving mAP over prior loss functions.
From a theoretical standpoint, SIoU reduces unnecessary degrees of freedom in optimization and aligns gradient updates with geometrically meaningful directions, an elegant solution to a long-standing inefficiency in object detection training.
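The idea can be illustrated with a minimal, single-box sketch. It assumes the formulation commonly given for SIoU, in which an angle cost modulates a distance cost and a separate shape cost is added to the IoU term. The box layout (x1, y1, x2, y2), the function name `siou_loss`, and the default `theta=4.0` are assumptions made for illustration; this is not ScyllaNet's production code.

```python
import math


def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """Sketch of an SIoU-style loss for two (x1, y1, x2, y2) boxes.

    Assumes both boxes have positive width and height.
    """
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]

    # Plain IoU term.
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    iou = inter / (w * h + wg * hg - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # Offset between box centers and its length.
    dx = (gt[0] + gt[2]) / 2 - (pred[0] + pred[2]) / 2
    dy = (gt[1] + gt[3]) / 2 - (pred[1] + pred[3]) / 2
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 0 when the offset is axis-aligned, 1 at 45 degrees.
    sin_alpha = min(1.0, abs(dy) / sigma)
    angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, modulated by the angle cost through gamma.
    gamma = 2 - angle
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    dist = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: relative mismatch in width and height.
    omega_w = abs(w - wg) / max(w, wg)
    omega_h = abs(h - hg) / max(h, hg)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    print(siou_loss((5, 0, 15, 10), gt))  # offset along the x-axis only
    print(siou_loss((5, 5, 15, 15), gt))  # offset along the diagonal
```

In this sketch the angle cost vanishes when the center offset lies along one axis and peaks at 45 degrees, so the direction of the offset changes how strongly the distance cost is weighted.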

Final Takeaway
The first-place ranking in the COCO Detection Challenge on CodaLab is a landmark achievement in computer vision. It represents the convergence of novel theory (SIoU), robust system design (ScyllaNet), and rigorous benchmarking (COCO/CodaLab).
In an era where marginal gains are hard-won, this result stands out not just as a win, but as a clear signal of where object detection is heading next. But most importantly, it highlights a shift: innovation is no longer confined to large research labs. Focused teams with strong theoretical insight and practical constraints can redefine state-of-the-art performance.