Scylla Tops the COCO Detection Challenge: A Milestone in Modern Object Detection

Zhora Gevorgyan

Lead Computer Vision Engineer & Co-Founder, Scylla Technologies Inc.

In a field defined by incremental gains and fierce global competition, achieving first place on a benchmark like the COCO Detection Challenge is not just a technical success but a statement about the maturity and originality of an entire research direction. Zhora Gevorgyan, Lead Computer Vision Engineer & Co-Founder of Scylla Technologies Inc., has secured 1st place in the COCO Detection Challenge (Bounding Box) hosted on CodaLab.

What is CodaLab and Why It Matters

CodaLab is a widely used piece of infrastructure in modern AI research: an open-source platform for hosting scientific competitions in which researchers submit models or predictions and are evaluated on standardized benchmarks.

CodaLab enables:

● Reproducible evaluation pipelines
● Blind testing on hidden datasets
● Public leaderboards that reflect real performance

This structure makes it a trusted arbiter of progress in machine learning, particularly in domains like computer vision, NLP, and multimodal AI. A win on a CodaLab-hosted challenge is therefore not a marketing claim but a result measured under the same hidden-test conditions as every other entry.

Understanding the COCO Bounding Box Challenge

The COCO (Common Objects in Context) dataset is the de facto standard for object detection. The bounding box challenge specifically evaluates how accurately an algorithm can localize objects using rectangular boxes.

Key characteristics of the challenge:

● Models must detect and classify multiple objects per image
● Performance is measured using mean Average Precision (mAP) across IoU thresholds
● The dataset includes complex, real-world scenes with occlusion, scale variation, and clutter
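To make the metric concrete, the sketch below computes IoU (intersection over union) for two boxes and lists which of the standard COCO IoU thresholds (0.50 to 0.95 in steps of 0.05) a single prediction would satisfy. It is a minimal illustration, not the official COCO evaluation code: the function names `box_iou` and `thresholds_passed` and the corner-format boxes are assumptions made for this example, and a full mAP computation would additionally rank detections by confidence and average precision over recall levels and classes.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO averages AP over these ten IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Thresholds at which `pred` would count as a match for `gt`."""
    iou = box_iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    prediction = (0, 0, 10, 10)   # hypothetical predicted box
    ground_truth = (0, 0, 9, 8)   # hypothetical annotated box
    print(box_iou(prediction, ground_truth))
    print(thresholds_passed(prediction, ground_truth))
```

A box that is only roughly right passes the loose thresholds and fails the strict ones, which is why averaging across thresholds rewards precise localization rather than mere detection.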

Bounding box detection is fundamental: it is the backbone of surveillance, autonomous systems, retail analytics, and defense applications. Excelling here means mastering localization, classification, and generalization simultaneously.

ScyllaNet: Engineering for Real-World Performance

At the core of this achievement lies ScyllaNet, Scylla’s proprietary object detection architecture. Unlike many academic models optimized purely for benchmarks, ScyllaNet is designed for production-grade video analytics, where latency, robustness, and false-positive control are critical.

ScyllaNet’s distinguishing characteristics include:

● High inference speed while maintaining strong accuracy
● Generalization across diverse environments, including unusual perspectives and crowded scenes
● Tight integration with real-time security workflows

This balance between efficiency and precision is essential. Many models perform well in lab conditions but degrade in deployment; ScyllaNet explicitly addresses this gap.

Comparison of COCO test-dev performance: the first row shows ScyllaNet, the second row shows InternImage-H, and the third row shows Co-DETR.

SIoU (Scylla-IoU): Rethinking Bounding Box Regression

A major contributor to ScyllaNet’s success is the SIoU (Scylla-IoU) loss function.

Traditional IoU-based loss functions (e.g., IoU, GIoU, CIoU) optimize some combination of overlap, center distance, and aspect ratio, but they ignore one critical factor: directionality.

SIoU introduces a key innovation:

It incorporates the angle of the line connecting the centers of the predicted and ground-truth boxes, effectively encoding the direction in which the prediction should move toward the target, not just how far.

This leads to:

● Faster convergence during training
● Reduced “wandering” of bounding boxes
● Higher localization accuracy

Empirically, SIoU has demonstrated measurable gains on COCO benchmarks, improving mAP over prior loss functions.

From a theoretical standpoint, SIoU reduces unnecessary degrees of freedom in optimization and aligns gradient updates with geometrically meaningful directions, an elegant solution to a long-standing inefficiency in object detection training.
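The angle-aware idea described in this section can be sketched in a few lines of plain Python. The sketch follows the SIoU formulation as it is commonly reimplemented from the published paper (an angle cost that modulates a distance cost, plus a shape cost and the usual IoU term); it is not confirmed to match ScyllaNet's production code. The function names `angle_cost` and `siou_loss`, the `(cx, cy, w, h)` box format, and the default `theta=4.0` are assumptions made for this example.

```python
import math


def _corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def angle_cost(pred, gt, eps=1e-9):
    """Angle term: 0 when the centers are axis-aligned, 1 at 45 degrees."""
    dx, dy = gt[0] - pred[0], gt[1] - pred[1]
    sigma = math.hypot(dx, dy) + eps          # distance between centers
    sin_alpha = min(1.0, abs(dy) / sigma)     # sine of the center-line angle
    return 1.0 - 2.0 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2


def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """SIoU-style loss for two boxes in (cx, cy, w, h) format (sketch)."""
    px1, py1, px2, py2 = _corners(pred)
    gx1, gy1, gx2, gy2 = _corners(gt)

    # Plain IoU term.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter + eps
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1) + eps
    ch = max(py2, gy2) - min(py1, gy1) + eps

    # Distance cost, weighted by the angle cost through gamma.
    gamma = 2.0 - angle_cost(pred, gt, eps)
    rho_x = ((gt[0] - pred[0]) / cw) ** 2
    rho_y = ((gt[1] - pred[1]) / ch) ** 2
    distance = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: relative mismatch in width and height.
    omega_w = abs(pred[2] - gt[2]) / (max(pred[2], gt[2]) + eps)
    omega_h = abs(pred[3] - gt[3]) / (max(pred[3], gt[3]) + eps)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1.0 - iou + 0.5 * (distance + shape)


if __name__ == "__main__":
    target = (3.0, 3.0, 2.0, 2.0)  # hypothetical ground-truth box
    print(siou_loss((0.0, 0.0, 2.0, 2.0), target))  # diagonal offset
    print(siou_loss((0.0, 3.0, 2.0, 2.0), target))  # horizontal offset only
```

In this sketch the angle term does not appear as a separate summand; it enters through `gamma`, which changes how strongly the center offsets are penalized depending on how the offset is oriented relative to the axes.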

Final Takeaway

The first-place ranking in the COCO Detection Challenge on CodaLab is a landmark achievement in computer vision. It represents the convergence of novel theory (SIoU), robust system design (ScyllaNet), and rigorous benchmarking (COCO/CodaLab).

In an era where marginal gains are hard-won, this result stands out not just as a win but as a clear signal of where object detection is heading next. Most importantly, it highlights a shift: innovation is no longer confined to large research labs. Focused teams that combine strong theoretical insight with practical constraints can redefine state-of-the-art performance.