Key discrepancies across the Vehicle (Pv), Drone (Pd), and Quadruped (Pq) platforms.
We compare the core attributes (viewpoint, speed, stability), event data distributions, and semantic distributions of the event data acquired by the three platforms, highlighting the challenges of adapting event camera perception to diverse operational contexts.
These variations motivate the need for a robust cross-platform adaptation framework to harmonize event-based dense perception across distinct environmental setups and conditions.
Cross-platform adaptation in event-based perception is crucial for deploying event cameras across diverse settings, such as Vehicles, Drones, and Quadrupeds, each with unique motion dynamics, viewpoints, and class distributions. In this work, we introduce EventFly, a framework for robust cross-platform adaptation in event camera perception.
Our approach comprises three key components: (i) Event Activation Prior (EAP), which identifies high-activation regions in the target domain to minimize prediction entropy, fostering confident, domain-adaptive predictions; (ii) EventBlend, a data-mixing strategy that integrates source and target event voxel grids based on EAP-driven similarity and density maps, enhancing feature alignment; and (iii) EventMatch, a dual-discriminator technique that aligns features from source, target, and blended domains for better domain-invariant learning.
To holistically assess cross-platform adaptation abilities, we introduce EXPo, a large-scale benchmark with diverse samples across vehicle, drone, and quadruped platforms. Extensive experiments validate the effectiveness of our approach, demonstrating substantial gains over popular adaptation methods. We hope this work paves the way for adaptive, high-performing event perception across diverse and complex environments.
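To make the EAP component concrete, below is a minimal sketch of entropy minimization restricted to high-activation target regions. Using per-pixel event density as the activation signal and a simple top-ratio selection rule are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def eap_entropy_loss(logits: torch.Tensor, activation: torch.Tensor,
                     top_ratio: float = 0.2) -> torch.Tensor:
    """Entropy minimization on high-activation target regions (illustrative).

    logits:     (B, C, H, W) dense predictions on the target domain.
    activation: (B, H, W) per-pixel event activation, e.g., event density.
    top_ratio:  fraction of pixels treated as high-activation (an assumption).
    """
    probs = F.softmax(logits, dim=1)
    # Per-pixel Shannon entropy of the predictive distribution.
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)  # (B, H, W)

    # Keep only the top-k most active pixels per sample.
    b, h, w = activation.shape
    k = max(1, int(top_ratio * h * w))
    thresh = activation.view(b, -1).topk(k, dim=1).values[:, -1].view(b, 1, 1)
    mask = (activation >= thresh).float()

    # Average entropy over the selected high-activation pixels only.
    return (entropy * mask).sum() / mask.sum().clamp_min(1.0)
```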
Overview of the EventFly framework. Guided by the EAP principle, the source and target event data Vv and Vd are mixed via the EventBlend operation, where the blending mask M is obtained by measuring the similarity between the density maps Dv and D̃d. The features Fv, Fd, and F̃ from the source, target, and blended domains are then used for EventMatch. This facilitates learning an intermediary representation that is both robust (source-aligned) and adaptable (target-sensitive).
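To illustrate the EventBlend operation from the overview above, the sketch below assumes (B, T, H, W) event voxel grids, a density map obtained by summing absolute voxel values over the time bins, and a simple normalized density similarity thresholded at `tau`; the actual mask construction in the paper is EAP-driven, so these choices are stand-ins.

```python
import torch

def density_map(voxel: torch.Tensor) -> torch.Tensor:
    """Per-pixel event density of a (B, T, H, W) voxel grid (assumed layout)."""
    return voxel.abs().sum(dim=1)  # (B, H, W)

def event_blend(v_src: torch.Tensor, v_tgt: torch.Tensor,
                tau: float = 0.5) -> tuple[torch.Tensor, torch.Tensor]:
    """Mix source/target voxel grids with a binary blending mask M.

    M keeps source voxels where the two density maps agree (similarity
    above tau) and takes target voxels elsewhere; the thresholding rule
    here is an illustrative stand-in for the paper's EAP-driven similarity.
    """
    d_src, d_tgt = density_map(v_src), density_map(v_tgt)
    # Normalized per-pixel similarity of the two densities, in [0, 1].
    sim = 1.0 - (d_src - d_tgt).abs() / (d_src + d_tgt).clamp_min(1e-6)
    m = (sim > tau).to(v_src.dtype).unsqueeze(1)   # (B, 1, H, W), broadcast over T
    return m * v_src + (1.0 - m) * v_tgt, m
```

A matching sketch of the EventMatch dual-discriminator layout follows; the head architecture and loss wiring are assumptions, shown only to indicate how features from the source (Fv), target (Fd), and blended (F̃) domains enter the two discriminators.

```python
import torch.nn as nn

class DualDiscriminator(nn.Module):
    """Two lightweight discriminators: source-vs-blended and target-vs-blended.

    Training the encoder to fool both pushes the blended features toward a
    representation that is source-aligned yet target-sensitive (illustrative).
    """
    def __init__(self, feat_dim: int):
        super().__init__()
        def head() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(feat_dim, 64, kernel_size=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 1, kernel_size=1))
        self.disc_src = head()  # separates source features from blended ones
        self.disc_tgt = head()  # separates target features from blended ones

    def forward(self, f_src, f_tgt, f_blend):
        # Logits for the adversarial losses on each domain pair.
        return (self.disc_src(f_src), self.disc_src(f_blend),
                self.disc_tgt(f_tgt), self.disc_tgt(f_blend))
```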
The structure and sequence statistics of the Vehicle, Drone, and Quadruped platforms in our benchmark.
| Platform | Sequence Name | # Frames | Total |
|---|---|---|---|
| Vehicle | horse | 714 | 43,766 |
| | penno_small_loop | 1,102 | |
| | rittenhouse | 9,752 | |
| | ucity_small_loop | 16,867 | |
| | city_hall | 7,453 | |
| | penno_big_loop | 7,878 | |
| Drone | fast_flight_1 | 2,229 | 19,899 |
| | fast_flight_2 | 4,077 | |
| | penno_parking_1 | 2,810 | |
| | penno_parking_2 | 2,713 | |
| | penno_plaza | 1,694 | |
| | penno_cars | 3,073 | |
| | penno_trees | 3,303 | |
| Quadruped | penno_short_loop | 2,942 | 25,563 |
| | skatepark_1 | 2,305 | |
| | skatepark_2 | 1,652 | |
| | srt_green_loop | 1,597 | |
| | srt_under_bridge_1 | 5,083 | |
| | srt_under_bridge_2 | 4,533 | |
| | art_plaza_loop | 3,615 | |
| | rocky_steps | 3,836 | |
Per-class distribution statistics over the 19 semantic classes (Road, Sidewalk, Building, Wall, Fence, Pole, Traffic Light, Traffic Sign, Vegetation, Terrain, Sky, Person, Rider, Car, Truck, Bus, Train, Motorcycle, Bicycle). The distributions of Vehicle, Drone, and Quadruped are denoted by the Green, Red, and Blue colors, respectively.
Definitions of the 19 semantic classes and their types:

| ID | Class | Type |
|---|---|---|
| 0 | road | static |
| 1 | sidewalk | static |
| 2 | building | static |
| 3 | wall | static |
| 4 | fence | static |
| 5 | pole | static |
| 6 | traffic-light | static |
| 7 | traffic-sign | static |
| 8 | vegetation | static |
| 9 | terrain | static |
| 10 | sky | static |
| 11 | person | dynamic |
| 12 | rider | dynamic |
| 13 | car | dynamic |
| 14 | truck | dynamic |
| 15 | bus | dynamic |
| 16 | train | dynamic |
| 17 | motorcycle | dynamic |
| 18 | bicycle | dynamic |
Note: Target is trained with ground truth from the target domain. All scores are given in percentage (%). The best and second best scores under each metric are highlighted in colors.
Method | Acc | mAcc | mIoU | fIoU | ground | building | fence | person | pole | road | sidewalk | vegetation | car | wall | traffic-sign |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Source-Only | 43.69 | 33.81 | 15.04 | 11.81 | 48.71 | 11.57 | 0.92 | 8.42 | 13.33 | 25.48 | 8.18 | 31.51 | 14.88 | 0.04 | 2.41 |
AdaptSegNet | 49.14 | 35.38 | 21.16 | 12.15 | 29.37 | 23.57 | 0.17 | 0.48 | 13.45 | 38.23 | 17.85 | 48.73 | 29.42 | 35.55 | 0.40 |
CBST | 57.95 | 41.18 | 24.31 | 16.02 | 33.05 | 24.43 | 0.00 | 3.08 | 18.24 | 56.32 | 16.84 | 56.15 | 23.61 | 35.65 | 0.00 |
IntraDA | 57.37 | 38.85 | 23.58 | 15.91 | 32.31 | 23.17 | 0.00 | 4.90 | 14.91 | 56.70 | 18.67 | 54.94 | 20.71 | 33.08 | 0.00 |
DACS | 59.81 | 42.01 | 27.07 | 16.14 | 35.16 | 26.12 | 0.18 | 4.11 | 18.49 | 55.64 | 21.74 | 56.81 | 34.69 | 44.73 | 0.05 |
MIC | 63.11 | 45.60 | 28.87 | 17.46 | 41.40 | 25.19 | 0.01 | 10.11 | 22.86 | 59.25 | 20.84 | 58.86 | 33.95 | 44.18 | 0.90 |
PLSR | 64.61 | 45.93 | 29.69 | 17.99 | 42.09 | 30.06 | 0.00 | 9.75 | 23.32 | 62.48 | 20.65 | 60.15 | 31.69 | 44.27 | 2.06 |
EventFly | 69.17 | 48.20 | 32.67 | 20.01 | 46.64 | 30.55 | 1.27 | 10.91 | 25.50 | 67.17 | 24.21 | 61.01 | 41.30 | 44.54 | 6.21 |
Target | 79.57 | 52.25 | 42.90 | 23.30 | 74.48 | 39.40 | 7.10 | 0.33 | 31.67 | 71.96 | 31.64 | 67.87 | 57.51 | 66.14 | 23.79 |
Note: Target is trained with ground truth from the target domain. All scores are given in percentage (%). The best and second best scores under each metric are highlighted in colors.
Method | Acc | mAcc | mIoU | fIoU | ground | building | fence | person | pole | road | sidewalk | vegetation | car | wall | traffic-sign |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Source-Only | 66.59 | 39.73 | 25.15 | 16.52 | 63.01 | 39.26 | 3.88 | 17.88 | 10.12 | 51.67 | 9.27 | 68.02 | 12.35 | 0.24 | 0.99 |
AdaptSegNet | 67.25 | 48.73 | 32.79 | 14.89 | 45.00 | 45.88 | 30.00 | 34.92 | 12.22 | 55.50 | 15.85 | 73.84 | 16.07 | 31.35 | 0.00 |
CBST | 69.25 | 49.58 | 35.06 | 14.95 | 47.39 | 54.68 | 34.27 | 36.83 | 13.78 | 56.15 | 18.13 | 74.23 | 16.18 | 34.06 | 0.00 |
IntraDA | 68.29 | 48.91 | 34.25 | 14.82 | 43.75 | 55.36 | 32.64 | 33.39 | 11.60 | 55.31 | 17.00 | 76.00 | 20.30 | 31.40 | 0.00 |
DACS | 69.55 | 53.88 | 36.51 | 14.66 | 43.72 | 57.27 | 38.43 | 35.42 | 14.02 | 57.10 | 18.43 | 76.16 | 24.79 | 36.21 | 0.00 |
MIC | 70.78 | 49.22 | 36.93 | 15.60 | 51.71 | 51.73 | 33.54 | 38.10 | 9.44 | 54.27 | 20.74 | 74.40 | 29.79 | 41.78 | 0.70 |
PLSR | 70.91 | 53.65 | 37.57 | 15.25 | 49.04 | 53.28 | 37.54 | 36.64 | 12.91 | 57.60 | 25.29 | 75.92 | 24.92 | 39.85 | 0.24 |
EventFly | 73.42 | 54.14 | 40.05 | 15.78 | 50.07 | 61.33 | 39.17 | 41.97 | 12.83 | 59.14 | 23.51 | 79.80 | 27.26 | 42.65 | 2.86 |
Target | 80.02 | 60.55 | 49.84 | 19.58 | 74.80 | 56.23 | 46.08 | 55.28 | 21.79 | 59.90 | 30.31 | 77.24 | 58.38 | 62.47 | 5.81 |
@inproceedings{kong2025eventfly,
title = {EventFly: Event Camera Perception from Ground to the Sky},
author = {Lingdong Kong and Dongyue Lu and Xiang Xu and Lai Xing Ng and Wei Tsang Ooi and Benoit R. Cottereau},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025},
}
This work is part of the programme DesCartes and is supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.
This work is also supported by the Apple Scholars in AI/ML Ph.D. Fellowship program.