Introduction
R1. Guarantee of timely execution , R2. High tracking accuracy
No study has achieved both for multiple MOT tasks to be applied to a vision system with multiple cameras.
Key questions addressed
- How can we design the system architecture of RT-MOT to provide a control knob to explore a trade-off between R1 and R2?
- How can we efficiently utilize the proposed system architecture to achieve R1 and R2?
Contributions
- Motivate
the importance of choosing a proper pair of detection and association models to explore a trade-off between R1 and R2.
- Propose the first system design
RT-MOT, which addresses R1 and R2 for multiple MOT tasks.
- Re-define and estimate a measure, which enables to predict tracking accuracy variation to be used in the scheduling framework.
- Develop a novel
confidence-aware real-time scheduling framework for RT-MOT, which offers both offline and online timing guarantees with flexible execution
- Demonstrate the effectiveness of RT-MOT through experiment on an actual computing system.
Target System: DNN-based Multi-Object Tracking

Trade-off between Execution Time and Tracking Accuracy
- High-confidence detection (DH) : processing a full-size frame
- Low-confidence detection (DL) : processing a partial portion of a frame
- High-confidence association (AH) : performing both feature- and IoU-based methods
- Low-confidence association (AL) : performing the IoU-based method only

Observations
- There exists a
trade-off between execution time and tracking accuracy in choosing the ratio of high-confidence detection/association.
- Tracking accuracy
varies greatly with different combinations of a choice of detection and association schemes, although the combinations yield similar computation time.
Challenges
- There exists
a huge number of possible combinations of the choices of detection and association schemes for a frame sequence.
- The tracking accuracy of one combination also dynamically varies with the input frame sequence.
- + Multiple tasks
System Goal and Overview
RT-MOT
- Supports dynamic selection of different execution models for detection and association.
- Run-time frame-level scheduling decisions
Key issues
- I1) How to
estimate the variation of overall accuracy according to different detector/tracker selections of the next frame?
- I2) How to
provide an offline timing guarantee to multiple MOT tasks while maximizing the overall accuracy at runtime using the answer of I1?
- I3) How to
design the system architecture that supports flexible tracking-by-detection and provides an interface that can accommodate the answer of I1 and I2?
Design of RT-MOT
1. Dynamic tracking-by-detection execution pipeline
- High-confidence detector (DH) : process a full-size frame (672 Γ 672)
- Low-confidence detection (DL) : process a RoI of a frame (256 Γ 256)
- High-confidence association (AH) : feature-based + IoU-based
- Low-confidence association (AL) : IoU-based only

2. Multi-object tracking confidence estimator
- Tracklet confidence: Ξ©(Οitβ)=Ξ©Mβ(Mitβ)ΓΞ©Aβ(Aitβ)
- Motion confidence(Ξ©Mβ(Mitβ)): specified as position, size, and velocity.
- Appearance confidence(Ξ©Aβ(Aitβ)): feature vector extracted by feature extractor.
3. Frame-level flexible scheduler
- The scheduler determines each frameβs priority and its pair of detector and tracker according to non-preemptive fixed-priority
with the minimum execution by default.
- Considering the currently available running time, prioritize and select detection and association models that
provide higher accuracy.
Tracking confidence
Definition of Tracklet
Object, Oitβ
- the i-th object response detected at the t-th frame
- Oitβ=(Mitβ,Aitβ) : motion state, appearance state
Tracklet, Οitβ
- a set of tracks followed by Oitβ up to the t-th frame
- Οitβ={Oikββ£tsiββ€kβ€teiββ€t}
Tracklets set, Ξ¦1:tβ
- A set of tracklets of all objects up to the t-th frame.
Tracklet confidence definition
Tracklet confidence, Ξ©(Οitβ)
- Ξ©(Οitβ)=Ξ©Mβ(Mitβ)ΓΞ©Aβ(Aitβ)
Motion confidence, Ξ©Mβ(Mitβ)
- specified as position, size, and velocity.
Appearance confidence, Ξ©Aβ(Aitβ)
- feature vector extracted by feature extractor.
Updating tracklet confidence
CG1. Οitβ is matched with one of the detected objects by high-confidence association.
Ξ©Mβ(Mitβ)=Ξ©Aβ(Aitβ)=1(1a)
CG2. Οitβ is matched with one of the detected objects by low-confidence association.
Ξ©Mβ(Mitβ)=1,Ξ©Aβ(Aitβ)=[Ξ©Aβ(Aitβ1β)ΓΞAitβ1β]0β(1b)
CG3. Οitβ is unmatched with any of the detected objects.
Ξ©Mβ(Mitβ)=[Ξ©Mβ(Mitβ1β)ΓΞMitβ1β]0β,Ξ©Aβ(Aitβ)=[Ξ©Aβ(Aitβ1β)ΓΞAitβ1β]0β(1c)
Calculating variation of confidences
-
Motion confidence variation (ΞMitβ1β)
ΞMtβ1_i=Ξsβ(Mtβf_i,Mtβ1_i)ΓΞvβ(Mtβf_i,Mtβ1_i)(2)
where
Ξsβ(Mitβfβ,Mitβ1β)=β41β(hitβfβ+hitβ1βhitβfββhitβ1ββ+witβfβ+witβ1βwitβfββwitβ1ββ)+21β(3)
Ξvβ(Mitβfβ,Mitβ1β)=1β2β£β£β£β£β£β£βΟ(vxitβfβ+vxitβ1βvxitβfββvxitβ1ββ+vyitβfβ+vyitβ1βvyitβfββvyitβ1ββ)β21ββ£β£β£β£β£β£β(4)
-
Appearance confidence variation (ΞAitβ1β)
ΞAitβ1β=Ξaβ(Aitβgβ,Aitβ1β)=β₯Aitβgββ₯β₯Aitβ1ββ₯Aitβgββ
Aitβ1ββ(5)

Tracklet Confidence Prediction for next frame
Confidence Prediction Model
- (DHβ, AHβ) : all tracklets belong to CG1.
- (DLβ, AHβ) : a subset of tracklets in RoI belongs to CG1, and the rest of tracklets outside RoI belongs to CG3.
- (DHβ, ALβ) : all tracklets belong to CG2.
- (DLβ, ALβ) : a subset of tracklets in RoI belongs to CG2, and the rest of tracklets outside RoI belongs to CG3.
Expected Confidence Score Calculation
Ξ©ΛΟkββ(Ξ¦1:t+1β,(Dxβ,Ayβ))=β£Ξ¦1:t+1ββ£βΟit+1ββΞ¦1:t+1ββΞ©Λ(Οit+1β)β(6)
Measured Confidence Score Calculation
Ξ©Οkββ(Ξ¦1:tβ)=β£Ξ¦1:tββ£βΟit+1ββΞ¦1:tββΞ©(Οitβ)β(7)
Confidence Improvement Calculation
ΞΞ©ΛΟkββ=ΞΞ©ΛΟkββ(Ξ¦1:t+1β,(Dxβ,Ayβ))=Ξ©ΛΟkββ(Ξ¦1:t+1β,(Dxβ,Ayβ))βΞ©Οkββ(Ξ¦1:tβ)(8)
Scheduling Framework for RT-MOT
Scheduling Framework, NPFPflex
-
First, NPFPflex shares the existing offline schedulability test for NPFPmin, which offers timely execution of every instance (i.e. job) of a set of multi-object tracking tasks.
-
Second, NPFPflex checks the feasibility for each active job to be executed beyond its minimum execution requirement, without compromising the schedulability of any future jobs to executed according to NPFPmin.
-
Third, NPFPflex chooses to execute the job that yields the largest expected improvement of confidence score
Task Model
Periodic task model
- Οiβ=(Tiβ,Ciβ)
- Ciβ=CiDβ+CiAβ (detection + association)
Detection part
- cpre: the WCET for RoI identification and image cropping.
- cinter(LΒ orΒ H): the total inference time for YOLOv5 that has the two different WCETs for low-confidence (L) and high-confidence (H) detection.
- CiDβ(LΒ orΒ H)=cpre+ciinferβ(LΒ orΒ H)
Association part
- ciasβ(LΒ orΒ H): the WCET for low-confidence (L) and high-confidence (H) association.
- ciIoUβ: the WCET for a simple IoU-based matching algorithm.
- cicascadeβ: the WCET for feature-based method.
- ciasβ(H)=ciIoUβ+cicascadeβ, ciasβ(L)=ciIoUβ
- cpost: the WCET for updating the confidence of all tracklets in each task.
- CiAβ(LΒ orΒ H)=ciasβ(LΒ orΒ H)+cpost
Combinations
- {CiLLβ,CiLHβ,CiHLβ,CiHHβ}
Base Scheduling Algorithm
NPFPmin
Same as the traditional non-preemptive fixed-priority scheduling with Ciβ=CiLLβ for every ΟiββΟ.
Lemma 1
Suppose that a task set Ο is scheduled by the NPFPmin scheduling algorithm.
If every task ΟiββΟ satisfies Eq.(9), every job invoked by tasks in Ο cannot miss its deadline.
Riββ€Tiβ,(9)
where Riβ, the worst-case response time of Οiβ, is calculated by finding Riβ(x+1)=Riβ(x) through iteration from Riβ(0)=CiLLβ+maxΟjβββLP(Οiβ)CjLLβ in Eq.(10).
Riβ(x+1)=CiLLβ+ΟjββLP(Οiβ)maxβCjLLβ+ΟhββHP(Οiβ)βββThβRiβ(x)βββ
ChLLβ(10)
Novel Scheduling Framework for RT-MOT
NPFPflex
To guarantee the feasibility of a job, we need to guarantee
- No deadline miss of Jkβ if it starts its execution for Ckβ at t0β.
- No deadline miss of all future jobs to be executed after Jkβ according to NPFPmin.
If it's deemed feasible to execute Jkβ for given Ckβ,
- we calculate ΞΞ©ΛΟkββ
- set flex to T.
If flex is T, find the largest ΞΞ©ΛΟkββ
otherwise, follow NPFPmin.
Online feasibility test
- Ziβ(t): the existence of an active job of Οiβ at t.
- riβ(t): the earliest release time of any job of Οiβ after or at t.
(if Ziβ(t)=T, then the deadline of a job of Οiβ)
Lemma 2
Suppose that we start to execute a job of Οkβ(denoted by Jkβ) at t0β for at most Ckβ.
Then, if Eq.(11) holds, Jkβ cannot miss its deadline.
Ckββ€rkβ(t0β)βt0β(11)
Lemma 3
Suppose that
(i) We start to execute an active job of Οkβ (denoted by Jkβ) at t0β for at most Ckβ
(ii) All jobs to be executed after Jkββs execution are scheduled by NPFPmin
If Eq.(12) holds, the earliest job of a given Οjβ with Zjβ(t0β)=T to be executed after Jkββs execution (denoted by Jjβ) cannot miss its deadline. (Note that Οjβ can be Οkβ.)
CjLLββ+Ckβ+ΟhββHP(Οjβ)β{Οkβ}β£Zhβ(t0β)=TββChLLβ+ΟhββHP(Οjβ)β£rhβ(t0β)<rjβ(t0β)βββThβrjβ(t0β)βrhβ(t0β)ββChLLββ€rjβ(t0β)βt0ββ(12)
Lemma 4
Suppose that
(i) We start to execute an active job of Οkβ (denoted by Jkβ) at t0β for at most Ckβ
(ii) All jobs to be executed after Jkββs execution are scheduled by NPFPmin
(iii) Eq.(9) holds for every ΟiββΟ.
If Eq.(13) holds, the earliest job of a given Οjβ with Zjβ(t0β)=F to be executed after Jkββs execution (denoted by Jjβ) cannot miss its deadline. (Note that Οjβ cannot be Οkβ.)
CjLLββ+Ckβ+ΟhββHP(Οjβ)β{Οkβ}β£Zhβ(t0β)=TββChLLβ+ΟhββHP(Οjβ)β£rhβ(t0β)<rjβ(t0β)+TjββββThβrjβ(t0β)+Tjββrhβ(t0β)ββChLLββ€rjβ(t0β)+Tjββt0ββ(13)
Lemma 5
Suppose that
(i) a job of Οjβ (denoted by Jjβ ) starts its execution at t1β and does not miss its deadline if it executes for up to CjLLβ,
(ii) all jobs to be executed after Jjβ βs execution are scheduled by NPFPmin, and
(iii) Eq.(9) holds for every ΟiββΟ .
Then, any job of Οjβ to be executed after Jjβ βs execution cannot miss its deadline.
Theorem 1
Suppose that a task set Ο is scheduled by NPFPflex in Algorithm 1.
If every task ΟiββΟ satisfies Eq.(9), every job invoked by tasks in Ο cannot miss its deadline.
Experimental Setup
Hardware and Software
- Intel(R) Xeon(R) Silver 4215R CPU @ 3.20GHz, 251.5GB RAM, NVIDIA V100 GPU.
- Ubuntu 18.04.4 with CUDA 10.2, and PyTorch 1.10.2
- YOLOv5 as a front-end detector.
- Waymo Open dataset.
- DeepSORT, SORT for association.
Experimental Setup
Execution time profiling and run-time overhead
| Time(ms) | cpre | cinfer(L) | cinfer(H) | cas(L) | cas(H) | cpost |
|---|
| Average | 0.6 | 12.6 | 13.1 | 3.2 | 23.4 | 0.7 |
| Maximum | 0.9 | 17.6 | 23.2 | 9.6 | 32.7 | 0.9 |
- cpre: the WCET for RoI identification and image cropping.
- cinter(LΒ orΒ H): the total inference time for YOLOv5 that has the two different WCETs for low-confidence (L) and high-confidence (H) detection.
- ciasβ(LΒ orΒ H): the WCET for low-confidence (L) and high-confidence (H) association.
- ciasβ(H)=ciIoUβ+cicascadeβ, ciasβ(L)=ciIoUβ
- cpost: the WCET for updating the confidence of all tracklets in each task.
Experimental Results
MOTA
MOTA=1ββtβgtββtβ(mtβ+fptβ+mmetβ)β
- gtβ: the number of all objects for time t
- mtβ: the number of missed for time t
- fptβ: the number of false positive for time t
- mmetβ: the number of mismatches for time t
Three versions
- RT-MOTmin employs NPFPmin.
- RT-MOTflex-NPI employs NPFPflex but does not allow priority inversion.
- RT-MOTflex employs NPFP$^\textrm{flex} $ in Algorithm 1 as it is.
Fixed-Priority policy
Popular RT-MOT versions
H+SORT employs YOLOv5 with the original frame size (672Γ672) for detection and SORT for association.
L+DeepSORT employs YOLOv5 with the down-scaled frame size (256Γ256) for detection and DeepSORT for association.
H+DeepSORT employs YOLOv5 with the original frame size (672Γ672) for detection and DeepSORT for association.


| Subject | Related works |
|---|
| One-Stage MOT | FairMOT, ByteTrack |
| Two-Stage MOT | DeepSORT, StrongSORT |
| RT Detection | DNN-SAM, AnytimeNet |
| RT Scheduling | SubFlow, Real-Time Multi-Path |
| Single MOT Task | Self-cueing Attention |
Conclusion
RT-MOT
- a method to
estimate the overall accuracy variation according to different detector/tracker selections
- a scheduling framework that provides
offline timing guarantees while maximizing overall accuracy at run-time using the method
- a
system architecture that supports the framework.
Future Works
- More combinations of various detectors and trackers
- Improve the scheduling framework in terms of schedulability performance.
(e.g., by allowing a preemption between the completion of detection and the beginning of association for each MOT task.)