πŸ“¦ RT-MOT: Confidence-Aware Real-Time Scheduling Framework for Multi-Object Tracking Tasks

BardΒ·2025λ…„ 2μ›” 24일

RTCL

λͺ©λ‘ 보기
4/15
post-thumbnail

Introduction

R1. Guarantee of timely execution , R2. High tracking accuracy

No study has achieved both for multiple MOT tasks to be applied to a vision system with multiple cameras.

Key questions addressed

  • How can we design the system architecture of RT-MOT to provide a control knob to explore a trade-off between R1 and R2?
  • How can we efficiently utilize the proposed system architecture to achieve R1 and R2?

Contributions

  • Motivate the importance of choosing a proper pair of detection and association models to explore a trade-off between R1 and R2.
  • Propose the first system design RT-MOT, which addresses R1 and R2 for multiple MOT tasks.
  • Re-define and estimate a measure, which enables to predict tracking accuracy variation to be used in the scheduling framework.
  • Develop a novel confidence-aware real-time scheduling framework for RT-MOT, which offers both offline and online timing guarantees with flexible execution
  • Demonstrate the effectiveness of RT-MOT through experiment on an actual computing system.

Target System: DNN-based Multi-Object Tracking

Trade-off between Execution Time and Tracking Accuracy

Experiment inputs

  • High-confidence detection (DHD^H) : processing a full-size frame
  • Low-confidence detection (DLD^L) : processing a partial portion of a frame
  • High-confidence association (AHA^H) : performing both feature- and IoU-based methods
  • Low-confidence association (ALA^L) : performing the IoU-based method only

Observations

  • There exists a trade-off between execution time and tracking accuracy in choosing the ratio of high-confidence detection/association.
  • Tracking accuracy varies greatly with different combinations of a choice of detection and association schemes, although the combinations yield similar computation time.

Challenges

  • There exists a huge number of possible combinations of the choices of detection and association schemes for a frame sequence.
  • The tracking accuracy of one combination also dynamically varies with the input frame sequence.
  • + Multiple tasks

System Goal and Overview

RT-MOT

  • Supports dynamic selection of different execution models for detection and association.
  • Run-time frame-level scheduling decisions

Key issues

  • I1) How to estimate the variation of overall accuracy according to different detector/tracker selections of the next frame?
  • I2) How to provide an offline timing guarantee to multiple MOT tasks while maximizing the overall accuracy at runtime using the answer of I1?
  • I3) How to design the system architecture that supports flexible tracking-by-detection and provides an interface that can accommodate the answer of I1 and I2?

Design of RT-MOT

1. Dynamic tracking-by-detection execution pipeline

  • High-confidence detector (DHD^H) : process a full-size frame (672 Γ—\times 672)
  • Low-confidence detection (DLD^L) : process a RoI of a frame (256 Γ—\times 256)
  • High-confidence association (AHA^H) : feature-based + IoU-based
  • Low-confidence association (ALA^L) : IoU-based only

2. Multi-object tracking confidence estimator

  • Tracklet confidence: Ξ©(Ο‡it)=Ξ©M(Mit)Γ—Ξ©A(Ait)\Omega(\chi^t_i) = \Omega_M (M^t_i) \times \Omega_A (A^t_i)
  • Motion confidence(Ξ©M(Mit)\Omega_M(M^t_i)): specified as position, size, and velocity.
  • Appearance confidence(Ξ©A(Ait)\Omega_A(A^t_i)): feature vector extracted by feature extractor.

3. Frame-level flexible scheduler

  • The scheduler determines each frame’s priority and its pair of detector and tracker according to non-preemptive fixed-priority with the minimum execution by default.
  • Considering the currently available running time, prioritize and select detection and association models that provide higher accuracy.

Tracking confidence

Definition of Tracklet

Object, OitO^t_i

  • the ii-th object response detected at the tt-th frame
  • Oit=(Mit,Ait)O^t_i = (M^t_i, A^t_i) : motion state, appearance state

Tracklet, Ο‡it\chi^t_i

  • a set of tracks followed by OitO^t_i up to the tt-th frame
  • Ο‡it={Oik∣tsi≀k≀tei≀t}\chi^t_i = \{ O^k_i \mid t_s^i \leq k \leq t_e^i \leq t \}

Tracklets set, Ξ¦1:t\Phi_{1:t}

  • A set of tracklets of all objects up to the tt-th frame.

Tracklet confidence definition

Tracklet confidence, Ξ©(Ο‡it)\Omega(\chi^t_i)

  • Ξ©(Ο‡it)=Ξ©M(Mit)Γ—Ξ©A(Ait)\Omega(\chi^t_i) = \Omega_M (M^t_i) \times \Omega_A (A^t_i)

Motion confidence, Ξ©M(Mit)\Omega_M(M^t_i)

  • specified as position, size, and velocity.

Appearance confidence, Ξ©A(Ait)\Omega_A(A^t_i)

  • feature vector extracted by feature extractor.

Updating tracklet confidence

CG1. χitχ^t_i is matched with one of the detected objects by high-confidence association.

Ξ©M(Mit)=Ξ©A(Ait)=1(1a)\Omega_M(M^t_i) = \Omega_A(A^t_i) = 1 \tag{1a}

CG2. χitχ^t_i is matched with one of the detected objects by low-confidence association.

Ξ©M(Mit)=1,Ξ©A(Ait)=[Ξ©A(Aitβˆ’1)Γ—Ξ”Aitβˆ’1]0(1b)\Omega_M(M^t_i) = 1, \quad \Omega_A(A^t_i) = \left[ \Omega_A(A^{t-1}_i) \times \Delta A^{t-1}_i \right]_0 \tag{1b}

CG3. χitχ^t_i is unmatched with any of the detected objects.

Ξ©M(Mit)=[Ξ©M(Mitβˆ’1)Γ—Ξ”Mitβˆ’1]0,Ξ©A(Ait)=[Ξ©A(Aitβˆ’1)Γ—Ξ”Aitβˆ’1]0(1c)\Omega_M(M^t_i) = \left[ \Omega_M(M^{t-1}_i) \times \Delta M^{t-1}_i \right]_0, \qquad \Omega_A(A^t_i) = \left[ \Omega_A(A^{t-1}_i) \times \Delta A^{t-1}_i \right]_0 \tag{1c}

Calculating variation of confidences

  1. Motion confidence variation (Ξ”Mitβˆ’1\Delta M^{t-1}_i)

    Ξ”Mtβˆ’1_i=Ξ›s(Mtβˆ’f_i,Mtβˆ’1_i)Γ—Ξ›v(Mtβˆ’f_i,Mtβˆ’1_i)(2)\Delta M^{t-1}\_i = \Lambda_s(M^{t-f}\_i, M^{t-1}\_i) \times \Lambda_v(M^{t-f}\_i, M^{t-1}\_i) \tag{2}

    where

    Ξ›s(Mitβˆ’f,Mitβˆ’1)=βˆ’14(hitβˆ’fβˆ’hitβˆ’1hitβˆ’f+hitβˆ’1+witβˆ’fβˆ’witβˆ’1witβˆ’f+witβˆ’1)+12(3)\Lambda_s(M^{t-f}_i, M^{t-1}_i) = -\frac{1}{4} \left( \frac{h^{t-f}_i - h^{t-1}_i}{h^{t-f}_i + h^{t-1}_i} + \frac{w^{t-f}_i - w^{t-1}_i}{w^{t-f}_i + w^{t-1}_i} \right) + \frac{1}{2} \tag{3}
    Ξ›v(Mitβˆ’f,Mitβˆ’1)=1βˆ’2βˆ£Οƒ(vxitβˆ’fβˆ’vxitβˆ’1vxitβˆ’f+vxitβˆ’1+vyitβˆ’fβˆ’vyitβˆ’1vyitβˆ’f+vyitβˆ’1)βˆ’12∣(4)\Lambda_v(M^{t-f}_i, M^{t-1}_i) = 1 - 2 \left| \sigma \left( \frac{vx^{t-f}_i - vx^{t-1}_i}{vx^{t-f}_i + vx^{t-1}_i} + \frac{vy^{t-f}_i - vy^{t-1}_i}{vy^{t-f}_i + vy^{t-1}_i} \right) - \frac{1}{2} \right| \tag{4}
  2. Appearance confidence variation (Ξ”Aitβˆ’1\Delta A^{t-1}_i)

Ξ”Aitβˆ’1=Ξ›a(Aitβˆ’g,Aitβˆ’1)=Aitβˆ’gβ‹…Aitβˆ’1βˆ₯Aitβˆ’gβˆ₯βˆ₯Aitβˆ’1βˆ₯(5)\Delta A^{t-1}_i = \Lambda_a(A^{t-g}_i, A^{t-1}_i) = \frac{A^{t-g}_i \cdot A^{t-1}_i}{\|A^{t-g}_i\| \|A^{t-1}_i\|} \tag{5}

Tracklet Confidence Prediction for next frame

Confidence Prediction Model

  1. (DHD_H, AHA_H) : all tracklets belong to CG1.
  2. (DLD_L, AHA_H) : a subset of tracklets in RoI belongs to CG1, and the rest of tracklets outside RoI belongs to CG3.
  3. (DHD_H, ALA_L) : all tracklets belong to CG2.
  4. (DLD_L, ALA_L) : a subset of tracklets in RoI belongs to CG2, and the rest of tracklets outside RoI belongs to CG3.

Expected Confidence Score Calculation

Ξ©Λ‰Ο„k(Ξ¦1:t+1,(Dx,Ay))=βˆ‘Ο‡it+1∈Φ1:t+1Ξ©Λ‰(Ο‡it+1)∣Φ1:t+1∣(6)\bar{\Omega}_{\tau_k} (\Phi_{1:t+1}, (D_x, A_y)) = \frac{\sum_{\chi^{t+1}_i \in \Phi_{1:t+1}} \bar{\Omega}(\chi^{t+1}_i)}{|\Phi_{1:t+1}|} \tag{6}

Measured Confidence Score Calculation

Ωτk(Ξ¦1:t)=βˆ‘Ο‡it+1∈Φ1:tΞ©(Ο‡it)∣Φ1:t∣(7)\Omega_{\tau_k} (\Phi_{1:t}) = \frac{\sum_{\chi^{t+1}_i \in \Phi_{1:t}} \Omega(\chi^{t}_i)}{|\Phi_{1:t}|} \tag{7}

Confidence Improvement Calculation

ΔΩˉτk=ΔΩˉτk(Ξ¦1:t+1,(Dx,Ay))=Ξ©Λ‰Ο„k(Ξ¦1:t+1,(Dx,Ay))βˆ’Ξ©Ο„k(Ξ¦1:t)(8)\Delta \bar{\Omega}_{\tau_k} = \Delta \bar{\Omega}_{\tau_k} (\Phi_{1:t+1}, (D_x, A_y)) = \bar{\Omega}_{\tau_k} (\Phi_{1:t+1}, (D_x, A_y)) - \Omega_{\tau_k} (\Phi_{1:t}) \tag{8}

Scheduling Framework for RT-MOT

Scheduling Framework, NPFPflex^\textrm{flex}

  1. First, NPFPflex^\textrm{flex} shares the existing offline schedulability test for NPFPmin^\textrm{min}, which offers timely execution of every instance (i.e. job) of a set of multi-object tracking tasks.

  2. Second, NPFPflex^\textrm{flex} checks the feasibility for each active job to be executed beyond its minimum execution requirement, without compromising the schedulability of any future jobs to executed according to NPFPmin^\textrm{min}.

  3. Third, NPFPflex^\textrm{flex} chooses to execute the job that yields the largest expected improvement of confidence score

Task Model

Periodic task model

  • Ο„i=(Ti,Ci)\tau_i = (T_i, C_i)
  • Ci=CiD+CiAC_i = C_i^D + C_i^A (detection + association)

Detection part

  • cprec^{pre}: the WCET for RoI identification and image cropping.
  • cinter(LΒ orΒ H)c^{inter}(L \text{ or } H): the total inference time for YOLOv5 that has the two different WCETs for low-confidence (LL) and high-confidence (HH) detection.
  • CiD(LΒ orΒ H)=cpre+ciinfer(LΒ orΒ H)C^D_i(L \text{ or } H) = c^{pre} + c^{infer}_i (L \text{ or } H)

Association part

  • cias(LΒ orΒ H)c_i^{as}(L \text{ or } H): the WCET for low-confidence (L) and high-confidence (H) association.
  • ciIoUc_i^{IoU}: the WCET for a simple IoU-based matching algorithm.
  • cicascadec_i^{cascade}: the WCET for feature-based method.
  • cias(H)=ciIoU+cicascadec_i^{as}(H) = c_i^{IoU} + c_i^{cascade}, cias(L)=ciIoUc_i^{as}(L) = c_i^{IoU}
  • cpostc^{post}: the WCET for updating the confidence of all tracklets in each task.
  • CiA(LΒ orΒ H)=cias(LΒ orΒ H)+cpostC^A_i(L \text{ or } H) = c^{as}_i (L \text{ or } H) + c^{post}

Combinations

  • {CiLL,CiLH,CiHL,CiHH}\{ C^{LL}_i, C^{LH}_i, C^{HL}_i, C^{HH}_i \}

Base Scheduling Algorithm

NPFPmin^\textrm{min}

Same as the traditional non-preemptive fixed-priority scheduling with Ci=CiLLC_i = C^{LL}_i for every Ο„iβˆˆΟ„Ο„_i ∈ Ο„.

Lemma 1
Suppose that a task set ττ is scheduled by the NPFPmin^\textrm{min} scheduling algorithm.
If every task Ο„iβˆˆΟ„Ο„_i ∈ Ο„ satisfies Eq.(9)Eq. (9), every job invoked by tasks in ττ cannot miss its deadline.

Ri≀Ti,(9)R_i ≀ T_i, \tag{9}

where RiR_i, the worst-case response time of Ο„iΟ„_i, is calculated by finding Ri(x+1)=Ri(x)R_i(x + 1) = R_i(x) through iteration from Ri(0)=CiLL+maxΟ„j∈LP(Ο„i)CjLLR_i(0) = C^{LL}_i + max_{Ο„_j} ∈ LP(Ο„_i)C^{LL}_j in Eq.(10)Eq. (10).

Ri(x+1)=CiLL+max⁑τj∈LP(Ο„i)CjLL+βˆ‘Ο„h∈HP(Ο„i)⌈Ri(x)ThβŒ‰β‹…ChLL(10)R_i(x+1) = C^{LL}_i + \max_{\tau_j \in LP(\tau_i)} C^{LL}_j + \sum_{\tau_h \in HP(\tau_i)} \left\lceil \frac{R_i(x)}{T_h} \right\rceil \cdot C^{LL}_h \tag{10}

Novel Scheduling Framework for RT-MOT

NPFPflex^\textrm{flex}

To guarantee the feasibility of a job, we need to guarantee

  • No deadline miss of JkJ_k if it starts its execution for CkC_k at t0t_0.
  • No deadline miss of all future jobs to be executed after JkJ_k according to NPFPmin^\textrm{min}.

If it's deemed feasible to execute JkJ_k for given CkC_k,

  • we calculate ΔΩˉτk\Delta\bar{\Omega}_{Ο„_k}
  • set flexflex to T.

If flex is T, find the largest ΔΩˉτk\Delta\bar{\Omega}_{Ο„_k}

otherwise, follow NPFPmin^\textrm{min}.

Online feasibility test

  • Zi(t)Z_i(t): the existence of an active job of Ο„iΟ„_i at tt.
  • ri(t)r_i(t): the earliest release time of any job of Ο„iΟ„_i after or at tt.
    (if Zi(t)=TZ_i(t) = T, then the deadline of a job of Ο„i\tau_i)

Lemma 2
Suppose that we start to execute a job of Ο„kΟ„_k(denoted by JkJ_k) at t0t_0 for at most CkC_k.
Then, if Eq.(11)Eq. (11) holds, JkJ_k cannot miss its deadline.

Ck≀rk(t0)βˆ’t0(11)C_k ≀ r_k(t_0) βˆ’ t_0 \tag{11}

Lemma 3

Suppose that
(i) We start to execute an active job of Ο„kΟ„_k (denoted by JkJ_k) at t0t_0 for at most CkC_k
(ii) All jobs to be executed after JkJ_k’s execution are scheduled by NPFPmin^\textrm{min}

If Eq.(12)Eq. (12) holds, the earliest job of a given Ο„jΟ„_j with Zj(t0)=TZ_j(t_0) = T to be executed after JkJ_k’s execution (denoted by JjJ_j) cannot miss its deadline. (Note that Ο„jΟ„_j can be Ο„kΟ„_k.)

CjLL+Ck+βˆ‘Ο„h∈HP(Ο„j)βˆ–{Ο„k}∣Zh(t0)=TChLL+βˆ‘Ο„h∈HP(Ο„j)∣rh(t0)<rj(t0)⌈rj(t0)βˆ’rh(t0)ThβŒ‰ChLL≀rj(t0)βˆ’t0(12)\begin{aligned} C^{LL}_j &+ C_k + \sum_{\tau_h \in HP(\tau_j) \setminus \{\tau_k\} | Z_h(t_0) = T} C^{LL}_h \\ &+ \sum_{\tau_h \in HP(\tau_j) | r_h(t_0) < r_j(t_0)} \left\lceil \frac{r_j(t_0) - r_h(t_0)}{T_h} \right\rceil C^{LL}_h \leq r_j(t_0) - t_0 \tag{12} \end{aligned}

Lemma 4

Suppose that
(i) We start to execute an active job of Ο„kΟ„_k (denoted by JkJ_k) at t0t_0 for at most CkC_k
(ii) All jobs to be executed after JkJ_k’s execution are scheduled by NPFPmin^\textrm{min}
(iii) Eq.(9)Eq. (9) holds for every Ο„iβˆˆΟ„Ο„_i ∈ Ο„.

If Eq.(13)Eq. (13) holds, the earliest job of a given Ο„jΟ„_j with Zj(t0)=FZ_j(t_0) = F to be executed after JkJ_k’s execution (denoted by JjJ_j) cannot miss its deadline. (Note that Ο„jΟ„_j cannot be Ο„kΟ„_k.)

CjLL+Ck+βˆ‘Ο„h∈HP(Ο„j)βˆ–{Ο„k}∣Zh(t0)=TChLL+βˆ‘Ο„h∈HP(Ο„j)∣rh(t0)<rj(t0)+Tj⌈rj(t0)+Tjβˆ’rh(t0)ThβŒ‰ChLLβ€…β€Šβ‰€rj(t0)+Tjβˆ’t0(13)\begin{aligned} C^{LL}_j &+ C_k + \sum_{\tau_h \in HP(\tau_j) \setminus \{\tau_k\} | Z_h(t_0) = T} C^{LL}_h \\ &+ \sum_{\tau_h \in HP(\tau_j) | r_h(t_0) < r_j(t_0) + T_j} \left\lceil \frac{r_j(t_0)+T_j - r_h(t_0)}{T_h} \right\rceil C^{LL}_h \\\;\\ &\leq r_j(t_0) +T_j - t_0 \tag{13} \end{aligned}

Lemma 5
Suppose that
(i) a job of Ο„jΟ„_j (denoted by JjJ_j ) starts its execution at t1t_1 and does not miss its deadline if it executes for up to CjLLC^{LL}_j,
(ii) all jobs to be executed after JjJ_j ’s execution are scheduled by NPFPmin^\textrm{min}, and
(iii) Eq.(9)Eq. (9) holds for every Ο„iβˆˆΟ„Ο„_i ∈ Ο„ .

Then, any job of Ο„jΟ„_j to be executed after JjJ_j ’s execution cannot miss its deadline.

Theorem 1
Suppose that a task set ττ is scheduled by NPFPflex^\textrm{flex} in Algorithm 1.
If every task Ο„iβˆˆΟ„Ο„_i ∈ Ο„ satisfies Eq.(9)Eq. (9), every job invoked by tasks in ττ cannot miss its deadline.

Experimental Setup

Hardware and Software

  • Intel(R) Xeon(R) Silver 4215R CPU @ 3.20GHz, 251.5GB RAM, NVIDIA V100 GPU.
  • Ubuntu 18.04.4 with CUDA 10.2, and PyTorch 1.10.2
  • YOLOv5 as a front-end detector.
  • Waymo Open dataset.
  • DeepSORT, SORT for association.

Experimental Setup

Execution time profiling and run-time overhead

Time(ms)Time(ms)cprec^{pre}cinfer(L)c^{infer}(L)cinfer(H)c^{infer}(H)cas(L)c^{as}(L)cas(H)c^{as}(H)cpostc^{post}
AverageAverage0.612.613.13.223.40.7
MaximumMaximum0.917.623.29.632.70.9
  • cprec^{pre}: the WCET for RoI identification and image cropping.
  • cinter(LΒ orΒ H)c^{inter}(L \text{ or } H): the total inference time for YOLOv5 that has the two different WCETs for low-confidence (LL) and high-confidence (HH) detection.
  • cias(LΒ orΒ H)c_i^{as}(L \text{ or } H): the WCET for low-confidence (L) and high-confidence (H) association.
  • cias(H)=ciIoU+cicascadec_i^{as}(H) = c_i^{IoU} + c_i^{cascade}, cias(L)=ciIoUc_i^{as}(L) = c_i^{IoU}
  • cpostc^{post}: the WCET for updating the confidence of all tracklets in each task.

Experimental Results

MOTA

MOTA=1βˆ’βˆ‘t(mt+fpt+mmet)βˆ‘tgtMOTA = 1 - \frac{\sum_t (m_t + fp_t + mme_t)}{\sum_t g_t}
  • gtg_t: the number of all objects for time tt
  • mtm_t: the number of missed for time tt
  • fptfp_t: the number of false positive for time tt
  • mmetmme_t: the number of mismatches for time tt

Three versions

  • RT-MOTmin^\textrm{min} employs NPFPmin^\textrm{min}.
  • RT-MOTflex-NPI^\textrm{flex-NPI} employs NPFPflex^\textrm{flex} but does not allow priority inversion.
  • RT-MOTflex^\textrm{flex} employs NPFP$^\textrm{flex} $ in Algorithm 1 as it is.

Fixed-Priority policy

  • Rate-monotonic
  • H+SORT employs YOLOv5 with the original frame size (672Γ—672) for detection and SORT for association.
  • L+DeepSORT employs YOLOv5 with the down-scaled frame size (256Γ—256) for detection and DeepSORT for association.
  • H+DeepSORT employs YOLOv5 with the original frame size (672Γ—672) for detection and DeepSORT for association.

SubjectRelated works
One-Stage MOTFairMOT, ByteTrack
Two-Stage MOTDeepSORT, StrongSORT
RT DetectionDNN-SAM, AnytimeNet
RT SchedulingSubFlow, Real-Time Multi-Path
Single MOT TaskSelf-cueing Attention

Conclusion

RT-MOT

  • a method to estimate the overall accuracy variation according to different detector/tracker selections
  • a scheduling framework that provides offline timing guarantees while maximizing overall accuracy at run-time using the method
  • a system architecture that supports the framework.

Future Works

  • More combinations of various detectors and trackers
  • Improve the scheduling framework in terms of schedulability performance.
    (e.g., by allowing a preemption between the completion of detection and the beginning of association for each MOT task.)
profile
돈 λ˜λŠ” 건 λ‹€ κ³΅λΆ€ν•©λ‹ˆλ‹€.

0개의 λŒ“κΈ€