
Email

GitHub

Google Scholar


About Me


I am Yilong Zhu (朱亦隆).

I am currently a Ph.D. candidate in the Aerial Robotics Group at the Hong Kong University of Science and Technology (HKUST), supervised by Prof. Shaojie Shen. My research focuses on navigation for autonomous systems.

Before starting my Ph.D., I worked as an Algorithm Manager in the Mapping and Localization Group at Unity Drive Innovation, where I led the development of localization systems using LiDAR, UWB, and inertial sensors.

I have published papers in top-tier journals such as IEEE Transactions on Robotics (T-RO) and The International Journal of Robotics Research (IJRR), and hold multiple patents in localization technologies. I also serve as a reviewer for ICRA, IROS, and several IEEE Transactions journals.

If you are interested in collaboration or research discussions, feel free to contact me or check out my GitHub.


News


  • [2025/6] One paper has been accepted to IROS 2025; the topic is visual localization using novel satellite imagery. See you in Hangzhou, China!

  • [2025/2] One paper has been accepted to IEEE Transactions on Instrumentation and Measurement; the topic is globally optimal solutions to scaled quadratic pose estimation problems. Congratulations to Bohuan (HKUST Ph.D.)!

  • [2025/2] One paper has been accepted to IEEE Transactions on Instrumentation and Measurement; the topic is globally optimal estimation of accelerometer-magnetometer misalignment. Congratulations to Xiangcheng (HKUST Ph.D.)!

  • [2024/12] One paper has been accepted to IEEE Robotics and Automation Letters; the topic is efficient camera exposure control for visual odometry via deep reinforcement learning. Congratulations to Shuyang (HKUST Ph.D.)!


Research Interests


I am actively seeking a postdoc position starting in Spring 2026. If you know of relevant openings, please contact me!

My research lies at the intersection of traditional robotics and learning-based approaches, with a focus on building robust and generalizable navigation systems for autonomous driving and mobile robotics. I am particularly interested in how physically grounded models (e.g., optimization-based SLAM, sensor fusion) can be enhanced with high-level semantic understanding and generative priors from modern deep learning models.


  • Simultaneous Localization and Mapping (SLAM)

    I develop tightly-coupled optimization frameworks using LiDAR, IMU, and UWB to achieve drift-free, real-time state estimation.



  • Bird’s Eye View (BEV) Representation for Localization

    I explore BEV-based geometric and semantic alignment between ego-view and satellite-view images to enable cross-view localization.



  • Tightly-Coupled LiDAR-Inertial Localization for GNSS-Denied Environments

    I develop localization frameworks that fuse LiDAR and inertial data to achieve robust and accurate navigation in environments where GNSS signals are unavailable or unreliable.

    Specifically, I proposed RLIL, a system that integrates motion distortion correction, IMU bias estimation, and a Kd-tree-accelerated scan matcher initialized by IMU priors. The method enhances robustness via a local map tracking module and inertial constraints, ensuring real-time performance across campus scenes, crowded streets, and open, featureless areas.
    RLIL achieves centimeter-level accuracy and significantly outperforms existing open-source baselines in both structured and dynamic environments.
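
    As a minimal illustration of the scan-matching core described above (not the actual RLIL implementation), the sketch below seeds a Kd-tree-based point-to-point ICP with an IMU-predicted pose; all function and variable names are illustrative.

```python
# Minimal sketch: Kd-tree scan matching seeded by an IMU prior.
# Simplified illustration only, not the actual RLIL implementation.
import numpy as np
from scipy.spatial import cKDTree

def icp_with_imu_prior(scan, local_map, T_init, iters=20):
    """Align `scan` (Nx3) to `local_map` (Mx3), starting from the
    IMU-predicted pose `T_init` (4x4 homogeneous transform)."""
    tree = cKDTree(local_map)                        # built once per local map in practice
    T = T_init.copy()
    for _ in range(iters):
        pts = scan @ T[:3, :3].T + T[:3, 3]          # transform scan into the map frame
        _, idx = tree.query(pts)                     # nearest map point for each scan point
        q = local_map[idx]
        # Closed-form point-to-point alignment (Kabsch / SVD).
        mu_p, mu_q = pts.mean(0), q.mean(0)
        H = (pts - mu_p).T @ (q - mu_q)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                     # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_q - R @ mu_p
        dT = np.eye(4)
        dT[:3, :3], dT[:3, 3] = R, t
        T = dT @ T                                   # accumulate the incremental correction
    return T
```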



  • Dynamic-Aware Localization and Mapping in Changing Environments

    I study localization and dynamic object detection in complex, dynamic environments by leveraging map consistency and multi-stage processing.

    In my recent work, I propose a system that first builds a clean, static TSDF map using data collected with high-precision GNSS/INS positioning and dynamic object removal. During online operation, LiDAR scans are registered to this dynamic-free map, and deviations are used to detect dynamic points in real time. These filtered point clouds are then used in a tightly-coupled LiDAR-inertial localization framework to improve robustness in cluttered urban scenes.

    This approach enables autonomous vehicles and UAVs to achieve robust localization while simultaneously detecting dynamic agents such as pedestrians and vehicles.
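
    The online dynamic-point check can be sketched as follows, assuming a dense voxel-grid TSDF (the grid layout and threshold are assumptions for illustration): points already registered to the map frame that lie far from any static surface are treated as dynamic.

```python
# Illustrative sketch of dynamic-point filtering against a static TSDF map.
# The voxel-grid layout and the distance threshold are assumptions.
import numpy as np

def filter_dynamic_points(points_map_frame, tsdf, origin, voxel_size, dist_thresh=0.3):
    """Split LiDAR points into static and non-static sets.

    points_map_frame : (N, 3) points already registered to the map frame
    tsdf             : (X, Y, Z) grid of truncated signed distances to static surfaces
    origin           : (3,) world coordinate of voxel (0, 0, 0)
    dist_thresh      : points whose |TSDF| exceeds this are treated as dynamic
    Returns (static_points, other_points); the second set contains dynamic
    points and points outside the mapped volume.
    """
    idx = np.floor((points_map_frame - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(tsdf.shape)), axis=1)
    static_mask = np.zeros(len(points_map_frame), dtype=bool)
    d = np.abs(tsdf[idx[inside, 0], idx[inside, 1], idx[inside, 2]])
    static_mask[inside] = d < dist_thresh
    return points_map_frame[static_mask], points_map_frame[~static_mask]
```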



  • Generative Models for Cross-view Understanding

    I apply diffusion models to generate semantically aligned BEV representations from monocular images, facilitating viewpoint-invariant localization.



  • Semantic-Guided BEV Generation for Cross-View Localization

    I explore how structured semantic priors can guide generative models to overcome extreme viewpoint gaps in cross-view localization.

    In particular, I proposed DiffLoc, a framework that integrates IPM projection, Navier-Stokes inpainting, and CLIP-based scene description (CLSD) to condition a diffusion model in latent space for synthesizing BEV images. These BEV representations are both geometrically consistent and semantically rich, enabling accurate matching against satellite imagery.

    This approach demonstrates robust performance in degraded urban environments, offering a viable vision-based alternative to GNSS.
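
    The two geometric preprocessing steps mentioned above, IPM projection and Navier-Stokes inpainting, can be sketched with OpenCV as below; the pixel correspondences are placeholders that in practice come from the camera calibration, and the diffusion and CLIP conditioning stages are omitted.

```python
# Sketch of the geometric preprocessing in a DiffLoc-style pipeline:
# inverse perspective mapping (IPM) followed by Navier-Stokes inpainting.
# The correspondences below are placeholders; in practice they come from the
# camera intrinsics/extrinsics. Diffusion and CLIP stages are omitted.
import cv2
import numpy as np

def ipm_and_inpaint(front_view_bgr, bev_size=(512, 512)):
    """front_view_bgr: 8-bit BGR camera image; returns a hole-filled BEV image."""
    h, w = front_view_bgr.shape[:2]
    # Placeholder correspondences: four ground-plane points in the image
    # and where they should land on the BEV canvas.
    src = np.float32([[w * 0.45, h * 0.65], [w * 0.55, h * 0.65],
                      [w * 0.95, h * 0.95], [w * 0.05, h * 0.95]])
    dst = np.float32([[192, 0], [320, 0], [320, 512], [192, 512]])
    H = cv2.getPerspectiveTransform(src, dst)
    bev = cv2.warpPerspective(front_view_bgr, H, bev_size)

    # Pixels that received no source data after warping are holes;
    # fill them with Navier-Stokes inpainting (cv2.INPAINT_NS).
    gray = cv2.cvtColor(bev, cv2.COLOR_BGR2GRAY)
    hole_mask = (gray == 0).astype(np.uint8) * 255
    return cv2.inpaint(bev, hole_mask, 5, cv2.INPAINT_NS)
```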



  • Long-Horizon and Cross-Modal Navigation via Fast-Slow System Integration

    In autonomous driving, the perception range is typically limited to around 200 meters. While navigation APIs (e.g., Google/AMap) provide coarse routing instructions like “turn left in 300m,” current systems often rely heavily on short-range perception and weak localization, making it easy to miscount lanes or miss intersections.

    I explore the integration of fast reactive modules (for perception and control) with slow semantic planners that leverage satellite imagery, vision-language models (VLMs), and onboard camera views to achieve long-horizon, cross-modal navigation. This enables the system to generate early, semantically guided instructions before approaching complex urban intersections.
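
    A hypothetical skeleton of the fast-slow decoupling is sketched below: a low-rate planner thread fuses a (stubbed) coarse routing instruction with (stubbed) scene context into an early lane-level hint, while the high-rate loop reads the latest hint without blocking. All names and stub behaviors are assumptions for illustration.

```python
# Hypothetical sketch of a fast-slow navigation loop. The slow thread fuses a
# coarse routing instruction with stubbed satellite/VLM context into an early
# lane-level hint; the fast loop reads the latest hint without blocking.
import threading
import time

hint_lock = threading.Lock()
latest_hint = {"text": None}

def slow_semantic_planner():
    """~1 Hz: combine coarse routing with scene context into an early hint."""
    while True:
        route_step = "turn left in 300 m"                  # stub for a routing API
        scene = "four-lane road, left-turn bay ahead"      # stub for a VLM description
        with hint_lock:
            latest_hint["text"] = f"{route_step} | {scene} -> move to left-turn lane now"
        time.sleep(1.0)

def fast_reactive_loop(steps=50):
    """~20 Hz: short-range perception and control react immediately, biased by the hint."""
    for _ in range(steps):
        with hint_lock:
            hint = latest_hint["text"]
        # ... run perception and control here, nudged toward `hint` ...
        time.sleep(0.05)

threading.Thread(target=slow_semantic_planner, daemon=True).start()
fast_reactive_loop()
```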


🔍 My goal is to combine physical models with learned methods to build generalizable and interpretable navigation systems for robotics.