I began my Ph.D. in Computer Science and Engineering at The Hong Kong University of Science and Technology (HKUST) in September 2023, advised by Prof. Qifeng Chen. Since then, I have worked closely with Shanghai AI Laboratory on multimodal foundation models and agentic systems, with recent efforts centered on open-source computer use agents, multimodal large language models, AI-generated content, and video understanding. My recent research includes projects such as ScaleCUA, InternVL, VisionLLM, ControlLLM, AudioX, and InternGPT, and I am broadly interested in building practical multimodal systems that can perceive, reason, and act in complex real-world settings.
🔥 News
- 2026.04: 🎉 One paper accepted to ACL 2026.
- 2026.03: 🎉 One paper accepted to SIGGRAPH 2026.
- 2026.02: 🎉 Two papers accepted to CVPR 2026.
- 2026.01: 🎉 Three papers accepted to ICLR 2026, including one Oral and two Posters.
💼 Experience
- 2023.04 - 2025.10 | Research Intern, Shanghai AI Laboratory, Shanghai, China
Worked with Wenhai Wang and Jifeng Dai on large language models and multimodal learning.
- 2022.04 - 2023.04 | Part-time Researcher, Shanghai AI Laboratory, Shanghai, China
Led interns on video understanding, multimodal learning, and pose pretraining.
- 2020.07 - 2023.04 | Algorithm Engineer, SenseTime, Shanghai, China
Led a project on detecting highlight moments in videos and worked on video classification, highlight detection, and boundary detection.
- 2019.09 - 2020.07 | Research Intern, SenseTime, Beijing, China
Worked on action recognition.
- 2019.05 - 2019.09 | Research Intern, Tencent Youtu Lab, Shanghai, China
Worked on action recognition.
📚 Selected Publications
Synced from Google Scholar on 2026-04-15.
- Annual Meeting of the Association for Computational Linguistics (ACL), 2026
- arXiv preprint arXiv:2512.16295, 2025
- International Conference on Learning Representations (ICLR), 2026 Oral
- arXiv preprint arXiv:2508.18265, 2025
- arXiv preprint arXiv:2507.19478, 2025
- arXiv preprint arXiv:2505.23762, 2025
- International Conference on Learning Representations (ICLR), 2026 Poster
- arXiv preprint arXiv:2504.10479, 2025
- International Conference on Learning Representations (ICLR), 2026 Poster
- arXiv preprint arXiv:2412.05271, 2025
- Proceedings of the Computer Vision and Pattern Recognition Conference, 18782 …, 2025
- arXiv preprint arXiv:2412.18966, 2024
- Advances in Neural Information Processing Systems 37, 69925-69975, 2024
- arXiv preprint arXiv:2412.05271, 2024
- European Conference on Computer Vision, 89-105, 2024
- arXiv preprint arXiv:2407.20962, 2024
- arXiv preprint arXiv:2405.19334, 2024
- arXiv preprint arXiv:2305.05662, 2023
- Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
- International Journal of Computer Vision (IJCV) 2024, 2022
- European Conference on Computer Vision, 431-448, 2022
- arXiv preprint arXiv:2206.15268, 2022
- Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
- arXiv preprint arXiv:2102.09471, 2021
- Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021
- IEEE Transactions on Image Processing 29, 7970-7983, 2020
- Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), 11669 …, 2020
- Pacific Rim Conference on Multimedia, 545-555, 2018
- 2018 24th International Conference on Pattern Recognition (ICPR), 1301-1306, 2018
- International Conference on Learning Representations (ICLR), 2023
🏆 Honors and Awards
- 2023 RedBird PhD Award, The Hong Kong University of Science and Technology.
- 2020 Excellent Student Award, Nanjing University.
- 2020 Outstanding Graduate Award, Nanjing University.
- 2018 First Place, China Postgraduate Innovation and Practice Competition, Action Recognition Track.
- 2018 Huawei Scholarship, Nanjing University.
- 2015 Second Prize, Oracle Cup, East China Division.
- 2015 First Prize, Jingsheng Cup Computer Programming Competition, Anhui Province.
🤝 Academic Services
Journal Reviewer
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
- International Journal of Computer Vision (IJCV)
- IEEE Transactions on Image Processing (TIP)
- Pattern Recognition (PR)
Conference Reviewer
- CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML
🎓 Teaching
- 2018 Spring Teaching Assistant, Experiments for Programming Design, Nanjing University.
- 2024 Spring Teaching Assistant, Fundamentals of Artificial Intelligence, HKUST.
- 2024 Autumn Teaching Assistant, Design and Analysis of Algorithms, HKUST.
- 2025 Autumn Teaching Assistant, Exploring Artificial Intelligence, HKUST.