Xiao Yu (余啸)

The State Key Laboratory of Blockchain and Data Security, Zhejiang University, China

Contact me at: xiao.yu@zju.edu.cn

  Biography

I am currently a Research Fellow (a.k.a. Assistant Research Professor) at the State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China.

Prior to joining Zhejiang University, I was a Postdoctoral Researcher at Huawei, where I had the privilege of working under the supervision of Prof. Xin Xia.

I received my Ph.D. degree in December 2020 from the School of Computer Science, Wuhan University, China, under the supervision of Prof. Jin Liu.

Additionally, I earned a joint Ph.D. degree in March 2021 from the Department of Computer Science, City University of Hong Kong, supervised by Prof. Qing Li and Prof. Jacky Wai Keung.

  Research Interests

LLM Data Governance and Evaluation

I conduct research on data engineering for large language models (LLMs), on addressing hallucination, and on evaluating the task-specific capabilities of LLMs in software engineering.

Intelligent Software Engineering

I apply deep learning and LLM technologies to tasks such as automated code generation, code annotation and maintenance, Stack Overflow question title generation, and bug report title generation, with the aim of making software development and maintenance more efficient and effective.

Software Security and Reliability

I investigate methods for software vulnerability and defect detection, log anomaly identification, security bug report classification, and the detection of code smells and technical debt, with the goal of ensuring secure and reliable software systems.

  Selected Publications [Google Scholar]

* indicates corresponding authors; # indicates supervised students.

(34) Xiao Yu, Haoxuan Chen#, Lei Liu#, Xing Hu, Jacky Wai Keung, Xin Xia*. RealisticCodeBench: Towards More Realistic Evaluation of Large Language Models for Code Generation. In Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025). [PDF]

(33) Yizhou Chen, Zeyu Sun, Guoqing Wang, Qingyuan Liang, Xiao Yu, Dan Hao. From Cryptic to Clear - Training on LLM Explanations to Detect Smart Contract Vulnerabilities. ACM Transactions on Software Engineering and Methodology. 2025.

(32) Xiaoxue Ma, Yishu Li, Jacky Keung, Xiao Yu*, Huiqi Zou, Zhen Yang, Federica Sarro, Earl T Barr. Practitioners’ expectations on log anomaly detection. IEEE Transactions on Software Engineering. 2025. [PDF] [IEEE]

(31) Dongdong Zhao, Zhihui Liu#, Fengji Zhang, Lei Liu, Jacky Wai Keung, Xiao Yu*. NegCPARBP: Enhancing Privacy Protection for Cross-Project Aging-Related Bug Prediction Based on Negative Database. IEEE Transactions on Emerging Topics in Computing. 2025, 13(2): 283-298. [PDF] [IEEE]

(30) Xiaoxue Ma, Huiqi Zou, Pinjia He, Jacky Keung, Yishu Li, Xiao Yu*, Federica Sarro. On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations. IEEE Transactions on Software Engineering. 2025, 51(1): 243-261. [PDF] [IEEE]

(29) Xiao Yu, Guancheng Lin#, Xing Hu*, Jacky Keung, Xin Xia. Less is More: Unlocking Semi-Supervised Deep Learning for Vulnerability Detection. ACM Transactions on Software Engineering and Methodology. 2025, 34(3): 1-37. [PDF] [ACM]

(28) Jun Li#, Lixian Li, Jin Liu*, Xiao Yu*, Xiao Liu, Jacky Wai Keung. Large language model ChatGPT versus small deep learning models for self-admitted technical debt detection: Why not together? Software: Practice and Experience. 2025, 55(1): 3-28. [PDF] [WILEY]

(27) Xiao Yu, Lei Liu#, Xing Hu*, Jacky Keung, Jin Liu, Xin Xia. Fight Fire with Fire: How Much Can We Trust ChatGPT on Source Code-Related Tasks? IEEE Transactions on Software Engineering. 2024, 50(12): 3435-3453. Reported by IEEE Spectrum. [PDF] [IEEE]

(26) Xiao Yu, Zexian Zhang#, Feifei Niu*, Xing Hu, Xin Xia, John Grundy. What Makes a High-Quality Training Dataset for Large Language Models: A Practitioners’ Perspective. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024). 2024: 656-668. [PDF] [ACM]

(25) Xiao Yu, Lei Liu#, Xing Hu*, Jacky Keung, Xin Xia, David Lo. Practitioners’ Expectations on Automated Test Generation. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024). 2024: 1618-1630. [PDF] [ACM]

(24) Xiaoxue Ma#, Jacky Keung, Pinjia He, Yan Xiao*, Xiao Yu*, Yishu Li. A Semi-supervised Approach for Industrial Anomaly Detection via Self-Adaptive Clustering. IEEE Transactions on Industrial Informatics. 2024, 20(2): 1687-1697. [PDF] [IEEE]

(23) Fengji Zhang#, Zexian Zhang#, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu*, Wenhua Hu. Data preparation for deep learning based code smell detection: A systematic literature review. Journal of Systems and Software. 2024, 216: 112131. [PDF] [Elsevier]

(22) Xiao Yu, Jiqing Rao#, Wenhua Hu, Jacky Keung, Junwei Zhou, Jianwen Xiang*. Improving effort-aware defect prediction by directly learning to rank software modules. Information and Software Technology. 2024, 165: 107250. [PDF] [Elsevier]

(21) Xiao Yu, Liming Liu#, Lin Zhu, Jacky Wai Keung, Zijian Wang, Fuyang Li*. A multi-objective effort-aware defect prediction approach based on NSGA-II. Applied Soft Computing. 2024, 149 (Part A): 110941. [PDF] [Elsevier]

(20) Peixin Yang#, Lin Zhu, Yanjiao Zhang, Chuanxiang Ma, Liming Liu, Xiao Yu*, Wenhua Hu. On the relative value of clustering techniques for unsupervised effort-aware defect prediction. Expert Systems with Applications. 2024, 235: 123041. [PDF] [Elsevier]

(19) Zhen Yang#, Jacky Wai Keung, Xiao Yu*, Yan Xiao, Zhi Jin*, Jingyu Zhang. On the significance of category prediction for code-comment synchronization. ACM Transactions on Software Engineering and Methodology. 2023, 32(2): 1-41. [PDF] [ACM]

(18) Xiao Yu, Heng Dai, Li Li, Xiaodong Gu, Jacky Wai Keung, Kwabena Ebo Bennin, Fuyang Li, Jin Liu*. Finding the best learning to rank algorithms for effort-aware defect prediction. Information and Software Technology. 2023, 157: 107165. [PDF] [Elsevier]

(17) Xiaoxue Ma#, Jacky Wai Keung, Xiao Yu*, Huiqi Zou, Jingyu Zhang, Yishu Li. AttSum: A deep attention-based summarization model for bug report title generation. IEEE Transactions on Reliability. 2023, 72(4): 1663-1677. [PDF] [IEEE]

(16) Fuyang Li, Kuan Zou#, Jacky Wai Keung, Xiao Yu*, Shuo Feng, Yan Xiao. On the relative value of imbalanced learning for code smell detection. Software: Practice and Experience. 2023, 53(10): 1902-1927. [PDF] [WILEY]

(15) Fuyang Li, Peixin Yang#, Jacky Wai Keung, Wenhua Hu, Haoyu Luo, Xiao Yu*. Revisiting ‘revisiting supervised methods for effort-aware cross-project defect prediction’. IET Software. 2023, 17(4): 472-495. [PDF] [WILEY]

(14) Fuyang Li, Wanpeng Lu#, Jacky Wai Keung, Xiao Yu*, Lina Gong*, Juan Li. The impact of feature selection techniques on effort-aware defect prediction: An empirical study. IET Software. 2023, 17(2): 168-193. [PDF] [WILEY]

(13) Fengji Zhang#, Jin Liu, Yao Wan, Xiao Yu*, Xiao Liu*, Jacky Keung. Diverse title generation for Stack Overflow posts with multiple-sampling-enhanced transformer. Journal of Systems and Software. 2023, 200: 111672. [PDF] [Elsevier]

(12) Xiaoxue Ma#, Jacky Keung, Zhen Yang, Xiao Yu*, Yishu Li, Hao Zhang. CASMS: Combining clustering with attention semantic model for identifying security bug reports. Information and Software Technology. 2022, 147: 106906. [PDF] [Elsevier]

(11) Xiao Yu, Jacky Keung, Yan Xiao, Shuo Feng*, Fuyang Li*, Heng Dai. Predicting the precise number of software defects: Are we there yet? Information and Software Technology. 2022, 146: 106847. [PDF] [Elsevier]

(10) Fengji Zhang#, Xiao Yu*, Jacky Keung, Fuyang Li*, Zhiwen Xie, Zhen Yang, Caoyuan Ma, Zhimin Zhang. Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information. Information and Software Technology. 2022, 148: 106922. [PDF] [Elsevier]

(9) Zhen Yang#, Jacky Keung, Md Alamgir Kabir, Xiao Yu*, Yutian Tang, Miao Zhang, Shuo Feng. AComNN: Attention enhanced Compound Neural Network for financial time-series forecasting with cross-regional features. Applied Soft Computing. 2021, 111: 107649. [PDF] [Elsevier]

(8) Shuo Feng#, Jacky Keung, Xiao Yu*, Yan Xiao, Miao Zhang. Investigation on the stability of SMOTE-based oversampling techniques in software defect prediction. Information and Software Technology. 2021, 139: 106662. [PDF] [Elsevier]

(7) Zhen Yang#, Jacky Keung, Xiao Yu*, Xiaodong Gu, Zhengyuan Wei, Xiaoxue Ma, Miao Zhang. A Multi-Modal Transformer-based Code Summarization Approach for Smart Contracts. In IEEE/ACM 29th International Conference on Program Comprehension (ICPC 2021). 2021: 1-12. [PDF] [IEEE]

(6) Xiao Yu, Jin Liu*, Jacky Wai Keung, Qing Li*, Kwabena Ebo Bennin, Zhou Xu, Junping Wang, Xiaohui Cui. Improving Ranking-Oriented Defect Prediction Using a Cost-Sensitive Ranking SVM. IEEE Transactions on Reliability. 2020, 69(1): 139-153. [PDF] [IEEE]

(5) Xiao Yu, Kwabena Ebo Bennin, Jin Liu*, Jacky Wai Keung*, Xiaofei Yin, Zhou Xu. An Empirical Study of Learning to Rank Techniques for Effort-Aware Defect Prediction. In IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER 2019). 2019: 298-309. [PDF] [IEEE]

(4) Xiao Yu, Qing Li, Jin Liu*. Scalable and Parallel Sequential Pattern Mining Using Spark. World Wide Web: Internet and Web Information Systems. 2019, 22(1): 295-324. [PDF] [Springer]

(3) Xiao Yu, Man Wu, Yiheng Jian, Kwabena Ebo Bennin, Mandi Fu, Chuanxiang Ma*. Cross-Company Defect Prediction via Semi-supervised Clustering-based Data Filtering and MSTrA-based Transfer Learning. Soft Computing. 2018, 22(10): 3461-3472. [PDF] [Springer]

(2) Xiao Yu, Jin Liu*, Zijiang Yang, Xiao Liu. The Bayesian Network based Program Dependence Graph and its Application to Fault Localization. Journal of Systems and Software. 2017, 134: 44-53. [PDF] [Elsevier]

(1) Xiao Yu, Jin Liu*, Zijiang Yang, Xiangyang Jia, Qi Ling, Sizhe Ye. Learning from Imbalanced Data for Predicting the Number of Software Defects. In IEEE 28th International Symposium on Software Reliability Engineering (ISSRE 2017). 2017: 78-89. [PDF] [IEEE]

  Service

Journal Reviewer  
  • ACM Transactions on Software Engineering and Methodology

  • IEEE Transactions on Dependable and Secure Computing

  • IEEE Transactions on Reliability

  • Information and Software Technology

  • Journal of Systems and Software

  • Software: Practice and Experience

  • IET Software

Program Committee Member  
  • The 32nd Asia-Pacific Software Engineering Conference (APSEC 2025)

  • The 31st Asia-Pacific Software Engineering Conference (APSEC 2024)

  • The 15th Asia-Pacific Symposium on Internetware (Internetware 2024)

Publication Chair  
  • The 32nd International Symposium on Software Reliability Engineering (ISSRE 2021)

Guest Editor  
  • Information and Software Technology ISSRE 2021 special section