About Me
I am a PhD candidate at the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS), co-advised by Dr. Xiaobing Feng (冯晓兵) and Dr. Chenxi Wang (王晨曦).
My research focuses on the development of hard-core systems tailored for emerging hardware platforms, such as resource-disaggregated datacenters.
I am particularly interested in enhancing the performance of applications on these systems through the design and implementation of advanced programming models and compiler optimizations.
Research Interests
Runtime for Disaggregated Memory
As an emerging datacenter architecture, resource disaggregation reorganizes each kind of datacenter hardware into dedicated resource servers to improve resource utilization and fault tolerance and to simplify hardware adoption. These servers are connected by advanced network fabrics such as InfiniBand and Intel Fabrics. As a result, a cloud application running on a resource-disaggregated cluster can obtain compute and memory resources from different servers.
I build a memory-disaggregated framework, Beehive, which improves remote access throughput by exploiting the asynchrony within each thread. Beehive automatically converts applications into asynchronous code, which helps mitigate the microsecond-scale latency of remote memory access while keeping CPU overhead low and improving data locality.
Publications
Developing memory-disaggregated applications atop emerging I/O fabrics is drawing increasing attention from industry and academia because it can break the memory capacity wall and improve resource utilization. However, microsecond (μs)-scale I/O fabrics create tension between programming productivity and performance. The multithreaded synchronous programming model is popular for developing memory-disaggregated applications due to its intuitive program logic. Our key insight is that although thread switching can effectively mitigate the μs-scale latency, it leads to poor data locality and non-trivial scheduling overhead, leaving significant room to improve performance further.
This paper proposes a memory-disaggregated framework, Beehive, which improves remote access throughput by exploiting the asynchrony within each thread. Beehive contains three components: the programming interfaces, the Rust compiler, and the runtime system. To improve programming usability, Beehive allows programmers to develop applications in the conventional multithreaded synchronous model and uses the compiler to automatically transform the code into asynchronous code based on pararoutines (a newly proposed computation and scheduling unit). We evaluated Beehive with eight workloads, including data analytics, graph processing frameworks, machine learning frameworks, key-value stores, and web services. Beehive outperforms the state-of-the-art memory-disaggregated frameworks Hermit and AIFM by 3.05× and 1.58× on average, respectively.
Even with substantial efforts to test and validate processors, computational errors may still arise post-installation. One particular category of CPU errors occurs silently, without crashing applications or triggering hardware warnings. These elusive errors pose a significant threat because they corrupt user data and are difficult to detect. This paper introduces Orthrus, a solution for the timely detection of silent user-data corruption caused by post-installation CPU errors. Orthrus safeguards user data in cloud applications by providing simple annotations and compiler support that let users identify data operators, and by validating these operators asynchronously across cores. It maintains an ultra-low overhead (2-6%), making it practical for production deployment. Our evaluation, using carefully injected errors, demonstrates that Orthrus can detect 87% of data corruptions with just a single core dedicated to validation, increasing to 91% and 96% when two and four cores are used, respectively.
Education
Ph.D. Candidate
Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)
Co-advised by Dr. Xiaobing Feng (冯晓兵) and Dr. Chenxi Wang (王晨曦)
Major in Computer Science and Technology
Bachelor's Degree
University of Science and Technology of China (USTC)
Major in Physics
Minor in Computer Science and Technology