
Dr. Qing Cao


One key component of exascale computing is fast reading and writing of data using storage systems. In recent years, growing demands for data storage have driven High-Performance Computing (HPC) platforms to adopt a variety of hardware devices as viable storage solutions. However, data processing requirements have also led to more frequent failures, prompting the need for elastic system reconfiguration and failure recovery.

Cao’s JDRD team seeks to address these challenges by creating a software-defined storage layer that can provide flexible, reliable, and elastic storage services and capabilities based on the underlying hardware. Cao proposes that by controlling how the storage is configured and how data objects are moved across storage tiers, this layer can substantially improve access speed, reliability, and operating cost.

“To our knowledge, this proposal initiates the first systematic research on workflow-aware software defined storage layers for HPC platforms,” wrote Cao. For administrators, the project aims to provide tools for specifying workflow-oriented configurations so that they can easily provision system resources. The LDRD program, carried out by Cao’s collaborators at ORNL, deals with similar concerns but addresses broader concepts and approaches, while the JDRD project led by Cao focuses on software and implementation.

The first joint UT-ORNL paper based on this study has been accepted to the International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2015, the flagship conference in the area of high-performance computing. Cao’s team continues its close collaboration with ORNL by holding periodic meetings, working together on challenging problems, and carrying out joint research.

 


If the various streams of data can be accurately characterized and analyzed in real time and in full context, as is the ambitious aim of Qing Cao’s exceptionally promising JDRD project, their “fusion” would allow us to infer patterns of behavior and activity, enabling even smarter, data-driven, context-“aware” applications for immediate use at the point of need.

Existing systems attempting to exploit the rich semantics of mass heterogeneous data are rudimentary. Cao’s initiative targets head-on the challenge of developing tools to make sense of the huge amounts of collected data. Essential to its success is attention to scalability and fundamentals: the project undertakes the first systematic research on programming abstractions for real-time sensor data.

In year one, the JDRD team led by Cao, with PhD students Yanjun Yao and Kefa Lu, demonstrated the feasibility of a distributed client-server model for collecting and processing real-time location information. In year two, their efforts are proceeding from small-scale testbeds to larger-scale communities of volunteer participants. Curriculum and laboratory designs for undergraduate and graduate students are also part of ongoing activities. The project has produced four applications so far—Friend Book, SmartDiary, PhoneCon, and Uno—with two conference papers delivered and another two under submission.

On the LDRD side of the collaboration is James Horey of ORNL’s Computational Sciences and Engineering Division. For Horey, the complementary effort serves as an opportunity to acquire unique sensor data while sharing his group’s design of a new distributed programming model to express spatiotemporal data fusion. In effect, the JDRD team is extending the original LDRD project to include innovative techniques for location-based sensor networks and data mining of novel data sources.

In addition to several publications, Cao and Horey are jointly pursuing opportunities to leverage the success of their JDRD-LDRD collaboration with next-stage funding from the National Science Foundation, where Cyber-Physical Systems has been designated as an important emerging area of research. Four proposals representing nearly $2 million in potential grants are in the pipeline—three to NSF and one to Google.


JDRD Project:
Distributed computational framework for massive heterogeneous data fusion: A location-centric approach (Year 2)
Qiang He, UT Department of Civil and Environmental Engineering

LDRD project:
Distributed Computational Framework for Massive Heterogeneous Data Fusion
James Horey, ORNL Computational Sciences and Engineering Division