NASA SBIR 2008 Solicitation


PROPOSAL NUMBER: 08-1 S6.04-9972
SUBTOPIC TITLE: Data Management - Storage, Mining and Visualization
PROPOSAL TITLE: Data Intensive Scientific Computing On Petabyte Scalable Infrastructure

SMALL BUSINESS CONCERN (Firm Name, Mail Address, City/State/Zip, Phone)
Open Research Inc.
104 Fountain Green Ln
Gaithersburg, MD 20878-7851
(301) 527-2122

PRINCIPAL INVESTIGATOR/PROJECT MANAGER (Name, E-mail, Mail Address, City/State/Zip, Phone)
Qiming He
104 Fountain Green Ln
Gaithersburg, MD 20878-7851
(571) 214-6386

Expected Technology Readiness Level (TRL) upon completion of contract: 6 to 7

TECHNICAL ABSTRACT (Limit 2000 characters, approximately 200 words)
The infrastructure and programming paradigm used for petabyte-level data processing at companies like Google and Yahoo shed promising light on data-intensive scientific computing. Open source software and inexpensive commodity hardware bring once-proprietary technologies within the grasp of academic communities. By leveraging these commercially proven and publicly available technologies, we will develop a suite of novel data management and analysis libraries that extend the existing primitive algorithms originally designed for web search. These libraries take advantage of the underlying petabyte-scalable data infrastructure, parallelize computation transparently, and allow scientists and future commercial users to perform complex tasks (data mining, data visualization, and machine learning) in a data-intensive environment.

POTENTIAL NASA COMMERCIAL APPLICATIONS (Limit 1500 characters, approximately 150 words)
Many science disciplines at NASA are data-intensive by nature. Many of NASA's computing environments are based on technologies from 20 years ago and are thus insufficient to support growing data and computation demands. The outcome of our research will help NASA reengineer its data-intensive applications using Google's search as a blueprint, not only from a user-experience perspective but also from infrastructure and programming perspectives. We are aware that reinvention in this area carries high risk. Therefore, we choose to reuse proven technology and provide our innovative solutions as value-added services and libraries. By using our toolset, powered by an open-source implementation of Google's engine, NASA scientists can perform far more data analysis than a simple search over a large dataset.

POTENTIAL NON-NASA COMMERCIAL APPLICATIONS (Limit 1500 characters, approximately 150 words)
Data-intensive computing is not a problem unique to IT companies like Google. Infrastructure and data analysis tools that support Data-Intensive Scalable Computing (DISC) are becoming a competitive advantage even for non-IT companies, allowing them to roll out new products and services faster and more cheaply. For example, Wal-Mart sells ~300 million items every day at 6,000 stores worldwide. The entire data warehouse supporting its business is as large as 4 PB. Scalable and efficient data analysis tools are vital for managing its supply chain, conducting market trend analysis, and devising pricing strategy. A simple data-mining 'discovery' from its own dataset, such as 'send-formula-coupon-to-diaper-buyer', can be a huge marketing success. Our solution will help non-IT companies replicate Google's success.

NASA's technology taxonomy has been developed by the SBIR-STTR program to raise awareness of proposed and awarded R/R&D in the agency. It is a listing of over 100 technologies of interest to NASA, sorted into broad categories.

Computer System Architectures
Database Development and Interfacing
Software Development Environments
Software Tools for Distributed Analysis and Simulation

Form Generated on 11-24-08 11:56