Participating Institutions



Project Goal:
The goal of the project is to study the feasibility of developing a data repository for storing encrypted healthcare data (raw as well as curated) to advance research in understanding and eliminating healthcare disparities, and for training the next generation of researchers, especially from under-represented groups.
Project Description:
- A prototype cloud-based repository has been developed to store Protected Health Information (PHI) and metadata for the feasibility study.
- The cloud-based repository will make the data easily and flexibly accessible to all AIM-AHEAD researchers.
- The repository will receive all types of structured, semi-structured, and unstructured data, including EHR/EMR data, image data, synthetic data, sociological data, economic data, and demographic data.
- Received data will pass through ETL/ELT (extract-transform-load or extract-load-transform) processes before being stored in the repository. The stored data will be used to analyze, identify, and eliminate healthcare disparities.
- The data repository will seamlessly work with the other three AIM-AHEAD resource centers (for Data Curation & Harmonization, Data Governance, and Open-source AI/ML Tools) as well as with the AIM-AHEAD Infrastructure Core.
- The stored data will be organized into data marts so that users can efficiently search and browse the repository and its datasets, and receive advice on which datasets are most relevant to their research queries.
- The relationship of this data repository with various other resource centers is shown below:
[Figure: relationship of the data repository to the other AIM-AHEAD resource centers]
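As a rough illustration of the ETL and data-mart ideas in the project description, the following minimal Python sketch passes incoming records through extract, transform, and load steps, files them under data marts, and supports a simple keyword search. It is not the project's implementation: the mart names are taken from the data types listed above, while the record fields (`category`, `description`) and all function names are assumptions made for this example. Encryption and de-identification of PHI are deliberately left out.

```python
from collections import defaultdict

# Hypothetical data mart names, borrowed from the data types listed above.
DATA_MARTS = {"ehr", "image", "synthetic", "sociological", "economic", "demographic"}


def extract(raw_records):
    """Extract step: yield incoming records one at a time."""
    yield from raw_records


def transform(record):
    """Transform step: normalize field names and the category label."""
    cleaned = {key.strip().lower(): value for key, value in record.items()}
    category = str(cleaned.get("category", "")).strip().lower()
    # Records with an unknown category are kept, but flagged for later review.
    cleaned["category"] = category if category in DATA_MARTS else "uncategorized"
    return cleaned


def load(records, repository):
    """Load step: file each record under its data mart."""
    for record in records:
        repository[record["category"]].append(record)
    return repository


def search(repository, keyword):
    """Return (mart, record) pairs whose description mentions the keyword."""
    keyword = keyword.lower()
    return [
        (mart, record)
        for mart, records in repository.items()
        for record in records
        if keyword in str(record.get("description", "")).lower()
    ]


def run_pipeline(raw_records):
    """Run extract, transform, and load over an iterable of record dicts."""
    repository = defaultdict(list)
    return load((transform(r) for r in extract(raw_records)), repository)
```

A caller would pass a list of dictionaries to `run_pipeline` and then query the result with `search(repository, "diabetes")`. A real deployment would replace the in-memory dictionary with encrypted cloud storage and add access controls.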
Project Vision:
- Once fully developed, the AIM-HDR repository will play a significant role in the application of modern AI/ML techniques to data-driven healthcare research.
- The complete design of the repository, along with the software services and APIs it provides, will greatly contribute to research on identifying and eliminating healthcare disparities.
- The datasets in the repository will include data on under-represented sections of society, to provide researchers with unbiased datasets.
- These datasets will span the entire gamut (synthetic, EHR, image, medical, socioeconomic, demographic, etc.) and will be stored in a secure, privacy-preserving manner.
- The repository will also support services that help researchers, especially those from under-represented sections of society, develop their AI/ML skills in their quest to study and eliminate health disparities.