N3C Data Enclave Tools

The N3C Data Enclave includes an expansive suite of tools designed to help you discover, explore, and analyze N3C clinical data. Researchers can use familiar software, such as R and Python, to gain insights within the Enclave, and can also take advantage of core Enclave tools built specifically to enable analysis within the environment. The software tools available in the N3C Data Enclave were selected for their popularity and ease of use. If the packages or tools you need aren’t already available, you can request that they be added through the N3C Support Desk.

Popular Tools

Note: Documentation for the tools specific to the N3C Data Enclave is only available to registered Enclave users. Please create an N3C Data Enclave account to learn more.

R & Python

The R and Python languages are both fully supported in the N3C Data Enclave. For both languages, popular packages for data manipulation, visualization, hypothesis testing, and predictive model development come pre-installed, including the tidyverse, pandas, scikit-learn, and hundreds more.
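The short Python sketch below illustrates the kind of analysis these pre-installed packages support. The patient-level DataFrame is hand-built and purely hypothetical; in the Enclave you would start from an N3C dataset rather than constructing data by hand.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical patient-level features and an outcome flag (illustrative only)
cohort = pd.DataFrame({
    "age": [34, 61, 47, 72, 55, 29],
    "num_comorbidities": [0, 3, 1, 4, 2, 0],
    "hospitalized": [0, 1, 0, 1, 1, 0],
})

# Fit a simple logistic regression of hospitalization on age and comorbidity count
model = LogisticRegression()
model.fit(cohort[["age", "num_comorbidities"]], cohort["hospitalized"])
print(model.coef_)
```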

Code Workbooks

Collaborative analysis within N3C is carried out within Code Workbook, a specialized application available within the Enclave that allows users to analyze and transform data using nodes of logic depicted graphically. These nodes can be packaged into templates with parameterized inputs, allowing best practices to be shared easily across projects and results to be replicated more readily. Logic within workbooks is executed on auto-scaling compute infrastructure to enable analysis against the full corpus of clinical data within the Enclave. Full documentation for this tool is available here (please note, you must be a registered Enclave user to access this link).
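As a rough illustration of what a workbook node can look like in Python, the sketch below assumes the convention that a node is a function receiving its upstream datasets as Spark DataFrame arguments and returning a new DataFrame. The dataset name and concept id are hypothetical, not actual Enclave identifiers.

```python
from pyspark.sql import functions as F

def covid_positive_patients(condition_occurrence):
    # Keep rows flagged with an illustrative COVID-19 concept id,
    # then count condition records per patient.
    return (
        condition_occurrence
        .filter(F.col("condition_concept_id") == 37311061)
        .groupBy("person_id")
        .agg(F.count("*").alias("n_covid_conditions"))
    )
```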

Apache Spark

Spark SQL is a module for working with and querying structured data within the Spark framework (used under the hood to ingest, transform, and process most Enclave data). It is an efficient tool for filtering, joining, and aggregating large datasets; in the Enclave these operations can be performed natively in Spark SQL, or from R and Python using the SparkR and PySpark packages. Documentation can be found within the Enclave here (please note, you must be a registered Enclave user to access this link).
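The following self-contained PySpark sketch shows the filter-join-aggregate pattern described above. The table and column names are illustrative; in the Enclave the DataFrames would come from Enclave datasets rather than spark.createDataFrame.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical patient and visit tables
persons = spark.createDataFrame(
    [(1, 1955), (2, 1987), (3, 1990)],
    ["person_id", "year_of_birth"],
)
visits = spark.createDataFrame(
    [(1, "inpatient"), (1, "outpatient"), (2, "inpatient")],
    ["person_id", "visit_type"],
)

# Filter to inpatient visits, join to patients, and count visits per birth year
inpatient_by_year = (
    visits.filter(F.col("visit_type") == "inpatient")
    .join(persons, on="person_id")
    .groupBy("year_of_birth")
    .agg(F.count("*").alias("n_inpatient_visits"))
)
inpatient_by_year.show()
```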

Contour

The N3C environment features an easy-to-use, point-and-click data analysis tool called Contour for manipulating and graphing datasets. With it you can quickly access datasets, chain common analytical and logical operations to explore your data, identify data quality issues, cleanse and transform your data, and create visualizations and reports to share your findings with others. Documentation for Contour (please note, you must be a registered Enclave user to access this link) is only available within the Enclave, as it is a core tool within the Palantir framework.

Code Workspaces

Exploratory analysis and iteration on transformation pipelines can be carried out in Code Workspaces, a tool that provisions code notebooks in either Python (Jupyter Notebooks) or R (RStudio) within the Enclave. These workspaces include dedicated cloud compute resources along with environment management tools for installing needed libraries. Documentation can be found within the Enclave here (please note, you must be a registered Enclave user to access this link).
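The snippet below sketches the kind of interactive, notebook-style exploration a Code Workspace supports. How a dataset is actually loaded depends on the workspace tooling; here we simply assume it has already been read into a pandas DataFrame, and the column names are hypothetical.

```python
import pandas as pd

# Stand-in for a dataset already loaded into the notebook session
labs = pd.DataFrame({
    "person_id": [1, 1, 2, 3, 3, 3],
    "lab_name": ["crp", "crp", "ferritin", "crp", "ferritin", "crp"],
    "value": [4.2, 11.8, 310.0, 2.1, 150.0, 3.3],
})

# Quick profiling steps typical of exploratory analysis
print(labs.describe())
print(labs.groupby("lab_name")["value"].agg(["mean", "max"]))
```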