The most important aspect of the Databricks architecture is that it is a unified, cloud-native platform that spans all the key aspects of data engineering, data management and data science. Read on to dig into the details of the architecture.
What is the Databricks Architecture?
The Databricks architecture is a simple and elegant cloud-native (and cloud-only) approach that combines the customer’s Databricks cloud seamlessly with their existing AWS, Google Cloud or Azure account. Leveraging open-source options at every turn, Databricks is flexible enough to support the variety of ways in which customers choose to pursue their data strategy, all in a seamless, unified platform.
For a detailed look at the architecture within any of the major cloud providers, see below:
Two Main Components of the Databricks Architecture
The architecture consists of two layers: the Control Plane and the Data Plane. The Control Plane hosts Databricks’ back-end services, including the graphical interface and REST APIs for account management and workspaces. The Data Plane handles external/client interactions and data processing.
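As a rough illustration of the REST APIs that the Control Plane exposes, the sketch below builds (but does not send) an authenticated request to list the clusters in a workspace. The workspace URL and token are placeholders, and the helper name `build_list_clusters_request` is hypothetical; the endpoint path and bearer-token header are assumed to follow the publicly documented Clusters API, so check them against the current Databricks REST API reference before relying on them.

```python
from urllib.request import Request


def build_list_clusters_request(workspace_url: str, token: str) -> Request:
    """Build, without sending, a GET request for a workspace's cluster list.

    Assumes the Clusters API path /api/2.0/clusters/list and
    personal-access-token authentication via a Bearer header.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Placeholder values, for illustration only.
req = build_list_clusters_request(
    "https://example-workspace.cloud.databricks.com", "EXAMPLE-TOKEN"
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the call to the workspace, which would need a real workspace URL and a valid token.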
It is important to note that while a common approach is to use the customer’s cloud account (e.g. AWS, Azure or GCP) for both the Data Plane and data storage, Databricks also accommodates an architecture where the Data Plane lives in Databricks’ own cloud account and only the data storage lives in the customer’s cloud account.
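The two deployment options can be summarized in a small model. This is only a sketch for reasoning about where each component lives: the `Account` and `Deployment` names are hypothetical and are not part of any Databricks API, and placing the Control Plane in the Databricks account is an assumption that goes slightly beyond what the text above states.

```python
from dataclasses import dataclass
from enum import Enum


class Account(Enum):
    DATABRICKS = "databricks"
    CUSTOMER = "customer"


@dataclass(frozen=True)
class Deployment:
    """Which cloud account hosts each component (hypothetical model)."""
    control_plane: Account
    data_plane: Account
    data_storage: Account

    def components_in(self, account: Account) -> list[str]:
        # Return the component names hosted in the given account.
        return [name for name, host in vars(self).items() if host is account]


# Common approach: Data Plane and storage both in the customer's account.
customer_hosted = Deployment(Account.DATABRICKS, Account.CUSTOMER, Account.CUSTOMER)
# Alternative: Data Plane in the Databricks account, storage with the customer.
databricks_hosted = Deployment(Account.DATABRICKS, Account.DATABRICKS, Account.CUSTOMER)

print(customer_hosted.components_in(Account.CUSTOMER))
print(databricks_hosted.components_in(Account.CUSTOMER))
```

In both variants the data storage stays in the customer's account; only the location of the Data Plane changes.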
Security is a key aspect of the Databricks architecture. The platform combines encryption, access control, data governance, and architectural security controls to protect sensitive data, preserve its integrity, and prevent unauthorized access.
Below are the key elements of Databricks security architecture in more detail:
- Encryption: Databricks employs encryption both at rest and in transit to protect data from unauthorized access. Encryption protocols are used to secure data storage, network communication, and user credentials.
- Access Control: Databricks implements robust access control mechanisms to regulate user permissions and restrict unauthorized access. Role-based access control (RBAC) and fine-grained access control enable organizations to limit user privileges and manage data access effectively.
- Network Protections: Databricks provides network protections to secure workspaces and prevent data exfiltration. These protections help safeguard against unauthorized network access and ensure the confidentiality and privacy of data.
- Data Governance: Databricks offers features for data governance, including auditing and compliance controls. These features enable organizations to track and monitor data access, manage data lifecycle, and adhere to regulatory requirements.
- Security Best Practices: Databricks promotes security best practices through its Security Reference Architecture (SRA) and provides templates to deploy workspaces with predefined security configurations. This helps organizations follow established security standards easily.
It is important to note that the specific security features and practices may vary depending on the deployment model and the cloud provider (AWS, Azure, or GCP) used with Databricks.
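To make the access control item in the list above more concrete, here is a minimal, generic sketch of how a role-based check can be combined with a finer-grained, per-table restriction. The roles, actions, and table names are invented for illustration; this is not how Databricks implements access control, and it does not use any Databricks API.

```python
# Hypothetical roles and the actions each role may perform (RBAC layer).
ROLE_ACTIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}

# Fine-grained layer: optional per-role allow-list of tables.
# A role with no entry here is not restricted to specific tables.
ROLE_TABLES = {
    "analyst": {"sales.orders"},
}


def is_allowed(role: str, action: str, table: str) -> bool:
    """Return True only if the role may perform the action on the table."""
    if action not in ROLE_ACTIONS.get(role, set()):
        return False  # unknown role, or action not granted to this role
    allowed_tables = ROLE_TABLES.get(role)
    return allowed_tables is None or table in allowed_tables


print(is_allowed("analyst", "read", "sales.orders"))   # True
print(is_allowed("analyst", "read", "hr.salaries"))    # False
print(is_allowed("engineer", "write", "hr.salaries"))  # True
```

The role check answers "may this kind of user do this kind of thing at all?", while the table allow-list narrows that permission to specific data, which is the distinction the list draws between RBAC and fine-grained access control.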
- Databricks Architecture Overview – Databricks Docs
- Data Lakehouse Architecture: Databricks Well-Architected Framework – Databricks Docs
- Google GCP Databricks Architecture
- Azure Databricks Architecture Overview – Microsoft Docs
- Databricks on AWS – An Architectural Perspective (part 1) – BlueTab
- Databricks Security and Trust Center
- Databricks Security and Compliance Guide
- Azure Databricks Security Best Practices
- Databricks Lakehouse Architecture: Security, Compliance, and Privacy