Databricks Dolly: A Powerful Open Source Large Language Model for the Enterprise
The artificial intelligence market is marked by rapid growth and diversification, propelled by increasing demand across various industries. Databricks Dolly has emerged as a significant large language model (LLM) option in the market, offering an alternative to established players like ChatGPT, Google Bard, and others. As you consider the LLM options for your enterprise, it’s important to understand where models like Dolly fit into the market to assist you with making a more informed decision that suits your infrastructure, business, and functional goals.
AI technologies have become more sophisticated and accessible, leading to widespread adoption and acceptance as the technology has rapidly matured. The market is characterized by a surge in AI-driven products and services, ranging from advanced analytics and automation tools to AI-powered customer service and marketing solutions.
What is Databricks Dolly?
Databricks Dolly is an open source, instruction-following large language model that generates natural language responses for tasks such as summarization, question answering, and brainstorming. Unlike ChatGPT and other ‘closed’ options, Databricks Dolly (officially Databricks Dolly 2.0) is fine-tuned on a training set crowdsourced from Databricks employees. This 12 billion-parameter model was fine-tuned on roughly 15,000 human-generated demonstrations of instruction-following behavior contributed by more than 5,000 Databricks employees during March and April 2023.
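Because the model weights are published on the Hugging Face Hub, trying Dolly is straightforward. The sketch below is a minimal example, assuming the transformers and accelerate packages are installed and a GPU with sufficient memory is available; the smaller dolly-v2-7b and dolly-v2-3b checkpoints can be substituted for more modest hardware.

```python
import torch
from transformers import pipeline

# Load Dolly 2.0 from the Hugging Face Hub. trust_remote_code is needed because
# the repository ships its own instruction-following text-generation pipeline.
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",  # spread the model across available GPU/CPU memory
)

# Ask the model to follow a natural language instruction.
result = generate_text("Explain the difference between supervised and unsupervised learning.")
print(result[0]["generated_text"])
```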
Databricks Dolly 2.0 vs 1.0
The company, in its blog announcing Databricks Dolly 2.0, explained the decision to leverage an open source model and custom data set by stating that organizations will be able to build or customize their own large language models “without paying for API access or sharing data with third parties.”
Databricks touts its fine-tuning instruction dataset, databricks-dolly-15k, as “the first open source, human-generated instruction dataset specifically designed to make large language models exhibit the magical interactivity of ChatGPT.” Dolly 1.0, released in March 2023, was built on a base model roughly half the size, was fine-tuned on a roughly 52,000-record dataset that contained output from ChatGPT, and was therefore not licensed for commercial use. Dolly 2.0, released in April 2023, is built on a 12 billion-parameter open source model from EleutherAI and, critically, is commercially usable because its question-and-answer pairs were crowdsourced from human contributors rather than generated by another model.
Enterprises considering Databricks Dolly 2.0 can be confident in the provenance of the dataset. Databricks set clear guidelines for generating the instruction and answer set, requiring that responses be original and human-written rather than copied from the web or produced by other generative AI tools.
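For teams that want to inspect the data before committing to the model, databricks-dolly-15k is published on the Hugging Face Hub under a Creative Commons license and can be explored with the datasets library. A minimal sketch, assuming the datasets package is installed:

```python
from datasets import load_dataset

# databricks-dolly-15k: ~15,000 human-written instruction/response records.
dolly_15k = load_dataset("databricks/databricks-dolly-15k", split="train")

print(len(dolly_15k))  # roughly 15,000 records

# Each record contains an instruction, an optional reference context, a response,
# and a category label such as "open_qa", "summarization", or "brainstorming".
record = dolly_15k[0]
for field in ("instruction", "context", "response", "category"):
    print(f"{field}: {record[field]!r}")
```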
Who Should Use Databricks Dolly?
Dolly’s differentiator is its open source dataset, which opens up unique opportunities for enterprises looking to build AI solutions for specific use cases. Some situations where Databricks Dolly may be a good fit include:
- Enterprises with strict data compliance requirements: As an open source large language model that isn’t reliant on third-party APIs, Dolly makes it possible for organizations in highly regulated industries to develop AI solutions without raising the data security or compliance concerns that an API-reliant tool might.
- AI researchers and developers: Dolly provides a robust platform that is also highly agile, allowing researchers and developers to rapidly adjust the model as needed. This agility creates more room for innovation and experimentation.
- Improving existing question-and-answer solutions: Feeding an existing solution’s question and answer pairs into Databricks Dolly plays directly to the model’s structure. It’s possible, for example, to turn a technical support database that is already organized in a Q&A format into an interactive experience; a sketch of that conversion follows this list.
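As a rough illustration of that last point, the sketch below converts hypothetical records from an existing support knowledge base into the instruction/context/response/category schema used by databricks-dolly-15k, so they could serve as additional supervised fine-tuning data. The input field names (question, answer, doc_snippet) are assumptions for the example, not part of any Databricks API.

```python
import json

# Hypothetical records pulled from an existing technical-support knowledge base.
support_qa = [
    {
        "question": "How do I reset my account password?",
        "answer": "Open Settings > Security, choose 'Reset password', and follow the emailed link.",
        "doc_snippet": "Password resets are handled from the Security tab in account Settings.",
    },
]

def to_dolly_record(item: dict) -> dict:
    """Map one support Q&A pair onto the instruction/context/response/category
    schema used by the databricks-dolly-15k dataset."""
    return {
        "instruction": item["question"],
        "context": item.get("doc_snippet", ""),   # optional supporting text
        "response": item["answer"],
        "category": "closed_qa" if item.get("doc_snippet") else "open_qa",
    }

# Write the converted pairs as JSON Lines, a common format for fine-tuning data.
with open("support_qa_instructions.jsonl", "w", encoding="utf-8") as f:
    for item in support_qa:
        f.write(json.dumps(to_dolly_record(item)) + "\n")
```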
Limitations of the Databricks Dolly LLM
While the open source Databricks Dolly LLM has many advantages, especially for targeted commercial use cases, it is not a “one size fits all” tool. Models like Dolly, trained on comparatively small instruction datasets, will yield results that are not as refined as those from larger, closed models. Language coverage is also limited: Dolly 2.0 currently returns responses only in English.
There is also a knowledge concern when deploying an open source LLM. These models often require a significant understanding of how to train and operate AI solutions, as well as the compute resources needed to run them inside the enterprise’s environment. Closed generative AI models, on the other hand, are ready to use ‘out of the box’ and to integrate with custom solutions. Weighing commercial viability against resource constraints will often determine which type of model is appropriate.
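To put the compute point in perspective, a quick back-of-the-envelope calculation shows roughly how much accelerator memory is needed just to hold a 12 billion-parameter model’s weights at half precision; this ignores activations, the KV cache, and framework overhead, so real requirements are higher.

```python
# Rough estimate of memory needed to hold Dolly 2.0's weights in half precision.
params = 12e9          # ~12 billion parameters
bytes_per_param = 2    # bfloat16/float16 uses 2 bytes per parameter

weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB of accelerator memory for the weights alone")  # ~22 GiB
```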
Which Large Language Model Is Right for You?
The Databricks Dolly LLM represents a significant advancement in commercially usable, open source large language models for the enterprise. While it isn’t the right choice for every use case, it has tremendous potential at the enterprise level. From building custom solutions to improving existing tools and exploring new AI use cases, Databricks Dolly is worth a look if your organization needs an open source model.
Despite its limitations, Dolly offers a versatile and accessible foundation for question answering, summarization, and other text-generation workloads across industries and functions, and its role in open, data-driven enterprise AI is likely to grow as the technology matures.
If you have questions about whether Databricks Dolly 2.0 is the correct choice for your context, or if you need to augment the resources at your disposal to work with the Dolly LLM, contact us at Graphable. Our Custom Development services are available to help you design and develop solutions that meet your unique enterprise objectives.
Related articles
- Understanding Large Language Models (LLMs)
- What is ChatGPT? A Complete Explanation
- ChatGPT for Analytics: Getting Access & 6 Valuable Use Cases
- What is Prompt Engineering? Unlock the value of LLMs
- LLM Pipelines / Graph Data Science Pipelines / Data Science Pipeline Steps
- Using a Named Entity Recognition LLM in a Biomechanics Use Case
- AI in Drug Discovery – Harnessing the Power of LLMs