Conversations with ChatGPT about Analytics enable data professionals to better organize initiatives, deepen their own understanding of metrics, and troubleshoot critical coding errors, all thanks to the power of Artificial Intelligence (AI).
Note: This article is an experiment to see the efficacy of ChatGPT as co-author or contributor where clearly indicated. It is also illustrated by DALL·E 2. Both are Artificial Intelligence tools from OpenAI.
What is ChatGPT?
According to OpenAI, ChatGPT is a large language model that the company has trained. It can generate text that usually reads as authentic and human-like in response to a question or prompt. It can be used for a variety of natural language processing (NLP) tasks, such as translating languages, summarizing text, and answering questions. People are still discovering new use cases as we speak.
Getting Access to ChatGPT to use for Analytics
In order to get access, you’ll need to create an account with OpenAI – an AI research and deployment company whose mission is to ensure that artificial general intelligence benefits all of humanity. Sign up by selecting “Try↗” from the banner across the top of the home page:
How to Use a General Purpose Artificial Intelligence
Asking ChatGPT valuable and even complicated questions is as easy as typing a message to a colleague or mentor. See below how the AI chatbot would answer this question for itself (transcript below image):
Author Question: Can you explain how to use https://chat.openai.com/chat?
ChatGPT Response: Sure! To use the OpenAI chat interface, simply navigate to https://chat.openai.com/chat in your web browser. You will be prompted to enter a message in the input box at the bottom of the screen. After entering your message and clicking the Send button, the chat interface will respond with an answer generated by the OpenAI language model. You can continue to ask it questions or make statements for it to respond to. Please note that the chat interface is only a demonstration and is not intended for production use.
Six ChatGPT Analytics Conversation Examples
Now that we have established how to access and use OpenAI’s ChatGPT general-purpose artificial intelligence, let’s look at a few examples specifically related to Analytics professionals. Before we get started, let’s once again ask the AI chatbot to answer for itself:
Initial Question to ChatGPT
Author Question: How can I use ChatGPT’s chat interface to help me as a Data Analytics Professional?
ChatGPT Response: As a Data analytics professional, you can use the OpenAI chat interface to quickly generate insights and explanations from your data. For example, you could use the chat interface to:
- Ask for summary statistics of your data
- Ask for explanations of complex data relationships
- Ask for predictions or forecasts based on your data
- Ask for visualizations of your data
- Ask for recommendations for further analysis
- Ask for help with specific data analysis tasks
However, please note that the chat interface is intended as a demonstration and is not intended for production use. Additionally, the chat interface is not designed specifically for data analysis tasks and may not be able to answer all questions related to analytics. If you need more advanced analytics capabilities, you may want to consider using specialized analytics software or working with a data scientist.
While this response is great, it doesn’t mention some of my favorite examples of questions that data professionals should be asking.
Key Performance Indicators
KPI.org defines Key Performance Indicators (KPIs) as the critical (key) quantifiable indicators of progress toward an intended result. As an Analytics Professional, you can ask ChatGPT about KPIs across a wide variety of industries and departments. For example:
Author Question: What are the most important KPIs for an eCommerce subscription-based business?
ChatGPT Response: There are several key performance indicators (KPIs) that are important for an eCommerce subscription-based business, including:
- Monthly Recurring Revenue (MRR): This KPI measures the total revenue generated by subscription-based customers on a monthly basis.
- Customer Acquisition Cost (CAC): This KPI measures the cost of acquiring new customers, including marketing and sales expenses.
- Customer Lifetime Value (CLV): This KPI measures the total revenue that a customer is expected to generate over the course of their lifetime.
- Churn Rate: This KPI measures the rate at which customers cancel their subscriptions. A low churn rate is desirable for a subscription-based business.
- Retention Rate: This KPI measures the rate at which customers continue to subscribe to your service. A high retention rate is desirable for a subscription-based business.
- Gross Margin: This KPI measures the profitability of each subscription and the difference between revenue and cost of goods sold.
- Net Promoter Score (NPS): This KPI measures customer satisfaction and loyalty, by asking customers how likely they are to recommend your business to others.
- Subscriber Acquisition Cost (SAC): This KPI measures the cost of acquiring new subscribers, including marketing and sales expenses.
These are some of the most important KPIs that can be used to measure the performance of an eCommerce subscription-based business. Other KPIs may be important depending on the specific business model, products, and services offered.
These results are great, but if you’re processing metrics for a new industry or department for the first time, you may need more details:
Author Question: Can you provide me the mathematical formulas for the 8 metrics in your last response?
ChatGPT Response: Sure, here are the mathematical formulas for the 8 key performance indicators (KPIs) that I mentioned in my previous response:
- Monthly Recurring Revenue (MRR): This KPI is calculated by multiplying the number of paying customers by the average revenue per customer per month. MRR = Number of paying customers x Average revenue per customer per month
- Customer Acquisition Cost (CAC): This KPI is calculated by dividing the total cost of sales and marketing by the number of new customers acquired during a specific period. CAC = Total cost of sales and marketing / Number of new customers
- Customer Lifetime Value (CLV): This KPI is calculated by multiplying the average revenue per customer by the average customer lifespan. CLV = Average revenue per customer x Average customer lifespan
- Churn Rate: This KPI is calculated by dividing the number of customers who cancel their subscriptions by the total number of customers at the beginning of a specific period. Churn rate = Number of customers who cancel their subscriptions / Total number of customers at the beginning of the period
- Retention Rate: This KPI is calculated by subtracting the churn rate from 1. Retention rate = 1 – Churn rate
- Gross Margin: This KPI is calculated by subtracting the cost of goods sold (COGS) from the revenue and then divide the result by the revenue. Gross margin = (Revenue – COGS) / Revenue
- Net Promoter Score (NPS): This KPI is calculated by subtracting the percentage of detractors (customers who rate the business 0-6) from the percentage of promoters (customers who rate the business 9-10). NPS = % of promoters – % of detractors
- Subscriber Acquisition Cost (SAC): This KPI is calculated by dividing the total cost of sales and marketing by the number of new subscribers acquired during a specific period. SAC = Total cost of sales and marketing / Number of new subscribers
Please note that these formulas are general examples and the calculations may vary depending on the specific business model, products, and services offered.
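To see how these formulas behave with numbers, here is a minimal Python sketch that applies them as stated in the response. Every input value, the `PeriodStats` class, and the function names are hypothetical and exist only for illustration. Since the response gives SAC the same formula as CAC, a single function covers both.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical inputs for one monthly reporting period (values are made up)."""
    paying_customers: int
    avg_revenue_per_customer: float  # per customer, per month
    sales_marketing_cost: float
    new_customers: int
    customers_at_start: int
    cancellations: int
    revenue: float
    cogs: float
    avg_lifespan_months: float


def mrr(s: PeriodStats) -> float:
    # MRR = paying customers x average revenue per customer per month
    return s.paying_customers * s.avg_revenue_per_customer


def acquisition_cost(s: PeriodStats) -> float:
    # CAC (and SAC) = total sales and marketing cost / new customers
    return s.sales_marketing_cost / s.new_customers


def clv(s: PeriodStats) -> float:
    # CLV = average revenue per customer x average customer lifespan
    return s.avg_revenue_per_customer * s.avg_lifespan_months


def churn_rate(s: PeriodStats) -> float:
    # Churn = cancellations / customers at the beginning of the period
    return s.cancellations / s.customers_at_start


def retention_rate(s: PeriodStats) -> float:
    # Retention = 1 - churn
    return 1 - churn_rate(s)


def gross_margin(s: PeriodStats) -> float:
    # Gross margin = (revenue - COGS) / revenue
    return (s.revenue - s.cogs) / s.revenue


def nps(ratings: list[int]) -> float:
    # NPS = % promoters (9-10) minus % detractors (0-6), in percentage points
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


stats = PeriodStats(
    paying_customers=200, avg_revenue_per_customer=30.0,
    sales_marketing_cost=5000.0, new_customers=50,
    customers_at_start=180, cancellations=9,
    revenue=6000.0, cogs=2400.0, avg_lifespan_months=20.0,
)
survey = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]

print("MRR:", mrr(stats))
print("CAC / SAC:", acquisition_cost(stats))
print("CLV:", clv(stats))
print("Churn rate:", churn_rate(stats))
print("Retention rate:", retention_rate(stats))
print("Gross margin:", gross_margin(stats))
print("NPS:", nps(survey))
```

With these made-up inputs, 9 cancellations out of 180 starting customers gives a 5% churn rate, and 5 promoters against 3 detractors out of 10 survey responses gives an NPS of 20.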
Let’s continue working down the same conversation path we are currently on:
Author Question: These formulas are great, but can you give the 8 formula examples in SQL code?
ChatGPT Response: Sure! Here are examples of how you might calculate the 8 key performance indicators (KPIs) using SQL:
- Monthly Recurring Revenue (MRR):
SELECT SUM(subscription_amount) FROM subscriptions WHERE start_date >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
- Customer Acquisition Cost (CAC):
SELECT SUM(marketing_cost + sales_cost) / COUNT(DISTINCT customer_id) FROM customer_acquisition
- Customer Lifetime Value (CLV):
SELECT SUM(subscription_amount) / COUNT(DISTINCT customer_id) FROM subscriptions
- Churn Rate:
SELECT COUNT(DISTINCT customer_id) FROM subscriptions WHERE end_date >= DATE_SUB(NOW(), INTERVAL 1 MONTH) / COUNT(DISTINCT customer_id) FROM subscriptions
- Retention Rate:
SELECT 1 - (COUNT(DISTINCT customer_id) FROM subscriptions WHERE end_date >= DATE_SUB(NOW(), INTERVAL 1 MONTH) / COUNT(DISTINCT customer_id) FROM subscriptions)
- Gross Margin:
SELECT SUM(subscription_amount - cost_of_goods_sold) / SUM(subscription_amount) FROM subscriptions
- Net Promoter Score (NPS):
SELECT (COUNT(DISTINCT customer_id) WHERE rating >= 9) / COUNT(DISTINCT customer_id) - (COUNT(DISTINCT customer_id) WHERE rating <= 6) / COUNT(DISTINCT customer_id)
- Subscriber Acquisition Cost (SAC):
SELECT SUM(marketing_cost + sales_cost) / COUNT(DISTINCT subscriber_id) FROM subscriber_acquisition
Please note that these SQL examples are general and may need to be adjusted to match the specific structure of your database and tables. Also, I suggest you to validate the results obtained by these queries as they may not work properly with your specific data structure, or may need some adjustments to run properly.
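The validation advice deserves to be taken literally. As written, the Churn Rate, Retention Rate, and NPS queries place a second FROM clause or a bare WHERE inside a single SELECT, which SQL engines generally reject, so they need restructuring before they will run. Below is one possible restructuring, sketched with Python's built-in sqlite3 module rather than the MySQL-style functions in the response. The table layout, the `survey_responses` table, the sample rows, and the fixed reporting period are all assumptions made for this illustration.

```python
import sqlite3

# Hypothetical schema and rows, invented for this sketch only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subscriptions (
        customer_id INTEGER,
        subscription_amount REAL,
        start_date TEXT,   -- ISO dates so string comparison sorts correctly
        end_date TEXT      -- NULL while the subscription is still active
    );
    CREATE TABLE survey_responses (customer_id INTEGER, rating INTEGER);

    INSERT INTO subscriptions VALUES
        (1, 20.0, '2022-01-05', NULL),
        (2, 20.0, '2022-02-10', '2022-12-20'),
        (3, 35.0, '2022-03-15', NULL),
        (4, 35.0, '2022-06-01', '2022-09-30');
    INSERT INTO survey_responses VALUES (1, 10), (2, 4), (3, 9), (4, 7);
    """
)

# A fixed period keeps the example reproducible (the response used NOW()).
period = {"period_start": "2022-12-01", "period_end": "2023-01-01"}

# Churn: customers who cancelled during the period, divided by customers
# who were subscribed at the start of the period.
CHURN_SQL = """
SELECT
    1.0 * COUNT(DISTINCT CASE
              WHEN end_date >= :period_start AND end_date < :period_end
              THEN customer_id END)
        / COUNT(DISTINCT customer_id)
FROM subscriptions
WHERE start_date < :period_start
  AND (end_date IS NULL OR end_date >= :period_start)
"""

# NPS: share of promoters (9-10) minus share of detractors (0-6),
# expressed in percentage points.
NPS_SQL = """
SELECT
    100.0 * (SUM(CASE WHEN rating >= 9 THEN 1 ELSE 0 END)
           - SUM(CASE WHEN rating <= 6 THEN 1 ELSE 0 END))
        / COUNT(*)
FROM survey_responses
"""

churn_rate = conn.execute(CHURN_SQL, period).fetchone()[0]
retention_rate = 1 - churn_rate
nps = conn.execute(NPS_SQL).fetchone()[0]

print("Churn rate:", round(churn_rate, 4))
print("Retention rate:", round(retention_rate, 4))
print("NPS:", nps)
```

In this toy data, three customers are subscribed at the start of the period and one of them cancels during it, so churn comes out at one third. Conditional aggregation with CASE keeps each metric in a single SELECT, which is one common way to express a ratio of two counts.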
As you can see, what started as a quick question about KPIs in a specific industry and department was quickly converted into meaningful technical advice.
Data Strategy Planning
Once again, a great way to begin is to simply ask ChatGPT how to plan a Data Strategy at your organization:
Author Question: How can I break up my data strategy into categories for planning?
ChatGPT Response: Breaking up a data strategy into categories can help you organize your planning and ensure that you cover all the important aspects of your data management. Here are a few categories that you might consider when planning your data strategy:
- Data Governance: This category includes policies and procedures for managing data, such as data quality standards, data security, data privacy, data retention, and data archiving.
- Data Architecture: This category includes the design and organization of your data systems, such as data modeling, data warehousing, data integration, and data lakes.
- Data Analytics: This category includes the use of data for business intelligence, reporting, and decision-making, such as data visualization, data mining, data modeling, and data science.
- Data Operations: This category includes the day-to-day management of data, such as data ingestion, data processing, data storage, and data backup.
- Data Security: This category includes measures to protect your data from unauthorized access, breaches, and loss, such as data encryption, data masking, and data backup.
- Data Privacy: This category includes measures to protect personal data of customers, employees and other parties and comply with regulations such as GDPR, CCPA, HIPAA and others.
- Data Infrastructure: This category includes the hardware and software used to store, process, and manage data, such as servers, storage devices, databases, and cloud services.
- Data Compliance: This category includes measures to ensure compliance with legal and regulatory requirements, such as data retention, data archiving, data deletion, and data reporting.
By breaking up your data strategy into these categories, you can ensure that you cover all the important aspects of data management and develop a comprehensive data strategy that addresses the unique needs of your organization.
Generate Sample Data
To assist us with our next section about ad hoc data analysis, let’s create some sample data:
Author Question: Generate an example of a transactions dataset
ChatGPT Response: Sure, here’s an example of a transactions dataset:
Transaction ID | Date       | Customer ID | Product ID | Quantity | Price | Total
-----------------------------------------------------------------------------------
1              | 01/01/2022 | 1           | 1001       | 2        | 25.99 | 51.98
2              | 01/02/2022 | 2           | 1002       | 1        | 50.00 | 50.00
3              | 01/03/2022 | 3           | 1003       | 3        | 10.99 | 32.97
4              | 01/04/2022 | 4           | 1004       | 4        | 15.99 | 63.96
5              | 01/05/2022 | 5           | 1005       | 2        | 30.99 | 61.98
6              | 01/06/2022 | 6           | 1006       | 1        | 75.00 | 75.00
7              | 01/07/2022 | 7           | 1007       | 5        | 12.99 | 64.95
8              | 01/08/2022 | 8           | 1008       | 3        | 20.99 | 62.97
9              | 01/09/2022 | 9           | 1009       | 4        | 25.99 | 103.96
10             | 01/10/2022 | 10          | 1010       | 2        | 35.99 | 71.98
Please note that the above data is just an example, and the specific data in a table would depend on the actual transactions and the company’s data structure. Also, the above data is fake and it’s not based on real data, so it’s not recommended to use it in any real-world scenarios.
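Before using generated sample data, it helps to load it and check that it is internally consistent. The sketch below is one way to do that in Python: it parses the pipe-delimited table from the response and confirms that each Total equals Quantity times Price. The `parse_transactions` helper is a hypothetical name introduced here, and Decimal is used so the currency comparison is exact.

```python
from decimal import Decimal

# The table from the response, one transaction per line.
RAW_TABLE = """\
Transaction ID | Date | Customer ID | Product ID | Quantity | Price | Total
1 | 01/01/2022 | 1 | 1001 | 2 | 25.99 | 51.98
2 | 01/02/2022 | 2 | 1002 | 1 | 50.00 | 50.00
3 | 01/03/2022 | 3 | 1003 | 3 | 10.99 | 32.97
4 | 01/04/2022 | 4 | 1004 | 4 | 15.99 | 63.96
5 | 01/05/2022 | 5 | 1005 | 2 | 30.99 | 61.98
6 | 01/06/2022 | 6 | 1006 | 1 | 75.00 | 75.00
7 | 01/07/2022 | 7 | 1007 | 5 | 12.99 | 64.95
8 | 01/08/2022 | 8 | 1008 | 3 | 20.99 | 62.97
9 | 01/09/2022 | 9 | 1009 | 4 | 25.99 | 103.96
10 | 01/10/2022 | 10 | 1010 | 2 | 35.99 | 71.98
"""


def parse_transactions(raw: str) -> list[dict[str, str]]:
    """Split a pipe-delimited table into one dict per row, keyed by header."""
    lines = [ln for ln in raw.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_transactions(RAW_TABLE)

# Collect the IDs of rows whose Total is not Quantity x Price.
mismatches = [
    r["Transaction ID"]
    for r in rows
    if int(r["Quantity"]) * Decimal(r["Price"]) != Decimal(r["Total"])
]

print("Rows parsed:", len(rows))
print("Rows with inconsistent totals:", mismatches)
```

For this table the check passes on all ten rows. The dates are left as strings because the response does not say whether they are month/day or day/month.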
Quick Summary Metrics
General-purpose artificial intelligence can produce a quick summary of a dataset, although its arithmetic is worth double-checking, so we progress to more detailed questions based on the data set:
Author Question: Considering the table [see above] what summary metrics can you provide?
ChatGPT Response:
- Total number of transactions: 10
- Total Quantity Sold: 35
- Total Revenue: $634.65
- Average Quantity Sold per transaction: 3.5
- Average revenue per transaction: $63.47
- Most popular product: Product ID 1007 (sold 5 units)
- Least popular product: Product ID 1002 (sold 1 unit)
- Most valuable customer: Customer ID 9 (generated $103.96 in revenue)
- Least valuable customer: Customer ID 2 (generated $50.00 in revenue)
- Most common unit price: $25.99 (Products ID 1001, 1009)
- Least common unit price: $75.00 (Product ID 1006)
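Because a language model produces these figures by generating text rather than by running a calculation, it is worth recomputing them. The Python sketch below does so from the ten rows of the sample transactions table. The variable names are illustrative, and Decimal is used to keep the currency arithmetic exact.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (transaction_id, customer_id, product_id, quantity, unit_price),
# copied from the sample transactions table.
TRANSACTIONS = [
    (1, 1, 1001, 2, "25.99"),
    (2, 2, 1002, 1, "50.00"),
    (3, 3, 1003, 3, "10.99"),
    (4, 4, 1004, 4, "15.99"),
    (5, 5, 1005, 2, "30.99"),
    (6, 6, 1006, 1, "75.00"),
    (7, 7, 1007, 5, "12.99"),
    (8, 8, 1008, 3, "20.99"),
    (9, 9, 1009, 4, "25.99"),
    (10, 10, 1010, 2, "35.99"),
]

units_by_product = defaultdict(int)
revenue_by_customer = defaultdict(Decimal)
for _tid, customer, product, qty, price in TRANSACTIONS:
    units_by_product[product] += qty
    revenue_by_customer[customer] += qty * Decimal(price)

total_transactions = len(TRANSACTIONS)
total_quantity = sum(units_by_product.values())
total_revenue = sum(revenue_by_customer.values())
avg_quantity = Decimal(total_quantity) / total_transactions
avg_revenue = (total_revenue / total_transactions).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

# Report every product that ties for the highest or lowest unit count.
max_units = max(units_by_product.values())
min_units = min(units_by_product.values())
most_popular = sorted(p for p, u in units_by_product.items() if u == max_units)
least_popular = sorted(p for p, u in units_by_product.items() if u == min_units)

top_customer = max(revenue_by_customer, key=revenue_by_customer.get)
bottom_customer = min(revenue_by_customer, key=revenue_by_customer.get)

print("Total transactions:", total_transactions)           # 10
print("Total quantity sold:", total_quantity)               # 27
print("Total revenue:", total_revenue)                      # 639.75
print("Average quantity per transaction:", avg_quantity)    # 2.7
print("Average revenue per transaction:", avg_revenue)      # 63.98
print("Most popular product(s):", most_popular)             # [1007]
print("Least popular product(s):", least_popular)           # [1002, 1006]
print("Most valuable customer:", top_customer)              # 9
print("Least valuable customer:", bottom_customer)          # 3
```

The recomputed values differ from the response in several places: total quantity is 27 rather than 35, total revenue is $639.75 rather than $634.65, the averages are 2.7 units and $63.98, Product IDs 1002 and 1006 tie for least popular at 1 unit each, and the least valuable customer is Customer ID 3 at $32.97. The transaction count, the most popular product, and the most valuable customer match the response.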
Critical Limitations of General Purpose AI in Analytics
While powerful, ChatGPT has a variety of limitations. For example, it cannot interact with changing values or dashboards in real time, and it cannot connect seamlessly to your dispersed data sources.
Luckily, Graphable has you covered, as this is a core offering of the Domo Platform through interactive capabilities like Domo Variables and data connection flexibility that includes more than 1,000 pre-built connectors alongside industry-leading on-premise data transfer solutions in Domo Workbench.
However, in the interest of continuing the experiment, let’s ask ChatGPT about its own limitations:
Author Question: What are your critical limitations related to Data Analytics?
ChatGPT Response: As a language model AI, my capabilities are based on the data and training process used to develop me. However, there are some critical limitations related to data analytics that you should be aware of:
- Data Quality: The quality of data can have a significant impact on the accuracy and reliability of analytics. Poor data quality, such as missing or inconsistent data, can lead to inaccurate or unreliable results.
- Data Bias: Data bias can occur when data is collected or analyzed in a way that is not representative of the population being studied. This can lead to inaccurate or unfair conclusions.
- Data Privacy: The use of data for analytics can raise privacy concerns, especially when personal data is involved. It’s important to ensure that data is collected and used in compliance with relevant privacy laws and regulations.
- Data Security: Data security is a critical concern when working with sensitive or confidential data. It’s important to ensure that data is protected against unauthorized access, breaches, and loss.
- Model Limitations: Machine learning models are only as good as the data they are trained on, and they may not generalize well to new data. They can also be affected by the biases, outliers and missing data in the training set.
- Human bias: Humans can introduce bias in data collection, data cleaning, feature engineering and model selection. Therefore, it’s important to be aware of potential sources of bias and take steps to mitigate them.
- Data Volume and Velocity: Handling large volumes of data and processing it in real-time can be challenging, requiring specialized infrastructure and technologies.
- Interpreting the results: The results obtained from data analytics may not always be straightforward and may require expertise to be properly understood. The results should be validated and interpreted in the context of the business problem and the industry.
Given these limitations, it’s important to be aware of them and take appropriate steps to mitigate them as much as possible, to ensure the accuracy and reliability of your data analytics.
In conclusion, general-purpose AI is evolving into a tremendous tool for data professionals to leverage across strategy planning, technical troubleshooting, creating sample data, quick data summaries (great for QA), and much more. Artificial intelligence being used alongside data professionals on the right data platform opens the door for endless use cases and efficiency gains in analytics as well as many other fields.
If you’re curious about solutions to better leverage your business’ data to create valuable insights that lead to meaningful actions and efficiency gains, please don’t hesitate to contact the Graphable team. We’d love to sync with you on how the latest technologies, products, and best practices can productively impact the data and analytics culture at your organization.
Graphable delivers insightful graph database (e.g. Neo4j consulting) / machine learning (ML) / natural language processing (NLP) projects as well as graph and Domo consulting for BI/analytics, with measurable impact. We are known for operating ethically, communicating well, and delivering on time. With hundreds of successful projects across most industries, we thrive in the most challenging data integration and data science contexts, driving analytics success.
We would also be happy to learn more about your current project and share how we might be able to help. Schedule a consultation with us today. We can discuss Neo4j pricing or Domo pricing, or any other topic. We look forward to speaking with you!