DeepSeek vs Gemini vs Mistral: Prompt Engineering
Of course! Here is a comprehensive guide to data analysis prompts for ChatGPT, structured from beginner to advanced levels.
The key to effective prompting is to be specific about your goal, your data context, and the desired output format.
Part 1: The Anatomy of a Great Data Analysis Prompt
A good prompt provides context and direction. Use this framework:
- Role & Context: "Act as a [Data Analyst/Marketing Analyst]. I am analyzing a dataset of [describe your data, e.g., e-commerce transactions]."
- Data Description: "The dataset contains columns like: [list key columns, e.g., order_id, customer_id, product, category, order_date, revenue]."
- Specific Goal/Task: "My goal is to [find the top-selling categories, understand customer churn, forecast next quarter's sales]."
- Output Format: "Please provide [a step-by-step plan, a Python code snippet, a summary of insights in a table, a SQL query]."
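As a rough illustration of the framework above, the four parts can be assembled into a single prompt string. This is only a sketch: `build_prompt` and its parameter names are hypothetical helpers, not part of any library.

```python
def build_prompt(role, data_context, columns, goal, output_format):
    """Assemble the four framework parts into one prompt string."""
    column_list = ", ".join(columns)
    return (
        f"Act as a {role}. I am analyzing a dataset of {data_context}. "
        f"The dataset contains columns like: {column_list}. "
        f"My goal is to {goal}. "
        f"Please provide {output_format}."
    )


# Example values taken from the placeholders in the framework.
prompt = build_prompt(
    role="Data Analyst",
    data_context="e-commerce transactions",
    columns=["product", "category", "order_date", "revenue"],
    goal="find the top-selling categories",
    output_format="a step-by-step plan",
)
print(prompt)
```

Keeping the parts as separate arguments makes it easy to change one of them (for example the output format) while iterating on a prompt.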
Part 2: Prompt Examples by Use Case
Category 1: Planning & Strategy (Before You Code)
Use these to structure your analysis.
- Prompt: "I have a dataset of customer support tickets with columns: ticket_id, date_opened, date_closed, customer_segment, issue_type, agent_id, satisfaction_rating (1-5). My goal is to identify the main drivers of low satisfaction ratings. Generate a step-by-step analysis plan."
- Prompt: "Act as a business intelligence analyst. I need to analyze our SaaS company's user onboarding funnel. The key metrics are sign-ups, activations, and paid subscriptions. What are the top 3 questions I should answer with this data to improve conversion rates?"
Category 2: Code Generation & Data Wrangling
Get help with Python (Pandas, NumPy, Scikit-learn) or R code.
- Cleaning Data:
- Prompt: "Write a Python function using Pandas to clean a sales dataset. The function should: 1) Handle missing values in the 'revenue' column by filling them with 0. 2) Convert the 'date' column to a datetime object. 3) Remove duplicate rows based on the 'transaction_id' column."
- Feature Engineering:
- Prompt: "Given a Pandas DataFrame df with a 'signup_date' column, write code to create a new feature called 'days_since_signup' calculated from today's date. Then, create another feature 'is_weekend' based on the 'signup_date'."
- Statistical Testing:
- Prompt: "I have two arrays: group_A_scores and group_B_scores. Write Python code to perform a two-sample t-test to see if there's a statistically significant difference between the means of the two groups. Use scipy.stats. Also, interpret a p-value of 0.03."
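For the data-cleaning prompt in this category, a response might look roughly like the sketch below. The column names come from the prompt; the function name `clean_sales`, the sample rows, and the choice to coerce unparseable dates to `NaT` rather than raise are assumptions made for illustration.

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a sales DataFrame."""
    cleaned = df.copy()
    # 1) Treat missing revenue as 0.
    cleaned["revenue"] = cleaned["revenue"].fillna(0)
    # 2) Parse dates; unparseable values become NaT instead of raising (assumption).
    cleaned["date"] = pd.to_datetime(cleaned["date"], errors="coerce")
    # 3) Keep the first row seen for each transaction_id.
    cleaned = cleaned.drop_duplicates(subset="transaction_id", keep="first")
    return cleaned.reset_index(drop=True)


# Hypothetical sample data with one duplicate and missing revenue.
raw = pd.DataFrame(
    {
        "transaction_id": [1, 2, 2, 3],
        "date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
        "revenue": [100.0, None, None, 40.0],
    }
)
cleaned = clean_sales(raw)
print(cleaned)
```

Whether filling missing revenue with 0 is appropriate depends on the dataset; it is what the prompt asks for, not a general recommendation.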
Category 3: Exploratory Data Analysis (EDA) & Visualization
Ask for both code and interpretation.
- Prompt: "I have a DataFrame df with columns 'age', 'income', and 'purchase_amount'. Generate Python code (using Matplotlib/Seaborn) to create the following plots: 1) A histogram of 'age'. 2) A scatter plot of 'age' vs. 'income', colored by 'purchase_amount'. 3) A boxplot of 'income' by age groups (e.g., 18-25, 26-40, 41+)."
- Prompt: "Interpret the following correlation matrix for an e-commerce store. Focus on the relationship between 'time_on_site' and 'add_to_cart_count', and between 'product_price' and 'purchase_likelihood'. What are two potential business insights?"
Category 4: Advanced Analysis & Modeling
Get help with machine learning and statistical models.
- Clustering:
- Prompt: "I want to segment customers based on their 'recency', 'frequency', and 'monetary_value' (RFM). Suggest a suitable clustering algorithm (like K-Means) and write a step-by-step Python code outline for this task, including feature scaling and determining the optimal number of clusters."
- Predictive Modeling:
- Prompt: "My goal is to predict customer churn (binary: 1 for churned, 0 for active). The features are 'tenure', 'monthly_charges', 'contract_type', and 'number_of_support_calls'. Which classification model would be a good starting point? Provide a Python code snippet using Scikit-learn to train a Logistic Regression model, including splitting the data and evaluating it with a confusion matrix."
- Time Series Forecasting:
- Prompt: "I have a time series of daily sales for the past two years in a DataFrame. Write Python code to plot the data and decompose it into trend, seasonality, and residuals using statsmodels. Then, suggest a simple forecasting model like SARIMA for this data."
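The clustering prompt in this category asks for feature scaling. K-Means relies on Euclidean distance, so a feature with a large numeric range (such as monetary value) would otherwise dominate the result. The sketch below shows z-score standardization using only the standard library; the RFM values are made up, and a real answer would more likely use Scikit-learn's scaler.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical RFM values for four customers.
rfm = {
    "recency": [5, 40, 12, 90],
    "frequency": [12, 2, 7, 1],
    "monetary_value": [840.0, 60.0, 310.0, 25.0],
}
scaled = {name: standardize(column) for name, column in rfm.items()}
for name, column in scaled.items():
    print(name, [round(v, 2) for v in column])
```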
Category 5: SQL for Data Analysis
Translate your questions into SQL queries.
- Prompt: "Write a SQL query to find the top 5 customers by total revenue in the last quarter. Tables: customers (id, name), orders (id, customer_id, order_date, total_amount)."
- Prompt: "I have a table user_sessions with user_id, session_start, and session_end. Write a SQL query to calculate the average session duration per user in minutes."
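To check a generated query before running it on real data, it can help to try it on a tiny in-memory database. The sketch below does this for the session-duration prompt using Python's built-in `sqlite3` module. The sample rows are invented, the identifiers are spelled with underscores, and the `strftime('%s', ...)` approach is specific to SQLite; other databases use different date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (user_id INTEGER, session_start TEXT, session_end TEXT)"
)
# Hypothetical sample sessions.
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        (1, "2024-03-01 10:00:00", "2024-03-01 10:30:00"),
        (1, "2024-03-02 09:00:00", "2024-03-02 09:10:00"),
        (2, "2024-03-01 14:00:00", "2024-03-01 14:45:00"),
    ],
)

# strftime('%s', ...) converts a timestamp to Unix seconds in SQLite.
query = """
SELECT user_id,
       AVG((strftime('%s', session_end) - strftime('%s', session_start)) / 60.0)
           AS avg_minutes
FROM user_sessions
GROUP BY user_id
ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for user_id, avg_minutes in rows:
    print(user_id, avg_minutes)
```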
Category 6: Interpreting Results & Storytelling
This is where ChatGPT shines—turning numbers into narratives.
- Prompt: "Here are the key findings from my analysis: 1) Customers from Region A have a 30% higher lifetime value. 2) Product category 'Electronics' has the highest return rate at 15%. 3) Support tickets resolved within 2 hours have a 4.5/5 avg. satisfaction score. Summarize these insights into 3 concise bullet points for a non-technical manager and suggest one recommended action for each."
- Prompt: "Act as a data analyst. I found a strong positive correlation (r=0.85) between marketing spend and website traffic. Draft two paragraphs explaining this finding, its business implications, and a caveat about correlation not implying causation."
Part 3: Pro-Tips & Best Practices
- Iterate: Your first prompt might not be perfect. Follow up with "Now, modify that code to also handle outliers" or "Explain that statistical term in simpler language."
- Provide Sample Data: For complex tasks, provide a small sample of your data structure.
- Example: "My data looks like this: [{'user': 'Alice', 'clicks': 45}, {'user': 'Bob', 'clicks': 12}]. Write code to find the user with the maximum clicks."
- Ask for Explanations: Don't just accept the code. Ask "Can you explain how this part of the code works?" or "Why did you choose a t-test instead of a z-test here?"
- Specify the Library Version: If compatibility is crucial, mention it. "Using Pandas version 1.5.3, write code to..."
- Error Debugging: Paste your error message and code. "I'm getting this error ValueError: could not convert string to float: 'N/A'. Here is my code: [paste code]. How can I fix it?"
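For the error in the debugging tip above, one common kind of fix is to convert values defensively instead of calling `float()` directly. This is a minimal sketch; the helper name `to_float` and the choice to return `None` for placeholders are assumptions, and the right handling depends on the surrounding code.

```python
def to_float(value, default=None):
    """Convert value to float, returning default for placeholders like 'N/A'."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# Hypothetical raw column values, including a placeholder and an empty string.
raw_values = ["12.5", "N/A", "7", ""]
parsed = [to_float(v) for v in raw_values]
print(parsed)
```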
By using these structured prompts, you can transform ChatGPT from a simple chatbot into a powerful, on-demand data analysis assistant.
Okay, here's a collection of ChatGPT prompts tailored for data analysis tasks, categorized for clarity. I've aimed to provide a range of prompts from simple to more complex, incorporating different analytical techniques and focusing on clarity and desired output.
I. Data Exploration & Summary
- Prompt 1 (Basic Summary): "Summarize the key statistics for the 'Sales' column in this dataset: [Paste dataset here or provide a link to the data]. Include the mean, median, standard deviation, minimum, and maximum values."
- Prompt 2 (Distribution): "Describe the distribution of the 'Age' variable in this dataset: [Paste dataset here or provide a link to the data]. Is it normally distributed? Are there any outliers or skewness?"
- Prompt 3 (Missing Values): "Identify and quantify the missing values in each column of this dataset: [Paste dataset here or provide a link to the data]. Report the number of missing values and the percentage of missing values for each column."
- Prompt 4 (Data Types): "Analyze the data types of each column in this dataset: [Paste dataset here or provide a link to the data]. Are the data types appropriate for the data they contain? If not, suggest alternative data types."
- Prompt 5 (Unique Values): "List the unique values in the 'Country' column of this dataset: [Paste dataset here or provide a link to the data]. Also, count the occurrences of each unique value."
- Prompt 6 (Top N): "Identify the top 5 most frequent values in the 'Product Category' column of this dataset: [Paste dataset here or provide a link to the data]. Include their respective counts."
- Prompt 7 (Data Profiling): "Perform a basic data profiling of this dataset: [Paste dataset here or provide a link to the data]. Include the number of rows and columns, data types of each column, missing value counts, and basic descriptive statistics for numerical columns."
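Since the output of prompts like these should be verified, it can be useful to compute the same numbers locally. The sketch below reproduces the missing-value report from Prompt 3 with the standard library. The sample CSV is invented, and only empty cells are counted as missing, which is an assumption; real datasets may also use markers such as "N/A".

```python
import csv
import io


def missing_report(csv_text):
    """Return {column: (missing_count, missing_percent)} for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames or []:
        # Count cells that are absent or contain only whitespace.
        missing = sum(1 for row in rows if not (row[column] or "").strip())
        percent = 100.0 * missing / len(rows) if rows else 0.0
        report[column] = (missing, percent)
    return report


# Hypothetical four-row sample with one empty cell per column.
sample_csv = "Sales,Age,Country\n100,34,US\n,29,FR\n250,,DE\n90,41,\n"
report = missing_report(sample_csv)
for column, (count, percent) in report.items():
    print(f"{column}: {count} missing ({percent:.1f}%)")
```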
II. Relationship Analysis & Correlation
- Prompt 8 (Correlation Matrix): "Calculate the correlation matrix for the numerical columns in this dataset: [Paste dataset here or provide a link to the data]. Interpret the strongest positive and negative correlations."
- Prompt 9 (Scatter Plot Analysis): "Analyze the relationship between 'Advertising Spend' and 'Sales' in this dataset: [Paste dataset here or provide a link to the data]. Describe the nature of the relationship (linear, non-linear, positive, negative) and provide a possible explanation."
- Prompt 10 (Grouped Analysis): "Compare the average 'Customer Satisfaction' score for different 'Product Categories' in this dataset: [Paste dataset here or provide a link to the data]. Which product category has the highest average satisfaction score?"
- Prompt 11 (Hypothesis Testing - Correlation): "Test the hypothesis that there is a statistically significant correlation between 'Time Spent on Website' and 'Conversion Rate' in this dataset: [Paste dataset here or provide a link to the data]. State the null and alternative hypotheses, and provide a conclusion based on a significance level of 0.05 (assuming the p-value is provided by you or another tool)." (Note: ChatGPT won't perform statistical tests itself, but it can help you structure your hypothesis and interpret results you get from statistical software).
- Prompt 12 (Segment Analysis): "Analyze the relationship between 'Age' and 'Income', segmented by 'Gender' in this dataset: [Paste dataset here or provide a link to the data]. Are there different trends in income based on age for males and females?"
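To double-check a correlation that a model reports, the Pearson coefficient can be computed directly. The sketch below is a plain-Python version; the function name `pearson_r` and the sample numbers are illustrative only, and in practice a library routine would usually be used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: sales are exactly linear in advertising spend.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]
r = pearson_r(ad_spend, sales)
print(round(r, 4))
```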
III. Data Cleaning & Transformation
- Prompt 13 (Outlier Detection): "Identify outliers in the 'Transaction Amount' column of this dataset: [Paste dataset here or provide a link to the data]. Use the IQR (Interquartile Range) method to define outliers."
- Prompt 14 (Data Type Conversion): "Convert the 'Order Date' column in this dataset: [Paste dataset here or provide a link to the data] to a datetime data type. Explain the steps involved."
- Prompt 15 (Handling Missing Values): "Impute the missing values in the 'Salary' column of this dataset: [Paste dataset here or provide a link to the data]. Use the mean imputation method. Justify your choice." (You could also specify median, mode, or a more sophisticated imputation method).
- Prompt 16 (Data Normalization/Standardization): "Normalize the 'Price' column in this dataset: [Paste dataset here or provide a link to the data] using min-max scaling. Explain why normalization might be necessary."
- Prompt 17 (Text Cleaning): "Clean the 'Product Description' column in this dataset: [Paste dataset here or provide a link to the data]. Remove punctuation, convert to lowercase, and remove stop words."
- Prompt 18 (Duplicate Removal): "Identify and remove duplicate rows from this dataset: [Paste dataset here or provide a link to the data]. Explain what constitutes a duplicate in this context (e.g., identical values across all columns)."
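The IQR rule from Prompt 13 can be sketched in a few lines. This version uses `statistics.quantiles` with its default method; other tools compute quartiles slightly differently, so the exact bounds may not match theirs. The multiplier of 1.5 is the conventional choice, and the sample amounts are made up.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical transaction amounts with one obviously extreme value.
amounts = [20, 22, 23, 25, 26, 27, 29, 30, 500]
outliers = iqr_outliers(amounts)
print(outliers)
```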
IV. Predictive Analysis (Conceptual - ChatGPT can't execute code)
- Prompt 19 (Model Selection): "I want to predict 'Customer Churn' based on 'Age', 'Usage Frequency', and 'Contract Length' in this dataset: [Paste dataset here or provide a link to the data]. Which machine learning algorithm would be most suitable for this task, and why?"
- Prompt 20 (Feature Engineering): "Suggest potential new features that could be engineered from the existing columns in this dataset: [Paste dataset here or provide a link to the data] to improve the accuracy of a model predicting 'Sales'."
- Prompt 21 (Model Evaluation Metrics): "I'm training a model to predict whether a customer will click on an ad. What are the key evaluation metrics I should use to assess the performance of my model, and why?" (Focus on metrics like precision, recall, F1-score, AUC).
- Prompt 22 (Overfitting/Underfitting): "How can I tell if my machine learning model is overfitting or underfitting the data, and what steps can I take to address these problems?"
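The metrics mentioned for Prompt 21 are simple to compute by hand, which helps when checking a model's explanation of them. The sketch below derives precision, recall, and F1 from true and predicted binary labels; the labels are invented, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical ad-click labels: 1 = clicked, 0 = did not click.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```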
V. Specific Business Questions
- Prompt 23 (Sales Trends): "Analyze the sales data in this dataset: [Paste dataset here or provide a link to the data] and identify any significant trends or patterns over time. What months or quarters have the highest sales?"
- Prompt 24 (Customer Segmentation): "Based on the customer data in this dataset: [Paste dataset here or provide a link to the data], suggest potential customer segments that could be targeted with different marketing campaigns."
- Prompt 25 (Risk Assessment): "Using the loan application data in this dataset: [Paste dataset here or provide a link to the data], identify the factors that are most strongly associated with loan default."
- Prompt 26 (Marketing Campaign Analysis): "Analyze the results of a marketing campaign in this dataset: [Paste dataset here or provide a link to the data]. Which campaign was the most effective in terms of conversion rate and cost per acquisition?"
- Prompt 27 (Inventory Optimization): "Based on the sales and inventory data in this dataset: [Paste dataset here or provide a link to the data], suggest strategies for optimizing inventory levels to minimize stockouts and holding costs."
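For Prompt 26, the two metrics can be computed as below. This sketch assumes conversion rate means conversions divided by visitors and cost per acquisition means spend divided by conversions; the field names and campaign figures are hypothetical, and some teams use clicks or impressions as the denominator instead.

```python
def campaign_metrics(campaign):
    """Compute conversion rate and cost per acquisition for one campaign."""
    conversions = campaign["conversions"]
    visitors = campaign["visitors"]
    return {
        "name": campaign["name"],
        "conversion_rate": conversions / visitors if visitors else 0.0,
        # CPA is undefined when nothing converted.
        "cost_per_acquisition": (
            campaign["spend"] / conversions if conversions else None
        ),
    }


# Hypothetical campaign results.
campaigns = [
    {"name": "Email", "spend": 500.0, "visitors": 2000, "conversions": 100},
    {"name": "Search", "spend": 1200.0, "visitors": 3000, "conversions": 120},
]
results = [campaign_metrics(c) for c in campaigns]
for result in results:
    print(result)
```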
VI. Prompting for Specific Output Formats
- Prompt 28 (Table Output): "Present the summary statistics (mean, median, standard deviation, min, max) for the 'Price' and 'Quantity' columns in this dataset: [Paste dataset here or provide a link to the data] in a table format."
- Prompt 29 (Bullet Point Summary): "Summarize the key findings from your analysis of this dataset: [Paste dataset here or provide a link to the data] in a concise bullet point list."
- Prompt 30 (Executive Summary): "Write a brief executive summary (no more than 200 words) summarizing the key insights from this dataset: [Paste dataset here or provide a link to the data]. Focus on the implications for business decision-making."
VII. Prompts for Code (Python with Pandas - Use with caution, always test!)
- Prompt 31: "Write Python code using the Pandas library to load this dataset [Paste dataset here as CSV or a link to a CSV file] into a DataFrame."
- Prompt 32: "Write Python code using the Pandas library to calculate the correlation matrix of numerical columns in this dataset [Paste dataset here as CSV or a link to a CSV file] and display it as a heatmap using Seaborn."
- Prompt 33: "Write Python code using the Pandas library to group the data by 'Category' and calculate the average 'Price' for each category in this dataset [Paste dataset here as CSV or a link to a CSV file]."
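As an example of what Prompt 33 might return, and how to test it on a small sample first, consider the sketch below. The inline CSV stands in for a real file and its values are made up.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = "Category,Price\nBooks,10\nBooks,14\nToys,30\n"
df = pd.read_csv(io.StringIO(csv_text))

# Group by category and take the mean price within each group.
avg_price = df.groupby("Category")["Price"].mean()
print(avg_price)
```

Running generated code on a few rows whose answer is known in advance is a quick way to follow the "always test" advice in this section's heading.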
Important Considerations When Using These Prompts:
- Data Privacy: Be extremely cautious about sharing sensitive or confidential data with ChatGPT. Consider using anonymized or synthetic data for testing purposes.
- Data Format: ChatGPT is better at handling data in a clear, structured format like CSV (comma-separated values). If your data is in a different format, you may need to convert it first.
- Specificity: The more specific you are in your prompt, the better the results will be. Clearly define what you want to analyze, what relationships you're interested in, and what kind of output you expect.
- Context: Provide as much context as possible about the data and your goals. This will help ChatGPT understand your needs and provide more relevant insights.
- Verification: Always critically evaluate the output from ChatGPT. Don't blindly trust its analysis. Double-check its calculations and interpretations. ChatGPT is a tool to assist you, not a replacement for your own judgment.
- Limitations: ChatGPT cannot perform real data analysis requiring execution of code directly (e.g., statistical tests). It can generate example code, explain concepts, and help you formulate hypotheses, but you'll need to use dedicated data analysis tools (Python with Pandas/Scikit-learn, R, SPSS, etc.) to actually perform the computations.
- Iterative Process: Data analysis is often an iterative process. Start with broad questions and then refine your prompts based on the results you get.
By using these prompts as a starting point and adapting them to your specific needs, you can leverage ChatGPT to explore your data, generate insights, and answer important business questions. Remember to always verify the results and use your own judgment. Good luck!
Certainly! Here are some prompts you can use to engage with ChatGPT for data analysis tasks:
General Data Analysis Prompts
- Data Cleaning:
- "Can you help me clean this dataset? It has missing values and some inconsistent entries."
- "How can I handle outliers in my dataset?"
- Data Exploration:
- "What are some common techniques for exploratory data analysis (EDA)?"
- "Can you generate a summary statistics report for this dataset?"
- Data Visualization:
- "What are the best types of charts to visualize time series data?"
- "Can you create a bar chart to compare the sales figures for different regions?"
- Statistical Analysis:
- "How can I perform a t-test to compare two groups in my dataset?"
- "Can you explain the concept of p-value and how to interpret it?"
- Machine Learning:
- "What are the steps to build a predictive model using this dataset?"
- "Can you help me choose the right machine learning algorithm for my classification problem?"
Specific Data Analysis Prompts
- Customer Segmentation:
- "How can I segment my customer data based on purchasing behavior?"
- "Can you help me perform a cluster analysis on this dataset?"
- Sales Forecasting:
- "What are the best methods for forecasting future sales based on historical data?"
- "Can you help me build a time series forecasting model?"
- Market Basket Analysis:
- "How can I identify which products are frequently bought together?"
- "Can you help me perform an association rule mining on this transaction dataset?"
- Sentiment Analysis:
- "How can I analyze the sentiment of customer reviews?"
- "Can you help me build a sentiment analysis model using natural language processing?"
- Anomaly Detection:
- "How can I detect anomalies in my dataset?"
- "Can you help me implement an anomaly detection algorithm?"
Advanced Data Analysis Prompts
- Deep Learning:
- "How can I use deep learning for image classification?"
- "Can you help me build a neural network for predicting stock prices?"
- Natural Language Processing (NLP):
- "How can I perform topic modeling on a large text dataset?"
- "Can you help me build a chatbot using NLP techniques?"
- Big Data:
- "How can I process and analyze large datasets using Hadoop and Spark?"
- "Can you help me set up a data pipeline for real-time data analysis?"
- Data Privacy and Ethics:
- "What are the best practices for ensuring data privacy in my analysis?"
- "Can you help me understand the ethical implications of data analysis?"
Practical Examples
- Data Cleaning Example:
- "I have a dataset with missing values in the 'age' column. How can I impute these missing values?"
- Data Visualization Example:
- "Can you help me create a heatmap to visualize the correlation matrix of my dataset?"
- Statistical Analysis Example:
- "How can I perform a chi-square test to determine if there is a significant association between two categorical variables?"
- Machine Learning Example:
- "Can you help me build a decision tree model to predict customer churn?"
- Advanced Analysis Example:
- "How can I use LSTM (Long Short-Term Memory) networks for time series forecasting?"
These prompts should help you get started with various data analysis tasks using ChatGPT.
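As one concrete case, the chi-square prompt in the practical examples could lead to code along these lines. This is a sketch of the Pearson chi-square statistic only, without a p-value and without a continuity correction; the 2x2 table is invented, and the code assumes every row and column total is non-zero. For one degree of freedom, the critical value at the 0.05 level is about 3.841.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are two customer groups, columns are churned / stayed.
observed = [[30, 10], [20, 40]]
chi2 = chi_square_statistic(observed)
print(round(chi2, 3))
```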
Analysis
This comparison demonstrates the different approaches each AI model takes when responding to the same prompt. Here are the key differences observed:
Response Characteristics
DeepSeek: Provides a direct response with 99 sentences.
Gemini: Provides a detailed response with 97 sentences.
Mistral: Provides a direct response with 56 sentences.
Key Takeaways
- Each model brings unique strengths to this type of query
- Response styles vary significantly between models
- Consider your specific use case when choosing between these models
This comparison was generated using the SNEOS AI Comparison Tool. Published: October 01, 2025 | Models: DeepSeek, Gemini, Mistral