ChatGPT vs Claude vs Grok: Python Data Analysis
Certainly! Here's a brief overview and example usage of pandas, numpy, and matplotlib in Python for data analysis:
1. NumPy
- Purpose: Fundamental package for numerical computing in Python.
- Use: Provides support for arrays, mathematical functions, linear algebra, random number generation, etc.
Example:
import numpy as np
# Create a numpy array
arr = np.array([1, 2, 3, 4, 5])
# Basic operations
mean = np.mean(arr)
std_dev = np.std(arr)
print("Mean:", mean)
print("Standard Deviation:", std_dev)
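The overview also lists linear algebra and random number generation, which this example does not show. Below is a minimal sketch of both. It assumes NumPy 1.17 or later for the `np.random.default_rng` Generator API. The 2x2 system of equations and the seed are arbitrary illustrations.

```python
import numpy as np

# Solve the linear system 2x + y = 5, x + 3y = 10 (exact solution: x = 1, y = 3)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)
print("Solution:", solution)

# Reproducible random numbers from the Generator API (seed chosen arbitrarily)
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print("Samples:", samples)
```

`np.linalg.solve` is generally preferred over computing `np.linalg.inv(A) @ b`, because it avoids forming the inverse explicitly.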
2. pandas
- Purpose: Library providing data structures and data analysis tools.
- Use: Handling and manipulating tabular data (similar to spreadsheets or SQL tables).
Example:
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Salary': [70000, 80000, 120000, 95000]
}
df = pd.DataFrame(data)
# Basic data exploration
print(df.head())
print(df.describe())
# Filter data
high_salary = df[df['Salary'] > 90000]
print(high_salary)
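The use case above compares DataFrames to SQL tables. This sketch maps a few SQL clauses onto pandas methods: `loc` for SELECT/WHERE, `sort_values` for ORDER BY, and `groupby` for GROUP BY. The `Department` column is a hypothetical addition for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [70000, 80000, 120000, 95000],
    'Department': ['Sales', 'IT', 'IT', 'Sales'],  # hypothetical column
})

# SELECT Name, Salary FROM df WHERE Salary > 90000 ORDER BY Salary DESC
top = (
    df.loc[df['Salary'] > 90000, ['Name', 'Salary']]
    .sort_values('Salary', ascending=False)
)
print(top)

# SELECT Department, AVG(Salary) FROM df GROUP BY Department
avg_by_dept = df.groupby('Department')['Salary'].mean()
print(avg_by_dept)
```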
3. Matplotlib
- Purpose: Plotting library for creating static, animated, and interactive visualizations.
- Use: Visualizing data distributions, trends, comparisons, etc.
Example:
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 15, 7, 12, 9]
# Line plot
plt.plot(x, y, label='Sample Data')
# Add labels and title
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.title('Simple Line Plot')
plt.legend()
# Show plot
plt.show()
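The example above uses `pyplot`'s global state and opens an interactive window with `plt.show()`. For scripts, or on machines without a display, a common alternative is the object-oriented interface combined with `fig.savefig`. This minimal sketch assumes the non-interactive `Agg` backend and writes to a hypothetical output file, `line_plot.png`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 15, 7, 12, 9]

# Create the figure and axes explicitly instead of relying on pyplot's global state
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, marker='o', label='Sample Data')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Simple Line Plot')
ax.legend()

fig.savefig('line_plot.png', dpi=100)  # hypothetical output path
plt.close(fig)  # free the figure once it has been saved
```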
Combined Example: Using all three together
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generate random data using numpy
np.random.seed(0)
ages = np.random.randint(20, 60, size=100)
salaries = np.random.normal(50000, 15000, size=100)
# Create a DataFrame
df = pd.DataFrame({'Age': ages, 'Salary': salaries})
# Analyze data
print(df.describe())
# Scatter plot to visualize relationship between Age and Salary
plt.scatter(df['Age'], df['Salary'], alpha=0.6)
plt.title('Age vs Salary')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.show()
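The scatter plot shows the relationship visually. The Pearson correlation coefficient (`Series.corr`) summarizes the linear part of it as a single number. This sketch reuses the same random setup. Ages and salaries are generated independently here, so the coefficient should land near zero, though sampling noise keeps it from being exactly zero. The age bands are an illustrative choice.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
ages = np.random.randint(20, 60, size=100)
salaries = np.random.normal(50000, 15000, size=100)
df = pd.DataFrame({'Age': ages, 'Salary': salaries})

# Pearson correlation between the two columns (ranges from -1 to 1)
r = df['Age'].corr(df['Salary'])
print(f"Correlation between Age and Salary: {r:.3f}")

# Mean salary per age band; right=False makes bins [20, 30), [30, 40), ...
df['AgeBand'] = pd.cut(df['Age'], bins=[20, 30, 40, 50, 60], right=False)
print(df.groupby('AgeBand', observed=True)['Salary'].mean().round(0))
```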
If you want, I can provide more detailed examples, tutorials, or explain any specific functions!
Python for Data Analysis: pandas, numpy, matplotlib
I'll walk you through the essential libraries for data analysis in Python with practical examples.
1. NumPy - Numerical Computing Foundation
NumPy provides the foundation for numerical computing in Python.
import numpy as np
# Creating arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
# Array operations
print(f"Array shape: {arr2.shape}")
print(f"Array dtype: {arr1.dtype}")
# Mathematical operations
numbers = np.array([1, 2, 3, 4, 5])
print(f"Mean: {np.mean(numbers)}")
print(f"Standard deviation: {np.std(numbers)}")
print(f"Sum: {np.sum(numbers)}")
# Broadcasting
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result = matrix + 10 # Adds 10 to each element
print("Matrix + 10:")
print(result)
# Random number generation
random_data = np.random.normal(0, 1, 1000) # Normal distribution
print(f"Random data mean: {np.mean(random_data):.2f}")
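Broadcasting also works between two arrays of different shapes, not only with a scalar. NumPy stretches any dimension of size 1, or any missing leading dimension, to match the other operand. The sketch below centers each column of the same matrix by subtracting a shape (3,) array of column means from the (3, 3) matrix. It then does the same per row by reshaping the row means to (3, 1).

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_means = matrix.mean(axis=0)      # shape (3,): [4., 5., 6.]
centered = matrix - col_means        # (3, 3) - (3,) -> broadcast across every row
print(centered)

# To subtract per-row values instead, add an axis so the shape becomes (3, 1)
row_means = matrix.mean(axis=1)[:, np.newaxis]   # shape (3, 1): [[2.], [5.], [8.]]
row_centered = matrix - row_means                # broadcast across every column
print(row_centered)
```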
2. Pandas - Data Manipulation and Analysis
Pandas is built on NumPy and provides high-level data structures.
import pandas as pd
import numpy as np
# Creating DataFrames
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'City': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney'],
'Salary': [50000, 60000, 70000, 55000, 65000]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print("\nDataFrame info:")
df.info() # info() prints its summary directly and returns None
# Basic operations
print(f"\nFirst 3 rows:")
print(df.head(3))
print(f"\nBasic statistics:")
print(df.describe())
# Filtering data
high_earners = df[df['Salary'] > 55000]
print(f"\nHigh earners:")
print(high_earners)
# Grouping and aggregation
city_stats = df.groupby('City')['Salary'].agg(['mean', 'count'])
print(f"\nSalary stats by city:")
print(city_stats)
# Adding new columns
df['Salary_Category'] = df['Salary'].apply(
lambda x: 'High' if x > 60000 else 'Medium' if x > 50000 else 'Low'
)
print(f"\nDataFrame with salary category:")
print(df)
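`apply` with a lambda calls Python once per row. That is fine for five rows but slow on large frames. A vectorized alternative for the same three salary categories is `np.select`, which checks the conditions in order and takes the first match, just like the chained `if/else`. The sketch below uses a reduced frame with the same salaries.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Salary': [50000, 60000, 70000, 55000, 65000],
})

# Conditions are evaluated in order; the first True one wins
conditions = [df['Salary'] > 60000, df['Salary'] > 50000]
choices = ['High', 'Medium']
df['Salary_Category'] = np.select(conditions, choices, default='Low')
print(df)
```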
3. Reading and Writing Data
import numpy as np
import pandas as pd
# Create sample data and save to CSV
sample_data = pd.DataFrame({
'Date': pd.date_range('2024-01-01', periods=100),
'Sales': np.random.randint(100, 1000, 100),
'Product': np.random.choice(['A', 'B', 'C'], 100),
'Region': np.random.choice(['North', 'South', 'East', 'West'], 100)
})
# Save to CSV
sample_data.to_csv('sales_data.csv', index=False)
# Read from CSV
df_sales = pd.read_csv('sales_data.csv')
df_sales['Date'] = pd.to_datetime(df_sales['Date']) # Convert to datetime
print("Sales data sample:")
print(df_sales.head())
# Data analysis
monthly_sales = df_sales.groupby(df_sales['Date'].dt.month)['Sales'].sum()
product_performance = df_sales.groupby('Product')['Sales'].agg(['sum', 'mean', 'count'])
print("\nMonthly sales:")
print(monthly_sales)
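Two small refinements to this CSV workflow:

- `pd.read_csv` can parse the date column while reading, via `parse_dates`.
- Grouping by `dt.to_period('M')` keeps the year. Grouping by `dt.month` alone would merge January 2024 with January 2025.

The sketch below round-trips through an in-memory `io.StringIO` buffer instead of a file, so nothing is written to disk. The random sales figures are placeholders.

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
sample_data = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=100),
    'Sales': rng.integers(100, 1000, 100),
    'Product': rng.choice(['A', 'B', 'C'], 100),
})

# Write to an in-memory buffer (stands in for 'sales_data.csv')
buffer = io.StringIO()
sample_data.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates converts the column to datetime64 while reading
df_sales = pd.read_csv(buffer, parse_dates=['Date'])

# Group by year-month period rather than the bare month number
monthly_sales = df_sales.groupby(df_sales['Date'].dt.to_period('M'))['Sales'].sum()
print(monthly_sales)

product_performance = df_sales.groupby('Product')['Sales'].agg(['sum', 'mean', 'count'])
print(product_performance)
```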
Python is one of the most popular programming languages for data analysis due to its powerful libraries like Pandas, NumPy, and Matplotlib. Below, I’ll provide an overview of each library, along with explanations and examples to help you get started with data analysis in Python.
1. NumPy (Numerical Python)
NumPy is the foundational library for numerical computations in Python. It provides support for arrays, matrices, and high-performance mathematical functions.
Key Features:
- Efficient multi-dimensional arrays (ndarray).
- Broadcasting for element-wise operations.
- Mathematical functions (e.g., trigonometry, statistics, linear algebra).
- Random number generation.
Installation:
If not already installed, you can install NumPy via pip:
pip install numpy
Example: Basic Operations with NumPy
import numpy as np
# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)
# Perform element-wise operations
arr_squared = arr ** 2
print("Squared Array:", arr_squared)
# Create a 2D array (matrix)
matrix = np.array([[1, 2], [3, 4]])
print("Matrix:\n", matrix)
# Matrix multiplication
result = np.dot(matrix, matrix)
print("Matrix Multiplication:\n", result)
# Statistical operations
mean_val = np.mean(arr)
print("Mean of Array:", mean_val)
Output:
Array: [1 2 3 4 5]
Squared Array: [ 1 4 9 16 25]
Matrix:
[[1 2]
[3 4]]
Matrix Multiplication:
[[ 7 10]
[15 22]]
Mean of Array: 3.0
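Since Python 3.5, the `@` operator performs matrix multiplication and matches `np.dot` for 2D arrays. By contrast, `*` multiplies element-wise, and mixing the two up is a common source of bugs. A short comparison using the same matrix:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])

matmul = matrix @ matrix          # matrix product, same as np.dot for 2D arrays
elementwise = matrix * matrix     # element-wise product, NOT matrix multiplication

print("matrix @ matrix:\n", matmul)        # values: [[7, 10], [15, 22]]
print("matrix * matrix:\n", elementwise)   # values: [[1, 4], [9, 16]]
print("Transpose:\n", matrix.T)
print("Inverse:\n", np.linalg.inv(matrix))
```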
2. Pandas (Python Data Analysis Library)
Pandas is built on top of NumPy and is used for data manipulation and analysis. It provides data structures like Series (1D) and DataFrame (2D) to handle tabular data efficiently.
Key Features:
- DataFrame for handling structured data (like spreadsheets or SQL tables).
- Tools for reading/writing data (CSV, Excel, JSON, SQL, etc.).
- Data cleaning, filtering, merging, and grouping.
- Handling missing data.
Installation:
Install Pandas via pip:
pip install pandas
Example: Working with a DataFrame
import pandas as pd
# Create a dictionary to form a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Salary': [50000, 60000, 75000]
}
# Convert to DataFrame
df = pd.DataFrame(data)
print("DataFrame:\n", df)
# Basic operations
print("\nAverage Salary:", df['Salary'].mean())
# Filter data
filtered_df = df[df['Age'] > 25]
print("\nFiltered DataFrame (Age > 25):\n", filtered_df)
# Add a new column
df['Bonus'] = df['Salary'] * 0.1
print("\nDataFrame with Bonus Column:\n", df)
Output:
DataFrame:
Name Age Salary
0 Alice 25 50000
1 Bob 30 60000
2 Charlie 35 75000
Average Salary: 61666.666666666664
Filtered DataFrame (Age > 25):
Name Age Salary
1 Bob 30 60000
2 Charlie 35 75000
DataFrame with Bonus Column:
Name Age Salary Bonus
0 Alice 25 50000 5000.0
1 Bob 30 60000 6000.0
2 Charlie 35 75000 7500.0
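The feature list mentions handling missing data, but the example above does not show it. Here is a minimal sketch. It assumes a hypothetical frame in which Bob's salary is unknown and stored as `np.nan`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, np.nan, 75000],  # Bob's salary is missing (hypothetical)
})

print(df.isna().sum())   # number of missing values per column

dropped = df.dropna()    # drop rows that contain any NaN
filled = df.fillna({'Salary': df['Salary'].mean()})  # mean() skips NaN by default
print(dropped)
print(filled)
```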
3. Matplotlib (matplotlib.pyplot)
Matplotlib is a plotting library for creating visualizations in Python. The pyplot module provides a simple interface for generating plots like line charts, bar charts, histograms, etc.
Key Features:
- Customizable 2D and 3D plots.
- Integration with Pandas and NumPy for data visualization.
- Wide variety of plot types (scatter, bar, pie, etc.).
Installation:
Install Matplotlib via pip:
pip install matplotlib
Example: Plotting Data
import matplotlib.pyplot as plt
import numpy as np
# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
# Create a line plot
plt.plot(x, y, label='Linear Trend', color='blue', marker='o')
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()
# Create a bar plot from Pandas DataFrame columns
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Salary': [50000, 60000, 75000]}
df = pd.DataFrame(data)
plt.bar(df['Name'], df['Salary'], color='green')
plt.title('Salary Bar Chart')
plt.xlabel('Name')
plt.ylabel('Salary')
plt.show()
Output:
- The first code snippet generates a line plot with a linear trend.
- The second snippet creates a bar chart showing salaries for different individuals.
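The bar chart above passes DataFrame columns to `plt.bar`. pandas also provides its own wrapper around Matplotlib, `DataFrame.plot`, which can draw onto a given Axes. This sketch assumes the non-interactive `Agg` backend, so it runs without a display, and writes to a hypothetical output file, `salary_bar.png`.

```python
import matplotlib
matplotlib.use("Agg")  # render without opening a window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Salary': [50000, 60000, 75000]})

fig, ax = plt.subplots()
# DataFrame.plot.bar draws onto the given Axes, using 'Name' for the x positions
df.plot.bar(x='Name', y='Salary', color='green', legend=False, ax=ax)
ax.set_title('Salary Bar Chart')
ax.set_ylabel('Salary')
fig.tight_layout()
fig.savefig('salary_bar.png')  # hypothetical output path
plt.close(fig)
```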
Combining Pandas, NumPy, and Matplotlib for Data Analysis
Here’s an end-to-end example where we analyze a small dataset and visualize the results.
Example: Sales Data Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create a sample dataset
data = {
'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
'Sales': [200, 250, 300, 280, 320],
'Expenses': [180, 200, 220, 210, 240]
}
df = pd.DataFrame(data)
# Calculate Profit
df['Profit'] = df['Sales'] - df['Expenses']
print("DataFrame:\n", df)
# Summary statistics using NumPy
mean_sales = np.mean(df['Sales'])
print("\nAverage Sales:", mean_sales)
# Plot Sales and Expenses over Months
plt.figure(figsize=(10, 5))
plt.plot(df['Month'], df['Sales'], label='Sales', color='blue', marker='o')
plt.plot(df['Month'], df['Expenses'], label='Expenses', color='red', marker='s')
plt.title('Sales vs Expenses Over Months')
plt.xlabel('Month')
plt.ylabel('Amount')
plt.legend()
plt.grid(True)
plt.show()
Output:
DataFrame:
Month Sales Expenses Profit
0 Jan 200 180 20
1 Feb 250 200 50
2 Mar 300 220 80
3 Apr 280 210 70
4 May 320 240 80
Average Sales: 270.0
- A line plot will be displayed showing Sales and Expenses trends over the months.
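Building on the same table, here are a few one-line follow-up analyses:

- Profit margin as a percentage of sales.
- Cumulative sales across the months.
- The most profitable month, via `idxmax`. It returns the first label when there is a tie, as with March and May here (80 each).

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
    'Sales': [200, 250, 300, 280, 320],
    'Expenses': [180, 200, 220, 210, 240],
})
df['Profit'] = df['Sales'] - df['Expenses']

# Profit as a percentage of sales, rounded to one decimal place
df['Margin_%'] = (df['Profit'] / df['Sales'] * 100).round(1)

# Running total of sales across the months
df['Cumulative_Sales'] = df['Sales'].cumsum()

# idxmax returns the index label of the first maximum
best = df.loc[df['Profit'].idxmax(), 'Month']
print(df)
print("Most profitable month:", best)
```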
Key Takeaways
- NumPy: Use for numerical computations and array operations.
- Pandas: Use for data manipulation and working with tabular data (DataFrames).
- Matplotlib: Use for creating visualizations to explore and communicate insights.
These libraries are often used together in data analysis workflows. Start with small datasets to practice, and gradually explore more advanced features like grouping in Pandas, matrix operations in NumPy, and advanced plotting in Matplotlib.
Let me know if you'd like deeper explanations or examples on any specific topic!
Analysis
This comparison demonstrates the different approaches each AI model takes when responding to the same prompt. Here are the key differences observed:
Response Characteristics
ChatGPT: Provides a direct response with 39 sentences.
Claude: Provides a direct response with 46 sentences.
Grok: Provides a direct response with 78 sentences.
Key Takeaways
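- Each response covers NumPy and pandas with example code; two also reach Matplotlib plotting, while the second response, as captured here, ends partway through its CSV example
- Response styles vary significantly, from short per-library snippets to a longer walkthrough with pip installation commands and printed output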
- Consider your specific use case when choosing between these models
Try This Comparison Yourself
Want to test these models with your own prompts? Visit SNEOS.com to compare AI responses side-by-side in real-time.
This comparison was generated using the SNEOS AI Comparison Tool.
Published: September 30, 2025 | Models: ChatGPT, Claude, Grok