ChatGPT vs DeepSeek vs Mistral: E-Discovery
Predictive coding is a technique used in various fields such as data compression, signal processing, and machine learning to reduce redundancy by predicting future or unknown data points from known data, then encoding only the differences (residuals). Here's a simple explanation and example of predictive coding applied to small cases:
What is Predictive Coding?
Predictive coding works by estimating the value of a data point based on previous values and then encoding the difference between the actual and predicted value rather than the actual value itself. This difference is typically smaller and more compressible.
Simple Example: Predictive Coding for a Small Sequence
Suppose you have a small sequence of integers:
Original sequence: 10, 12, 14, 13, 15
Step 1: Choose a Prediction Method
The simplest prediction method for a sequence is to predict that the next value is the same as the previous one.
Step 2: Predict and Encode Differences
Index | Actual Value | Predicted Value (previous actual) | Difference (Actual - Predicted) |
---|---|---|---|
0 | 10 | (no previous value) | 10 |
1 | 12 | 10 | 2 |
2 | 14 | 12 | 2 |
3 | 13 | 14 | -1 |
4 | 15 | 13 | 2 |
Step 3: Store the Differences
Instead of storing the original sequence, you store:
10, 2, 2, -1, 2
The first value is stored as-is because there is no previous value to predict from.
Reconstruction:
To reconstruct the original data:
- Start with the first value: 10
- Add differences cumulatively:
Index 1: 10 + 2 = 12
Index 2: 12 + 2 = 14
Index 3: 14 + (-1) = 13
Index 4: 13 + 2 = 15
Sequence reconstructed perfectly.
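The same encode/decode cycle can be written in a few lines of Python — a minimal sketch of the previous-value predictor described above:

```python
def delta_encode(values):
    """Store the first value as-is, then only the residuals (actual - previous)."""
    encoded = [values[0]]
    for prev, curr in zip(values, values[1:]):
        encoded.append(curr - prev)
    return encoded

def delta_decode(encoded):
    """Rebuild the original sequence by adding residuals cumulatively."""
    decoded = [encoded[0]]
    for diff in encoded[1:]:
        decoded.append(decoded[-1] + diff)
    return decoded

original = [10, 12, 14, 13, 15]
residuals = delta_encode(original)
print(residuals)                            # [10, 2, 2, -1, 2]
print(delta_decode(residuals) == original)  # True
```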
Summary
- Predictive coding predicts data points based on previous points.
- Only the difference (residual) between actual and predicted values is encoded.
- This usually reduces the amount of data to be transmitted or stored.
- Simple predictors (previous value) are easy but can be improved with more sophisticated methods (e.g., linear prediction).
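To illustrate that last point, here is a small sketch of one possible "linear prediction" scheme, assuming a second-order linear extrapolation (predict 2·previous minus the value before that). Whether its residuals end up smaller than the previous-value predictor's depends on the data.

```python
def linear_predict_encode(values):
    """Second-order linear predictor: predict 2*prev - prev2, store residuals."""
    encoded = list(values[:2])  # first two values stored as-is (nothing to predict from)
    for i in range(2, len(values)):
        predicted = 2 * values[i - 1] - values[i - 2]
        encoded.append(values[i] - predicted)
    return encoded

print(linear_predict_encode([10, 12, 14, 13, 15]))  # [10, 12, 0, -3, 3]
```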
If you want examples for other types of small datasets or more complex predictive models, please let me know!
Of course! Let's break down predictive coding with small, concrete examples. We'll move from the core intuition to a simple numerical example and a Python implementation.
1. The Core Intuition in a Nutshell
At its heart, predictive coding is a "guess and correct" mechanism for the brain (and for AI).
- Prediction: Your brain has an internal model of the world. It constantly uses this model to predict what you will see, hear, or feel next.
- Comparison: It then compares this prediction to the actual sensory input it receives.
- Error Calculation: The difference between the prediction and the reality is the prediction error.
- Learning & Updating:
- The brain uses this error to update its internal model to make better predictions in the future (learning).
- It also uses the error to update its current belief about the state of the world (perception).
It's like your brain is a scientist, always forming hypotheses (predictions) and testing them against data (sensory input).
2. A Simple, Everyday Example: The Hot Coffee Mug
Let's say you're about to pick up a mug you think is full of hot coffee.
- Prediction: Your brain's model, based on context (coffee shop, steam you saw earlier), predicts: "This mug will be hot."
- Sensation: Your fingers touch the mug. The actual sensory input is: "The mug is cold."
- Prediction Error:
Cold (Reality) - Hot (Prediction) = a large prediction error. Your brain is surprised!
- Updating:
- Perception: The strong error signal immediately makes you perceive the mug as "surprisingly cold." It forces an update in your conscious perception.
- Learning: Later, you update your internal model: "That specific mug in the office microwave might not be hot, even if it looks like it." Future predictions will be more accurate.
3. A Numerical Example: Predicting a Sequence
Let's take a toy problem: predicting the next number in a simple sequence. Our "brain" will be a single variable, our model's prediction. Our "sensory input" is the next number in the sequence.
Sequence: 2, 4, 6, 8, ?
We'll use a very simple learning rule: New Prediction = Old Prediction + (Learning Rate * Prediction Error)
Let's set a Learning Rate of 0.1. This controls how quickly we update our model based on error.
Step | Sensory Input (Actual) | Model's Prediction | Prediction Error (Input - Prediction) | Update (Learning Rate * Error) | New Model Prediction (Old Prediction + Update) |
---|---|---|---|---|---|
1 | 2 | 0 (initial) | 2 - 0 = 2 | 0.1 * 2 = 0.2 | 0 + 0.2 = 0.2 |
2 | 4 | 0.2 | 4 - 0.2 = 3.8 | 0.1 * 3.8 = 0.38 | 0.2 + 0.38 = 0.58 |
3 | 6 | 0.58 | 6 - 0.58 = 5.42 | 0.1 * 5.42 = 0.542 | 0.58 + 0.542 = 1.122 |
4 | 8 | 1.122 | 8 - 1.122 = 6.878 | 0.1 * 6.878 = 0.6878 | 1.122 + 0.6878 = 1.8098 |
5 | ? | 1.8098 | ... | ... | ... |
What happened?
- Our model started with a terrible prediction (0).
- The large errors forced it to quickly increase its prediction.
- After just 4 steps, its prediction for the next number is ~1.81. While not perfect (the pattern suggests 10), it's clearly learning the upward trend, and with more data it keeps closing the gap; the linear model in the next section learns the underlying n * 2 rule exactly.
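A few lines of Python reproduce the table above, using the same update rule and learning rate:

```python
prediction = 0.0      # the model's initial (and only) parameter: a single guess
learning_rate = 0.1

for actual in [2, 4, 6, 8]:               # "sensory input", one number at a time
    error = actual - prediction            # prediction error
    prediction += learning_rate * error    # nudge the model toward the input
    print(f"input={actual}, error={error:.4f}, new prediction={prediction:.4f}")
# final prediction ≈ 1.8098, matching the last row of the table
```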
4. A Simple Python Implementation
Let's code the example above. We'll make it slightly more complex by having our model be a simple linear rule: prediction = weight * input.
Our goal is to learn the correct weight. The true rule is next_number = 2 * current_number.
```python
import numpy as np

# A simple dataset: input (x) and the true output (y) following the rule y = 2x
x_data = np.array([1, 2, 3, 4])
y_data = np.array([2, 4, 6, 8])

# Initialize our model: just a single weight
weight = 0.1
learning_rate = 0.05
epochs = 10  # Number of times to loop through the data

print(f"Initial weight: {weight:.4f}\n")

for epoch in range(epochs):
    total_error = 0
    for i in range(len(x_data)):
        # 1. PREDICTION
        x = x_data[i]
        y_true = y_data[i]
        y_pred = weight * x

        # 2. COMPARISON & ERROR CALCULATION
        error = y_true - y_pred
        total_error += error**2  # Squared error

        # 3. LEARNING (UPDATE THE MODEL)
        # The update is: learning_rate * error * input (a simple gradient)
        weight_update = learning_rate * error * x
        weight += weight_update

        # Print details for the first epoch
        if epoch == 0:
            print(f" x={x}, True={y_true}, Pred={y_pred:.4f}, Error={error:.4f}, Update={weight_update:.4f}, New Weight={weight:.4f}")

    print(f"Epoch {epoch+1}, Weight: {weight:.4f}, Total Error: {total_error:.4f}")

# Test the final model
print("\n--- Testing the trained model ---")
test_x = 5
prediction = weight * test_x
print(f"Input {test_x} -> Prediction: {prediction:.4f} (True value should be {2*test_x})")
```
Expected Output:
```
Initial weight: 0.1000

 x=1, True=2, Pred=0.1000, Error=1.9000, Update=0.0950, New Weight=0.1950
 x=2, True=4, Pred=0.3900, Error=3.6100, Update=0.3610, New Weight=0.5560
 x=3, True=6, Pred=1.6680, Error=4.3320, Update=0.6498, New Weight=1.2058
 x=4, True=8, Pred=4.8232, Error=3.1768, Update=0.6354, New Weight=1.8412
Epoch 1, Weight: 1.8412, Total Error: 45.5004
Epoch 2, Weight: 1.9867, Total Error: 0.3180
Epoch 3, Weight: 1.9989, Total Error: 0.0022
...
Epoch 10, Weight: 2.0000, Total Error: 0.0000

--- Testing the trained model ---
Input 5 -> Prediction: 10.0000 (True value should be 10)
```
What the code demonstrates:
- Prediction: y_pred = weight * x
- Error: error = y_true - y_pred
- Learning: weight += learning_rate * error * x (a simplified form of gradient descent)
- The model successfully learns the correct weight (2.0) and makes perfect predictions.
Summary for Small Cases
- Core Idea: A continuous cycle of prediction -> error calculation -> model updating.
- Key Components: Generative Model, Prediction, Prediction Error, Feedback.
- Small Case Usefulness: It's a fantastic framework for understanding learning algorithms (like this simple linear regressor), designing filters (e.g., Kalman Filters), and building robust AI systems that can handle unexpected inputs. For cognitive science, it provides a powerful metaphor for perception and learning.
Predictive coding, also known as Technology-Assisted Review (TAR), is a machine learning technique often used in eDiscovery to help identify relevant documents in a dataset. While it's commonly used in large-scale cases, it can also be beneficial for smaller cases. Here's how you can apply predictive coding to small cases effectively:
- Initial Review: Start with a human review of a small subset of documents. This initial set should be randomly selected and manually coded as relevant or non-relevant.
- Training the System: Use the coded documents to train the predictive coding system. The system will analyze the features of the relevant and non-relevant documents to create a model that can predict the relevance of uncoded documents.
- Iterative Process: Run the model on the remaining documents and validate the results with another round of human review. This is an iterative process where the system's predictions are checked and used to further refine the model.
- Quality Control: Implement quality control measures to ensure the accuracy of the predictive coding. This can include statistical sampling and human spot-checking.
- Document Ranking: Use the predictive coding system to rank documents by their likelihood of relevance. This can help prioritize review efforts even in smaller cases.
- Continuous Active Learning (CAL): In smaller cases, it's particularly important to use a workflow that combines human review and machine learning effectively. CAL involves continuous training of the model with new documents as they are reviewed, improving the model's accuracy over time.
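To make the workflow concrete, here is a minimal sketch of a CAL-style loop. It assumes scikit-learn, uses a tiny made-up collection and a keyword-based review() function as stand-ins for real documents and human reviewers, and treats TF-IDF plus logistic regression as one reasonable model choice rather than a prescribed eDiscovery stack.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in for a document collection (repeated to give the model more rows)
collection = [
    "invoice for contract breach damages",
    "lunch menu for the office party",
    "email discussing the disputed contract terms",
    "holiday schedule announcement",
    "draft settlement agreement for the contract dispute",
    "IT ticket about printer issues",
] * 5

def review(idx):
    """Stand-in for a human reviewer: here, 'contract' marks a document relevant."""
    return 1 if "contract" in collection[idx] else 0

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(collection)

# Step 1: initial human review of a small seed set (here the first six documents;
# in practice this would typically be a random sample)
labels = {idx: review(idx) for idx in range(6)}

# Steps 2-6: train, rank uncoded documents, review the top of the ranking, retrain
while len(labels) < len(collection):
    coded = sorted(labels)
    model = LogisticRegression(max_iter=1000)
    model.fit(X[coded], [labels[i] for i in coded])

    uncoded = [i for i in range(len(collection)) if i not in labels]
    scores = model.predict_proba(X[uncoded])[:, 1]            # likelihood of relevance
    top = [uncoded[i] for i in np.argsort(scores)[::-1][:3]]  # highest-ranked batch

    for idx in top:             # human decisions feed straight back into training
        labels[idx] = review(idx)

print(f"Coded {len(labels)} documents; {sum(labels.values())} judged relevant.")
# A real workflow would stop earlier, once statistical sampling shows acceptable recall.
```

The loop above reviews a fixed batch of the top-ranked documents each round; production tools typically add relevance-score cutoffs, richer validation sampling, and handling of reviewer disagreement on top of this core cycle.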
Benefits for Small Cases
- Efficiency: Even in smaller cases, predictive coding can significantly reduce the time and cost associated with manual review.
- Consistency: The model can help ensure consistent coding across the dataset, reducing human error.
- Prioritization: By ranking documents, predictive coding can help focus review efforts on the most likely relevant documents first.
- Scalability: The same techniques used in large cases can be scaled down for smaller datasets, making the process more flexible.
Challenges
- Data Volume: Smaller datasets may not provide enough training data to create a highly accurate model initially. This can be mitigated by careful selection of the initial training set.
- Overfitting: There's a risk of the model overfitting to the small training set, which can be addressed by using cross-validation techniques (a brief sketch follows this list).
- Cost-Benefit Analysis: The cost of setting up and running predictive coding might not be justified for very small cases, so a cost-benefit analysis is essential.
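Continuing directly from the CAL sketch above (and reusing its X and labels), a quick cross-validation check of the relevance model might look like this; the 5-fold split is an assumption and would need adjusting for very small coded sets.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

coded = sorted(labels)                      # documents that already have human decisions
y = [labels[i] for i in coded]
scores = cross_val_score(LogisticRegression(max_iter=1000), X[coded], y, cv=5)
print(f"Cross-validated accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```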
Best Practices
- Start Small: Begin with a small, well-chosen training set to ensure the model is trained on high-quality data.
- Iterate Frequently: Regularly update the model with new data to improve its accuracy.
- Validate Results: Continuously validate the model's predictions with human review to ensure reliability.
- Use Expert Judgment: Involve subject matter experts in the initial review and validation stages to enhance the quality of the training data.
By following these steps and best practices, predictive coding can be effectively applied to smaller cases, providing significant benefits in terms of efficiency, consistency, and accuracy.
💡 Try this comparison yourself: Compare AI models side-by-side on SNEOS
Analysis
This comparison demonstrates the different approaches each AI model takes when responding to the same prompt. Here are the key differences observed:
Response Characteristics
ChatGPT: Provides a concise response with 14 sentences.
DeepSeek: Provides a detailed response with 146 sentences.
Mistral: Provides a direct response with 44 sentences.
Key Takeaways
- Each model brings unique strengths to this type of query
- Response styles vary significantly between models
- Consider your specific use case when choosing between these models
Try This Comparison Yourself
Want to test these models with your own prompts? Visit SNEOS.com to compare AI responses side-by-side in real-time.
This comparison was generated using the SNEOS AI Comparison Tool.
Published: October 01, 2025 | Models: ChatGPT, DeepSeek, Mistral