data beginner

Clean Messy CSV Data

Use this AI prompt to turn messy CSV files into clean, structured data: it fixes formatting, removes duplicates, and standardizes values.

Works with: ChatGPT, Claude, Gemini

Prompt Template

You are a data cleaning specialist. I need help cleaning a messy CSV dataset. Please analyze the data and provide a comprehensive cleaning plan along with the cleaned version.

Here is my CSV data:
[CSV_DATA]

Specific issues I've noticed (if any):
[KNOWN_ISSUES]

Please perform the following data cleaning tasks:

1. **Data Assessment**: First, analyze the dataset and identify all data quality issues including:
   - Missing values (empty cells, "N/A", "NULL", etc.)
   - Duplicate rows or entries
   - Inconsistent formatting (dates, phone numbers, addresses)
   - Inconsistent capitalization or spacing
   - Invalid or outlier values
   - Column header issues

2. **Cleaning Actions**: For each issue found, explain what cleaning action you'll take and why.

3. **Cleaned Dataset**: Provide the cleaned CSV data with:
   - Standardized formatting
   - Removed or handled duplicates
   - Consistent data types
   - Proper column headers
   - Missing values appropriately handled

4. **Summary Report**: Provide a summary of:
   - Number of rows/columns processed
   - Issues found and resolved
   - Any recommendations for data collection improvements
   - Assumptions made during cleaning

Please format the cleaned CSV data clearly and explain your reasoning for each cleaning decision.

Variables to Customize

[CSV_DATA]

Your messy CSV data that needs cleaning

Example:

Name,Email,Phone,Date
John Smith,john@email.com,555-1234,01/15/2023
john smith,JOHN@EMAIL.COM,(555) 123-4567,2023-01-15
Jane Doe,,555.987.6543,
Jane Doe,jane@email.com,555-987-6543,01/20/23

[KNOWN_ISSUES]

Any specific data quality issues you've already identified

Example: Duplicate entries for the same person, inconsistent phone number formatting, some missing email addresses and dates
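
If you fill the template from a script rather than by hand, the two variables can be substituted in a single pass. The following is a minimal sketch, not part of the original prompt: `PROMPT_TEMPLATE` is an abbreviated stand-in for the full template, and `fill_prompt` is a hypothetical helper name.

```python
import re

# Abbreviated stand-in for the full prompt template (hypothetical shortening).
PROMPT_TEMPLATE = """You are a data cleaning specialist. I need help cleaning a messy CSV dataset.

Here is my CSV data:
[CSV_DATA]

Specific issues I've noticed (if any):
[KNOWN_ISSUES]
"""


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace every [PLACEHOLDER] in one pass.

    Raises KeyError if the template names a placeholder that `values` lacks,
    so a forgotten variable is caught before the prompt is sent.
    """
    return re.sub(r"\[([A-Z_]+)\]", lambda m: values[m.group(1)], template)


prompt = fill_prompt(
    PROMPT_TEMPLATE,
    {
        "CSV_DATA": "Name,Email\nJane Doe,jane@email.com",
        "KNOWN_ISSUES": "Some missing email addresses",
    },
)
print(prompt)
```

Substituting in one pass means bracketed text inside the CSV data itself is never mistaken for a placeholder.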

Example Output

## Data Assessment

I've identified the following issues in your dataset:

- **Duplicates**: 2 entries each for John Smith and Jane Doe
- **Inconsistent formatting**: Phone numbers in 4 different formats, including one 7-digit number with no area code
- **Case inconsistency**: Names and emails in mixed cases
- **Missing values**: 1 empty email field, 1 empty date field
- **Date format inconsistency**: Mix of MM/DD/YYYY, MM/DD/YY, and YYYY-MM-DD formats

## Cleaning Actions Taken

1. **Merged duplicates**: Kept the most complete record for John Smith and for Jane Doe
2. **Standardized phone numbers**: All converted to XXX-XXX-XXXX format
3. **Standardized names**: Proper case applied
4. **Standardized emails**: All lowercase
5. **Date formatting**: All converted to YYYY-MM-DD format
6. **Missing values**: Filled from the matching duplicate record; any value still missing would be marked as NULL for tracking

## Cleaned Dataset

```csv
Name,Email,Phone,Date
John Smith,john@email.com,555-123-4567,2023-01-15
Jane Doe,jane@email.com,555-987-6543,2023-01-20
```

## Summary Report

- **Processed**: 4 rows → 2 clean rows
- **Issues resolved**: 2 duplicate rows merged, 2 missing values filled, and phone, name, email, and date formatting standardized
- **Recommendation**: Implement data validation at collection point
- **Assumptions**: Treated rows with the same name (ignoring case) as the same person; kept most recent/complete records when deduplicating
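
To check an AI-cleaned result independently, the cleaning actions in the example output can be approximated with Python's standard library. This is a rough sketch under stated assumptions, not the method the AI model uses: rows are matched on the lowercased name, phone numbers without 10 digits are treated as missing, and the helper names (`normalize_phone`, `normalize_date`, `clean`) are hypothetical.

```python
import csv
import io
import re
from datetime import datetime

RAW = """Name,Email,Phone,Date
John Smith,john@email.com,555-1234,01/15/2023
john smith,JOHN@EMAIL.COM,(555) 123-4567,2023-01-15
Jane Doe,,555.987.6543,
Jane Doe,jane@email.com,555-987-6543,01/20/23
"""


def normalize_phone(value: str) -> str:
    """Return XXX-XXX-XXXX, or '' when the number lacks 10 digits (assumption)."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return ""


def normalize_date(value: str) -> str:
    """Return YYYY-MM-DD, or '' when no known format matches."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return ""


def clean(raw: str) -> list[dict[str, str]]:
    """Standardize each row, then merge rows that share a name (case-insensitive)."""
    merged: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(raw)):
        record = {
            "Name": " ".join(row["Name"].split()).title(),
            "Email": row["Email"].strip().lower(),
            "Phone": normalize_phone(row["Phone"]),
            "Date": normalize_date(row["Date"]),
        }
        existing = merged.setdefault(record["Name"].lower(), record)
        # Fill gaps in the first-seen record from later duplicates.
        for field, value in record.items():
            if not existing[field]:
                existing[field] = value
    return list(merged.values())


cleaned = clean(RAW)
for record in cleaned:
    print(",".join(record.values()))
```

On the sample data this yields the same two rows as the cleaned dataset above. Matching on name alone is a simplification; real datasets usually need a stronger key, such as email or a customer ID, and `str.title()` mishandles names like "O'Brien" or "McDonald".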

Pro Tips for Best Results

  • Always back up your original CSV file before cleaning
  • Provide a sample of your data first to test the cleaning approach
  • Be specific about your data context (customer records, sales data, etc.) for better cleaning decisions
  • Review the AI's assumptions and cleaning decisions before applying to your full dataset
  • Use the cleaned data as a template to create validation rules for future data collection
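
As one illustration of the last tip, the formats in the cleaned example (lowercase emails, XXX-XXX-XXXX phone numbers, YYYY-MM-DD dates) could be turned into simple validation rules. This is a minimal sketch; the `RULES` patterns and the `validate_row` helper are hypothetical, and the email pattern is deliberately loose rather than a full address validator.

```python
import re

# One pattern per column, mirroring the formats used in the cleaned dataset.
RULES = {
    "Email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "Phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "Date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def validate_row(row: dict[str, str]) -> list[str]:
    """Return the names of columns whose values do not match their rule."""
    return [
        column
        for column, pattern in RULES.items()
        if not pattern.fullmatch(row.get(column, ""))
    ]


good = {"Name": "Jane Doe", "Email": "jane@email.com",
        "Phone": "555-987-6543", "Date": "2023-01-20"}
bad = {"Name": "Jane Doe", "Email": "",
       "Phone": "555.987.6543", "Date": "01/20/23"}

print(validate_row(good))
print(validate_row(bad))
```

Running a check like this at the point of data entry flags a bad row before it reaches the CSV file, which is cheaper than cleaning it afterwards.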
