CodeAlpha Python Programming Internship | Task 3
Automate the extraction of all email addresses from a .txt file, categorize them, and save a detailed report to a separate output file.
| File | Purpose |
|---|---|
| `main.py` | Main script: extracts, categorizes, and reports on emails using regex |
| `sample.txt` | Sample input file with multiple email addresses |
| `extracted_emails.txt` | Auto-generated output file with extracted emails and stats |
No external libraries needed. Uses only built-in Python modules (re, os, collections).
python main.py
Enter the .txt filename (e.g. sample.txt): sample.txt
Extracted emails, category breakdown, and domain stats are saved to extracted_emails.txt in the same folder.
- Reads the contents of the input `.txt` file
- Uses a regex pattern to find all valid email addresses
- Removes duplicates while preserving order
- Categorizes each email as either:
  - Personal/Work: regular human or business addresses
  - System/No-reply: automated addresses (e.g. `no-reply@`, `notifications@`, `alerts@`, `newsletter@`)
- Groups emails by domain and counts how many addresses belong to each domain
- Saves a full report to `extracted_emails.txt`, including:
  - Total email count
  - Category breakdown (Personal/Work vs System/No-reply)
  - Domain breakdown (sorted by frequency)
  - Separate lists of Personal/Work and System/No-reply emails
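
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `main.py`: the regex, the `SYSTEM_PREFIXES` tuple, and the function names are assumptions chosen to match the behavior described in this README.

```python
import re
from collections import Counter

# Assumed pattern: a common simplified email regex, not necessarily the one in main.py.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed prefix list, taken from the examples named in this README.
SYSTEM_PREFIXES = ("no-reply", "noreply", "notifications", "alerts", "newsletter")


def extract_emails(text):
    """Return every matched address once, in first-seen order."""
    seen = set()
    unique = []
    for email in EMAIL_RE.findall(text):
        if email not in seen:
            seen.add(email)
            unique.append(email)
    return unique


def categorize(email):
    """Label an address by checking the part before '@' against known prefixes."""
    local_part = email.split("@", 1)[0]
    if local_part.startswith(SYSTEM_PREFIXES):
        return "System/No-reply"
    return "Personal/Work"


def domain_counts(emails):
    """Count how many addresses belong to each domain."""
    return Counter(email.split("@", 1)[1] for email in emails)
```

Calling `domain_counts(emails).most_common()` returns the domains sorted by frequency, which is the ordering the report uses.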
- `re`: regular expressions for pattern matching
- `os`: file existence check
- `collections.Counter`: counting categories and domains
- File handling: reading input, writing output
- Deduplication logic using `set()`
- String matching for email categorization
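
As an illustration of the file existence check and file handling listed above, here is a small sketch. The helper name `read_text_file` and the UTF-8 encoding are assumptions and may differ from what `main.py` actually does.

```python
import os


def read_text_file(path):
    """Return the file's text, or None if the path is not an existing file."""
    # os.path.isfile is False for missing paths and for directories.
    if not os.path.isfile(path):
        return None
    # Assumption: the input file is UTF-8 encoded.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```

Returning `None` instead of raising lets the caller print a friendly message when the user types a filename that does not exist.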
[+] Found 24 unique email(s):
michael.ross@company.com
...
[*] Category Breakdown:
Personal/Work: 21
System/No-reply: 3
[*] Top Domains:
devteam.org: 5
company.com: 4
secops.net: 4
bigcorp.in: 3
partnerltd.co.uk: 1
[✓] Saved to: extracted_emails.txt
Email Extraction Report
========================================
Total emails found: 24
========================================
Category Breakdown:
Personal/Work emails: 21
System/No-reply emails: 3
Domain Breakdown:
devteam.org: 5
company.com: 4
secops.net: 4
bigcorp.in: 3
...
----------------------------------------
Personal/Work Emails
----------------------------------------
michael.ross@company.com
rachel.zane@company.com
...
----------------------------------------
System/No-reply Emails
----------------------------------------
no-reply@alerts.system.com
notifications@monitor.net
noreply@newsletter.company.com
- Email categorization: automatically flags addresses as Personal/Work or System/No-reply based on common automated-mail prefixes (`no-reply`, `notifications`, `alerts`, `newsletter`, etc.)
- Domain-based statistics: counts and ranks how many extracted emails belong to each domain
- Structured report: output file is organized into category stats, domain stats, and separated email lists for easier review
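
One way the structured report could be assembled is sketched below. It follows the layout of the sample report shown earlier, but the function name `build_report`, its two-list signature, and the 40-character rule lines are assumptions rather than the actual implementation.

```python
from collections import Counter

RULE = "=" * 40       # assumed width of the heavy separator
THIN_RULE = "-" * 40  # assumed width of the light separator


def build_report(personal, system):
    """Build the report text from two already-categorized email lists."""
    emails = personal + system
    domains = Counter(email.split("@", 1)[1] for email in emails)
    lines = [
        "Email Extraction Report",
        RULE,
        f"Total emails found: {len(emails)}",
        RULE,
        "Category Breakdown:",
        f"Personal/Work emails: {len(personal)}",
        f"System/No-reply emails: {len(system)}",
        "Domain Breakdown:",
    ]
    # most_common() orders domains from most to least frequent.
    lines += [f"{domain}: {count}" for domain, count in domains.most_common()]
    lines += [THIN_RULE, "Personal/Work Emails", THIN_RULE, *personal]
    lines += [THIN_RULE, "System/No-reply Emails", THIN_RULE, *system]
    return "\n".join(lines) + "\n"
```

The returned string can then be written to `extracted_emails.txt` with an ordinary `open(..., "w")` call.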