Extracting Flight Data from Email
Guide for identifying actual flight bookings and extracting structured information from Gmail.
For Gmail search query patterns, see Gmail Search Patterns (Travel > Flights section).
Identifying Flight Confirmations
Not every email with "flight" in the subject is a flight confirmation. Use judgment to filter out false positives.
Actual flight confirmations contain:
- Flight numbers (e.g., "UA 123", "BA456")
- PNR/confirmation codes (typically 6 alphanumeric characters)
- Specific departure/arrival times and airports
- Airline name or logo
- Passenger name matching the user
Exclude:
- Hotel-only bookings (no flight info)
- Car rental confirmations
- Flight deal newsletters and promotions
- Price alerts and fare tracking emails
- Generic "thank you for booking" emails without flight details
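The criteria above can be approximated with a heuristic pre-filter before each email gets a closer read. The sketch below is a minimal illustration. It assumes the subject and plain-text body are already available as strings. The regular expressions, the promotional keyword list, and the function name `looks_like_flight_confirmation` are all hypothetical. The decision rule (a flight number plus a booking code, and no promotional wording in the subject) is one reasonable choice, not a complete classifier.

```python
import re

# Hypothetical flight-number pattern: a 2-character airline code
# (two letters, letter+digit, or digit+letter) followed by 1-4 digits,
# with an optional space, e.g. "UA 837" or "BA456".
FLIGHT_NUMBER = re.compile(r"\b(?:[A-Z]{2}|[A-Z][0-9]|[0-9][A-Z])\s?\d{1,4}\b")

# Hypothetical booking-code pattern: a case-insensitive keyword, up to 30
# characters that are not uppercase letters or digits, then a 6-character
# uppercase alphanumeric code.
PNR = re.compile(
    r"(?i:confirmation(?: code| number)?|record locator|booking reference|pnr)"
    r"[^A-Z0-9]{0,30}([A-Z0-9]{6})\b"
)

# Assumed promotional wording that disqualifies an email by subject alone.
PROMO_SUBJECT = re.compile(
    r"\b(?:deals?|sale|price alert|fare alert|newsletter)\b", re.IGNORECASE
)


def looks_like_flight_confirmation(subject: str, body: str) -> bool:
    """Heuristic: require a flight number AND a booking code, reject promos."""
    if PROMO_SUBJECT.search(subject):
        return False
    text = f"{subject}\n{body}"
    return bool(FLIGHT_NUMBER.search(text)) and bool(PNR.search(text))
```

A hotel confirmation usually has a booking code but no flight number. A fare alert usually has a flight number but no booking code. Requiring both signals therefore rejects both kinds of email.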
Extracting Flight Details
For each confirmed flight email, extract:
| Field | What to Look For | Example |
|---|---|---|
| Departure City | Airport code (3 letters) or city name near "from", "depart" | SFO, San Francisco |
| Arrival City | Airport code or city name near "to", "arrive" | CDG, Paris |
| Departure Date | Date near departure info, format varies | Jan 15, 2024 or 2024-01-15 |
| Departure Time | Time in 12h or 24h format | 10:30 AM, 14:45 |
| Arrival Date | Often the same as the departure date; check for overnight flights | Jan 16, 2024 |
| Arrival Time | Time at destination (local time) | 07:00 |
| Airline | Carrier name or 2-letter code | United, UA |
| Flight Number | Alphanumeric, often after airline | UA 837 |
| Confirmation Code | 6-character PNR | ABC123 |
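Date and time formats vary by airline, so it helps to normalize them to the ISO forms used in the output: `YYYY-MM-DD` and 24-hour `HH:MM`. The sketch below uses Python's `datetime.strptime`. The format tuples are assumptions to extend as new email templates appear, and they assume English month names (the default C locale). Numeric dates such as `01/02/2024` are deliberately not parsed, because their day/month order is ambiguous.

```python
from datetime import datetime
from typing import Optional, Sequence

# Assumed formats seen in confirmation emails (English month names).
DATE_FORMATS = ("%Y-%m-%d", "%b %d, %Y", "%B %d, %Y", "%d %b %Y", "%a, %b %d, %Y")
TIME_FORMATS = ("%H:%M", "%I:%M %p", "%I:%M%p")


def _parse(raw: str, formats: Sequence[str]) -> Optional[datetime]:
    """Try each format in order; return None if none of them fit."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None


def normalize_date(raw: str) -> Optional[str]:
    """'Jan 15, 2024' or '2024-01-15' -> '2024-01-15'; unknown -> None."""
    parsed = _parse(raw, DATE_FORMATS)
    return parsed.strftime("%Y-%m-%d") if parsed else None


def normalize_time(raw: str) -> Optional[str]:
    """'10:30 AM' or '14:45' -> 24-hour 'HH:MM'; unknown -> None."""
    parsed = _parse(raw, TIME_FORMATS)
    return parsed.strftime("%H:%M") if parsed else None
```

For example, `normalize_time("2:45 PM")` returns `"14:45"`. Returning `None` for unrecognized input means the field ends up null in the output rather than holding a guess.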
Airport to Country Mapping
Common airport codes and their countries:
North America: JFK/LGA (USA-NY), EWR (USA-NJ, serves New York), LAX/SFO (USA-CA), ORD (USA-IL), YYZ (Canada), MEX (Mexico)
Europe: LHR/LGW (UK), CDG/ORY (France), FRA/MUC (Germany), AMS (Netherlands), FCO (Italy), MAD/BCN (Spain), ZRH (Switzerland)
Asia: NRT/HND (Japan), ICN (South Korea), PEK/PVG (China), SIN (Singapore), BKK (Thailand), HKG (Hong Kong)
Middle East: DXB (UAE), DOH (Qatar), TLV (Israel)
Oceania: SYD/MEL (Australia), AKL (New Zealand)
For unlisted airports, use the city name to determine the country.
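The mapping above translates directly into a lookup table with a city-name fallback. In this sketch the airport entries mirror the list above. All US airports are collapsed to `"USA"` to match the output example. `CITY_COUNTRY` and `country_for` are hypothetical names, and `CITY_COUNTRY` is a tiny stand-in for whatever city knowledge you apply to unlisted airports.

```python
from typing import Optional

# Mirrors the airport list above; US state suffixes dropped to match "USA".
AIRPORT_COUNTRY = {
    "JFK": "USA", "EWR": "USA", "LGA": "USA", "LAX": "USA", "SFO": "USA", "ORD": "USA",
    "YYZ": "Canada", "MEX": "Mexico",
    "LHR": "UK", "LGW": "UK", "CDG": "France", "ORY": "France",
    "FRA": "Germany", "MUC": "Germany", "AMS": "Netherlands", "FCO": "Italy",
    "MAD": "Spain", "BCN": "Spain", "ZRH": "Switzerland",
    "NRT": "Japan", "HND": "Japan", "ICN": "South Korea",
    "PEK": "China", "PVG": "China", "SIN": "Singapore", "BKK": "Thailand",
    "HKG": "Hong Kong",
    "DXB": "UAE", "DOH": "Qatar", "TLV": "Israel",
    "SYD": "Australia", "MEL": "Australia", "AKL": "New Zealand",
}

# Hypothetical fallback for airports not listed above; extend as needed.
CITY_COUNTRY = {"Lisbon": "Portugal", "Dublin": "Ireland", "Vancouver": "Canada"}


def country_for(airport: Optional[str] = None, city: Optional[str] = None) -> Optional[str]:
    """Prefer the airport table; fall back to the city name; else None."""
    if airport and airport.strip().upper() in AIRPORT_COUNTRY:
        return AIRPORT_COUNTRY[airport.strip().upper()]
    if city:
        return CITY_COUNTRY.get(city.strip().title())
    return None
```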
Handling Ambiguity
Multiple segments: A single confirmation may have multiple flights (outbound + return, or connections). Extract each segment separately.
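One way to keep segments separate is to parse each booking into shared fields plus a list of segments. You can then flatten it so that every segment becomes its own record carrying the shared confirmation code. The `booking` shape and the `flatten_segments` name are assumptions for illustration. The sample values below are made up.

```python
from typing import Any, Dict, List


def flatten_segments(booking: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one parsed booking into one flight record per segment.

    Assumes `booking` holds shared fields (e.g. source, confirmationCode)
    plus a "segments" list of per-flight dicts; segment values win on conflict.
    """
    shared = {key: value for key, value in booking.items() if key != "segments"}
    return [{**shared, **segment} for segment in booking.get("segments", [])]


# Made-up round trip: outbound and return under one confirmation code.
booking = {
    "source": "email",
    "confirmationCode": "ABC123",
    "segments": [
        {"flightNumber": "UA 837", "departureAirport": "SFO", "arrivalAirport": "CDG"},
        {"flightNumber": "UA 990", "departureAirport": "CDG", "arrivalAirport": "SFO"},
    ],
}
records = flatten_segments(booking)
```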
Missing data: If the departure time is missing but the date is known, record the date and set the time to null. If an airport code is ambiguous, prefer the full city name.
Duplicates: The same flight may appear in several emails (confirmation, reminder, check-in). Deduplicate on the combination of departure date, route, and flight number.
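A minimal deduplication sketch keyed on departure date, route, and flight number follows. Two normalizations are assumptions: the flight number ignores spaces and case, so `UA 837` and `UA837` compare equal, and the route falls back to city names when airport codes are missing. Rather than simply discarding later copies, this version keeps the first record and fills any of its null fields from later copies. That choice assumes a check-in email may carry details the original confirmation lacked.

```python
from typing import Any, Dict, List, Tuple


def _dedupe_key(flight: Dict[str, Any]) -> Tuple[Any, ...]:
    """Same date + same route + same (normalized) flight number."""
    flight_number = (flight.get("flightNumber") or "").replace(" ", "").upper()
    return (
        flight.get("departureDate"),
        flight.get("departureAirport") or flight.get("departureCity"),
        flight.get("arrivalAirport") or flight.get("arrivalCity"),
        flight_number,
    )


def dedupe_flights(flights: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    kept: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for flight in flights:
        key = _dedupe_key(flight)
        if key not in kept:
            kept[key] = dict(flight)  # copy so the caller's dicts are not mutated
            continue
        # Duplicate: fill gaps in the first copy from this later one.
        first = kept[key]
        for field, value in flight.items():
            if first.get(field) is None and value is not None:
                first[field] = value
    return list(kept.values())
```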
Output Format
Write extracted flights as a JSON array:
```json
[
  {
    "source": "email",
    "confirmationCode": "ABC123",
    "airline": "United",
    "flightNumber": "UA 837",
    "departureCity": "San Francisco",
    "departureAirport": "SFO",
    "departureCountry": "USA",
    "departureDate": "2024-01-15",
    "departureTime": "10:30",
    "arrivalCity": "Paris",
    "arrivalAirport": "CDG",
    "arrivalCountry": "France",
    "arrivalDate": "2024-01-16",
    "arrivalTime": "07:00"
  }
]
```

Sort by departure date ascending. Include all fields even if some are null.
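The final step, filling every field and sorting, can be sketched as follows. `FIELDS` is copied from the example above. The function name `to_output_json` and the choice to place records with no departure date last are assumptions. Dates and times are already normalized to ISO strings, so plain string comparison sorts them chronologically.

```python
import json
from typing import Any, Dict, List

# Field order copied from the output example.
FIELDS = (
    "source", "confirmationCode", "airline", "flightNumber",
    "departureCity", "departureAirport", "departureCountry",
    "departureDate", "departureTime",
    "arrivalCity", "arrivalAirport", "arrivalCountry",
    "arrivalDate", "arrivalTime",
)


def to_output_json(flights: List[Dict[str, Any]]) -> str:
    # Every field present; missing values become None, serialized as null.
    rows = [{field: flight.get(field) for field in FIELDS} for flight in flights]
    # Ascending by departure date, then time; undated records go last.
    rows.sort(key=lambda r: (r["departureDate"] is None,
                             r["departureDate"] or "",
                             r["departureTime"] or ""))
    return json.dumps(rows, indent=2)
```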