Extracting events data from Wikipedia into an SQL table using ChatGPT
Darwin Timeline Project
sql
timelines
I am using a denormalized, wide tables data structure because this makes it much easier to transfer the data between ChatGPT and a database. (BigQuery uses denormalized, wide tables).
CREATE TABLE EventDetails (
INTEGER PRIMARY KEY AUTOINCREMENT,
EventID
ShortDescription TEXT,
LongDescription TEXT,Date TEXT,
--(Exact, Approximate, Circa)
DateAccuracy TEXT, --(e.g., Marriage, Birth, Death, Publication)
EventType TEXT,
LocationName TEXT,
Location Country TEXT,
SourceURL TEXT,UNIQUE
PersonName TEXT );
ChatGPT Prompt
Extract events related to life events, achievements, and notable publications from the Wikipedia article given below. Extract as many events as you can. Every time a date or year appears in the text, you should make an appropriate entry in the output. Format the output as a CSV with columns: Description, EventDate, DateAccuracy, EventType, LocationName, Country, SourceURL, and PersonName. String fields should have quotes.
1. ShortDescription: Summarize the event in one sentence.
2. LongDescription: A detailed description of the event, with all information about the event from the article.
3. EventDate: Use ‘YYYY-MM-DD’ format if available. If no precise date is given, use an approximate guess, e.g. ‘YYYY-01-01’ and put a suitable value in the DateAccuracy column. If an event has a start and end date, create two rows, one for the start and one for the end.
4. DateAccuracy: Specify ‘Exact’, ‘Approximate’, or ‘Circa’.
5. EventType: Use the most appropriate category from the following list. “Birth, Education, Career, Marriage, Divorce, Publication, Award, Travel, Death, Childbirth, Move, Illness, Retirement, Invention, Discovery, Collaboration, Speech, Performance, Scandal, Charity, Investment, Appointment, Achievement, Other”
6. LocationName: Include if available; otherwise, leave blank.
7. LocationCountry: If the country no longer exists, use the most appropriate contemporary country name.
8. SourceURL: The source URL.
9. PersonName: Give the full name of the person involved. If more than one person is involved (a marriage, for example), create two rows, one for each person.
---[URL]