Data Summarisation Template
Mandatory?
Question
Explanation
REQUIRED
1. How would you describe the dataset in one sentence?
What is the dataset about?
2. What does the dataset look like?
File format, data type, information about the structure of the dataset
3. What are the headers?
Can you group them in a sensible way? Is there a key column?
4. What are the value types and value ranges for the most important headers?
Words/numbers/dates and their possible ranges
OPTIONAL
5. Where is the data from?
When was the data collected/published/updated? Where was the data published and by whom? (required if not mentioned in metadata)
6. In what way does the dataset mention time?
What timeframes does the data cover, what do they refer to, and at what level of detail are they reported? (E.g. years, days, hours, times etc.)
7. In what way does the dataset mention location?
What geographical areas does the data refer to? To what level of detail is the area or location reported? (E.g. latitude/longitude, street name, city, county, country etc.)
8. Is there anything unclear about the data, or do you have reason to doubt the quality?
How complete is the data (are there missing values)? Are all column names self-explanatory? What do missing values mean?
9. Is there anything that you would like to point out or analyse in more detail?
Particular trends or patterns in the data?
Source: Everything You Always Wanted to Know About a Dataset: Studies in Data Summarisation
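Some of the questions in the template (3, 4 and 8: headers, value types and ranges, and missing values) can be partly answered mechanically before writing the summary. The following is a minimal sketch of one way to do that, assuming the dataset is a CSV file with a header row. The function names, the sample data and the type-guessing rules (integer, then decimal number, then ISO date, otherwise word) are illustrative assumptions and are not part of the template or its source.

```python
import csv
import io
from datetime import date

# Hypothetical sample data, used only to illustrate the sketch.
SAMPLE = """station,reading_date,temperature_c
A1,2021-03-01,4.5
A2,2021-03-02,
A1,2021-03-03,6
"""


def infer_value(text):
    """Parse a cell as int, float or ISO date; fall back to str. Empty cells give None."""
    text = text.strip()
    if text == "":
        return None
    for parser in (int, float, date.fromisoformat):
        try:
            return parser(text)
        except ValueError:
            continue
    return text


def summarise_columns(csv_text):
    """Return, per header, the value types seen, the min/max, and the missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    collected = {name: {"values": [], "missing": 0} for name in headers}

    for row in reader:
        for name in headers:
            # Short rows yield None for absent cells; treat them as empty.
            value = infer_value(row[name] or "")
            if value is None:
                collected[name]["missing"] += 1
            else:
                collected[name]["values"].append(value)

    summary = {}
    for name, info in collected.items():
        values = info["values"]
        types = sorted({type(v).__name__ for v in values})
        # A range is only reported when the values are mutually comparable:
        # a single type, or a mix of int and float.
        comparable = values and (len(types) == 1 or set(types) <= {"int", "float"})
        summary[name] = {
            "types": types,
            "min": min(values) if comparable else None,
            "max": max(values) if comparable else None,
            "missing": info["missing"],
        }
    return summary


if __name__ == "__main__":
    for header, details in summarise_columns(SAMPLE).items():
        print(header, details)
```

A report like this only supplies raw material for the answers. What a missing value means, whether a column is a key, and how headers group together still need human judgement.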