Data Project - Art, Space, And Time
I mentioned in a previous article that the best way to get good at something is to just do it. So here I am, following my own advice. My progress in the data analytics course is slow, and so far all the content is about the role of an analyst, which means I'll be going rogue on this project without a solid idea or plan of where it will go. But isn't that the beauty of beginnings, the uncertainty? Anyway, I'll be sharing my thought process and what I did, so let's go~
At the start, I thought of jobs and skills as the theme of the project: discover whether there is a relation between space and job skills, meaning which skills are most in demand in each location. As for the data, I planned on using this API. After a little while... I thought, nope, nope, the first project should be fun, so scratch “jobs” and in with paintings. I'll follow the data analysis process, which I'll also use to break down this article.
Ask: Choose a topic, and ask a question
So for my first data project, I went with famous paintings and time, and I have no clue how this will turn out... I’m just embracing the chaos. The questions that came to mind were:
- What are the most famous paintings in the world according to search queries?
- Is there a relation between the fame and where the artist is from?
- Does gender play a role?
- Does style play a role?
- The time period, does it have any impact on notoriety?
- Subject matter?
- Does the painting's price correlate with its fame?
- How many paintings on the list are controversial?
- Does the time it took to complete the painting have an effect?
- Was there an event that made a painting famous?
Most of these are very interesting and I would like to investigate them all, but to avoid getting lost in the process, I narrowed it down to paintings (art) and their relation to the time period (time) and the location of the artist (space).
Prepare: Get the data
The data source is the first page of Google results, along with the snippet (the search term used was “famous paintings”). I decided to use only the first page because I'm running on a tight deadline of one week. Here's the list of websites I used:
- 10 most famous paintings in the world - CNN
- The most famous paintings of all time - Timeout
- 20 Most Famous Paintings of All Time - madisonartshop
- Top 100 Masterpieces - World's Most Famous Paintings
- 50 Most Famous Paintings of All Time in the Art History (Ranked)
- 15 Most Famous Paintings of all Time - touropia
- The 10 Most Famous Paintings In The World - worldatlas
- Google snippet
- 50 famous paintings and the stories behind them
- 49 Famous Paintings Of All Time In The History Of Art
- 20 Famous Paintings From Western Art History Any Art Lover Should Know
Process: Cleaning the data
So I started by gathering the name of the painting, the artist, the date, and the subject matter.

Then, while getting to know the topics and history, I moved on to cleaning the data, because some columns had more than one data point. After that I expanded the data a bit by adding how long each painting took to complete, a secondary subject matter, and the movement or period. I could have gone even further with “if a scandal is associated with it”, “if it’s controversial”, “the price of the painting and if the artist died poor”, but that would have taken me ages, and I have a deadline, so this will do.

Analyze: Explore the data
Here are a couple of observations I made while looking at the spreadsheet:
- Sorting in ascending order by year of completion, the earliest painting is from 1423 and the latest is from 2006.
- Looking at how long each painting took to complete, the shortest is 1 year and the longest is 20 years.
- Next, I wanted to see how many times each painting is mentioned, because to me this determines popularity: the more it’s mentioned, the more famous it is. There are actually two most-mentioned paintings, “Mona Lisa” and “Girl with a Pearl Earring”, tied at 11 mentions.
- Artists per country is a bit interesting: France has the most, with 28 artists, and Norway, Japan, Latvia, Greece, and Ukraine have the fewest, with a single artist each.
- Another data point that interested me was which artists have more than one work listed. This was very interesting: the artist with the most artworks is Vincent van Gogh, with 9 different pieces.
- Another table I constructed lists each decade and the number of paintings on the list produced in it (with the associated artists). I think this will be my main graphic.
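A decade table like the one in the last bullet can also be sketched in a few lines of Python. This is only an illustration under assumptions: the rows below are made-up placeholders rather than data from the spreadsheet, and `paintings_per_decade` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows: (painting, artist, year completed).
# These are placeholders, not rows copied from the real spreadsheet.
rows = [
    ("Painting A", "Artist One", 1423),
    ("Painting B", "Artist Two", 1432),
    ("Painting C", "Artist Two", 1434),
    ("Painting D", "Artist Three", 1889),
]


def paintings_per_decade(rows):
    """Group rows by decade and count paintings per artist in each decade."""
    decades = defaultdict(lambda: defaultdict(int))
    for _title, artist, year in rows:
        decade = year // 10 * 10  # e.g. 1423 -> 1420
        decades[decade][artist] += 1
    # Return plain dicts, ordered by decade
    return {d: dict(artists) for d, artists in sorted(decades.items())}


table = paintings_per_decade(rows)
for decade, artists in table.items():
    # decade, total paintings in that decade, per-artist breakdown
    print(decade, sum(artists.values()), artists)
```

Integer division by 10 followed by multiplying by 10 drops the last digit of the year, which is all a decade bucket really is.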
I have to say, for this part I am clearly out of my depth. Why, you might ask? Let me tell you: I bloody did all the calculations by hand (or with a calculator), because for some goddamn reason the spreadsheet numbers wouldn’t work with me. I want to do SQL queries but don’t know how 😢
Also, I have some gripes with the spreadsheet formulas. I wanted to count unique paintings, so I highlighted over 300 rows of painting names. From scanning them I was sure there were more than 50 unique paintings, but what did the count formula say... 16. Yeah, fuck that. I counted them by hand and it was 73. So yeah... what is the lesson I learned? Athoug, learn how to use those formulas properly so you don't waste your time!
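For comparison, counting unique paintings and mentions is a short job in Python with `collections.Counter`. The snippet below is a minimal sketch, not what was actually done for this project: the `mentions` list is a made-up sample standing in for the 300+ rows, and the `normalize` helper is an assumption about how near-identical spellings might be merged.

```python
from collections import Counter

# Hypothetical sample of the "painting name" column, one entry per mention.
# The real sheet had over 300 rows; these values are only for illustration.
mentions = [
    "Mona Lisa",
    "mona lisa ",
    "Girl with a Pearl Earring",
    "The Starry Night",
    "Mona Lisa",
    "Girl With a Pearl Earring",
]


def normalize(title):
    """Lowercase and collapse whitespace so near-identical spellings match."""
    return " ".join(title.lower().split())


counts = Counter(normalize(title) for title in mentions)

unique_paintings = len(counts)          # number of distinct titles
most_mentioned = counts.most_common(2)  # top two titles with their counts

print(unique_paintings)
print(most_mentioned)
```

Normalizing first matters because a stray space or a different capitalization would otherwise be counted as a separate painting.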
Share: Visualize your data
This is the best bit! My idea was to make an infographic using the data gathered, so I broke the process into steps.
Sketch and Design the Layout
In this part, I need to use Illustrator as my design tool, and I must say... I am nervous as hell. It’s already the due date for publishing this article and time is of the essence, so I jumped straight into the design software. Okay, now stop writing, open Illustrator, and just whip something out, Athoug!
So I’ve settled on this layout. I won’t overthink it; this is my first data project, so obviously it will suck, so let's move on. (The dark gray will be the graph section and the light gray will be the text.)

Visualization: Tools and Design
I tried to use Raw Graphs to generate a graph for my visualization. At the beginning, I struggled with the CSV file: it wasn’t working no matter how many times I tried to clean it, even after reducing it to only the top 10. So I went with what is familiar to me, JSON. I took the “Mentions per Artist” table and converted it to JSON by hard-coding it. This is what it looks like 👇🏼
[
  {
    "artist": "Vincent van Gogh",
    "mentions": 9
  },
  {
    "artist": "Caravaggio",
    "mentions": 5
  },
  {
    "artist": "Leonardo da Vinci",
    "mentions": 5
  },
  {
    "artist": "Pablo Picasso",
    "mentions": 5
  },
  {
    "artist": "Jacques-Louis David",
    "mentions": 4
  },
  {
    "artist": "Pieter Bruegel the Elder",
    "mentions": 4
  },
  {
    "artist": "Claude Monet",
    "mentions": 3
  },
  {
    "artist": "Edouard Manet",
    "mentions": 3
  },
  {
    "artist": "Francisco Goya",
    "mentions": 3
  },
  {
    "artist": "Gustav Klimt",
    "mentions": 3
  }
]
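Hard-coding works for ten rows, but the same conversion can be scripted. Here is a minimal sketch using Python's standard `csv` and `json` modules. It assumes a CSV with `artist` and `mentions` columns (my guess at a sensible export, chosen to match the JSON keys above), and `csv_to_records` is a hypothetical helper name.

```python
import csv
import io
import json

# Hypothetical CSV export of the "Mentions per Artist" table.
# The header names are an assumption chosen to match the JSON keys.
csv_text = """artist,mentions
Vincent van Gogh,9
Caravaggio,5
Leonardo da Vinci,5
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts, converting mentions to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"artist": row["artist"], "mentions": int(row["mentions"])}
        for row in reader
    ]


records = csv_to_records(csv_text)
print(json.dumps(records, indent=2))
```

With a real file, `io.StringIO(text)` would be swapped for a file opened with `open(path, newline="")`.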
And I chose to go with a bar graph. Basic, I know, but I digress. After filling in the data, choosing my variables, and playing around with the colors, I sort of like the result.

I’m thinking of adding a sketch of each of those artists, but I'm not sure whether to add them at the top or bottom of the bars 🤔 Anyway, I’ll add that later because it’s just decorative. Let’s move on to the main graph, which is a bit more complex given the variables I need to figure out.
Clearly, this shows my naivety when it comes to data capture, because I have multiple values in a single column (a big no-no in the data world) and, well, I'm not sure how to translate this into something Raw Graphs will understand...

Let’s see how to construct that in JSON. JSON seemed like a good way to structure this because I could add each artist, how much they contributed that decade, and where they are from. Here is a sample:
[
  {
    "decade": 1420,
    "paintingsCount": 1,
    "Artists": [
      {
        "name": "Gentile da Fabriano",
        "paintings": 1,
        "country": "Italy"
      }
    ]
  },
  {
    "decade": 1430,
    "paintingsCount": 2,
    "Artists": [
      {
        "name": "Jan van Eyck",
        "paintings": 1,
        "country": "Belgium"
      },
      {
        "name": "Hubert and Jan van Eyck",
        "paintings": 1,
        "country": "Belgium"
      }
    ]
  },
  {
    "decade": 1470,
    "paintingsCount": 2,
    "Artists": [
      {
        "name": "Paolo Uccello",
        "paintings": 1,
        "country": "Italy"
      },
      {
        "name": "Leonardo da Vinci",
        "paintings": 1,
        "country": "Italy"
      }
    ]
  }
]
Now let's add that to Raw Graphs and see... Crap, it doesn’t understand arrays 😢

I can construct a graph, but I still need the rest of the data to make it interesting (in my opinion), so I slept on it, and it actually helped! I will test out making a separate row for each painter. Something that looks like this:
[
  {
    "decade": 1420,
    "name": "Gentile da Fabriano",
    "paintings": 1,
    "country": "Italy"
  },
  {
    "decade": 1430,
    "name": "Jan van Eyck",
    "paintings": 1,
    "country": "Belgium"
  },
  {
    "decade": 1430,
    "name": "Hubert and Jan van Eyck",
    "paintings": 1,
    "country": "Belgium"
  }
]
And it actually worked!!! Raw Graphs was able to process it.
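Going from the nested shape to one row per painter does not have to be done by hand either. The sketch below shows one way to flatten it in Python. It assumes the nested structure uses the same keys as the sample (`decade`, `Artists`, `name`, `paintings`, `country`), and `flatten` is a hypothetical helper name.

```python
import json

# Two decades taken from the nested sample, used here as test input.
nested = [
    {
        "decade": 1420,
        "paintingsCount": 1,
        "Artists": [
            {"name": "Gentile da Fabriano", "paintings": 1, "country": "Italy"},
        ],
    },
    {
        "decade": 1430,
        "paintingsCount": 2,
        "Artists": [
            {"name": "Jan van Eyck", "paintings": 1, "country": "Belgium"},
            {"name": "Hubert and Jan van Eyck", "paintings": 1, "country": "Belgium"},
        ],
    },
]


def flatten(decades):
    """Turn each artist entry into its own row, repeating the decade."""
    rows = []
    for entry in decades:
        for artist in entry["Artists"]:
            # Copy the artist fields and put the decade on the same level
            rows.append({"decade": entry["decade"], **artist})
    return rows


flat = flatten(nested)
print(json.dumps(flat, indent=2))
```

The `paintingsCount` field is dropped on purpose: once every artist has a row, the per-decade total can be recomputed by summing `paintings`.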

Now it's time to choose a graph type. I wanted to show time and the number of paintings while also incorporating the country and the artist. I tested out a bunch, but what seemed to work was the Beeswarm Plot. I'm not sure if it’s the right representation, but let’s move on. I changed the colors (I used Material colors), changed the diameter size, and, well, here is the result:

Approach and Elements
I started by adding the graph and trying to see how I could make the info clear. A legend for the colors was a must so people understand what they mean (they mean country); otherwise it’s just a bunch of colorful dots. Then the dimensions: what are the big circles? What are the small ones? So I decided to add a legend showing that a small circle is a single painting and a big one is 8 paintings.
Then, well, I started playing around with what data I could add from a statistics point of view. What would be interesting? The number of paintings? The most active decade? The painting that took longest to complete? The country with the most artists? And, well, here is how it looks so far:

From the guides you can tell I’m out of my depth, especially on the left side. What’s up with those two parallel guides? Why am I arranging it like that? I can feel every designer dying a bit inside while looking at this 😂
At this moment I remembered, “Hey, I have another graph.” Remember the bar graph from the beginning? Yeah, I completely forgot about that. I went back and looked at it: it didn’t fit the style of the infographic so far, so I had to make some adjustments. I first changed the secondary color (the yellow) to one that matched the current graph, the Beeswarm one; I chose to go with Italy’s color. Then I went back to the bar graph and changed its color to match the new secondary, and added it to the document, which gave me this result:

Okay, I can jazz it up a bit. I said at the beginning that I wanted to add a sketch of the artists, so I whipped out my tablet and started drawing. Here are the lovely 10:

Lastly, adding them to the document. One would think this was the easy part, just drag and drop. But no, no, it wasn’t. For some reason the PNG file would not save with a transparent background. At this point I was getting tired and low on energy, so I decided to just change the white background to match the graph and play around with the arrangement. It took a while, but we got a result.

And there you have it, my first data project. I won’t lie, it was a journey, but I don’t hate the result; it’s quite nice. This is the first step in growing, so yay! First project done ✅