Big Data Series
Posted on Sat 02 September 2017 in Academics
I've just added three blog posts I wrote during the Big Data bachelor course at Radboud University. As a master's student I'm allowed to take one or two bachelor courses if there's a good reason. Because no other course really goes into Spark, Hadoop and Scala, I figured it would be a nice addition to the Python-heavy curriculum. Not that I dislike Python, of course.
There are three posts in total:
Hadoop and the HDFS - an introduction to Hadoop and HDFS.
Spark - looking at a Kaggle competition data set in Spark.
The class project - a solo project about submitting code to a national research cluster and running queries against 1.73 billion web pages.
You can find the posts here: Big Data Series
I learnt a lot and finished the class project with a 9.5, so I wanted to share it.