Showing posts from 2020

PySpark Installation

Do PySpark, fun it will be, they said. I had a pretty frustrating week just trying to install it and get it working, primarily because I am not very technical and secondarily because everything is a huge clusterfuck.

Where do we start? First, to use PySpark you need Spark, and to use Spark it should ideally be installed on a Hadoop platform. To install Hadoop, you can download CDH from the Cloudera website, as my friends did.

Steps I followed:

Install a virtual machine on your Windows system. Everyone preferred Oracle's VirtualBox because it is free and trusted. Done. Easy peasy.

Download the CDH ISO file from Cloudera. This is where things started getting bonkers. Cloudera did away with the CDH download in favor of CDP. What is CDP, you ask? It is like a combination of CDH and HDP. You don't care about it? Fair enough, neither did I. The problem is that for CDH they give an ISO file, while for CDP they give commands that you run on a Linux machine to download it, and even that is for 60 days only.

Now I realize I have to download Linux OS o