What Can We Do with Apache Spark and Stack Overflow Data?

What Is Apache Spark?

What Can We Do with Apache Spark?

Machine Learning

Streaming Data

GraphX

SQL Programming

MapReduce Paradigm

What Is MapReduce?

Now Let’s Start Coding

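from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Create (or reuse) the Spark session for this job
    spark = SparkSession\
        .builder\
        .appName("PythonWordCount")\
        .getOrCreate()
    # Read Tags.csv as plain text: one string per line
    lines = spark.read.text('Tags.csv').rdd.map(lambda r: r[0])
    # Map each word to (word, 1), then sum the counts per word
    counts = lines.flatMap(lambda x: x.split(' ')) \
        .map(lambda x: (x, 1)) \
        .reduceByKey(add)
    # Collect the results on the driver and print them
    output = counts.collect()
    for (word, count) in output:
        print("%s: %i" % (word, count))
    spark.stop()

Run the script with spark-submit:

spark-submit MapReduce.py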
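To see what the flatMap, map, and reduceByKey steps in the word count are doing, it can help to trace the same logic in plain Python on a few lines of text. The sketch below is only an illustration and is not how Spark runs the job: it works in a single process, the function name word_count is hypothetical, and the sample lines are made up rather than taken from Tags.csv.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def word_count(lines):
    """Count words with explicit map, shuffle, and reduce phases."""
    # Map phase: emit a (word, 1) pair for every word on every line,
    # mirroring flatMap(split) followed by map(lambda x: (x, 1))
    pairs = [(word, 1) for line in lines for word in line.split(' ')]

    # Shuffle phase: group the emitted values by key (the word)
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)

    # Reduce phase: combine each key's values with add,
    # which is the role reduceByKey(add) plays in the Spark version
    return {word: reduce(add, ones) for word, ones in grouped.items()}


if __name__ == "__main__":
    # Made-up sample lines, used only for illustration
    sample = ["python java python", "spark python"]
    for word, count in word_count(sample).items():
        print("%s: %i" % (word, count))
```

In Spark, the map and reduce phases run in parallel over partitions of the data, and the grouping step becomes a shuffle across the cluster. The per-key logic is the same as in this sketch.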

You can find the full code here

You can get the dataset from Kaggle

Follow me on Kaggle to keep up with my new kernels and data science projects

Rebai Ahmed
