What Can We Do With Apache Spark and Stack Overflow Data?

What Is Apache Spark?

What Can We Do With Apache Spark?

Machine Learning

Streaming Data

GraphX

SQL Programming

MapReduce Paradigm

What Is MapReduce?

Now Let’s Start Coding

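from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Start (or reuse) a Spark session for this job
    spark = SparkSession \
        .builder \
        .appName("PythonWordCount") \
        .getOrCreate()

    # Read Tags.csv as plain text; each Row holds one line in its first column
    lines = spark.read.text('Tags.csv').rdd.map(lambda r: r[0])

    # Map: split lines into words and emit (word, 1). Reduce: sum the counts per word
    counts = lines.flatMap(lambda x: x.split(' ')) \
        .map(lambda x: (x, 1)) \
        .reduceByKey(add)

    # Collect the results on the driver and print them
    output = counts.collect()
    for (word, count) in output:
        print("%s: %i" % (word, count))

    spark.stop()

Save the script as MapReduce.py and run it with:

spark-submit MapReduce.py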
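
To make the map and reduce steps of the word count above easier to see, here is a minimal sketch of the same idea in plain Python, with no Spark required. It is only an illustration: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are made up for this example, and Spark's real execution is distributed across partitions rather than run in a single process like this.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def map_phase(lines):
    """Emit a (word, 1) pair per space-separated token, like flatMap + map."""
    return [(word, 1) for line in lines for word in line.split(' ')]


def shuffle_phase(pairs):
    """Group the emitted values by key, so each word's counts end up together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values with add, in the spirit of reduceByKey(add)."""
    return {key: reduce(add, values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases (hypothetical helper for this sketch)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["python spark", "spark sql spark"]  # made-up sample lines
    for word, count in word_count(sample).items():
        print("%s: %i" % (word, count))
```

The explicit shuffle step is the part Spark handles for you: reduceByKey groups identical keys together before applying the reducing function.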

You can find the full code here

You can get the dataset from Kaggle

Follow me on Kaggle to keep up with my new kernels and data science projects

Rebai Ahmed