What Can We Do with Apache Spark and Stack Overflow Data?

What Is Apache Spark?

What Can We Do with Apache Spark?

Machine Learning

Streaming Data


SQL Programming

MapReduce Paradigm

What Is MapReduce?
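MapReduce splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that combines the values collected for each key. As a rough, single-machine sketch of those three phases, the snippet below counts space-separated words in plain Python. The helper names (`map_phase`, `shuffle`, `reduce_phase`) and the sample lines are hypothetical illustrations, not part of Spark or the Stack Overflow dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every space-separated token."""
    pairs: List[Tuple[str, int]] = []
    for line in lines:
        for word in line.split(" "):
            pairs.append((word, 1))
    return pairs


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical sample input, not taken from Tags.csv
    sample = ["python spark", "spark sql", "python"]
    counts = reduce_phase(shuffle(map_phase(sample)))
    print(counts)  # {'python': 2, 'spark': 2, 'sql': 1}
```

In Spark, `flatMap` and `map` cover the map phase, and `reduceByKey` performs the shuffle and reduce together, with the work distributed across the cluster instead of running in a single process.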

Now Let’s Start Coding

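from operator import add
from pyspark.sql import SparkSession
if __name__ == "__main__":
    spark = SparkSession.builder.appName("MapReduce").getOrCreate()
    # Read Tags.csv line by line as plain strings
    lines = spark.read.text('Tags.csv').rdd.map(lambda r: r[0])
    # Map: split each line on spaces and emit (word, 1); Reduce: sum the 1s per word
    counts = lines.flatMap(lambda x: x.split(' ')) \
        .map(lambda x: (x, 1)) \
        .reduceByKey(add)
    output = counts.collect()
    for (word, count) in output:
        print("%s: %i" % (word, count))
    spark.stop()

Save the script as MapReduce.py and submit it to Spark with:

spark-submit MapReduce.py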

You can find the full code here.

You can get the dataset from Kaggle.

Rebai Ahmed


