Data Stream: you can apply various functions to it, including time-based operations.

The data stream has global counters.

So the data stream is cut every t time units; each cut produces a micro-batch, and the stream becomes a sequence of these micro-batches, one every t units.

It's the ability to tell Spark: every 2 seconds, cut off whatever you have and give it to me.

https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.streaming.package

sc.getConf

import org.apache.spark.streaming._  // brings StreamingContext and Seconds into scope

val ssc = new StreamingContext(sc, Seconds(2))  // in spark-shell, pass the existing SparkContext; building a second context from sc.getConf fails

HBase is better suited than Hive for low-latency, random reads and writes; Hive targets batch queries.

val lines = ssc.socketTextStream("localhost", 9999)

val words = lines.flatMap(_.split(" "))
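A minimal sketch of finishing the word count on this DStream, continuing the lines above (wordCounts is just an illustrative name):

val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)  // count each word within every 2-second micro-batch
wordCounts.print()  // print a few counts from each micro-batch to the console
ssc.start()  // start receiving data
ssc.awaitTermination()  // block until the stream is stopped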

map must return exactly one value per input element; flatMap returns a collection (zero or more values) per element, and the results are flattened.
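A small illustration of that difference on a plain Scala collection (the same contract applies to DStreams); sentences, arrays, and tokens are just illustrative names:

val sentences = Seq("to be", "or not")
val arrays = sentences.map(_.split(" "))      // one output per input: Seq(Array("to", "be"), Array("or", "not"))
val tokens = sentences.flatMap(_.split(" "))  // outputs flattened together: Seq("to", "be", "or", "not")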

window - lets us do operations on groups of micro-batches.

So I can make a window of 3 micro-batches and run operations across all three, on a sliding basis.

sliding --> slide the window forward one micro-batch at a time, so consecutive windows overlap.

so --> one window covers ti, ti+1, ti+2;

then the next covers ti+1, ti+2, ti+3, and so on.
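A hedged sketch of a windowed count, continuing the word-count lines above and assuming the 2-second batch interval: a window of 3 micro-batches is 6 seconds, sliding by 1 micro-batch (2 seconds).

// each window covers ti, ti+1, ti+2; the next covers ti+1, ti+2, ti+3
val windowedCounts = words.map(word => (word, 1))
  .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(6), Seconds(2))
windowedCounts.print()

Both the window length and the slide interval must be multiples of the batch interval.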
