Data Stream: you can apply various functions to it, including operations with a time component.
The Data Stream (DStream) has global counters.
So the data stream is cut every t time units; each slice is a micro-batch (each one is an RDD). The DStream is the stream of those micro-batches produced by cutting every t units.
It's the ability to tell Spark: every 2 seconds, cut off what you have and give it to me.
https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.streaming.package
sc.getConf   // inspect the SparkConf of the running shell
import org.apache.spark.streaming._
// In the shell, reuse the existing SparkContext (sc) rather than creating a second context from sc.getConf.
val ssc = new StreamingContext(sc, Seconds(2))   // cut the stream into 2-second batches
HBase is better suited than Hive for low-latency, random reads and writes; Hive is designed for batch analytical queries.
val lines = ssc.socketTextStream("localhost", 9999)   // DStream of text lines read from a TCP socket
val words = lines.flatMap(_.split(" "))               // split each line into individual words
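A minimal sketch of how the rest of the classic streaming word count looks on top of the lines/words DStreams above (assumes something like nc -lk 9999 is feeding text into the socket):

val wordCounts = words.map(w => (w, 1)).reduceByKey(_ + _)   // count each word within a single 2-second batch
wordCounts.print()       // print each batch's counts to the console
ssc.start()              // start receiving and processing
ssc.awaitTermination()   // block until the streaming job is stopped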
map must return exactly one output element per input element; flatMap returns a sequence of zero or more elements per input, which is then flattened, so it can effectively return nothing for a given input.
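A small illustration of that difference with plain Scala collections (the same semantics apply to DStream.map and DStream.flatMap):

"to be or not".split(" ").map(_.length)                        // exactly one result per word: Array(2, 2, 2, 3)
Seq("a b", "", "c").flatMap(_.split(" ").filter(_.nonEmpty))   // zero or more per element: List(a, b, c)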
window - lets us do operations on groups of micro-batches.
So I can make a window of 3 micro-batches and do operations across all three, on a sliding basis.
sliding --> the window advances one micro-batch at a time: each new window drops the oldest micro-batch and adds the newest one, so consecutive windows overlap.
so --> the first window covers t(i), t(i+1), t(i+2);
then the next covers t(i+1), t(i+2), t(i+3), and so on.
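A minimal sketch of that window in code, continuing the words DStream above (with 2-second batches, a 3-micro-batch window is Seconds(6), and sliding by one micro-batch is Seconds(2)):

val windowedWords = words.window(Seconds(6), Seconds(2))   // group 3 consecutive micro-batches, advance 1 at a time
val windowedCounts = windowedWords.map(w => (w, 1)).reduceByKey(_ + _)
windowedCounts.print()   // counts computed over each sliding 6-second window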