On Big Data Stream Processing

Dmitry Namiot

Abstract


This paper discusses data stream processing in the big data area. Our goal is to provide a quick introduction to, and a survey of, the technical solutions for big data stream processing. The survey targets machine-to-machine (M2M) communications, sensor fusion in the Internet of Things, and time series data processing. We discuss the basic elements behind data stream processing, the existing technical solutions for implementing them, and some prospective system architectures.





ISSN: 2307-8162