CSC 591

Data Intensive Computing

Taught S15, F15, F16, F17

This course explores the processing of massive amounts of data. Students will study current software frameworks and tools and will learn the design principles underlying the large clusters that support data-intensive computing. This project-oriented course surveys many distributed computing frameworks, such as Hadoop and HPCC. Each student will work in a medium-sized group on a semester-long project using these frameworks and supporting systems, such as HDFS, NoSQL databases (e.g., MongoDB), and Hive.

Because the project is a significant portion of the grade, students are expected to have a strong systems background, including completion of CSC 501.

Some of the topics covered during Fall 17:

  • Big Data Problems
  • Overview of Distributed Computing
  • NoSQL DBs
  • AWS
  • ETL (extract, transform, load)
  • Stream processing
  • HPCC
  • Hadoop
  • Ceph
  • Kafka
  • Druid
  • TensorFlow
  • Lambda
  • Zookeeper