Data Intensive Computing

Taught S15, F15, F16, F17

This course explores the processing of massive amounts of data. Students will study current software frameworks and tools and learn the design principles underlying the large clusters that support data-intensive computing. This project-oriented course surveys many distributed computing frameworks, such as Hadoop and HPCC. Each student will work in a medium-sized group on a semester-long project using these frameworks and supporting systems, such as HDFS, NoSQL databases (e.g., MongoDB), and Hive.

Because the project is a significant portion of the grade, students are expected to have a strong systems background, including completion of CSC 501.

Some of the topics covered during Fall 17:

Big Data Problems
Overview of Distributed Computing
NoSQL DBs
AWS
ETL (extract, transform, load)
Stream processing
HPCC
Hadoop
Ceph
Kafka
Druid
TensorFlow
Lambda
Zookeeper

Freeh Speech

Professor V W Freeh

CSC 591
