STSCI 5065
Last Updated
- Schedule of Classes - April 4, 2023 12:09PM EDT
- Course Catalog - April 3, 2023 12:59PM EDT
Course Description
Course information provided by the 2022-2023 Catalog.
Concepts, challenges, and industry trends of big data, with a focus on the Hadoop system. Topics include: basics of the Apache Hadoop platform and the Hadoop ecosystem; the Hadoop Distributed File System (HDFS); MapReduce (or an alternative), a parallel programming model for distributed processing of large data sets; common big data tools, such as Pig (a procedural data processing language for Hadoop parallel computation), Hive (a declarative, SQL-like language for running Hadoop jobs), HBase (a widely used NoSQL database), and YARN (Hadoop's cluster resource manager); case studies; and integration of Hadoop with statistical software packages, e.g., SAS and R.
Prerequisites/Corequisites: Prerequisite: knowledge of a general-purpose programming language, such as Java, Python, Ruby, or C++, or concurrent enrollment in STSCI 4060; STSCI 5060 or basic SQL knowledge; STSCI 5010 or basic knowledge of SAS programming; STSCI 4520 or STSCI 4030 or basic knowledge of R programming.
Permission Note: Enrollment preference given to MPS Applied Statistics students.
When Offered: Spring.
Regular Academic Session.
Credits and Grading Basis
3 Credits, Graded (letter grades only)
Class Number & Section Details
Meeting Pattern
- MWF, Ives Hall 115
- Jan 23 - May 9, 2023
Instructors
Yang, X
Additional Information
Instruction Mode: In Person
Prerequisites: Knowledge of a general-purpose programming language, such as Java, Python, Ruby, or C++, or concurrent enrollment in STSCI 4060; STSCI 5060 or basic SQL knowledge; STSCI 5010 or basic knowledge of SAS programming; STSCI 3520 or STSCI 4030 or basic knowledge of R programming. If this course is full, please add yourself to the waitlist via Student Center. If you have questions about the waitlist, email courses@cis.cornell.edu.