Complex Analysis of Big Data
Schedule
Details
Date    Topic
9/16    Overview of topics & class mechanics
9/23    Databases and Distributed Systems in a Nutshell
9/30    Mini-tutorial on MapReduce and Hadoop
        Mini-tutorial on Amazon Web Services
        Read:
          "MapReduce: Simplified Data Processing on Large Clusters"
          Apache Hadoop website
          Amazon Cloud Architectures
10/7    CAP Applications
        Links:
          Data Mining
          Predictive Analytics
        Data sets:
          UCI machine learning repository
          AWS Datasets
          Various Public Datasets
10/14
10/21
10/28
11/4
11/11
11/18
11/25   Thanksgiving break
12/2
12/9
12/3
12/17
CAP motivation & applications
Competing on Analytics
Petascale computational systems
The SDSS SkyServer
CERN LHC Data Challenge
Crime pattern detection
Parallel programming languages/primitives
MapReduce: Simplified Data Processing on Large Clusters
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Pig Latin: A Not-So-Foreign Language for Data Processing
Big data stores
The Google File System
Bigtable: A Distributed Storage System for Structured Data
HadoopDB
Hive
HBase
Data Mining
Data Mining
Predictive Analytics
Overview of Data Mining Techniques
Data Mining Tutorial
Information Extraction Survey
Languages, Tools, and Services
R Language
MapReduce Mahout
Oracle Data Mining
PSPP
RapidMiner
RHIPE
Weka
OpenCalais
Orchestr8
Research: Database Support
MAD skills
Monte Carlo DB
RIOT
Forecasting Queries
SQL Queries over Unstructured Text Databases
Data Sets:
UCI machine learning repository
AWS Datasets
Public Datasets