


Coding for Data Science and Data Management - Structure


STEFANO MONTANELLI, responsible for the course

Degree in DATA SCIENCE AND ECONOMICS (DSE), Class LM-91, enrolled from the 2018/2019 academic year - Master's degree (Laurea Magistrale) - 2018/2019

Compulsory course or activity: yes
Year of course: 1st
Term or semester: 2nd term
Scientific fields (settori scientifico-disciplinari)
  • SECS-S/01 - Statistics (6 credits)
  • INF/01 - Computer Science (6 credits)
ECTS credits (CFU) compulsory: 12
ECTS credits - facultative: -

General information

Aims and objectives: The course is organized into two modules. The module of Coding for Data Science aims to provide technical skills in data analysis, with a focus on coding aspects supported by the Python programming language and the R framework. The module of Data Management deals with information and data management issues in modern information systems, with a focus on relational databases and emerging NoSQL systems.

Language of instruction: English

Teaching methods: Exam organization: written and oral exam on the syllabus of Coding for Data Science; written exam on the syllabus of Data Management.
Attendance: highly recommended.
Class organization: lectures.

Syllabus

Web:

http://islab.di.unimi.it/cds

Short course description

The course comprises two modules that provide skills in programming data analysis scripts and in data storage and management. The module of Coding for Data Science discusses basic programming concepts with the support of the Python language and the R framework. In particular, it covers essential notions about data structures (e.g., lists, sets, data frames) and control structures (sequence, conditional, and iterative operations). Specific software libraries (e.g., NumPy and Pandas) that support the analysis, manipulation, and visualization of datasets are also illustrated. The module of Data Management presents alternative and complementary solutions for persistently storing datasets in support of data analysis procedures. In particular, the goal is to understand the core notions of relational databases, such as keys, integrity, and primary/foreign key constraints, as well as the SQL language for data definition, manipulation, and querying. Recent NoSQL solutions are also discussed, with a special focus on document-oriented systems (e.g., MongoDB) and column-family systems (e.g., Cassandra).
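
As an illustration of the data structures and control structures mentioned above, the following minimal Python sketch combines a list, a set, and a dict with conditional and iterative operations. The sample values and the names `readings` and `summarize` are hypothetical and are not taken from the course material.

```python
# Hypothetical sample data: not taken from the course material.
readings = [3, 8, 3, 15, 8, 21]


def summarize(values):
    """Split values into small/large groups and count the distinct ones."""
    groups = {"small": [], "large": []}   # dict mapping a label to a list
    for value in values:                  # iterative operation
        if value < 10:                    # conditional operation
            groups["small"].append(value)
        else:
            groups["large"].append(value)
    distinct = set(values)                # a set drops duplicates
    return groups, len(distinct)


groups, n_distinct = summarize(readings)
print(groups, n_distinct)
```

The dict keeps the insertion order of the list within each group, while the set is used only to count distinct values.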

Syllabus Module Coding for Data Science:

PYTHON
Introduction to the language, environment, set up
Data structures 1: numeric, string.
Control flow structures
Data structures 2: list, set, tuple, dict
Data Structures 3: collections, lambda, json
File I/O
Objects: introduction and basics
Numpy and linear algebra
Pandas 1: indexing and series
Pandas 2: dataframe
Pandas 3: graphics
R
Introduction and first steps
Data structures: vectors, arrays, data.frames, lists, environments
Loops and flow control
Functions and objects
Vectorized calculus and function vectorization
S3 and S4 object-oriented programming
Parallelization and optimization
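
The Python topic "Data Structures 3: collections, lambda, json" listed above can be illustrated with a short sketch that uses only the standard library. The records and variable names below are hypothetical and serve only as an example of what the topic might cover.

```python
import json
from collections import Counter

# Hypothetical records: illustrative only.
records = [
    {"name": "ada", "lang": "python"},
    {"name": "bob", "lang": "r"},
    {"name": "eve", "lang": "python"},
]

# Counter (from collections) tallies how often each language appears.
lang_counts = Counter(record["lang"] for record in records)

# A lambda used as a sort key: order records by name, descending.
by_name_desc = sorted(records, key=lambda record: record["name"], reverse=True)

# json round trip: serialize to a string, then parse it back.
encoded = json.dumps(records)
decoded = json.loads(encoded)

print(lang_counts.most_common(1), by_name_desc[0]["name"], decoded == records)
```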

Syllabus Module Coding for Data Science non-attending students

PYTHON
Introduction to the language, environment, set up
Data structures 1: numeric, string.
Control flow structures
Data structures 2: list, set, tuple, dict
Data Structures 3: collections, lambda, json
File I/O
Objects: introduction and basics
Numpy and linear algebra
Pandas 1: indexing and series
Pandas 2: dataframe
Pandas 3: graphics
R
Introduction and first steps
Data structures: vectors, matrices, arrays, data.frames, lists, environments
Loops and flow control
Functions and objects
Vectorized calculus and function vectorization
S3 and S4 object-oriented programming
Parallelization and optimization

Readings Module Coding for Data Science:

Online resources and handouts

Readings Module Coding for Data Science non-attending students

Online resources and handouts

Syllabus Module Data Management:

Introduction to relational databases. Information and data; databases and database management systems (DBMS); the relational model; integrity constraints; key definition; primary key constraints; foreign key constraints.
Database languages. Data definition languages; data manipulation languages; queries with the SQL language; simple queries; group queries with aggregate operators; queries with set operators; nested queries.
Introduction to NoSQL databases. Data models for NoSQL; CAP theorem (consistency, availability, partition tolerance); types of NoSQL; comparison with the relational model.
NoSQL database systems. The “document-oriented” data model; the MongoDB system; collections in MongoDB; collection queries in MongoDB; aggregation pipeline in MongoDB; the “column-family” data model; the Cassandra system; keyspaces in Cassandra; keyspace queries in Cassandra; the CQL language.
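
The relational topics listed above (primary and foreign key constraints, data definition and manipulation, group queries with aggregate operators) can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the `course` and `enrollment` tables and their rows are hypothetical, and SQLite is used here only because it ships with Python, not because the course prescribes it.

```python
import sqlite3

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Data definition: primary key and foreign key constraints.
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE enrollment ("
    " student TEXT NOT NULL,"
    " course_id INTEGER NOT NULL REFERENCES course(id),"
    " PRIMARY KEY (student, course_id))"
)

# Data manipulation: insert rows.
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [(1, "Coding"), (2, "Data Management")],
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [("ada", 1), ("bob", 1), ("ada", 2)],
)

# A row that references a missing course violates the foreign key constraint.
try:
    conn.execute("INSERT INTO enrollment VALUES ('eve', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Group query with an aggregate operator: enrollments per course.
counts = conn.execute(
    "SELECT c.title, COUNT(*) FROM course c"
    " JOIN enrollment e ON e.course_id = c.id"
    " GROUP BY c.id, c.title ORDER BY c.id"
).fetchall()

print(fk_rejected, counts)
conn.close()
```

The rejected insert shows referential integrity at work: the database refuses an enrollment whose `course_id` has no matching row in `course`.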

Readings Module Data Management:

Choose one of the following books:
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems – Concepts, Languages and Architectures, McGraw-Hill. Available online at http://dbbook.dia.uniroma3.it/.
- R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 7th edition, Pearson, 2015.

Online resources and lecture materials, which can be downloaded from the course website

Prerequisites, exams and assessment

Exam: single
Type of assessment: Exam
Assessment: mark recorded on a thirty-point scale

The exam aims at verifying that the objectives of the two modules have been achieved.
Regarding Coding for Data Science: no prerequisites. The exam consists of a written exam followed by an oral exam on the topics of the module syllabus.
Regarding Data Management: no prerequisites. The exam consists of a written exam on the topics of the module syllabus.

Structure of the course

Module Coding for Data Science

Scientific fields

  • INF/01 - Computer Science - Credits: 6
Activities

Lectures: 40 hours

Module Data Management

Scientific fields

  • SECS-S/01 - Statistics - Credits: 6
Activities

Lectures: 40 hours

Teachers' office hours

STEFANO MONTANELLI, responsible for the course
Office hours: Thursday, 11-12
Office location: Room 7015, Dipartimento di Informatica "Giovanni Degli Antoni", Via Celoria 18 - 20133 Milano

NICOLO' ANTONIO CESA BIANCHI
Office hours: Wednesday, 9:30-12:30
Office location: Via Celoria 18, room 7007

STEFANO MARIA IACUS
Office hours: by appointment, write via email
Office location: Room 28, 3rd floor, Via Conservatorio 7, Dipartimento di Economia, Management e Metodi Quantitativi