[Official course website]

This is a set of lecture notes for the course Neural Networks for Data Science Applications (link), delivered in the Master's Degree in Data Science (Sapienza University of Rome). This is a draft with no pretense of completeness, provided only as auxiliary material for self-study. It covers a superset of the material in the slides, going more or less in depth depending on the topic. A description of the organization of the notes can be found in Lecture 1 - Introduction, while a brief introduction to each chapter can be found below.

Feel free to comment here on Notion to provide feedback on the draft. Many parts are missing; let me know if you want to help complete them.

Notation


As we will see in Lecture 2 - Preliminaries, our fundamental data type for the course is a tensor, which we define as an $n$-dimensional array of objects, typically real-valued numbers. We call $n$ the rank of the tensor (with the necessary apologies to any mathematician reading us). The notation in the notes varies depending on $n$:

  1. a lowercase letter, e.g., $x$, denotes a scalar (rank $0$);
  2. a bold lowercase letter, e.g., $\mathbf{x}$, denotes a vector (rank $1$);
  3. a bold uppercase letter, e.g., $\mathbf{X}$, denotes a matrix (rank $2$);
  4. an uppercase letter, e.g., $X$, denotes a generic tensor of higher rank.
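As a concrete illustration, the sketch below builds one tensor of each rank. It assumes NumPy purely for convenience; any tensor library (e.g., PyTorch) exposes the same `ndim` and `shape` attributes:

```python
import numpy as np

# A minimal sketch of the rank/shape terminology, using NumPy.
x = np.float32(3.0)                             # rank-0 tensor: a scalar x
v = np.zeros(5, dtype=np.float32)               # rank-1 tensor: a vector x ~ (5)
M = np.zeros((4, 5), dtype=np.float32)          # rank-2 tensor: a matrix X ~ (4, 5)
X = np.zeros((2, 32, 32, 3), dtype=np.float32)  # rank-4 tensor: X ~ (b, h, w, 3)

print(X.ndim)   # the rank n: 4
print(X.shape)  # the shape: (2, 32, 32, 3)
```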

We use a variety of indexing strategies, described in more detail in Linear algebra, while additional notation is introduced when necessary. In many cases, fully understanding a method requires understanding precisely the shape of each tensor involved. To denote the shape concisely, we use the following notation:

$$ X \sim(b,h,w,3) $$

This is a rank-$4$ tensor with shape $(b,h,w,3)$. Some dimensions can be pre-specified (e.g., $3$ in this case), while others are denoted by variables. Note that we use the same symbol to denote drawing from a probability distribution, e.g., $\varepsilon \sim \mathcal{N}(0,1)$, but we do this rarely and the meaning of the symbol should always be clear from context. Hence, $\mathbf{x} \sim (d)$ replaces the more common $\mathbf{x} \in \mathbb{R}^d$, and similarly $\mathbf{X} \sim (n,d)$ replaces $\mathbf{X} \in \mathbb{R}^{n \times d}$.
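To make the shape notation concrete, here is a minimal sketch, again assuming NumPy; `assert_shape` is a hypothetical helper written for illustration, not a library function:

```python
import numpy as np

def assert_shape(t, shape):
    """Hypothetical helper: check t against a shape pattern, where None matches any size."""
    assert t.ndim == len(shape), f"expected rank {len(shape)}, got rank {t.ndim}"
    for actual, expected in zip(t.shape, shape):
        assert expected is None or actual == expected, f"shape mismatch: {t.shape}"

X = np.random.rand(8, 32, 32, 3)        # X ~ (b, h, w, 3), here with b=8, h=w=32
assert_shape(X, (None, None, None, 3))  # b, h, w are free; the last dimension must be 3

eps = np.random.randn()  # the other meaning of ~: a single draw from N(0, 1)
```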

Sometimes we want to constrain the elements of a tensor, for which we use a special notation:

  1. $\mathbf{x} \sim \text{Binary}(c)$ denotes a vector of size $c$ with only binary values, $\left\{0,1\right\}$.
  2. $\mathbf{X} \sim \text{Int}_{1,4}(a,b)$ denotes an $a \times b$ matrix with integers in the range $[1,4]$.
  3. $\mathbf{x} \sim \Delta(a)$ denotes a vector belonging to the so-called probability simplex, i.e., $x_i \ge 0$ and $\sum_i x_i = 1$. For tensors with higher rank, e.g., $\mathbf{X} \sim \Delta(n,c)$, we assume the normalization is applied with respect to the last dimension. For example, in this case each row $\mathbf{X}_i$ belongs to the simplex (see the sketch after this list).
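As a sketch of these constraints, assuming NumPy (the `softmax` below is defined inline for illustration), we can map an unconstrained matrix to $\mathbf{P} \sim \Delta(4,3)$ by normalizing over the last dimension:

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the per-row max for numerical stability, then normalize.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

Z = np.random.randn(4, 3)  # Z ~ (4, 3), unconstrained real values
P = softmax(Z)             # P ~ Delta(4, 3): each row lies on the simplex
assert np.all(P >= 0) and np.allclose(P.sum(axis=-1), 1.0)

b = (np.random.rand(5) < 0.5).astype(np.int64)  # b ~ Binary(5): values in {0, 1}
```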

Code snippets