A computing unit that takes inputs, multiplies each by a weight, sums the results, adds a bias, and applies an activation function.
Weights control the strength of connections. Bias shifts the activation threshold.
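As a minimal sketch of the neuron described above, the Python below computes a weighted sum, adds the bias, and applies an activation. The names `neuron` and `sigmoid` and the choice of sigmoid as the default activation are assumptions made for illustration, not something the text specifies.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs, plus bias, passed through an activation.

    Hypothetical helper for illustration only.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Each weight scales the contribution of its input.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Weighted sum is 1.0*0.5 + 2.0*(-0.25) = 0.0, so the output is sigmoid(0) = 0.5.
print(neuron([1.0, 2.0], [0.5, -0.25], bias=0.0))

# A positive bias shifts the pre-activation up, so the same inputs now give a larger output.
print(neuron([1.0, 2.0], [0.5, -0.25], bias=2.0))
```

With the same inputs and weights, changing only the bias moves the point at which the neuron starts to respond strongly, which is what "shifts the activation threshold" amounts to here.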
Choose an activation function to see its graph and formula.
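The text does not say which activation functions are offered, so the sketch below assumes three common ones (sigmoid, tanh, ReLU) and gives each formula in a comment. The `ACTIVATIONS` table and the `evaluate` helper are hypothetical names used only for this example.

```python
import math


def sigmoid(z: float) -> float:
    # sigmoid(z) = 1 / (1 + e^(-z)), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    # tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)), output in (-1, 1)
    return math.tanh(z)


def relu(z: float) -> float:
    # relu(z) = max(0, z), output in [0, infinity)
    return max(0.0, z)


# Assumed selection; the original text does not list the available choices.
ACTIVATIONS = {"sigmoid": sigmoid, "tanh": tanh, "relu": relu}


def evaluate(name: str, z: float) -> float:
    """Look up an activation by name and apply it to z."""
    return ACTIVATIONS[name](z)


# Sample each function at a few points to get a rough picture of its graph.
for name in ACTIVATIONS:
    print(name, [round(evaluate(name, z), 3) for z in (-2.0, 0.0, 2.0)])
```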
Error flows backward through the network, from the output toward the input, to update the weights. The chain rule is used to compute the gradient of the error with respect to each weight.
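To make the backward flow concrete, here is a small sketch on a chain of two sigmoid neurons with one weight each and no bias. The squared-error loss, the learning rate, and all function names are assumptions chosen for illustration. The text only says that error flows backward and that the chain rule gives the gradients.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass: input -> hidden neuron -> output neuron."""
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y


def loss(x, w1, w2, target):
    """Squared error (an assumed choice of loss)."""
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2


def gradients(x, w1, w2, target):
    """Backward pass: apply the chain rule from the output back to the input."""
    h, y = forward(x, w1, w2)
    # Error at the output neuron: dL/dy * dy/dz2
    delta_out = (y - target) * y * (1.0 - y)
    grad_w2 = delta_out * h
    # Error passed back to the hidden neuron: dL/dz2 * dz2/dh * dh/dz1
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2


def train_step(x, w1, w2, target, lr=0.5):
    """One gradient-descent update of both weights."""
    g1, g2 = gradients(x, w1, w2, target)
    return w1 - lr * g1, w2 - lr * g2


x, target = 1.0, 1.0
w1, w2 = 0.5, -0.3
before = loss(x, w1, w2, target)
w1, w2 = train_step(x, w1, w2, target)
after = loss(x, w1, w2, target)
print(before, after)  # the loss should be smaller after the update
```

Note how `delta_hidden` is built from `delta_out`: the hidden neuron's error is the output error carried back through `w2`, which is the sense in which error "flows backward".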