Swish function and a Swiss mathematician

John Cook · Aug. 6, 2023, 5:38 p.m.
The previous post looked at the swish function and related activation functions for deep neural networks designed to address the “dying ReLU problem.” Unlike many activation functions, the swish function f(x) = x σ(x) = x/(1 + e^{-x}) is not monotone: it has a minimum near x0 = -1.2784. Setting f′(x) = 0 and simplifying gives e^x = -(x + 1), so the exact location of the minimum is x0 = -(1 + W(1/e)), where W is the Lambert W function. […]
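
As a quick sanity check of that closed form, here is a minimal sketch (assuming Python with NumPy and SciPy, whose `scipy.special.lambertw` evaluates the Lambert W function) that computes x0 = -(1 + W(1/e)) and compares it against a direct numerical minimization of swish:

```python
import numpy as np
from scipy.special import lambertw
from scipy.optimize import minimize_scalar

def swish(x):
    """Swish activation: f(x) = x * sigmoid(x) = x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))

# Closed form: the minimum is at x0 = -(1 + W(1/e)), using the
# principal branch of the Lambert W function.
x0 = -(1.0 + lambertw(np.exp(-1.0)).real)
print(x0)  # approx -1.27846

# Cross-check with a bounded numerical minimizer.
res = minimize_scalar(swish, bounds=(-5.0, 0.0), method="bounded")
print(res.x)  # approx -1.27846
```

Both approaches agree to several decimal places, matching the x0 ≈ -1.2784 quoted above.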