

GChevalier | 7 years ago

I would have instead done like this:

(tl;dr: fit a sine and a cosine function by regression, the way you would a linear regression. Think of it as solving for a free frequency and a free phase rather than for a free bias and weight.)

1. Convert the hours to an angle in degrees or in radians (a simple linear transformation).

2. Take the cos and sin of the angle to get the x and y position in a plane, respectively.

3. Introduce a time axis so that the points trace a helix (like DNA) rather than a circle.

4. So we now have a ton of 3D data points: (time, x, y). Create an ML model that fits a sine and a cosine to those data points as closely as possible. Your model has only 2 free parameters to optimize: a shared phase offset and a shared frequency. The sine uses (time, y) and the cosine uses (time, x).

5. Initialize the model with a random phase offset and a frequency ideally already close to the one you think you have. Don't initialize with too high a frequency, to avoid fitting noise near the Nyquist frequency.

6. Optimize! (With least squares.) I guess you might converge only to a local minimum and need to try different random starting frequencies if you fail to converge.

7. The answer to your problem is the now-optimized frequency parameter. It won't sit between two bins of your FFT anymore.
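Steps 1-7 could be sketched roughly like this in Python (a hypothetical sketch with numpy only; I'm substituting a coarse grid search plus a zoom-in refinement for the random restarts and gradient descent, and all the numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are the unknowns we want to recover.
true_freq, true_phase = 0.31, 1.2

# Steps 1-3: time axis plus (x, y) = (cos, sin) of the angle, i.e. a helix.
t = np.linspace(0.0, 20.0, 500)
ang = 2 * np.pi * true_freq * t + true_phase
x = np.cos(ang) + 0.05 * rng.standard_normal(t.size)
y = np.sin(ang) + 0.05 * rng.standard_normal(t.size)

def sse(freq, phase):
    """Step 4/6: least-squares cost of the 2-parameter helix model."""
    a = 2 * np.pi * freq * t + phase
    return np.sum((np.cos(a) - x) ** 2 + (np.sin(a) - y) ** 2)

# Step 5: a coarse grid over frequency and phase, standing in for
# "try different random starting frequencies".
f_grid = np.linspace(0.05, 0.6, 400)
p_grid = np.linspace(0.0, 2 * np.pi, 60)
f0, p0 = min(((f, p) for f in f_grid for p in p_grid),
             key=lambda fp: sse(*fp))

# Step 6: zoom in around the coarse optimum instead of gradient descent.
span_f, span_p = 0.01, 0.2
for _ in range(4):
    fs = np.linspace(f0 - span_f, f0 + span_f, 21)
    ps = np.linspace(p0 - span_p, p0 + span_p, 21)
    f0, p0 = min(((f, p) for f in fs for p in ps), key=lambda fp: sse(*fp))
    span_f /= 10
    span_p /= 10

# Step 7: the optimized frequency, not snapped to any FFT bin.
est_freq, est_phase = f0, p0
```

With this toy data the recovered `est_freq` lands very close to `true_freq`, without any bin quantization.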

Related: https://stackoverflow.com/questions/16716302/how-do-i-fit-a-...

Note: This link contains images illustrating the transformations I'm trying to explain.

Disclaimer: I haven't actually done this yet; it's just off the top of my head. If I said something wrong, please comment. Mostly about a wrong convergence toward the Nyquist frequency or something like that (?).

In the end, this way, you won't have discrete FFT bins. You approach the problem orthogonally to that: you solve directly for the single best frequency, instead of picking the best bin.

In other words: treat the contents of the exponent of "e" as free parameters, and solve for one such frequency and phase offset instead of many bins.
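To make the bins-versus-free-frequency point concrete, here's a hypothetical toy comparison (phase held at zero for simplicity, and the least-squares fit reduced to a dense 1D search over frequency): the FFT peak can only land on a multiple of 1/T, while the direct search is not quantized.

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 10.0, dt)        # 10 s record -> FFT bins are 0.1 Hz apart
sig = np.sin(2 * np.pi * 2.37 * t)  # true frequency sits between two bins

# FFT route: the peak snaps to the nearest bin (2.4 Hz here).
spec = np.abs(np.fft.rfft(sig))
bins = np.fft.rfftfreq(t.size, d=dt)
fft_peak = bins[np.argmax(spec)]

# Direct route: least-squares error over a continuous frequency axis.
cand = np.linspace(2.0, 3.0, 2001)
errs = [np.sum((np.sin(2 * np.pi * f * t) - sig) ** 2) for f in cand]
fit_peak = cand[np.argmin(errs)]    # recovers ~2.37 Hz
```

The FFT answer is stuck at 2.4 Hz (bin spacing 1/T = 0.1 Hz), while the fitted frequency comes back as 2.37 Hz.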


GChevalier | 7 years ago

Also, I forgot: to improve convergence, I'd use a Hann-Poisson window, such as here: https://en.wikipedia.org/wiki/Window_function#Hann%E2%80%93P...

I'd apply the window to randomly sampled mini-batches of consecutive points instead of optimizing the neural network on individually sampled random points or on the whole dataset at once. I guess a Hann-Poisson window would make the "gradient" valley easier to "ski down" with gradient descent, which is a greedy algorithm: the spectral leakage caused by the window function should make the gradient landscape more monotonically decreasing, at every point, toward the global minimum.
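A sketch of that windowed-mini-batch sampling (hypothetical: the Hann-Poisson formula follows the Wikipedia definition with a free decay parameter `alpha`, and `sample_windowed_batch` is a name I made up, not an existing API):

```python
import numpy as np

def hann_poisson(n, alpha=2.0):
    """Hann window multiplied by a decaying Poisson (exponential) window."""
    k = np.arange(n)
    hann = 0.5 * (1.0 - np.cos(2 * np.pi * k / (n - 1)))
    poisson = np.exp(-alpha * np.abs(n - 1 - 2 * k) / (n - 1))
    return hann * poisson

def sample_windowed_batch(t, x, y, batch_size, rng):
    """Pick a random run of consecutive points; return it with its window.

    The window weights would multiply the squared residuals of the batch,
    de-emphasizing the edges of each consecutive chunk.
    """
    start = rng.integers(0, t.size - batch_size)
    sl = slice(start, start + batch_size)
    return t[sl], x[sl], y[sl], hann_poisson(batch_size)

# Toy usage: draw one windowed mini-batch from a helix-like dataset.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 500)
x, y = np.cos(t), np.sin(t)
tb, xb, yb, w = sample_windowed_batch(t, x, y, 64, rng)
```

The window tapers to zero at both ends of the chunk and peaks in the middle, so each mini-batch contributes a smoothly weighted slice of the helix to the loss.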