What is the correct method to specify input shapes of an n_dimensional tensor of features in Keras Sequential models?

Question

## ---- INTRO ----
I'm new to Team Treehouse and I primarily created an account here because I received really positive feedback about the community, forums and support. I'm not an absolute beginner when it comes to programming or ML/AI applications, but I am new to using tools like Keras, TensorFlow, etc. I don't know if I can ask about these libraries or about Deep Learning here, considering that these topics are not yet covered by the courses or tracks here. But since there seems to be a section for Machine Learning and Data Analysis, I thought I'd give it a try anyway. If I've posted this in the wrong forum, I apologise. I will recreate this post in other relevant forums if you could point them out to me. I've made the post as explanatory as possible, with plenty of context, so pardon me if it is too long to read. Please feel free to skip the sections of the post marked by `---- ----`.
## ---- PREMISE ----
I tried searching through the Keras documentation about input shapes here:
[Keras Layers Core] https://keras.io/layers/core/
[Keras Sequential Guide] https://keras.io/getting-started/sequential-model-guide/
I also tried looking around online through Stack Overflow and similar sites:
[Medium Fit Generator Keras] https://medium.com/@fromtheast/implement-fit-generator-in-keras-61aa2786ce98
[Stack Overflow Keras Input] https://stackoverflow.com/questions/44747343/keras-input-explanation-input-shape-units-batch-size-dim-etc
[Stack Overflow Keras Numpy Arrays] https://stackoverflow.com/questions/51100057/keras-list-of-numpy-arrays-not-the-size-model-expected
[Stack Overflow Set input of a Keras Layer with Tensorflow Tensor] https://stackoverflow.com/questions/42441431/how-to-set-the-input-of-a-keras-layer-with-a-tensorflow-tensor
[Tensorflow API Docs Convert to Tensor] https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor
[Stack Overflow Role of Flatten in Keras] https://stackoverflow.com/questions/43237124/role-of-flatten-in-keras
I'm attempting to use the Keras Sequential model for Audio Classification and Anomaly Detection.
From the raw audio samples, I've already computed the following features (with their shapes) using the librosa Python library:
```python
1. feature[0] => Melspectrogram_Energy -> Shape -> (4, 128, 44)
2. feature[1] => Melspectrogram_Power -> Shape -> (4, 128, 44)
3. feature[2] => Tempogram -> Shape -> (4, 384, 44)
4. feature[3] => Zero_Crossing_Rate -> Shape -> (4, 1, 44)
5. feature[4] => Spectral_Centroid -> Shape -> (4, 1, 44)
6. feature[5] => Spectral_Bandwidth -> Shape -> (4, 1, 44)
7. feature[6] => MFCC -> Shape -> (4, 20, 44)
8. feature[7] => Spectral_Centroid_Bands -> Shape -> (4, 1, 44)
9. feature[8] => Spectral_Bandwidth_Bands -> Shape -> (4, 1, 44)
10. feature[9] => Spectral_Rolloff_Bands -> Shape -> (4, 1, 44)
11. feature[10] => MFCC_Bands -> Shape -> (4, 9, 44)
12. feature[11] => Poly_Features_Order_1 -> Shape -> (4, 32, 88)
13. feature[12] => Poly_Features_Order_2 -> Shape -> (4, 32, 132)
14. feature[13] => Poly_Features_Order_3 -> Shape -> (4, 32, 176)
15. feature[14] => RMSE -> Shape -> (4, 32, 44)
16. feature[15] => RMSE_Samples -> Shape -> (4, 32, 44)
17. feature[16] => Spectral_Flatness_Energy -> Shape -> (4, 32, 44)
18. feature[17] => Spectral_Flatness_Power -> Shape -> (4, 32, 44)
```

I understand that I would have to flatten the tensor if I wanted to feed it to sklearn/scikit-learn models, but I was told that I don't have to do that with Keras, considering that it supports CNNs.
### -- Explanation of Feature Shapes -- :
All of these features are saved in an HDF5 (.h5) file.
There are 4 channels in each audio sample, and they can be processed by librosa either by feeding in the sample directly or by passing in the results of the STFT of the sample for each frequency band (32, 128, 384, 1, etc., depending on the setting and feature).
Audio data is sampled at either 22 kHz or 44 kHz, but mostly 44 kHz, as you can see from the last dimension of the shapes of each feature there.
Please note that these 18 features exist for every sample. Since they are of varying sizes, I did not want to pad them with zeros and put them all in one big nested n_d numpy array/tensor. Instead, I created a list of these features for every sample. Meaning, if I have 9000 samples, the data structure would look something like this:
```python
sample_feature_list[0] = [feature[0], feature[1], ......, feature[17]]
sample_feature_list[1] = [feature[0], feature[1], ......, feature[17]]
sample_feature_list[2] = [feature[0], feature[1], ......, feature[17]]
.
.
.
.
.
sample_feature_list[8999] = [feature[0], feature[1], ......, feature[17]]
```
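
To make the padding trade-off concrete, here is a rough sketch that compares how many values one sample holds when the 18 features are kept as a list versus when they are zero-padded into a single array. The names `feature_shapes`, `actual_values`, `padded_shape` and `padded_values` are made up for this illustration; only the shapes come from the feature list above.

```python
import numpy as np

# Per-sample feature shapes copied from the feature list above.
feature_shapes = [
    (4, 128, 44), (4, 128, 44), (4, 384, 44), (4, 1, 44), (4, 1, 44),
    (4, 1, 44), (4, 20, 44), (4, 1, 44), (4, 1, 44), (4, 1, 44),
    (4, 9, 44), (4, 32, 88), (4, 32, 132), (4, 32, 176), (4, 32, 44),
    (4, 32, 44), (4, 32, 44), (4, 32, 44),
]

# Number of values stored per sample when the features stay in a list.
actual_values = sum(int(np.prod(shape)) for shape in feature_shapes)

# Shape of a single zero-padded array able to hold all 18 features:
# every axis is stretched to the largest size seen in any feature.
largest_axes = np.max(feature_shapes, axis=0)
padded_shape = (len(feature_shapes),) + tuple(int(v) for v in largest_axes)
padded_values = int(np.prod(padded_shape))

print("padded shape per sample:", padded_shape)
print("values when kept as a list:", actual_values)
print("values when zero-padded:", padded_values)
```

Most of the padded array would be zeros, because a few features (the Tempogram and the Poly_Features) set the largest axis sizes while many others have a single band.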

Accessing the feature "columns" independently is also possible -> feature[0] for all samples -> (9000, 4, 128, 44), ..... feature[17] for all samples -> (9000, 4, 32, 44)
I can extract the feature columns for all samples and pass them in, or I can pass them in sample by sample.
Let's say that the target variable is Boolean, containing True/False or 0s/1s, with one entry per sample ->
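
As an illustration of the two layouts, the sketch below builds a small per-sample list and regroups it into one array per feature. It is only a sketch under assumptions: random data stands in for the real librosa features, `n_samples` is 6 rather than 9000, only three of the 18 shapes are used, and `feature_columns` is a hypothetical name.

```python
import numpy as np

n_samples = 6  # small stand-in for the 9000 samples
# A subset of the shapes from the feature list, to keep the example light.
feature_shapes = [(4, 128, 44), (4, 1, 44), (4, 32, 88)]

rng = np.random.default_rng(0)

# Per-sample layout: one list of differently shaped arrays per sample.
sample_feature_list = [
    [rng.standard_normal(shape).astype(np.float32) for shape in feature_shapes]
    for _ in range(n_samples)
]

# Per-feature layout: stack the i-th feature of every sample along a new
# leading axis, giving one array of shape (n_samples, ...) per feature.
feature_columns = [
    np.stack([sample[i] for sample in sample_feature_list], axis=0)
    for i in range(len(feature_shapes))
]

for i, column in enumerate(feature_columns):
    print("feature", i, "for all samples ->", column.shape)
```

Stacking works per feature because every sample has the same shape for a given feature; it is only across different features that the shapes disagree.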
```python
import numpy as np

y = np.zeros((9000, 1))
y[4500:] = np.ones((4500, 1))
```

## ---- QUESTION ----
I want to maintain the temporal and frequency relationships, so I don't want to flatten the tensor into one long single-dimensional array. I also don't want to subsample the tensor and lose information. I'd like to pass in all the features, because I want to plot feature importance in other models and find out which feature contributes the most to accuracy and performance. I also want to provide all these features for all samples as inputs to the Sequential model. Is there a way to specify different input shapes for different features when dealing with an n_dimensional tensor or a structure of nested lists and n_d numpy arrays?
Something like this:
```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(512, input_shape=(None, 4, None, 44), kernel_initializer='normal', activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(2, kernel_initializer='normal', activation='softmax'))
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```
Here the first None in the input shape is meant to accommodate any number of samples, and the second None is meant to accommodate any number of frequency bands.
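
One detail worth checking in the model sketched above: a final `Dense(2, activation='softmax')` layer trained with `categorical_crossentropy` expects one-hot targets with two columns, whereas the `y` defined earlier has shape (9000, 1). A minimal NumPy sketch of that conversion, where `y_one_hot` is a hypothetical name, could look like this:

```python
import numpy as np

# Target as described earlier: first half 0s, second half 1s.
y = np.zeros((9000, 1))
y[4500:] = 1

# Index the rows of a 2x2 identity matrix with the integer labels:
# label 0 -> [1, 0], label 1 -> [0, 1].
y_one_hot = np.eye(2, dtype=np.float32)[y.astype(int).ravel()]

print(y.shape, "->", y_one_hot.shape)
```

Keras also ships `keras.utils.to_categorical` for the same purpose; the NumPy version is shown only to make the shape change explicit.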
Thank you so much for your time and kind support!
Best Regards,
Barry

Barry N
