I am trying to compute the micro F-measure for predictions made by my model. I trained the model on word2vec vectors with Keras and TensorFlow, and I use scikit-learn to compute the micro F-measure.

However, the function raises this error: ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets

Also, am I doing the prediction correctly? I trained the model on x_train (wordVectors) and y_train (resultVectors) and validated it with x_test and y_test.

Now I have run a prediction on x_test and want to evaluate it against y_test. Is this approach correct so far?

The prediction array looks like this:

```
[[ 1.7533608e-02 5.8055294e+01 2.2185498e-03 ... -1.2394511e-03
1.0454212e+00 -1.6698670e-03]
[ 1.7539740e-02 5.8173992e+01 2.1747553e-03 ... -1.2764656e-03
1.0475068e+00 -1.6941782e-03]
[ 1.7591618e-02 5.8222389e+01 2.2053251e-03 ... -1.2856000e-03
1.0484750e+00 -1.6668942e-03] ...
```

and the true values look like this:

```
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]...
```
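
To illustrate where the two names in the error message come from, here is a minimal sketch that runs scikit-learn's `type_of_target` on small made-up arrays shaped like mine. The names `y_true_demo` and `y_pred_demo` and their values are hypothetical stand-ins, not my real data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for y_test: one 0/1 indicator column per label
y_true_demo = np.array([[0, 1, 0],
                        [1, 0, 0],
                        [0, 0, 1]])

# Hypothetical stand-in for model.predict(x_test): unbounded float scores
y_pred_demo = np.array([[0.02, 58.1, 0.002],
                        [0.02, 58.2, -0.001],
                        [0.02, 58.2, 1.05]])

# scikit-learn infers a target type for each array before computing a metric
true_kind = type_of_target(y_true_demo)
pred_kind = type_of_target(y_pred_demo)
print(true_kind)  # multilabel-indicator
print(pred_kind)  # continuous-multioutput
```

As far as I understand, the metric refuses to run because these two inferred types do not match.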

I already tried converting both arrays with np.argmax(..., axis=1), which reduces each row to a single class index. Then there is no error and I get a micro F-measure of around 0.59, which is far too high, so I think I made a mistake. Is there another way of converting the data? Can I convert the prediction to multilabel-indicator values?

```
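import numpy as np
from keras.models import load_model
from sklearn.metrics import f1_score

model = load_model('model.h5')
prediction = model.predict(x_test)
# argmax keeps only the index of the largest value in each row
prediction_binary = np.argmax(prediction, axis=1)
y_test_binary = np.argmax(y_test, axis=1)
print(f1_score(y_test_binary, prediction_binary, average='micro'))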
```
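
For reference, this is a minimal sketch of what I mean by converting the prediction to multilabel-indicator values: thresholding each output instead of taking the argmax. The helper `to_indicator`, the 0.5 cutoff, and the demo arrays are assumptions for illustration only. A fixed cutoff probably only makes sense if the outputs are probabilities (for example from a sigmoid layer), which my predictions above do not appear to be.

```python
import numpy as np
from sklearn.metrics import f1_score


def to_indicator(scores, threshold=0.5):
    """Hypothetical helper: mark a label as present when its score exceeds the threshold."""
    return (np.asarray(scores) > threshold).astype(int)


# Made-up true labels in multilabel-indicator format
y_true_demo = np.array([[0, 1, 0],
                        [1, 0, 0],
                        [0, 1, 1]])

# Made-up scores in [0, 1], as a sigmoid output layer would produce
scores_demo = np.array([[0.1, 0.9, 0.2],
                        [0.7, 0.6, 0.1],
                        [0.2, 0.8, 0.3]])

pred_demo = to_indicator(scores_demo)

# Both arrays are now multilabel-indicator, so f1_score accepts them
micro_f1 = f1_score(y_true_demo, pred_demo, average='micro')
print(pred_demo)
print(micro_f1)
```

On this toy data there are 3 true positives, 1 false positive and 1 false negative, so the micro F-measure comes out as 0.75.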

I expect a value below 0.20, but instead I get 0.59, which is far too good.

Any suggestions are greatly appreciated!