I am trying to compute the micro F-measure for predictions made by my model. I trained the model with Keras and TensorFlow on word2vec vectors, and I use scikit-learn to compute the micro F-measure.

But the function throws this message: ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets

Also, am I doing the prediction right? I trained the model on x_train (word vectors) and y_train (result vectors) and validated with x_test and y_test.

Now I ran a prediction on x_test and want to evaluate it against y_test. Am I doing it right so far?

The prediction array looks like this:

[[ 1.7533608e-02  5.8055294e+01  2.2185498e-03 ... -1.2394511e-03
   1.0454212e+00 -1.6698670e-03]
 [ 1.7539740e-02  5.8173992e+01  2.1747553e-03 ... -1.2764656e-03
   1.0475068e+00 -1.6941782e-03]
 [ 1.7591618e-02  5.8222389e+01  2.2053251e-03 ... -1.2856000e-03
   1.0484750e+00 -1.6668942e-03] ...

and the true values look like this:

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]...

I already tried to reduce both arrays to a single value per row (with np.argmax(..., axis=1)). Then there is no error and I get a micro F-measure of around 0.59, which is far too high, so I think I made a mistake. Is there another way of converting the data? Can I convert the prediction to multilabel-indicator values?

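import numpy as np
from keras.models import load_model
from sklearn.metrics import f1_score

model = load_model('model.h5')
prediction = model.predict(x_test)

# argmax keeps only one index per row for both arrays
prediction_binary = np.argmax(prediction, axis=1)
y_test_binary = np.argmax(y_test, axis=1)

print(f1_score(y_test_binary, prediction_binary, average='micro'))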

I expect an output below 0.20, but instead I get 0.59, which is far too good a value.

Any suggestions are greatly appreciated!


The problem is that you compute your metric on only one label per sample: the index of the highest value in each output vector, compared against the index of a single value in each test vector.

Indeed, np.argmax returns only one index, even if the vector has several maximal values. For example, np.argmax([0,0,1,0,1,1]) returns only 2.
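To make the point about np.argmax concrete, here is a minimal sketch using the same example vector. The variable names are only illustrative; np.flatnonzero is used here to list every active label for comparison.

```python
import numpy as np

# One row of a multilabel target: labels 2, 4 and 5 are all active.
y_true_row = np.array([0, 0, 1, 0, 1, 1])

# argmax returns the index of the FIRST maximum only.
first_max = int(np.argmax(y_true_row))

# flatnonzero returns the indices of ALL non-zero entries.
all_labels = np.flatnonzero(y_true_row).tolist()

print(first_max)   # 2
print(all_labels)  # [2, 4, 5]
```

So after argmax, labels 4 and 5 are no longer part of the evaluation at all.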

Since yours is a multilabel classification problem, each input may belong to several categories. You therefore have to convert the output vectors of your classifier to the same binary indicator format as your test vectors.

You can do that as follows:

prediction_int = np.zeros_like(prediction)
prediction_int[prediction > 0.5] = 1
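As a rough end-to-end sketch of this approach, the snippet below thresholds some made-up scores and passes the result to f1_score. The arrays y_true and scores are toy data, not values from the question, and the 0.5 threshold assumes the model outputs sigmoid-style probabilities. The predictions shown in the question contain values such as 58.05 and small negatives, so a different threshold or output activation may be needed there.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import type_of_target

# Toy multilabel ground truth (3 samples, 4 labels); illustrative only.
y_true = np.array([[0, 1, 1, 0],
                   [1, 0, 0, 0],
                   [0, 0, 1, 1]])

# Toy continuous model outputs, assumed to lie in [0, 1].
scores = np.array([[0.1, 0.9, 0.2, 0.0],
                   [0.8, 0.7, 0.1, 0.3],
                   [0.2, 0.1, 0.6, 0.9]])

# This mismatch is what triggers the ValueError in the question.
print(type_of_target(y_true))   # multilabel-indicator
print(type_of_target(scores))   # continuous-multioutput

# Threshold every entry so the prediction is also a 0/1 indicator matrix.
threshold = 0.5  # assumption: outputs are probabilities
y_pred = (scores > threshold).astype(int)

micro_f1 = f1_score(y_true, y_pred, average='micro')
print(micro_f1)  # 4 TP, 1 FP, 1 FN -> 0.8
```

With both arrays in indicator format, every label of every sample contributes to the micro-averaged score.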
  • Thanks for the explanation! That helped a lot, and I guess your code does the job. I now get a micro F-measure of 0.02.. which is far more likely to be true. – ihtk Jun 7 at 10:53
