If you've run into "StringIndexer: handling unseen labels" in Spark ML

rupert · July 21, 2021

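By default, a StringIndexerModel throws an exception when it encounters a label it was not fitted on. To change that, set handleInvalid: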

StringIndexerModel.from_labels(labels, inputCol=categoricalCol, outputCol=categoricalCol + "Index", handleInvalid="keep")

Options

  • 'error': throws an exception (which is the default)
  • 'skip': skips the rows containing the unseen labels entirely (removes the rows on the output!)
  • 'keep': puts unseen labels in a special additional bucket, at index numLabels
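To make the three behaviors concrete, here is a plain-Python sketch that mimics them on a list of strings. It is not Spark's implementation and does not use PySpark. The function name `apply_string_index` is hypothetical. It only assumes what the list above states: known labels map to their position in `labels`, and unseen labels either raise, are dropped, or go to index numLabels.

```python
from typing import List, Sequence


def apply_string_index(values: Sequence[str], labels: Sequence[str],
                       handle_invalid: str = "error") -> List[float]:
    """Hypothetical helper: map each value to its index in `labels`,
    imitating the 'error' / 'skip' / 'keep' policies described above."""
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"unsupported handleInvalid: {handle_invalid!r}")
    # Known labels get their position (as a double, like Spark's output column).
    lookup = {label: float(i) for i, label in enumerate(labels)}
    num_labels = float(len(labels))
    out: List[float] = []
    for v in values:
        if v in lookup:
            out.append(lookup[v])
        elif handle_invalid == "keep":
            out.append(num_labels)  # extra bucket at index numLabels
        elif handle_invalid == "skip":
            continue  # the whole row disappears from the output
        else:
            raise ValueError(f"Unseen label: {v}")  # 'error', the default
    return out


labels = ["a", "b", "c"]
rows = ["a", "c", "d", "b"]  # "d" was never seen during fitting
print(apply_string_index(rows, labels, "keep"))  # [0.0, 2.0, 3.0, 1.0]
print(apply_string_index(rows, labels, "skip"))  # [0.0, 2.0, 1.0]
```

Note that with 'skip' the output has fewer rows than the input, which is why the list above warns that rows are removed.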

Ref

https://stackoverflow.com/questions/34681534/spark-ml-stringindexer-handling-unseen-labels
