Apply OneHotEncoder For Several Categorical Columns In Spark MLlib
Answer:

Spark >= 3.0:

In Spark 3.0, OneHotEncoderEstimator has been renamed to OneHotEncoder, so replace

    from pyspark.ml.feature import OneHotEncoderEstimator, OneHotEncoderModel

    encoder = OneHotEncoderEstimator(...)

with

    from pyspark.ml.feature import OneHotEncoder, OneHotEncoderModel

    encoder = OneHotEncoder(...)

Spark >= 2.3:

You can use the newly added OneHotEncoderEstimator:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import (
        OneHotEncoderEstimator, OneHotEncoderModel, VectorAssembler)

    # `indexers` is a list of StringIndexer stages, one per categorical column
    encoder = OneHotEncoderEstimator(
        inputCols=[indexer.getOutputCol() for indexer in indexers],
        outputCols=[
            "{0}_encoded".format(indexer.getOutputCol()) for indexer in indexers]
    )

    # Combine all encoded columns into a single feature vector
    assembler = VectorAssembler(
        inputCols=encoder.getOutputCols(),
        outputCol="features"
    )

    pipeline = Pipeline(stages=indexers + [encoder, assembler])
    pipeline.fit(df).transform(df)

Spark < 2.3:

It is not possible. The StringIndexer transformer operates on only a single column at a time, so you'll ne...