In PySpark, the split() function from pyspark.sql.functions splits a string column into an array of strings using a regular-expression pattern. Unlike Python's built-in str.split(), the pattern argument is required, so to split on any whitespace character (spaces, tabs, newlines) you pass a pattern such as \s+. The optional limit parameter, which caps the number of elements in the resulting array, defaults to -1, meaning the pattern is applied as many times as possible.
Python
from pyspark.sql import SparkSession
from pyspark.sql.functions import split

spark = SparkSession.builder.appName("SplitExample").getOrCreate()

# Sample data: one whitespace-separated string per row
data = [("apple banana orange",), ("grape kiwi",), ("pear peach plum",)]
df = spark.createDataFrame(data, ["fruits"])

# Split each string on one or more whitespace characters (regex \s+)
df = df.withColumn("split_fruits", split(df["fruits"], r"\s+"))
df.show(truncate=False)
# Output:
# +-------------------+-----------------------+
# |fruits             |split_fruits           |
# +-------------------+-----------------------+
# |apple banana orange|[apple, banana, orange]|
# |grape kiwi         |[grape, kiwi]          |
# |pear peach plum    |[pear, peach, plum]    |
# +-------------------+-----------------------+
spark.stop()
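To see the limit parameter in action, here is a minimal sketch, assuming Spark 3.0 or later (where limit was added to the Python API) and reusing the same sample string. With limit=2, the pattern is applied at most once, so everything after the first whitespace run stays intact in the last element.
Python
from pyspark.sql import SparkSession
from pyspark.sql.functions import split

spark = SparkSession.builder.appName("SplitLimitExample").getOrCreate()

df = spark.createDataFrame([("apple banana orange",)], ["fruits"])

# limit=2 caps the array at two elements; the remainder of the
# string is left unsplit in the final element.
df.withColumn("first_two", split(df["fruits"], r"\s+", 2)).show(truncate=False)
# +-------------------+----------------------+
# |fruits             |first_two             |
# +-------------------+----------------------+
# |apple banana orange|[apple, banana orange]|
# +-------------------+----------------------+

spark.stop()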