๐Ÿ“’ Spark(13) - ML(3)

Kimdongkiยท2024๋…„ 6์›” 22์ผ

Spark

๋ชฉ๋ก ๋ณด๊ธฐ
13/22

๐Ÿ“Œ ๋ณด์Šคํ„ด ์ฃผํƒ๊ฐ€๊ฒฉ ์˜ˆ์ธก - Regression

๊ฐœ์š”

  • 1970๋…„๋Œ€ ๋ฏธ๊ตญ ์ธ๊ตฌ์กฐ์‚ฌ ์„œ๋น„์Šค(US Census Service)์—์„œ Boston ์ง€์—ญ์˜ ์ฃผํƒ ๊ฐ€๊ฒฉ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ•œ ๊ฐ€๊ฒฉ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ชจ๋ธ์„ ๋นŒ๋“œํ•œ๋‹ค.

  • Training Set
    -> ๊ฐœ๋ณ„ ์ฃผํƒ๊ฐ€๊ฒฉ์˜ ์˜ˆ์ธก์ด ์•„๋‹ˆ๋ผ ์ง€์—ญ๋ณ„ ์ค‘๊ฐ„ ์ฃผํƒ๊ฐ€๊ฒฉ ์—์ธก์ด๋‹ค.

  • Regression ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์‚ฌ์šฉ
    -> ์—ฐ์†์ ์ธ ์ฃผํƒ๊ฐ€๊ฒฉ ์˜ˆ์ธก์ด๊ธฐ๋•Œ๋ฌธ์— Classification ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์ ํ•ฉํ•˜์ง€ ์•Š๋‹ค.

Data

  • ์ด 506๊ฐœ์˜ ๋ ˆ์ฝ”๋“œ๋กœ ๊ตฌ์„ฑ๋˜์–ด์žˆ๋‹ค.
  • 13๊ฐœ์˜ Feature์™€ Lable Field๋กœ ๊ตฌ์„ฑ๋˜์–ด์žˆ๋‹ค.
  • 506๊ฐœ์˜ ๋™๋„ค์˜ ์ฃผํƒ ์ค‘๊ฐ„๊ฐ’ ๋ฐ์ดํ„ฐ์ด๋‹ค. (๊ฐœ๋ณ„ ์ฃผํƒ์ด ์•„๋‹˜์— ์œ ์˜ํ•ด์•ผํ•œ๋‹ค.)
  • 14๋ฒˆ์งธ ํ•„๋“œ๊ฐ€ ๋ฐ”๋กœ ์˜ˆ์ธกํ•ด์•ผ ํ•˜๋Š” ์ค‘๊ฐ„ ์ฃผํƒ ๊ฐ€๊ฒฉ์ด๋‹ค.(MEDV)
No.FieldExplain
1CRIM์ฃผํƒ์ด ์žˆ๋Š” ์ง€์—ญ์˜ ์ธ๋‹น ๋ฒ”์ฃ„์œจ
2ZN25000 sqft(์•ฝ700ํ‰)์ด์ƒ์˜ ๋•…์ด ์ฃผ๊ฑฐ ์ง€์—ญ์œผ๋กœ ์„ค์ •๋œ ๋น„์œจ
3INDUS์—์ด์ปค๋‹น ๊ณต์—…๋‹จ์ง€์˜ ๋น„์œจ
4CHAS์ฃผํƒ์ด ๊ฐ•๊ฐ€์— ์œ„์น˜ํ•œ ๋น„์œจ
5NOX์‚ฐํ™”์งˆ์†Œ ๋†๋„๋ฅผ ์ด์šฉํ•œ ์˜ค์—ผ์ •๋„
6RM์ฃผํƒ๋‹น ํ‰๊ท  ๋ฐฉ์˜ ์ˆ˜
7AGE1940๋…„ ์ „์— ์ง€์–ด์ง„ ์ฃผํƒ์˜ ๋น„์œจ
8DIS๋ณด์Šคํ„ด ์ง€์—ญ ๊ณ ์šฉ ์„ผํ„ฐ๊นŒ์ง€์˜ ํ‰๊ท  ๊ฑฐ๋ฆฌ
9RAD๊ณ ์†๋„๋กœ ์ ‘๊ทผ์„ฑ์— ๋Œ€ํ•œ ์ธ๋ฑ์Šค
10TAX์žฌ์‚ฐ์„ธ(์ฃผํƒ๊ฐ€๊ฒฉ $10k ๊ธฐ์ค€)
11PTRATO์ดˆ๋“ฑํ•™๊ต ํ•™์ƒ - ์„ ์ƒ๋‹˜์˜ ๋น„์œจ
12Bํ‘์ธ ์ธ๊ตฌ์˜ ๋น„์œจ
13LSTAT์ €์†Œ๋“์ž์˜ ์ธ๊ตฌ ๋น„์œจ
14MEDV$1000 ๋‹จ์œ„์˜ ์ฃผํƒ ํ‰๊ท ๊ฐ’

Code

  • ๋จผ์ € pyspark์™€ py4j๋ฅผ ์„ค์น˜ํ•ด์ฃผ์ž. ๊ฐ๊ฐ ์„œ๋กœ ๊ฐ„์„ญ์ด ์—†๋Š” ๋ฒ„์ „์„ ์„ ํƒํ•ด์ฃผ์–ด์•ผ ํ•œ๋‹ค.
!pip install pyspark==3.3.1 py4j==0.10.9.5

๐Ÿ“™ Create SparkSession

  • Regression ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ค‘์—์„œ Linear Regression ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ด๋ณด์ž.
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Boston Housing Linear Regression") \
    .getOrCreate()

๐Ÿ“™ Data Download

data = spark.read.csv('./boston_housing.csv', header=True, inferSchema=True)
  • Data์˜ ์Šคํ‚ค๋งˆ๋ฅผ ํ™•์ธํ•ด๋ณด์ž. ์œ„์—์„œ ํ™•์ธํ•œ ๋ฐ์ดํ„ฐ ๋ช…์„ธ์„œ์™€ ๋™์ผํ•œ ํ˜•ํƒœ๋ฅผ ๊ฐ€์ง„๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.
data.printSchema()
root
 |-- crim: double (nullable = true)
 |-- zn: double (nullable = true)
 |-- indus: double (nullable = true)
 |-- chas: integer (nullable = true)
 |-- nox: double (nullable = true)
 |-- rm: double (nullable = true)
 |-- age: double (nullable = true)
 |-- dis: double (nullable = true)
 |-- rad: integer (nullable = true)
 |-- tax: integer (nullable = true)
 |-- ptratio: double (nullable = true)
 |-- b: double (nullable = true)
 |-- lstat: double (nullable = true)
 |-- medv: double (nullable = true)
  • show๋ฅผ ์‚ฌ์šฉํ•ด์„œ ๋ฐ์ดํ„ฐ์˜ ๊ฐ’์„ ๋ฏธ๋ฆฌ ํ™•์ธํ•ด๋ณด์ž.
data.show()
+-------+----+-----+----+-----+-----+-----+------+---+---+-------+------+-----+----+
|   crim|  zn|indus|chas|  nox|   rm|  age|   dis|rad|tax|ptratio|     b|lstat|medv|
+-------+----+-----+----+-----+-----+-----+------+---+---+-------+------+-----+----+
|0.00632|18.0| 2.31|   0|0.538|6.575| 65.2|  4.09|  1|296|   15.3| 396.9| 4.98|24.0|
|0.02731| 0.0| 7.07|   0|0.469|6.421| 78.9|4.9671|  2|242|   17.8| 396.9| 9.14|21.6|
|0.02729| 0.0| 7.07|   0|0.469|7.185| 61.1|4.9671|  2|242|   17.8|392.83| 4.03|34.7|
|0.03237| 0.0| 2.18|   0|0.458|6.998| 45.8|6.0622|  3|222|   18.7|394.63| 2.94|33.4|
|0.06905| 0.0| 2.18|   0|0.458|7.147| 54.2|6.0622|  3|222|   18.7| 396.9| 5.33|36.2|
|0.02985| 0.0| 2.18|   0|0.458| 6.43| 58.7|6.0622|  3|222|   18.7|394.12| 5.21|28.7|
|0.08829|12.5| 7.87|   0|0.524|6.012| 66.6|5.5605|  5|311|   15.2| 395.6|12.43|22.9|
|0.14455|12.5| 7.87|   0|0.524|6.172| 96.1|5.9505|  5|311|   15.2| 396.9|19.15|27.1|
|0.21124|12.5| 7.87|   0|0.524|5.631|100.0|6.0821|  5|311|   15.2|386.63|29.93|16.5|
|0.17004|12.5| 7.87|   0|0.524|6.004| 85.9|6.5921|  5|311|   15.2|386.71| 17.1|18.9|
|0.22489|12.5| 7.87|   0|0.524|6.377| 94.3|6.3467|  5|311|   15.2|392.52|20.45|15.0|
|0.11747|12.5| 7.87|   0|0.524|6.009| 82.9|6.2267|  5|311|   15.2| 396.9|13.27|18.9|
|0.09378|12.5| 7.87|   0|0.524|5.889| 39.0|5.4509|  5|311|   15.2| 390.5|15.71|21.7|
|0.62976| 0.0| 8.14|   0|0.538|5.949| 61.8|4.7075|  4|307|   21.0| 396.9| 8.26|20.4|
|0.63796| 0.0| 8.14|   0|0.538|6.096| 84.5|4.4619|  4|307|   21.0|380.02|10.26|18.2|
|0.62739| 0.0| 8.14|   0|0.538|5.834| 56.5|4.4986|  4|307|   21.0|395.62| 8.47|19.9|
|1.05393| 0.0| 8.14|   0|0.538|5.935| 29.3|4.4986|  4|307|   21.0|386.85| 6.58|23.1|
| 0.7842| 0.0| 8.14|   0|0.538| 5.99| 81.7|4.2579|  4|307|   21.0|386.75|14.67|17.5|
|0.80271| 0.0| 8.14|   0|0.538|5.456| 36.6|3.7965|  4|307|   21.0|288.99|11.69|20.2|
| 0.7258| 0.0| 8.14|   0|0.538|5.727| 69.5|3.7965|  4|307|   21.0|390.95|11.28|18.2|
+-------+----+-----+----+-----+-----+-----+------+---+---+-------+------+-----+----+
only showing top 20 rows

๐Ÿ“™ Create Feature Vector

  • VectorAssembler๋ฅผ ์‚ฌ์šฉํ•ด์„œ ์ง„ํ–‰ํ•˜์ž
    -> ์—ฌ๋Ÿฌ ์ปฌ๋Ÿผ์„ ํ•˜๋‚˜์˜ ๋ฒกํ„ฐ ์ปฌ๋Ÿผ์œผ๋กœ ๊ฒฐํ•ฉํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•œ๋‹ค.

  • inputCols์—๋Š” ๊ฒฐํ•ฉํ•  ์ปฌ๋Ÿผ๋“ค์˜ ์ด๋ฆ„ ๋ฆฌ์ŠคํŠธ๋ฅผ ์ง€์ •ํ•œ๋‹ค.

  • outputCols์—๋Š” ๊ฒฐํ•ฉ๋œ ๋ฒกํ„ฐ๋ฅผ ์ €์žฅํ•  ์ƒˆ ์ปฌ๋Ÿผ ์ด๋ฆ„์„ ์ง€์ •ํ•œ๋‹ค.

from pyspark.ml.feature import VectorAssembler

feature_columns = data.columns[:-1] # data์˜ ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ์„ ์ œ์™ธํ•œ ๋‚˜๋จธ์ง€
assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
  • transform์„ ์‚ฌ์šฉํ•ด์„œ ์›๋ณธ DataFrame์ธ data์— features ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด์„œ vectorized_data DataFrame์„ ์ƒ์„ฑํ•œ๋‹ค.
vectorized_data = assembler.transform(data)ใ…‚
vectorized_data.show()
+-------+----+-----+----+-----+-----+-----+------+---+---+-------+------+-----+----+--------------------+
|   crim|  zn|indus|chas|  nox|   rm|  age|   dis|rad|tax|ptratio|     b|lstat|medv|            features|
+-------+----+-----+----+-----+-----+-----+------+---+---+-------+------+-----+----+--------------------+
|0.00632|18.0| 2.31|   0|0.538|6.575| 65.2|  4.09|  1|296|   15.3| 396.9| 4.98|24.0|[0.00632,18.0,2.3...|
|0.02731| 0.0| 7.07|   0|0.469|6.421| 78.9|4.9671|  2|242|   17.8| 396.9| 9.14|21.6|[0.02731,0.0,7.07...|
|0.02729| 0.0| 7.07|   0|0.469|7.185| 61.1|4.9671|  2|242|   17.8|392.83| 4.03|34.7|[0.02729,0.0,7.07...|
|0.03237| 0.0| 2.18|   0|0.458|6.998| 45.8|6.0622|  3|222|   18.7|394.63| 2.94|33.4|[0.03237,0.0,2.18...|
|0.06905| 0.0| 2.18|   0|0.458|7.147| 54.2|6.0622|  3|222|   18.7| 396.9| 5.33|36.2|[0.06905,0.0,2.18...|
|0.02985| 0.0| 2.18|   0|0.458| 6.43| 58.7|6.0622|  3|222|   18.7|394.12| 5.21|28.7|[0.02985,0.0,2.18...|
|0.08829|12.5| 7.87|   0|0.524|6.012| 66.6|5.5605|  5|311|   15.2| 395.6|12.43|22.9|[0.08829,12.5,7.8...|
|0.14455|12.5| 7.87|   0|0.524|6.172| 96.1|5.9505|  5|311|   15.2| 396.9|19.15|27.1|[0.14455,12.5,7.8...|
|0.21124|12.5| 7.87|   0|0.524|5.631|100.0|6.0821|  5|311|   15.2|386.63|29.93|16.5|[0.21124,12.5,7.8...|
|0.17004|12.5| 7.87|   0|0.524|6.004| 85.9|6.5921|  5|311|   15.2|386.71| 17.1|18.9|[0.17004,12.5,7.8...|
|0.22489|12.5| 7.87|   0|0.524|6.377| 94.3|6.3467|  5|311|   15.2|392.52|20.45|15.0|[0.22489,12.5,7.8...|
|0.11747|12.5| 7.87|   0|0.524|6.009| 82.9|6.2267|  5|311|   15.2| 396.9|13.27|18.9|[0.11747,12.5,7.8...|
|0.09378|12.5| 7.87|   0|0.524|5.889| 39.0|5.4509|  5|311|   15.2| 390.5|15.71|21.7|[0.09378,12.5,7.8...|
|0.62976| 0.0| 8.14|   0|0.538|5.949| 61.8|4.7075|  4|307|   21.0| 396.9| 8.26|20.4|[0.62976,0.0,8.14...|
|0.63796| 0.0| 8.14|   0|0.538|6.096| 84.5|4.4619|  4|307|   21.0|380.02|10.26|18.2|[0.63796,0.0,8.14...|
|0.62739| 0.0| 8.14|   0|0.538|5.834| 56.5|4.4986|  4|307|   21.0|395.62| 8.47|19.9|[0.62739,0.0,8.14...|
|1.05393| 0.0| 8.14|   0|0.538|5.935| 29.3|4.4986|  4|307|   21.0|386.85| 6.58|23.1|[1.05393,0.0,8.14...|
| 0.7842| 0.0| 8.14|   0|0.538| 5.99| 81.7|4.2579|  4|307|   21.0|386.75|14.67|17.5|[0.7842,0.0,8.14,...|
|0.80271| 0.0| 8.14|   0|0.538|5.456| 36.6|3.7965|  4|307|   21.0|288.99|11.69|20.2|[0.80271,0.0,8.14...|
| 0.7258| 0.0| 8.14|   0|0.538|5.727| 69.5|3.7965|  4|307|   21.0|390.95|11.28|18.2|[0.7258,0.0,8.14,...|
+-------+----+-----+----+-----+-----+-----+------+---+---+-------+------+-----+----+--------------------+
only showing top 20 rows

๐Ÿ“™ Divide Data & Build Model

  • Data๋Š” 8:2๋กœ Train๊ณผ Test๋กœ ๋‚˜๋ˆˆ๋‹ค.
    -> Scikit-Learn์—์„œ train_test_split์™€ ๊ฐ™์€ ์—ญํ• ๋Š” ํ•˜๋Š” randomSplit์„ ์‚ฌ์šฉํ•˜์ž.
  • ๋‚˜๋ˆ„๋Š” ๋น„์œจ์€ ๋ชจ๋ธ, ๋ฐ์ดํ„ฐ ๋“ฑ๋“ฑ์˜ ๋ณ€์ˆ˜๋ฅผ ๊ณ ๋ คํ•ด์„œ ์ •ํ•˜๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.
train, test = vectorized_data.randomSplit([0.8, 0.2])
  • pyspark.ml.regression์—์„œ ์„ ํ˜• ํšŒ๊ท€ ๋ชจ๋ธ(LinearRegression)์„ ๊ฐ€์ ธ์™€์„œ ๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜์ž.
  • Feature Column๊ณผ ์˜ˆ์ธกํ•˜๊ณ ์ž ํ•˜๋Š” Label์„ ์ง€์ •ํ•ด์ค€๋‹ค.
  • ๋งˆ์ง€๋ง‰์œผ๋กœ fit์„ ์ด์šฉํ•˜์—ฌ ํ•™์Šต ์‹œํ‚จ๋‹ค. Train๋ณ€์ˆ˜๋ฅผ ์ง€์ •ํ•ด์ฃผ์–ด์„œ ํ•™์Šต์‹œ์ผœ์•ผํ•œ๋‹ค.
from pyspark.ml.regression import LinearRegression

model = LinearRegression(featuresCol="features", labelCol="medv").fit(train)

๐Ÿ“™ Model Evaluation

  • ์ด์ œ ๋ชจ๋ธ์„ ํ‰๊ฐ€ํ•  ์ฐจ๋ก€์ด๋‹ค.
  • ํ‰๊ฐ€ ๋‹จ๊ณ„๋Š” ์‰ฝ๊ฒŒ ์ƒ๊ฐํ•˜๋ฉด ํ•™์Šต๋œ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ์–ด๋А์ •๋„์ธ์ง€ ์ง€ํ‘œ๋ฅผ ๋ณด๋Š”๊ฒƒ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜๋ฉด๋œ๋‹ค.
  • ์ฐธ๊ณ  : ๋‚ด๊ฐ€ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ํ‰๊ฐ€ ํ•จ์ˆ˜๋ฅผ ์ฐพ์•„์„œ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค.
evaluation_summary = model.evaluate(test)
evaluation_summary
<pyspark.ml.regression.LinearRegressionSummary at 0x7cc1242a2590>
  • ํ‰๊ท  ์ ˆ๋Œ€ ์˜ค์ฐจ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” meanAbsoluteError๋ฅผ ์‚ฌ์šฉํ•ด๋ณด์ž
  • ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’์˜ ์ฐจ์ด์˜ ์ ˆ๋Œ€๊ฐ’์˜ ํ‰๊ท ์ด๋‹ค.
  • ์˜ค์ฐจ์˜ ํฌ๊ธฐ๋ฅผ ์›๋ž˜ ๋‹จ์œ„๋กœ ํ‘œํ˜„ํ•˜๋ฉฐ, ์ด์ƒ์น˜์— ๋œ ๋ฏผ๊ฐํ•˜๋‹ค.
evaluation_summary.meanAbsoluteError
3.751751468553107
  • ํ‰๊ท  ์ œ๊ณฑ๊ทผ ์˜ค์ฐจ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” rootMeanSquaredError๋ฅผ ์‚ฌ์šฉํ•ด๋ณด์ž
  • ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’ ์ฐจ์ด์˜ ์ œ๊ณฑ์˜ ํ‰๊ท ์— ๋ฃจํŠธ๋ฅผ ์”Œ์šด ๊ฐ’์ด๋‹ค.
  • MAE๋ณด๋‹ค ํฐ ์˜ค์ฐจ์— ๋” ๋งŽ์€ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ•˜๋ฉฐ, ์ด์ƒ์น˜์— ๋” ๋ฏผ๊ฐํ•˜๋‹ค.
evaluation_summary.rootMeanSquaredError
5.266467730878813
  • ๊ฒฐ์ • ๊ณ„์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” r2๋ฅผ ์‚ฌ์šฉํ•ด๋ณด์ž.
  • ๋ชจ๋ธ์ด ๋ฐ์ดํ„ฐ์˜ ๋ถ„์‚ฐ์„ ์–ผ๋งˆ๋‚˜ ์ž˜ ์„ค๋ช…ํ•˜๋Š”์ง€๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.
  • 0์—์„œ 1 ์‚ฌ์ด์˜ ๊ฐ’์„ ๊ฐ€์ง€๋ฉฐ, 1์— ๊ฐ€๊นŒ์šธ์ˆ˜๋ก ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ์ข‹์€ ๊ฒƒ์ด๋‹ค.
evaluation_summary.r2
0.7002142469300772

Model Predictioon

  • ์ด์ œ ๋ชจ๋ธ์„ ์‹œํ—˜ํ•ด๋ด์•ผํ•œ๋‹ค.
  • ํ‰๊ฐ€ ์ง€ํ‘œ์™€ ์‹ค์‚ฌ์šฉ์€ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ๋‹ค.
  • ์ด์ „์— ๋งŒ๋“ค์–ด๋†“์€ 0.2 ๋น„์œจ์˜ test ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์ž.
  • ๋ชจ๋“  ์ปฌ๋Ÿผ์„ ๋ณผ ํ•„์š”๊ฐ€ ์—†๊ธฐ ๋•Œ๋ฌธ์— medv, features, prediction ๋งŒ ํ‘œ์ถœํ•ด์„œ ํ™•์ธํ•˜์ž.
predictions = model.transform(test)
predictions.select(predictions.columns[13:]).show()
+----+--------------------+------------------+
|medv|            features|        prediction|
+----+--------------------+------------------+
|32.7|[0.01301,35.0,1.5...|30.515144998498805|
|35.4|[0.01311,90.0,1.2...|30.631754822999135|
|24.5|[0.01501,80.0,2.0...|27.128954091991645|
|44.0|[0.01538,90.0,3.7...| 36.96650331489955|
|30.1|[0.01709,90.0,2.0...|  24.6221945758976|
|23.1|[0.0187,85.0,4.15...|24.785644795798774|
|42.3|[0.02177,82.5,2.0...|37.095922515731644|
|16.5|[0.02498,0.0,1.89...| 22.67389351259429|
|34.9|[0.0315,95.0,1.47...|29.789848917775686|
|33.4|[0.03237,0.0,2.18...|28.764672253072774|
|19.4|[0.03466,35.0,6.0...|23.414721268194157|
|45.4|[0.03578,20.0,3.3...| 38.85772413090571|
|23.5|[0.03584,80.0,3.3...|30.000658140048248|
|20.7|[0.03738,0.0,5.19...| 21.66209468225044|
|18.2|[0.04301,80.0,1.9...|13.568129819698605|
|20.5|[0.04337,21.0,5.6...|23.829695990158015|
|19.4|[0.04379,80.0,3.3...|25.147443347186538|
|20.6|[0.04527,0.0,11.9...|22.249892769719427|
|30.3|[0.04666,80.0,1.5...| 32.62958993192931|
|22.6|[0.04684,0.0,3.41...|27.323120358774855|
+----+--------------------+------------------+
only showing top 20 rows

์—ฌ๊ธฐ ๊นŒ์ง€ Linear Regression ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ด ๋ณด๋Š” ๋ฐ๋ชจ๋ฅผ ์ง„ํ–‰ํ•ด๋ณด์•˜๋‹ค.
์ด๋Š” ๊ธฐ๋ณธ์ ์ธ ๊ฐ€์ด๋“œ๋ผ์ธ์ผ ๋ฟ์ด๊ธฐ ๋•Œ๋ฌธ์— ์‹ค์‚ฌ์šฉ์—๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์„ ๊ฒƒ์ด๋‹ค.

0๊ฐœ์˜ ๋Œ“๊ธ€