Here is the link to the website.
The links to the data used in this project are located in the "Data Collection" section of the project.
Kai Yu, working by himself.
This project investigates the relationship between the price of ETH, the virtual coin, and the video card information in the Steam user survey, with the goal of building a predictive model and examining how well it works.
This topic might seem far-fetched at first, but from my knowledge of economics I am confident there is some relation between the video cards people are using and the price of ETH. ETH is the most popular virtual coin that requires video cards to mine, and since the number of video cards manufacturers can produce is relatively fixed, miners end up eating into the supply of cards that would otherwise be purchased by gamers.
There are two reasons we chose 5 years as the time restriction. The first is that ETH is not as old as Bitcoin: it is a relatively new coin that was proposed in 2013 and reached a price of about $1 in 2015. I believe the relationship will be more obvious once ETH forms the majority of the mining-video-card market. The second is that video cards evolve and phase themselves out as new generations of cards come out. The 5-year restriction lets us focus on certain influential video cards and allows an easier comparison.
# This block of code imports all the libraries we need.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
The main dataset we are using is the Steam Hardware Survey, a survey conducted by Steam every month to collect data about what kinds of computer hardware and software its customers are using. In this project, we mainly focus on the different video cards people are using. The raw data contains a list of different video cards, each followed by a percentage representing the share of Steam users using that type of video card. Steam doesn't supply the historical data in a convenient form, but luckily there is a selfless contributor who updates the data into a GitHub repository every month. Let us thank him: jdegene and his repository.
The second dataset we are using is the daily price of the ETH coin from CoinGecko. To match the monthly data from Steam, we use the price from the first day of every month.
We load the two CSV data files included in this dataset folder, obtained from the sources referred to in the "Project Dataset" section above.
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My Drive/Colab Notebooks/project
eth_df = pd.read_csv('eth.csv')
steam_survey_df = pd.read_csv('shs.csv')
eth_df.head()
We clearly do not need all of the columns in this dataframe of prices, so we tidy it in the next block.
# Tidy eth_df: extract year, month, and day, then keep only the first day of each month
date_parts = eth_df.snapped_at.str.split("-", expand=True)
eth_df["Year"] = date_parts[0]
eth_df["Month"] = date_parts[1]
eth_df["Day"] = date_parts[2].str.split(" ", expand=True)[0]
eth_monthly_df = eth_df[eth_df["Day"] == "01"].copy()
eth_monthly_df["Time"] = eth_monthly_df["Year"] + "-" + eth_monthly_df["Month"]
eth_monthly_df = eth_monthly_df.set_index("Time")
eth_monthly_df = eth_monthly_df[["price"]]
eth_monthly_df
Now this dataframe is tidy enough to do some visualization. To be fair, I never realized that the ups and downs in the price of ETH were this huge until I saw this graph.
# Plot the ETH price over time (the Time index is used as the x-axis)
eth_monthly_df["price"].plot.line()
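To put a rough number on those swings (a small addition, not part of the original analysis), we can also look at the month-over-month percentage change of the price:
# Month-over-month percentage change of the ETH price, a rough measure of its volatility
eth_monthly_df["price"].pct_change().describe()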
Our next task is the dataframe for the Steam survey.
steam_survey_df.head()
Wow, wow. It seems that there is too much irrelevant data in this dataframe, as I believe that "Free Hard Drive Space" is not a great indicator of ETH prices. Let's start cleaning it up.
# Cleaning up data from shs.csv, as we only need data about GPUs for a certain time period.
# But first, let's do some necessary work to make assigning the index easier later.
steam_survey_df["Year"] = steam_survey_df.date.str.split("-", expand=True)[0]
steam_survey_df["Month"] = steam_survey_df.date.str.split("-", expand=True)[1]
steam_survey_df["Year"] = pd.to_numeric(steam_survey_df["Year"], errors="coerce")
steam_survey_df["Month"] = pd.to_numeric(steam_survey_df["Month"], errors="coerce")
steam_survey1_df = steam_survey_df[((steam_survey_df.Year > 2017) | ((steam_survey_df.Year == 2017) & (steam_survey_df.Month > 9))) & ((steam_survey_df.Year != 2022) | (steam_survey_df.Month != 10))]
# Filter using steam_survey1_df's own mask, and take a copy so later column assignments are safe
GPU_df = steam_survey1_df[steam_survey1_df.category == "Video Card Description"].copy()
GPU_df.name.value_counts()
GPU_df
It is worth noting that the original plan was to include the data from September 2017. However, it turns out that there is some data loss for that month, as shown below.
# In the original plan we were supposed to collect 5 full years of data. However, after looking into the raw data
# I found that there is some data loss for September 2017. Therefore our analysis can only start at 2017-10.
steam_20179_survey_df = steam_survey_df[(steam_survey_df.Year == 2017) & (steam_survey_df.Month == 9)]
GPU_20179_df = steam_20179_survey_df[steam_20179_survey_df.category == "Video Card Description"]
GPU_20179_df
# As you can see, there are only 4 lines of data for that month
The next part is the most time-consuming part of the project. We know that a number of GPUs are not suitable for mining, since mining has a hard requirement on the amount of VRAM a GPU has. Therefore we have to do a manual selection to filter out the GPUs that are "irrelevant" here. For a closer look, please follow the next block of code.
# Since mining has a hard VRAM requirement, we have to label each card as mining-capable or not
# First, let's pull out the manufacturer so we can treat the AMD and NVIDIA cards separately
GPU_df["manu"] = GPU_df.name.str.split(" ", expand=True)[0]
GPU_df["mining"] = 0
# We know that all NVIDIA GPUs with "RTX" in the name are capable of mining
GPU_df["type1"] = GPU_df.name.str.split(" ", expand=True)[2]
GPU_df.loc[GPU_df['type1'] == "RTX",["mining"]] = 1
# Next, we know that GTX cards from the 10 series onward are capable except the GTX 1050 and GTX 1050 Ti; here we mark the 1060, 1070, and 1080.
GPU_df["type2"] = GPU_df.name.str.split(" ", expand=True)[3]
# GPU_df.loc[(GPU_df['type1'] == "GTX" & (GPU_df['type2'] == "1080" | GPU_df['type2'] == "1070" | GPU_df['type2'] == "1060")) ,["mining"]] = 1
GPU_df.loc[(GPU_df['type1'] == "GTX") & (GPU_df['type2'] == "1080"),["mining"]] = 1
GPU_df.loc[(GPU_df['type1'] == "GTX") & (GPU_df['type2'] == "1070"),["mining"]] = 1
GPU_df.loc[(GPU_df['type1'] == "GTX") & (GPU_df['type2'] == "1060"),["mining"]] = 1
# Next, let's work on the AMD cards. We know that most cards starting with RX are capable of mining, except a few low-end cards.
GPU_df.loc[(GPU_df['type1'] == "RX"), ["mining"]] = 1
GPU_df.loc[(GPU_df['type1'] == "RX") & (GPU_df['type2'] == "460"),["mining"]] = 0
GPU_df.loc[(GPU_df['type1'] == "RX") & (GPU_df['type2'] == "560"),["mining"]] = 0
GPU_df.loc[(GPU_df['type1'] == "RX") & (GPU_df['type2'] == "450"),["mining"]] = 0
GPU_df.loc[(GPU_df['type1'] == "RX") & (GPU_df['type2'] == "550"),["mining"]] = 0
GPU_df.loc[(GPU_df['type1'] == "RX") & (GPU_df['type2'] == "Vega"),["mining"]] = 0
# There is a problem with the AMD R9 200 series. We know that the 290 and 280 are capable of mining while the 250 and 260 are not,
# but for some reason they all show up the same in the survey, as "200 series". Sadly, this data will not count toward our analysis.
GPU_df.loc[(GPU_df['type1'] == "R9") & (GPU_df['type2'] == "390"),["mining"]] = 1
GPU_df.loc[(GPU_df['type1'] == "R9") & (GPU_df['type2'] == "380"),["mining"]] = 1
GPU_df
Now we have marked all of the "minable" GPUs. Some columns were useful for that selection but are no longer needed, so let's drop them.
# Cleaning up the dataframe
gpu1_df = GPU_df.loc[GPU_df["mining"] == 1].copy()
gpu1_df["Time"] = gpu1_df.date.str.split("-", expand=True)[0] + "-" + gpu1_df.date.str.split("-", expand=True)[1]
gpu1_df = gpu1_df[["Time", "name", "change", "percentage"]]
gpu1_df
Nice! This is the piece of information we needed. The problem we are facing now is that, as one can tell from the dataframe, it is almost impossible to produce a meaningful graph if we plot the different GPUs one by one, because there are so many of them. (Trust me, no one wants to see that messy graph!) Therefore, to keep things straightforward and understandable, the next block of code produces a graph of how the total percentage of minable GPUs changes over time.
# Sum the numeric columns over all minable cards at each point in time and plot them
gpu3_df = gpu1_df.groupby("Time")[["change", "percentage"]].sum()
axes = gpu3_df.plot.line(subplots=True)
type(axes)
To be fair, the first version of the project focused on the sum of the minable GPU percentages. Although that idea was later abandoned, there is still something we can salvage from it, like the clean correlation table below, which shows that the "change" column is unreliable for prediction.
# Merge the summed GPU data with the monthly ETH prices (joining on Time)
gpu3_df = gpu3_df.merge(eth_monthly_df, on=["Time"])
# Reset the index twice so we get a plain integer "index" column to correlate against
gpu3_df = gpu3_df.reset_index()
gpu3_df = gpu3_df.reset_index()
gpu2_df = gpu3_df[["index", "percentage", "change", "price"]]
gpu2_df.corr()
We have to admit that both the graph and the table are clean and tidy. However, for the k-nearest-neighbors model we will use for prediction, we need the information about each video card to be kept separate for a better result. Therefore the dataframe will get a little bit messier...
gpu1_df = gpu1_df.groupby(["Time","name"])["percentage"].sum().unstack().fillna(0)
gpu1_df
This is how the entire dataframe looks. We are going to merge it with the price dataframe, which will allow us to build a model from it in the next section.
# Merge this data with the data from the ETH price.
# Some preparation so the two dataframes have a similar format
gpu1_df = gpu1_df.reset_index()   # move Time from the index back into a column
gpu1_df = gpu1_df.reset_index()   # add an integer "index" column recording the time order
# Actual merge
gpu2_df = gpu1_df.merge(eth_monthly_df, on=["Time"])
gpu2_df.head()
The goal of this project is to build a predictive model that predicts the price of ETH from the video cards people are using. After that, we can run some analysis and check whether our model works. If we reach a positive result, then we can conclude that such a model can be built and that there is a relationship between the price and the video cards people are using.
First, we should select the correct value of k to use. To do this, we run an error analysis to check where the lowest error appears.
# get the features (in dict format) and the labels
# (do not split into training and validation sets)
# Note: the feature columns are the integer "index", the "Time" string, and the percentage for each minable GPU
features = gpu1_df.columns.to_list()
X_dict = gpu2_df[features].to_dict(orient="records")
y = gpu2_df["price"]
# specify the pipeline
vec = DictVectorizer(sparse=False)
scaler = StandardScaler()
model = KNeighborsRegressor(n_neighbors=10)
pipeline = Pipeline([("vectorizer", vec), ("scaler", scaler), ("fit", model)])
# calculates estimate of test error based on 10-fold cross validation
def get_cv_error(k):
    model = KNeighborsRegressor(n_neighbors=k)
    pipeline = Pipeline([("vectorizer", vec), ("scaler", scaler), ("fit", model)])
    mse = np.mean(-cross_val_score(
        pipeline, X_dict, y,
        cv=10, scoring="neg_mean_squared_error"
    ))
    return mse
ks = pd.Series(range(1, 51))
ks.index = range(1, 51)
test_errs = ks.apply(get_cv_error)
test_errs.plot.line()
test_errs.sort_values()
# According to this, we should select 2-nearest neighbor
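Rather than only reading the best k off the sorted list by eye, we can also ask pandas for it directly (a small sanity check added here, not part of the original code):
# The k with the lowest cross-validation MSE, and the corresponding MSE
test_errs.idxmin(), test_errs.min()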
Based on the result above, we now run a 2-fold validation to find the exact error when k = 2.
# next, we will build a 2-nearest neighbor model and check its cross-validation error
def get_val_error(X_train_dict, y_train, X_val_dict, y_val):
    # convert categorical variables to dummy variables
    vec = DictVectorizer(sparse=False)
    vec.fit(X_train_dict)
    X_train = vec.transform(X_train_dict)
    X_val = vec.transform(X_val_dict)
    # standardize the data
    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train_sc = scaler.transform(X_train)
    X_val_sc = scaler.transform(X_val)
    # Fit a 2-nearest neighbors model.
    model = KNeighborsRegressor(n_neighbors=2)
    model.fit(X_train_sc, y_train)
    # Make predictions on the validation set.
    y_val_pred = model.predict(X_val_sc)
    rmse = np.sqrt(((y_val - y_val_pred) ** 2).mean())
    return rmse

# average the validation error over the two halves of a 50/50 split (2-fold cross-validation)
def get_cross(a, b, c, d):
    return (get_val_error(a, b, c, d) + get_val_error(c, d, a, b)) / 2
train = gpu2_df.sample(frac=.5)
val = gpu2_df.drop(train.index)
X_train_dict = train[features].to_dict(orient="records")
X_val_dict = val[features].to_dict(orient="records")
y_train = train["price"]
y_val = val["price"]
get_cross(X_train_dict, y_train, X_val_dict, y_val)
# seems acceptable
I would say that this error is acceptable. It is also a better result than the model built from the sum of all percentages, where the error was somewhere around 580.
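For completeness, here is a minimal sketch (my addition, not the original comparison code) of how that sum-of-percentages baseline could be re-evaluated with the same 2-fold procedure; the total_percentage column, the gpu_columns helper, and the random_state are assumptions of this sketch, so the number it produces will not exactly match the roughly 580 quoted above.
# Rebuild a one-feature version of the data: the total percentage of minable GPUs each month
sum_df = gpu2_df.copy()
gpu_columns = [c for c in gpu2_df.columns if c not in ("index", "Time", "price")]
sum_df["total_percentage"] = sum_df[gpu_columns].sum(axis=1)
# Evaluate a 2-nearest-neighbor model on this single feature with the same 2-fold procedure
train_s = sum_df.sample(frac=.5, random_state=0)
val_s = sum_df.drop(train_s.index)
get_cross(train_s[["total_percentage"]].to_dict(orient="records"), train_s["price"],
          val_s[["total_percentage"]].to_dict(orient="records"), val_s["price"])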
Finally, let's see how the model's predictions compare to the real prices in a graph.
# Let's see how our model works with a graph
features = gpu1_df.columns.to_list()
X_train_dict = gpu2_df[features].to_dict(orient="records")
y_train = gpu2_df["price"]
vec = DictVectorizer(sparse=False)
vec.fit(X_train_dict)
X_train = vec.transform(X_train_dict)
scaler = StandardScaler()
scaler.fit(X_train)
X_train_sc = scaler.transform(X_train)
# Fit a 2-nearest neighbors model.
model = KNeighborsRegressor(n_neighbors=2)
model.fit(X_train_sc, y_train)
# Calculate the model predictions on the training data.
y_train_pred = model.predict(X_train_sc)
y_train_pred
# put the real and predicted prices side by side (avoid shadowing the built-in dict)
results = {'real_price': y_train, 'predicted_price': y_train_pred}
df_com = pd.DataFrame(results)
df_com
axes = df_com.plot.line(subplots=True)
type(axes)
With this graph, we believe we can say that we achieved our goal: the two curves are very similar, which suggests that our hypothesis holds and that there is a relationship between the video card data and the ETH price.
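As a quick numeric complement to that visual comparison (an addition, not part of the original analysis), we can also compute the RMSE of these predictions; note that this is an error on the training data itself, so it is an optimistic number.
# RMSE of the 2-nearest-neighbor predictions on the training data (optimistic, since these points were used for fitting)
np.sqrt(((y_train - y_train_pred) ** 2).mean())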
With the result in the model section, we successfully built a model and checked its accuracy. However, now that Ethereum has completed "the Merge", it can no longer be mined with video cards, so we remain pessimistic about how accurate this model will be in the future. We will call this "shattered dreams of wealth".
Thanks for reading!
%%shell
jupyter nbconvert --to html FinalProject.ipynb