How to pre-process data for machine learning
A common task in almost every data science project is to pre-process your data, especially when it is tabular. In this blog post I want to build a pre-processing function that handles all of these steps for us. On top of that, I will add enough flexibility to make the function useful beyond this one example.
Let's start by loading the data.
import numpy as np
import pandas as pd
import pickle
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
We later want to save and load data with pickle, so let's define these functions:
def save_obj(obj, name):
    with open(f'{name}.pkl', 'wb') as f:
        pickle.dump(obj, f)

def load_obj(name):
    with open(f'{name}.pkl', 'rb') as f:
        return pickle.load(f)
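As a quick check that the helpers work, here is a small round trip with a throwaway dictionary (the file name demo_mapping is arbitrary and not used anywhere else in this post):
# quick round trip to confirm the helpers work; 'demo_mapping' is just an example file name
save_obj({'a': 1, 'b': 2}, 'demo_mapping')
print(load_obj('demo_mapping'))  # {'a': 1, 'b': 2}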
df = pd.read_csv('train.csv')
df.head()
df.shape
We've got 81 columns in our data. That's more than we need; for this blog post a handful of these variables is enough (the function will later work for any number of columns). We define which columns to keep and then check the data types.
to_keep = ['Id', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'OverallQual', 'YearBuilt', 'CentralAir', 'SaleType', 'SalePrice']
df = df[to_keep]
df.head()
pd.set_option('display.max_rows', 20)
df.dtypes
In tabular data there are usually four different data types: dates, continuous, categorical, and boolean variables (booleans being a special case of categoricals). We need to define which column is of which type. This dataset does not contain a datetime column, so I will generate some random datetime data to show you how to deal with dates as well.
df["Fake_date"] = np.random.choice(pd.date_range('1980-01-01', '2000-01-01'), len(df)).astype('str')
df["Fake_date_2"] = np.random.choice(pd.date_range('1980-01-01', '2000-01-01'), len(df)).astype('str')
df.head()
df.dtypes
We now need to define which column should be of which type. We use three lists for this:
date_type = ['Fake_date', 'Fake_date_2']
continuous_type = ['LotFrontage', 'LotArea', 'YearBuilt', 'SalePrice']
categorical_type = ['MSZoning', 'Street', 'OverallQual', 'SaleType','CentralAir']
I defined 'OverallQual' as a category because, even though this variable is in ascending order, we do not know whether the difference between an OverallQual of 6 and 7 has the same magnitude as the difference between 0 and 1. However, you can define this variable as continuous as well.
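If you prefer to treat 'OverallQual' as continuous, the only change is moving it from one list to the other. Purely for illustration (the _alt names are not used in the rest of this post):
# alternative: treat OverallQual as a continuous variable instead of a categorical one
continuous_type_alt = ['LotFrontage', 'LotArea', 'YearBuilt', 'OverallQual', 'SalePrice']
categorical_type_alt = ['MSZoning', 'Street', 'SaleType', 'CentralAir']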
We then define a small function, which will take our lists as input and transform our data to the correct data types.
def df_to_type(df, cont_type, cat_type):
    if cat_type is not None:
        df[cat_type] = df[cat_type].astype('category')
    # date_type comes from the global scope for now; we turn it into a proper argument below
    for i in date_type:
        df[i] = pd.to_datetime(df[i])
    return df
Did it work?
df_to_type(df, continuous_type, categorical_type)
df.dtypes
That looks good! We can now make use of pandas' various datetime functions to add more information to our data. For this I use fastai's add_datepart function. However, I do not want to import the whole library, so I simply copy the parts needed for this specific function.
import re

def ifnone(a, b):
    "`a` if `a` is not None, otherwise `b`."
    return b if a is None else a

def make_date(df, date_field):
    "Make sure `df[date_field]` is of the right date type."
    field_dtype = df[date_field].dtype
    if isinstance(field_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
        field_dtype = np.datetime64
    if not np.issubdtype(field_dtype, np.datetime64):
        df[date_field] = pd.to_datetime(df[date_field], infer_datetime_format=True)

def add_datepart(df, field_name, prefix=None, drop=True, time=False):
    "Helper function that adds columns relevant to a date in the column `field_name` of `df`."
    make_date(df, field_name)
    field = df[field_name]
    prefix = ifnone(prefix, re.sub('[Dd]ate$', '', field_name))
    attr = ['Year', 'Month', 'Day', 'Dayofweek', 'Dayofyear', 'Is_month_end', 'Is_month_start',
            'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']
    if time: attr = attr + ['Hour', 'Minute', 'Second']
    for n in attr: df[prefix + n] = getattr(field.dt, n.lower())
    # Pandas removed `dt.week` in v1.1.10
    week = field.dt.isocalendar().week if hasattr(field.dt, 'isocalendar') else field.dt.week
    df.insert(3, prefix + 'Week', week)
    mask = ~field.isna()
    df[prefix + 'Elapsed'] = np.where(mask, field.values.astype(np.int64) // 10 ** 9, None)
    if drop: df.drop(field_name, axis=1, inplace=True)
    return df
Let's extend the function so that, after setting the correct data types, it also takes all of our date variables and transforms them with add_datepart.
def df_to_type(df, date_type=None, cont_type=None, cat_type=None):
    if cat_type is not None:
        df[cat_type] = df[cat_type].astype('category')
    if date_type is not None:
        for i in date_type:
            df[i] = pd.to_datetime(df[i])
            df = add_datepart(df, i)
    return df
df_1 = df_to_type(df, date_type, continuous_type, categorical_type)
df_1.head()
Pretty cool, right? The next step is to split the data into a train and a validation set. There are usually three ways to do this: split randomly, split based on an identifier column (so that the same person ends up either in train or in validation, regardless of how many rows she represents in the data), or split by date (usually for time series).
I'll give an example for each of these splits on this specific data, starting with a random split.
First, we need to define our dependent variable and the remaining columns:
dep_var = 'SalePrice'
cols = list(df_1.columns)
cols.remove(dep_var)
Here's how we can make a random split using train_test_split:
X_train, X_test, y_train, y_test = train_test_split(df_1[cols], df_1[dep_var], test_size=0.33, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
Another common way to split the data is based on an ID, so that a given ID can only end up in one of the two sets. In addition to the dependent variable, we need to define the variable on which to split.
split_var = 'Id'
# list of unique_id
unique_id_array = list(df_1[split_var].unique())
# split into train and test data based on uid
test_size=0.33
cnt_uid = len(unique_id_array)
len_test = np.round(cnt_uid*test_size).astype(int)
len_train = cnt_uid - len_test
test_idx = list(np.random.choice(unique_id_array, len_test, replace=False))
train_idx = list(set(unique_id_array) - set(test_idx))
X_train = df_1[df_1[split_var].isin(train_idx)].copy()
y_train = X_train[dep_var]
X_train = X_train[cols]
X_test = df_1[df_1[split_var].isin(test_idx)].copy()
y_test = X_test[dep_var]
X_test = X_test[cols]
X_train.shape, X_test.shape, y_train.shape, y_test.shape
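As a quick, purely optional sanity check, we can verify that no Id ended up in both sets:
# sanity check: the train and test sets should not share any Ids
assert set(X_train[split_var]).isdisjoint(X_test[split_var])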
In this case we end up with exactly the same shapes for the train and test data as before. That is because the Id column is unique here. To see that this changes when there is more than one row per ID, I create a Fake_ID column and show you the results. Let's assume we have 10 IDs:
IDS = np.array([1,2,3,4,5,6,7,8,9,10])
n = df_1.shape[0]
df_1['Fake_ID'] = np.resize(IDS, n)
df_1.Fake_ID.head(12)
We now do the same thing as before, only using this Fake_ID as split_ID:
split_var = 'Fake_ID'
# list of unique_id
unique_id_array = list(df_1[split_var].unique())
# split into train and test data based on uid
test_size=0.33
cnt_uid = len(unique_id_array)
len_test = np.round(cnt_uid*test_size).astype(int)
len_train = cnt_uid - len_test
test_idx = list(np.random.choice(unique_id_array, len_test, replace=False))
train_idx = list(set(unique_id_array) - set(test_idx))
X_train = df_1[df_1[split_var].isin(train_idx)].copy()
y_train = X_train[dep_var]
X_train = X_train[cols]
X_test = df_1[df_1[split_var].isin(test_idx)].copy()
y_test = X_test[dep_var]
X_test = X_test[cols]
X_train.shape, X_test.shape, y_train.shape, y_test.shape
Beautiful! So let's move on to our last way of splitting the data: by date. We created the Fake_date variable with random dates between 1980-01-01 and 2000-01-01. Let's say we want roughly the last six months to be our test set.
cond = (df_1.Fake_Year < 1999) | ((df_1.Fake_Year == 1999) & (df_1.Fake_Month < 7))
train_idx = np.where( cond)[0]
test_idx = np.where(~cond)[0]
X_train = df_1.iloc[train_idx]
y_train = X_train[dep_var]
X_train = X_train[cols]
X_test = df_1.iloc[test_idx]
y_test = X_test[dep_var]
X_test = X_test[cols]
X_train.shape, X_test.shape, y_train.shape, y_test.shape
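We can quickly verify the cutoff by looking at the Fake_Elapsed column that add_datepart created (it holds the date as Unix seconds); again just a sanity check, nothing we need later:
# latest date in the train set vs. earliest date in the test set
print(pd.to_datetime(X_train['Fake_Elapsed'].max(), unit='s'))
print(pd.to_datetime(X_test['Fake_Elapsed'].min(), unit='s'))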
We have now seen the three most common ways to split your data. Let's put that into a function.
def split_df(df, x_cols, dep_var, test_size, split_mode='random', split_var=None, cond=None):
    '''
    split_mode can take three values: random, on_split_id, on_condition
    '''
    if split_mode == 'random':
        X_train, X_test, y_train, y_test = train_test_split(df[x_cols], df[dep_var], test_size=test_size)
    elif split_mode == 'on_split_id':
        if split_var is None:
            print('Give name of split_var')
        else:
            # list of unique ids
            unique_id_array = list(df[split_var].unique())
            # split into train and test data based on the unique ids
            cnt_uid = len(unique_id_array)
            len_test = np.round(cnt_uid * test_size).astype(int)
            test_idx = list(np.random.choice(unique_id_array, len_test, replace=False))
            train_idx = list(set(unique_id_array) - set(test_idx))
            X_train = df[df[split_var].isin(train_idx)].copy()
            y_train = X_train[dep_var]
            X_train = X_train[x_cols]
            X_test = df[df[split_var].isin(test_idx)].copy()
            y_test = X_test[dep_var]
            X_test = X_test[x_cols]
    elif split_mode == 'on_condition':
        if cond is None:
            print('You have to specify cond, for example like so: cond = (df_1.Fake_Year < 1999) | ((df_1.Fake_Year == 1999) & (df_1.Fake_Month < 7))')
        else:
            train_idx = np.where(cond)[0]
            test_idx = np.where(~cond)[0]
            X_train = df.iloc[train_idx]
            y_train = X_train[dep_var]
            X_train = X_train[x_cols]
            X_test = df.iloc[test_idx]
            y_test = X_test[dep_var]
            X_test = X_test[x_cols]
    else:
        print('Something is not working right, did you specify the split_mode?')
    return X_train, X_test, y_train, y_test
Let's check our function for all three split_modes:
X_train, X_test, y_train, y_test = split_df(df=df_1, x_cols=cols, dep_var=dep_var, test_size=0.33, split_mode='random')
X_train.shape, X_test.shape, y_train.shape, y_test.shape
X_train, X_test, y_train, y_test = split_df(df=df_1, x_cols=cols, dep_var=dep_var, test_size=0.33, split_mode='on_split_id', split_var=split_var)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
X_train, X_test, y_train, y_test = split_df(df=df_1, x_cols=cols, dep_var=dep_var, test_size=0.33, split_mode='on_condition', cond=cond)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
Absolutely brilliant. And we're almost done. The last piece of the puzzle is the transformation and standardization of our data. We convert the categorical variables into numbers and save the mappings in dictionaries, and we standardize the continuous variables with a standardizer of choice and save that too. IMPORTANT: we build the dictionaries and fit the standardizer on the train set only and then apply them to the test set, to avoid any leakage.
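Before we build the actual functions, here is that core pattern in isolation on a few toy numbers, just to make the point explicit (the _demo variables are throwaway and not used later):
# minimal sketch of the fit-on-train-only pattern
scaler_demo = StandardScaler()
train_demo = np.array([[1.0], [2.0], [3.0]])
test_demo = np.array([[4.0]])
# learn mean and standard deviation from the training data only ...
train_scaled = scaler_demo.fit_transform(train_demo)
# ... and re-use exactly those statistics on the test data, never re-fitting
test_scaled = scaler_demo.transform(test_demo)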
First, we build a function which transforms all of our categorical variables and saves the mappings to path, using the save_obj helper we defined at the beginning (load_obj lets us read them back later).
def cat_transform(X_train, X_test, cat_type, path=''):
    dict_list = []
    dict_inv_list = []
    for i in cat_type:
        # mapping from integer code to category, and its inverse
        dict_ = dict(enumerate(X_train[i].cat.categories))
        dict_inv_ = {v: k for k, v in dict_.items()}
        X_train[i] = X_train[i].map(dict_inv_)
        X_test[i] = X_test[i].map(dict_inv_)
        dict_list.append(dict_)
        dict_inv_list.append(dict_inv_)
    dict_name = f'{path}dict_list_cat'
    save_obj(dict_list, dict_name)
    dict_inv_name = f'{path}dict_inv_list_cat'
    save_obj(dict_inv_list, dict_inv_name)
    return X_train, X_test, dict_list, dict_inv_list
X_train, X_test, y_train, y_test = split_df(df=df_1, x_cols=cols, dep_var=dep_var, test_size=0.33, split_mode='random')
X_train, X_test, dict_list, dict_inv_list = cat_transform(X_train, X_test, categorical_type)
X_train[categorical_type].head()
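Because the mappings are stored on disk, we can reload them at any time with load_obj and translate the integer codes back into the original categories. A small sketch (the file name matches the one cat_transform uses with the default path):
# reload the saved mappings and decode the first categorical column back to its original labels
dict_list_loaded = load_obj('dict_list_cat')
X_train[categorical_type[0]].map(dict_list_loaded[0]).head()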
With this function we automatically saved our dictionaries and mappings to the corresponding path. Next, we take the continuous variables, use a standardizer of our choice and save the standardizer to path. We have to indicate the ID column, because we do not want to standardize this column. We also have to define whether we want to standardize our target variable.
def cont_standardize(X_train, X_test, y_train, y_test, cat_type=None, id_type=None,
                     transform_y=True, path='', standardizer='StandardScaler'):
    if standardizer == 'StandardScaler':
        scaler_cls = StandardScaler
    elif standardizer == 'MinMaxScaler':
        scaler_cls = MinMaxScaler
    else:
        print('standardizer can either be StandardScaler or MinMaxScaler')
        return
    # everything that is neither categorical nor the ID column is treated as continuous
    cont_type = list(X_train.columns)
    if cat_type is not None:
        cont_type = [c for c in cont_type if c not in cat_type]
    if id_type is not None:
        cont_type.remove(id_type)
    # fit the scaler on the train set only, then apply it to the test set
    scaler = scaler_cls()
    X_train[cont_type] = scaler.fit_transform(X_train[cont_type])
    X_test[cont_type] = scaler.transform(X_test[cont_type])
    save_obj(scaler, f'{path}{standardizer}')
    if transform_y:
        scaler_y = scaler_cls()
        y_train = scaler_y.fit_transform(y_train.values.reshape(-1, 1))
        y_test = scaler_y.transform(y_test.values.reshape(-1, 1))
        save_obj(scaler_y, f'{path}{standardizer}_y')
        return X_train, X_test, y_train, y_test, scaler, scaler_y
    return X_train, X_test, y_train, y_test, scaler
id_type='Id'
X_train, X_test, y_train, y_test, scaler, scaler_y = cont_standardize(X_train, X_test, y_train, y_test, cat_type=categorical_type, id_type='Id', transform_y=True)
X_train.head()
y_train[:5]
We have now saved our StandardScaler to path and transformed the data the way we wanted it. To see the "real" values of our y_train we simply invert our standardizer:
scaler_y.inverse_transform(y_train[:5])
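Since the y-scaler was also written to disk, we could just as well reload it in a later session and get the same values back; a quick sketch using the helpers from the beginning (the file name matches what cont_standardize saves with the default path and standardizer):
# reload the saved target scaler and invert the scaled targets again
scaler_y_loaded = load_obj('StandardScaler_y')
scaler_y_loaded.inverse_transform(y_train[:5])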
Beautiful! And now I want to show you how quickly we can set up our data from scratch. I cleared everything from memory and started again from here.
df = pd.read_csv('train.csv')
to_keep = ['Id', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'OverallQual', 'YearBuilt', 'CentralAir', 'SaleType', 'SalePrice']
df = df[to_keep]
df["Fake_date"] = np.random.choice(pd.date_range('1980-01-01', '2000-01-01'), len(df)).astype('str')
df.head()
date_type = ['Fake_date']
continuous_type = ['LotFrontage', 'LotArea', 'YearBuilt', 'SalePrice']
categorical_type = ['MSZoning', 'Street', 'OverallQual', 'SaleType','CentralAir']
df_1 = df_to_type(df, date_type, continuous_type, categorical_type)
dep_var = 'SalePrice'
cols = list(df_1.columns)
cols.remove(dep_var)
X_train, X_test, y_train, y_test = split_df(df=df_1, x_cols=cols, dep_var=dep_var, test_size=0.33, split_mode='random')
X_train, X_test, dict_list, dict_inv_list = cat_transform(X_train, X_test, categorical_type)
X_train, X_test, y_train, y_test, scaler, scaler_y = cont_standardize(X_train, X_test, y_train, y_test, cat_type=categorical_type, id_type='Id', transform_y=True)
X_train.head()
y_train[:5]
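If you want to hand the prepared data straight to the modelling step (or to a colleague), you can persist the splits with the same pickle helpers; this is optional and the file names are arbitrary:
# optionally persist the prepared splits; file names are arbitrary
for obj, name in [(X_train, 'X_train'), (X_test, 'X_test'), (y_train, 'y_train'), (y_test, 'y_test')]:
    save_obj(obj, name)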
Absolutely brilliant. Look how easily and quickly we pre-processed our data; we can now throw any machine learning algorithm at it. The results are easy to reproduce because we saved the mappings and the scalers, and the split function covers the three most common ways of splitting data into train and test sets. I hope this will come in handy for your next machine learning project.
Stay tuned for the next blogpost!
Lasse