pycmtensor.data

PyCMTensor data module

Module Contents

class pycmtensor.data.Data(df: pandas.DataFrame, choice: str, **kwargs)

Base Data class object.

Parameters:
  • df (pandas.DataFrame) – the input pandas DataFrame

  • choice (str) – name of the column holding the dependent choice variable

  • **kwargs – keyword arguments; the accepted arguments are drop (pd.Series), autoscale (bool), autoscale_except (list[str]) and split (float)

Note

The following is an example initialization of the swissmetro dataset:

import pandas as pd
import pycmtensor

swissmetro = pd.read_csv("../data/swissmetro.dat", sep="\t")
db = pycmtensor.Data(
    df=swissmetro,
    choice="CHOICE",
    drop=[swissmetro["CHOICE"]==0],
    autoscale=True,
    autoscale_except=["ID", "ORIGIN", "DEST"],
    split=0.8,
)
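In the example above, drop receives a list of boolean masks, where True marks a row to remove (here, observations with CHOICE equal to 0). The following is a minimal pandas sketch of that assumed behaviour; apply_drop is a hypothetical helper, not part of pycmtensor, and the library's own handling (for instance whether it resets the index) may differ.

```python
import pandas as pd


def apply_drop(df: pd.DataFrame, masks: list) -> pd.DataFrame:
    """Hypothetical helper: remove every row flagged True by any boolean mask."""
    keep = pd.Series(True, index=df.index)
    for mask in masks:
        # A row survives only if no mask flags it for removal
        keep &= ~mask.reindex(df.index, fill_value=False)
    return df[keep].reset_index(drop=True)


df = pd.DataFrame({"CHOICE": [1, 0, 2, 3, 0], "TT": [10, 20, 30, 40, 50]})
clean = apply_drop(df, [df["CHOICE"] == 0])
print(len(clean))  # 3
```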
property x
property y
property all
property n_train_samples
property n_valid_samples
property train_data
property valid_data
split_db(split_frac: float)

Splits the data into training and validation sets.

Parameters:

split_frac (float) – split fraction, a value between 0.0 and 1.0
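A fractional train/validation split can be sketched with plain pandas as follows. This assumes split_frac is the share of rows assigned to the training set (as split=0.8 in the class example suggests) and that rows are sampled randomly; split_train_valid and its seed parameter are hypothetical and do not describe how split_db is actually implemented.

```python
import pandas as pd


def split_train_valid(df: pd.DataFrame, split_frac: float, seed: int = 0):
    """Hypothetical helper: split df into train and valid sets.

    Assumes split_frac is the fraction of rows assigned to the training set.
    """
    if not 0.0 <= split_frac <= 1.0:
        raise ValueError("split_frac must be between 0.0 and 1.0")
    # Randomly sample the training rows; the remaining rows form the validation set
    train = df.sample(frac=split_frac, random_state=seed)
    valid = df.drop(train.index)
    return train, valid


df = pd.DataFrame({"x": range(10)})
train, valid = split_train_valid(df, 0.8)
print(len(train), len(valid))  # 8 2
```

Fixing random_state makes the split reproducible across runs, which matters when comparing model estimates on the same validation set.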

get_nrows() → int

Returns the length (number of rows) of the DataFrame object.

get_train_data(tensors, index=None, batch_size=None, shift=None)

Alias for getting a slice of the training data from self.pandas.inputs().

See PandasDataFrame.inputs() for details.

get_valid_data(tensors, index=None, batch_size=None, shift=None)

Alias for getting a slice of the validation data from self.pandas.inputs().

See PandasDataFrame.inputs() for details.
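The exact meaning of index, batch_size and shift is documented in PandasDataFrame.inputs(), not here. As a rough illustration only, the sketch below shows one common mini-batch convention: with no index or batch_size the whole set is returned, otherwise batch number index covers batch_size rows, offset by shift rows. batch_bounds is a hypothetical helper and this convention is an assumption, not confirmed pycmtensor behaviour.

```python
def batch_bounds(n_rows: int, index=None, batch_size=None, shift=None):
    """Hypothetical helper: return (start, stop) row bounds of a mini-batch.

    Assumed convention: without index or batch_size, use all rows; otherwise
    batch `index` spans batch_size rows, offset by `shift` rows.
    """
    if index is None or batch_size is None:
        return 0, n_rows
    start = index * batch_size + (shift or 0)
    # Clip the last batch so it never runs past the end of the data
    stop = min(start + batch_size, n_rows)
    return start, stop


print(batch_bounds(100))                          # (0, 100)
print(batch_bounds(100, index=2, batch_size=32))  # (64, 96)
print(batch_bounds(100, index=3, batch_size=32))  # (96, 100)
```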

scale_data(**kwargs)

Scales the values of each column key by dividing them by scale (data / scale), for every key=scale keyword argument.

Parameters:

**kwargs – {key: scale} keyword arguments
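The data / scale rule can be illustrated with plain pandas. scale_columns below is a hypothetical stand-in, not the pycmtensor method; unlike scale_data, which presumably modifies the object's own data, it returns a scaled copy and leaves the input untouched. The column names are taken from the swissmetro dataset purely for illustration.

```python
import pandas as pd


def scale_columns(df: pd.DataFrame, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: divide each named column by its scale factor."""
    out = df.copy()
    for key, scale in kwargs.items():
        # data / scale, applied column by column
        out[key] = out[key] / scale
    return out


df = pd.DataFrame({"TRAIN_TT": [120.0, 60.0], "TRAIN_CO": [48.0, 24.0]})
scaled = scale_columns(df, TRAIN_TT=100.0, TRAIN_CO=100.0)
print(scaled["TRAIN_TT"].tolist())  # [1.2, 0.6]
```

Remember that estimated coefficients on a scaled column apply to the scaled units, so a coefficient for TRAIN_TT / 100 is 100 times the per-unit coefficient.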

autoscale_data(except_for=[None])

Autoscales variable values so that they lie within -10.0 < x < 10.0.

Parameters:

except_for (list[str]) – list of column labels to exclude from autoscaling
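One plausible way to bring values into -10.0 < x < 10.0 is to divide each column by a power of 10 until its largest absolute value drops below 10. The sketch below implements that idea for numeric columns; autoscale is a hypothetical helper, and the power-of-10 strategy is an assumption rather than the documented pycmtensor algorithm.

```python
import pandas as pd


def autoscale(df: pd.DataFrame, except_for=None):
    """Hypothetical sketch: scale numeric columns by powers of 10 into -10 < x < 10.

    Returns the scaled frame and the scale factor applied to each column.
    """
    except_for = set(except_for or [])
    out = df.copy()
    scales = {}
    for col in out.columns:
        if col in except_for:
            continue  # leave IDs and categorical codes untouched
        scale = 1.0
        # Multiply the divisor by 10 until every absolute value is below 10
        while out[col].abs().max() / scale >= 10.0:
            scale *= 10.0
        out[col] = out[col] / scale
        scales[col] = scale
    return out, scales


df = pd.DataFrame({"ID": [101, 102], "TRAIN_TT": [112.0, 45.0], "AGE": [3, 2]})
scaled, scales = autoscale(df, except_for=["ID"])
print(scales)  # {'TRAIN_TT': 100.0, 'AGE': 1.0}
```

Returning the scale factors alongside the data makes it possible to convert estimated coefficients back to the original units.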

info()

Outputs summary information about the Data object.