metrics.feature.continuous_feature module

class ContinuousFeature(name: str, affinity: str, is_null: bool)[source]

Bases: metrics.feature.feature.Feature

Representation of a continuous data feature.

name

The name of the feature.

affinity

The SQLite3 type affinity of the feature.

Type

{“NUMERIC”, “INTEGER”, “REAL”, “TEXT”, “BLOB”}

is_null

True if feature data can be null, False otherwise.

compare_feature_info(sample_data: pandas.core.frame.DataFrame, simulation_data: pandas.core.frame.DataFrame) float[source]

Uses Kullback-Leibler divergence (KL divergence) to compare continuous features.

Parameters
  • sample_data – Loaded sample data.

  • simulation_data – Loaded tumor data.

Returns

KL divergence between the sample and tumor distributions of the feature.
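A minimal sketch of how a KL divergence between two continuous samples can be estimated, for illustration only. It is not this module's implementation. The helper names (histogram_pdf, kl_divergence), the bin count, the smoothing constant eps, and the direction of the divergence (sample relative to simulation) are all assumptions. It uses only the standard library, so it runs without NumPy or pandas.

```python
import math


def histogram_pdf(values, lo, hi, n_bins):
    """Estimate a discrete probability distribution over n_bins equal-width bins.

    Hypothetical helper: assumes hi > lo and every value lies in [lo, hi].
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Values equal to hi are clamped into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p, q, eps=1e-10):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i).

    eps is an assumed smoothing constant that keeps empty bins from
    producing log(0) or a division by zero.
    """
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Illustrative values only; real inputs would come from the loaded dataframes.
sample = [1.0, 1.2, 1.4, 2.0, 3.0]
simulation = [2.0, 2.5, 3.0, 3.2, 3.5]

# Both distributions must be binned on the same grid to be comparable.
lo = min(sample + simulation)
hi = max(sample + simulation)
p = histogram_pdf(sample, lo, hi, n_bins=5)
q = histogram_pdf(simulation, lo, hi, n_bins=5)

divergence = kl_divergence(p, q)
```

KL divergence is not symmetric, so swapping the two arguments generally gives a different value. It is zero only when the two binned distributions match.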

compare_feature_stat(sample_data: pandas.core.frame.DataFrame, simulation_data: pandas.core.frame.DataFrame) float[source]

Uses statistical tests to compare continuous features.

Uses the Kolmogorov-Smirnov test to compare a continuous feature between the sample and tumor distributions. The Kolmogorov-Smirnov statistic quantifies the distance between the empirical distribution functions of the two data sets.

Parameters
  • sample_data – Loaded sample data.

  • simulation_data – Loaded tumor data.

Returns

Result of statistical test.
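The two-sample Kolmogorov-Smirnov statistic described above can be sketched as the largest absolute gap between the two empirical distribution functions. The function name ks_statistic is hypothetical and this is not the module's own code. It computes only the statistic, not a p-value, and the documentation does not say which of the two the method returns.

```python
from bisect import bisect_right


def ks_statistic(sample, simulation):
    """Two-sample KS statistic: max over x of |F_sample(x) - F_simulation(x)|.

    Hypothetical helper; both inputs must be non-empty sequences of numbers.
    """
    a = sorted(sample)
    b = sorted(simulation)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs
    # at one of the observed data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Illustrative values only.
identical = ks_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
overlapping = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

The statistic ranges from 0 for identical empirical distributions to 1 for samples that do not overlap at all.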

feature_type = 'continuous'

Type of the feature.

Type

string

get_pdfs(sample_data: list[float], reference_data: list[float]) Union[Tuple[numpy.ndarray, numpy.ndarray], Tuple[float, float]][source]

Parameters
  • reference_data – Distribution of the feature in the simulation data.

  • sample_data – Distribution of the feature in the sample data.

Returns

Probability density functions of the feature in the sample data and the reference data.

is_valid_feature_name(simulation_data: pandas.core.frame.DataFrame, sample_data: pandas.core.frame.DataFrame) bool[source]

Parameters
  • simulation_data – Loaded tumor data.

  • sample_data – Loaded sample data.

Returns

True if the feature name is valid and the dataframes contain data, False otherwise.

write_feature_data(data_list: list, sample_data: pandas.core.frame.DataFrame, simulation_data: pandas.core.frame.DataFrame) List[Any][source]

Writes feature data into the list of data.

Parameters
  • data_list – List of data in analysis table.

  • sample_data – Loaded sample data.

  • simulation_data – Loaded tumor data.

Returns

List of data needed for analysis dataframe.