metrics.feature.discrete_feature module
- class DiscreteFeature(name: str, affinity: str, is_null: bool)
Bases: metrics.feature.feature.Feature
Representation of a discrete data feature.
- name
The name of the feature.
- affinity
The SQLite3 type affinity of the feature.
- Type
{“NUMERIC”, “INTEGER”, “REAL”, “TEXT”, “BLOB”}
- is_null
True if feature data can be null, False otherwise.
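To make the three documented attributes and the allowed affinity values concrete, here is a minimal sketch. The class name `DiscreteFeatureSketch` is hypothetical and is not the real `DiscreteFeature`. The validation of `affinity` against the five SQLite3 affinities is an assumption; this page does not say whether the real constructor checks it.

```python
from dataclasses import dataclass
from typing import ClassVar

# The five SQLite3 type affinities listed for the ``affinity`` attribute.
SQLITE_AFFINITIES = frozenset({"NUMERIC", "INTEGER", "REAL", "TEXT", "BLOB"})


@dataclass
class DiscreteFeatureSketch:
    """Hypothetical stand-in for the documented attributes, not the real class."""

    name: str
    affinity: str
    is_null: bool
    # Class-level constant, like the documented ``feature_type`` attribute.
    feature_type: ClassVar[str] = "discrete"

    def __post_init__(self) -> None:
        # Assumed check: reject affinities outside the documented set.
        if self.affinity not in SQLITE_AFFINITIES:
            raise ValueError(f"unknown SQLite3 affinity: {self.affinity!r}")
```

For example, `DiscreteFeatureSketch("tumor_stage", "TEXT", is_null=False)` builds a text-valued feature (the feature name is made up), while an affinity such as `"VARCHAR"` raises `ValueError`.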
- compare_feature_info(sample_data: pandas.core.frame.DataFrame, simulation_data: pandas.core.frame.DataFrame) -> float
Uses KL-divergence to compare discrete features.
- Parameters
sample_data – Loaded sample data.
simulation_data – Loaded tumor data.
- Returns
Results of the KL divergence, keyed by category.
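A per-category KL divergence between the sample and simulation distributions of one discrete column can be sketched as follows. Assumptions: the feature is stored in a single DataFrame column (`column` is a hypothetical parameter), the divergence is taken as D_KL(P_sample || Q_simulation), and zero frequencies are smoothed with a small `epsilon`. None of these details is confirmed by this page.

```python
import numpy as np
import pandas as pd


def kl_divergence_by_category(
    sample_data: pd.DataFrame,
    simulation_data: pd.DataFrame,
    column: str,
    epsilon: float = 1e-10,
) -> dict:
    """Hypothetical helper: per-category terms of D_KL(P_sample || Q_simulation)."""
    # Relative frequency of each category in each data set.
    p = sample_data[column].value_counts(normalize=True)
    q = simulation_data[column].value_counts(normalize=True)

    # Align on the union of categories; unseen categories get probability 0.
    categories = p.index.union(q.index)
    p = p.reindex(categories, fill_value=0.0)
    q = q.reindex(categories, fill_value=0.0)

    # Smooth zeros so log(p / q) stays finite, then renormalize.
    p = (p + epsilon) / (p + epsilon).sum()
    q = (q + epsilon) / (q + epsilon).sum()

    # Each term is p_i * log(p_i / q_i); their sum is the total divergence.
    terms = p * np.log(p / q)
    return {str(cat): float(val) for cat, val in terms.items()}
```

Summing the returned values gives a single scalar, which would fit the documented `float` return type, while the dict fits the "keyed by category" wording; this page does not settle which form the real method returns.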
- compare_feature_stat(sample_data: pandas.core.frame.DataFrame, simulation_data: pandas.core.frame.DataFrame) -> Union[Dict[str, Any], float]
Uses statistical tests to compare discrete features.
Uses a hypergeometric test to compare a discrete feature between the sample and true distributions. The hypergeometric distribution describes the probability of k successes in N draws, without replacement, from a finite population of size M that contains exactly n successes (objects with the property of interest).
- Parameters
sample_data – Loaded sample data.
simulation_data – Loaded tumor data.
- Returns
Results of the statistical tests, keyed by category.
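The test described above can be sketched with the standard library, using the same letters: M is the population size (here assumed to be the simulation data), n is the number of population entries in a category, N is the sample size, and k is the number of sample entries in that category. Assumptions: the sample is treated as drawn from the simulated population, the p-value is the one-sided upper tail P(X >= k), and plain lists stand in for DataFrame columns. `hypergeometric_by_category` is a hypothetical name.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """Upper tail P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: population size, n: successes in the population,
    N: number of draws without replacement.
    """
    # pmf(i) = C(n, i) * C(M - n, N - i) / C(M, N); math.comb returns 0 when
    # N - i > M - n, so impossible outcomes contribute nothing.
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def hypergeometric_by_category(sample: list, population: list) -> dict:
    """Hypothetical helper: one upper-tail p-value per category."""
    M, N = len(population), len(sample)
    if N > M:
        raise ValueError("sample cannot be larger than the population")
    results = {}
    for category in set(population) | set(sample):
        n = population.count(category)  # category size in the population
        k = sample.count(category)  # observed successes in the sample
        results[category] = hypergeom_sf(k, M, n, N)
    return results
```

With SciPy available, `scipy.stats.hypergeom(M, n, N).sf(k - 1)` computes the same upper tail; the hand-written version only avoids the extra dependency.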
- feature_type = 'discrete'
Type of the feature.
- Type
string
- static get_count(data: list, category: str) -> int
Returns the number of categories of the feature.
- Parameters
data – Loaded data.
category – Category of the data.
- Returns
Number of categories of the feature.
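The signature takes a single `category` string, which suggests that `get_count` counts how often that category occurs in `data`, whereas the description speaks of the number of categories. Assuming the first reading, a minimal sketch (the name `count_category` is hypothetical):

```python
def count_category(data: list, category: str) -> int:
    """Hypothetical helper: how many entries of ``data`` equal ``category``."""
    # list.count returns 0 when the category never occurs.
    return data.count(category)
```

For `data = ["I", "II", "I", "I"]`, `count_category(data, "I")` returns 3, while the number of distinct categories, `len(set(data))`, is 2; the two readings give different answers.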
- write_feature_data(data_list: list, sample_data: pandas.core.frame.DataFrame, simulation_data: pandas.core.frame.DataFrame) -> List[Any]
Uses KL divergence to compare discrete features.
- Parameters
data_list – List of data in the analysis table.
sample_data – Loaded sample data.
simulation_data – Loaded tumor data.
- Returns
List of data needed for the analysis DataFrame.