Convenience Functions¶
There are a handful of functions at your disposal for making quick interactive queries.
These can be helpful during the fact/test development process.
Unless noted otherwise, these can all be imported from the top-level greenlight module.
from greenlight import get_variables, get_percent_null, get_rec_count # etc...
average_runtime(product_code)¶
Returns the average (mean, non-weighted) duration for validation runs by product_code.
Duration is provided in seconds.
>>> average_runtime('ff.all') 24.60149371623993
get_variables(path_to_lib, dsn)¶
Returns a list of all the variables referenced by the table dsn in path_to_lib.
>>> get_variables('/wrds/ff/sasdata', 'factors_monthly') set(['mktrf', 'month', 'hml', 'umd', 'rf', 'dateff', 'year', 'date', 'smb'])
get_percent_null(variable, path_to_lib, dsn)¶
Returns percentage null by variable for dsn in path_to_lib.
>>> get_percent_null('PREV_DATE', '/wrds/zacks/sasdata', 'rating_detail') 15.474037693076328
get_rec_count(path_to_lib, dsn)¶
Returns total number of records for dsn in path_to_lib.
>>> get_rec_count('/wrds/bank/sasdata/holding', 'bhcf201706') 4673
get_keys(path_to_lib, dsn)¶
- Returns a list of keys obtained by querying the dictionary.columns table for columns where:
- sorted_by is not null
- sorted_by > 0
The empty list denotes that no keys could be found.
>>> get_keys('/wrds/zacks/sasdata', 'eps_cons_weekly') ['zid', 'est_type', 'reference_period', 'observation_date']
get_counts_by_keys(path_to_lib, dsn, keys=’NONE’)¶
- Produce counts by keys. The following measures are calculated:
- uniques_by_keys :: # of unique records (keyed by provided keys)
- duplicates_by_keys :: # of duplicate records (keyed by provided keys)
- missing_by_keys :: # of missing records (keyed by provided keys)
(Note: this will also return the keys used to produce these counts for reference.)
>>> get_counts_by_keys('/wrds/ff/sasdata', 'factors_daily') {'duplicates_by_keys': 0, 'keys': ['date'], 'missing_by_keys': 0, 'uniques_by_keys': 24265}You can specify custom keys by using the
keysargument:>>> get_counts_by_keys('/wrds/ff/sasdata', 'factors_daily', keys=['date', 'smb']) {'duplicates_by_keys': 0, 'keys': ['date', 'smb'], 'missing_by_keys': 0, 'uniques_by_keys': 24265}If you do not specify keys, appropriate keys will be obtained by querying the dictionary.columns table for columns where: * sorted_by is not null * sorted_by > 0
get_fact(path_to_lib, dsn, fact, where=‘1’)¶
A wrapper function to enable convenient retrieval of certain types of SAS facts.
The query can be thought of as:
libname current "<path_to_lib>";
proc sql;
select
<fact>
from current.<dsn>
where <where_condition>;
>>> get_fact('/wrds/ff/sasdata', 'factors_monthly', 'count(date)', where='smb = .')
0
get_null_variables(path_to_lib, dsn)¶
Determine which variables are entirely null (if any) for a given dataset.
This returns a list of all variables that are composed entirely of missing values.
If all variables contain at least some data, the empty list will be returned.
>>> get_null_variables('/wrds/ff/sasdata', 'factors_daily') []
get_encoding(path_to_lib, dsn)¶
Determine the encoding of the given dataset.
>>> get_encoding('/wrds/ff/sasdata', 'factors_daily') 'latin1 Western (ISO)'