Signature | Description |
---|---|
`enum class random_policy : unsigned char { num_rows_with_seed = 1, /* Number of rows, with a seed */ num_rows_no_seed = 2, /* Number of rows, no seed */ frac_rows_with_seed = 3, /* Fraction of rows, with a seed */ frac_rows_no_seed = 4, /* Fraction of rows, no seed */ };` | Specifies how get_data_by_rand() and get_view_by_rand() interpret their arguments. "Number of rows" means the n parameter is a positive integer giving the number of rows to select. "Fraction of rows" means n is a positive real number in (0, 1] giving the fraction of rows to select. With the *_with_seed policies the caller supplies a seed; with the *_no_seed policies no seed is specified. |
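
To make the two interpretations of n concrete, here is a minimal sketch of how a policy and n could be turned into a requested row count. It is not the library's code: requested_rows() and uses_seed() are hypothetical helpers, the enum is a local copy of the one above so the snippet compiles on its own, and truncating n * num_rows toward zero is an assumption about rounding.

```cpp
#include <cstddef>
#include <stdexcept>

// Local copy of the policy values (the real enum lives in DataFrameTypes.h).
enum class random_policy : unsigned char {
    num_rows_with_seed = 1,
    num_rows_no_seed = 2,
    frac_rows_with_seed = 3,
    frac_rows_no_seed = 4,
};

// Hypothetical helper: number of random draws requested for a frame with
// num_rows rows. Rejects values outside the documented ranges; the
// library's own handling of such values is not described here.
std::size_t requested_rows(random_policy spec, double n, std::size_t num_rows) {
    const bool is_fraction = spec == random_policy::frac_rows_with_seed ||
                             spec == random_policy::frac_rows_no_seed;

    if (is_fraction) {
        if (n <= 0.0 || n > 1.0)
            throw std::invalid_argument("fraction must be in (0, 1]");
        // Assumption: the fraction of rows is truncated toward zero.
        return static_cast<std::size_t>(n * static_cast<double>(num_rows));
    }
    if (n < 1.0)
        throw std::invalid_argument("number of rows must be a positive integer");
    return static_cast<std::size_t>(n);
}

// Hypothetical helper: true when the policy uses the caller's seed.
bool uses_seed(random_policy spec) {
    return spec == random_policy::num_rows_with_seed ||
           spec == random_policy::frac_rows_with_seed;
}
```

Under this truncation assumption, frac_rows_with_seed with n = 0.8 on an 11-row frame requests 8 draws, while num_rows_no_seed with n = 5 requests 5.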
Signature | Description | Parameters |
---|---|---|
`template<typename ... Ts> DataFrame<I> get_data_by_rand(random_policy spec, double n, std::size_t seed = 0) const;` | Returns a DataFrame (including the index and data columns) containing the data from a uniform random selection of rows. The random_policy determines the behavior of the method. NOTE: The actual number of rows returned might be smaller than requested, because the random process might produce the same row number more than once. NOTE: The columns in the result are not padded with NaN. | Ts: The list of types for all columns. A type should be specified only once. spec: The random_policy (see random_policy in DataFrameTypes.h). It specifies how this function should proceed. n: Depending on the random policy, either the number of rows to sample or the fraction of rows to sample. For a fraction, 0.4 means 40% of the rows. seed: Depending on the random policy, the user can specify a seed. The same seed should always produce the same random selection. |
`template<typename ... Ts> DataFramePtrView<I> get_view_by_rand(random_policy spec, double n, std::size_t seed = 0);` | Behaves like get_data_by_rand(), but returns a view (DataFramePtrView). A view is a DataFrame that refers to the original DataFrame, so if you modify anything in the view, the original DataFrame is modified as well. NOTE: Some operations are not allowed on a view; for example, you cannot add or delete columns. NOTE: The columns in the result are not padded with NaN. NOTE: Views cannot be const, because the original data can be changed through them. | Ts: The list of types for all columns. A type should be specified only once. spec: The random_policy (see random_policy in DataFrameTypes.h). It specifies how this function should proceed. n: Depending on the random policy, either the number of rows to sample or the fraction of rows to sample. For a fraction, 0.4 means 40% of the rows. seed: Depending on the random policy, the user can specify a seed. The same seed should always produce the same random selection. |
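
The note that fewer rows than requested may be returned follows from drawing row positions with replacement and then discarding repeats. The sketch below illustrates that mechanism under stated assumptions; it is not the library's implementation. The function sample_rows() is hypothetical, and the choice of std::mt19937_64 with std::uniform_int_distribution (seeded from std::random_device when no seed is used) is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: draw `count` row positions uniformly from
// [0, num_rows) with replacement, then sort and drop duplicates.
// Repeated draws collapse, so the result may hold fewer than `count` rows.
// With use_seed == true the same seed reproduces the same selection.
std::vector<std::size_t> sample_rows(std::size_t num_rows, std::size_t count,
                                     bool use_seed, std::size_t seed = 0) {
    std::vector<std::size_t> rows;
    if (num_rows == 0 || count == 0)
        return rows;

    // Fixed seed for reproducible draws; otherwise a nondeterministic one.
    std::mt19937_64 gen(use_seed
                            ? static_cast<std::mt19937_64::result_type>(seed)
                            : std::random_device{}());
    std::uniform_int_distribution<std::size_t> dist(0, num_rows - 1);

    rows.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        rows.push_back(dist(gen));

    // Keep each selected row once, in ascending row order.
    std::sort(rows.begin(), rows.end());
    rows.erase(std::unique(rows.begin(), rows.end()), rows.end());
    return rows;
}
```

Calling sample_rows(11, 8, true, 23) twice returns the same sorted, duplicate-free positions, and their count can be anywhere from 1 to 8 depending on how many draws repeat.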
```cpp
#include <DataFrame/DataFrame.h>  // DataFrame library header

#include <cassert>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

using namespace hmdf;

// Alias as used in the library's test programs (assumed here)
using MyDataFrame = StdDataFrame<unsigned long>;

static void test_get_data_by_rand()  {

    std::cout << "\nTesting get_data_by_rand() ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456,
          123457, 123458, 123459, 123460 };
    std::vector<double>         d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    std::vector<double>         d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    std::vector<double>         d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
    std::vector<double>         d4 = { 22, 23, 24, 25, 26, 27 };
    std::vector<std::string>    s1 =
        { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
    MyDataFrame                 df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_str", s1));
    // col_4 is shorter than the index and is deliberately not padded
    df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

    // Unseeded: the selection varies between runs, so it is not checked
    auto    result =
        df.get_data_by_rand<double, std::string>(random_policy::num_rows_no_seed, 5);
    // Seeded: seed 23 always selects the same rows
    auto    result2 =
        df.get_data_by_rand<double, std::string>(random_policy::frac_rows_with_seed, 0.8, 23);

    // With seed 23, six distinct rows are selected
    assert(result2.get_index().size() == 6);
    assert(result2.get_column<double>("col_1").size() == 6);
    // col_4 has only 6 values, so only one selected row falls within it
    assert(result2.get_column<double>("col_4").size() == 1);
    assert(result2.get_column<std::string>("col_str").size() == 6);
    assert(result2.get_column<double>("col_4")[0] == 25.0);
    assert(result2.get_column<double>("col_3")[4] == 24.0);
    assert(result2.get_column<double>("col_1")[5] == 11.0);
    assert(result2.get_column<std::string>("col_str")[4] == "ii");
}

// -----------------------------------------------------------------------------

static void test_get_view_by_rand()  {

    std::cout << "\nTesting get_view_by_rand() ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456,
          123457, 123458, 123459, 123460 };
    std::vector<double>         d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    std::vector<double>         d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    std::vector<double>         d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
    std::vector<double>         d4 = { 22, 23, 24, 25, 26, 27 };
    std::vector<std::string>    s1 =
        { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
    MyDataFrame                 df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_str", s1));
    df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

    auto    result =
        df.get_view_by_rand<double, std::string>(random_policy::num_rows_no_seed, 5);
    auto    result2 =
        df.get_view_by_rand<double, std::string>(random_policy::frac_rows_with_seed, 0.8, 23);

    assert(result2.get_index().size() == 6);
    assert(result2.get_column<double>("col_1").size() == 6);
    assert(result2.get_column<double>("col_4").size() == 1);
    assert(result2.get_column<std::string>("col_str").size() == 6);
    assert(result2.get_column<double>("col_4")[0] == 25.0);
    assert(result2.get_column<double>("col_3")[4] == 24.0);
    assert(result2.get_column<double>("col_1")[5] == 11.0);
    assert(result2.get_column<std::string>("col_str")[4] == "ii");

    // Writing through the view modifies the original DataFrame:
    // row 4 of the view is row 9 of df
    result2.get_column<std::string>("col_str")[4] = "TEST";
    assert(result2.get_column<std::string>("col_str")[4] == "TEST");
    assert(result2.get_column<std::string>("col_str")[4] ==
           df.get_column<std::string>("col_str")[9]);
}
```
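
The assertions in test_get_data_by_rand() and test_get_view_by_rand() rely on two behaviors: a column shorter than the index is not padded with NaN, so selected rows past its end are simply skipped, and a view writes through to the original data. The sketch below models both with plain std::vector; copy_rows() and view_rows() are hypothetical helpers, not library API. The row set 3, 6, 7, 8, 9, 10 is inferred from the assertions (col_4 keeps only row 3, and view row 4 maps to df row 9), not produced by the library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Copy semantics without NaN padding: a selected row that lies past the
// end of a shorter column is skipped rather than filled with NaN.
template<typename T>
std::vector<T> copy_rows(const std::vector<T> &col,
                         const std::vector<std::size_t> &rows) {
    std::vector<T> out;
    for (std::size_t r : rows)
        if (r < col.size())
            out.push_back(col[r]);
    return out;
}

// View semantics: store pointers into the original column, so assigning
// through the view changes the original data. In this sketch the pointers
// become invalid if the original column is resized or reallocated.
template<typename T>
std::vector<T *> view_rows(std::vector<T> &col,
                           const std::vector<std::size_t> &rows) {
    std::vector<T *> out;
    for (std::size_t r : rows)
        if (r < col.size())
            out.push_back(&col[r]);
    return out;
}
```

With rows {3, 6, 7, 8, 9, 10} and the six-value col_4 from the tests, copy_rows() yields the single value 25, and assigning "TEST" through element 4 of view_rows() on col_str changes element 9 of the original column.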