Signature | Description |
---|---|
`enum class random_policy : unsigned char {`<br>`num_rows_with_seed = 1,  // Number of rows, with a seed specified`<br>`num_rows_no_seed = 2,  // Number of rows, with no seed specified`<br>`frac_rows_with_seed = 3,  // Fraction of rows, with a seed specified`<br>`frac_rows_no_seed = 4,  // Fraction of rows, with no seed specified`<br>`};` | Specification for calling get_[data\|view]_by_rand().<br>"Number of rows" means the n parameter is a positive integer specifying the number of rows to select.<br>"Fraction of rows" means the n parameter is a real number between 0 and 1 specifying the fraction of rows to select. |
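
The way `n` is interpreted under each policy can be sketched in standalone code. This is only an illustration, not the library's implementation: the helpers `uses_seed`, `is_fraction`, and `rows_to_sample` are hypothetical names, and the truncation of fractional results and the cap at the total row count are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the enum documented above so the sketch compiles on its own.
enum class random_policy : unsigned char {
    num_rows_with_seed = 1,
    num_rows_no_seed = 2,
    frac_rows_with_seed = 3,
    frac_rows_no_seed = 4,
};

// Hypothetical helper: does this policy expect the caller to pass a seed?
inline bool uses_seed(random_policy p) {
    return p == random_policy::num_rows_with_seed ||
           p == random_policy::frac_rows_with_seed;
}

// Hypothetical helper: is n a fraction of rows rather than a row count?
inline bool is_fraction(random_policy p) {
    return p == random_policy::frac_rows_with_seed ||
           p == random_policy::frac_rows_no_seed;
}

// Hypothetical helper: turn n into a number of rows to request.
// Assumptions: a fractional result is truncated toward zero, and a row
// count larger than the DataFrame is capped at total_rows.
inline std::size_t rows_to_sample(random_policy p, double n,
                                  std::size_t total_rows) {
    assert(n >= 0.0);
    if (is_fraction(p)) {
        assert(n <= 1.0);
        return static_cast<std::size_t>(n * static_cast<double>(total_rows));
    }
    const auto requested = static_cast<std::size_t>(n);
    return requested < total_rows ? requested : total_rows;
}
```

Under these assumptions, a fraction of 0.8 on an 11-row DataFrame requests 8 rows, and a count of 5 requests 5 rows regardless of the seed setting.
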
Signature | Description | Parameters |
---|---|---|
`template<typename ... Ts>`<br>`DataFrame<I>`<br>`get_data_by_rand(random_policy spec, double n, std::size_t seed = 0) const;` | Returns a new DataFrame (including the index and data columns) containing rows chosen by uniform random selection. The random_policy argument determines the behavior of the method.<br>NOTE: The actual number of rows returned might be smaller than requested, because the random process might produce the same number more than once.<br>NOTE: The columns in the result are not padded with NaN. | Ts: The list of types for all columns. Each type should be specified only once.<br>spec: The random_policy to apply; see random_policy in DataFrameTypes.h. It specifies how this function should proceed.<br>n: Depending on the random policy, either the number of rows to sample or the fraction of rows to sample. As a fraction, 0.4 for example means 40% of the rows.<br>seed: Depending on the random policy, the user can specify a seed. The same seed should always produce the same random selection. |
`template<typename ... Ts>`<br>`DataFramePtrView<I>`<br>`get_view_by_rand(random_policy spec, double n, std::size_t seed = 0);` | Behaves like get_data_by_rand(), but returns a view (DataFramePtrView) instead of a new DataFrame. A view is a DataFrame that references the original DataFrame, so anything you modify through the view is also modified in the original DataFrame.<br>NOTE: Certain operations are not available on a view; for example, you cannot add or delete columns.<br>NOTE: The columns in the result are not padded with NaN.<br>NOTE: Views cannot be const, because you can change the original data through views. | Ts: The list of types for all columns. Each type should be specified only once.<br>spec: The random_policy to apply; see random_policy in DataFrameTypes.h. It specifies how this function should proceed.<br>n: Depending on the random policy, either the number of rows to sample or the fraction of rows to sample. As a fraction, 0.4 for example means 40% of the rows.<br>seed: Depending on the random policy, the user can specify a seed. The same seed should always produce the same random selection. |
`template<typename ... Ts>`<br>`DataFrameConstPtrView<I>`<br>`get_view_by_rand(random_policy spec, double n, std::size_t seed = 0) const;` | Same as the view version above, but it returns a const view. You cannot change data through a const view. However, if the data is changed in the original DataFrame or through another view, the change is reflected in the const view. | Ts: The list of types for all columns. Each type should be specified only once.<br>spec: The random_policy to apply; see random_policy in DataFrameTypes.h. It specifies how this function should proceed.<br>n: Depending on the random policy, either the number of rows to sample or the fraction of rows to sample. As a fraction, 0.4 for example means 40% of the rows.<br>seed: Depending on the random policy, the user can specify a seed. The same seed should always produce the same random selection. |
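
The two notes about sampling (fewer rows than requested, and reproducibility with a seed) can be illustrated with a small standalone sketch. It assumes, as one plausible reading of the description, that row indices are drawn independently and that duplicate draws collapse into a single row. The function names are hypothetical and the choice of `std::mt19937` is an assumption; none of this is taken from the library's source.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: draw `count` row indices in [0, total_rows) with
// replacement and collapse duplicates. The result may therefore hold
// fewer than `count` indices.
inline std::vector<std::size_t>
sample_row_indices(std::size_t total_rows, std::size_t count,
                   std::mt19937 &gen) {
    assert(total_rows > 0);
    std::uniform_int_distribution<std::size_t> dist(0, total_rows - 1);
    std::set<std::size_t> unique_rows;

    for (std::size_t i = 0; i < count; ++i)
        unique_rows.insert(dist(gen));  // A repeated draw adds nothing
    return std::vector<std::size_t>(unique_rows.begin(), unique_rows.end());
}

// "With seed" flavor: the same seed reproduces the same selection
// (on the same standard library implementation).
inline std::vector<std::size_t>
sample_with_seed(std::size_t total_rows, std::size_t count,
                 std::size_t seed) {
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    return sample_row_indices(total_rows, count, gen);
}

// "No seed" flavor: seeded from std::random_device, so the selection
// generally differs from run to run.
inline std::vector<std::size_t>
sample_without_seed(std::size_t total_rows, std::size_t count) {
    std::random_device rd;
    std::mt19937 gen(rd());
    return sample_row_indices(total_rows, count, gen);
}
```

Calling `sample_with_seed(11, 8, 23)` twice yields identical index lists, while the list length is at most 8 and can be smaller when a row index is drawn more than once.
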

    // MyDataFrame is not defined in this excerpt. It is assumed to be an
    // alias such as StdDataFrame<unsigned long>, matching the unsigned long
    // index used below.

    static void test_get_data_by_rand()  {

        std::cout << "\nTesting get_data_by_rand() ..." << std::endl;

        std::vector<unsigned long>  idx =
            { 123450, 123451, 123452, 123453, 123454, 123455, 123456,
              123457, 123458, 123459, 123460 };
        std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
        std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
        std::vector<double> d3 =
            { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
        std::vector<double> d4 = { 22, 23, 24, 25, 26, 27 };
        std::vector<std::string> s1 =
            { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
        MyDataFrame         df;

        df.load_data(std::move(idx),
                     std::make_pair("col_1", d1),
                     std::make_pair("col_2", d2),
                     std::make_pair("col_3", d3),
                     std::make_pair("col_str", s1));
        df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

        // Sample 5 rows without a seed, then 80% of the rows with seed 23
        auto    result =
            df.get_data_by_rand<double, std::string>
                (random_policy::num_rows_no_seed, 5);
        auto    result2 =
            df.get_data_by_rand<double, std::string>
                (random_policy::frac_rows_with_seed, 0.8, 23);

        result2.write<std::ostream, double, std::string>(std::cout);

        // Edge case: a DataFrame with a single row
        std::vector<unsigned long>  idx2 = { 123450 };
        std::vector<double>         d12 = { 1 };
        std::vector<double>         d22 = { 8 };
        std::vector<double>         d32 = { 15 };
        std::vector<double>         d42 = { 22 };
        std::vector<std::string>    s12 = { "11" };
        MyDataFrame                 df2;

        df2.load_data(std::move(idx2),
                      std::make_pair("col_1", d12),
                      std::make_pair("col_2", d22),
                      std::make_pair("col_3", d32),
                      std::make_pair("col_str", s12));
        df2.load_column("col_4", std::move(d42),
                        nan_policy::dont_pad_with_nans);

        auto    result3 =
            df2.get_data_by_rand<double, std::string>
                (random_policy::num_rows_no_seed, 1);

        result3.write<std::ostream, double, std::string>(std::cout);

        // Edge case: a DataFrame with two rows
        std::vector<unsigned long>  idx3 = { 123450, 123451 };
        std::vector<double>         d13 = { 1, 2 };
        std::vector<double>         d23 = { 8, 9 };
        std::vector<double>         d33 = { 15, 16 };
        std::vector<double>         d43 = { 22, 23 };
        std::vector<std::string>    s13 = { "11", "22" };
        MyDataFrame                 df3;

        df3.load_data(std::move(idx3),
                      std::make_pair("col_1", d13),
                      std::make_pair("col_2", d23),
                      std::make_pair("col_3", d33),
                      std::make_pair("col_str", s13));
        df3.load_column("col_4", std::move(d43),
                        nan_policy::dont_pad_with_nans);

        auto    result4 =
            df3.get_data_by_rand<double, std::string>
                (random_policy::num_rows_no_seed, 1);

        result4.write<std::ostream, double, std::string>(std::cout);

        // Edge case: a DataFrame with three rows
        std::vector<unsigned long>  idx4 = { 123450, 123451, 123452 };
        std::vector<double>         d14 = { 1, 2, 3 };
        std::vector<double>         d24 = { 8, 9, 10 };
        std::vector<double>         d34 = { 15, 16, 17 };
        std::vector<double>         d44 = { 22, 23, 24 };
        std::vector<std::string>    s14 = { "11", "22", "33" };
        MyDataFrame                 df4;

        df4.load_data(std::move(idx4),
                      std::make_pair("col_1", d14),
                      std::make_pair("col_2", d24),
                      std::make_pair("col_3", d34),
                      std::make_pair("col_str", s14));
        df4.load_column("col_4", std::move(d44),
                        nan_policy::dont_pad_with_nans);

        auto    result5 =
            df4.get_data_by_rand<double, std::string>
                (random_policy::num_rows_no_seed, 1);

        result5.write<std::ostream, double, std::string>(std::cout);
    }

    // -----------------------------------------------------------------------------

    static void test_get_view_by_rand()  {

        std::cout << "\nTesting get_view_by_rand() ..." << std::endl;

        std::vector<unsigned long>  idx =
            { 123450, 123451, 123452, 123453, 123454, 123455, 123456,
              123457, 123458, 123459, 123460 };
        std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
        std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
        std::vector<double> d3 =
            { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
        std::vector<double> d4 = { 22, 23, 24, 25, 26, 27 };
        std::vector<std::string> s1 =
            { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
        MyDataFrame         df;

        df.load_data(std::move(idx),
                     std::make_pair("col_1", d1),
                     std::make_pair("col_2", d2),
                     std::make_pair("col_3", d3),
                     std::make_pair("col_str", s1));
        df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

        const MyDataFrame   &const_df = df;

        // Non-const DataFrame: mutable views
        auto    result =
            df.get_view_by_rand<double, std::string>
                (random_policy::num_rows_no_seed, 5);
        auto    result2 =
            df.get_view_by_rand<double, std::string>
                (random_policy::frac_rows_with_seed, 0.8, 23);

        // Const DataFrame: const views
        auto    const_result =
            const_df.get_view_by_rand<double, std::string>
                (random_policy::num_rows_no_seed, 5);
        auto    const_result2 =
            const_df.get_view_by_rand<double, std::string>
                (random_policy::frac_rows_with_seed, 0.8, 23);

        result2.write<std::ostream, double, std::string>(std::cout);
    }
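
The difference between the copy returned by get_data_by_rand() and the views returned by get_view_by_rand() can be pictured with a loose standalone analogy on a single column. The helpers below are hypothetical and use plain pointers into a `std::vector`; they are meant only to show the reference semantics described above, not how the library's view classes are actually built.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for get_data_by_rand(): the selected values are copied.
inline std::vector<double>
copy_rows(const std::vector<double> &column,
          const std::vector<std::size_t> &rows) {
    std::vector<double> out;
    for (std::size_t r : rows)
        out.push_back(column.at(r));
    return out;
}

// Hypothetical stand-in for the non-const view: each entry points into
// the original column, so writes go through to the original data.
inline std::vector<double *>
view_rows(std::vector<double> &column,
          const std::vector<std::size_t> &rows) {
    std::vector<double *> out;
    for (std::size_t r : rows)
        out.push_back(&column.at(r));
    return out;
}

// Hypothetical stand-in for the const view: read-only pointers that
// still observe later changes to the original column.
inline std::vector<const double *>
const_view_rows(const std::vector<double> &column,
                const std::vector<std::size_t> &rows) {
    std::vector<const double *> out;
    for (std::size_t r : rows)
        out.push_back(&column.at(r));
    return out;
}
```

In this analogy, writing through an entry of `view_rows()` changes the original column and is visible through `const_view_rows()`, while the result of `copy_rows()` stays as it was. The pointers remain valid only as long as the original column is not resized, which is one way to see why structural changes are restricted on views.
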