Signature Description

enum class random_policy : unsigned char  {
    num_rows_with_seed = 1,  // Number of rows, with a seed specified
    num_rows_no_seed = 2,    // Number of rows, with no seed specified
    frac_rows_with_seed = 3, // Fraction of rows, with a seed specified
    frac_rows_no_seed = 4,   // Fraction of rows, with no seed specified
};
Specification for calling get_[data|view]_by_rand().
"Number of rows" means the n parameter is a positive integer specifying
the number of rows to select.
"Fraction of rows" means the n parameter is a real number in the range [0, 1]
specifying the fraction of rows to select.
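
The way the policy changes the meaning of n can be sketched as follows. This is only an illustration: rows_requested() and uses_seed() are hypothetical helpers, not part of the library, the enum is re-declared locally so the sketch compiles without the DataFrame headers, and truncating the fraction toward zero is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Local copy of the enum shown above, so the sketch stands alone.
enum class random_policy : unsigned char {
    num_rows_with_seed = 1,
    num_rows_no_seed = 2,
    frac_rows_with_seed = 3,
    frac_rows_no_seed = 4,
};

// Hypothetical helper: true if the policy reads the seed argument.
inline bool uses_seed(random_policy spec) {
    return spec == random_policy::num_rows_with_seed ||
           spec == random_policy::frac_rows_with_seed;
}

// Hypothetical helper: convert n into a requested row count.
// Assumption: a fraction is multiplied by the total and truncated toward zero.
inline std::size_t
rows_requested(random_policy spec, double n, std::size_t total_rows) {
    const bool is_fraction =
        spec == random_policy::frac_rows_with_seed ||
        spec == random_policy::frac_rows_no_seed;

    if (is_fraction) {
        assert(n >= 0.0 && n <= 1.0);  // a fraction must lie in [0, 1]
        return static_cast<std::size_t>(n * static_cast<double>(total_rows));
    }
    assert(n >= 0.0);  // a row count cannot be negative
    return static_cast<std::size_t>(n);
}
```

Under these assumptions, a fraction of 0.5 on a 10-row DataFrame requests 5 rows, while a row-count policy passes n through unchanged.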

Signature Description Parameters

template<typename ... Ts>
DataFrame<I>
get_data_by_rand(random_policy spec,
                 double n,
                 std::size_t seed = 0) const;
        
It returns a new DataFrame (including the index and data columns) containing rows selected uniformly at random from this DataFrame. random_policy determines the behavior of the method.
NOTE: The actual number of rows returned might be smaller than requested, because the random process might produce the same row number more than once.
NOTE: The columns in the result are not padded with NaN.
Ts: The list of types for all columns. A type should be specified only once.
random_policy: Please see random_policy in DataFrameTypes.h. It specifies how this function should proceed.
n: Depending on the random policy, it is either the number of rows to sample or a fraction of rows to sample. As a fraction, 0.4 for example means 40% of the rows.
seed: Depending on the random policy, the user can specify a seed. The same seed should always produce the same random selection.
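
The first NOTE above and the seed behavior can be illustrated with the standard library alone. The sketch below is not the library's implementation: draw_row_numbers() is a hypothetical helper, and the choice of std::mt19937_64 and of sorting before removing duplicates are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw `count` row numbers uniformly from
// [0, total_rows) using a seeded generator, then drop repeated draws.
inline std::vector<std::size_t>
draw_row_numbers(std::size_t count, std::size_t total_rows, std::size_t seed) {
    assert(total_rows > 0);

    std::mt19937_64 gen(seed);  // same seed, same sequence of draws
    std::uniform_int_distribution<std::size_t> dist(0, total_rows - 1);

    std::vector<std::size_t> rows;
    rows.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        rows.push_back(dist(gen));  // the same row may be drawn twice

    // Keep each row number once; this is why fewer rows than
    // requested may come back.
    std::sort(rows.begin(), rows.end());
    rows.erase(std::unique(rows.begin(), rows.end()), rows.end());
    return rows;
}
```

Calling the helper twice with the same seed yields the same row numbers, and the result never has more than `count` entries but may have fewer.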

template<typename ... Ts>
DataFramePtrView<I>
get_view_by_rand(random_policy spec,
                 double n,
                 std::size_t seed = 0);
        
It behaves like get_data_by_rand(), but it returns a view (DataFramePtrView). A view is a DataFrame that refers to the original DataFrame, so if you modify anything in the view, the original DataFrame is also modified.
NOTE: There are certain operations that you cannot do with a view. For example, you cannot add or delete columns.
NOTE: The columns in the result are not padded with NaN.
NOTE: This method is not const, because the original data can be changed through the view it returns.
Ts: The list of types for all columns. A type should be specified only once.
random_policy: Please see random_policy in DataFrameTypes.h. It specifies how this function should proceed.
n: Depending on the random policy, it is either the number of rows to sample or a fraction of rows to sample. As a fraction, 0.4 for example means 40% of the rows.
seed: Depending on the random policy, the user can specify a seed. The same seed should always produce the same random selection.
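
The write-through behavior of a view can be pictured with plain pointers. This is a minimal sketch under the assumption that a view refers to the selected rows of the original storage; make_pointer_view() is a hypothetical helper for a single column and does not reflect how DataFramePtrView is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a view over selected rows of one column:
// each entry points at an element of the original column.
inline std::vector<double *>
make_pointer_view(std::vector<double> &column,
                  const std::vector<std::size_t> &rows) {
    std::vector<double *> view;
    view.reserve(rows.size());
    for (std::size_t r : rows) {
        assert(r < column.size());   // selected rows must exist
        view.push_back(&column[r]);  // no copy, only a reference
    }
    return view;
}
```

Assigning through an entry of the returned vector, for example `*view[0] = 20.0`, changes the corresponding element of the original column. The pointers stay valid only as long as the original column is not resized, which is in keeping with the restriction on structural changes noted above.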

template<typename ... Ts>
DataFrameConstPtrView<I>
get_view_by_rand(random_policy spec,
                 double n,
                 std::size_t seed = 0) const;
        
Same as the view above, but it returns a const view. You cannot change data through a const view. However, if the data is changed in the original DataFrame or through another view, the change is reflected in the const view.
Ts: The list of types for all columns. A type should be specified only once.
random_policy: Please see random_policy in DataFrameTypes.h. It specifies how this function should proceed.
n: Depending on the random policy, it is either the number of rows to sample or a fraction of rows to sample. As a fraction, 0.4 for example means 40% of the rows.
seed: Depending on the random policy, the user can specify a seed. The same seed should always produce the same random selection.
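
The const-view behavior described above, read-only access that still tracks changes to the original, can be sketched with pointers to const. As before, this is an illustration only: make_const_pointer_view() is a hypothetical single-column helper, not the mechanism behind DataFrameConstPtrView.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a const view over selected rows of one column:
// the entries can be read but not written through.
inline std::vector<const double *>
make_const_pointer_view(const std::vector<double> &column,
                        const std::vector<std::size_t> &rows) {
    std::vector<const double *> view;
    view.reserve(rows.size());
    for (std::size_t r : rows) {
        assert(r < column.size());   // selected rows must exist
        view.push_back(&column[r]);  // refers to the original element
    }
    return view;
}
```

An assignment such as `*view[0] = 1.0` does not compile here, yet a later write to the original column is visible when the same entry is read again.
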
static void test_get_data_by_rand()  {

    std::cout << "\nTesting get_data_by_rand() ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460 };
    std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    std::vector<double> d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
    std::vector<double> d4 = { 22, 23, 24, 25, 26, 27 };
    std::vector<std::string> s1 = { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
    MyDataFrame         df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_str", s1));
    df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

    auto    result =
        df.get_data_by_rand<double, std::string>(random_policy::num_rows_no_seed, 5);
    auto    result2 =
        df.get_data_by_rand<double, std::string>(random_policy::frac_rows_with_seed, 0.8, 23);

    result2.write<std::ostream, double, std::string>(std::cout);

    std::vector<unsigned long>  idx2 = { 123450 };
    std::vector<double> d12 = { 1 };
    std::vector<double> d22 = { 8 };
    std::vector<double> d32 = { 15 };
    std::vector<double> d42 = { 22 };
    std::vector<std::string> s12 = { "11" };
    MyDataFrame         df2;

    df2.load_data(std::move(idx2),
                  std::make_pair("col_1", d12),
                  std::make_pair("col_2", d22),
                  std::make_pair("col_3", d32),
                  std::make_pair("col_str", s12));
    df2.load_column("col_4", std::move(d42), nan_policy::dont_pad_with_nans);

    auto    result3 =
        df2.get_data_by_rand<double, std::string>(random_policy::num_rows_no_seed, 1);

    result3.write<std::ostream, double, std::string>(std::cout);

    std::vector<unsigned long>  idx3 = { 123450, 123451 };
    std::vector<double> d13 = { 1, 2 };
    std::vector<double> d23 = { 8, 9 };
    std::vector<double> d33 = { 15, 16 };
    std::vector<double> d43 = { 22, 23 };
    std::vector<std::string> s13 = { "11", "22" };
    MyDataFrame         df3;

    df3.load_data(std::move(idx3),
                  std::make_pair("col_1", d13),
                  std::make_pair("col_2", d23),
                  std::make_pair("col_3", d33),
                  std::make_pair("col_str", s13));
    df3.load_column("col_4", std::move(d43), nan_policy::dont_pad_with_nans);

    auto    result4 =
        df3.get_data_by_rand<double, std::string>(random_policy::num_rows_no_seed, 1);

    result4.write<std::ostream, double, std::string>(std::cout);

    std::vector<unsigned long>  idx4 = { 123450, 123451, 123452 };
    std::vector<double> d14 = { 1, 2, 3 };
    std::vector<double> d24 = { 8, 9, 10 };
    std::vector<double> d34 = { 15, 16, 17 };
    std::vector<double> d44 = { 22, 23, 24 };
    std::vector<std::string> s14 = { "11", "22", "33" };
    MyDataFrame         df4;

    df4.load_data(std::move(idx4),
                  std::make_pair("col_1", d14),
                  std::make_pair("col_2", d24),
                  std::make_pair("col_3", d34),
                  std::make_pair("col_str", s14));
    df4.load_column("col_4", std::move(d44), nan_policy::dont_pad_with_nans);

    auto    result5 =
        df4.get_data_by_rand<double, std::string>(random_policy::num_rows_no_seed, 1);

    result5.write<std::ostream, double, std::string>(std::cout);
}

// -----------------------------------------------------------------------------

static void test_get_view_by_rand()  {

    std::cout << "\nTesting get_view_by_rand() ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460 };
    std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    std::vector<double> d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
    std::vector<double> d4 = { 22, 23, 24, 25, 26, 27 };
    std::vector<std::string> s1 = { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
    MyDataFrame         df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_str", s1));
    df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

    const MyDataFrame   &const_df = df;
    auto    result =
        df.get_view_by_rand<double, std::string>(random_policy::num_rows_no_seed, 5);
    auto    result2 =
        df.get_view_by_rand<double, std::string>(random_policy::frac_rows_with_seed, 0.8, 23);
    auto    const_result =
        const_df.get_view_by_rand<double, std::string>(random_policy::num_rows_no_seed, 5);
    auto    const_result2 =
        const_df.get_view_by_rand<double, std::string>(random_policy::frac_rows_with_seed, 0.8, 23);

    result2.write<std::ostream, double, std::string>(std::cout);
}