Signature Description

enum class random_policy : unsigned char  {
    num_rows_with_seed = 1,  // Number of rows, with a user-specified seed
    num_rows_no_seed = 2,    // Number of rows, no seed specified
    frac_rows_with_seed = 3, // Fraction of rows, with a user-specified seed
    frac_rows_no_seed = 4,   // Fraction of rows, no seed specified
};
Specification for calling get_[data|view]_by_rand():
Number of rows means the n parameter is a positive integer specifying
the number of rows to select.
Fraction of rows means the n parameter is a real number in [0, 1]
specifying the fraction of rows to select.

Signature Description Parameters

template<typename ... Ts>
DataFrame<I>
get_data_by_rand(random_policy spec,
                 double n,
                 std::size_t seed = 0) const;
It returns a DataFrame (including the index and data columns) containing the rows chosen by uniform random selection. The spec parameter (a random_policy value) determines how n and seed are interpreted.
NOTE: The actual number of rows returned might be smaller than requested, because the random process might select the same row more than once.
NOTE: The columns in the result are not padded with NaN.
Ts: The list of types for all columns. A type should be specified only once.
spec: A random_policy value (see random_policy in DataFrameTypes.h). It specifies how this function interprets n and whether a seed is used.
n: Depending on the random policy, either the number of rows or the fraction of rows to sample. For a fraction, 0.4, for example, means 40% of the rows.
seed: The seed for the *_with_seed policies; the *_no_seed policies do not take a user-specified seed. The same seed should always produce the same random selection.

template<typename ... Ts>
DataFramePtrView<I>
get_view_by_rand(random_policy spec,
                 double n,
                 std::size_t seed = 0);
It behaves like get_data_by_rand(), but it returns a DataFrameView. A view is a DataFrame that refers to the data of the original DataFrame, so if you modify anything in the view, the original DataFrame is also modified.
NOTE: There are certain operations that you cannot do with a view. For example, you cannot add or delete columns.
NOTE: The columns in the result are not padded with NaN.
NOTE: Views cannot be const, because you can change the original data through them.
Ts: The list of types for all columns. A type should be specified only once.
spec: A random_policy value (see random_policy in DataFrameTypes.h). It specifies how this function interprets n and whether a seed is used.
n: Depending on the random policy, either the number of rows or the fraction of rows to sample. For a fraction, 0.4, for example, means 40% of the rows.
seed: The seed for the *_with_seed policies; the *_no_seed policies do not take a user-specified seed. The same seed should always produce the same random selection.
static void test_get_data_by_rand()  {

    std::cout << "\nTesting get_data_by_rand() ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460};
    std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    std::vector<double> d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
    std::vector<double> d4 = { 22, 23, 24, 25, 26, 27 };
    std::vector<std::string> s1 = { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
    MyDataFrame         df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_str", s1));
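    // col_4 has only 6 values; it is loaded without NaN padding
    df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

    // 5 rows, unseeded: the selection can differ from run to run
    auto    result =
        df.get_data_by_rand<double, std::string>(random_policy::num_rows_no_seed, 5);
    // 80% of the rows with seed 23: reproducible, so it can be asserted on
    auto    result2 =
        df.get_data_by_rand<double, std::string>(random_policy::frac_rows_with_seed, 0.8, 23);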

    assert(result2.get_index().size() == 6);
    assert(result2.get_column<double>("col_1").size() == 6);
    assert(result2.get_column<double>("col_4").size() == 1);
    assert(result2.get_column<std::string>("col_str").size() == 6);
    assert(result2.get_column<double>("col_4")[0] == 25.0);
    assert(result2.get_column<double>("col_3")[4] == 24.0);
    assert(result2.get_column<double>("col_1")[5] == 11.0);
    assert(result2.get_column<std::string>("col_str")[4] == "ii");
}

// -----------------------------------------------------------------------------

static void test_get_view_by_rand()  {

    std::cout << "\nTesting get_view_by_rand() ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460};
    std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    std::vector<double> d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
    std::vector<double> d4 = { 22, 23, 24, 25, 26, 27 };
    std::vector<std::string> s1 = { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
    MyDataFrame         df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_str", s1));
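    // col_4 has only 6 values; it is loaded without NaN padding
    df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

    // 5 rows, unseeded: the selection can differ from run to run
    auto    result =
        df.get_view_by_rand<double, std::string>(random_policy::num_rows_no_seed, 5);
    // 80% of the rows with seed 23: reproducible, so it can be asserted on
    auto    result2 =
        df.get_view_by_rand<double, std::string>(random_policy::frac_rows_with_seed, 0.8, 23);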

    assert(result2.get_index().size() == 6);
    assert(result2.get_column<double>("col_1").size() == 6);
    assert(result2.get_column<double>("col_4").size() == 1);
    assert(result2.get_column<std::string>("col_str").size() == 6);
    assert(result2.get_column<double>("col_4")[0] == 25.0);
    assert(result2.get_column<double>("col_3")[4] == 24.0);
    assert(result2.get_column<double>("col_1")[5] == 11.0);
    assert(result2.get_column<std::string>("col_str")[4] == "ii");

    // Writing through the view also changes the original DataFrame:
    // row 4 of the view is row 9 of df
    result2.get_column<std::string>("col_str")[4] = "TEST";
    assert(result2.get_column<std::string>("col_str")[4] == "TEST");
    assert(result2.get_column<std::string>("col_str")[4] == df.get_column<std::string>("col_str")[9]);
}
C++ DataFrame