Signature | Description |
---|---|
`enum class bucket_type : unsigned char { by_distance = 1, by_count = 2 };` | Determines the bucketization logic.<br>`by_distance`: bucketize by the distance between two index values (e.g. X2 - X1 = N).<br>`by_count`: bucketize by counting index values (e.g. every N index items). |
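
To make the difference between the two bucket types concrete, here is a minimal standalone sketch that computes bucket boundaries over a plain index vector. It is not the library's implementation: `bucket_ranges` and `Range` are hypothetical names, the enum is re-declared locally so the snippet compiles on its own, and keeping the trailing partial bucket is an assumption about the semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Local copy of the enum so this sketch compiles without the library.
enum class bucket_type : unsigned char { by_distance = 1, by_count = 2 };

// Half-open range [first, second) of positions in the index vector.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper (not part of the DataFrame API): returns the
// position ranges that each bucket would cover.
std::vector<Range>
bucket_ranges(bucket_type bt, const std::vector<long> &index, long value)  {

    assert(value > 0);

    std::vector<Range>  out;

    if (index.empty())  return (out);

    std::size_t start = 0;

    for (std::size_t i = 1; i < index.size(); ++i)  {
        // by_distance closes a bucket when the index value has moved at
        // least `value` away from the bucket's first index value.
        // by_count closes a bucket after `value` items.
        const bool  close_bucket =
            (bt == bucket_type::by_distance)
                ? (index[i] - index[start] >= value)
                : (i - start >= static_cast<std::size_t>(value));

        if (close_bucket)  {
            out.emplace_back(start, i);
            start = i;
        }
    }
    // Assumption: the trailing, possibly partial, bucket is kept.
    out.emplace_back(start, index.size());
    return (out);
}
```

With a dense index such as 0, 1, ..., 9 and a value of 5, both bucket types produce the same two ranges. With a sparse index such as 0, 1, 2, 10, 11, 30 and the same value, `by_distance` produces three ranges while `by_count` produces two.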
Signature | Description | Parameters |
---|---|---|
`template<typename V, typename I_V, typename ... Ts> DataFrame bucketize(bucket_type bt, const V &value, I_V &&idx_visitor, Ts&& ... args) const;` | Bucketizes the data and index into intervals, based on the index values and bucket_type. You must specify how the index column is bucketized by providing a visitor. You must specify how each column is bucketized by providing 3-member tuples (triples). Each triple must have the following members:<br>1. Name of the current column<br>2. Name of the column in the new bucketized DataFrame<br>3. A visitor that aggregates the current column into the new column | V: Type of the value to be used for bucketizing, based on bucket_type<br>I_V: Type of the visitor to be used to bucketize the index column<br>Ts: Types of the triples that specify each column's bucketization<br>bt: bucket_type that specifies the bucketization logic<br>value: The value to be used to bucketize, based on bucket_type. For example, if bucket_type is by_distance, then value is the distance between two index values. If bucket_type is by_count, then value is an integer count.<br>idx_visitor: A visitor that specifies the index bucketization<br>args: Variable argument list of triples, as specified above |
`template<typename V, typename I_V, typename ... Ts> std::future<DataFrame> bucketize_async(bucket_type bt, const V &value, I_V &&idx_visitor, Ts&& ... args) const;` | Same as bucketize() above, but executed asynchronously. | Same as bucketize() above |

    static void test_bucketize()  {

        std::cout << "\nTesting bucketize( ) ..." << std::endl;

        MyDataFrame df;

        try  {
            df.read("FORD.csv", io_format::csv2);

            auto    fut =
                df.bucketize_async(
                    bucket_type::by_distance,
                    100,
                    LastVisitor<MyDataFrame::IndexType, MyDataFrame::IndexType>(),
                    std::make_tuple("Date", "Date", LastVisitor<std::string>()),
                    std::make_tuple("FORD_Close", "High", MaxVisitor<double>()),
                    std::make_tuple("FORD_Close", "Low", MinVisitor<double>()),
                    std::make_tuple("FORD_Close", "Open", FirstVisitor<double>()),
                    std::make_tuple("FORD_Close", "Close", LastVisitor<double>()),
                    std::make_tuple("FORD_Close", "Mean", MeanVisitor<double>()),
                    std::make_tuple("FORD_Close", "Std", StdVisitor<double>()),
                    std::make_tuple("FORD_Volume", "Volume", SumVisitor<long>()));
            MyDataFrame result = fut.get();

            result.write<std::ostream, std::string, double, long>
                (std::cout, io_format::csv2);

            // FORD index is just an increasing number starting from 0.
            // So, by_count should give the same result as by_distance
            //
            auto    fut2 =
                df.bucketize_async(
                    bucket_type::by_count,
                    100,
                    LastVisitor<MyDataFrame::IndexType, MyDataFrame::IndexType>(),
                    std::make_tuple("Date", "Date", LastVisitor<std::string>()),
                    std::make_tuple("FORD_Close", "High", MaxVisitor<double>()),
                    std::make_tuple("FORD_Close", "Low", MinVisitor<double>()),
                    std::make_tuple("FORD_Close", "Open", FirstVisitor<double>()),
                    std::make_tuple("FORD_Close", "Close", LastVisitor<double>()),
                    std::make_tuple("FORD_Close", "Mean", MeanVisitor<double>()),
                    std::make_tuple("FORD_Close", "Std", StdVisitor<double>()),
                    std::make_tuple("FORD_Volume", "Volume", SumVisitor<long>()));
            MyDataFrame result2 = fut2.get();

            assert(result.is_equal(result2));
        }
        catch (const DataFrameError &ex)  {
            std::cout << ex.what() << std::endl;
        }
    }
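
The example above derives several new columns (High, Low, Open, Close) from a single source column by pairing it with different visitors. As a rough illustration of what those aggregations compute per bucket, here is a standalone sketch using `by_count`-style buckets over a plain vector. The names `Bar` and `ohlc_by_count` are hypothetical and stand in for the visitors; this is not how the library implements them, and keeping the trailing partial bucket is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: one aggregated row per bucket.
struct Bar  {
    double  open;   // first value in the bucket (cf. FirstVisitor)
    double  high;   // largest value in the bucket (cf. MaxVisitor)
    double  low;    // smallest value in the bucket (cf. MinVisitor)
    double  close;  // last value in the bucket (cf. LastVisitor)
};

// Hypothetical helper: aggregates every `count` consecutive prices into
// one Bar. Assumption: a trailing partial bucket is also aggregated.
std::vector<Bar>
ohlc_by_count(const std::vector<double> &prices, std::size_t count)  {

    assert(count > 0);

    std::vector<Bar>    bars;

    for (std::size_t b = 0; b < prices.size(); b += count)  {
        const std::size_t   e = std::min(b + count, prices.size());
        const auto          first =
            prices.begin() + static_cast<std::ptrdiff_t>(b);
        const auto          last =
            prices.begin() + static_cast<std::ptrdiff_t>(e);

        bars.push_back(Bar { *first,
                             *std::max_element(first, last),
                             *std::min_element(first, last),
                             *(last - 1) });
    }
    return (bars);
}
```

For the prices 10, 12, 9, 11, 15, 14, 13 and a count of 3, this yields three bars: (open 10, high 12, low 9, close 9), (open 11, high 15, low 11, close 14), and a final single-item bar where all four fields are 13.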