Signature | Description | Parameters |
---|---|---|
template<typename S, typename ... Ts> bool write(S &o, io_format iof = io_format::csv, std::streamsize precision = 12, bool columns_only = false, long max_recs = std::numeric_limits |
Writes the content of the DataFrame to the stream o. Three formats (csv, csv2, json) are currently supported, selected by the iof parameter. The CSV file format is written as: INDEX:<Number of data points>:<Comma delimited list of values> <Column1 name>:<Number of data points>:<Column1 type>:<Comma delimited list of values> <Column2 name>:<Number of data points>:<Column2 type>:<Comma delimited list of values> . . . All empty lines or lines starting with # will be skipped. For examples, see the files in the test directory. The CSV2 file format must be (this is similar to the Pandas csv format): INDEX:<Number of data points>:<Index type>:,<Column1 name>:<Number of data points>:<Column1 type>,<Column2 name>:<Number of data points>:<Column2 type>, . . . Comma delimited rows of values . . . All empty lines or lines starting with # will be skipped. For examples, see the IBM and FORD files in the test directory. The JSON file format looks like this: { "INDEX":{"N":3,"T":"ulong","D":[123450,123451,123452]}, "col_3":{"N":3,"T":"double","D":[15.2,16.34,17.764]}, "col_4":{"N":3,"T":"int","D":[22,23,24]}, "col_str":{"N":3,"T":"string","D":["11","22","33"]}, "col_2":{"N":3,"T":"double","D":[8,9.001,10]}, "col_1":{"N":3,"T":"double","D":[1,2,3.456]} } Please note that DataFrame JSON does not follow the JSON spec 100%. In JSON, dictionary fields have no particular order. But in DataFrame JSON:
In all formats the following data types are supported: float; double; longdouble -- long double; int; uint -- unsigned int; long; longlong -- long long int; ulong -- unsigned long; ulonglong -- unsigned long long int; string; bool; DateTime -- DateTime data in the format <Epoch seconds>.<nanoseconds> (e.g. 1516179600.874123908). In the case of io_format::csv2, the following additional types are also supported: DateTimeAME -- DateTime string printed in American style (MM/DD/YYYY HH:MM:SS.mmm); DateTimeEUR -- DateTime string printed in European style (YYYY/MM/DD HH:MM:SS.mmm); DateTimeISO -- DateTime string printed in ISO style (YYYY-MM-DD HH:MM:SS.mmm); dbl_vector -- a vector of double precision values, printed as "s[d1|d2|...]" where s is the size of the vector and the d's are the double values; str_dbl_map -- a map of string keys to double precision values, printed as "s{k1:v1|k2:v2|...}" where s is the size of the map and the k's and v's are keys and values; str_dbl_unomap -- an unordered map of string keys to double precision values, printed as "s{k1:v1|k2:v2|...}" where s is the size of the map and the k's and v's are keys and values. |
S: Output stream type. Ts: The list of types for all columns. A type should be specified only once. o: Reference to a streamable object (e.g. cout, file, ...). iof: Specifies the I/O format. The default is CSV. precision: Specifies the precision for floating point numbers. columns_only: If true, the index column is not written into the stream. max_recs: Max number of rows to write. If it is positive, max_recs rows are written from the beginning of the DataFrame. If it is negative, abs(max_recs) rows are written from the end of the DataFrame. |
template<typename ... Ts> bool write(const char *file_name, io_format iof = io_format::csv, std::streamsize precision = 12, bool columns_only = false, long max_recs = std::numeric_limits |
Same as write() above, but it takes a file name. NOTE: This version of write() can be substantially faster, especially for larger files, than opening the file yourself and using the write() version above. |
|
template<typename S, typename ... Ts> std::future<bool> write_async(S &o, io_format iof = io_format::csv, std::streamsize precision = 12, bool columns_only = false, long max_recs = std::numeric_limits |
Same as write() above, but executed asynchronously | |
template<typename ... Ts> std::future<bool> write_async(const char *file_name, io_format iof = io_format::csv, std::streamsize precision = 12, bool columns_only = false, long max_recs = std::numeric_limits |
Same as write_async() above, but it takes a file name | |
template<typename ... Ts> std::string to_string(std::streamsize precision = 12) const; |
This is a convenience function (simple implementation) to convert a DataFrame into a string that can be restored later by calling from_string(). It uses the write() member function of DataFrame. These functions can be used to transmit a DataFrame from one place to another or to store a DataFrame in databases, caches, … I have been asked why I implemented to_string instead of, or before, a “to binary format”. Implementing a binary format as a form of serialization is a legitimate ask, and I will add that option when I find time to implement it. But implementing a binary format is more involved, and a binary format is not always more efficient than a string format. Two issues stand out
|
Ts: The list of types for all columns. A type should be specified only once. precision: Specifies the precision for floating point numbers. |
template<typename ... Ts> std::future<std::string> to_string_async(std::streamsize precision = 12) const; |
Same as to_string() above, but executed asynchronously |
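
The max_recs rule described in the Parameters column (a positive value counts rows from the beginning, a negative value counts rows from the end) can be sketched as a small standalone helper. This is only an illustration of the documented behavior, not the library's implementation: `rows_to_write` is a hypothetical name, and treating `max_recs == 0` as "write nothing" is an assumption, since the text above does not cover that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of DataFrame): returns the half-open row
// range [first, last) that max_recs selects from a frame with `total` rows.
inline std::pair<std::size_t, std::size_t>
rows_to_write(std::size_t total, long max_recs)  {

    std::size_t count = 0;

    if (max_recs >= 0)  {
        // Positive (or zero, by assumption): take rows from the beginning.
        count = std::min<std::size_t>(
            total, static_cast<std::size_t>(max_recs));
        assert(count <= total);
        return { 0, count };
    }

    // Negative: take rows from the end. The magnitude is computed in
    // unsigned arithmetic so that the most negative long does not overflow.
    const unsigned long magnitude = 0UL - static_cast<unsigned long>(max_recs);

    count = std::min<std::size_t>(total, magnitude);
    assert(count <= total);
    return { total - count, total };
}
```

Under these assumptions, `rows_to_write(10, 3)` selects rows 0 to 2, `rows_to_write(10, -3)` selects rows 7 to 9, and a value larger than the row count in either direction selects every row.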

    #include <DataFrame/DataFrame.h>

    #include <cassert>
    #include <future>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    using namespace hmdf;

    // The index type is unsigned long, matching the index vectors below
    using MyDataFrame = StdDataFrame<unsigned long>;

    // -----------------------------------------------------------------------------

    static void test_write_json()  {

        std::cout << "\nTesting write(json) ..." << std::endl;

        std::vector<unsigned long>  idx =
            { 123450, 123451, 123452, 123453, 123454, 123455, 123456,
              123457, 123458, 123459, 123460 };
        std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
        std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
        std::vector<double> d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
        std::vector<double> d4 = { 22, 23, 24, 25, 26, 27 };
        std::vector<std::string> s1 =
            { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
        MyDataFrame         df;

        df.load_data(std::move(idx),
                     std::make_pair("col_1", d1),
                     std::make_pair("col_2", d2),
                     std::make_pair("col_3", d3),
                     std::make_pair("col_str", s1));
        // col_4 is shorter than the index and is deliberately not padded
        df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

        std::cout << "Writing in JSON:" << std::endl;
        // The I/O format is the second argument, per the signature above
        df.write<std::ostream, int, double, std::string>(std::cout,
                                                         io_format::json);
    }

    // -----------------------------------------------------------------------------

    static void test_io_format_csv2()  {

        std::cout << "\nTesting io_format_csv2( ) ..." << std::endl;

        std::vector<unsigned long>  ulgvec2 =
            { 123450, 123451, 123452, 123450, 123455, 123450, 123449,
              123450, 123451, 123450, 123452, 123450, 123455, 123450,
              123454, 123450, 123450, 123457, 123458, 123459, 123450,
              123441, 123442, 123432, 123450, 123450, 123435, 123450 };
        std::vector<unsigned long>  xulgvec2 = ulgvec2;
        std::vector<int>            intvec2 =
            { 1, 2, 3, 4, 5, 3, 7, 3, 9, 10, 3, 2, 3, 14,
              2, 2, 2, 3, 2, 3, 3, 3, 3, 3, 36, 2, 45, 2 };
        std::vector<double>         xdblvec2 =
            { 1.2345, 2.2345, 3.2345, 4.2345, 5.2345, 3.0, 0.9999,
              10.0, 4.25, 0.009, 8.0, 2.2222, 3.3333,
              11.0, 5.25, 1.009, 2.111, 9.0, 3.2222, 4.3333,
              12.0, 6.25, 2.009, 3.111, 10.0, 4.2222, 5.3333 };
        std::vector<double>         dblvec22 =
            { 0.998, 0.3456, 0.056, 0.15678, 0.00345, 0.923, 0.06743,
              0.1, 0.0056, 0.07865, 0.0111, 0.1002, -0.8888,
              0.14, 0.0456, 0.078654, -0.8999, 0.8002, -0.9888,
              0.2, 0.1056, 0.87865, -0.6999, 0.4111, 0.1902, -0.4888 };
        std::vector<std::string>    strvec2 =
            { "4% of something", "Description 4/5", "This is bad",
              "3.4% of GDP", "Market drops", "Market pulls back",
              "$15 increase", "Running fast", "C++14 development",
              "Some explanation", "More strings", "Bonds vs. Equities",
              "Almost done", "XXXX04",
              "XXXX2", "XXXX3", "XXXX4", "XXXX4", "XXXX5", "XXXX6",
              "XXXX7", "XXXX10", "XXXX11", "XXXX02", "XXXX03" };
        std::vector<bool>           boolvec =
            { true, true, true, false, false, true };

        MyDataFrame df;

        df.load_data(std::move(ulgvec2), std::make_pair("ul_col", xulgvec2));
        df.load_column("xint_col",
                       std::move(intvec2),
                       nan_policy::dont_pad_with_nans);
        df.load_column("str_col",
                       std::move(strvec2),
                       nan_policy::dont_pad_with_nans);
        df.load_column("dbl_col",
                       std::move(xdblvec2),
                       nan_policy::dont_pad_with_nans);
        df.load_column("dbl_col_2",
                       std::move(dblvec22),
                       nan_policy::dont_pad_with_nans);
        df.load_column("bool_col",
                       std::move(boolvec),
                       nan_policy::dont_pad_with_nans);

        // The I/O format is the second argument, per the signature above
        df.write<std::ostream,
                 int,
                 unsigned long,
                 double,
                 bool,
                 std::string>(std::cout, io_format::csv2);

        MyDataFrame df_read;

        try  {
            df_read.read("csv2_format_data.csv", io_format::csv2);
        }
        catch (const DataFrameError &ex)  {
            std::cout << ex.what() << std::endl;
        }
        df_read.write<std::ostream,
                      int,
                      unsigned long,
                      double,
                      bool,
                      std::string>(std::cout, io_format::csv2);
    }

    // -----------------------------------------------------------------------------

    static void test_to_from_string()  {

        std::cout << "\nTesting to_from_string() ..." << std::endl;

        std::vector<unsigned long>  idx =
            { 123450, 123451, 123452, 123453, 123454, 123455, 123456,
              123457, 123458, 123459, 123460, 123461, 123462, 123466 };
        std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 };
        std::vector<double> d2 =
            { 8, 9, 10, 11, 12, 13, 14, 20, 22, 23, 30, 31, 32, 1.89 };
        std::vector<double> d3 =
            { 15, 16, 17, 18, 19, 20, 21,
              0.34, 1.56, 0.34, 2.3, 0.1, 0.89, 0.45 };
        std::vector<int>    i1 = { 22, 23, 24, 25, 99, 100, 101, 3, 2 };
        std::vector<std::string>    strvec =
            { "zz", "bb", "cc", "ww", "ee", "ff", "gg",
              "hh", "ii", "jj", "kk", "ll", "mm", "nn" };
        MyDataFrame         df;

        df.load_data(std::move(idx),
                     std::make_pair("col_1", d1),
                     std::make_pair("col_2", d2),
                     std::make_pair("col_3", d3),
                     std::make_pair("col_4", i1),
                     std::make_pair("str_col", strvec));

        std::future<std::string>    f =
            df.to_string_async<double, int, std::string>();
        const std::string           str_dump = f.get();

        // std::cout << str_dump << std::endl;

        MyDataFrame df2;

        df2.from_string(str_dump.c_str());
        // std::cout << '\n' << std::endl;
        // df2.write<std::ostream, double, int, std::string>(std::cout);
        assert((df.is_equal<double, int, std::string>(df2)));
    }
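
The csv2-only container types in the description above are printed as "s[d1|d2|...]" for dbl_vector and "s{k1:v1|k2:v2|...}" for str_dbl_map. The following standalone sketch shows one way to produce strings of that shape. It is not the library's code: `format_dbl_vector` and `format_str_dbl_map` are hypothetical names, the values are printed with the default stream precision rather than the `precision` argument of write(), and the rendering of empty containers is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders a vector as "s[d1|d2|...]",
// where s is the number of elements.
inline std::string format_dbl_vector(const std::vector<double> &vec)  {

    std::ostringstream  os;

    os << vec.size() << '[';
    for (std::size_t i = 0; i < vec.size(); ++i)  {
        if (i > 0)  os << '|';   // separator between values only
        os << vec[i];
    }
    os << ']';
    return os.str();
}

// Hypothetical helper: renders a map as "s{k1:v1|k2:v2|...}",
// where s is the number of entries. std::map iterates in key order.
inline std::string
format_str_dbl_map(const std::map<std::string, double> &dict)  {

    std::ostringstream  os;
    std::size_t         written = 0;

    os << dict.size() << '{';
    for (const auto &entry : dict)  {
        if (written++ > 0)  os << '|';   // separator between entries only
        os << entry.first << ':' << entry.second;
    }
    assert(written == dict.size());
    os << '}';
    return os.str();
}
```

With these helpers, a vector holding 1.5, 2.0 and 3.25 is rendered as `3[1.5|2|3.25]`, and a map with keys "a" and "b" mapped to 1.0 and 2.5 is rendered as `2{a:1|b:2.5}`. For an unordered map the same layout applies, but the entry order would depend on the container.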