Signature Description Parameters

template<typename S, typename ... Ts>
bool
write(S &o,
      io_format iof = io_format::csv,
      std::streamsize precision = 12,
      bool columns_only = false,
      long max_recs = std::numeric_limits<long>::max()) const; 
        
Writes the contents of the DataFrame to the stream o. Three formats are currently supported (csv, csv2, and json), selected by the iof parameter.
The CSV file format is written as:
  INDEX:<Number of data points>:<Comma delimited list of values>
  <Column1 name>:<Number of data points>:<Column1 type>:<Comma delimited list of values>
  <Column2 name>:<Number of data points>:<Column2 type>:<Comma delimited list of values>
      .
      .
      .
        
All empty lines or lines starting with # will be skipped. For examples, see the files in the test directory.

The CSV2 file format must be (this is similar to Pandas csv format):
  INDEX:<Number of data points>:<Index type>:,<Column1 name>:<Number of data points>:<Column1 type>,<Column2 name>:<Number of data points>:<Column2 type>, . . .
  Comma delimited rows of values
      .
      .
      .
        
All empty lines or lines starting with # will be skipped. For examples, see the IBM and FORD files in the test directory.

The JSON file format looks like this:
  {
    "INDEX":{"N":3,"T":"ulong","D":[123450,123451,123452]},
    "col_3":{"N":3,"T":"double","D":[15.2,16.34,17.764]},
    "col_4":{"N":3,"T":"int","D":[22,23,24]},
    "col_str":{"N":3,"T":"string","D":["11","22","33"]},
    "col_2":{"N":3,"T":"double","D":[8,9.001,10]},
    "col_1":{"N":3,"T":"double","D":[1,2,3.456]}
  }
        
Please note that DataFrame JSON does not follow the JSON spec exactly. In JSON, the fields of an object have no particular order. But in DataFrame JSON:
  1. Column "INDEX" must be the first column, if it exists
  2. Fields in column dictionaries must be in N (number of data points), T (type), D (data) order
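
The field ordering above can be illustrated with a small standalone sketch. The helper format_json_column() below is hypothetical and is not part of DataFrame; it only shows one way to emit a numeric column with its fields in N, T, D order, following the layout of the example above. String columns would additionally need quoting and escaping, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of DataFrame): renders one numeric column
// as "name":{"N":<count>,"T":"<type>","D":[<values>]}, always emitting
// the fields in N, T, D order.
template<typename T>
std::string format_json_column(const std::string &name,
                               const std::string &type,
                               const std::vector<T> &values)  {

    std::ostringstream  oss;

    oss << '"' << name << "\":{\"N\":" << values.size()
        << ",\"T\":\"" << type << "\",\"D\":[";
    for (std::size_t i = 0; i < values.size(); ++i)  {
        if (i > 0)  oss << ',';  // Comma between values, none after the last
        oss << values[i];
    }
    oss << "]}";
    return (oss.str());
}
```

For example, format_json_column<int>("col_4", "int", { 22, 23, 24 }) produces the same text as the col_4 line in the example above, without the trailing comma that separates columns.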


In all formats the following data types are supported:
          float
          double
          longdouble  -- long double
          int
          uint        -- unsigned int
          long
          longlong    -- long long int
          ulong       -- unsigned long
          ulonglong   -- unsigned long long int
          string
          bool
          DateTime    -- DateTime data in format of <Epoch seconds>.<nanoseconds> (1516179600.874123908)
        
In case of io_format::csv2 the following additional types are also supported:
          DateTimeAME -- DateTime string printed in American style (MM/DD/YYYY HH:MM:SS.mmm)
          DateTimeEUR -- DateTime string printed in European style (YYYY/MM/DD HH:MM:SS.mmm)
          DateTimeISO -- DateTime string printed in ISO style (YYYY-MM-DD HH:MM:SS.mmm)
        
 S: Output stream type
 Ts: The list of types for all columns. A type should be specified only once
 o: Reference to a streamable object (e.g. cout, file, ...)
 iof: Specifies the I/O format. The default is CSV
 precision: Specifies the precision for floating point numbers
 columns_only: If true, the index column is not written into the stream
 max_recs: Maximum number of rows to write. If it is positive, the first max_recs rows of the DataFrame are written. If it is negative, the last |max_recs| rows of the DataFrame are written
        

template<typename ... Ts>
bool
write(const char *file_name,
      io_format iof = io_format::csv,
      std::streamsize precision = 12,
      bool columns_only = false,
      long max_recs = std::numeric_limits<long>::max()) const; 
        
Same as write() above, but it takes a file name

NOTE: This version of write() can be substantially faster, especially for larger files, than opening the file yourself and calling the stream version of write() above.

template<typename S, typename ... Ts>
std::future<bool>
write_async(S &o,
            io_format iof = io_format::csv,
            std::streamsize precision = 12,
            bool columns_only = false,
            long max_recs = std::numeric_limits<long>::max()) const; 
        
Same as write() above, but executed asynchronously
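
The relationship between a synchronous write and its asynchronous counterpart can be sketched with std::async. Both functions below are hypothetical stand-ins that use only the standard library; how DataFrame implements write_async() internally is not described here and may differ. The sketch only shows the calling pattern: the call returns immediately with a std::future<bool>, and get() blocks until the write has finished.

```cpp
#include <future>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for a synchronous write: streams two values and
// reports whether the stream is still in a good state.
bool write_values(std::ostream &o, int a, int b)  {

    o << a << ',' << b;
    return (o.good());
}

// Hypothetical asynchronous wrapper: runs write_values() on another thread
// and hands back a future holding its result. The stream is captured by
// reference, so it must outlive the future and must not be used by the
// caller until get() returns.
std::future<bool> write_values_async(std::ostream &o, int a, int b)  {

    return (std::async(std::launch::async,
                       [&o, a, b]  { return (write_values(o, a, b)); }));
}
```

A caller would typically start the write, do unrelated work, and then call get() on the returned future to collect the success flag.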

template<typename ... Ts>
std::future<bool>
write_async(const char *file_name,
            io_format iof = io_format::csv,
            std::streamsize precision = 12,
            bool columns_only = false,
            long max_recs = std::numeric_limits<long>::max()) const; 
        
Same as write_async() above, but it takes a file name

template<typename ... Ts>
std::string
to_string(std::streamsize precision = 12) const; 
        
This is a convenience function (simple implementation) that converts a DataFrame into a string that can be restored later by calling from_string(). It uses the write() member function of DataFrame.
These functions can be used to transmit a DataFrame from one place to another or to store a DataFrame in databases, caches, …

I have been asked why I implemented to_string() instead of, or before, a binary format.
Implementing a binary format as a form of serialization is a legitimate request, and I will add that option when I find time to implement it. But implementing a binary format is more involved, and a binary format is not always more efficient than a string format. Two issues stand out:
  1. Consider options market data. Option prices and sizes are usually small numbers. For example, consider the number 0.5. In string format that is 3 bytes (".5|"). In binary format, as a double, it is always 8 bytes. So, if you have a dataset with millions or billions of such numbers, it makes a significant difference
  2. In binary format you must deal with big-endian vs. little-endian byte order. It is a pain in the neck and affects efficiency
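
The size argument in the first point can be checked with a short sketch. The helper text_bytes() below is hypothetical and does not reproduce the encoding used by to_string(); it assumes standard stream formatting at a given precision plus one delimiter byte. Unlike the ".5|" example above, standard streams keep the leading zero, so 0.5 costs 4 bytes here rather than 3.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper (not part of DataFrame): number of bytes needed to
// store a value as text at the given precision, plus one delimiter byte.
std::size_t text_bytes(double value, std::streamsize precision)  {

    std::ostringstream  oss;

    oss.precision(precision);  // Significant digits in default float format
    oss << value;
    return (oss.str().size() + 1);  // +1 for the delimiter
}
```

Under these assumptions text_bytes(0.5, 12) is 4, which is half of the 8 bytes that a double typically occupies. The saving depends on the data: a value such as 0.123456789012 written at precision 12 takes more bytes as text than in binary.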
Ts: The list of types for all columns. A type should be specified only once
precision: Specifies the precision for floating point numbers

template<typename ... Ts>
std::future<std::string>
to_string_async(std::streamsize precision = 12) const; 
        
Same as to_string() above, but executed asynchronously

static void test_write_json()  {

    std::cout << "\nTesting write(json) ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460 };
    std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    std::vector<double> d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
    std::vector<double> d4 = { 22, 23, 24, 25, 26, 27 };
    std::vector<std::string> s1 = { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
    MyDataFrame         df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_str", s1));
    df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

    std::cout << "Writing in JSON:" << std::endl;
    df.write<std::ostream, int, double, std::string>(std::cout, io_format::json);
}

// -----------------------------------------------------------------------------

static void test_io_format_csv2()  {

    std::cout << "\nTesting io_format_csv2( ) ..." << std::endl;

    std::vector<unsigned long>  ulgvec2 =
        { 123450, 123451, 123452, 123450, 123455, 123450, 123449, 123450, 123451, 123450, 123452, 123450, 123455, 123450,
          123454, 123450, 123450, 123457, 123458, 123459, 123450, 123441, 123442, 123432, 123450, 123450, 123435, 123450 };
    std::vector<unsigned long>  xulgvec2 = ulgvec2;
    std::vector<int>            intvec2 =
        { 1, 2, 3, 4, 5, 3, 7, 3, 9, 10, 3, 2, 3, 14, 2, 2, 2, 3, 2, 3, 3, 3, 3, 3, 36, 2, 45, 2 };
    std::vector<double>         xdblvec2 =
        { 1.2345, 2.2345, 3.2345, 4.2345, 5.2345, 3.0, 0.9999, 10.0, 4.25, 0.009, 8.0, 2.2222, 3.3333,
          11.0, 5.25, 1.009, 2.111, 9.0, 3.2222, 4.3333, 12.0, 6.25, 2.009, 3.111, 10.0, 4.2222, 5.3333 };
    std::vector<double>         dblvec22 =
        { 0.998, 0.3456, 0.056, 0.15678, 0.00345, 0.923, 0.06743, 0.1, 0.0056, 0.07865, 0.0111, 0.1002, -0.8888,
          0.14, 0.0456, 0.078654, -0.8999, 0.8002, -0.9888, 0.2, 0.1056, 0.87865, -0.6999, 0.4111, 0.1902, -0.4888 };
    std::vector<std::string>    strvec2 =
        { "4% of something", "Description 4/5", "This is bad", "3.4% of GDP", "Market drops", "Market pulls back","$15 increase", "Running fast", "C++14 development",
          "Some explanation", "More strings", "Bonds vs. Equities",
          "Almost done", "XXXX04", "XXXX2", "XXXX3", "XXXX4", "XXXX4", "XXXX5", "XXXX6",
          "XXXX7", "XXXX10", "XXXX11", "XXXX02", "XXXX03" };
    std::vector<bool>           boolvec = { true, true, true, false, false, true };

    MyDataFrame df;

    df.load_data(std::move(ulgvec2), std::make_pair("ul_col", xulgvec2));
    df.load_column("xint_col", std::move(intvec2), nan_policy::dont_pad_with_nans);
    df.load_column("str_col", std::move(strvec2), nan_policy::dont_pad_with_nans);
    df.load_column("dbl_col", std::move(xdblvec2), nan_policy::dont_pad_with_nans);
    df.load_column("dbl_col_2", std::move(dblvec22), nan_policy::dont_pad_with_nans);
    df.load_column("bool_col", std::move(boolvec), nan_policy::dont_pad_with_nans);

    df.write<std::ostream, int, unsigned long, double, bool, std::string>(std::cout, io_format::csv2);

    MyDataFrame df_read;

    try  {
        df_read.read("csv2_format_data.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
    }
    df_read.write<std::ostream, int, unsigned long, double, bool, std::string>(std::cout, io_format::csv2);
}

// -----------------------------------------------------------------------------

static void test_to_from_string()  {

    std::cout << "\nTesting to_from_string() ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466 };
    std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 };
    std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 20, 22, 23, 30, 31, 32, 1.89 };
    std::vector<double> d3 = { 15, 16, 17, 18, 19, 20, 21, 0.34, 1.56, 0.34, 2.3, 0.1, 0.89, 0.45 };
    std::vector<int>    i1 = { 22, 23, 24, 25, 99, 100, 101, 3, 2 };
    std::vector<std::string>    strvec =
        { "zz", "bb", "cc", "ww", "ee", "ff", "gg", "hh", "ii", "jj", "kk", "ll", "mm", "nn" };
    MyDataFrame         df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_4", i1),
                 std::make_pair("str_col", strvec));

    std::future<std::string>    f = df.to_string_async<double, int, std::string>();
    const std::string           str_dump = f.get();

    // std::cout << str_dump << std::endl;

    MyDataFrame df2;

    df2.from_string(str_dump.c_str());
    // std::cout << '\n' << std::endl;
    // df2.write<std::ostream, double, int, std::string>(std::cout);
    assert((df.is_equal<double, int, std::string>(df2)));
}
C++ DataFrame