Signature | Description | Parameters |
---|---|---|
`static void set_lock(SpinLock *lock);`<br>`static void remove_lock();` | DataFrame's static data is unprotected by default. If you use DataFrame in a multithreaded program, you must provide a SpinLock; DataFrame will then use your SpinLock to protect its static data. Locking is opt-in so that, by default, there is no locking overhead. | `lock`: A pointer to a SpinLock, which is defined in the Utils/ThreadGranularity.h file |

```cpp
#include <cassert>
#include <iostream>
#include <thread>
#include <utility>
#include <vector>

// Also requires the DataFrame headers (SpinLock is defined in
// Utils/ThreadGranularity.h) and a MyDataFrame type defined elsewhere.

static void test_thread_safety()  {

    std::cout << "\nTesting Thread safety ..." << std::endl;

    const size_t    vec_size = 100000;

    auto    do_work = [vec_size]() {
        MyDataFrame         df;
        std::vector<size_t> vec;

        for (size_t i = 0; i < vec_size; ++i)
            vec.push_back(i);
        df.load_data(
            MyDataFrame::gen_sequence_index(
                0, static_cast<unsigned long>(vec_size), 1),
            std::make_pair("col1", vec));

        // This is an extremely inefficient way of doing it, especially in
        // a multithreaded program. Each "get_column" is a hash table
        // lookup and, in multithreaded programs, requires a lock.
        // It is much more efficient to call "get_column" outside the loop
        // and loop over the referenced vector.
        // Here I am doing it this way to make sure synchronization
        // between threads is bulletproof.
        for (size_t i = 0; i < vec_size; ++i)  {
            const size_t    j = df.get_column<size_t>("col1")[i];

            assert(i == j);
        }
        df.shrink_to_fit<size_t>();
    };

    SpinLock                    lock;
    std::vector<std::thread>    thr_vec;

    // Install the lock before any worker thread starts.
    MyDataFrame::set_lock(&lock);

    for (size_t i = 0; i < 20; ++i)
        thr_vec.push_back(std::thread(do_work));
    for (size_t i = 0; i < 20; ++i)
        thr_vec[i].join();

    // Remove the lock only after every thread has been joined.
    MyDataFrame::remove_lock();
}
```
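
The opt-in locking described in the table can be pictured with a small self-contained sketch. This is not DataFrame's implementation: `DemoSpinLock`, `DemoRegistry`, and `run_demo` are hypothetical names invented for illustration, and the only assumption carried over from the description is that the class keeps a static pointer to a caller-supplied lock and uses it when one is set.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a spin lock; NOT the library's SpinLock class.
class DemoSpinLock {
public:
    void lock() noexcept {
        // Spin until we are the thread that flips the flag from false to true.
        while (locked_.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() noexcept { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical class whose static data is protected only on request.
class DemoRegistry {
public:
    // Both calls must happen while no other thread is using the class.
    static void set_lock(DemoSpinLock *lock) { lock_ = lock; }
    static void remove_lock() { lock_ = nullptr; }

    static void increment() {
        const Guard guard(lock_);  // locks only if a lock was installed
        ++counter_;
    }
    static long counter() { return counter_; }

private:
    // RAII guard that does nothing when no lock has been set.
    struct Guard {
        explicit Guard(DemoSpinLock *l) : l_(l) { if (l_) l_->lock(); }
        ~Guard() { if (l_) l_->unlock(); }
        Guard(const Guard &) = delete;
        Guard &operator=(const Guard &) = delete;
        DemoSpinLock *l_;
    };

    static DemoSpinLock *lock_;  // null by default: no locking overhead
    static long counter_;        // the "unprotected static data"
};

DemoSpinLock *DemoRegistry::lock_ = nullptr;
long DemoRegistry::counter_ = 0;

// Runs n_threads workers that each call increment() n_iters times and
// returns how much the shared counter grew.
long run_demo(int n_threads, int n_iters) {
    const long before = DemoRegistry::counter();
    DemoSpinLock lock;
    std::vector<std::thread> threads;

    DemoRegistry::set_lock(&lock);  // install before any thread starts
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back([n_iters] {
            for (int i = 0; i < n_iters; ++i)
                DemoRegistry::increment();
        });
    for (auto &th : threads)
        th.join();
    DemoRegistry::remove_lock();  // detach before `lock` goes out of scope

    return DemoRegistry::counter() - before;
}
```

Calling `run_demo(4, 10000)` from `main` should report a growth of 40000, because every increment happens under the installed lock. The ordering mirrors the example above: the lock is set before the threads are created and removed only after all of them have been joined, so the static pointer never refers to a lock that has already been destroyed.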