Patch for mw_datatools

Chris,

I’m submitting a patch for mw_datatools. It was generated against the latest revision of the git repository.
It adds the following:

MWKStream:

  • Writing Python objects (int, float, dict, tuple, list, etc.) to an mwk file (in scarab format)
  • Low-level scarab IO functions (_scarab_session_read, _scarab_session_write, etc.)

MWKFile:

  • Appending new codes to an existing mwk file (mostly for merging)
  • Cached codec/reverse_codec
  • Other small changes, such as managing temp files (again, for merging), real mwk files, etc.
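
The codec caching mentioned above might look roughly like the following. This is only a sketch of the general idea (build the code-to-name mapping once, derive the reverse mapping from it, and reuse both); the class name, the _load_codec() helper, and the variable names are made up for illustration and are not taken from the patch.

```python
class CachedCodecFile:
    """Hypothetical stand-in for a data file wrapper; not the real MWKFile."""

    def __init__(self, raw_codec):
        # raw_codec stands in for whatever an expensive file read would return.
        self._raw_codec = dict(raw_codec)
        self._codec = None
        self._reverse_codec = None
        self.reads = 0  # counts simulated file reads

    def _load_codec(self):
        # Pretend this hits the disk; here it only copies a dict.
        self.reads += 1
        return dict(self._raw_codec)

    @property
    def codec(self):
        # Load on first access, then serve the cached copy.
        if self._codec is None:
            self._codec = self._load_codec()
        return self._codec

    @property
    def reverse_codec(self):
        # Name -> code mapping, derived once from the cached codec.
        if self._reverse_codec is None:
            self._reverse_codec = {name: code for code, name in self.codec.items()}
        return self._reverse_codec


f = CachedCodecFile({7: "var_a", 8: "var_b"})  # made-up codes and names
name = f.codec[7]
code = f.reverse_codec["var_b"]
again = f.codec[8]  # served from the cache, no second read
```

In this sketch a stale cache would have to be reset by hand (for example after new codes are appended), which is an assumption about usage rather than something the patch is known to do.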

PythonDataBindingsHelpers.cpp:

  • Python object to scarab conversion function (convert_python_to_scarab)
  • Bridge functions to support low-level IO functions in MWKStream (scarab_session*)
  • Some modifications adopted from the PEPs and the official Python website (e.g., “return Py_None;” changed to “Py_RETURN_NONE;”)

Let me know if you have any questions.
Thanks!

Ha

Attachment: mw_datatools.patch (27.9 KB)

Thanks, Ha. I’ll take a look at this and get back to you.

Chris

Hi Ha,

This is a very nice patch! Thanks for taking the time to put it together.

I’m happy with all the C++ code changes you’ve made. However, the stuff you added to data.py makes me nervous. I’m not sure we should be encouraging people to merge data files. If you or others want to do so, you should be free to try (so I’m fine with exposing the low-level Scarab I/O functions as private methods of MWKStream). However, I don’t want to be the person who has to provide support if something goes horribly wrong!

I propose incorporating your C++ code changes (as well as the codec-caching and reworked MWKFile.unindex() from data.py) into mw_datatools, while the file-merging stuff remains in your personal toolbox (which you’re free to share with others, of course). Does that sound OK?

Thanks,
Chris

Hi Chris,

Thanks for your note. I agree with your points. I’m considering making a new class that inherits from MWKFile and putting all the merge code into it. That way, we can limit potential side effects of the merge code, and I can continue what I’m doing now and maintain/develop my “experimental” code without worrying too much.
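
A rough sketch of that layout, assuming a much-simplified stand-in for MWKFile (the real class reads events from disk; here they are plain tuples). The subclass name MergingMWKFile and the merge_from() method are hypothetical, as is the (code, time, value) event layout.

```python
class MWKFile:
    """Minimal stand-in for the real MWKFile, for illustration only."""

    def __init__(self, path):
        self.path = path
        self.events = []  # assumed layout: (code, time, value) tuples

    def get_events(self):
        return list(self.events)


class MergingMWKFile(MWKFile):
    """Hypothetical subclass that keeps experimental merge logic separate."""

    def merge_from(self, other):
        # Append the other file's events, then restore time order.
        self.events.extend(other.get_events())
        self.events.sort(key=lambda event: event[1])


a = MergingMWKFile("a.mwk")
a.events = [(1, 10, "x"), (1, 30, "z")]
b = MWKFile("b.mwk")
b.events = [(2, 20, "y")]
a.merge_from(b)
merged_times = [event[1] for event in a.get_events()]
```

Users of the plain MWKFile never see merge_from(), so the base class stays unaffected by whatever the experimental code does.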

Meanwhile, I stripped the merge code out of data.py and attached a concise version. You might want to take a look at it.

Thanks,
Ha

Attachment: data_concise.py (7.08 KB)

Hi Ha,

I’m considering making a new class that inherits from MWKFile and putting all the merge code into it. That way, we can limit potential side effects of the merge code, and I can continue what I’m doing now and maintain/develop my “experimental” code without worrying too much.

That sounds like a good plan.

Meanwhile, I stripped the merge code out of data.py and attached a concise version.

Thanks. I made a few additional changes to the MWKFile class in data_concise.py:

  • Removed the reserved codec code constants
  • Removed the backup-management methods
  • Renamed empty_dir() to _empty_dir()
  • Changed the default value of the empty_dir parameter of unindex() to True (to preserve existing behavior)
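
To illustrate the last two items, here is a sketch of a private _empty_dir() helper and an unindex() whose empty_dir parameter defaults to True. It is not the committed code: the IndexedFile class, the “.idx” suffix, the flat directory layout, and the demo() helper are all assumptions made for the example.

```python
import os
import tempfile


class IndexedFile:
    """Hypothetical stand-in; the real MWKFile.unindex() may work differently."""

    def __init__(self, directory):
        self.directory = directory

    def _empty_dir(self):
        # Leading underscore marks this as internal. Assumes a flat
        # directory containing only regular files.
        for name in os.listdir(self.directory):
            os.remove(os.path.join(self.directory, name))

    def unindex(self, empty_dir=True):
        # Always drop index files (assumed here to end in ".idx").
        for name in os.listdir(self.directory):
            if name.endswith(".idx"):
                os.remove(os.path.join(self.directory, name))
        # Default of True means callers that pass nothing also get the
        # directory cleared.
        if empty_dir:
            self._empty_dir()


def demo(**kwargs):
    # Build a scratch directory, run unindex(), and report what is left.
    with tempfile.TemporaryDirectory() as d:
        for name in ("data.idx", "extra.tmp"):
            open(os.path.join(d, name), "w").close()
        IndexedFile(d).unindex(**kwargs)
        return sorted(os.listdir(d))


default_result = demo()
explicit_result = demo(empty_dir=False)
```

With the default, callers that never pass empty_dir get the directory cleared; only callers that explicitly pass empty_dir=False keep the remaining files.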

I just committed the changes. They should be in tonight’s nightly build.

Chris