Friday, December 17, 2010

Storing and delivering large files (binary data)

The regular way to store configuration is to persist it to a database. In my case Hibernate is used for that purpose, and the configuration is sent to client computers over HTTP as XML generated by JAXB (Metro implementation). In this post I describe the pros and cons of different ways to persist and send large files.
A file can be stored as a BLOB in the database, as a BFile (Oracle only), or as a reference from a db table to a file saved in the filesystem. In any case, large files must be accessed through streams and never loaded entirely into memory. Let's see what these options are good for:

BLOB:
Pros: Transaction support, uniformity of configuration (all the configuration is stored in the db)
Cons: Bugs in Hibernate (for example, one affecting Oracle) and bugs in JDBC drivers

BFile or Reference from a db table:
Pros: Performance; easier to run scheduled tasks on these files (for example, one that manages disk space and deletes the oldest files)
Cons: Transactions not supported
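
To make the disk space task concrete, here is a minimal sketch of one that deletes the oldest files in a directory until the total size fits a quota. The class name `DiskSpaceManager`, the method `prune` and the quota parameter are made up for illustration; a real task would also have to update or remove the db rows that reference the deleted files, which is exactly where the missing transaction support hurts.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Hibernate or any other library.
final class DiskSpaceManager {

    /**
     * Deletes the oldest regular files in dir until the total size of the
     * remaining files is at most maxBytes. Returns the files that were deleted.
     */
    static List<File> prune(File dir, long maxBytes) {
        File[] listed = dir.listFiles(File::isFile);
        if (listed == null) {
            return new ArrayList<>(); // not a directory, or an I/O error occurred
        }
        List<File> files = new ArrayList<>(Arrays.asList(listed));
        // Oldest first, judged by last-modified time.
        files.sort(Comparator.comparingLong(File::lastModified));

        long total = 0;
        for (File f : files) {
            total += f.length();
        }

        List<File> deleted = new ArrayList<>();
        for (File f : files) {
            if (total <= maxBytes) {
                break;
            }
            long size = f.length();
            if (f.delete()) {
                total -= size;
                deleted.add(f);
            }
        }
        return deleted;
    }
}
```

Such a method could be called from whatever scheduler the application already uses.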

Another issue to consider when deciding to save files in the filesystem is who has permission to edit those files. In summary, if there is no need for disk space management and similar features, I would prefer storing files as BLOBs. Otherwise I would use a reference from a db table. I don't see what additional value BFile gives.
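
Whichever storage option is chosen, the rule about streams stays the same. A minimal sketch of what "access through streams" means in plain Java follows. The class `StreamCopy` is made up for illustration; `java.sql.Blob.getBinaryStream()` is the standard JDBC call, but whether the data is really streamed rather than buffered depends on the JDBC driver, so treat that part as an assumption to verify.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.SQLException;

// Hypothetical helper: moves data in fixed-size chunks, so memory use
// does not grow with the size of the file.
final class StreamCopy {
    private static final int BUFFER_SIZE = 8192;

    /** Copies everything from in to out; returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Same copy, reading from a BLOB obtained through JDBC. */
    static long copyBlob(Blob blob, OutputStream out) throws SQLException, IOException {
        try (InputStream in = blob.getBinaryStream()) {
            return copy(in, out);
        }
    }
}
```

For the reference-from-db-table option, the input stream would come from the file on disk instead, and `out` would typically be the HTTP response stream.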

Delivering the files to clients:
Here the options are: to encode the file's stream (base64 encoding, for instance) into the XML, or to use a callback method, i.e. to send only the id of the file and deliver the data in a follow-up request issued by the client. Both options are discussed below.

Encode stream of a file:
Pros: The configuration is transactional: it is either sent completely or not sent at all
Cons: A JAXB bug when working with DataHandler. Base64 (or any other) encoding inflates the XML (base64 adds about a third to the file's size) and affects performance. XML with a large encoded file inside is big and unreadable
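
To put a number on the size cost: base64 turns every 3 bytes into 4 characters. The sketch below assumes Java 8 or later for `java.util.Base64`, which is not necessarily what the setup described in this post used; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Base64;

// Hypothetical helper for estimating and demonstrating base64 overhead.
final class Base64Overhead {

    /** Length of the padded base64 encoding of rawBytes bytes: 4 * ceil(rawBytes / 3). */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    /**
     * Encodes through a wrapping stream. The sink here is an in-memory buffer
     * only to keep the demo small; for a large file it would be the response stream.
     */
    static byte[] encode(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream encoder = Base64.getEncoder().wrap(sink)) {
            encoder.write(raw);
        }
        return sink.toByteArray();
    }
}
```

By this formula a 3,000,000 byte file becomes 4,000,000 characters inside the XML, before any other markup is counted.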

Callback method:
Pros: Flexibility: the client asks for the file when it needs it. Caching: if the file has not changed since the last update (i.e. the id is the same as the id of the file the client already has), the file is not retrieved (client-side logic)
Cons: Demands more effort from the client. Transactions are more difficult to manage.
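
The client-side caching logic could look roughly like the sketch below. Everything in it is hypothetical: the `Downloader` hook stands for the follow-up HTTP request, and it assumes the server issues a new id whenever the file content changes.

```java
import java.io.IOException;

// Hypothetical client-side logic for the callback method.
final class CallbackClient {

    /** Stand-in for the follow-up request that fetches the file with the given id. */
    interface Downloader {
        void download(String fileId) throws IOException;
    }

    private final Downloader downloader;
    private String currentFileId; // id of the file already held locally, or null

    CallbackClient(Downloader downloader) {
        this.downloader = downloader;
    }

    /**
     * Called when a configuration carrying a file id arrives.
     * Returns true if the file had to be downloaded, false if the cached copy is current.
     */
    boolean onConfiguration(String fileId) throws IOException {
        if (fileId.equals(currentFileId)) {
            return false; // same id: the file has not changed, skip the request
        }
        downloader.download(fileId);
        currentFileId = fileId; // remember the id only after a successful download
        return true;
    }
}
```

Because the id is recorded only after the download succeeds, a failed download is retried on the next configuration update, which covers part of the transaction problem listed under Cons.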

IMHO, for large files the callback mechanism serves better because of its flexibility and performance boost, and because it avoids the annoying JAXB bug that appears when serializing the file into XML.
