24 January 2011 @ 05:48 pm
A few notes on XFS
This entry was updated in response to Mark Callaghan's tests.

The XFS file system is known for letting operations run in parallel. Below is a question-and-answer session from IRC that captures some hard-to-find knowledge and explanations. The short summary is:
  • metadata operations, like modifying ctime, are not concurrent (for the same object)
  • direct I/O data operations are concurrent (you can do several writes to the same file at the same time)
  • for buffered I/O there can be either one writer, or multiple readers
  • extN & co allow readers to run in parallel with writers, which is actually against the fine print in POSIX

CoolCold> dchinner: am i understanding right, that multiple concurrent operations can be done within different allocation groups, but not within a single one?
dchinner<< CoolCold: as long as you are not trying to operate on the same object in different transactions, be it a directory, inode or AG, operations should run concurrently
dchinner> e.g. two creates in the same directory will serialise, but do them in different directories and they will run concurrently
CoolCold> dchinner: thanks, i've been thinking that AGs are something like subfilesystems, so concurrency is available through this
dchinner<< CoolCold: yes, they are - they allow concurrency at the allocation level, but not all operations allocate or free space.
dchinner> Indeed, the biggest concurrency limitation has traditionally been the transaction commit/journalling code, but that's a lot more scalable now with delayed logging....
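
The workload shape dchinner describes (creates spread over different directories instead of piled into one) can be sketched like this. It is only an illustration: the directory names `d0`..`d3`, the file names and the counts are made up, and the code does not measure anything or prove that the creates run concurrently on a given file system.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def create_files(directory, count):
    """Create `count` empty files in `directory`; each create modifies that directory."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        fd = os.open(os.path.join(directory, f"f{i}"),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    return len(os.listdir(directory))


def create_in_parallel(root, ndirs=4, per_dir=50):
    """One worker thread per directory, so no two workers create in the same directory."""
    dirs = [os.path.join(root, f"d{n}") for n in range(ndirs)]  # hypothetical names
    with ThreadPoolExecutor(max_workers=ndirs) as pool:
        return list(pool.map(lambda d: create_files(d, per_dir), dirs))


def demo():
    with tempfile.TemporaryDirectory() as root:
        return create_in_parallel(root)
```

Pointing all workers at one directory instead would be the case that, per the quote above, serialises on the directory.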

Here is extra knowledge after reading Mark's post:

dchinner<< CoolCold: those are discussing two different levels of concurrency.
dchinner> The mytechspam quotes are to do with metadata modifications - you can't have two threads concurrently modify the same object
dchinner> but a direct IO write is a data operation, not a metadata operation.
dchinner> sure, if the dio write requires allocation, it will exclude all other metadata operations to the inode, the AG, etc
dchinner> but that can run in parallel with another dio write to the same inode that has already done all its metadata lookups (e.g. extent mapping) and only needs to issue the data write to disk....
dchinner> i.e. XFS treats data concurrency on an inode (via the iolock) separately to metadata concurrency (via the ilock)
CoolCold> i should update mytechspam entry then, to make things more clear
CoolCold> dchinner: so, if file is already allocated (as innodb usually does, precreate file with specified size) then writes should go in parallel?
CoolCold> but what about file attribute updates, like mtime? is it metadata update? or it is updated only when file handle is closed?
dchinner<< CoolCold: the only serialisation point in direct io writes is the metadata operations. If no metadata needs modifying, then they go in parallel
dchinner> mtime updates are captured when the inode is next written or modified....
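
A minimal sketch of the case discussed above: the file is fully written first (the way InnoDB pre-creates its files), and then several threads overwrite separate blocks in place with O_DIRECT, so no write needs new allocation. The 4096-byte block size, the file name and the block count are assumptions for illustration. The fallback to buffered I/O is only there so the sketch still runs on file systems that reject O_DIRECT. Whether the writes really overlap in time depends on the kernel and file system, and the code does not measure it.

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096   # assumed alignment unit: O_DIRECT wants aligned offset, length and buffer
NBLOCKS = 8    # hypothetical file size, in blocks


def preallocate(path):
    """Write zeros and fsync so every block exists before the parallel writes start."""
    with open(path, "wb") as f:
        f.write(b"\0" * (BLOCK * NBLOCKS))
        f.flush()
        os.fsync(f.fileno())


def write_block(fd, index):
    """Overwrite one block in place; pwrite takes an explicit offset, so threads share no file position."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mmap memory is page-aligned
    try:
        buf.write(bytes([index + 1]) * BLOCK)
        return os.pwrite(fd, buf, index * BLOCK)
    finally:
        buf.close()


def fill_concurrently(path, direct=True):
    """Open the file once and let one thread per block write through the same descriptor."""
    flags = os.O_WRONLY | (getattr(os, "O_DIRECT", 0) if direct else 0)
    fd = os.open(path, flags)
    try:
        with ThreadPoolExecutor(max_workers=NBLOCKS) as pool:
            return list(pool.map(lambda i: write_block(fd, i), range(NBLOCKS)))
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prealloc.dat")  # hypothetical file name
        preallocate(path)
        try:
            written = fill_concurrently(path, direct=True)
        except OSError:
            # Some file systems reject O_DIRECT; fall back so the sketch still runs.
            written = fill_concurrently(path, direct=False)
        with open(path, "rb") as f:
            data = f.read()
    expected = b"".join(bytes([i + 1]) * BLOCK for i in range(NBLOCKS))
    return written == [BLOCK] * NBLOCKS and data == expected
```

If the file were extended instead of overwritten, each write would need allocation, which is the metadata path that dchinner says serialises.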

CoolCold> dchinner: mm...sorry, i do not completely understand "11:21:49 < dchinner> mtime updates are captured when the inode is next written or modified...." - if i write at least one byte into the file, the inode is modified, so a metadata update should happen?
CoolCold> and does concurrency apply only to O_DIRECT operations?
dchinner<< CoolCold: the inode is locked very briefly for things like mtime updates. They don't require a transaction during a write, so very little overhead/serialisation occurs.

hch> note that writes don't update the mtime
hch> only the ctime
hch> but yes, the actual ctime update won't happen concurrently
hch> but it's just a tiny critical section compared to the actual direct I/O write
CoolCold> hch: and what about O_DIRECT and non-directed operations? are there any additional concurrency limits for non-direct?
hch<< CoolCold: for buffered I/O there can be either one writer, or multiple readers
hch<< CoolCold: don't mix buffered and direct I/O - it's just going to cause problems
hch> note that extN & co allow readers in parallel to writers
hch> which actually is against the fine print in posix
CoolCold> hch: oh, didn't know it is regulated by POSIX, thanks a lot
hch<< CoolCold: only for buffered I/O
hch> O_DIRECT is a rather underspecified extension
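
The "either one writer, or multiple readers" rule hch gives for buffered I/O is the behaviour of a shared/exclusive lock. The toy model below is not XFS code and the class and function names are invented. It only shows the rule itself: readers take the lock shared and can all be inside together, writers take it exclusive and are alone.

```python
import threading
import time


class SharedExclusiveLock:
    """Toy shared/exclusive lock: many readers or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def demo(n_readers=3, n_writers=2):
    lock = SharedExclusiveLock()
    guard = threading.Lock()  # protects the bookkeeping below
    state = {"readers": 0, "writers": 0, "max_readers": 0, "max_writers": 0, "mixed": False}
    meet = threading.Barrier(n_readers)

    def reader():
        lock.acquire_shared()
        try:
            with guard:
                state["readers"] += 1
                state["max_readers"] = max(state["max_readers"], state["readers"])
                state["mixed"] |= state["writers"] > 0
            meet.wait(timeout=5)  # passes only if all readers hold the lock together
            with guard:
                state["readers"] -= 1
        finally:
            lock.release_shared()

    def writer():
        lock.acquire_exclusive()
        try:
            with guard:
                state["writers"] += 1
                state["max_writers"] = max(state["max_writers"], state["writers"])
                state["mixed"] |= state["readers"] > 0
            time.sleep(0.01)  # stand-in for the actual write
            with guard:
                state["writers"] -= 1
        finally:
            lock.release_exclusive()

    threads = [threading.Thread(target=reader) for _ in range(n_readers)]
    threads += [threading.Thread(target=writer) for _ in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["max_readers"], state["max_writers"], state["mixed"]
```

With the defaults, `demo()` reports three readers inside at once, never more than one writer, and no moment where a reader and a writer held the lock together.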