
[Experience sharing] GlusterFS hacker guide (3)


Posted on 2019-2-1 10:49:01
1 GlusterFS Translator API 2
1.1   Introduction

  Before we dive into the specifics of the API, it's important to understand two things about the context in which it exists. The first is that it's a filesystem API, exposing most functionality through a dispatch table. In GlusterFS this is xlator_fops, which sort of combines Linux's file_operations, inode_operations, and super_operations (all in fs.h) in one place. To understand how translators work, you'll first need to understand what these calls do and how they [...]
  The second essential aspect of the translator API is that it's asynchronous and callback-based. What this means is that your code for handling a particular request must be broken up into two parts which get to see the request before and after the next translator. In other words, your dispatch (first-half) function calls the next translator's dispatch function and then returns without blocking. Your callback (second-half function) might be called immediately when you call the next translator's dispatch function, or it might be called some time later from a completely different thread (most often the network transport's polling thread). In neither case can the callback just pick up its context from the stack as would be the case with a synchronous API. GlusterFS does provide several ways to preserve and pass context between the dispatch function and its callback, but it's fundamentally something you have to deal with yourself instead of [...]
1.2   Dispatch Tables and Default Functions
  The main dispatch table for a translator is always called fops (the translator loading code specifically looks up this name using dlsym) which contains pointers to all the "normal" filesystem functions. You only need to fill in the fields for functions that your particular translator cares about. Any others will be filled in at runtime with default values that just pass the request straight on to the next translator, specifying a callback that just passes the result back to the previous translator.
  In addition to providing this default functionality, these default functions and callbacks serve another useful purpose. Any time you need to add a new function to your translators, the easiest way to start is to copy and rename the consistently-named default function for that same operation - e.g. default_open for open, default_truncate for truncate, etc. This ensures that you start with the correct argument list and a reasonable kind of default behavior. Just make sure you update your fops table to point to your copy.
  When you copy and rename a default function, your copy will often use the default callback as well (e.g. default_open will refer to default_open_cbk). Often this will be exactly what you need; if you do all of your work before passing the request onward, you might not need to do anything at all in the callback and might as well use the default one. Even when that's not the case, copying and renaming the default callback works just as well as copying and renaming the dispatch function to ensure the correct argument list and so on.
  Each translator may also have additional dispatch tables, including a table named cbk which is used to manage inode and file descriptor lifecycles; see the section on inode and file descriptor context for more details.
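  To make the "fill in the blanks at runtime" idea concrete, here is a minimal sketch. It is not the GlusterFS loader: every toy_* name is hypothetical, the table has only two slots, and the calls are synchronous (no callbacks) so that only the default-filling step is on display.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are real GlusterFS types. */
typedef struct toy_xlator toy_xlator_t;
typedef int (*toy_fop_t)(toy_xlator_t *this, int arg);

enum { TOY_OPEN, TOY_TRUNCATE, TOY_FOP_COUNT };

struct toy_xlator {
    const char   *name;
    toy_fop_t     fops[TOY_FOP_COUNT]; /* NULL means "not implemented here" */
    toy_xlator_t *next;                /* next translator down the stack */
};

/* Defaults: hand the request straight to the next translator. */
int toy_default_open(toy_xlator_t *this, int arg)
{
    return this->next->fops[TOY_OPEN](this->next, arg);
}

int toy_default_truncate(toy_xlator_t *this, int arg)
{
    return this->next->fops[TOY_TRUNCATE](this->next, arg);
}

/* What a loader might do after finding the table: fill every empty slot. */
void toy_fill_defaults(toy_xlator_t *xl)
{
    static const toy_fop_t defaults[TOY_FOP_COUNT] = {
        [TOY_OPEN]     = toy_default_open,
        [TOY_TRUNCATE] = toy_default_truncate,
    };

    assert(xl != NULL);
    for (size_t i = 0; i < TOY_FOP_COUNT; i++) {
        if (xl->fops[i] == NULL)
            xl->fops[i] = defaults[i];
    }
}

/* Bottom of the stack: pretends to perform the operation. */
int toy_posix_open(toy_xlator_t *this, int arg)     { (void)this; return arg + 1; }
int toy_posix_truncate(toy_xlator_t *this, int arg) { (void)this; return arg + 2; }

/* A translator that only cares about open: it doubles the argument first. */
int toy_my_open(toy_xlator_t *this, int arg)
{
    return this->next->fops[TOY_OPEN](this->next, arg * 2);
}
```

  A translator that sets only the open slot and then goes through toy_fill_defaults ends up with toy_default_truncate in the truncate slot, so truncate requests pass through it untouched.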
1.3   STACK_WIND and STACK_UNWIND
  The main functions that implement the callback-based translator API are called STACK_WIND and STACK_UNWIND. These operate not on the usual call stack as you'd see in gdb, but on a separately maintained stack of frames representing calls to translators. When your fops entry point gets called, that call represents a request on its way from FUSE on the client to a local filesystem on the server. Your entry point can do whatever processing it wants, then pass the request on to the next translator along that path using STACK_WIND. The arguments are as follows:
  ·        frame: the "stack frame" representing the request
  ·        rfn: the callback function to call when the other translator is done with the request
  ·        obj: the translator object to which you're yielding control
  ·        fn: the specific translator function you're calling, from the next translator's fops table
  ·        params...: any other entry-point-specific arguments (e.g. inodes, file descriptors, offsets, data buffers) for this call
  As mentioned in the previous section, your "rfn" callback might be invoked from within the STACK_WIND call, or it might be invoked later in a different context. To complete a request without invoking the next translator (e.g. returning data from cache), or to pass it back to the previous one from your callback when it's done, you use STACK_UNWIND. Actually, you're better off using STACK_UNWIND_STRICT, which allows you to specify what kind of request you're completing. The arguments are:
  ·        op: the type of operation (e.g. open, rename) which is used to check that the set of additional parameters matches expectations for that operation
  ·        params...: additional parameters according to the type of request
  In practice, almost all of the request types use two additional parameters between op and params, even though these aren't apparent in the macro definition:
  ·        op_ret: the status of the operation so far (sometimes a count of bytes read or written, more often just zero to indicate success or -1 to indicate failure)
  ·        op_errno: a standard error code (e.g. EPERM) if the operation failed
  The specific arguments used by each dispatch function and its associated callback are operation-specific, but you can always count on the first few arguments to a dispatch function being as follows:
  ·        frame: the "stack frame" for the current request
  ·        this: the translator object representing this instance of your translator
  Callbacks are similar, except that there's an additional argument between those two. This is the "cookie" which is an opaque pointer stored by the matching STACK_WIND. By default this is a pointer to the stack frame that was created by the STACK_WIND (which doesn't seem terribly useful) but there's also a STACK_WIND_COOKIE call that allows you to specify a different value. In this case, the extra argument comes between the rfn and obj arguments to STACK_WIND, and can be used to pass some context from the dispatch function to its callback. Note that this must not be a pointer to anything on the stack, because the stack might be gone by the time the callback is invoked.
  One other important note: STACK_UNWIND might cause the entire call stack to be unwound, at which point the last call will free all of its frames. For this reason, you should never do anything that might require even the current frame to be intact after calling STACK_UNWIND.
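  The wind/unwind mechanics can be sketched with a toy frame stack. This is not how the real macros are written; toy_wind, toy_unwind and the other toy_* names are made up for illustration, the cookie is always passed explicitly (as with STACK_WIND_COOKIE), and the leaf translator completes immediately, which is only one of the two timings described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef struct toy_frame toy_frame_t;
typedef int (*toy_cbk_t)(toy_frame_t *frame, void *cookie, int op_ret, int op_errno);
typedef int (*toy_fop_t)(toy_frame_t *frame, int arg);

struct toy_frame {
    toy_frame_t *parent; /* frame of the translator that wound to us */
    toy_cbk_t    ret;    /* that translator's callback (the "rfn") */
    void        *cookie; /* opaque value handed back to the callback */
};

/* Loose analogue of STACK_WIND_COOKIE: push a child frame, call the next fop. */
int toy_wind(toy_frame_t *frame, toy_cbk_t rfn, void *cookie, toy_fop_t fn, int arg)
{
    toy_frame_t *child;

    assert(frame != NULL && rfn != NULL && fn != NULL);
    child = calloc(1, sizeof(*child));
    if (child == NULL)
        return rfn(frame, cookie, -1, ENOMEM);
    child->parent = frame;
    child->ret = rfn;
    child->cookie = cookie;
    return fn(child, arg);
}

/* Loose analogue of STACK_UNWIND: pop our frame, call the caller's callback. */
int toy_unwind(toy_frame_t *frame, int op_ret, int op_errno)
{
    toy_frame_t *parent = frame->parent;
    toy_cbk_t    rfn = frame->ret;
    void        *cookie = frame->cookie;

    free(frame); /* our frame is gone; nothing below may touch it */
    return rfn(parent, cookie, op_ret, op_errno);
}

int   toy_last_result;   /* written by the callback, for inspection */
void *toy_last_cookie;
char  toy_tag[] = "mid"; /* static storage, so it is safe to use as a cookie */

/* Leaf translator: completes the request without winding any further. */
int toy_leaf_fop(toy_frame_t *frame, int arg)
{
    return toy_unwind(frame, arg * 2, 0);
}

/* Second half of the middle translator. */
int toy_mid_cbk(toy_frame_t *frame, void *cookie, int op_ret, int op_errno)
{
    (void)frame;
    (void)op_errno;
    toy_last_result = op_ret;
    toy_last_cookie = cookie;
    return 0;
}

/* First half of the middle translator. */
int toy_mid_fop(toy_frame_t *frame, int arg)
{
    return toy_wind(frame, toy_mid_cbk, toy_tag, toy_leaf_fop, arg + 1);
}
```

  Note that toy_unwind copies everything it needs out of the frame before freeing it, which mirrors the rule that nothing may depend on the current frame after an unwind.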
1.4   Per Request Context
  Part of each translator-stack frame is a "local" pointer which is used to store translator-specific context. This is the primary mechanism for saving context between your dispatch function and its callback, so you might as well get used to the following pattern:
  /* in dispatch function */
  local = (my_locals_t *)GF_CALLOC(1, sizeof(*local), ...);
  if (!local) {
          /* STACK_UNWIND with ENOMEM error */
  }
  /* fill in my_locals_t fields */
  frame->local = local;
  /* in callback */
  local = frame->local;

  The important thing to remember is that every frame's local field will be passed to GF_FREE if it's non-NULL when the stack is destroyed, but no other cleanup will be done. If your own local structure contains pointers or references to other objects, then you'll need to take care of those yourself. It would also be nice if memory (and other resources) could be freed before the stack is destroyed, so it's best not to [...]
  void my_destructor (call_frame_t *frame)
  {
          my_own_cleanup(frame->local);
          GF_FREE(frame->local);
          /* Make sure STACK_DESTROY doesn't double-free it. */
          frame->local = NULL;
  }
  It would be nice if the call_frame_t structure held a pointer to the destructor and invoked it automatically from STACK_UNWIND, and if local structures were handled more efficiently than by requiring two trips through the glibc memory allocator per translator, but that's not the world we live in.
1.5   Inode and File Descriptor Context
  Most dispatch functions and callbacks take either a file descriptor (fd_t) or an inode (inode_t) as an argument. Often, your translator might need to store some of its own context on these objects, in a way that persists beyond the lifetime of a single request. For example, DHT stores layout maps for directories and last known locations on inodes. There's a whole set of functions for storing this kind of context. In each case, the second argument is a pointer to the translator object with which values are being associated, and the values are unsigned 64-bit integers. They all return zero for success, using reference parameters instead of return values for the _get and _del functions.
  ·        inode_ctx_put (inode, xlator, value) /* NB: put, not set*/
  ·        inode_ctx_get (inode, xlator, &value)
  ·        inode_ctx_del (inode, xlator, &value)
  ·        fd_ctx_set (fd, xlator, value) /* NB: set, not put */
  ·        fd_ctx_get (fd, xlator, &value)
  ·        fd_ctx_del (fd, xlator, &value)
  The _del functions are really "destructive gets" which both return and delete values. Also, the inode functions have two-value forms (e.g. inode_ctx_put2) which allow manipulation of two values per translator instead of one.
  The use of a translator-object pointer as a key/index for these calls is not merely cosmetic. When an inode_t or fd_t is being deleted, the delete code looks through the context slots. For each one that's used, it looks in the translator's cbk dispatch table and calls its forget entry point for inodes or release entry point for file descriptors. If the context is a pointer, this is your chance to free it and any other associated resources.
  Lastly, it's important to remember that an inode_t or fd_t pointer passed to a dispatch function or callback represents only a borrowed reference. If you want to be sure that object is still around later, you need to call inode_ref or fd_ref to add a permanent reference, and then call inode_unref or fd_unref when the reference is no longer needed.
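  The semantics of translator-keyed context slots, including the "destructive get", can be modeled in a few lines. This is only a toy under stated assumptions: toy_inode_t, toy_ctx_put and friends are hypothetical names, the fixed slot count and the -1 failure code are choices made for the sketch, and the real implementation also handles locking and the forget/release hooks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SLOTS 4 /* arbitrary limit for the sketch */

typedef struct { const char *name; } toy_xlator_t;

typedef struct {
    struct {
        toy_xlator_t *key;   /* owning translator, NULL if the slot is free */
        uint64_t      value;
    } slot[TOY_MAX_SLOTS];
} toy_inode_t;

/* Store (or overwrite) xl's value. Returns 0 on success, -1 if no slot is free. */
int toy_ctx_put(toy_inode_t *inode, toy_xlator_t *xl, uint64_t value)
{
    int free_idx = -1;

    assert(inode != NULL && xl != NULL);
    for (int i = 0; i < TOY_MAX_SLOTS; i++) {
        if (inode->slot[i].key == xl) {
            inode->slot[i].value = value;
            return 0;
        }
        if (inode->slot[i].key == NULL && free_idx < 0)
            free_idx = i;
    }
    if (free_idx < 0)
        return -1;
    inode->slot[free_idx].key = xl;
    inode->slot[free_idx].value = value;
    return 0;
}

/* Fetch xl's value through a reference parameter. Returns 0 if found. */
int toy_ctx_get(toy_inode_t *inode, toy_xlator_t *xl, uint64_t *value)
{
    assert(inode != NULL && xl != NULL && value != NULL);
    for (int i = 0; i < TOY_MAX_SLOTS; i++) {
        if (inode->slot[i].key == xl) {
            *value = inode->slot[i].value;
            return 0;
        }
    }
    return -1;
}

/* "Destructive get": hand back the value and free the slot. */
int toy_ctx_del(toy_inode_t *inode, toy_xlator_t *xl, uint64_t *value)
{
    assert(inode != NULL && xl != NULL && value != NULL);
    for (int i = 0; i < TOY_MAX_SLOTS; i++) {
        if (inode->slot[i].key == xl) {
            *value = inode->slot[i].value;
            inode->slot[i].key = NULL;
            inode->slot[i].value = 0;
            return 0;
        }
    }
    return -1;
}
```

  Because the key is the translator pointer, two translators can each keep a value on the same inode without knowing about one another.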
1.6   Dictionaries and Translator Options
  Another common argument type is a dict_t, which is a sort of generic dictionary or hash-map data structure capable of holding arbitrary values associated with string-valued keys. For example, values might be various sizes of signed or unsigned integers, strings, or binary blobs. Strings and binary blobs might be marked to be freed with GlusterFS functions when no longer needed, to be freed with glibc functions, or not to be freed at all. Both the dict_t and the data_t objects that hold values are reference-counted and destroyed only when their reference counts reach zero. As with inodes and file descriptors, if you want to make sure that a dict_t you received as an argument will be around later, you need to add _ref and _unref calls to manage its lifecycle appropriately.
  Dictionaries are not only used as dispatch function and callback arguments. They are also used to pass options to various modules, including the options for your translator's init function. In fact, the bodies of existing translators' init functions are often mostly consumed with interpreting options contained in dictionaries. To add an option for your translator, you also need to add an entry in your translator's options array (another of those names that the translator-loading code looks up with dlsym). Each option can be a boolean, an integer, a string, a path, a translator name, or any of several other specialized types you can find by looking for GF_OPTION_TYPE_ in the code. If it's a string, you can even specify a list of valid values. The parsed options, plus any other information that's translator-wide, can be stored in a structure using the opaque private pointer in the xlator_t structure (usually this in most contexts).
1.7   Logging
  Most logging in translators is done using the gf_log function. This takes as arguments a string (generally this->name for translator code), a log level, a vsprintf-style format, and possibly additional arguments according to the format. Commonly used log levels include GF_LOG_ERROR, GF_LOG_WARNING, and GF_LOG_DEBUG. It's often useful to define your own macros which wrap gf_log, or your own levels which map to the official ones, so that the level of debug information coming out of your translator can be adjusted at run time. In the simplest case, this might mean tweaking the variables in gdb. If you're feeling a bit more ambitious, you can add a translator option for the debug level (several of the base translators do this). If you're feeling really ambitious, you can implement a "magic" xattr call to pass in new values to a running translator.
1.8   Child Enumeration and Fan Out

  One common pattern in translators is to enumerate its children, either to match the one that meets some criterion or to operate on all of them. For example, DHT needs to gather hash-layout "maps" from all of its children to determine where files should go; AFR needs to fetch pending operation counts for the same file from children to determine replication status. The [...]
  xlator_list_t *trav;
  xlator_t *xl;
  for (trav = this->children; trav; trav = trav->next) {
          xl = trav->xlator;
          do_something(xl);
  }
  If the goal is to "fan out" a request to each child, some additional gyrations are necessary. The most common approach is to do something like this in the original dispatch function:
  local->call_count = priv->num_children;
  for (trav = this->children; trav; trav = trav->next) {
          xl = trav->xlator;
          STACK_WIND(frame, my_callback, xl, xl->fops->whatever, ...);
  }
  Then, in my_callback:
  LOCK(&frame->lock);
  call_cnt = --local->call_count;
  UNLOCK(&frame->lock);
  /* Do whatever you do for every call */
  if (!call_cnt) {
          /* Do last-call processing. */
          STACK_UNWIND(frame, op_ret, op_errno, ...);
  }
  return 0;
  In some cases, you can also use STACK_WIND_COOKIE to let each callback know which of N calls has returned. Examples of this are legion in the AFR code.
1.9   Stubs and sync calls
1.9.1   GlusterFS Algorithms: Distribution
  A lot of people seem to be curious about how GlusterFS works, not just in the sense of effects but in terms of internal algorithms etc. as well. Here’s an example from this morning. The documentation at this level really is kind of sparse, so I might as well start filling some of the gaps. Today I’ll talk about DHT, which is the real core of how GlusterFS aggregates capacity and performance across multiple servers. Its responsibility is to place each file on exactly one of its subvolumes - unlike either replication (which places copies on all of its subvolumes) or striping (which places pieces onto all of its subvolumes). It’s a routing function, not splitting or copying.
  The basic method used in DHT is consistent hashing. Each subvolume (brick) is assigned a range within a 32-bit hash space, covering the entire range with no holes or overlaps. Then each file is also assigned a value in that same space, by hashing its name. Exactly one brick will have an assigned range including the file’s hash value, and so the file “should” be on that brick. However, there are many cases where that won’t be the case, such as when the set of bricks (and therefore the assignment of ranges) has changed since the file was created, or when a brick is nearly full. Much of the complexity in DHT involves these special cases, which we’ll discuss in a moment. First, though, it’s worth making a couple more observations about the basic algorithm.

  ·        The assignment of hash ranges to bricks is determined by extended attributes stored on directories (here’s a description of those data structures). This means the distribution is directory-specific. You could well distribute files differently - e.g. across different sets of bricks - in different directories if you know what you’re doing, but it’s quite unsafe. Firstly it’s unsafe because you’d really better know what you’re doing. Secondly it’s unsafe because there’s no management support for this, so the next time you do a rebalance (more about that later) it will happily stomp on your carefully hand-crafted xattrs. In the fairly near future, I hope to add a feature to recognize hand-set xattrs as such and leave them alone. In the more distant future, there might be management support for assigning bricks to various pools or [...]
  ·        Consistent hashing is usually thought of as hashing around a circle, but in GlusterFS it’s more linear. There’s no need to “wrap around” at zero, because there’s always a break (between one brick’s range and another’s) at zero.
  ·        If a brick is missing, there will be a hole in the hash space. Even worse, if hash ranges are reassigned while a brick is offline, some of the new ranges might overlap with the (now out of date) range stored on that brick, creating a bit of confusion about where files should be. GlusterFS tries really hard to avoid these problems, but it also checks aggressively to make sure nothing slips through. If you ever see messages in your logs about holes or overlaps, that means the checking code is doing its job.
  So, those are the basics. How about those special cases? It’s probably easiest to look at the “read” path first, where we’re trying to find a file that we expect to be there. Here’s the sequence of operations.
  1.  Make sure we have the hash-range assignments (the “layout”) for each brick’s copy of the parent directory. This information is cached, so we’ll usually have it already.
  2.  Hash the file name and look up the corresponding brick in the layout.
  3.  Send a LOOKUP request to that brick, specifying the file path.
  4.  If the LOOKUP comes back positive, we found the file and we’re done.
  5.  Otherwise, re-send the LOOKUP to all bricks to see who really has the file.
  6.  If nobody gives a positive reply, the file really isn’t there and we’re done again.
  7.  Go back to the brick where the file “should” have been, and create a link file (described below) pointing to the real location.
  8.  Return the LOOKUP result to the caller.
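  Step 2 of the sequence above (hash the name, then find the brick whose range contains the hash) can be sketched as follows. The layout structure and the toy_* names are hypothetical, and the hash is FNV-1a, chosen only because it is short; it is not the hash function DHT uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One brick's share of the 32-bit hash space. */
typedef struct {
    uint32_t start; /* inclusive */
    uint32_t stop;  /* inclusive */
    int      brick; /* index of the owning brick */
} toy_range_t;

/* Placeholder hash (FNV-1a). NOT the hash GlusterFS uses. */
uint32_t toy_hash(const char *name)
{
    uint32_t h = 2166136261u;

    assert(name != NULL);
    for (; *name != '\0'; name++) {
        h ^= (unsigned char)*name;
        h *= 16777619u;
    }
    return h;
}

/* Returns the brick whose range contains hash, or -1 if it falls in a hole. */
int toy_layout_search(const toy_range_t *layout, size_t n, uint32_t hash)
{
    assert(layout != NULL);
    for (size_t i = 0; i < n; i++) {
        if (hash >= layout[i].start && hash <= layout[i].stop)
            return layout[i].brick;
    }
    return -1;
}
```

  A return of -1 corresponds to the "hole in the hash space" case mentioned earlier, for example when a brick's range is missing from the layout.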
  What’s a link file, then? Have you ever looked on one of your bricks and seen zero-length files with weird permissions (sticky bit set)? Those are link files. If you look closer, you’ll also see that they have trusted.dht.linkfile xattrs with brick names in them. That’s how we avoid the “broadcast” mentioned above. On subsequent lookups, if we find a link file we just follow it to the real brick. Considering that we only go through this lookup procedure once per file per client anyway (location information is cached), the cost of “guessing wrong” is therefore pretty minimal. I once implemented a scheme where we do an exponentially expanding search instead of an immediate broadcast, hoping to achieve a better balance of lookup latency vs. network traffic, but in the end it just didn’t seem to make a difference so the extra complexity wasn’t worth it. Now, let’s look at the file-creation path.
  1.  Assume we’ve already done a lookup, so we already have the layout information cached and we know the file doesn’t already exist anywhere.
  2.  Hash the file name and look up the corresponding brick in the layout.
  3.  If that brick is full, choose another brick (doesn’t really matter how) that isn’t instead.
  4.  Send a CREATE request to the chosen brick for that file.
  5.  If we “diverted” because of a full brick, go back and add a link file to the brick chosen by pure hashing. The next client will almost certainly need it.
  This brings us to rebalancing, which is one of the key challenges - and therefore one of the most interesting research areas IMO - in this kind of system. The first thing to know about GlusterFS rebalancing is that it’s not automatic. If you add a new brick, even new files won’t be put on it until you do the “fix-layout” part of rebalance, and old files won’t be put on it until you do the “migrate-data” part. What do these do?
  ·        Fix-layout just walks the directory tree recalculating and modifying the trusted.glusterfs.dht xattrs to reflect the new list of bricks. It does this in a pretty simple way, assigning exactly one range of length MAXINT/nbricks to each brick in turn starting at zero.
  ·        Migrate-data is much more costly. For each file, it calculates where the file “should” be according to the new layout. Then, if the file is not already there, it moves the file by copying and renaming over the original. There’s some tricky code to make sure this is transparent and atomic and so forth, but that’s the algorithm.
  In my personal opinion, there are problems (or rather, enhancement opportunities) in both of these areas. Let’s take these in reverse order. Migrate-data is slow. What it should do is run in parallel on all of the bricks, with each brick either “pushing” data that is currently local but needs to be elsewhere or “pulling” data that’s the other way around. What it does instead is run on one node, potentially moving files for which it is neither source nor destination. This is a big deal, because it causes rebalance to take days when it should take hours - or weeks when it should take days, on larger installations. The amount of I/O involved is also why you don’t necessarily want this to be an automatic process.

  While the migrate-data issue is at the level of mechanics and implementation, the fix-layout issue is at more of a conceptual level. To put it simply, when we add a new brick we should reallocate approximately 1/new_brick_count hash values. Because our layout calculations are naive, we will usually reallocate much more than that - exacerbating the migrate-data problem because reallocated hash values correspond to moved files. Time for a picture.
  The outer ring represents the state with just three bricks - hash value zero at the top, split into three equal ranges. The inner ring represents the state after adding a fourth brick. Any place where the inner and outer rings are different colors represents a range that has been reassigned from one brick to another - implying a migration of data likewise. If you look carefully, you’ll see that we’re moving half of the data when it should be only a quarter - 8% blue to orange, 17% orange to yellow, and 25% yellow to green. What could we do that’s better? Not much, if we stay within the limitation of a single brick claiming a single range, but there really doesn’t need to be such a limitation. Instead, we could borrow from Dynamo and assign multiple “virtual node [...]
  That’s how DHT works today, and some thoughts about how it might evolve in the future. The next article will focus on replication.
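  The "half instead of a quarter" figure for going from three bricks to four can be checked with a small model of the naive equal-range layout. The sketch below uses a scaled-down hash space (for instance 1200 values, divisible by both 3 and 4) so the counts come out exact; toy_owner and toy_moved are hypothetical helpers, not fix-layout code.

```c
#include <assert.h>
#include <stdint.h>

/* Brick owning hash value h when `space` values are cut into n equal
 * ranges starting at zero; the last brick absorbs any rounding remainder. */
int toy_owner(uint64_t h, uint64_t space, int n)
{
    uint64_t chunk, idx;

    assert(n > 0 && (uint64_t)n <= space && h < space);
    chunk = space / (uint64_t)n;
    idx = h / chunk;
    return idx >= (uint64_t)n ? n - 1 : (int)idx;
}

/* Number of hash values whose owner changes between the two layouts. */
uint64_t toy_moved(uint64_t space, int old_n, int new_n)
{
    uint64_t moved = 0;

    for (uint64_t h = 0; h < space; h++) {
        if (toy_owner(h, space, old_n) != toy_owner(h, space, new_n))
            moved++;
    }
    return moved;
}
```

  With a space of 1200, the three-brick layout uses ranges of 400 and the four-brick layout ranges of 300. The values 300-399, 600-799 and 900-1199 change owner, which is 100 + 200 + 300 = 600 of 1200: the 8%, 17% and 25% from the text, adding up to half, where an ideal scheme would move only 300.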
  ---------------
1.9.2  * cluster/replicate
  ---------------
  Before understanding replicate, one must understand two internal FOPs:
  GF_FILE_LK:
  This is exactly like fcntl(2) locking, except the locks are in a
  separate domain from locks held by applications.
  GF_DIR_LK (loc_t *loc, char *basename):
  This allows one to lock a name under a directory. For example,
  to lock /mnt/glusterfs/foo, one would use the call:
  GF_DIR_LK ({loc_t for "/mnt/glusterfs"}, "foo")
  If one wishes to lock *all* the names under a particular directory,
  supply the basename argument as NULL.
  The locks can either be read locks or write locks; consult the
  function prototype for more details.
  Both these operations are implemented by the features/locks (earlier
  known as posix-locks) translator.
  --------------
1.9.3  * Basic design
  --------------

  All FOPs can be [...]
  - inode-read
  Operations that read an inode's data (file contents) or metadata (perms, etc.).
  access, getxattr, fstat, readlink, readv, stat.
  - inode-write
  Operations that modify an inode's data or metadata.
  chmod, chown, truncate, writev, utimens.
  - dir-read
  Operations that read a directory's contents or metadata.
  readdir, getdents, checksum.
  - dir-write
  Operations that modify a directory's contents or metadata.
  create, link, mkdir, mknod, rename, rmdir, symlink, unlink.
  Some of these make a subgroup in that they modify *two* different entries:
  link, rename, symlink.
  - Others
  Other operations.
  flush, lookup, open, opendir, statfs.
  ------------
1.9.4  * Algorithms
  ------------
  Each of the four major groups has its own algorithm:
  ----------------------
  - inode-read, dir-read
  ----------------------
  = Send a request to the first child that is up:
  - if it fails:
  try the next available child
  - if we have exhausted all children:
  return failure
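  The read-side failover above amounts to a loop over the children. Here is a synchronous sketch with hypothetical toy_* names; the real translator does this through wind/unwind callbacks rather than direct calls, and the function-pointer "child" is just a stand-in for a subvolume.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int up;                             /* non-zero if the child is reachable */
    int (*read_fop)(int arg, int *out); /* returns 0 on success, -1 on failure */
} toy_child_t;

/* Try each child that is up, in order, and stop at the first success.
 * Returns the index of the child that served the read, or -1 if every
 * available child failed. */
int toy_read_failover(const toy_child_t *children, size_t n, int arg, int *out)
{
    assert(children != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!children[i].up)
            continue; /* skip children known to be down */
        if (children[i].read_fop(arg, out) == 0)
            return (int)i;
        /* this child failed: fall through to the next available one */
    }
    return -1; /* exhausted all children */
}

/* Two stand-in children for trying it out. */
int toy_read_ok(int arg, int *out)   { *out = arg * 10; return 0; }
int toy_read_fail(int arg, int *out) { (void)arg; (void)out; return -1; }
```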
  -------------
  - inode-write
  -------------
  All operations are done in parallel unless specified otherwise.
  (1) Send a GF_FILE_LK request on all children for a write lock on
  the appropriate region
  (for metadata operations: entire file (0, 0)
  for writev: (offset, offset+size of buffer))
  - If a lock request fails on a child:
    unlock all children
    try to acquire a blocking lock (F_SETLKW) on each child, serially.
    If this fails (due to ENOTCONN or EINVAL):
      Consider this child as dead for rest of transaction.
  (2) Mark all children as "pending" on all (alive) children
  (see below for meaning of "pending").
  - If it fails on any child:
    mark it as dead (in transaction local state).
  (3) Perform operation on all (alive) children.
  - If it fails on any child:
    mark it as dead (in transaction local state).
  (4) Mark all successful children as no longer "pending" on all nodes.
  (5) Unlock region on all (alive) children.
  -----------
  - dir-write
  -----------
  The algorithm for dir-write is same as above except instead of holding
  GF_FILE_LK locks we hold a GF_DIR_LK lock on the name being operated upon.
  In case of link-type calls, we hold locks on both the operand names.
  -----------
1.9.5  * "pending"
  -----------
  The "pending" number is like a journal entry. A pending entry is an
  array of 32-bit integers stored in network byte-order as the extended
  attribute of an inode (which can be a directory as well).
  There are three keys corresponding to three types of pending operations:
  - AFR_METADATA_PENDING
  There are some metadata operations pending on this inode (perms, ctime/mtime,
  xattr, etc.).
  - AFR_DATA_PENDING
  There is some data pending on this inode (writev).
  - AFR_ENTRY_PENDING
  There are some directory operations pending on this directory
  (create, unlink, etc.).
  -----------
1.9.6  * Self heal
  -----------
  - On lookup, gather extended attribute data:
    - If entry is a regular file:
      - If an entry is present on one child and not on others:
        - create entry on others.
      - If entries exist but have different metadata (perms, etc.):
        - consider the entry with the highest AFR_METADATA_PENDING number as
          definitive and replicate its attributes on children.
    - If entry is a directory:
      - Consider the entry with the highest AFR_ENTRY_PENDING number as
        definitive and replicate its contents on all children.
    - If any two entries have non-matching types (i.e., one is file and
      other is directory):
      - Announce to the user via log that a split-brain situation has been
        detected, and do nothing.
  - On open, gather extended attribute data:
    - Consider the file with the highest AFR_DATA_PENDING number as
      the definitive one and replicate its contents on all other
      children.
  During all self heal operations, appropriate locks must be held on all
  regions/entries being affected.
  ---------------
1.9.7  * Inode scaling
  ---------------
  Inode scaling is necessary because if a situation arises where:
  - An inode number is returned for a directory (by lookup) which was
  previously the inode number of a file (as per FUSE's table), then
  FUSE gets horribly confused (consult a FUSE expert for more details).
  To avoid such a situation, we distribute the 64-bit inode space equally
  among all children of replicate.
  To illustrate:
  If c1, c2, c3 are children of replicate, they each get 1/3 of the available
  inode space:
  Child:        c1   c2   c3   c1   c2   c3   c1   c2   c3   c1   c2 ...
  Inode number: 1    2    3    4    5    6    7    8    9    10   11 ...
  Thus, if lookup on c1 returns an inode number "2", it is scaled to "4"
  (which is the second inode number in c1's space).
  This way we ensure that there is never a collision of inode numbers from
  two different children.
  This reduction of inode space doesn't really reduce the usability of
  replicate since even if we assume replicate has 1024 children (which would be a
  highly unusual scenario), each child still has a 54-bit inode space.
  2^54 ~ 1.8 * 10^16
  which is much larger than any real world requirement.
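  The interleaving shown in the table can be written as one line of arithmetic. This is a sketch derived from the table and the worked example above (c1's inode 2 becomes 4), not code taken from replicate; toy_scale_ino is a hypothetical name and the child index is assumed to be zero-based (c1 is 0, c2 is 1, and so on).

```c
#include <assert.h>
#include <stdint.h>

/* Map a child's own inode number (1, 2, 3, ...) into that child's slice of
 * the shared inode space. child_index is zero-based. */
uint64_t toy_scale_ino(uint64_t ino, uint64_t child_index, uint64_t child_count)
{
    assert(ino >= 1 && child_count >= 1 && child_index < child_count);
    return (ino - 1) * child_count + child_index + 1;
}
```

  For three children, toy_scale_ino(2, 0, 3) gives 4, matching the example, and the fourth inode on c2 lands on 11, matching the last column of the table.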
1.9.8   function comments
  ==============================================
  $ Last updated: Sun Oct 12 23:17:01 IST 2008 $
  $ Author: Vikas Gorur   $
  ==============================================
  creating a call stub and pausing a call
  ---------------------------------------
  libglusterfs provides a separate API to pause each fop. The parameters to each API are:
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  NOTE: @fn should take exactly the same type and number of parameters that
  the corresponding regular fop takes.
  The rest are the regular parameters to the corresponding fop.
  NOTE: @frame can never be NULL. fop_<operation>_stub() fails with errno
  set to EINVAL if @frame is NULL. Also, wherever @loc is applicable,
  @loc cannot be NULL.
  Refer to the individual stub creation APIs to learn about call-stub creation's behaviour with
  specific parameters.
  here is the list of stub creation APIs for xlator fops.
  @frame      - call frame which has to be used to resume the call at call_resume().
  @fn         - procedure to call during call_resume().
  @loc       - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @need_xattr - flag to specify if xattr should be returned or not.
  call_stub_t *
  fop_lookup_stub (call_frame_t *frame,
  fop_lookup_t fn,
  loc_t *loc,
  int32_t need_xattr);
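  To show what "creating a stub and resuming it later" amounts to, here is a toy model of the idea for a stat-like call. None of this is the libglusterfs implementation: the toy_* types and functions are hypothetical, the path string stands in for a full loc_t, and rejecting a NULL fn is an assumption of the sketch. What it does take from the notes above is that the stub copies its arguments and that a NULL frame fails with EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int id; } toy_frame_t;
typedef int (*toy_stat_fn_t)(toy_frame_t *frame, const char *path);

typedef struct {
    toy_frame_t  *frame; /* frame to resume the call with */
    toy_stat_fn_t fn;    /* procedure to call on resume */
    char         *path;  /* private copy; the caller's buffer may go away */
} toy_stub_t;

/* Capture a paused call. Returns NULL with errno set on failure. */
toy_stub_t *toy_stat_stub(toy_frame_t *frame, toy_stat_fn_t fn, const char *path)
{
    toy_stub_t *stub;
    size_t      len;

    if (frame == NULL || fn == NULL || path == NULL) {
        errno = EINVAL;
        return NULL;
    }
    stub = calloc(1, sizeof(*stub));
    if (stub == NULL)
        return NULL;
    len = strlen(path) + 1;
    stub->path = malloc(len);
    if (stub->path == NULL) {
        free(stub);
        return NULL;
    }
    memcpy(stub->path, path, len);
    stub->frame = frame;
    stub->fn = fn;
    return stub;
}

/* Resume the paused call with the saved arguments, then release the stub. */
int toy_call_resume(toy_stub_t *stub)
{
    int ret;

    assert(stub != NULL);
    ret = stub->fn(stub->frame, stub->path);
    free(stub->path);
    free(stub);
    return ret;
}

/* Stand-in fop for trying it out: frame id plus path length. */
int toy_stat(toy_frame_t *frame, const char *path)
{
    return frame->id + (int)strlen(path);
}
```

  Because the stub owns a copy of the path, the caller is free to reuse or release its buffer between stub creation and toy_call_resume.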
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  call_stub_t *
  fop_stat_stub (call_frame_t *frame,
  fop_stat_t fn,
  loc_t *loc);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to fstat fop.
  NOTE: @fd is stored with a fd_ref().
  call_stub_t *
  fop_fstat_stub (call_frame_t *frame,
  fop_fstat_t fn,
  fd_t *fd);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to @loc->inode and
  @loc->parent, if not NULL. also @loc->path will be copied to a different location.
  @mode  - mode parameter to chmod.
  call_stub_t *
  fop_chmod_stub (call_frame_t *frame,
  fop_chmod_t fn,
  loc_t *loc,
  mode_t mode);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to fchmod fop.
  NOTE: @fd is stored with a fd_ref().
  @mode  - mode parameter for fchmod fop.
  call_stub_t *
  fop_fchmod_stub (call_frame_t *frame,
  fop_fchmod_t fn,
  fd_t *fd,
  mode_t mode);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to @loc->inode and
  @loc->parent, if not NULL. also @loc->path will be copied to a different location.
  @uid   - uid parameter to chown.
  @gid   - gid parameter to chown.
  call_stub_t *
  fop_chown_stub (call_frame_t *frame,
  fop_chown_t fn,
  loc_t *loc,
  uid_t uid,
  gid_t gid);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to fchown fop.
  NOTE: @fd is stored with a fd_ref().
  @uid   - uid parameter to fchown.
  @gid   - gid parameter to fchown.
  call_stub_t *
  fop_fchown_stub (call_frame_t *frame,
  fop_fchown_t fn,
  fd_t *fd,
  uid_t uid,
  gid_t gid);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location, if not NULL.
  @off   - offset parameter to truncate fop.
  call_stub_t *
  fop_truncate_stub (call_frame_t *frame,
  fop_truncate_t fn,
  loc_t *loc,
  off_t off);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to ftruncate fop.
  NOTE: @fd is stored with a fd_ref().
  @off   - offset parameter to ftruncate fop.
  call_stub_t *
  fop_ftruncate_stub (call_frame_t *frame,
  fop_ftruncate_t fn,
  fd_t *fd,
  off_t off);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @tv    - tv parameter to utimens fop.
  call_stub_t *
  fop_utimens_stub (call_frame_t *frame,
  fop_utimens_t fn,
  loc_t *loc,
  struct timespec tv[2]);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @mask  - mask parameter for access fop.
  call_stub_t *
  fop_access_stub (call_frame_t *frame,
  fop_access_t fn,
  loc_t *loc,
  int32_t mask);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @size  - size parameter to readlink fop.
  call_stub_t *
  fop_readlink_stub (call_frame_t *frame,
  fop_readlink_t fn,
  loc_t *loc,
  size_t size);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @mode  - mode parameter to mknod fop.
  @rdev  - rdev parameter to mknod fop.
  call_stub_t *
  fop_mknod_stub (call_frame_t *frame,
  fop_mknod_t fn,
  loc_t *loc,
  mode_t mode,
  dev_t rdev);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @mode  - mode parameter to mkdir fop.
  call_stub_t *
  fop_mkdir_stub (call_frame_t *frame,
  fop_mkdir_t fn,
  loc_t *loc,
  mode_t mode);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  call_stub_t *
  fop_unlink_stub (call_frame_t *frame,
  fop_unlink_t fn,
  loc_t *loc);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  call_stub_t *
  fop_rmdir_stub (call_frame_t *frame,
  fop_rmdir_t fn,
  loc_t *loc);
  @frame    - call frame which has to be used to resume the call at call_resume().
  @fn       - procedure to call during call_resume().
  @linkname - linkname parameter to symlink fop.
  @loc      - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  call_stub_t *
  fop_symlink_stub (call_frame_t *frame,
  fop_symlink_t fn,
  const char *linkname,
  loc_t *loc);
  @frame    - call frame which has to be used to resume the call at call_resume().
  @fn       - procedure to call during call_resume().
  @oldloc   - pointer to location structure.
  NOTE: @oldloc will be copied to a different location, with inode_ref() to
  @oldloc->inode and @oldloc->parent, if not NULL. also @oldloc->path will
  be copied to a different location, if not NULL.
  @newloc   - pointer to location structure.
  NOTE: @newloc will be copied to a different location, with inode_ref() to
  @newloc->inode and @newloc->parent, if not NULL. also @newloc->path will
  be copied to a different location, if not NULL.
  call_stub_t *
  fop_rename_stub (call_frame_t *frame,
  fop_rename_t fn,
  loc_t *oldloc,
  loc_t *newloc);
  @frame   - call frame which has to be used to resume the call at call_resume().
  @fn      - procedure to call during call_resume().
  @oldloc  - pointer to location structure.
  NOTE: @oldloc will be copied to a different location, with inode_ref() to
  @oldloc->inode and @oldloc->parent, if not NULL. also @oldloc->path will be
  copied to a different location.
  @newpath - newpath parameter to link fop.
  call_stub_t *
  fop_link_stub (call_frame_t *frame,
  fop_link_t fn,
  loc_t *oldloc,
  const char *newpath);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @flags - flags parameter to create fop.
  @mode  - mode parameter to create fop.
  @fd    - file descriptor parameter to create fop.
  NOTE: @fd is stored with a fd_ref().
  call_stub_t *
  fop_create_stub (call_frame_t *frame,
  fop_create_t fn,
  loc_t *loc,
  int32_t flags,
  mode_t mode, fd_t *fd);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @flags - flags parameter to open fop.
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @fd    - file descriptor parameter to open fop.
  call_stub_t *
  fop_open_stub (call_frame_t *frame,
  fop_open_t fn,
  loc_t *loc,
  int32_t flags,
  fd_t *fd);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to readv fop.
  NOTE: @fd is stored with a fd_ref().
  @size  - size parameter to readv fop.
  @off   - offset parameter to readv fop.
  call_stub_t *
  fop_readv_stub (call_frame_t *frame,
  fop_readv_t fn,
  fd_t *fd,
  size_t size,
  off_t off);
  @frame  - call frame which has to be used to resume the call at call_resume().
  @fn     - procedure to call during call_resume().
  @fd     - file descriptor parameter to writev fop.
  NOTE: @fd is stored with a fd_ref().
  @vector - vector parameter to writev fop.
  NOTE: @vector is iov_dup()ed while creating stub, and the frame->root->req_refs
  dictionary is dict_ref()ed.
  @count  - count parameter to writev fop.
  @off    - off parameter to writev fop.
  call_stub_t *
  fop_writev_stub (call_frame_t *frame,
  fop_writev_t fn,
  fd_t *fd,
  struct iovec *vector,
  int32_t count,
  off_t off);
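The @vector note is easier to follow with a concrete picture. The sketch below assumes the duplication is shallow: only the array of iovec descriptors is copied, and the data buffers stay shared. That assumption is consistent with the note that frame->root->req_refs is referenced at the same time, but it is not confirmed by the text. The helper sketch_iov_dup is a hypothetical name for this example, not the real iov_dup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper, NOT the real iov_dup(): copies only the array of
 * iovec descriptors. The buffers they point at are shared with the
 * original, so something else must keep them alive until the call is
 * resumed. Returns NULL on bad arguments or allocation failure. */
static struct iovec *sketch_iov_dup(const struct iovec *vector, int count)
{
    struct iovec *copy;

    if (!vector || count <= 0)
        return NULL;
    copy = malloc((size_t)count * sizeof(*copy));
    if (!copy)
        return NULL;
    memcpy(copy, vector, (size_t)count * sizeof(*copy));
    return copy;
}
```

Under this reading, the caller may reuse or free its own iovec array once the stub exists, but not the underlying data buffers.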
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to flush fop.
  NOTE: @fd is stored with a fd_ref().
  call_stub_t *
  fop_flush_stub (call_frame_t *frame,
  fop_flush_t fn,
  fd_t *fd);
  @frame    - call frame which has to be used to resume the call at call_resume().
  @fn       - procedure to call during call_resume().
  @fd       - file descriptor parameter to fsync fop.
  NOTE: @fd is stored with a fd_ref().
  @datasync - datasync parameter to fsync fop.
  call_stub_t *
  fop_fsync_stub (call_frame_t *frame,
  fop_fsync_t fn,
  fd_t *fd,
  int32_t datasync);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to @loc->inode and
  @loc->parent, if not NULL. also @loc->path will be copied to a different location.
  @fd    - file descriptor parameter to opendir fop.
  NOTE: @fd is stored with a fd_ref().
  call_stub_t *
  fop_opendir_stub (call_frame_t *frame,
  fop_opendir_t fn,
  loc_t *loc,
  fd_t *fd);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to getdents fop.
  NOTE: @fd is stored with a fd_ref().
  @size  - size parameter to getdents fop.
  @off   - off parameter to getdents fop.
  @flags - flags parameter to getdents fop.
  call_stub_t *
  fop_getdents_stub (call_frame_t *frame,
  fop_getdents_t fn,
  fd_t *fd,
  size_t size,
  off_t off,
  int32_t flag);
  @frame   - call frame which has to be used to resume the call at call_resume().
  @fn      - procedure to call during call_resume().
  @fd      - file descriptor parameter to setdents fop.
  NOTE: @fd is stored with a fd_ref().
  @flags   - flags parameter to setdents fop.
  @entries - entries parameter to setdents fop.
  @count   - count parameter to setdents fop.
  call_stub_t *
  fop_setdents_stub (call_frame_t *frame,
  fop_setdents_t fn,
  fd_t *fd,
  int32_t flags,
  dir_entry_t *entries,
  int32_t count);
  @frame    - call frame which has to be used to resume the call at call_resume().
  @fn       - procedure to call during call_resume().
  @fd       - file descriptor parameter to fsyncdir fop.
  NOTE: @fd is stored with a fd_ref().
  @datasync - datasync parameter to fsyncdir fop.
  call_stub_t *
  fop_fsyncdir_stub (call_frame_t *frame,
  fop_fsyncdir_t fn,
  fd_t *fd,
  int32_t datasync);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  call_stub_t *
  fop_statfs_stub (call_frame_t *frame,
  fop_statfs_t fn,
  loc_t *loc);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @dict  - dict parameter to setxattr fop.
  NOTE: stub creation procedure stores @dict pointer with dict_ref() to it.
  @flags - flags parameter to setxattr fop.
  call_stub_t *
  fop_setxattr_stub (call_frame_t *frame,
  fop_setxattr_t fn,
  loc_t *loc,
  dict_t *dict,
  int32_t flags);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @name  - name parameter to getxattr fop.
  call_stub_t *
  fop_getxattr_stub (call_frame_t *frame,
  fop_getxattr_t fn,
  loc_t *loc,
  const char *name);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @name  - name parameter to removexattr fop.
  NOTE: name string will be copied to a different location while creating stub.
  call_stub_t *
  fop_removexattr_stub (call_frame_t *frame,
  fop_removexattr_t fn,
  loc_t *loc,
  const char *name);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to lk fop.
  NOTE: @fd is stored with a fd_ref().
  @cmd   - command parameter to lk fop.
  @lock  - lock parameter to lk fop.
  NOTE: lock will be copied to a different location while creating stub.
  call_stub_t *
  fop_lk_stub (call_frame_t *frame,
  fop_lk_t fn,
  fd_t *fd,
  int32_t cmd,
  struct flock *lock);
  @frame    - call frame which has to be used to resume the call at call_resume().
  @fn       - procedure to call during call_resume().
  @fd       - fd parameter to gf_lk fop.
  NOTE: @fd is fd_ref()ed while creating stub, if not NULL.
  @cmd      - cmd parameter to gf_lk fop.
  @lock     - lock parameter to gf_lk fop.
  NOTE: @lock is copied to a different memory location while creating
  stub.
  call_stub_t *
  fop_gf_lk_stub (call_frame_t *frame,
  fop_gf_lk_t fn,
  fd_t *fd,
  int32_t cmd,
  struct flock *lock);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @fd    - file descriptor parameter to readdir fop.
  NOTE: @fd is stored with a fd_ref().
  @size  - size parameter to readdir fop.
  @off   - offset parameter to readdir fop.
  call_stub_t *
  fop_readdir_stub (call_frame_t *frame,
  fop_readdir_t fn,
  fd_t *fd,
  size_t size,
  off_t off);
  @frame - call frame which has to be used to resume the call at call_resume().
  @fn    - procedure to call during call_resume().
  @loc   - pointer to location structure.
  NOTE: @loc will be copied to a different location, with inode_ref() to
  @loc->inode and @loc->parent, if not NULL. also @loc->path will be
  copied to a different location.
  @flags - flags parameter to checksum fop.
  call_stub_t *
  fop_checksum_stub (call_frame_t *frame,
  fop_checksum_t fn,
  loc_t *loc,
  int32_t flags);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @inode     - inode parameter to @fn.
  NOTE: @inode pointer is stored with an inode_ref().
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  @dict      - dict parameter to @fn.
  NOTE: @dict pointer is stored with dict_ref().
  call_stub_t *
  fop_lookup_cbk_stub (call_frame_t *frame,
  fop_lookup_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  inode_t *inode,
  struct stat *buf,
  dict_t *dict);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_stat_cbk_stub (call_frame_t *frame,
  fop_stat_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_fstat_cbk_stub (call_frame_t *frame,
  fop_fstat_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_chmod_cbk_stub (call_frame_t *frame,
  fop_chmod_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_fchmod_cbk_stub (call_frame_t *frame,
  fop_fchmod_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_chown_cbk_stub (call_frame_t *frame,
  fop_chown_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_fchown_cbk_stub (call_frame_t *frame,
  fop_fchown_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_truncate_cbk_stub (call_frame_t *frame,
  fop_truncate_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_ftruncate_cbk_stub (call_frame_t *frame,
  fop_ftruncate_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_utimens_cbk_stub (call_frame_t *frame,
  fop_utimens_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_access_cbk_stub (call_frame_t *frame,
  fop_access_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @path      - path parameter to @fn.
  NOTE: @path is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_readlink_cbk_stub (call_frame_t *frame,
  fop_readlink_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  const char *path);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @inode     - inode parameter to @fn.
  NOTE: @inode pointer is stored with an inode_ref().
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_mknod_cbk_stub (call_frame_t *frame,
  fop_mknod_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  inode_t *inode,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @inode     - inode parameter to @fn.
  NOTE: @inode pointer is stored with an inode_ref().
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_mkdir_cbk_stub (call_frame_t *frame,
  fop_mkdir_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  inode_t *inode,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_unlink_cbk_stub (call_frame_t *frame,
  fop_unlink_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_rmdir_cbk_stub (call_frame_t *frame,
  fop_rmdir_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @inode     - inode parameter to @fn.
  NOTE: @inode pointer is stored with an inode_ref().
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_symlink_cbk_stub (call_frame_t *frame,
  fop_symlink_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  inode_t *inode,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_rename_cbk_stub (call_frame_t *frame,
  fop_rename_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @inode     - inode parameter to @fn.
  NOTE: @inode pointer is stored with an inode_ref().
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_link_cbk_stub (call_frame_t *frame,
  fop_link_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  inode_t *inode,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @fd        - fd parameter to @fn.
  NOTE: @fd pointer is stored with a fd_ref().
  @inode     - inode parameter to @fn.
  NOTE: @inode pointer is stored with an inode_ref().
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_create_cbk_stub (call_frame_t *frame,
  fop_create_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  fd_t *fd,
  inode_t *inode,
  struct stat *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @fd        - fd parameter to @fn.
  NOTE: @fd pointer is stored with a fd_ref().
  call_stub_t *
  fop_open_cbk_stub (call_frame_t *frame,
  fop_open_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  fd_t *fd);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @vector    - vector parameter to @fn.
  NOTE: @vector is copied to a different memory location, if not NULL. also
  frame->root->rsp_refs is dict_ref()ed.
  @count     - count parameter to @fn.
  @stbuf     - stbuf parameter to @fn.
  NOTE: @stbuf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_readv_cbk_stub (call_frame_t *frame,
  fop_readv_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct iovec *vector,
  int32_t count,
  struct stat *stbuf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @stbuf     - stbuf parameter to @fn.
  NOTE: @stbuf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_writev_cbk_stub (call_frame_t *frame,
  fop_writev_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct stat *stbuf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_flush_cbk_stub (call_frame_t *frame,
  fop_flush_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_fsync_cbk_stub (call_frame_t *frame,
  fop_fsync_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @fd        - fd parameter to @fn.
  NOTE: @fd pointer is stored with a fd_ref().
  call_stub_t *
  fop_opendir_cbk_stub (call_frame_t *frame,
  fop_opendir_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  fd_t *fd);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @entries   - entries parameter to @fn.
  @count     - count parameter to @fn.
  call_stub_t *
  fop_getdents_cbk_stub (call_frame_t *frame,
  fop_getdents_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  dir_entry_t *entries,
  int32_t count);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_setdents_cbk_stub (call_frame_t *frame,
  fop_setdents_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_fsyncdir_cbk_stub (call_frame_t *frame,
  fop_fsyncdir_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @buf       - buf parameter to @fn.
  NOTE: @buf is copied to a different memory location, if not NULL.
  call_stub_t *
  fop_statfs_cbk_stub (call_frame_t *frame,
  fop_statfs_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct statvfs *buf);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_setxattr_cbk_stub (call_frame_t *frame,
  fop_setxattr_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  @value     - value dictionary parameter to @fn.
  NOTE: @value pointer is stored with a dict_ref().
  call_stub_t *
  fop_getxattr_cbk_stub (call_frame_t *frame,
  fop_getxattr_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  dict_t *value);
  @frame     - call frame which has to be used to resume the call at call_resume().
  @fn        - procedure to call during call_resume().
  @op_ret    - op_ret parameter to @fn.
  @op_errno  - op_errno parameter to @fn.
  call_stub_t *
  fop_removexattr_cbk_stub (call_frame_t *frame,
  fop_removexattr_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno);
  @frame    - call frame which has to be used to resume the call at call_resume().
  @fn       - procedure to call during call_resume().
  @op_ret   - op_ret parameter to @fn.
  @op_errno - op_errno parameter to @fn.
  @lock     - lock parameter to @fn.
  NOTE: @lock is copied to a different memory location while creating
  stub.
  call_stub_t *
  fop_lk_cbk_stub (call_frame_t *frame,
  fop_lk_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct flock *lock);
  @frame    - call frame which has to be used to resume the call at call_resume().
  @fn       - procedure to call during call_resume().
  @op_ret   - op_ret parameter to @fn.
  @op_errno - op_errno parameter to @fn.
  @lock     - lock parameter to @fn.
  NOTE: @lock is copied to a different memory location while creating
  stub.
  call_stub_t *
  fop_gf_lk_cbk_stub (call_frame_t *frame,
  fop_gf_lk_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  struct flock *lock);
  @frame    - call frame which has to be used to resume the call at call_resume().
  @fn       - procedure to call during call_resume().
  @op_ret   - op_ret parameter to @fn.
  @op_errno - op_errno parameter to @fn.
  @entries  - entries parameter to @fn.
  call_stub_t *
  fop_readdir_cbk_stub (call_frame_t *frame,
  fop_readdir_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  gf_dirent_t *entries);
  @frame         - call frame which has to be used to resume the call at call_resume().
  @fn            - procedure to call during call_resume().
  @op_ret        - op_ret parameter to @fn.
  @op_errno      - op_errno parameter to @fn.
  @file_checksum - file_checksum parameter to @fn.
  NOTE: file_checksum will be copied to a different memory location
  while creating the stub.
  @dir_checksum  - dir_checksum parameter to @fn.
  NOTE: dir_checksum will be copied to a different memory location
  while creating the stub.
  call_stub_t *
  fop_checksum_cbk_stub (call_frame_t *frame,
  fop_checksum_cbk_t fn,
  int32_t op_ret,
  int32_t op_errno,
  uint8_t *file_checksum,
  uint8_t *dir_checksum);
  resuming a call:
  ---------------
  call can be resumed using call stub through call_resume API.
  void call_resume (call_stub_t *stub);
  stub - call stub created during pausing a call.
  NOTE: call_resume() decreases the reference count of any fd_t, dict_t and inode_t that it finds
  in stub->args. So, if any fd_t, dict_t or
  inode_t pointers are assigned in stub->args after the
  fop_<operation>_stub() call, they must be _ref()ed.
  call_resume does not STACK_DESTROY() for any fop.
  if stub->fn is NULL, call_resume does STACK_WIND() or STACK_UNWIND() using the stub->frame.
  return - call resume fails only if stub is NULL. call resume fails with errno set to EINVAL.
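  The pause/resume idea behind call stubs can be illustrated with a small plain-C sketch. This is not GlusterFS code: demo_stub_t, demo_stub_create() and demo_resume() are hypothetical stand-ins for call_stub_t, a fop_*_cbk_stub() constructor and call_resume(). As assumptions of the sketch, the frame is an opaque pointer, no reference counting is done, the stub is freed on resume, and failure is reported through a return value as well as errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a callback type such as fop_removexattr_cbk_t. */
typedef void (*demo_cbk_t)(void *frame, int32_t op_ret, int32_t op_errno);

/* Hypothetical stand-in for call_stub_t: everything needed to replay
 * the callback later is captured when the stub is created. */
typedef struct {
    void      *frame;
    demo_cbk_t fn;
    int32_t    op_ret;
    int32_t    op_errno;
} demo_stub_t;

/* Hypothetical frame payload used by the example callback below. */
typedef struct {
    int     calls;
    int32_t op_ret;
    int32_t op_errno;
} demo_frame_t;

/* Pause: save the callback and its arguments instead of calling it now. */
static demo_stub_t *demo_stub_create(void *frame, demo_cbk_t fn,
                                     int32_t op_ret, int32_t op_errno)
{
    demo_stub_t *stub = malloc(sizeof(*stub));
    if (stub == NULL)
        return NULL;
    stub->frame = frame;
    stub->fn = fn;
    stub->op_ret = op_ret;
    stub->op_errno = op_errno;
    return stub;
}

/* Resume: replay the saved callback, then release the stub.
 * A NULL stub is the only failure and sets errno to EINVAL. */
static int demo_resume(demo_stub_t *stub)
{
    if (stub == NULL) {
        errno = EINVAL;
        return -1;
    }
    if (stub->fn != NULL)
        stub->fn(stub->frame, stub->op_ret, stub->op_errno);
    free(stub);
    return 0;
}

/* Example callback: records the values it was resumed with. */
static void demo_record_cbk(void *frame, int32_t op_ret, int32_t op_errno)
{
    demo_frame_t *f = frame;
    assert(f != NULL);
    f->calls++;
    f->op_ret = op_ret;
    f->op_errno = op_errno;
}
```

  A caller would create the stub where it would otherwise have invoked the callback, keep the pointer (for example on a queue), and call demo_resume() once it is ready to continue.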
  -------------------------posix---------------------------------------
  ---------------
1.9.9  * storage/posix
  ---------------
  - SET_FS_ID
  This is so that all filesystem checks are done with the user's
  uid/gid and not GlusterFS's uid/gid.
  - MAKE_REAL_PATH
  This macro concatenates the base directory of the posix volume
  ('option directory') with the given path.
  - need_xattr in lookup
  If this flag is passed, lookup returns an xattr dictionary that contains
  the file's create time, the file's contents, and the version number
  of the file.
  This is a hack to increase small file performance. If an application
  wants to read a small file, it can finish its job with just a lookup
  call instead of a lookup followed by read.
  - getdents/setdents
  These are used by unify to set and get directory entries.
  - ALIGN_BUF
  Macro to align an address to a page boundary (4K).
  - priv->export_statfs
  In some cases, two exported volumes may reside on the same
  partition on the server. Sending statvfs info for both
  the volumes will lead to erroneous df output at the client,
  since free space on the partition will be counted twice.
  In such cases, user can disable exporting statvfs info
  on one of the volumes by setting this option.
  - xattrop
  This fop is used by replicate to set version numbers on files.
  - getxattr/setxattr hack to read/write files
  A key, GLUSTERFS_FILE_CONTENT_STRING, is handled in a special way by
  getxattr/setxattr. A getxattr with the key will return the entire
  content of the file as the value. A setxattr with the key will write
  the value as the entire content of the file.
  - posix_checksum
  This calculates a simple XOR checksum on all entry names in a
  directory that is used by unify to compare directory contents.
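  Three of the helpers described above (MAKE_REAL_PATH, ALIGN_BUF and posix_checksum) are simple enough to sketch in standalone C. The functions below are illustrations only, with hypothetical names; they are not the macros or functions from storage/posix. In particular, the 8-byte checksum length and the way name bytes are folded into it are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u /* the 4K page boundary named in the text */
#define DEMO_CKSUM_LEN 8u    /* assumed checksum length for this sketch */

/* In the spirit of MAKE_REAL_PATH: prefix the export directory
 * ('option directory') to a volume-relative path.
 * Returns 0 on success, -1 if the result does not fit in 'out'. */
static int demo_make_real_path(char *out, size_t outlen,
                               const char *base, const char *path)
{
    int n = snprintf(out, outlen, "%s%s", base, path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* In the spirit of ALIGN_BUF: round an address up to the next multiple
 * of 'bound', which must be a power of two. */
static uintptr_t demo_align_up(uintptr_t addr, uintptr_t bound)
{
    assert(bound != 0 && (bound & (bound - 1)) == 0);
    return (addr + bound - 1) & ~(bound - 1);
}

/* In the spirit of posix_checksum: fold every entry name into a small
 * buffer with XOR. Because XOR is commutative, the result does not
 * depend on the order in which the entries are visited. */
static void demo_xor_names(const char *const *names, size_t count,
                           uint8_t sum[DEMO_CKSUM_LEN])
{
    memset(sum, 0, DEMO_CKSUM_LEN);
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        for (size_t j = 0; j < len; j++)
            sum[j % DEMO_CKSUM_LEN] ^= (uint8_t)names[i][j];
    }
}
```

  Order independence is what makes such a checksum usable for comparing the same directory on two bricks, where readdir() may return entries in different orders.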
  -------------write-behind----------------------------------------
  basic working
  --------------
  write-behind is basically a translator that lies to the application, reporting write requests as finished before they actually are.
  on a regular translator tree without write-behind, control flow is like this:
  1. application makes a write() system call.
  2. VFS ==> FUSE ==> /dev/fuse.
  3. fuse-bridge initiates a glusterfs writev() call.
  4. writev() is STACK_WIND()ed up to the client-protocol or storage translator.
  5. client-protocol, on receiving the reply from the server, starts STACK_UNWIND() towards the fuse-bridge.
  on a translator tree with write-behind, control flow is like this:
  1. application makes a write() system call.
  2. VFS ==> FUSE ==> /dev/fuse.
  3. fuse-bridge initiates a glusterfs writev() call.
  4. writev() is STACK_WIND()ed up to the write-behind translator.
  5. write-behind adds the write buffer to its internal queue and does a STACK_UNWIND() towards the fuse-bridge.
  the write call is complete from the application's perspective. after STACK_UNWIND()ing towards the fuse-bridge, write-behind initiates a fresh writev() call to its child translator, whose replies will be consumed by write-behind itself. write-behind _doesn't_ cache the write buffer, unless 'option flush-behind on' is specified in the volume specification file.
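  The "acknowledge first, write later" flow can be modelled in a few lines of standalone C. This is only a sketch under simplifying assumptions: demo_wb_t, demo_wb_writev() and demo_wb_drain() are hypothetical names, the queue is a fixed-size array, and the child translator is represented by a byte counter rather than a real writev().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_REQ 8
#define DEMO_BUF_LEN 32

/* One queued write buffer. */
typedef struct {
    char   data[DEMO_BUF_LEN];
    size_t len;
} demo_req_t;

/* Hypothetical write-behind state. */
typedef struct {
    demo_req_t queue[DEMO_MAX_REQ]; /* acknowledged, not yet sent to child */
    size_t     queued;
    size_t     acked_bytes;         /* bytes reported "done" to the caller */
    size_t     child_bytes;         /* bytes actually handed to the child */
} demo_wb_t;

/* Steps 4 and 5 of the write-behind flow: queue the buffer and
 * acknowledge it at once, before the child has seen it.
 * Returns the byte count reported to the caller, or -1 if the
 * request cannot be queued. */
static long demo_wb_writev(demo_wb_t *wb, const char *buf, size_t len)
{
    assert(wb != NULL && buf != NULL);
    if (wb->queued == DEMO_MAX_REQ || len > DEMO_BUF_LEN)
        return -1;
    memcpy(wb->queue[wb->queued].data, buf, len);
    wb->queue[wb->queued].len = len;
    wb->queued++;
    wb->acked_bytes += len; /* stands in for STACK_UNWIND() to the caller */
    return (long)len;
}

/* Afterwards: issue the real writes to the child. Their results are
 * consumed here and never travel back to the original caller.
 * Returns the number of requests sent. */
static size_t demo_wb_drain(demo_wb_t *wb)
{
    assert(wb != NULL);
    size_t sent = 0;
    for (size_t i = 0; i < wb->queued; i++) {
        wb->child_bytes += wb->queue[i].len; /* stands in for child writev() */
        sent++;
    }
    wb->queued = 0;
    return sent;
}
```

  Between demo_wb_writev() and demo_wb_drain() the caller already believes the data is written, which is exactly the window in which a child failure can no longer be reported on the original write() call.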
  windowing
  ---------
  with respect to write-behind, each write-buffer has three flags: 'stack_wound', 'write_behind' and 'got_reply'.
  stack_wound: if set, indicates that write-behind has initiated STACK_WIND() towards the child translator.
  write_behind: if set, indicates that write-behind has done STACK_UNWIND() towards the fuse-bridge.
  got_reply: if set, indicates that write-behind has received the reply from the child translator for a writev() STACK_WIND(). a request will be destroyed by write-behind only if this flag is set.

  currently pending write requests = aggregate>
  window>
  blocking is only from the application's perspective. write-behind does STACK_WIND() to the child translator straight away, but holds back the STACK_UNWIND() towards the fuse-bridge. STACK_UNWIND() is done only once write-behind gets enough replies to accommodate the currently blocked request.
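  The three flags and the blocking rule can be sketched as follows. The names are hypothetical and the accounting is an assumption: the source text defining "currently pending write requests" is cut off, so this sketch counts a request as pending from the moment it is acknowledged to the caller until the child's reply arrives, and treats the window as a byte limit on that total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request state carrying the three flags from the text. */
typedef struct {
    size_t size;
    bool   stack_wound;  /* STACK_WIND() towards the child was issued */
    bool   write_behind; /* STACK_UNWIND() towards the caller was done */
    bool   got_reply;    /* the child's reply for the writev() arrived */
} demo_wreq_t;

/* A request may be destroyed only after the child has replied. */
static bool demo_can_destroy(const demo_wreq_t *r)
{
    return r->got_reply;
}

/* ASSUMPTION: pending = acknowledged to the caller, no child reply yet. */
static size_t demo_pending_bytes(const demo_wreq_t *reqs, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (reqs[i].write_behind && !reqs[i].got_reply)
            total += reqs[i].size;
    return total;
}

/* New request: wind it to the child straight away, but acknowledge it
 * early only if it still fits into the window. Otherwise the caller
 * stays blocked until replies free up room. */
static void demo_submit(demo_wreq_t *reqs, size_t n, size_t idx, size_t window)
{
    assert(idx < n);
    reqs[idx].stack_wound = true;
    if (demo_pending_bytes(reqs, n) + reqs[idx].size <= window)
        reqs[idx].write_behind = true;
}

/* Child replied for request idx: record it, then acknowledge any
 * blocked requests that now fit into the window. */
static void demo_reply(demo_wreq_t *reqs, size_t n, size_t idx, size_t window)
{
    assert(idx < n);
    reqs[idx].got_reply = true;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].stack_wound && !reqs[i].write_behind &&
            demo_pending_bytes(reqs, n) + reqs[i].size <= window)
            reqs[i].write_behind = true;
    }
}
```

  With a window of 100 bytes and two 60-byte writes, the first write is acknowledged immediately while the second is wound to the child but acknowledged only after the first reply comes back.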
  flush behind
  ------------
  if 'option flush-behind on' is specified in the volume specification file, then write-behind sends aggregate write requests to the child translator, instead of regular per-request STACK_WIND()s.
1.9.10              BDB
  ----------------BDB-------------------------------------------
  * How does file translates to key/value pair?
  ---------------------------------------------

  in bdb a file is>
  the file) and file contents are stored as value corresponding to the key in the database
  file (defaults to glusterfs_storage.db under dirname() directory).
  * symlinks, directories
  -----------------------
  symlinks and directories are stored as is.
  * db (database) files
  ---------------------
  every directory, including the root directory, contains a database file called
  glusterfs_storage.db. all the regular files contained in the directory are stored
  as key/value pairs inside glusterfs_storage.db.
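  How a path maps onto a per-directory database file and a key can be sketched with the POSIX dirname() and basename() functions. The helper name demo_bdb_locate() is hypothetical, and using the file's base name as the key is an assumption of this sketch rather than a statement about the bdb translator's code.

```c
#include <assert.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>

#define DEMO_DB_NAME "glusterfs_storage.db"

/* Split 'path' into the database file of its parent directory and the
 * key under which the file's contents would be stored.
 * ASSUMPTION: the key is the file's base name.
 * Returns 0 on success, -1 if a buffer is too small. */
static int demo_bdb_locate(const char *path,
                           char *db_file, size_t db_len,
                           char *key, size_t key_len)
{
    char dir_copy[256];
    char base_copy[256];

    assert(path != NULL);
    if (strlen(path) >= sizeof(dir_copy))
        return -1;

    /* dirname() and basename() may modify their argument: use copies. */
    strcpy(dir_copy, path);
    strcpy(base_copy, path);
    const char *dir = dirname(dir_copy);
    const char *base = basename(base_copy);

    /* Avoid a double slash when the parent is the root directory. */
    int n = snprintf(db_file, db_len, "%s/%s",
                     strcmp(dir, "/") == 0 ? "" : dir, DEMO_DB_NAME);
    if (n < 0 || (size_t)n >= db_len)
        return -1;

    n = snprintf(key, key_len, "%s", base);
    if (n < 0 || (size_t)n >= key_len)
        return -1;
    return 0;
}
```

  For "/export/dir/file.txt" this yields the database file "/export/dir/glusterfs_storage.db" and the key "file.txt".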
  * internal data cache
  ---------------------

  db does not provide a way to find out the>
  so, bdb makes a DB->get() call for the key and takes the length of the value returned.
  since DB->get() also returns the file contents for the key, bdb maintains an internal cache and
  stores the file contents in the cache.
  every directory maintains a separate cache.
  * inode number transformation
  -----------------------------
  bdb allocates an inode number to each file and directory on its own. bdb maintains a
  global counter and increments it after allocating an inode number for each file
  (regular, symlink or directory). NOTE: bdb does not guarantee persistent inode numbers.
  * checkpoint thread
  -------------------
  bdb creates a checkpoint thread at the time of init(). the checkpoint thread does a
  periodic checkpoint on the DB_ENV. checkpoint is the mechanism, provided by db, to
  forcefully commit the logged transactions to the storage.
  NOTES ABOUT FOPS:
  -----------------
  lookup() -
  1> do lstat() on the path. if lstat fails, we assume that the file being looked up
  is either a regular file or doesn't exist.
  2> lookup in the DB of the parent directory for the key corresponding to path. if key exists,
  return key, with.
  NOTE: 'struct stat' stat()ed from the DB file is used as a container for the 'struct stat'
  of the regular file. st_ino, st_size, st_blocks are updated with the file's values.
  readv() -
  1> do a lookup in the bctx cache. if successful, return the requested data from the cache.
  2> on a cache miss, do a DB->get() of the entire file content and insert it into the cache.
  writev():
  1> flush any cached content of this file.
  2> do a DB->put(), with DB_DBT_PARTIAL flag.
  NOTE: DB_DBT_PARTIAL is used to do a partial update of a value in DB.
  readdir():
  1> regular readdir() in a loop, omitting all DB_ENV log files and DB files that
  we encounter.
  2> if the readdir() buffer still has space, open a DB cursor and do a sequential
  DBC->get() to fill the readdir buffer.
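  The partial update used by writev() can be illustrated without linking against Berkeley DB. The sketch below models, on a plain in-memory buffer, the behaviour Berkeley DB documents for a put with DB_DBT_PARTIAL: dlen bytes starting at offset doff are replaced by size bytes of new data, and the record is extended with zero bytes if doff lies past its end. demo_value_t and demo_partial_put() are hypothetical names, and the 64-byte capacity is an arbitrary limit for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_VAL_MAX 64

/* Hypothetical in-memory stand-in for the value stored under one key. */
typedef struct {
    unsigned char bytes[DEMO_VAL_MAX];
    size_t        len;
} demo_value_t;

/* Replace 'dlen' bytes at offset 'doff' of the stored value with the
 * 'size' bytes at 'data'. The value grows or shrinks as needed, and any
 * gap between the old end and 'doff' is zero-filled.
 * Returns 0 on success, -1 if the result would exceed DEMO_VAL_MAX. */
static int demo_partial_put(demo_value_t *v, size_t doff, size_t dlen,
                            const void *data, size_t size)
{
    assert(v != NULL && (data != NULL || size == 0));

    /* Part of the old value that survives after the replaced range. */
    size_t tail_start = (doff + dlen < v->len) ? doff + dlen : v->len;
    size_t tail_len = v->len - tail_start;
    size_t new_len = doff + size + tail_len;

    if (new_len > DEMO_VAL_MAX)
        return -1;
    if (doff > v->len)
        memset(v->bytes + v->len, 0, doff - v->len);
    /* Move the tail first so the new data cannot overwrite it. */
    memmove(v->bytes + doff + size, v->bytes + tail_start, tail_len);
    if (size > 0)
        memcpy(v->bytes + doff, data, size);
    v->len = new_len;
    return 0;
}
```

  A write of N bytes at file offset O corresponds to doff = O and dlen = N in this model, so only the touched range of the stored value changes.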
  Posted by evan on 26th March 2012

