starxzj posted on 2018-8-11 12:16:09

Python Coding Guidelines

Python Coding Guidelines 12/14/07
Written by Rob Knight for the Cogent project

Table of Contents
Why have coding guidelines?
What should I call my variables?
What are the naming conventions?
How do I organize my modules (source files)?
How should I write comments?
How should I format my code?
How should I write my unit tests?
Are there any handy Python [...]

As project [...]
  This project will start with isolated tasks, but we will integrate the pieces into shared libraries as they mature. Unit testing and a consistent style are critical to having trusted code to integrate. Also, guesses about names and interfaces will be correct more often.
  Good code is useful to have around. Code written to these standards should be useful for teaching purposes, and also to show potential employers during interviews. Most people are [...]

What should I call my variables?
  Choose the name that people will most likely guess. Make it descriptive, but not too long: curr_record is better than c, or curr, or current_genbank_record_from_database. Part of the reason for having coding guidelines is so that everyone is more likely to guess the same way. Who knows: in a few months, the person doing the guessing might be you.
  Good names are hard to find. Don't be afraid to change names except when they are part of interfaces that other people are also using. It may take some time working with the code to come up with reasonable names for everything: if you have unit tests, it's easy to change them, especially with global search and replace.
  Use singular names for individual things, plural names for collections. For example, you'd expect self.Name to hold something like a single string, but self.Names to hold something that you could loop through like a list or dict. Sometimes the decision can be tricky: is self.Index an int holding a position, or a dict holding records keyed by name for easy lookup? If you find yourself wondering these things, the name should probably be changed to avoid the problem: try self.Position or self.LookUp.
  Don't make the type part of the name. You might want to change the implementation later. Use Records rather than RecordDict or RecordList, etc. Don't use Hungarian Notation either (i.e. where you prefix the name with the type).
  Make the name as precise as possible. If the variable is the name of the input file, call it infile_name, not input or file (which you shouldn't use anyway, since they shadow built-in names), and not infile (because that looks like it should be a file object, not just its name).
  Use result to store the value that will be returned from a method or function. Use data for input in cases where the function or method acts on arbitrary data (e.g. sequence data, or a list of numbers, etc.) unless a more descriptive name is appropriate.
  One-letter variable names should only occur in math functions or as loop iterators with limited scope. Limited scope covers things like for k in keys: print k, where k survives only a line or two. Loop iterators should refer to the variable that they're looping through: for k in keys, i in items, or for key in keys, item in items. If the loop is long or there are several 1-letter variables active in the same scope, rename them.
  Limit your use of abbreviations. A few well-known abbreviations are OK, but you don't want to come back to your code in 6 months and have to figure out what sptxck2 is. It's worth it to spend the extra time typing species_taxon_check_2, but that's still a horrible name: what's check number 1? Far better to go with something like taxon_is_species_rank that needs no explanation, especially if the variable is only used once or twice.
  Acceptable abbreviations. The following abbreviations can be considered well-known and used with impunity:
full          abbreviated
alignment     aln
archaeal      arch
auxiliary     aux
bacterial     bact
citation      cite
current       curr
database      db
dictionary    dict
directory     dir
end of file   eof
eukaryotic    euk
frequency     freq
expected      exp
index         [...]

  Always use from module import Name, Name2, Name3... syntax instead of import module or from module import *. This is more efficient, reduces typing in the rest of the code, and makes it much easier to see name collisions and to replace implementations.
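As a rough illustration of the variable-naming advice above, the sketch below uses data for arbitrary input, result for the returned value, a plural name for a collection, and a singular loop variable. The function count_names and the sample list are hypothetical and are not part of the Cogent code.

```python
from collections import Counter  # import the name itself, not the module


def count_names(data):
    """Returns dict mapping each item in data to how often it occurs."""
    result = Counter()
    for item in data:  # singular name for each individual thing
        result[item] += 1
    return dict(result)


# Hypothetical sample input: a plural name for a collection of names.
names = ["aln", "db", "aln"]
name_counts = count_names(names)
```

Here name_counts reads as "counts keyed by name" without encoding the container type in the name.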
What are the naming conventions?

Summary of Naming Conventions
Type                              Convention                     Example
function                          action_with_underscores        find_all
variable                          noun_with_underscores          curr_index
constant                          NOUN_ALL_CAPS                  ALLOWED_RNA_PAIRS
class                             MixedCaseNoun                  RnaSequence
public property                   MixedCaseNoun                  IsPaired
private property                  _noun_with_leading_underscore  _is_updated
public method                     mixedCaseExceptFirstWordVerb   stripDegenerate
private method                    _verb_with_leading_underscore  _check_if_paired
really private data               __two_leading_underscores      __delegator_object_ref
parameters that match properties  SameAsProperty                 def __init__(data, Alphabet=None)
factory function                  MixedCase                      InverseDict
module                            lowercase_with_underscores     unit_test
global variables                  gMixedCaseWithLeadingG         no examples in evo - should be rare!

  It is important to follow the naming conventions because they make it much easier to guess what a name refers to. In particular, it should be easy to guess what scope a name is defined in, what it refers to, whether it's OK to change its value, and whether its referent is callable. The following rules provide these distinctions.
  lowercase_with_underscores for modules and internal variables (including function/method parameters). Exception: in __init__, any parameters that will be used to initialize properties of the object should have exactly the same spelling, including case, as the property. This lets you use a dict with the right field names as **kwargs to initialize the data easily.
  MixedCase for classes and public properties, and for factory functions that act like additional constructors for a [...]
  mixedCaseExceptFirstWord for public methods and functions.
  _lowercase_with_leading_underscore for private functions, methods, and properties.
  __lowercase_with_two_leading_underscores for private properties and functions that must not be overwritten by a subclass.
  CAPS_WITH_UNDERSCORES for named constants.
  gMixedCase (i.e. mixed case prefixed with 'g') for globals. Globals should be used extremely rarely and with caution, even if you sneak them in using the Singleton pattern or some similar system.
  Underscores can be left out if the words read OK run together. infile and outfile rather than in_file and out_file; infile_name and outfile_name rather than in_file_name and out_file_name or infilename and outfilename (getting too long to read effortlessly).
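The rules above can be sketched in one small class. This is only an illustration under assumptions: the class body, the constant ALLOWED_RNA_BASES, and the methods isValid and _check_if_valid are hypothetical, borrowing the RnaSequence name from the summary table rather than reproducing any real Cogent class. It also shows a dict with matching field names being passed as **kwargs.

```python
ALLOWED_RNA_BASES = "ACGU"  # named constant: CAPS_WITH_UNDERSCORES


class RnaSequence(object):
    """Holds RNA sequence data with a Name and an Alphabet (illustrative)."""

    def __init__(self, data, Name='', Alphabet=None):
        """Returns new RnaSequence with specified data, Name, Alphabet."""
        self.Name = Name            # public property: MixedCase
        self.Alphabet = Alphabet    # parameter spelled like the property
        self._data = data           # private property: leading underscore

    def isValid(self):
        """Returns True if every character of the data is in Alphabet."""
        return self._check_if_valid()

    def _check_if_valid(self):
        """Private helper: no validation is performed if Alphabet is None."""
        if self.Alphabet is None:
            return True
        return all(char in self.Alphabet for char in self._data)


# A dict whose keys match the property names can be passed as **kwargs.
fields = {'Name': 'seq1', 'Alphabet': ALLOWED_RNA_BASES}
curr_seq = RnaSequence('ACGU', **fields)
seq_is_valid = curr_seq.isValid()
```

Because the __init__ parameters are spelled exactly like the properties, the same fields dict could come straight from a parsed record.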
How do I organize my modules (source files)?
  The first line of each file should be #!/usr/bin/env python. This makes it possible to run the file as a script invoking the interpreter implicitly, e.g. in a CGI context.
  Next should be the docstring with a description. If the description is long, the first line should be a short summary that makes sense on its own, separated from the rest by a newline.
  All code, including import statements, should follow the docstring. Otherwise, the docstring will not be recognized by the interpreter, and you will not have access to it in interactive sessions (i.e. through obj.__doc__) or when generating documentation with automated tools.
  Import built-in modules first, followed by third-party modules, followed by any changes to the path and your own modules. Especially, additions to the path and names of your modules are likely to change rapidly: keeping them in one place makes them easier to find.
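The docstring-first rule above can be checked with the standard library. The sketch below parses two hypothetical module sources with ast and asks for the module docstring; the source strings are made up for illustration.

```python
import ast

# Hypothetical module sources: docstring first vs. docstring after an import.
docstring_first = '"""Provides example helpers."""\nimport os\n'
import_first = 'import os\n"""Provides example helpers."""\n'

# ast.get_docstring returns the docstring text, or None if there is none.
first_doc = ast.get_docstring(ast.parse(docstring_first))
second_doc = ast.get_docstring(ast.parse(import_first))
```

Only the first source yields a docstring; in the second, the string after the import is an ordinary expression and second_doc is None.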
  Next should be authorship information. This information should follow this format:
__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
                  "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

  Status should typically be one of "Prototype", "Development", or "Production". __maintainer__ should be the person who will fix bugs and make improvements if imported. __credits__ differs from __author__ in that __credits__ includes people who reported bug fixes, made suggestions, etc. but did not actually write the code.

Example of module structure:
#!/usr/bin/env python
"""Provides NumberList and FrequencyDistribution, [...]

  Always update the comments when the code changes. Incorrect comments are far worse than no comments, since they are actively misleading.

  Comments should say more than the code itself. Examine your comments carefully: they may indicate that you'd be better off rewriting your code (especially, renaming your variables and getting rid of the comment). In particular, don't scatter magic numbers and other constants that have to be explained through your code. It's far better to use variables whose names are self-documenting, especially if you use the same constant more than once. Also, think about making constants into [...]
Wrong: win_size -= 20      # decrement win_size by 20
OK:    win_size -= 20      # leave space for the scroll bar
Right: self._scroll_bar_size = 20
       win_size -= self._scroll_bar_size
  Use comments starting with #, not strings, inside blocks of code. Python ignores real comments, but must allocate storage for strings (which can be a performance disaster inside an inner loop).

  Start each method, [...]
  The docstring should start with a 1-line description that makes sense by itself (many automated formatting tools, and the [...]
  For example:
def __init__(self, data, Name='', Alphabet=None):
    """Returns new Sequence object with specified data, Name, Alphabet.

    data: The sequence data. Should be a sequence of characters.
    Name: Arbitrary label for the sequence. Should be string-like.
    Alphabet: Set of allowed characters. Should support 'for x in y'
    syntax. None by default.

    Note: if Alphabet is None, performs no validation.
    """
  Always update the docstring when the code changes. Like outdated comments, outdated docstrings can waste a lot of time.
How should I format my code?
  Use 4 spaces for indentation. Do not use tabs (set your editor to convert tabs to spaces). The behavior of tabs is not predictable across platforms, and can cause syntax errors. Several people have been bitten by this already.
  Lines must not be longer than 79 characters. Long lines are inconvenient in some editors, can be confusing when broken up for printing, and make code snippets difficult to email (especially if your email client or the recipient's 'helpfully' wraps the lines automatically). Use \ for line continuation. Note that there cannot be whitespace after the \.
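A minimal sketch of the line-continuation rule, using made-up variable names. The second form, wrapping the expression in parentheses, is not mentioned in the guideline above; it is shown only as a standard Python alternative that avoids the trailing-whitespace pitfall.

```python
first_count, second_count, third_count = 1, 2, 3

# Backslash continuation: nothing, not even a space, may follow the \.
total_with_backslash = first_count + \
    second_count + \
    third_count

# Alternative (an addition to the guideline above): an expression inside
# parentheses may span several lines without any backslash.
total_with_parens = (first_count +
                     second_count +
                     third_count)
```

Both assignments compute the same sum; only the way the long line is broken differs.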

  Blank lines should be used to highlight [...]
  Separate [...]
  Be consistent with the use of whitespace around operators. Inconsistent whitespace makes it harder to see at a glance what is grouped together.
Good:  ((a+b)*(c+d))
OK:    ((a + b) * (c + d))
Bad:   ( (a+ b)*(c +d))
  Don't put whitespace after delimiters or inside slicing delimiters. Whitespace here makes it harder to see what's associated.
Good: (a+b), d
Bad:  ( a+b ), d , d[ k]

How should I write my unit tests?
  Every line of code must be tested. For scientific work, bugs don't just mean unhappy users who you'll never actually meet: they mean retracted publications and ended careers. It is critical that your code be fully tested before you draw conclusions from results it produces.

  Tests are the opportunity to invent the interfaces you wish you had. Write the test for a method before you write the method: often, this helps you figure out what you would want to call it and what parameters it should take. Think of the tests as a story about what you wish the interface looked like. It's OK to write the tests a few methods at a time, and to change them as your [...]
  Never treat prototypes as production code. It's fine to write prototype code without tests to try things out, but when you've figured out the algorithm and interfaces you must rewrite it with tests to consider it finished. Often, this helps you decide what interfaces and functionality you actually need and what you can get rid of.
  Write a little at a time. For production code, write a couple of tests, then a couple of methods, then a couple more tests, then a couple more methods, then maybe change some of the names or generalize some of the functionality. If you have a huge amount of code where 'all you have to do is write the tests', you're probably closer to 30% done than 90%. Testing vastly reduces the time spent debugging, since whatever went wrong has to be in the code you wrote since the last test suite.
  Always run the test suite when you change anything. Even if a change seems trivial, it will only take a couple of seconds to run the tests and then you'll be sure. This can eliminate long and frustrating debugging sessions where the change turned out to have been made long ago, but didn't seem significant at the time.
  Use the unittest framework with tests in a separate file for each module. Name the test file test_module_name.py. Keeping the tests separate from the code reduces the temptation to change the tests when the code doesn't work, and makes it easy to verify that a completely new implementation presents the same interface (behaves the same) as the old.
  Use evo.unit_test if you are doing anything with floating point numbers or permutations (use assertFloatEqual). Do not try to compare floating point numbers using assertEqual if you value your sanity. assertFloatEqualAbs and assertFloatEqualRel can specifically test for absolute and [...]
  Test the interface of each [...] This should contain tests for everything in the public interface.
  If the [...] These might subclass ClassNameTests in order to share setUp methods, etc.
  Tests of private methods should be in a separate TestCase called [...] Private methods may change if you change the implementation. It is not required that test cases for private methods pass when you change things (that's why they're private, after all), though it is often useful to have these tests for debugging.
  Test all the methods in your [...] You should assume that any method you haven't tested has bugs. The convention for naming tests is test_method_name. Any leading and trailing underscores on the method name can be ignored for the purposes of the test; however, all tests must start with the literal substring test for unittest to find them. If the method is particularly complex, or has several discretely different cases you need to check, use test_method_name_suffix, e.g. test_init_empty, test_init_single, test_init_wrong_type, etc. for testing __init__.
  Write good docstrings for all your test methods. When you run the test with the -v command-line switch for verbose output, the docstring for each test will be printed along with ...OK or ...FAILED on a single line. It is thus important that your docstring is short and descriptive, and makes sense in this context.
Good docstrings:
NumberList.var should raise ValueError on empty or 1-item list
NumberList.var should match values from R if list has >2 items
NumberList.__init__ should raise error on values that fail float()
FrequencyDistribution.var should match corresponding NumberList var
Bad docstrings:
var should calculate variance         #lacks [...]
  Module-level functions should be tested in their own TestCase, called modulenameTests. Even if these functions are simple, it's important to check that they work as advertised.
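The naming and docstring conventions for tests might look like this in practice. This is a sketch under assumptions: the NumberList class here is a hypothetical stand-in with a single mean method, not the real implementation, and the standard library's assertAlmostEqual is used in place of assertFloatEqual from evo.unit_test, whose exact signature is not shown in this document.

```python
import unittest


class NumberList(list):
    """Hypothetical stand-in: a list of numbers with a mean method."""

    def mean(self):
        """Returns the arithmetic mean; raises ValueError if empty."""
        if not self:
            raise ValueError("mean of empty NumberList is undefined")
        return sum(self) / float(len(self))


class NumberListTests(unittest.TestCase):
    """Tests of the public interface of NumberList."""

    def test_mean_empty(self):
        """NumberList.mean should raise ValueError on empty list"""
        self.assertRaises(ValueError, NumberList().mean)

    def test_mean_small(self):
        """NumberList.mean should match hand-checked value for 3 items"""
        self.assertEqual(NumberList([1, 2, 3]).mean(), 2.0)

    def test_mean_float(self):
        """NumberList.mean should be close to 0.15 for [0.1, 0.2]"""
        # assertAlmostEqual tolerates floating point rounding error,
        # which an exact assertEqual comparison would not.
        self.assertAlmostEqual(NumberList([0.1, 0.2]).mean(), 0.15)


# Run the tests programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NumberListTests)
test_result = unittest.TestResult()
suite.run(test_result)
```

In a real test file the last three lines would normally be replaced by a call to unittest.main() under an if __name__ == '__main__': guard, so that the -v switch prints each docstring.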
  It is much more important to test several small cases that you can check by hand than a single large case that requires a calculator. Don't trust spreadsheets for numerical calculations -- use R instead!
  Make sure you test all the edge cases: what happens when the input is None, or '', or 0, or negative? What happens at values that cause a conditional to go one way or the other? Does incorrect input raise the right exceptions? Can your code accept subclasses or superclasses of the types it expects? What happens with very large input?
  To test permutations, check that the original and shuffled version are different, but that the sorted original and sorted shuffled version are the same. Make sure that you get different permutations on repeated runs and when starting from different points.
  To test random choices, figure out how many of each choice you expect in a large sample (say, 1000 or a million) using the binomial distribution or its normal approximation. Run the test several times and check that you're within, say, 3 standard deviations of the mean.
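The permutation and random-choice checks described above can be sketched with the standard random module. All names here are hypothetical, and the fixed seed is an assumption made only so the sketch is repeatable; a real test of randomness would also vary the starting point.

```python
import random
from math import sqrt

rng = random.Random(42)  # fixed seed only so this sketch is repeatable

# Permutation check: order differs, sorted contents agree.
original = list(range(20))
shuffled = original[:]
rng.shuffle(shuffled)
order_changed = shuffled != original
same_items = sorted(shuffled) == sorted(original)

# A second shuffle should give yet another permutation.
reshuffled = original[:]
rng.shuffle(reshuffled)
permutations_differ = reshuffled != shuffled

# Random choice check: count "heads" in a large sample and compare with
# the binomial mean n*p and standard deviation sqrt(n*p*(1-p)).
num_trials = 10000
prob_heads = 0.5
num_heads = sum(1 for _ in range(num_trials) if rng.random() < prob_heads)
expected_heads = num_trials * prob_heads
std_dev = sqrt(num_trials * prob_heads * (1 - prob_heads))
within_three_sd = abs(num_heads - expected_heads) <= 3 * std_dev
```

A 3 standard deviation window will still fail about 3 times in 1000 by chance alone, which is why the guideline suggests running such a test several times.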
  Example of a unittest test module structure:
#!/usr/bin/env python
"""Tests NumberList and FrequencyDistribution, [...]