Monday, March 14, 2011

Splitting a Path to a list in Python

At work today I needed to split a file path into its various directory and subdirectory components using Python. While the result of os.path.split can be inverted with os.path.join, neither does what you might intuitively expect: split does not break a path into all of its components. A cursory Google search turned up various, sometimes overbuilt, solutions, but not a quick and dirty function for the task, so I wrote one myself.

If you look up the function definition, os.path.split works like this:
 >>> os.path.split(filepath) == (os.path.dirname(filepath), os.path.basename(filepath))
 True  
The (not really) inverse function, os.path.join, could be more appropriately named os.path.concat, since it actually concatenates its path arguments into a combined path.
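To make the split/join relationship concrete, here is a small sketch. It uses posixpath (what os.path refers to on Linux and macOS) so the results don't depend on the platform; the example paths are made up.

```python
import posixpath

filepath = "/path/to/file"

# split peels off only the *last* component.
head, tail = posixpath.split(filepath)
print(head, tail)  # /path/to file

# The pair is exactly (dirname, basename).
assert (head, tail) == (posixpath.dirname(filepath), posixpath.basename(filepath))

# join glues its arguments back together with the separator...
print(posixpath.join(head, tail))  # /path/to/file

# ...but an absolute argument restarts the path, discarding what came before.
print(posixpath.join("/path", "/to", "file"))  # /to/file
```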

A coworker came up with this:
 parts = filepath.split(os.path.sep)  
However, calling os.path.join(*parts) on the result will not return the original path if the path is absolute, because the root shows up as a leading empty string, which join silently drops. You can get the original path back using:
 os.path.sep.join(parts)  
But generally you want to manipulate the path in some way. A valid path like "/path/to//file" splits into [ "", "path", "to", "", "file" ], which isn't very helpful when you're trying to deal with the individual parts. Calling os.path.normpath first helps (generally a good idea when dealing with paths of unknown format), but you still end up with "" (an empty string) for the root directory.
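The problems with the plain string split are easy to reproduce. A sketch, again fixing the separator to "/" via posixpath so the results are the same on every platform:

```python
import posixpath

filepath = "/path/to//file"

# Naive split: the root and the doubled "//" both turn into empty strings.
parts = filepath.split(posixpath.sep)
print(parts)  # ['', 'path', 'to', '', 'file']

# Joining with the separator string restores the path exactly...
assert posixpath.sep.join(parts) == filepath

# ...but join() treats the leading "" as nothing, so the root is lost.
print(posixpath.join(*parts))  # path/to/file

# normpath collapses "//", yet the root is still an empty string.
norm_parts = posixpath.normpath(filepath).split(posixpath.sep)
print(norm_parts)  # ['', 'path', 'to', 'file']
```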

Here's my solution:
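 import os.path
  
 def splitpath(path, maxdepth=20):
     # Peel off the last component and recurse on the rest.
     ( head, tail ) = os.path.split(path)
     # Stop at the root ("/", where split returns the path unchanged), at a
     # bare name (empty head), or when maxdepth runs out; in every case the
     # remaining path is kept whole as the first element.
     return splitpath(head, maxdepth - 1) + [ tail ] \
         if maxdepth and head and head != path \
         else [ path ]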
(code line split for easier reading)

Now splitpath("/path/to//file") returns [ "/", "path", "to", "file" ]. You can also get the (normalized) original path back using os.path.join:
 >>> os.path.normpath(filepath) == os.path.join(*splitpath(os.path.normpath(filepath)))
 True  
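A quick way to check these claims is to run splitpath over a few paths and compare against the normalized input. This is a sketch for a POSIX system (on Windows, os.path uses backslashes and drive letters); the sample paths are made up, and the function is restated with a plain if/else so the snippet runs on its own. For comparison, Python 3.4 and later expose the same breakdown through pathlib's PurePosixPath.parts.

```python
import os.path
from pathlib import PurePosixPath

def splitpath(path, maxdepth=20):
    # splitpath restated as an if/else: peel off the last component, recurse.
    head, tail = os.path.split(path)
    if maxdepth and head and head != path:
        return splitpath(head, maxdepth - 1) + [tail]
    # Root, bare name, or depth limit: return what's left as a single element.
    return [path]

for p in ["/path/to//file", "relative/dir/file.txt", "file.txt", "/"]:
    norm = os.path.normpath(p)
    parts = splitpath(norm)
    # Round trip: joining the parts gives back the normalized path.
    assert os.path.join(*parts) == norm
    print(repr(p), "->", parts)

# pathlib (Python 3.4+) gives the same components for an absolute path.
print(PurePosixPath("/path/to//file").parts)  # ('/', 'path', 'to', 'file')
```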

So there it is, mostly for my reference. Some other pedant might find it useful, too.

2 comments:

  1. "Some other pedant might find it useful, too. "

    Like me - this is perfect THANK YOU!

  2. Crappy, but at least non-recursive, in case you have huge paths (which, admittedly, is unlikely):

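    import os.path

    def path_to_list( path ):
        l = []
        f = os.path.basename( path )
        # Stops as soon as basename is empty: at the root (so a leading "/"
        # is dropped) or at a trailing slash.
        while f and f != "":
            l.append( f )
            path = os.path.dirname( path )
            f = os.path.basename( path )
        l.reverse()
        return l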

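The loop in the second comment drops the root directory, since os.path.basename("/") is empty. Below is a minimal non-recursive sketch that keeps it by reusing the same stopping conditions as splitpath. splitpath_iter is a hypothetical name, and because it doesn't recurse it needs no depth limit; it assumes a POSIX system for the example paths.

```python
import os.path

def splitpath_iter(path):
    """Split path into components without recursion (hypothetical helper)."""
    parts = []
    while True:
        head, tail = os.path.split(path)
        if head == path:
            # The root ("/" on POSIX): split cannot shorten it any further.
            parts.append(head)
            break
        if not head:
            # First component of a relative path.
            parts.append(tail)
            break
        parts.append(tail)
        path = head  # strictly shorter than before, so the loop terminates
    parts.reverse()
    return parts

print(splitpath_iter("/path/to//file"))  # ['/', 'path', 'to', 'file']
print(splitpath_iter("relative/dir/file.txt"))  # ['relative', 'dir', 'file.txt']
```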