PHP用闭包思想写的torrent文件解析工具

xian123 · 发表于 2015-8-25 09:37:51

　　
　　本文原载于我的wp博客 http://mi-wiki.info/wp/2011/12/23/php-torrent-file-parser-again/
　　转载请注明。
　　

PHP对静态词法域的支持有点奇怪，内部匿名函数必须在参数列表后面加上use关键字，显式的说明想要使用哪些外层函数的局部变量。

1 function count_down($count)
2 {
3    return $func = function()
4       use($count,$func)
5    {
6       if(--$count > 0)
7          $func();
8       echo "wow\n";
9    };
10 }
11 $foo = count_down(3);
12 $foo();

我本来是想这样的。但是不行，会在第7行调用$func的时候报错。

错误是Fatal error: Function name must be a string in - on line 7

反复试验后发觉，外部的匿名函数应该通过引用传值传给内部，否则是不行的：

function count_down($count)
{
return $foo = function()
      use(&$count,&$foo)
{
      echo $count."\n";
      if(--$count > 0)
         $foo();
};
}
$foo = count_down(4);
$foo();

　　像上面这样写就对了。
下面是另一种方法：

function count_down_again($count)
{
return function()use($count)
{
      printf("wow %d\n",$count);
      return --$count;
};
}
$foo = count_down_again(5);
while($foo() >0);

不过，这段代码有点小错误。编译虽然没错，但是$foo函数每次返回的都是4.

也就是use关键字看上去像是支持静态词法域的，在这个例子上，它只是对外层函数使用的变量作了一个简单拷贝。

让我们稍微修改一下，把第3行的use($count)改为use(&$count)：

function count_down_again($count)
{
return function()use(&$count)
{
      printf("wow %d\n",$count);
      return --$count;
};
}
$foo = count_down_again(5);
while($foo() >0);

这样才正确。

我个人使用的方式是基于类的，做成了类似下面的形式：

class Foo
{
public function __invoke($count)
{
      if($count > 0)
         $this($count - 1);
      echo "wow\n";
}
}
$foo = new Foo();
$foo(4);

这样做的行为也是正确的。

这样不会像前一个例子那样失去了递归调用的能力。

虽然这是一个类，但是只不过是在手动实现那些支持闭包和静态词法域的语言中，编译器自动实现的动作。

其实今天早上，我本来准备用类scheme的风格写一个解析器的。可能稍微晚点吧。scheme风格的函数式编程是这样的：

function yet_another_count_down($func,$count)
{
$func($count);
if($count > 0)
      yet_another_count_down($func,$count - 1);
}
yet_another_count_down(function($var){echo $var."\n";},6);

它不是很依赖静态词法域，虽然scheme对静态词法域的支持还是很不错的。它主要还是利用了first-class-function。当然，这也是一种典型的闭包。

我实现的torrent解析工具的代码如下：

<?php
$file_name = '1.torrent';
$file = fopen($file_name,'r');
$nil = new Parser($file);//构造解析器
$nil = $nil();//进行解析
$pos = ftell($file);
echo '读取到文件位置'.sprintf('0x%08X',$pos)."\r\n";
fseek($file,0,SEEK_END);
echo '还剩下'.(ftell($file) - $pos).'字节未读取'."\r\n";
if(!feof($file))
{
echo '文件还未结束，再读一个字符:';
$ch = fgetc($file);
if(is_string($ch) && ereg('\w',$ch))
{
      echo $ch."\r\n";
}
else
{
      printf('0x%02X',$ch);
      echo "\r\n";
}
echo '现在的文件位置是'.sprintf('0x%08X',ftell($file))."\r\n";
echo '文件'.(feof($file)?'已结束':'还未结束')."\r\n";
}
fclose($file);//解析器后面不再工作了，此时可以释放文件指针了。
$info = @$nil['value'][0]['info'];
if(!$info)
{
echo '这是一个有效的B-Encoding文件，但它不是一个有效的种子文件';
exit();
}
$name = $info['name.utf-8'] ?$info['name.utf-8']:$info['name'];
if(!$name)
{
echo '这是一个有效的B-Encoding文件，但它不是一个有效的种子文件';
exit();
}
echo $name."\r\n";
if($info['files'])
{
$index = 0;
foreach($info['files'] as $f)
{
      $index += 1;
      $path = $f['path.utf8'] ?$f['path.utf8'] :$f['path'];
      if(!$path)
      {
         echo '文件列表中的第'.$index."个文件不含目录\r\n";
         continue;
      }
      if(0 === strpos($path[0],"_____padding_file_"))continue;
      $under_folder = false;
      foreach($path as $item)
      {
         if($under_folder)
         {
            echo '/';
         }else{
            $under_folder = true;
         }
         echo $item;
      }
      echo "\r\n";
}
}
else
{
echo "仅有一个文件\r\n";
}
class Parser
{
private $_file;
public function __construct($file)
{
      $this ->_file = $file;
}
public function __invoke($parent = array())
{
      $ch = $this ->read();
      switch($ch)
      {
      case 'i':
         {
            $n = $ch;
            while(($ch = $this ->read()) != 'e')
            {
                  if(!is_numeric($ch))
                  {
                     echo '在';
                     echo sprintf(
                              '0x%08X',ftell($this ->_file));
                     echo '解析数字时遇到错误',"\r\n";
                     echo '在i和e之间不应该出现非数字字符'."\r\n";
                     echo '意外的字符'.sprintf('0x%02X',$ch);
                     exit();
                  }
                  else
                  {
                     $n .= $ch;
                  }
            }
            $n += 0;
            $offset = count($parent['value']);
            $parent['value'][$offset] = $n;
            return $parent;
         }
         break;
      case 'd':
         {
            $node = array();
            //这个$node变量作为字典对象准备加入到$parent的孩子节点中去
            //$node['type'] = 'd';
            while('e' != ($tmp = $this($node)))
            {//每次给$node带来一个新孩子
                  $node = $tmp;
            }
            $child_count = count($node['value']);
            if($child_count % 2 != 0)
            {
                  echo '解析结尾于';
                  echo sprintf('0x%08X',ftell($this ->_file));
                  echo '的字典时遇到错误:'."\r\n";
                  echo '字典的对象映射不匹配';
                  exit();
            }
            $product = array();
            for($i = 0; $i < $child_count; $i += 2)
            {
                  $key = $node['value'][$i];
                  $value = $node['value'][$i + 1];
                  if(!is_string($key))
                  {
                     echo '无效的字典结尾于';
                     echo sprintf('0x%08X',ftell($this ->_file));
                     echo ":\r\n";
                     echo '解析[k => v]配对时遇到错误，k应为字符串';
                     exit();
                  }
                  $product[$key] = $value;
            }
            /*
               * 思想是这样的：子节点想要加入父节点时，
               * 往父节点的value数组添加。
               * 当父节点收集好所需的信息后，
               * 父节点自身再从它的value节点整合内容
               * 对于字典和列表统一这样处理会大大降低代码量
               */
            $offset = count($parent['value']);
            $parent['value'][$offset] = $product;
            return $parent;
         }
         break;
      case 'l';
         {
            $node = array();
            while('e' != ($tmp = $this($node)))
            {
                  $node = $tmp;
            }
            $offset = count($parent['value']);
            $parent['value'][$offset] = $node['value'];
            return $parent;
         }
         break;
      case 'e':
            return 'e';
         break;
      default:
         {
            if(!is_numeric($ch))
            {
                  $this ->unexpected_character(
                     ftell($this ->_file) - 1,$ch);
            }
            $n = $ch;
            while(($ch = $this ->read()) != ':')
            {
                  $n .= $ch;
                  if(!is_numeric($n))
                  {
                     unexpected_character(
                        ftell($this ->_file) - 1,$ch);
                  }
            }
            $n += 0;
            $str = '';
            for(; $n > 0; --$n)
            {
                  $str .= $this ->read();
            }
            $offset = count($parent['value']);
            $parent['value'][$offset] = $str;
            return $parent;
         }
         break;
      }
}
/*
   * read函数包裹了$this ->_file变量
   */
function read()
{
      if(!feof($this ->_file))
      {
         return fgetc($this ->_file);
      }else{
         echo '意外的文件结束';
         exit();
      }
}

/*
   * unexpected_character函数接收2个参数
   * 它用于指明脚本在何处遇到了哪个不合法的字符，
   * 并在返回前终止脚本的运行。
   */
function unexpected_character($pos,$val)
{
      $hex_pos = sprintf("0x%08X",$pos);
      $hex_val = sprintf("0x%02X",$val);
      echo 'Unexpected Character At Position ';
      echo $hex_pos.' , Value '.$hex_val."\r\n";
      echo "Analysing Process Teminated.";
      exit();
}
}
?>

这里很有趣的是，明明我对文件调用了fseek($file,0,SEEK_END);移动到文件末尾了，但是feof还是报告说文件没有结束，并且fgetc返回一个0，而没有报错。但是此时文件实际上已经到末尾了。

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

c++ size_t 和 int 的区别

[经验分享] PHP用闭包思想写的torrent文件解析工具

浏览过的版块

扫码加入运维网微信交流群