First Encounter with MLIR – Toy Ch2

First Encounter

  • What is the difference between an AI compiler and an AI deployment framework?
  • What is the difference between MLIR and TVM?

What MLIR Does

I first encountered MLIR as a tool for quantizing models: through MLIR's multi-level, step-by-step lowering, optimizations such as operator fusion can be applied along the way. The two most common approaches to AI compilers are MLIR and TVM. TVM uses automatic search over a space (with genetic-algorithm-like methods, for example) to arrive at an optimized result, whereas MLIR provides a base framework in which expert knowledge is written down as passes that optimize the IR layer by layer, until the program can be deployed to various hardware devices.

For example:

  • sophgo tpu-mlir: Sophgo's compiler, which quantizes models and deploys them to its TPUs.
  • triton: two well-known projects share this name. One is NVIDIA's server-side deployment framework tritonserver; the other is OpenAI's Triton, a programming language built on MLIR.

Building the llvm-project

Through the Toy example in the llvm-project repository, we can study and debug the MLIR sources and get a look under the hood. Build commands:

cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_BUILD_EXAMPLES=ON \
  -DLLVM_TARGETS_TO_BUILD="X86" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=ON

# Debug variant: the same command with -DCMAKE_BUILD_TYPE=Debug
# cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_BUILD_EXAMPLES=ON -DLLVM_TARGETS_TO_BUILD="X86" -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_LLD=ON

cmake --build . --target check-mlir

Defining a Custom Op

According to the Toy Chapter 2 documentation, there are two ways to define a dialect and its ops. The first is to write a .td file in TableGen, MLIR's dedicated description language, and generate the header declarations and function definitions with the mlir-tblgen command. The second is to write the C++ file by hand.
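
The two approaches meet in Dialect.cpp: the hand-written ToyDialect::initialize() registers the op classes that mlir-tblgen generated from the .td file. A rough sketch, abridged from the Ch2 sources:

// mlir/Dialect.cpp (abridged)
#include "toy/Dialect.h"

// Pull in the tblgen-generated dialect method definitions.
#include "toy/Dialect.cpp.inc"

// Register every op generated from Ops.td with the dialect.
void mlir::toy::ToyDialect::initialize() {
  addOperations<
#define GET_OP_LIST
#include "toy/Ops.cpp.inc"
      >();
}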

The directory layout of the Toy Chapter 2 example, mlir/examples/toy/Ch2:

.
├── CMakeLists.txt
├── include
│   ├── CMakeLists.txt
│   └── toy
│       ├── AST.h
│       ├── CMakeLists.txt
│       ├── Dialect.h
│       ├── Lexer.h
│       ├── MLIRGen.h
│       ├── Ops.td
│       └── Parser.h
├── mlir
│   ├── Dialect.cpp
│   └── MLIRGen.cpp
├── parser
│   └── AST.cpp
└── toyc.cpp

4 directories, 13 files

First, a quick tour of the files:

  • The AST and Parser files handle lexing and parsing of the custom Toy language (a Python-like def function syntax that supports only the double type); if you only want to study the MLIR side, you can skip them at first.
  • Dialect.h includes the headers generated from the .td file along with the relevant MLIR headers.
  • Ops.td describes ToyDialect and the Toy ops in TableGen's domain-specific language; generating code from it with a command avoids a great deal of hand-written C++.
  • Dialect.cpp contains the parts of some ops that are implemented in hand-written C++.
  • toyc.cpp implements the command-line driver, including dumping the AST and dumping the MLIR. Vanilla MLIR certainly cannot parse Toy constructs, since none of the default dialects recognize Toy's syntax, so toyc.cpp registers ToyDialect.

The logic in toyc.cpp that registers ToyDialect:

// toyc.cpp
int dumpMLIR() {
  mlir::MLIRContext context;
  // Load our Dialect in this MLIR Context.
  context.getOrLoadDialect<mlir::toy::ToyDialect>();
    ......
}
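
Here getOrLoadDialect<ToyDialect>() both registers the dialect with this MLIRContext and loads it; without this call, the context would not recognize operations in the toy.* namespace and parsing would fail.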
The file include/toy/Ops.td defines ToyDialect and the related ops, such as PrintOp. mlir/Dialect.cpp implements some ops partly by hand, for example ConstantOp's parse and print functions; by contrast, PrintOp's parse and print functions are defined in the .td file and generated by the mlir-tblgen command (see the sketch below).
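
For reference, ConstantOp's hand-written pair in mlir/Dialect.cpp looks roughly like this (abridged from the Ch2 sources):

// mlir/Dialect.cpp (abridged): ConstantOp's hand-written parser and printer.
mlir::ParseResult ConstantOp::parse(mlir::OpAsmParser &parser,
                                    mlir::OperationState &result) {
  mlir::DenseElementsAttr value;
  // Read the optional attribute dictionary, then the "value" attribute.
  if (parser.parseOptionalAttrDict(result.attributes) ||
      parser.parseAttribute(value, "value", result.attributes))
    return mlir::failure();
  // The result type is derived from the constant's value.
  result.addTypes(value.getType());
  return mlir::success();
}

void ConstantOp::print(mlir::OpAsmPrinter &printer) {
  printer << " ";
  // Elide "value" from the dictionary; it is printed explicitly below.
  printer.printOptionalAttrDict((*this)->getAttrs(), /*elidedAttrs=*/{"value"});
  printer << getValue();
}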

Below, a small code change walks us through the whole process. You will learn:

  • How to generate a dialect's declarations and definitions with a command.
  • How to generate an op's declarations and definitions, and how to filter the output down to a single op.
  • How to hand-write the parse and print functions for PrintOp.
  • How to rebuild the code and see the change applied in the emitted MLIR.

The normal build flow for the Toy example

First, look at the Toy example source, in mlir/test/Examples/Toy/Ch2/codegen.toy:

// codegen.toy
def multiply_transpose(a, b) {
  return transpose(a) * transpose(b);
}

def main() {
  var a<2, 3> = [[1, 2, 3], [4, 5, 6]];
  var b<2, 3> = [1, 2, 3, 4, 5, 6];
  var c = multiply_transpose(a, b);
  var d = multiply_transpose(b, a);
  print(d);
}
Two functions are defined, multiply_transpose and the main function; main calls print, which becomes a PrintOp.

Step 1: compile the .toy file.
#!/bin/bash
mlir_src_root=$(pwd)/mlir
build_root=$(pwd)/build

${build_root}/bin/toyc-ch2 ${mlir_src_root}/test/Examples/Toy/Ch2/codegen.toy -emit=mlir -mlir-print-debuginfo

The output of the compilation:

module {
  toy.func @multiply_transpose(%arg0: tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":4:1), %arg1: tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":4:1)) -> tensor<*xf64> {
    %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":5:10)
    %1 = toy.transpose(%arg1 : tensor<*xf64>) to tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":5:25)
    %2 = toy.mul %0, %1 : tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":5:25)
    toy.return %2 : tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":5:3)
  } loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":4:1)
  toy.func @main() {
    %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":9:17)
    %1 = toy.reshape(%0 : tensor<2x3xf64>) to tensor<2x3xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":9:3)
    %2 = toy.constant dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":10:17)
    %3 = toy.reshape(%2 : tensor<6xf64>) to tensor<2x3xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":10:3)
    %4 = toy.generic_call @multiply_transpose(%1, %3) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":11:11)
    %5 = toy.generic_call @multiply_transpose(%3, %1) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":12:11)
    toy.print %5 : tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":13:3)
    toy.return loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":8:1)
  } loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":8:1)
} loc(unknown)
That is the MLIR the compiler emits. Let's first look at the line produced by the print call:
toy.print %5 : tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":13:3)
The content and format of that line are defined in the .td file:
def PrintOp : Toy_Op<"print"> {
  let summary = "print operation";
  let description = [{
    The "print" builtin operation prints a given input tensor, and produces
    no results.
  }];

  // The print operation takes an input tensor to print.
  let arguments = (ins F64Tensor:$input);

  let assemblyFormat = "$input attr-dict `:` type($input)";
}
assemblyFormat defines the printed output: the $input operand, the attribute dictionary, a literal colon, and the input's type.
let assemblyFormat = "$input attr-dict `:` type($input)";
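
Reading the format string against our example line, each piece maps directly onto the output:

$input       → %5
attr-dict    → (nothing, since the op carries no attributes)
`:`          → the literal colon
type($input) → tensor<*xf64>

Together they reproduce: toy.print %5 : tensor<*xf64>
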
Step 2: generate the declaration and definition for PrintOp.
${build_root}/bin/mlir-tblgen -gen-op-decls ${mlir_src_root}/examples/toy/Ch2/include/toy/Ops.td --op-include-regex "print" -I ${mlir_src_root}/include/ 
--op-include-regex filters the ops so that only PrintOp's declaration is generated. The result:
......
class PrintOp : public ::mlir::Op<PrintOp, ::mlir::OpTrait::ZeroRegions, ::mlir::OpTrait::ZeroResults, ::mlir::OpTrait::ZeroSuccessors, ::mlir::OpTrait::OneOperand, ::mlir::OpTrait::OpInvariants> {
public:
  using Op::Op;
  using Op::print;
  using Adaptor = PrintOpAdaptor;
  template <typename RangeT>
  using GenericAdaptor = PrintOpGenericAdaptor<RangeT>;
  using FoldAdaptor = GenericAdaptor<::llvm::ArrayRef<::mlir::Attribute>>;
  static ::llvm::ArrayRef<::llvm::StringRef> getAttributeNames() {
    return {};
  }

  static constexpr ::llvm::StringLiteral getOperationName() {
    return ::llvm::StringLiteral("toy.print");
  }

  std::pair<unsigned, unsigned> getODSOperandIndexAndLength(unsigned index) {
    return {index, 1};
  }

  ::mlir::Operation::operand_range getODSOperands(unsigned index) {
    auto valueRange = getODSOperandIndexAndLength(index);
    return {std::next(getOperation()->operand_begin(), valueRange.first),
             std::next(getOperation()->operand_begin(), valueRange.first + valueRange.second)};
  }

  ::mlir::TypedValue<::mlir::TensorType> getInput() {
    return ::llvm::cast<::mlir::TypedValue<::mlir::TensorType>>(*getODSOperands(0).begin());
  }

  ::mlir::OpOperand &getInputMutable() {
    auto range = getODSOperandIndexAndLength(0);
    return getOperation()->getOpOperand(range.first);
  }

  std::pair<unsigned, unsigned> getODSResultIndexAndLength(unsigned index) {
    return {index, 1};
  }

  ::mlir::Operation::result_range getODSResults(unsigned index) {
    auto valueRange = getODSResultIndexAndLength(index);
    return {std::next(getOperation()->result_begin(), valueRange.first),
             std::next(getOperation()->result_begin(), valueRange.first + valueRange.second)};
  }

  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Value input);
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::Value input);
  static void build(::mlir::OpBuilder &, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::ValueRange operands, ::llvm::ArrayRef<::mlir::NamedAttribute> attributes = {});
  static ::mlir::ParseResult parse(::mlir::OpAsmParser &parser, ::mlir::OperationState &result);
  void print(::mlir::OpAsmPrinter &p);
  ::mlir::LogicalResult verifyInvariantsImpl();
  ::mlir::LogicalResult verifyInvariants();
public:
};
} // namespace toy
} // namespace mlir
MLIR_DECLARE_EXPLICIT_TYPE_ID(::mlir::toy::PrintOp)

The declaration contains the parse and print functions. Next, generate the function definitions; the command mirrors the one above, swapping -gen-op-decls for -gen-op-defs:
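
${build_root}/bin/mlir-tblgen -gen-op-defs ${mlir_src_root}/examples/toy/Ch2/include/toy/Ops.td --op-include-regex "print" -I ${mlir_src_root}/include/

The generated definitions (abridged):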

//===----------------------------------------------------------------------===//
// ::mlir::toy::PrintOp definitions
//===----------------------------------------------------------------------===//

namespace detail {
} // namespace detail
PrintOpAdaptor::PrintOpAdaptor(PrintOp op) : PrintOpGenericAdaptor(op->getOperands(), op) {}
.......

::mlir::ParseResult PrintOp::parse(::mlir::OpAsmParser &parser, ::mlir::OperationState &result) {
  ::mlir::OpAsmParser::UnresolvedOperand inputRawOperand{};
  ::llvm::ArrayRef<::mlir::OpAsmParser::UnresolvedOperand> inputOperands(&inputRawOperand, 1);  ::llvm::SMLoc inputOperandsLoc;
  (void)inputOperandsLoc;
  ::mlir::Type inputRawType{};
  ::llvm::ArrayRef<::mlir::Type> inputTypes(&inputRawType, 1);

  inputOperandsLoc = parser.getCurrentLocation();
  if (parser.parseOperand(inputRawOperand))
    return ::mlir::failure();
  {
    auto loc = parser.getCurrentLocation();(void)loc;
    if (parser.parseOptionalAttrDict(result.attributes))
      return ::mlir::failure();
  }
  if (parser.parseColon())
    return ::mlir::failure();

  {
    ::mlir::TensorType type;
    if (parser.parseCustomTypeWithFallback(type))
      return ::mlir::failure();
    inputRawType = type;
  }
  if (parser.resolveOperands(inputOperands, inputTypes, inputOperandsLoc, result.operands))
    return ::mlir::failure();
  return ::mlir::success();
}

void PrintOp::print(::mlir::OpAsmPrinter &_odsPrinter) {
  _odsPrinter << ' ';
  _odsPrinter << getInput();
  ::llvm::SmallVector<::llvm::StringRef, 2> elidedAttrs;
  _odsPrinter.printOptionalAttrDict((*this)->getAttrs(), elidedAttrs);
  _odsPrinter << ' ' << ":";
  _odsPrinter << ' ';
  {
    auto type = getInput().getType();
    if (auto validType = ::llvm::dyn_cast<::mlir::TensorType>(type))
      _odsPrinter.printStrippedAttrOrType(validType);
   else
     _odsPrinter << type;
  }
}

} // namespace toy
} // namespace mlir
MLIR_DEFINE_EXPLICIT_TYPE_ID(::mlir::toy::PrintOp)
Above are the generated definitions of the parse and print functions. The print function is easy to follow: it fetches each piece of content, e.g. via getInput(), and streams it with _odsPrinter <<, much like std::cout.

Hand-writing PrintOp's print and parse methods in C++

We have just generated the definitions of the print and parse functions; what if we want to rewrite them by hand in C++? As you have no doubt guessed: paste the two generated functions into Dialect.cpp. But how do we keep them from being generated automatically? Following the approach used for ConstantOp, we modify PrintOp's definition in the .td file: comment out assemblyFormat and add hasCustomAssemblyFormat. In other words, specify no declarative format, and supply a custom one instead.

// Ops.td
def PrintOp : Toy_Op<"print"> {
  let summary = "print operation";
  let description = [{
    The "print" builtin operation prints a given input tensor, and produces
    no results.
  }];

  // The print operation takes an input tensor to print.
  let arguments = (ins F64Tensor:$input);

  // Indicate that the operation has a custom parser and printer method.
  let hasCustomAssemblyFormat = 1;
  // let assemblyFormat = "$input attr-dict `:` type($input)";
}
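
With hasCustomAssemblyFormat = 1, mlir-tblgen still declares parse and print on the PrintOp class but no longer generates their bodies, so the build will fail to link until we supply the definitions ourselves.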

Paste the print and parse functions into Dialect.cpp. Below is the print function, with one extra statement added so we can observe the effect (parse is pasted unchanged from the generated code above):

void PrintOp::print(::mlir::OpAsmPrinter &_odsPrinter) {
  _odsPrinter << ' ';
  _odsPrinter << getInput();
  _odsPrinter << "hello_change";
  ::llvm::SmallVector<::llvm::StringRef, 2> elidedAttrs;
  _odsPrinter.printOptionalAttrDict((*this)->getAttrs(), elidedAttrs);
  _odsPrinter << ' ' << ":";
  _odsPrinter << ' ';
  {
    auto type = getInput().getType();
    if (auto validType = ::llvm::dyn_cast<::mlir::TensorType>(type))
      _odsPrinter.printStrippedAttrOrType(validType);
   else
     _odsPrinter << type;
  }
}

Since the C++ sources changed, rebuild toyc-ch2 first:

cmake --build . --target toyc-ch2

Rerun:

#!/bin/bash
mlir_src_root=$(pwd)/mlir
build_root=$(pwd)/build

${build_root}/bin/toyc-ch2 ${mlir_src_root}/test/Examples/Toy/Ch2/codegen.toy -emit=mlir -mlir-print-debuginfo
The emitted MLIR is shown below. Everything else is unchanged, but the newly added "hello_change" string now appears.
module {
  toy.func @multiply_transpose(%arg0: tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":4:1), %arg1: tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":4:1)) -> tensor<*xf64> {
    %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":5:10)
    %1 = toy.transpose(%arg1 : tensor<*xf64>) to tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":5:25)
    %2 = toy.mul %0, %1 : tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":5:25)
    toy.return %2 : tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":5:3)
  } loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":4:1)
  toy.func @main() {
    %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":9:17)
    %1 = toy.reshape(%0 : tensor<2x3xf64>) to tensor<2x3xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":9:3)
    %2 = toy.constant dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":10:17)
    %3 = toy.reshape(%2 : tensor<6xf64>) to tensor<2x3xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":10:3)
    %4 = toy.generic_call @multiply_transpose(%1, %3) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":11:11)
    %5 = toy.generic_call @multiply_transpose(%3, %1) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":12:11)
    toy.print %5hello_change : tensor<*xf64> loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":13:3)
    toy.return loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":8:1)
  } loc("/home/ken/Codes/mlir_about/llvm-project/mlir/test/Examples/Toy/Ch2/codegen.toy":8:1)
} loc(unknown)
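
One caveat: the printer now emits "hello_change", but the pasted parse function was left unchanged, so this output no longer round-trips; feeding it back through the parser would fail at the marker string. A real custom printer and parser must agree on the format.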

References

https://mlir.llvm.org/docs/Tutorials/Toy/Ch-2/
