Abstract: The next recognized development direction of large language models (LLMs) is to integrate and enhance multimodal capability. Although current multimodal large language models (MLLMs) have ...