Large language models (LLMs) have achieved remarkable success across a range of natural language processing tasks. Scientific text summarization is a particularly complex task because of the technical nature of scientific documents. Evaluating LLMs on this task requires carefully constructed benchmarks and evaluation criteria. Several research pap