Hi,
We've been investigating this problem. It looks like renders using curve type grass perform best with only 4 threads, and adding more threads makes the benchmark result worse. Renders using flat type grass, on the other hand, scale much better as the number of threads increases.
For the moment, our suggestion is to use flat type instead of curve unless you specifically need curve. When you do use curve type, use fewer steps, as it doesn't need as many steps as flat does.
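If you'd like to check how your own scene scales with thread count, a minimal Python sketch along these lines may help. It assumes you have already timed the same render yourself at several thread counts. The function names `scaling_report` and `best_thread_count` are made up for illustration and are not part of the renderer.

```python
def scaling_report(timings):
    """Summarize thread scaling from measured render times.

    timings: dict mapping thread count -> render time in seconds
             (your own measurements; nothing here runs a render).
    Returns a list of (threads, speedup, efficiency) tuples, where
    speedup is relative to the smallest thread count measured.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    base_threads = min(timings)
    base_time = timings[base_threads]
    report = []
    for threads in sorted(timings):
        speedup = base_time / timings[threads]
        # Efficiency: achieved speedup divided by the ideal (linear) speedup.
        efficiency = speedup / (threads / base_threads)
        report.append((threads, speedup, efficiency))
    return report


def best_thread_count(timings):
    """Return the thread count with the shortest render time.

    Ties go to the smaller thread count.
    """
    return min(sorted(timings), key=lambda n: timings[n])
```

Pass in a dict such as `{1: t1, 2: t2, 4: t4, 8: t8}` filled with your own timings. If efficiency drops sharply or the render time goes back up past some thread count, that count is a reasonable cap for that grass type.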
We'll keep investigating.
Best wishes,
Fernando