Case study:

Analysing a large trace

The problem:

A CAD engineer at a customer of an EDA vendor ran a trace in Breeze but had to stop it with “kill -9” because it appeared to have hung. The vendor ran “ps -fu $USER --forest” to see the process tree: nothing was left running under trace_program.sh, yet the script itself remained active for a long time. The resulting trace directory was more than 5 GB in size.

So the question from the vendor is: did the trace go wrong somehow? As they can’t export that much data from the customer network for analysis, is there anything they can do to validate that the traces are OK first?

The solution:

Breeze does a lot of processing outside of the normal set of processes, so it may well have been still finishing things off when it was killed, but that shouldn’t matter. A long trace with a lot of dependencies could produce a trace that big, although that is unusual.

We’ve found that some semiconductor companies known for having very complex scripted flows have traces that run to several gigabytes. For cases like this, you can choose to import just the top-level trace file; this option pops up when you start to import the trace.
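Before deciding whether to import only the top-level file, it can help to see where the bulk of the data sits. The following is a minimal sketch, not part of Breeze: the function name trace_size_summary is hypothetical, and it assumes the trace directory holds a top-level file called ‘trace’ with the per-process data in subdirectories directly beneath it.

```python
from pathlib import Path


def trace_size_summary(trace_dir):
    """Return (top_level_bytes, {subdir_name: total_bytes}) for a trace directory.

    Assumption: the top-level trace file is named 'trace' and per-process
    data lives in subdirectories directly under trace_dir.
    """
    trace_dir = Path(trace_dir)
    top = trace_dir / "trace"
    # Size of the top-level trace file, or 0 if it is missing.
    top_bytes = top.stat().st_size if top.is_file() else 0

    per_subdir = {}
    for entry in trace_dir.iterdir():
        if entry.is_dir():
            # Sum the sizes of all regular files below this subdirectory.
            per_subdir[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
    return top_bytes, per_subdir
```

Sorting the returned dictionary by value shows which subdirectories account for most of the size, which can be checked on the customer network without exporting any data.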

Once you have worked out which program is of interest, you can make a copy of the trace that contains just the top-level trace file (called ‘trace’) and the subdirectories you need. Each subdirectory of the trace contains the trace files for a particular process, and the subdirectories are named and organised by process ID.
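Making such a reduced copy could be scripted along the following lines. This is a sketch under stated assumptions rather than a Breeze feature: copy_trace_subset is a hypothetical helper, and it assumes each process of interest has a subdirectory directly under the trace directory named exactly after its process ID. If the real layout nests the directories differently, the lookup would need adjusting.

```python
import shutil
from pathlib import Path


def copy_trace_subset(src, dst, pids=()):
    """Copy the top-level 'trace' file plus selected per-process subdirectories.

    Assumption: each process has a subdirectory directly under src whose
    name is its process ID. Returns the names of the subdirectories copied.
    """
    src, dst = Path(src), Path(dst)
    top_level = src / "trace"
    if not top_level.is_file():
        raise FileNotFoundError(f"no top-level trace file at {top_level}")

    dst.mkdir(parents=True, exist_ok=True)
    shutil.copy2(top_level, dst / "trace")

    copied = []
    for pid in pids:
        sub = src / str(pid)
        if sub.is_dir():
            # dirs_exist_ok (Python 3.8+) allows re-running into the same dst.
            shutil.copytree(sub, dst / sub.name, dirs_exist_ok=True)
            copied.append(sub.name)
    return copied
```

Calling it with an empty list of process IDs yields a copy holding only the top-level trace file; process IDs that have no matching subdirectory are skipped silently and left out of the returned list.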

The result:

This particular customer’s company has some incredibly wrapper-heavy flows. Although the sheer size of the file made Breeze a little slow to operate, the vendor was able to show clearly enough that the test jobs were launched satisfactorily by its tools, so it was in a good position to engage with the CAD engineer to find the issue. It turned out that the jobs were exceeding an allocated memory limit and were being killed brutally, with no warning and no record of the reason.