Nilay
2013-03-21 03:54:15 UTC
Hi
While testing some patches for the x86 architecture, I came across a
problem in which the simulated system does nothing for several seconds
of target-machine time. This behavior is accompanied by the following
messages on the target machine's console:
hda: dma_timer_expiry: dma status == 0x64
hda: DMA interrupt recovery
hda: lost interrupt
Joel Hestness, who had seen the problem before, provided me with a patch
that solves it. From my conversation with Joel and from looking at the
code myself, the problem appears to be that the commit stage of the
pipeline keeps a local copy of the interrupt object. Since the interrupt
is usually handled several cycles after the commit stage becomes aware of
it, the local copy may no longer be the correct interrupt by the time it
is actually handled: another interrupt may have occurred in the interval
between detection and handling. I am proposing the following solution
(slightly different from Joel's proposal) to handle this problem:
diff --git a/src/cpu/o3/commit_impl.hh b/src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh
+++ b/src/cpu/o3/commit_impl.hh
@@ -753,7 +753,7 @@
}
// CPU will handle interrupt.
- cpu->processInterrupts(interrupt);
+ cpu->processInterrupts(cpu->getInterrupts());
thread[0]->noSquashFromTC = false;
The code above ignores the local copy of the interrupt object and fetches
a new one from the CPU object.
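
To make the suspected race concrete, here is a toy model of it. This is
only a sketch and not gem5 code: ToyInterrupts, ToyCommit, and the use of
plain integer vectors where a larger number means higher priority are all
assumptions made for illustration. It only shows how a copy taken at
detection time can differ from what the controller reports at handling
time.

```cpp
#include <cassert>
#include <set>

// Toy model, NOT gem5 code. Every name here is hypothetical.
struct ToyInterrupts {
    // Pending interrupt vectors; assumption: larger value = higher priority.
    std::set<int> pending;

    void post(int vector) {
        assert(vector >= 0);  // -1 is reserved to mean "no interrupt"
        pending.insert(vector);
    }

    void clear(int vector) { pending.erase(vector); }

    // Highest-priority pending vector right now, or -1 if none is pending.
    int getInterrupt() const {
        return pending.empty() ? -1 : *pending.rbegin();
    }
};

struct ToyCommit {
    ToyInterrupts &ctrl;
    int cached = -1;  // local copy taken when the interrupt is detected

    explicit ToyCommit(ToyInterrupts &c) : ctrl(c) {}

    // Detection: commit notices an interrupt and remembers it locally.
    void detect() { cached = ctrl.getInterrupt(); }

    // Handling some cycles later, trusting the possibly stale local copy.
    int handleCached() const { return cached; }

    // Handling that asks the controller again at handling time.
    int handleFresh() const { return ctrl.getInterrupt(); }
};
```

If vector 3 is posted, detect() is called, and vector 7 is posted before
handling, then handleCached() still returns 3 while handleFresh() returns
7. The change proposed above corresponds to the handleFresh() style of
lookup.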
There are several questions that need to be addressed here. Is there an
actual bug in the O3 CPU? Is the diagnosis correct? Is the solution
acceptable?
Thanks
Nilay