There's an "unknown incompatibility" problem mentioned in the comments, which forced the original author of "VSTS Test Runner for xUnit" to add the XUnitForceLegacyCallback hack was quite tough to trace. It took me more than 8 hours to diagnose and fix, so if you, dear reader, are here already, please at least skim through it and either remember what IMessageSink
really is, or write down somewhere the helpful external links :))
Quick recap - the original problem
The xUnit's test runner was crashing badly when run within the VisualStudio environment (by, for example, "Run all tests in solution"), because a NotSupportedException
was thrown somewhere in the middle. With a bit of bad luck, you could also see RemotingException or SecurityException instead, or some messages about "Method not supported in a dynamic assembly".
The author of the runner had to inject a hotfix into xUnit's ExecutorWrapper
that forced the xUnit to use a "legacy callback mechanism", that is, to turn off new messaging layer introduced I think somewhere around xUnit 1.6, in favor of old layer that used handcrafted callbacks. In fact, the xunit.utility.runner's "version-resilience" keeps him to hog all the previous APIs and it dynamically switches between them depending on what version of xunit.dll it notices, so the hotfix simply forced him to always fallback to pre-1.6 version. This hotfix is easily noticeable in the VSTS Runner's code, just search for XUnitForceLegacyCallback
.
The runner is currently not supported by xUnit team, as they look forward for VS2011. However, it does not mean that noone is using VS2010 now. Well, at least I am, and I want to have xUnit right at hand. I was working on some tweaks for Moq and AutoMoq, and actually I managed to extend Moq with something quite reasonable. However, some DLL version collisions forced me to recompile most of the libraries under .Net4.0 and .. it broke the runner somehow! Diffing the code, I noticed that fresh checkout from the xUnit's repository has removed the fix I was talking about earlier, and so my curiosity ignited.
What
After a bit of mindless digging, it turned out that the direct cause was passing some IMessage
objects across the AppDomain, and the root error message was "The method was called with a Message of an unexpected type". That was the easier part to trace, and it looked quite easy: remote method was passed some message object it could not handle.
When
The more tricky part was finding out which object was that and why. Fortunatelly, NotSupportedException
s are not thrown very often and a first-chance exception catcher quickly found out the place:
ExecutorWrapper.OutgoingMessage
that was returned from
XmlNodeCallbackHandlerWithIMessageSink
and its implementation of
IMessageSink.SyncProcessMessage
It is called in the last steps of most of the workflows, wrapped with OnTestStart/End
methods, for example:
// Executor.OnTestStart:285:
callback.Notify(node.OuterXml) // callback was the XmlNodeCallback
// called from Executor+RunTest.RunTest:286:
executor.RunOnSTAThreadWithPreservedWorkingDirectory(() =>
TestClassCommandRunner.Execute(testClassCommand,
methods,
command => OnTestStart(command, handler),
result => OnTestResult(result, handler)));
Why (teaser)
All the message passing was implemented with IMessageSink
interface, that accepts any IMessage
. Surely the OutgoingMessage implemented it. And there are no manual argument assertions in the xUnit's code that would throw NotSupportedException
when receiving such object. What's more, it was a callback! So a similar ExecutorCallback.MessageSinkCallback.OutgoingMessage
already have passed properly - all in all the handler.Notify
caused the remote sink in XmlNodeCallback
to be called, right? NOT!
The code of the xUnit test runner is version-resilient and was recently upgraded from using 'callbacks' to using those two IMessage
and IMessageSink
interfaces. In this new version, the proxies and IMessageSink
s were used 'directly', as in a classic Remoting samples - the xUnit was first manually casting the proxy to IMessageSink
, and then, later, calling IMessageSink.SyncProcessMessages() manually, too:
// ExecutorCallback.Wrap():26
IMessageSink messageSink = handler as IMessageSink;
if (messageSink != null)
return new MessageSinkCallback(messageSink);
return new CallbackEventHandlerCallback(handler);
// ExecutorCallback.MessageSinkCallback.Notify():58
OutgoingMessage message = new OutgoingMessage(value);
IMessage response = messageSink.SyncProcessMessage(message); // <----
if (response != null && response.Properties.Contains("data"))
shouldContinue = Convert.ToBoolean(response.Properties["data"]);
Oh boy.. the .Net Remoting's Message Sinks are a litle more complicated than that!
IMessageSink is an infrastructure interface!
Why (retrospection)
The xunit.console and xunit.gui use intra-process AppDomain to launch the code, probably because it is quite easy to setup and it seems natural to do so after you read on MSDN all the descriptions about the gained safety and isolation. However, it is tremendously insecure for this use case.
There are a few tiny causes, because of which, for example, the SqlServer allows only '100% verifiable modules' to be loaded as its plugins. One of them is, what happens if your code under test loads a NATIVE DLL, and what if the dll causes an EPIC NATIVE CRASH(tm)? It's a thing from under the .Net layer, and the core Windows system will simply kill your process. And I mean, whole process, not just AppDomain, can get NATIVELY INSTA-KILLED(tm). Or, depending on your luck, it may survive but with some random parts of it may get rendered unusable, in a more or less controlled manner.
AppDomains provide almost no security against ungraceful native crashes. I have learnt it the hard way in the last few years when I had to develop incredible amounts of wrapper/transport code just because one old native library (whose code I don't have access to and authors were unavailable due to political reasons) liked to trash it's process-static data every few dozens of calls (guess what! that process-static data was initialized only once during module load in DllMain) and what's even worse, at every few hundreds of calls, it faulted so hard, that my application was killed, with no exceptions raised on the .Net side. Puff, DrWatson, and gone. The best thing what could be done about that was a detection of first failure, feature lockdown and displaying apologizing message "please restart me". Shame!
Ok, whining aside, the point is, that if you want your "runner" to be safe against a completely unknown code that may crash in a completely unknown manner, then the only way is either completely prohibit usage of any of the dangerous things (->the SqlServer way) or to use a ...
... and this is why the VisualStudio test runner uses dedicated worker process, the QTAgent32, which can happily crash anytime it wants without affecting VisualStudio (well, almost, but it's another story).
Why (gory details)
So here's what happened:
It turned out that the SyncProcessMessage
in XmlNodeCallback
was being given something completely different that
the xUnit's code was expecting. Of course, it was an IMessage
instance. Looking at that implementation of SyncProcessMessage
, it's clear that it is assuming that the IMessage msg
argument will be a (possibly proxized) reference to a ExecutorCallback.MessageSinkCallback.OutgoingMessage
just sent over the nonexistent wire. And actually,
for the bit earlier call of EnumerateTests
it is absolutely true!
However, the EnumerateTests
is anticipated by the VisualStudio to be safe, because it is meant to only call the
test plugin's code, which was assumed to be well written. However, all the other calls, like RunAssembly
or RunTests
are not, and calls to them are being passed through a safer channel. Please note the complicated callstack chain, passing through two AppDomains, one process and a few Threads:
FROM devenv.exe: Worker Thread "Controller: state execution thread for run 5448f460-1b83-4d4f-81b7-41c54b7d738b"
Microsoft.VisualStudio.QualityTools.Common.dll!Microsoft.VisualStudio.TestTools.Common.StateMachine<Microsoft.VisualStudio.TestTools.Common.RunState>.Execute()
THROUGH the remoting
FROM QTAgent32.exe: Worker Thread "Agent: state execution thread for test 'Test1' with id '83ee227c-1684-406a-9d29-d97bb8b20166'"
Microsoft.VisualStudio.QualityTools.AgentObject.dll!Microsoft.VisualStudio.TestTools.Agent.AgentExecution.ExecuteTest(object obj)
Microsoft.VisualStudio.QualityTools.AgentObject.dll!Microsoft.VisualStudio.TestTools.Agent.AgentExecution.ExecuteTest(bool forceSynchronousExecution = false)
Microsoft.VisualStudio.QualityTools.Common.dll!Microsoft.VisualStudio.TestTools.Common.StateMachine<Microsoft.VisualStudio.TestTools.Common.TestState>.Execute()
Microsoft.VisualStudio.QualityTools.AgentObject.dll!Microsoft.VisualStudio.TestTools.Agent.AgentExecution.TestStateStarted()
Microsoft.VisualStudio.QualityTools.ExecutionCommon.dll!Microsoft.VisualStudio.TestTools.Execution.ExecutionUtilities.StartNewThread(System.Threading.ParameterizedThreadStart parameterizedThreadStart, System.Threading.ApartmentState apartmentState, int maxStackSize, object parameter, string threadName)
THROUGH thread change
INSIDE QTAgent32.exe: Worker Thread "Agent: adapter run thread for test 'Test1' with id '83ee227c-1684-406a-9d29-d97bb8b20166'"
Microsoft.VisualStudio.QualityTools.AgentObject.dll!Microsoft.VisualStudio.TestTools.Agent.AgentExecution.CallAdapterRunMethod(object obj)
XUnitForVS.dll!XUnitForVS.UnitTestAdapter.Run(Microsoft.VisualStudio.TestTools.Common.ITestElement testElement = {XUnitForVS.UnitTest}, Microsoft.VisualStudio.TestTools.Execution.ITestContext testContext = {Microsoft.VisualStudio.TestTools.Agent.TestContext})
XUnitForVS.dll!XUnitForVS.UnitTestRunner.ExecuteTest(Xunit.IExecutorWrapper executor = {Xunit.ExecutorWrapper}, System.Guid runId = {System.Guid}, XUnitForVS.UnitTest test = {XUnitForVS.UnitTest})
xunit.runner.utility.dll!Xunit.TestRunner.RunTest(string type, string method)
xunit.DLL!Xunit.Sdk.Executor.RunTest..ctor(Executor executor, string _type, string _method, object _handler)
xunit.DLL!Xunit.Sdk.Executor.RunTests..ctor(Executor executor, string _type, List<string> _methods, object _handler)
THROUGH thread change
INSIDE QTAgent32.exe: Worker Thread "xUnit.net STA Test Execution Thread"
xunit.DLL!Xunit.Sdk.Executor.ThreadRunner(object threadStart = {Method = Implicit function evaluation is turned off by user.})
xunit.DLL!Xunit.Sdk.Executor.RunTests..ctor.AnonymousMethod__f()
xunit.DLL!Xunit.Sdk.TestClassCommandRunner.Execute(Xunit.Sdk.ITestClassCommand, System.Collections.Generic.List<Xunit.Sdk.IMethodInfo>, System.Predicate<Xunit.Sdk.ITestCommand>, System.Predicate<Xunit.Sdk.ITestResult>)
xunit.DLL!Xunit.Sdk.Executor.RunTests..ctor.AnonymousMethod__10(Xunit.Sdk.ITestCommand command = {Xunit.Sdk.TimedCommand})
xunit.DLL!Xunit.Sdk.Executor.OnTestStart(Xunit.Sdk.ITestCommand command = {Xunit.Sdk.TimedCommand}, Xunit.Sdk.ExecutorCallback callback = {Xunit.Sdk.ExecutorCallback.MessageSinkCallback})
xunit.DLL!Xunit.Sdk.ExecutorCallback.MessageSinkCallback.Notify(string value)
THROUGH the remoting
FROM QTAgent32.exe: Worker Thread "xUnit.net STA Test Execution Thread" [different AppDomain than the above]
ExecutorWrapper.XmlNodeCallbackHandlerWithIMessageSink....IMessageSink.SyncProcessMessage(IMessage msg)
And here's a dump from the IMessage msg
argument that the SyncProcessMessage
was provided with.
IMessageSink.SyncProcessMessage(IMessage msg)
msg = System.Runtime.Remoting.Messaging.MethodCall [implements IMethodCallMessage]
ArgCount 1 int
argMapper null System.Runtime.Remoting.Messaging.ArgMapper
args {object[1]} object[]
{1#} [0] {__TransparentProxy}
callContext {System.Runtime.Remoting.Messaging.LogicalCallContext}
ExternalProperties {System.Runtime.Remoting.Messaging.MCMDictionary}
fSoap false
fVarArgs false
HasVarArgs false
identity null System.Runtime.Remoting.Identity
InArgCount 1 int
InArgs {object[1]} object[]
{1#} [0] {__TransparentProxy}
instArgs null System.Type[]
InternalProperties {System.Collections.Hashtable, Count=0}
LogicalCallContext {System.Runtime.Remoting.Messaging.LogicalCallContext}
MethodBase {System.Reflection.RuntimeMethodInfo}
-> MethodName "SyncProcessMessage"
methodName "SyncProcessMessage"
methodSignature {System.Type[1]}
MethodSignature {System.Type[1]}
MI {System.Reflection.RuntimeMethodInfo}
Properties {System.Runtime.Remoting.Messaging.MCMDictionary}
msg.Properties["__Uri"] -> "/7398fd15_73d1_495a_9057_32080c85b197/zmwu73vp7yrx9qkmpxacddum_6.rem"
-> msg.Properties["__MethodName"] -> "SyncProcessMessage"
msg.Properties["__MethodSignature"] -> {System.Type[1]}
msg.Properties["__TypeName"] -> "System.Runtime.Remoting.Messaging.IMessageSink, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
msg.Properties["__Args"] -> {object[1]}
{1#} [0] {__TransparentProxy}
msg.Properties["__CallContext"] -> {System.Runtime.Remoting.Messaging.LogicalCallContext}
srvID {System.Runtime.Remoting.ServerIdentity}
typeName "System.Runtime.Remoting.Messaging.IMessageSink, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
TypeName "System.Runtime.Remoting.Messaging.IMessageSink, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
uri "/7398fd15_73d1_495a_9057_32080c85b197/zmwu73vp7yrx9qkmpxacddum_6.rem"
Uri "/7398fd15_73d1_495a_9057_32080c85b197/zmwu73vp7yrx9qkmpxacddum_6.rem"
Why (grande finale)
So the XmlNodeCallback
was given a System.Runtime.Remoting.Messaging.MethodCall
, nothing what the author of that code expected!
And, more importantly, this was an infrastructure message!
From MSDN on IMessageSink:
A remote method call is a message that goes from the client end to the server end and possibly back again. As it crosses remoting boundaries on the way, the remote method call passes through a chain of IMessageSink objects. Each sink in the chain receives the message object, performs a specific operation, and delegates to the next sink in the chain. The proxy object contains a reference to the first IMessageSink it needs to use to start off the chain.
Please look at code at the bottom of the article and at the message types they are anticipating in the sink method for a handmade proxy: IMethodCallMessage
and IMethodReturnMessage
. The former is exactly what the XmlNodeCallback
was provided with: the lower-layer IMethodCallMessage
. What's more, it was 100% properly filled with all the data: it specified that a SyncProcessMessage is to be called with "args={1#}:__TransparentProxy" and guess what!
- this proxy pointed out right to the OutgoingMessage
from ExecutorCallback
!
At this point, for me, it was obvious that the remoting layer somehow assumed that this proxy has a "IMessageSink chain", in the global 'transport' sense, and helpfully tried to filter/process the message through the chain.
Looking at the sample at MSDN, it is obvious that they have their sink built by the framework, and neither their remote object nor their handmade proxy does implement the IMessageSink
. From this article, it may seem safe to implement and use IMessageSink
to your own solutions. Nothing indicates that such behaviour may (or is meant to!) occur.
However, it occurs at least in the QTAgent32.
So, I've checked also:
- WROX: .NET Remoting Architecture
- MSDN: Sinks and Sink Chains
- Lluis Sanchez page
- Napstronic: NETRemoting
and noone mentions any case of the RemoteObject implementing IMessageSink
, but at least the last link provides a very detailed description of the sink chain architecture. Looking at the serverside sink (it was an INCOMING call!), the remote object could have been mistakenly taken as a:
- CrossContextChannel -- nope, as there was
CrossContextChannel.SyncProcessMessageCallback
higher in the callstack - ServerContextTerminatorSink -- nah, as
ServerObjectTerminatorSink.SyncProcessMessage
was direct caller of our failing SyncProcessMessage! - LeaseSink -- possible, but the
InitializeLifetimeService
returns null - ServerObjectTerminatorSink -- possible, but the article says 'from message's serveridentity'
- StackBuilderSink -- possible, but as the article says 'from message's serveridentity'
Actually, I think someone may find interesting that the nearest few callstack entries were:
00:> xunit.runner.utility.dll!Xunit.ExecutorWrapper.XmlNodeCallbackHandlerWithIMessageSink.System.Runtime.Remoting.Messaging.IMessageSink.SyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage msg = {System.Runtime.Remoting.Messaging.MethodCall}) Line 415 C#
-1: mscorlib.dll!System.Runtime.Remoting.Messaging.ServerObjectTerminatorSink.SyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage reqMsg) + 0xad bytes
-2: mscorlib.dll!System.Runtime.Remoting.Messaging.ServerContextTerminatorSink.SyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage reqMsg) + 0x8a bytes
-3: mscorlib.dll!System.Runtime.Remoting.Channels.CrossContextChannel.SyncProcessMessageCallback(object[] args) + 0x94 bytes
-4: mscorlib.dll!System.Threading.Thread.CompleteCrossContextCallback(System.Threading.InternalCrossContextDelegate ftnToCall, object[] args) + 0x8 bytes
... [Native to Managed Transition]
Why (epilogue)
I wouldn't be myself, if I hadn't checked the implementation of the direct caller for some hints:
// mscorlib.dll:System.Runtime.Remoting.Messaging.ServerObjectTerminatorSink.SyncProcessMessage
[SecurityCritical]
public virtual IMessage SyncProcessMessage(IMessage reqMsg)
{
IMessage message;
IMessage message1 = InternalSink.ValidateMessage(reqMsg);
if (message1 == null)
{
ServerIdentity serverIdentity = InternalSink.GetServerIdentity(reqMsg);
ArrayWithSize serverSideDynamicSinks = serverIdentity.ServerSideDynamicSinks;
if (serverSideDynamicSinks != null)
{
DynamicPropertyHolder.NotifyDynamicSinks(reqMsg, serverSideDynamicSinks, false, true, false);
}
IMessageSink serverObject = this._stackBuilderSink.ServerObject as IMessageSink; // <--- HERE, source of 'serverObject'
if (serverObject == null)
{
message = this._stackBuilderSink.SyncProcessMessage(reqMsg);
}
else
{
message = serverObject.SyncProcessMessage(reqMsg); // <---- HERE, the calling line
}
if (serverSideDynamicSinks != null)
{
DynamicPropertyHolder.NotifyDynamicSinks(message, serverSideDynamicSinks, false, false, false);
}
return message;
}
else
{
return message1;
}
}
The 'ServerObject' is of course the 'RemoteObject' that was exposed via remoting -- the ours XmlNodeCallback
. So it is not a bug, but a cool but quite obscure feature of the .Net Remoting, available only on the server side: XmlNodeCallbackHandlerWithIMessageSink
has been taken as a StackBuilderSink
overrider.
Revisiting MSDN: Sinks and Sink Chains
Custom Channel Sinks
On the client, custom channel sinks are inserted into the chain of objects between the formatter sink and the last transport sink. Inserting a custom channel sink in the client or server channel enables you to process the IMessage at one of the following points:
- During the process by which a call represented as a message is converted into a stream and sent over the wire.
- During the process by which a stream is taken off the wire and sent to the last message sink before the remote object on the server or the proxy object (on the client).
Custom sinks can read or write data (depending if the call is outgoing or incoming) to the stream and add additional information to the headers where desired. At this stage, the message has already been serialized by the formatter and cannot be modified. When the message call is forwarded to the transport sink at the end of the chain, the transport sink writes the headers to the stream and forwards the stream to the transport sink on the server using the transport protocol dictated by the channel.
Seriously, from this description, I wouldn't think it is done this way.
So, how about a fix?
Having said all the above, the solution is trivial: the only thing needed is to refactor everything to cease using the mscorlib's IMessageSink
and use an identical custom interface with different name, similarly to what was before the upgrade.
But, "only refactor" sometimes takes years. In this case, I assume that it can take some time, I myself didn't have enough free time neither.
So, there's also a quicker hot fix: if a MethodCall
arrives -- just dispatch it properly!
Remember the site Lluis Sanchez page I mentioned earlier? Check its TypeTerminatorSink
:
public IMessage ExecuteMessage (IMethodCallMessage call)
{
MarshalByRefObject target = InterceptionManager.GetObject (call.Uri);
if (target == null) throw new InvalidOperationException ("Object for uri '" + call.Uri + "' not found");
return RemotingServices.ExecuteMessage (target, call); // <---- HERE
}
So, to fix the problem, in the xUnit's XmlNodeCallbackHandlerWithIMessageSink
, only a single sanity check has to be added:
IMessage IMessageSink.SyncProcessMessage(IMessage msg)
{
if (msg is IMethodCallMessage)
return System.Runtime.Remoting.RemotingServices.ExecuteMessage(
this, (IMethodCallMessage)msg);
and also it needs to be copied a little above, to a sister IntCallbackHandlerWithIMessageSink
.
I'll probably post an those fixes to the xUnit's issue tracker in a few days.
PS. If anyone from xUnit's team is reading this by chance:
If are going to stick with the IMessageSink
and this hot fix, I suggest also borrowing AsyncProcessMessage
implementation from MSDN: Sinks and Sink Chains as the IMessageSink objects are said to be required, it is explicitely written right in this article:
Notes to Implementers
It is important to note that code implementing the current interface must provide implementations for both SyncProcessMessage and AsyncProcessMessage, since synchronous calls can be converted to asynchronous calls and vice versa. Both methods must be implemented, even if the sink does not support asynchronous processing.
And more importantly, the IMessageSink
is disallowed to throw any exceptions (and now there are some NotImplemented ones in place of the async part of it). If you need a proof for the no-exceptions clause, I can find a link for it too.
At some point of time I'll probably post a patch with the async part, too, or with the better fix (removing the IMessageSink
in favor of custom interface), but don't wait for me if the grue eats me.
HTH, q
No comments:
Post a Comment