Upgrade NUnit to v4; misc test improvements, #1001#1271
Conversation
|
Marking as draft until NUnit 4.6 is released with my merged PR (nunit/nunit#5236) included, to remove the reflection for ArgDisplayNames. However, the rest is still able to be reviewed. cc @NightOwl888 |
|
NUnit 4.6 is now out with my PR fix included, so I'll pick this up again today. |
|
|
||
| // LUCENENET specific - time out test projects at 55 minutes to allow the results | ||
| // to be uploaded before the 60 minute Azure DevOps job cutoff for easier troubleshooting | ||
| [assembly: Timeout(3300000)] |
There was a problem hiding this comment.
This serves an important purpose. Azure DevOps gives us access to a build agent for only 1 hour. That includes any tasks that run after the test task. If Azure DevOps forcefully shuts us down before the tests complete, we don't get any logs to inspect to figure out why our tests didn't complete. That includes any test seeds to repeat what happened in CI.
On the other hand, if we can force NUnit to shut down within that 1 hour window (with some wiggle room to upload the logs), we will get rich info about what happened.
Keep in mind, for performance reasons we run test projects in parallel in background threads so the results are not printed directly on the output window Azure DevOps gives us.
NOTE: There were a couple of efforts to allow our tests to be run in parallel within test projects, but the codecs rely on static state to function. I made an attempt to alter the design a bit to allow injection, but it affected several user components (like Directory). So, ultimately it seemed simpler just to attempt other ways to run tests in parallel (which is how we ended up with multiple test projects for the core even though upstream there was one).
Granted, we should be stable enough now so this never happens, but:
- We will likely run into hangs again when we upgrade to the next version
- This feature is technically built-in to .NET 5+ in
dotnet test, but it doesn't work on .NET Framework - That hour of run time is an hour we don't get back if it doesn't even provide any clue as to why the test runner hung
Also, note we duplicated this in a few different projects that were particularly susceptible to hangs.
There was a problem hiding this comment.
First, to clarify: TimeoutAttribute has never worked correctly on .NET 5+. It requires Thread.Abort, which throws a PlatformNotSupportedException in .NET 5+. So among our current targets, this has only worked for .NET Framework.
We definitely can't just add this back in as-is, because the tests fail at runtime now. NUnit now throws an exception when this attribute is used on .NET 5+. And, it's marked as Obsolete to boot, giving warnings.
But, if we feel this is important for .NET Framework tests only, I could add this back in with a FEATURE_ gate and suppress the obsolete warning. Let me know.
Note that --blame-hang-timeout already handles this for .NET 5+, but it is unclear to what degree that works correctly on .NET Framework (and I am not particularly interested in digging into that to find out).
There was a problem hiding this comment.
Okay, so I couldn't let this rest. It was bothering me. Had Claude Code on Windows (so it could test .NET Framework properly) run a test and here is its report:
TL;DR
dotnet test --blame-hang-timeout does work on .NET Framework (net472 and net48) with current Microsoft.NET.Test.Sdk versions. The Microsoft docs statement that --blame-hang-timeout is netcoreapp2.1+ only appears to be stale. We can use it across all our test TFMs.
Why we tested
Microsoft's docs say --blame-hang-timeout only works on netcoreapp2.1 and later on Windows. Lucene.NET still targets net472 and net48 for tests, so before relying on this option for hang detection in CI we wanted to confirm the limitation for ourselves.
Setup
Standalone scratch project, multi-targeted net10.0;net9.0;net8.0;net48;net472, with one NUnit test that sleeps for 2 minutes:
[Test]
public void HangFor2Minutes()
{
Console.WriteLine($"[HangTest] Starting hang at {DateTime.Now:HH:mm:ss.fff} on TFM '{RuntimeInformation.FrameworkDescription}'");
Thread.Sleep(TimeSpan.FromMinutes(2));
Console.WriteLine($"[HangTest] Finished hang at {DateTime.Now:HH:mm:ss.fff} (this should NOT print if blame-hang-timeout works)");
}Each TFM was run as:
dotnet test -c Release -f <TFM> --no-build --blame-hang-timeout 1mThe test is designed to hang twice as long as the configured timeout. If --blame-hang-timeout works, the run aborts at ~60 s; if it doesn't, the test runs to completion at ~120 s.
We ran the experiment twice — once with the package versions currently pinned in the repo, and once with the latest stable versions of all three packages.
Results
Run 1 — repo-pinned versions
Microsoft.NET.Test.Sdk 17.11.1, NUnit 3.14.0, NUnit3TestAdapter 4.6.0
| TFM | Time to abort | Aborted by blame-hang-timeout |
|---|---|---|
| net10.0 | 62 s | ✅ |
| net9.0 | 62 s | ✅ |
| net8.0 | 62 s | ✅ |
| net48 | 62 s | ✅ |
| net472 | 62 s | ✅ |
Run 2 — latest stable versions
Microsoft.NET.Test.Sdk 18.5.1, NUnit 4.6.0, NUnit3TestAdapter 6.2.0
| TFM | Time to abort | Aborted by blame-hang-timeout |
|---|---|---|
| net10.0 | 61 s | ✅ |
| net9.0 | 62 s | ✅ |
| net8.0 | 62 s | ✅ |
| net48 | 62 s | ✅ |
| net472 | 62 s | ✅ |
In every case the runner emitted:
Data collector 'Blame' message: The specified inactivity time of 60 seconds has elapsed.
Collecting hang dumps from testhost and its child processes.
Data collector 'Blame' message: Dumping <pid> - testhost(.netXX).
Test Run Aborted.
Process exit code was 1, so CI will correctly flag a hung run as a failure.
Conclusion
--blame-hang-timeoutworks for our .NET Framework test runs (net472,net48) just as it does for.NET 8/9/10. Microsoft's docs are out of date.- The behavior is identical between
Microsoft.NET.Test.Sdk17.11.1(currently pinned) and18.5.1(latest), so the limitation — if it ever applied to our pinned version — is already lifted. - We can safely use
--blame-hang-timeouton all our test TFMs as a guard against deadlocked tests.
There was a problem hiding this comment.
Yeah, --blame-hang-timeout works on net5.0 and higher, so no issue there.
Although, we needed this feature on .NET Core 1.0, but didn't get it until .NET 5!
But --blame-hang-timeout is a blunt instrument. If there is only 1 test in the whole assembly that will hang, then it will show in the log which test it hung on, but if there are multiple, it means we only get info about 1 hang per test run. But [TimeoutAttribute] allows us to annotate individual tests so we can fail them and let the test run complete. We also use it at the assembly level on projects that have many tests that could hang as a backstop to ensure the test runner finishes.
The thing is, we are in a good place now where we don't have any hangs or crashes. But what about when we upgrade? Do you have any idea how long it will take to find all of the hangs if we have to rely on reporting of only 1 hanging test per CI run? Even worse, the first one that hangs might be really far away from the source of the problem and one or more tests that hangs or fails after it may provide more clues.
🧩 What NUnit 4 removed (and why it hurts you)
In NUnit 4:
- TimeoutAttribute → obsolete
- Assembly-level timeout → effectively unsupported
- Thread abort–based enforcement → discouraged/removed
The reasoning is sound in isolation:
- Thread abort is unsafe
- Async tests don’t behave well with it
- Cross-platform consistency matters
But…
👉 None of that replaces:
- Global safety nets
- Per-test hard cutoffs
- Guaranteed termination when things wedge
So you’ve lost:
- Per-test hard kill (reliable-ish on .NET Framework)
- Assembly-wide guardrails
- A fallback when the test host itself misbehaves
🚨 The uncomfortable truth
What you’re doing is still necessary in CI environments like Azure DevOps.
There is currently no single replacement that covers:
- Hung individual test
- Hung test host
- Hung background threads
- Agent-level kill before cleanup
So the answer is not “switch to the new way”—because there isn’t one.
We have had to deal with every one of these issues.
To me, this seems like a downgrade, not an upgrade. We are basically back on .NET Core 1.0 where we had no control at all over the test run and cannot kill it to preserve any logging info to figure out what went wrong. It is really frustrating not even having the name of the test that caused the hang so we don't even have a clue where to start.
My biggest worry is that NUnit 4 may have modernized to support CancellationToken which effectively takes this critical forceful termination tool away from us.
Maybe this would be fine for a home-grown product that improves in stability over time. But we will be going into cycles of instability on every Lucene upgrade and hanging tests are a reality that we must deal with every time.
Custom TimeoutAttribute?
Since we cannot compromise on the feature and cannot rely on continued support from NUnit to support it, the only alternative is to maintain it ourselves. ChatGPT sketched this implementation for an assembly level timeout.
AssemblyTimeoutAttribute
You can get very close to the old assembly-level TimeoutAttribute behavior with a self-contained watchdog that:- Starts once per test assembly
- Tracks activity (so it’s not just a blind wall-clock kill)
- Forces a process-level termination when the timeout is exceeded (so Azure DevOps can’t strand you)
- Emits diagnostics before dying
This works in both NUnit 3 and 4 because it uses only public, stable hooks.
🧩 Design goals (matching old behavior)
We want to approximate:
✔ Assembly-wide timeout
✔ Resets when tests are actively running (optional but recommended)
✔ Kills everything if exceeded
✔ No reliance on ThreadAbort
✔ Works under dotnet test / VSTest
✔ Produces logs before termination
✅ Drop-in replacement: AssemblyTimeoutGuard
- Usage (what you put in your test assembly)
[assembly: AssemblyTimeout(30 * 60 * 1000)] // 30 minutes- Implementation
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using NUnit.Framework;
using NUnit.Framework.Interfaces;
// Attribute applied at assembly level
[AttributeUsage(AttributeTargets.Assembly)]
public sealed class AssemblyTimeoutAttribute : Attribute, IApplyToTest
{
private readonly int timeoutMs;
public AssemblyTimeoutAttribute(int timeoutMs)
{
this.timeoutMs = timeoutMs;
}
public void ApplyToTest(Test test)
{
// Attach a singleton guard to the root test suite
if (test is TestSuite suite && suite.Parent == null)
{
suite.Properties.Set(nameof(AssemblyTimeoutAttribute), timeoutMs);
}
}
}
// Global setup fixture that runs once per assembly
[SetUpFixture]
public sealed class AssemblyTimeoutGuard
{
private static CancellationTokenSource? _cts;
private static Task? _watchdogTask;
private static int _timeoutMs;
private static long _lastActivityTicks;
[OneTimeSetUp]
public void Start()
{
var context = TestExecutionContext.CurrentContext;
var test = context.CurrentTest;
// Read timeout from assembly attribute
if (test.Properties.ContainsKey(nameof(AssemblyTimeoutAttribute)))
{
_timeoutMs = (int)test.Properties.Get(nameof(AssemblyTimeoutAttribute));
}
else
{
return; // no timeout configured
}
_cts = new CancellationTokenSource();
_lastActivityTicks = Stopwatch.GetTimestamp();
_watchdogTask = Task.Run(() => WatchdogLoop(_cts.Token));
}
[OneTimeTearDown]
public void Stop()
{
try
{
_cts?.Cancel();
_watchdogTask?.Wait(TimeSpan.FromSeconds(5));
}
catch
{
// ignore
}
}
private static async Task WatchdogLoop(CancellationToken token)
{
var frequency = Stopwatch.Frequency;
while (!token.IsCancellationRequested)
{
await Task.Delay(1000, token).ConfigureAwait(false);
var elapsedMs = (Stopwatch.GetTimestamp() - _lastActivityTicks) * 1000 / frequency;
if (elapsedMs > _timeoutMs)
{
DumpDiagnostics(elapsedMs);
HardKill();
}
}
}
public static void NotifyActivity()
{
Interlocked.Exchange(ref _lastActivityTicks, Stopwatch.GetTimestamp());
}
private static void DumpDiagnostics(long elapsedMs)
{
try
{
TestContext.Progress.WriteLine(
$"[AssemblyTimeout] No activity for {elapsedMs} ms. Terminating test run.");
}
catch
{
// last resort logging
Console.Error.WriteLine(
$"[AssemblyTimeout] No activity for {elapsedMs} ms. Terminating test run.");
}
}
private static void HardKill()
{
// FailFast ensures:
// - Process terminates immediately
// - Dumps can be collected by blame-hang
// - Azure doesn't just "hang forever"
Environment.FailFast("Assembly timeout exceeded.");
}
}- Activity tracking (this is the key to making it “smart”)
Without activity tracking, this becomes a blunt wall-clock timeout (worse than NUnit 3 behavior).
You want to reset the timer whenever tests are progressing.
Option A: Base class (simplest)
public abstract class TestBase
{
[SetUp]
public void Touch() => AssemblyTimeoutGuard.NotifyActivity();
[TearDown]
public void TouchAfter() => AssemblyTimeoutGuard.NotifyActivity();
}Then:
public class MyTests : TestBase
{
[Test]
public void Test1() { ... }
}Option B: NUnit action attribute (cleaner, no inheritance)
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class)]
public sealed class ActivityAttribute : Attribute, ITestAction
{
public void BeforeTest(ITest test)
=> AssemblyTimeoutGuard.NotifyActivity();
public void AfterTest(ITest test)
=> AssemblyTimeoutGuard.NotifyActivity();
public ActionTargets Targets => ActionTargets.Test;
}Apply globally:
[assembly: Activity]🔥 Why this works better than old NUnit behavior
Old TimeoutAttribute (assembly-level):
❌ Didn’t account for progress
❌ Used thread abort
❌ Could leave process in bad state
This approach:
✔ Detects real hangs (no activity)
✔ Lets long-running but healthy tests continue
✔ Uses process-level termination (reliable in CI)
✔ Plays nicely with:
dotnet test --blame-hang-timeout- Azure DevOps timeouts
⚠️ Important caveats (honesty section)
This is very close to the old behavior, but:
- It kills the entire process
- Not just one test
- This is intentional (and more reliable in CI)
- Activity tracking is cooperative
- If a test hangs mid-execution → watchdog fires ✔
- If a test is CPU-bound but alive → may still fire unless you add more hooks
- You should still use
--blame-hang-timeout
- This complements it, not replaces it
🧠 Recommended final setup
Combine all three layers:
- Per-test
[Timeout](selective, legacy safety net) - This
AssemblyTimeoutwatchdog (your replacement) dotnet test --blame-hang-timeout(outer kill + dumps)
That gives you:
- Per-test protection
- Assembly-level protection
- Process-level protection
- CI-level protection
But frankly, NUnit 3 already supports [Timeout] and it is stable, so why would we want to add this maintenance burden?
Test Framework Status
We are currently at a point where the test framework:
- Is stable enough to run tests and debug
- Can reliably repeat random failure conditions across TFMs and OSes
- Has enough features to support all of Lucene's tests
- Is reliable enough to release Lucene.NET 4.8.0
- Is close enough in design to Lucene's version that we can upgrade it
- It relies on a hack, but since we are on NUnit 3 we don't have to worry much about version drift
In my view, we are basically done for this version of Lucene except for a few optional niceties for a better UX and better alignment with Lucene. We are missing some features that Lucene has, but:
- They are not required for a release
- We have more important issues to deal with
- User support is a secondary concern
- Any changes to what we have now are adding risk
- Moving to NUnit 4 means we have to worry about version drift and/or work with the NUnit team to add more NUnit features to remove our hack
- We have to give up
[Timeout]attribute that we will need for the Lucene upgrade to deal with the (completely expected) hangs in CI or develop a reliable replacement
To which I have to ask, what is so important in NUnit 4 that is worth giving up the stability that we have to get there? So, far we have:
- Better user support so they get the latest NUnit features (a secondary concern)
- The ability to upgrade (which we don't really need)
- Execution hooks (which we could use to block users from being able to override the test framework startup)
- A chance to work with the NUnit team to add more support for our test framework without hacks (a time-consuming task that could wait for the Lucene upgrade)
- Possibly better performing asserts so we don't have to ship our own
Did I miss anything?
The test framework has been the costliest part of this project, primarily because it had been unstable for far too long until we got around to addressing the biggest issues like repeatability. It seems like the risk to upgrade to NUnit 4 is high, the benefit is low, and it is just going to delay the release if we have to add "fully support NUnit 4 without hacks" and "find a replacement for [Timeout]" to the list of tasks we already have. In addition, there may be other unforeseen issues that may arise because of the upgrade to NUnit 4.
To me this is just a shiny object that is a distraction from the fact that the test framework (along with the build and CI pipeline) took a very, very long time to stabilize, possibly means we have to do it all over again, and certainly means more work to do that is avoidable by staying on NUnit 3. Where is the pot of gold at the end of the rainbow?
There was a problem hiding this comment.
But
[TimeoutAttribute]allows us to annotate individual tests so we can fail them and let the test run complete.
This is already done in my PR, via CancelAfter. There is only one case where there wasn't a good seam to do this, Test4GBStoredFields.Test(). I can file a separate issue to look into adding cancellation support to ForceMerge to accomplish this, but I don't think this invalidates a NUnit 4 upgrade. Also worth noting is the cooperative cancellation, but we get forced cancellation via --blame-hang-timeout for the project with being able to see which test hung (more details below).
The thing is, we are in a good place now where we don't have any hangs or crashes. But what about when we upgrade? Do you have any idea how long it will take to find all of the hangs if we have to rely on reporting of only 1 hanging test per CI run? Even worse, the first one that hangs might be really far away from the source of the problem and one or more tests that hangs or fails after it may provide more clues.
To start with, we don't get this today on .NET 8/9/10, because Timeout doesn't work correctly there. So considering that 60% of our targets and 75% of our target runs on ADO don't support what you're describing is notable.
Second, we're in a much better place today than we were before. We've solved a lot of those test stability issues, with fixes that are not undone by upgrading NUnit (because it really hasn't changed much under the hood). Lucene upgrade changes can be more easily incrementally tested with our robust testing infrastructure. AI can more easily search through hang dumps if they do happen. And, NUnit has already shown itself to be more reliable than before in my testing, and we'll continue to benefit from NUnit bugs being fixed (unlike if we stay pinned to one version).
To me, this seems like a downgrade, not an upgrade. We are basically back on .NET Core 1.0 where we had no control at all over the test run and cannot kill it to preserve any logging info to figure out what went wrong. It is really frustrating not even having the name of the test that caused the hang so we don't even have a clue where to start.
This is exaggerated a bit IMO. We do have control over the test run to avoid the concern about ADO limits, via --blame-hang-timeout which we're already using and still will use with NUnit 4.
Blame hang dumps give you the information needed to diagnose what went wrong, including the name of the test. Using my example above, I was able to use dotnet-dump to analyze the blame hang dump files. It took just a little grepping to find the stack trace of the hung test, even in the .NET Framework dump.
dotnet-dump analyze <hangdump.dmp> -c parallelstacks -c exitPartial output from analyzing the net472 dump:
~~~~ 2958
1 System.Threading.Thread.Sleep(Int32)
1 System.Threading.Thread.Sleep(TimeSpan)
1 HangTest.HangTests.HangFor2Minutes()
1 System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object, Object[], Object[])
1 System.Reflection.RuntimeMethodInfo.Invoke(Object, BindingFlags, Binder, Object[], CultureInfo)
1 NUnit.Framework.Internal.Reflect.InvokeMethod(MethodInfo, Object, Object[])
1 NUnit.Framework.Internal.Commands.TestMethodCommand.Execute(TestExecutionContext)
1 NUnit.Framework.Internal.ContextUtils+<>c__DisplayClass1_0<System.__Canon>.<DoIsolated>b__0(Object)
1 System.Threading.ExecutionContext.RunInternal(ExecutionContext, ContextCallback, Object, Boolean)
1 System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object, Boolean)
1 System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
1 NUnit.Framework.Internal.ContextUtils.DoIsolated(ContextCallback, Object)
1 NUnit.Framework.Internal.ContextUtils.DoIsolated(Func<__Canon>)
1 NUnit.Framework.Internal.Execution.SimpleWorkItem.PerformWork()
1 NUnit.Framework.Internal.Execution.ParallelWorkItemDispatcher.Dispatch(WorkItem, ParallelExecutionStrategy)
1 NUnit.Framework.Internal.Execution.CompositeWorkItem.RunChildren()
1 NUnit.Framework.Internal.Execution.TestWorker.TestWorkerThreadProc()
Note the third line of the stack trace is the test class/name. Additionally, the generated Sequence_*.xml files can give hints based on the test order that was run and which ones were not completed.
So we have three viable replacements:
- Assembly-level timeout: solved via
--blame-hang-timeouton all targets, with dump and sequence files that are easy to either inspect withdotnet-dumpby hand or via AI tools. - Test-level timeout: mostly solved via
CancelAfterwith cancellation tokens. Any tests that truly hang (instead of just taking a while to run) will be caught by the assembly-level timeout. - Test-level long-run reporting: for any tests where we just want to make sure it didn't go to slowly, we can use the MaxTime attribute - this does not kill/cancel the test, but if it runs longer than the specified time, it'll report that.
But frankly, NUnit 3 already supports
[Timeout]and it is stable, so why would we want to add this maintenance burden?
Skipping over the "it is stable" premise, but I do not think we need to add a custom assembly-level timeout replacement at this time given the viable replacements above.
That said, we have an escape hatch from this TimeoutAttribute discussion right now if we really want to retain it for .NET Framework: we can conditionally compile it in for platforms where it works. It's only an Obsolete warning right now, it hasn't been removed yet. We can kick the can down the road on coming up with our own assembly-level timeout until NUnit 5.0 is released and presumably removes it, if we decide we still need that support. It doesn't have to be done with this PR.
To which I have to ask, what is so important in NUnit 4 that is worth giving up the stability that we have to get there? So, far we have:
- Better user support so they get the latest NUnit features (a secondary concern)
- The ability to upgrade (which we don't really need)
- Execution hooks (which we could use to block users from being able to override the test framework startup)
- A chance to work with the NUnit team to add more support for our test framework without hacks (a time-consuming task that could wait for the Lucene upgrade)
- Possibly better performing asserts so we don't have to ship our own
I do not accept the premise that upgrading to NUnit 4 "gives up stability." I think their core team would disagree with that point as well. There have been many bug fixes since 3.14.0 over 3 years. I had Claude Code review all changes between 3.14.0 and 4.6.0, and it determined that "4.6.0 is a stability win — full stop" (its words). And again, none of this actually materially changes the logic in our tests.
I'd argue that we do need the ability to upgrade. It is usually harder and harder to keep things like this pinned as new .NET and VSTest SDK versions come out. We should aim to keep evergreen with our dependencies to help contributors, our users, and to be the beneficiaries of the latest bug fixes and improvements.
Also, there are benefits like my PR that was merged, to let us remove private reflection. And we can benefit from the new APIs that are available now, too.
I do think #3 and #4 should not be understated. This is a prerequisite for those. I don't think #5 is going to be all that impactful, but it is worth noting.
It seems like the risk to upgrade to NUnit 4 is high...
I don't see this as being true. As far as I can tell, it's a greater risk to our time being wasted to stay on 3.x.
... the benefit is low ...
As noted above, I disagree with this as well.
... and it is just going to delay the release if we have to add "fully support NUnit 4 without hacks" and "find a replacement for
[Timeout]" to the list of tasks we already have.
I don't think we necessarily need to find a replacement for Timeout, as demonstrated above. The primary concern is around hangs, and those are evident in the dump files. If analyzing dump files becomes an untenable chore, we can reevaluate adding in our own hang tracking. I don't think it will.
In addition, there may be other unforeseen issues that may arise because of the upgrade to NUnit 4.
This is always a fair point, however, the major version bump just sounds more scary than it is. The code is largely still the same, and if we adopt the extension asserts, even a lot of those have returned now. We already successfully upgraded to 3.14, I expect this upgrade to go even better with that under our belt.
There was a problem hiding this comment.
I do not accept the premise that upgrading to NUnit 4 "gives up stability." I think their core team would disagree with that point as well. There have been many bug fixes since 3.14.0 over 3 years. I had Claude Code review all changes between 3.14.0 and 4.6.0, and it determined that "4.6.0 is a stability win — full stop" (its words). And again, none of this actually materially changes the logic in our tests.
Now you are putting words in my mouth. I wasn't implying that NUnit 4 is unstable.
What I meant was that no tests were provided in #547. Our custom attributes that extend NUnit (AwaitsFix, Nightly, Slow, Weekly) have no tests. Furthermore Lucene 4.8.1 didn't provide any tests for Lucene.Net.TestFramework. What we have was borrowed from Lucene 6.x and Lucene 8.x and coverage is very spotty at best and completely incompatible at worst. There are no tests at all for LuceneTestCase. The parts that were derived from randomizedtesting were adaptions of concepts rather than true ports of features, so no tests are available to port.
The only thing we do have adequate test coverage of is the new [Repeat] attribute. That was a lot of effort to test, but now that we have pulled in some of NUnit testing approach, we can probably put attribute tests together more quickly.
All of these things have been tested manually and have had a good track record of reliability on NUnit 3.
That said, I don't see any new tests in this PR that confirm it all still works, but it would go a long way toward getting me fully on board with this. Particularly the RandomizedContext, LuceneTestFrameworkInitializer and friends that rely on NUnit behavior to function and leave us stranded if they stop working. But see the section on RandomizedTesting first.
Adding Test Framework Tests
After giving this some thought, the way NUnit does their testing is exactly the way that we need to build integration tests for the parts of our test framework that directly extend NUnit. They set up a mock test environment with classes that contain test methods and then run various scenarios over them.
Lucene upgrade changes can be more easily incrementally tested with our robust testing infrastructure.
You are overlooking an important phase of the upgrade - even if the upgrade goes well, the first thing we have to do is upgrade the Lucene.Net assembly and its tests, Lucene.Net.TestFramework assembly and its tests, and any dependencies that are required for the tests to function.
So, we will be in a phase where we will have new code in the test framework itself as well as code that is very likely to hang in this first group of assemblies. We won't be able to rely on the test framework at all to bail us out because there may be bugs in it just like the production code - that is, unless we figure out how to mock Lucene.Net assemblies and port the test framework and its tests first.
But you have proven your point that --blame-hang-timeout and temporarily enabling [Timeout] is our stopgap in this phase and has some guarantees that we can still get test logs if we hang, so let's move on.
I'd argue that we do need the ability to upgrade. It is usually harder and harder to keep things like this pinned as new .NET and VSTest SDK versions come out. We should aim to keep evergreen with our dependencies to help contributors, our users, and to be the beneficiaries of the latest bug fixes and improvements.
Also, there are benefits like my PR that was merged, to let us remove private reflection. And we can benefit from the new APIs that are available now, too.
Fair points.
Although, removing the Lucene.Net.TestFramework package from the release could solve our user upgrade problem, also. I don't think the people working on this project before me intended to release it, anyway.
I do think #3 and #4 should not be understated. This is a prerequisite for those. I don't think #5 is going to be all that impactful, but it is worth noting.
Execution Hooks
DI Hook
Execution hooks don't buy us much, although I think we can move our DI hook there and do reference counting as long as the reference counting is filtered on LuceneTestCase where all 4 attributes are present (OneTimeStartUp, StartUp, TearDown, OneTimeTearDown). This does mean that it will start up later and end earlier than it does with our current approach, but I think it will still work.
Overriding Users
As for #1087, we may or may not need execution hooks for this. A lot of our early issues may have stemmed from:
- Lucene was using static methods for these attributes
- We had some order-dependent setup that belonged in DI, but we had no DI layer to put it before NUnit called the attribute
- Lucene used JUnit's test rule chains to do some tasks we now do in DI
- .NET has no concept of system properties and we use
Microsoft.Extensions.DependencyInjection.Abstractionsso users can extend it, which requires a DI hook
However, when I looked into this a couple of weeks ago, OneTimeSetUp and OneTimeTearDown fire in the same order that beforeClass() and afterClass() do in Lucene. Maybe it was the SetUp and TearDown methods that fired in opposite order than Java or maybe the misordering problem was due to using static methods. But I am not 100% sure that we need execution hooks to solve this.
Emulating JUnit Test Rules
It would probably work to use execution hooks for #1088, but it would take some analysis to work out if everything that Lucene needs can be emulated using execution hooks and time to work out a solution. But it is not a requirement to release Lucene 4.8.0 to have this. It only gains us line-by-line parity with Lucene and potentially a useful feature for users.
Frankly, it would be better to push something like this into RandomizedTesting or even make it into an NUnit 4 extension package so it is maintained outside of Lucene.NET. Or perhaps even request that NUnit add it. I requested it when they were planning for NUnit 4 but my request was ignored.
Seeding the Tests
This is the real risk area. Since NUnit 3 has very low churn our hack can be made to work as is. There is no extension point in NUnit that lets us hack into the discovery phase without re-implementing the whole test discovery sequence ourselves. Upgrading means our hack could break at any point and is much more likely to break.
Technically, this hack uses supported mechanisms, IFixtureBuilder2 and ITestFixtureData, but AFAIK, there is no way to specify an alternate NUnitTestFixtureBuilder without doing this and no way to get the raw scanned test data before NUnit applies filters without replacing it.
So, we require a formal hook if we are to get any stability here. The real trick is dealing with NUnit's insistence that everything be built on the assumptions that they will be calling us (we cannot call them) and that we get no say if we need our assembly to be scanned for extensions and we are not the owner of the top-level test project.
Our requirements for this extension:
- Ensure there are no filters on the tests so we can generate a seed for all of them
- Ensure the tests are sorted in the same order every time for seeding
- Ensure the same instance of a seeded Randomizer class is used to sequence the seeds of every test
- Ensure if something goes wrong, a message is routed to the correct place in NUnit to display it
This cannot be done at the beginning of the execution phase because by that point the tests can have filters applied (which may be applied by an IDE). The way it is done now is that we handle the scan, wire in the test, and track the assembly where each test lives, along with some Lucene-specific tasks like setting the StringHelper.goodFastHashSeed to ensure repeatability there, also. It seems like we may need more than one extension, but this is going to take some analysis to figure out. And also work out how to design it so the same Randomizer instance can be used to seed everything, which is not easy if we have to span multiple independent events.
If we upgrade to NUnit 4, we have only choices:
- Stick with our hack and deal with any issues NUnit creates for us
- Invest all of the work necessary to analyze what our requirements are and work with NUnit to build an extension point
Frankly, I wasn't even planning to look at this code again until we upgrade and then rewrite it all as something viable for RandomizedTesting at that point (more on that later), with full tests and skip over any need to refactor this for Lucene 4.8.0. It took 6 weeks of analysis and development (before AI) to come up with this solution - even without investing more time into figuring out how to make automated tests for it.
It was worth it to skip the tests because within a couple of weeks every last random test failure outside of a few exceptions was fixed.
Exception Handling
This one is easy to miss and didn't occur to me until now. The exception wrappers that we added also depend on all of the NUnit exceptions to be mapped so they are handled correctly by our catch blocks the same way they are in Java. These are set by the LuceneTestFrameworkInitializer to avoid adding a dependency on NUnit in Lucene.Net.
If NUnit added (or ever adds) any other exceptions that could potentially be thrown, we will need a new mappings for them. While neither NUnit adding unexpected exception types or users (including Lucene tests) performing tasks that throw these new exceptions are likely, it is worth noting that this logic was all built around the assumption that we would be staying on NUnit 3 for the release.
In .NET, Exception is the root exception class
Exception
But in Java, it is a subclass of Throwable
Throwable
Error
Exception
JUnit is set up to throw classes that derive from Error. But most catch blocks in tests catch Exception or something that derives from it. So, the tests rely on our mapping to ensure they are not treated as catchable errors within tests.
NUnit Messaging
Several pieces of our test framework tap into the NUnit test messaging to get/set failure states and messages. While it is not likely that any of has changed, it is worth noting that we don't have any automated tests for any of it, so if a message that worked on NUnit 3 suddenly stops working on NUnit 4, we have no way of detecting it until we realize it is missing.
2 Versions of NUnit
At one point, I thought there would be some advantage to supporting all 3 major test frameworks:
- NUnit
- Xunit
- MSTest
Unfortunately, the other two were even less friendly to the idea of a 3rd party owning a subclass that did its own initialization and teardown steps than NUnit was. It took me a while to spot the flaw in this idea - users can have more than one test project for a given assembly so there is very little point in providing support for all of these test frameworks.
Correct me if I am wrong, but I think you are making that same assumption here. Putting a version range into the test framework so it can't be upgraded to NUnit 4 doesn't stop anyone from adding a different test project that depends on NUnit 4 to put their non-Lucene.NET tests in. This isn't a hard limit for our users that there is no workaround for.
RandomizedTesting
Now for the real crux of the issue. Our RandomizedContext and other concepts from randomizedtesting were prototyped here. That is why they are not public - they simply aren't ready for users to use.
The randomizedtesting library is not very large. Between our RandomizedContext.Generators package and the RandomizedContext and other concepts we borrowed from randomizedtesting, we have more than half of it. It just seemed like a lot of extra work to make all of this end-user ready that would delay the Lucene.NET release, so I didn't consider porting the whole thing after fixing random repeatability.
So, now that we are talking about working with NUnit to build extensions and doing the hard work of analyzing our requirements and working out how to extend it the way we need, it makes very little sense to do it based on a prototype that is a half-measure to get us to the Lucene.NET 4.8.0 release. Instead, we should analyze the randomizedtesting code to figure out what we actually need to support Lucene into the future. Note that:
- Our current requirements probably don't meet the needs of randomizedtesting and may in fact need completely different extensions to support it
- randomizedtesting has some features (like executing tests in random order) that I consider optional now, but may be worth pursuing if we are building extensions for NUnit 4
- randomizedtesting has tests we may be able to utilize to meet the same set of requirements that they have
- If we port randomizedtesting and work with NUnit at the same time, it is realistically possible to set it up so we can produce the same seed strings that are used in Lucene to produce the same exact test scenario between Lucene and Lucene.NET
- We already have reserved the package ID RandomizedTesting on NuGet and have a project set up for it already plus a CI pipeline
We have 4 choices now:
- Upgrade to NUnit 4, port randomizedtesting now + work with NUnit to support it
- Upgrade to NUnit4, upgrade our existing prototype + work with NUnit to support it
- Stay on NUnit 3 and keep our hack in place
- Upgrade to NUnit 4 and keep our hack in place
So far, you have been assuming that option 2 is an option worth pursuing. But IMO, there is little point in doing that without a full port of randomizedtesting. We would probably end up asking NUnit for extensions that are not required for the long term support of Lucene and have to go back to the well to get what we really need later.
My preferred choice is to do option 3. Options 1 and 2 are pushing the release off further into the future. And option 4 risks our hack getting out of sync with NUnit.
But failing option 3, option 1 is clearly better than option 2. If we are talking about investing the work now to remove our hack, it would be sensible to also remove the prototype from Lucene.NET that matches some of what randomizedtesting does in concept only and move in a direction that more closely aligns with the upstream code.
I don't have an issue with option 1 if you are willing to put in the time to do it. This is an easier task now that AI is a thing. But it is still hard being that NUnit 4 is not anything close to a port of JUnit.
In Summary
There were a lot of assumptions that went into Lucene.NET that depend on the assumption that the release would be based on NUnit 3. If we upgrade to NUnit 4, these assumptions no longer hold true. The above list was as comprehensive as I could come up with, but I may have missed something.
And since we have tested nearly every assumption manually and have almost no test coverage of the parts that really matter, this upgrade is risky.
Furthermore, we are talking about investing time now to patch a prototype design that isn't what we need for supporting Lucene.NET into the future. IMO, it is not worth doing that without actually porting randomizedtesting too. Both tasks require a lot of analysis to do and only one moves us in a forward direction.
But at the end of the day, this is going to burn time that could be better spent on getting ICU4N into a releasable state. Staying on NUnit 3 and living with the fact our test framework is not upgradable will not.
There was a problem hiding this comment.
Thanks for the detailed response. I do not fully agree with all of the conclusions made, but there are some good points here particularly around test coverage. I think we can pursue option 4 responsibly and iteratively by covering our bases with test coverage, because at the end of the day NUnit 4 is not significantly different from NUnit 3 in its core test architecture. So I'm going to mark this as draft for now, add that test coverage as lower priority, and re-open this when I feel we have protected ourselves from upgrades. I do not believe that a full RandomizedTesting implementation is necessary to do this.
NUnit 4.6 added Action overloads alongside the existing TestDelegate ones, making `() => action()` lambdas ambiguous. Pass the Action directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cyLevelTaskScheduler Add tests covering the custom NUnit-derived attributes (Nightly/Weekly/AwaitsFix/ Slow), the LuceneDelegatingTestCommand exception-recording messaging tap, RandomizedContext, and the LuceneRandomSeedInitializer seed-wiring used by the custom fixture builder. The attribute tests read the current tests:nightly/weekly/ awaitsfix/slow flags and assert the matching run/skip branch, so they pass regardless of how the suite is configured. Also add the missing `using Assert = Lucene.Net.TestFramework.Assert;` alias to TestLimitedConcurrencyLevelTaskScheduler.cs, which was binding Assert.Throws to NUnit's ambiguous TestDelegate/Action overloads (CS0121) after the global using revert. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y test
Cast RandomGenerator to J2N.Randomizer before calling NextInt64(), since
System.Random.NextInt64() does not exist on .NET Framework (net48/net472),
which broke the TestFramework.NUnitExtensions build on those targets.
Also add GenerateRandomSeedsIsReproducibleFromTheInitialSeed, verifying the
fixture-builder seeding ("the real risk area") reproduces the exact per-test
seed sequence from a single initial seed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…estCase lifecycle Close the two remaining test-coverage gaps for the NUnit integration: - TestLuceneTestFrameworkInitializer (in Lucene.Net.Tests.TestFramework, which has InternalsVisibleTo to Lucene.Net): asserts InitializeStaticState wires the Lucene.ExceptionExtensions NUnit*Type mappings to the correct types, so a future NUnit rename/move that nulls a mapping is caught directly. - LuceneTestCaseLifecycleTests + LifecycleRecordingFixture: assert LuceneTestCase's OneTimeSetUp/SetUp/TearDown/OneTimeTearDown fire in the expected order (once-each bookends, each test wrapped by SetUp/TearDown), guarding the NUnit lifecycle integration directly rather than only via the repeating-tests fixtures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upgrade NUnit to v4; misc test improvements.
Fixes #1001
Description
This PR upgrades NUnit to v4. For more details on some of the migration, see the docs here: https://docs.nunit.org/articles/nunit/release-notes/Nunit4.0-MigrationGuide.html
NUnit 4 dropped support for netstandard2.0; thus, this PR drops support for it from the TestFramework project. This only really would affect users who need to create an intermediate unit testing library (note: not a unit test project) that targets netstandard2.0 and references TestFramework. We expect this to not have a wide impact given current relatively low usage of TestFramework on NuGet (~100 downloads/mo). But if this is you, there is a simple workaround: either choose a single supported target (i.e. net8.0 or net462), or multi-target (i.e.
<TargetFrameworks>net462;net8.0</TargetFrameworks>in your project). Getting Lucene.NET on NUnit 4 will be more impactful in the long run and less restrictive to users than having to support a limited set of netstandard2.0 runtimes that are still in use (i.e. maybe Mono and some versions of Unity), and again this is just for unit tests.My first pass at this added a global using alias for Assert to point it to our Lucene.Net.TestFramework.Assert class, but that created way too many files changed in this PR, and it can be added later. So while there are some namespace cleanups here, I tried to minimize the amount of files changed since the other stuff is more important. Also, this upgrade exposed several places where we were not using our Assert class, so that has been fixed in several places.
One notable breaking change in NUnit 4.5 and later is that it now throws at runtime if you try to use the TimeoutAttribute on modern .NET. This was previously documented as not working on .NET 5 and later, but now it is an exception. To solve this at the class/test level, it was replaced with CancelAfterAttribute, which will trigger a CancellationToken parameter if the test runs longer than the configured value. So CancellationToken support was added to most of these tests and threaded through to calls where it made sense to do so. Only one test did not have an obvious seam since ForceMerge is not currently cancelable.
For TimeoutAttribute at the assembly level, there is not a good replacement. So I've removed it, since the existing
--blame-hang-timeoutlikely already covers this for both .NET Framework and modern .NET.Assert saw many documentation and correctness improvements in this PR; too many to mention. Notably, it switched to XML doc comments. Additionally, many unit tests were added for the NUnit-derived attributes.
The private/internal ArgDisplayNames property in NUnit changed to be an auto-prop from a declared field, so the reflection code that pokes into that has been updated to get the internal property instead of the private field. Unit tests were also added for this functionality to help detect and prevent regressions. Although if this regresses with a newer version of NUnit, the code should safely fall back to just using the ToString representation of the class fixture arguments, if anyone is using that (note that we are not using this functionality in our code).
Finally, a concurrency bug in TestRAMDirectory was fixed (which might have been surfaced by the NUnit 4 upgrade), which has the benefit of better matching the upstream Java code anyways. The root case was two different runs of the test could be writing to the folder at the same time.
AI: This was largely done by hand, but some repetitive parts (like removing usings) and some unit tests were done with the help of Claude Code.