When is the release version planned? #437
Comments
the version is very mature but still not the final release. |
Hi. Sorry for the late reply. Our goal is to have a release candidate by the end of 2021. Note that we are still analyzing to make sure we are aware of all of the issues and gaps that need to be finished before the release as well as stabilizing APIs and dependencies such as ICU4N and J2N. Although we have all of the modules ported except for a few features, there are likely some issues that are yet unknown which will come up during review and it is not possible to predict whether they are blocking the release or how long they will take to complete. |
We are getting pretty close to the end of 2021. Any updates? |
We are working on 4.8.0-beta00016 at present and will likely be releasing it within the next couple of weeks.
While we don't consider We are always looking for additional people to help us reach the goal. Per community request to provide more info about what to work on, we moved from JIRA to GitHub issues and marked several issues up for grabs. While we are committed to completing Lucene.NET, if you are waiting for us to complete it with the small number of resources we have, make no mistake about it you will be waiting. On the other hand, we have recently added some information on the NuGet Readme page on how to get involved and/or sponsor the project if you wish to help us to make Lucene.NET 4.8.0 a production release faster. |
Unfortunately, the company I am working for is quite small and doesn't have the extra resources to assist with the development in any substantial way. Thank you for all your hard work and detailed drilldown. I think it is important that the roadmap be visible so that individuals or companies with the resources can assist with the development where they can. |
I guess this is the crux of the misunderstanding. We don't need a few companies to put in tons of time or funding, we need many companies to put in a small amount of time or funding to keep the project going. We are getting more than 3400 downloads per day on NuGet, which is significantly higher than it was in 2016 when I started working on this (around 600 per day). If only a fraction of the companies and individuals doing those downloads would put in 1 weekend, or 1 day, or 1 hour of time or were financially contributing $50, $20, or $5 per month, we would be moving significantly faster than we are now (mind you, porting on Lucene.NET 4.8.0 started in September 2014). For most small businesses, $50 per month is not going to break their budget. But for us, it is a crucial lifeline to completing the port.
Thanks for the feedback. Creating such a list is a drain on our already small number of resources. Porting work isn't quite the same as developing new applications where the requirements can be well-defined. There is a significant amount of analysis and research that goes into finding the best API or technology to map to on the new platform, and if it doesn't exist or behaves radically differently, we have to build it. What was done initially was a "best guess" for how it should fit together, but in some cases the wrong technology was picked and in others a significant performance degradation happened or unintentional bug was introduced because the new API differs in behavior from the original one. For many of the "tasks" that we are aware of to complete, the analysis has not yet been done. In other words, they are just a general "analyze this and determine how to break this into more tasks". Of course, crowd sourcing the analysis work would be a big help. If everyone reading this spent just 30 minutes to pick a random file and compare the Lucene.NET source line by line against lucene 4.8.0, Googling to verify that each line they are unsure of is ported in a reasonable way, and made us aware of all of the uncommented differences, we would have a leg up on the analysis work. Most of what has been determined has already been posted above or as a GitHub issue. There is a spreadsheet that exists to keep track at a high level, but it is a significant amount of work to keep it up to date. I have made an effort to put some of the more well-defined tasks into GitHub issues that would take anywhere from 30 minutes to a weekend to complete, but despite some of them being on the board for 3 years, I am the one who ends up closing them. I am just not convinced that spending all of my time to pre-analyze everything and update a list is going to bring more help. We have tried this to some degree, and it has not worked. 
Even for basic stuff like creating a wrapper batch file, creating icons for our NuGet package dependencies, and updating the theme of the code colorizer on our website. As part of #460, I got involved with the effort to revive IKVM. On that project, people offer to help out frequently. But since IKVM was abandoned by its original contributor and has some confusing native bits, it is difficult to navigate through how to help. I can see how creating a list could help out on that project. But for some reason, despite Lucene.NET having nearly 13 million downloads on NuGet and a steady increase in demand over time, we aren't getting many offers for help to complete the port.
|
Of course, crowd sourcing the analysis work would be a big help. If everyone reading this spent just 30 minutes to pick a random file and compare the Lucene.NET source line by line against lucene 4.8.0, Googling to verify that each line they are unsure of is ported in a reasonable way, and made us aware of all of the uncommented differences, we would have a leg up on the analysis work.

If this is indeed the approach, then perhaps splitting the work up into buckets with todos for people to take up pieces as part of a roadmap would be helpful. Unfortunately, there is no visibility on this project, and citing reliance on a complete rework of a now-alpha version of IKVM means this project will not likely see a release for a very long time.

Additionally, I just spent 30 minutes verifying that what you suggest is not feasible. There is no one-to-one file map between the repos. The file and folder structures differ. Perhaps if this repo were reorganized, then tools like WinMerge would at least facilitate a file-by-file mapping. Then, digging through to find files, there is also not a 1-to-1 line mapping within the files. For example:

int tokenType = scanner.GetNextToken();
if (tokenType == WikipediaTokenizerImpl.YYEOF)
{
    return false;
}

String type = WikipediaTokenizerImpl.TOKEN_TYPES[tokenType];
if (tokenOutput == TOKENS_ONLY || untokenizedTypes.contains(type) == false){
    setupToken();
} else if (tokenOutput == UNTOKENIZED_ONLY && untokenizedTypes.contains(type) == true){
    collapseTokens(tokenType);

So if we can come up with a strategy to synchronize the file/folder structure and then reorganize the C# files line by line to match the Java, we can take the suggested approach.

Alternatively, there is the triage approach of focusing on producing release version(s) of the core product(s). This could be code reviewed and released first, while external dependencies which have no visibility or will take a long time to complete (like NLP) can be moved to separate pre-release packages. Additionally, similar decisions can be made about other parts of the code base that aren't 'core' functionality. For example, given the tokenizer is likely now out of date, being over 8 years old, with GitHub warning |
Didn't realize Lucene was on version 9.1 already. This is just a beta version of 4.8, which was released on Apr 27, 2014, nearly 8 years ago. Looking at the current features and changes, the C# codebase is leaps and bounds behind. IMHO, I am highly skeptical this will ever be brought on par, or that we will see a production-grade release within the next few years, if ever. With this project being blocked by much larger projects like IKVM, which are estimated at 30% of the outstanding work, and the numerous unaddressed bugs in the codebase (estimated at ~70% of the remaining work), it is unfortunate to say, given all of the incredible work that has been done, but this repo has the hallmarks of open source projects left on the vine to die. But who knows? |
The file-by-file mapping is accurate (in fact, we have kept the file names the same even if we ended up renaming the type inside the file to follow .NET conventions), but since we are building a .NET application and not a Java application, the deep folder structure has changed to move the files closer to the top level of the project. For example, the files in the https://github.com/apache/lucene/tree/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/index directory exactly correspond to the files in https://github.com/apache/lucenenet/tree/d6f3c3e7aad1847f5df69e4c080f45dad318a3ad/src/Lucene.Net/Index. But let us know if you find any files that were renamed or were added in the Java-ported directories. All of the files that we have added are supposed to be under the Lucene.Net/Support directory. The
When I said "pick a random file", I didn't mean anywhere, I meant specifically in the Of course, due to the differences in using statements, file headers, coding style, etc., we don't expect that every line in a file will correspond to the same line number as it was in Java. As you can see, the Java line 192 corresponds with line 198 in .NET. However, unless commented otherwise (or at least it should be) we have generally kept the order of members in the same order, unless there is no sensible way to port the functionality in .NET. That being said, many types were de-nested to make them more easily discoverable in .NET, but we always ensure those types remain in the same file as the Java code.
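As a rough illustration of the directory correspondence described above, here is a minimal sketch that guesses where a Java core file should live in the Lucene.NET repo. The two root prefixes come from the URLs quoted in this thread; the function name `expected_dotnet_path` is hypothetical, and the rule that each package directory keeps its name with only the first letter upper-cased is an assumption generalized from the single `index` to `Index` example, so treat any miss as a file to look up by hand rather than as a porting error.

```python
from pathlib import PurePosixPath

# Root prefixes taken from the two repository URLs quoted in the thread.
JAVA_CORE_ROOT = PurePosixPath("lucene/core/src/java/org/apache/lucene")
DOTNET_CORE_ROOT = PurePosixPath("src/Lucene.Net")


def expected_dotnet_path(java_path: str) -> str:
    """Guess the Lucene.NET path for a Java core source file.

    ASSUMPTION: every package directory maps to the same name with its
    first letter upper-cased. The thread only confirms index -> Index.
    File names are kept as-is (per the thread), with .java -> .cs.
    """
    relative = PurePosixPath(java_path).relative_to(JAVA_CORE_ROOT)
    *dirs, file_name = relative.parts
    dotnet_dirs = [d[:1].upper() + d[1:] for d in dirs]
    dotnet_file = PurePosixPath(file_name).with_suffix(".cs").name
    return str(DOTNET_CORE_ROOT.joinpath(*dotnet_dirs, dotnet_file))


if __name__ == "__main__":
    print(expected_dotnet_path(
        "lucene/core/src/java/org/apache/lucene/index/IndexWriter.java"
    ))
```

With both files located this way, a reviewer can open them side by side and compare member order rather than line numbers, since the line numbers are not expected to match.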
Not sure what we expect to gain by this. This is not an automated port, but mostly a manual one. The parts that were automatically converted with a tool had major problems that we had to go back and rework, which took longer than simply porting manually. We are still finding some breaking problems, such as incorrect index file naming, that are keeping us as a pre-release, and the only way to track them down is by analyzing and scrutinizing the code.
Right, that is the 30% of the remainder that I referred to before. We would have to change the build to allow us to do partial releases on different schedules and fit it in with the Apache release policy to allow us to do separate release votes for each batch of components we are releasing. NLP is a non-issue, really, as only the Lucene.Net.Analysis.OpenNLP project depends on and the public API of The main blocker is ICU4N. Due to the fact that .NET has no built-in In turn,
You are receiving that warning, because recently Lucene split off SOLR into a separate repository, so they did some reworking of the repo history to keep it in place on both projects. The tokenizer is 8 years old, true, but it doesn't get many updates and is not something you are likely to notice. Although the documentation (incorrectly) states that Character classes in .NET Regex use Unicode 8.0, I have used ICU4N's UnicodeSet to analyze it, and it actually only supports Unicode 3.0.1 characters (the version that was released in August, 2000 just before .NET 1.0 was released). Has this had any impact on your usage of Regex? If you do need more up-to-date Unicode, do note that ICU4N is ported from 60.1 (we started when it was still a release candidate) which supports up to Unicode 10.0. Lucene 4.8.0 depended on ICU4J 52.1. Alternatively, the analyzers are relatively painless to port from newer versions of Lucene. The components in the
In 2016, when I started working on Lucene.Net it was being downloaded around 600 times per day. Now it is being downloaded 3600 times per day according to NuGet. It is one of the top 250 packages on Nuget.org. That doesn't seem to be the hallmark of a dying project. ICU4N is probably around 70% of the remaining work (it would be more if we were planning to port more of it, but the plan is to fix the gaps and failing tests we currently have and mark components internal that don't have stable APIs without porting any additional components). We are not even including IKVM in our estimates as it is not blocking the Lucene.Net release. What I said previously was it would probably take 30% of the total amount of work remaining (that is roughly half of the amount of effort it takes to finish up ICU4N) to break up Lucene.NET into multiple segments that we can release separately, which would effectively defer the work on ICU4N until later. I haven't pushed the changes yet, but the biggest ICU4N issue of moving the embedded resources into satellite assemblies that end users can delete if they are not using them is nearly completed. In addition, we have gone from 6 NuGet packages down to just 1 code package and 1 data package, as was done in ICU4J. As @rclabo has just documented in the new quick start section that will soon be on our website, more than 90% of the features that changed in Lucene were between 3.x and 4.x. Since then, there have only been a couple of dozen new features and a handful of new modules. He also goes into an overview of how we ported it, so it is definitely worth a read. Unless Lucene decides to have a major redesign again, this will be the last full port of Lucene.NET. Once it is stable, we will simply upgrade it by porting only the changes in each changed file to the latest version. It has been a while since I have checked, but when I did (I believe it was version 7.2.0) around 80% of the files have had less than 10 changed lines since 4.8.0. 
Most of the changes have been to the structure of the index. We have just 5 tests that are marked with the Changing our target to 9.1 now only means we add ~1800 extra hours of work on top of the few hundred hours of work that remain. It also robs the community of a stable 4.8.0 release that will work on .NET Core for the amount of time it takes to do the upgrade. We still have the same unstable dependency issues we have now, plus more work on new dependencies that Lucene has taken on since 4.8.0. We would also have to figure out a way to make Lucene 9.1 read Lucene.Net 3.0.3 indexes to allow people to upgrade the software first and the index later, which is something that doesn't exist in Lucene 9.1 (although it is fairly easy by comparison to port the backwards-codecs support for 4.x since it is tested independently from everything else), where the Lucene 3.x tests are integrated directly into Of course, there is benefit to doing the upgrade to 9.x after we have a stable 4.8.0 release; it just doesn't make much sense to do it first, especially when most of the gaps of porting to 4.8.0 have already been worked out, all of the modules have been ported, and all but 5 of the tests are passing. |
@alexhiggins732 I agree with @NightOwl888, it’s much better to fully complete version 4.8 before working to upgrade it to whatever the current version of Java Lucene is at that time. You mentioned:
I can understand how the casual observer might reach that conclusion. In fact, when I first discovered Lucene.NET 4.8 Beta a couple of years ago, I too wondered whether its feature set was current enough to be of value. But the more I dug into the project, the more I was blown away by Lucene.NET 4.8’s power and the advanced engineering it contains. It’s truly a remarkable piece of software. Its architecture and features are as relevant today as ever. I know that people see the 4.8 version number, Beta status and length of time it’s taken to port it and they wonder how relevant it is. That’s understandable. And that’s the reason I recently wrote a blog article, Lucene.NET 4.8 vs Java Lucene 9.x, to help people realize that Lucene.NET 4.8 has a wealth of features and the majority of the features of Java Lucene 9.x.

Release Schedule Analysis

Besides what I mention in the blog article, it’s important to realize that the time period between major releases was much longer during the version 1.x to version 4.8 era. And since then, the Lucene team, like most software teams, has moved to doing smaller, more frequent releases. Let’s look at the release timeline of Java Lucene.

2000 – First open source version of Lucene.

Source: https://www.elastic.co/celebrating-lucene#2020

I don’t know how many years Doug Cutting worked on Lucene before making it open source in 2000. But ya gotta guess it was at least two years. Now look when Lucene 4.8 was released, in 2014. So at the release of Lucene 4.8 there was probably 16+ years of development behind it. That’s not labor years, just calendar years. Think about that for a minute. But how many years difference is there between Lucene 4.8 and Lucene 9? 7 years. Now 7 years is a lot, I grant you that. But it’s important to remember that the single biggest new set of features ever to be introduced in the history of Lucene came in version 4.0. That’s why it took more than 3 years for the Java team to get to the 4.0 release even with a large team.
In our case 8 years have passed in going from Lucene.NET 3.0.3 to 4.8. But our team is much smaller and we don’t currently have any corporate backing (Java Lucene has LOTS of corporate backing).

Why even compare to Java Lucene?

Honestly, I don’t see a lot of point in comparing Lucene.NET to Java Lucene, unless you are equally happy to use a Java library as a .NET one. If that’s the case, the comparison is valid and you should seriously consider using Java Lucene 9.1. But if you are a .NET developer developing a .NET application, website or mobile app, then the comparison isn’t really between Lucene.NET 4.8 and Java Lucene 9.1, it’s between Lucene.NET and other .NET based search libraries.

What I see when I look at Lucene.NET 4.8

So when I look at Lucene.NET 4.8 I see something totally different than what you see. I see a software architecture that was initially created by someone (Doug Cutting) who was creating his “fifth search engine, having previously written two while at Xerox PARC, one at Apple, and a fourth at Excite.” source. Hats off to Doug for sharing this with the world. Wow! I see an insanely large feature set hammered out by numerous developers over a 16+ year timeframe. I see more than 644K lines of code (not counting dependencies!) that have been ported to C#, and run on .NET on Windows, Linux or macOS. I see a powerful search library that can be used to search-enable desktop applications, websites or mobile apps (Android or iOS). I see a multi-targeted search library that runs on the .NET Full Framework, .NET Core 3.1 LTS, .NET 5 or even the latest and greatest .NET 6. I see a project that, while calling itself beta because a few method signatures may still change, has 7800+ passing unit tests and is clearly production worthy right now in my mind. |
@NightOwl888 any update? |
See #793 for more info of the latest updates. |
Due to some annoying rules, we have to use a release version. So, when is the release version planned?