Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterating over a string builder by index becomes ~exponentially slow for large builders #24395

Closed
danmoseley opened this issue Dec 8, 2017 · 29 comments
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Milestone

Comments

@danmoseley
Copy link
Member

This is motivated by this issue originally opened against MSBuild, which internally in its reuseable string builder factory compares string builders by index. It happens to start from the end, but that doesn't seem to matter. The opener observed this can be enormously slower that creating a string first.

Using this benchmark program I get the following results:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.98)
Processor=Intel Xeon CPU E5-1620 0 3.60GHz, ProcessorCount=8
Frequency=3507174 Hz, Resolution=285.1299 ns, Timer=TSC
  [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0
  DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0


          Method |    N |         Mean |       Error |      StdDev |
---------------- |----- |-------------:|------------:|------------:|
   ToStringFirst |  100 |     2.028 us |   0.0403 us |   0.0465 us |
 IndexedBackward |  100 |    21.399 us |   0.4123 us |   0.5063 us |
  IndexedForward |  100 |    23.020 us |   0.4461 us |   0.4581 us |
   ToStringFirst | 1000 |    43.210 us |   0.5032 us |   0.4202 us |
 IndexedBackward | 1000 |   323.959 us |   7.3317 us |   6.8581 us |
  IndexedForward | 1000 |   335.538 us |   6.3072 us |   5.8997 us |
   ToStringFirst | 5000 |   217.657 us |   4.1058 us |   4.3931 us |
 IndexedBackward | 5000 | 6,542.491 us | 131.7964 us | 223.8003 us |
  IndexedForward | 5000 | 6,301.246 us | 102.0129 us |  90.4317 us |

64 bit

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.98)
Processor=Intel Xeon CPU E5-1620 0 3.60GHz, ProcessorCount=8
Frequency=3507174 Hz, Resolution=285.1299 ns, Timer=TSC
  [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0
  DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0


          Method |    N |         Mean |       Error |      StdDev |
---------------- |----- |-------------:|------------:|------------:|
   ToStringFirst |  100 |     1.944 us |   0.0369 us |   0.0410 us |
 IndexedBackward |  100 |    22.955 us |   0.2145 us |   0.1902 us |
  IndexedForward |  100 |    22.674 us |   0.2493 us |   0.2332 us |
   ToStringFirst | 1000 |    40.258 us |   0.4520 us |   0.4007 us |
 IndexedBackward | 1000 |   325.231 us |   4.6369 us |   4.3374 us |
  IndexedForward | 1000 |   332.819 us |   8.0014 us |   7.4845 us |
   ToStringFirst | 5000 |   200.979 us |   1.2610 us |   1.1179 us |
 IndexedBackward | 5000 | 7,950.240 us | 157.2641 us | 253.9522 us |
  IndexedForward | 5000 | 8,137.382 us | 161.3316 us | 251.1739 us |

Clearly indexing is going to have a lot of overhead such as bounds checking. But I would expect that to be proportional to N, and we see a big jump when N gets large. Related to fragmentation of the SB? Can we do better?

@danmoseley
Copy link
Member Author

MSBuild has this code here and here

This would not show up unless the MSBuild is expanding eg., large lists (such as the content of a large-ish file read in as items) which is not uncommon but quite likely isn't showing up in the ASP.NET scenarios we're looking at right now.

@davkean

@AndyAyersMS
Copy link
Member

You might see different behavior on Core than on Desktop here.

@danmoseley
Copy link
Member Author

ToString (or string comparison) is measurably faster on Core, but indexing seems just as slow.

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.98)
Processor=Intel Xeon CPU E5-1620 0 3.60GHz, ProcessorCount=8
Frequency=3507174 Hz, Resolution=285.1299 ns, Timer=TSC
.NET Core SDK=2.1.2
  [Host]     : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT


          Method |    N |         Mean |       Error |      StdDev |
---------------- |----- |-------------:|------------:|------------:|
   ToStringFirst |  100 |     1.516 us |   0.0174 us |   0.0154 us |
 IndexedBackward |  100 |    21.894 us |   0.3194 us |   0.2988 us |
  IndexedForward |  100 |    22.972 us |   0.3801 us |   0.3556 us |
   ToStringFirst | 1000 |    39.239 us |   0.1815 us |   0.1515 us |
 IndexedBackward | 1000 |   324.277 us |   5.4631 us |   5.1102 us |
  IndexedForward | 1000 |   324.691 us |   3.3203 us |   3.1058 us |
   ToStringFirst | 5000 |   201.110 us |   3.0653 us |   2.8673 us |
 IndexedBackward | 5000 | 7,822.458 us | 141.3221 us | 125.2783 us |
  IndexedForward | 5000 | 7,552.085 us | 146.0999 us | 184.7696 us |

@danmoseley
Copy link
Member Author

Oops, desktop was on 32 bit. Let me try 64 bit.

@danmoseley
Copy link
Member Author

Edited to add 64 bit for desktop above. Now Core and Desktop are basically identical.

@danmoseley
Copy link
Member Author

The discrepancy is much smaller when the StringBuilder is presized (gist):

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.98)
Processor=Intel Xeon CPU E5-1620 0 3.60GHz, ProcessorCount=8
Frequency=3507174 Hz, Resolution=285.1299 ns, Timer=TSC
  [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0
  DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0


         Method |    N |       Mean |      Error |     StdDev |
--------------- |----- |-----------:|-----------:|-----------:|
  ToStringFirst |  100 |   1.810 us |  0.0298 us |  0.0279 us |
 IndexedForward |  100 |  16.534 us |  0.2751 us |  0.2573 us |
  ToStringFirst | 1000 |  40.155 us |  0.7265 us |  0.6440 us |
 IndexedForward | 1000 | 166.396 us |  3.3039 us |  3.8047 us |
  ToStringFirst | 5000 | 206.052 us |  4.0090 us |  6.1221 us |
 IndexedForward | 5000 | 848.451 us | 12.1369 us | 11.3528 us |

Both 1000 and 5000 are approximately 4x, suggesting a more natural linear effect consistent with eg., bounds checks.

@danmoseley
Copy link
Member Author

This is presumably because every index has to walk the chain of chunks, checking the offset for each until it finds the one with the indexed location. In the 5000 case above, there are 250K characters so 32 chunks of 8000 characters. On average it has to loop 16 times to find the right chunk - for every index. No doubt this was a conscious compromise when making stringbuilder chunky. @vancem ?

@AndyAyersMS
Copy link
Member

When we did stuff like this in the past (for chunky sparse bitvectors in a now-defunct compiler) we would cache the result of the last index lookup; subsequent index values tended to be "nearby". Random access would still be slow but this happens less often, but sequential accesses were now good.

This might require some invalidation logic elsewhere if the chunk structure can be messed with.

@danmoseley
Copy link
Member Author

Another option is to store the chunk offsets in a shared array that could be binary searched. But I like your idea, it is simple and unlikely to hurt other cases.

@danmoseley
Copy link
Member Author

danmoseley commented Dec 11, 2017

The user always works against the StringBuilder that owns the end chunk and that has is a singly linked list of chunks back to the start (so incidentally the smaller the index the more blocks to traverse to find it).

In the first idea, there would be two fields in the end StringBuilder: the last accessed index and a pointer to the chunk for it. Whenever the chunk chain is changed (by setting m_ChunkPrevious) those would need to be invalidated with some O(1) math.

This scheme could also speed up FindChunkForIndex and FindChunkForByte by reducing (on average halving) the space they need to walk (using those might not actually set the last accessed, just read it). But for indexing anywhere near the last index, it would change O(chunks) to O(1).

@vancem @jkotas does this seem a reasonable approach and unlikely to measurably affect other perf?

Ideally these 2 fields could somehow be put to use in the StringBuilders representing the chunks other than the final one.

@jkotas
Copy link
Member

jkotas commented Dec 11, 2017

Would it be better to just document that StringBuilder indexer is not fast?

StringBuilder is designed for building strings. It is not designed to inspect them - it will never work great for that. And trying to make it work ok for that just hurts the primary purpose what it is meant to be used for.

@danmoseley
Copy link
Member Author

Indexing as fast as a string or array would compromise its goals for sure. I just hate to think of people getting O(n^2) indexing when a few field gets and sets could get it to O(n) and potentially help other operations like StartsWith. If not the pit of success then at least the pit of mediocrity.

I looked at Azure profiler data and fortunately the usage of get_Chars, set_Chars, StartsWith, FindChunkForIndex are vastly lower than Append, etc. So I'll close this for now...

@Alois-xx
Copy link
Contributor

Alois-xx commented Dec 12, 2017

This should definitely be fixed. I have some build targets which have 8 minutes of "compile" time due to a large string with 900KB length which is compared in MSBuild properties. The resulting string is always the same but trying to intern it takes much longer than to create it and be done with with it. The actual work is < 40s which has definitely some room for improvement.

The MSBuild syntax to batch targets could definitely help to prevent stuffing a large list into a single string but the syntax is rather awkward and only understood by a few. This did lead to less than optimal msbuild targets which were wasting a significant amount of time and computing power at our dev shop.

@davkean
Copy link
Member

davkean commented Dec 12, 2017

@Alois-xx It would be better to file that issue over on MSBuild so that we can fix its interning behavior. Fixing that issue at the StringBuilder level isn't the right change.

@danmoseley
Copy link
Member Author

There is already dotnet/msbuild#1593 to add a cut over to To String at say 50K chars. For the long term I have proposed a method on String builder to equate efficiently with String.

@Alois-xx
Copy link
Contributor

Thanks! You are faster solving issues than I can find them ;-).

@danmoseley
Copy link
Member Author

@Alois-xx any interest in offering the PR for that MSBuild issue? Cutover to string with a simple test. Not sure if they want a perf measurement.

@Alois-xx
Copy link
Contributor

@danmosemsft: Sure. I need something to work with over christmas.
But first I deleted the the single line in our msbuild target file which was only there for a consistency check where no one remembers what it is actually good for.

@ghost
Copy link

ghost commented May 4, 2018

@danmosemsft
I looked at the source code.. The indexer uses this to traverse the chunks:
chunk = chunk.m_ChunkPrevious;
Each chunk refers to the its prev. chunk omly. This means we should iterate the StringBuilder backward (benchmark it). If you try to move forward, the sarch for each index will start from the last chunk, which is a disaster!
I think this can be solved if each chunck points to its next chunck as well, so we can traverse the SB in both directions effictively. This will eliminate the need for a chunk table.

@ghost
Copy link

ghost commented May 4, 2018

@danmosemsft
I have implemented a linked list as a sample of data structure in one of my books. I solved the iteration problem by addind these methods to be used instead of for loop:

        public bool MoveFirst()
        {
            if (FirstCell == null) return false;
            pCurCell = FirstCell;
            return true;
        }

        public bool MoveNext()
        {
            if (pCurCell == null)
                throw new Exception("لا توجد خانة حالية");
            if (pCurCell.NextCell == null) return false;
            pCurCell = pCurCell.NextCell;
            return true;
        }

        public bool MoveLast()
        {
            if (LastCell == null) return false;
            pCurCell = LastCell;
            return true;
        }

        public bool MoveTo(int Index)
        {
            try
            {
                pCurCell = GetCell(Index);
                return true;
            }
            catch
            {
                return false;
            }
        }

        public ListType FirstItem
        {
            get
            {
                if (FirstCell == null)
                    throw new Exception("القائمة فارغة");
                return FirstCell.Value;
            }
            set
            {
                if (FirstCell == null)
                    throw new Exception("القائمة فارغة");
                FirstCell.Value = value;
            }
        }

        public ListType LastItem
        {
            get
            {
                if (LastCell == null)
                    throw new Exception("القائمة فارغة");
                return LastCell.Value;
            }
            set
            {
                if (LastCell == null)
                    throw new Exception("القائمة فارغة");
                LastCell.Value = value;
            }
        }

        private Cell pCurCell;
        public ListType CurItem
        {
            get
            {
                try
                {
                    if (pCurCell == null)
                        throw new Exception("لا توجد خانة حالية");
                    return pCurCell.Value;
                }
                finally { }
            }
        }

        private Cell GetCell(int Index)
        {
            try
            {
                if (Index < 0 || Index >= pCount)
                    throw new Exception("رقم العنصر غير صالح");
                Cell C = FirstCell;
                int i = 0;
                while (i != Index)
                {
                    C = C.NextCell;
                    i += 1;
                }
                return C;
            }
            finally
            {

            }
        }

So, I can iterat the list like this:

var L = new LinkedList<string>( );
L.AddArray(new string[]{ "x", "y", "z"});
if (L.MoveFirst( ))
    do
        Console.WriteLine(L.CurItem);
    while (L.MoveNext( ));

This is way better than using the Indexer which starts from the first element each time (the last chunk in the case of SB)!.. The MoveTo method allow the user to jumb to the start he wants and continue iteration.
This can be used internally in SB. Of course we don't want to access chunks directly, so these methods should be private. SB should implement the IEnumerator interface, se we can benefit of this technique if we use foreach. There will be nested iterations: the outer for the chuncks, and the inner for the chars in each chunck. Users should not be aware of that. They only get the chars.
Not that foreach needs to make each chunck point to its next chunck as well with a field named m_chunkNext. If you don't want to do this, then the iteration should be backward. I asked for a backward foreach in C# before, but this can be done be adding public MoveXX mehods to SB that returns char not a chunck, so se can MoveLast() and loop backward by MovePrev().
I think adding the m_chunkNext field is better, becuase it allows us to add IndexOf, IndexOfMany , and Contians methods, and it will improve the Replace method that needs to find the string first.

@ghost
Copy link

ghost commented May 4, 2018

To optimize the Indexer also, you should do this:
1- If all chunks (except the last one) full of 8000 chars, then you can compute the index of the chunks that contains the wanted index directly without more checking for each chunk.
int chunkId = (int)Math.Truncate(index / 8000);
Its important to ensure that each chunk is full to the lst char before creating a new chunk.
2- Use MoveTo(chunked) to get to the chunk.
3- Read the char at index - chunked * 8000
Of cource there is a const for the 8000 length. this is a quick outline.

Again: this indexer is not suitable for iteration, and foreach must be used instead.

@ghost
Copy link

ghost commented May 4, 2018

This is the full LinkedList Class:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace LinkedList
{
    class LinkedList<ListType> where ListType : IComparable 
    {
        private class Cell
        {
            public ListType Value;
            public Cell NextCell;
            public Cell()
            {

            }

            public Cell(ListType Value, Cell NextCell)
            {
                this.Value = Value;
                this.NextCell = NextCell;
            }

        }

        private Cell FirstCell, LastCell;

        public void Add(ListType Item)
        {
            if (pCount == 0)
            {
                // يجب إنشاء نسخة جديدة من خانة البداية
                FirstCell = new Cell() { Value = Item };
                LastCell = FirstCell;
            }

            else {
                Cell E = new Cell() { Value = Item };
                LastCell.NextCell = E;
                LastCell = E;
            }
            pCount += 1;
        }

        private int pCount = 0;
        public int Count => pCount;

        private Cell GetCell(int Index)
        {
            try
            {
                if (Index < 0 || Index >= pCount)
                    throw new Exception("رقم العنصر غير صالح");
                Cell C = FirstCell;
                int i = 0;
                while (i != Index)
                {
                    C = C.NextCell;
                    i += 1;
                }
                return C;
            }
            finally
            {

            }
        }

        public ListType this[int Index]
        {
            get
            {
                try
                {
                    return GetCell(Index).Value;
                }
                finally { }
            }
            set
            {
                try
                {
                    if (Index == pCount)
                        Add(value);
                    else
                        GetCell(Index).Value = value;
                }
                finally { }
            }
        }

        public void AddArray(ListType[] Items)
        {
            for (int i = 0; i <= Items.GetUpperBound(0); i++)
                Add(Items[i]);
        }

        public void AddList(LinkedList<ListType> NewList)
        {
            if (pCount == 0)
            {
                FirstCell = NewList.FirstCell;
                LastCell = NewList.LastCell;
            }
            else {
                LastCell.NextCell = NewList.FirstCell;
                LastCell = NewList.LastCell;
            }
            pCount += NewList.pCount;
        }

        public void CopyList(LinkedList<ListType> NewList)
        {
            Cell C = NewList.FirstCell;
            while (C != null)
            {
                this.Add(C.Value);
                C = C.NextCell;
            }
        }

        public LinkedList<ListType> SubList(int St)
        {
            try
            {
                var L = new LinkedList<ListType>();
                L.FirstCell = GetCell(St);
                L.LastCell = LastCell;
                L.pCount = pCount - St;
                return L;
            }
            finally { }
        }

        public void Insert(ListType Item, int Index)
        {
            try
            {
                if (Index == pCount)
                    Add(Item);
                else {
                    Cell C = new Cell(Item, GetCell(Index));
                    if (Index == 0)
                        FirstCell = C;
                    else
                        GetCell(Index - 1).NextCell = C;
                    pCount += 1;
                }
            }
            finally { }
        }

        public void InsertList(LinkedList<ListType> NewList, int Index)
        {
            try
            {
                if (Index == pCount)
                    AddList(NewList);
                else
                    if (Index == 0)
                    FirstCell = NewList.FirstCell;
                else
                    GetCell(Index - 1).NextCell = NewList.FirstCell;
                NewList.LastCell = GetCell(Index);
                pCount += NewList.Count;
            }
            finally { }
        }

        public void CopyList(LinkedList<ListType> NewList, int Index)
        {
            Cell C = NewList.FirstCell;
            while (C != null)
            {
                Insert(C.Value, Index);
                C = C.NextCell;
            }
        }

        public void InsertArray(ListType[] Items, int Index)
        {
            var L = new LinkedList<ListType>();
            L.AddArray(Items);
            InsertList(L, Index);
        }

        public void Clear()
        {
            FirstCell = null;
            LastCell = null;
            pCount = 0;
        }

        public void RemoveAt(int Index)
        {
            try
            {
                if (Index == 0) // حذف أول عنصر
                    if (pCount == 1)
                    {
                        // العنصر الأول هو العنصر الأخير وسيتم حذفه
                        FirstCell = null;
                        LastCell = null;
                    }
                    else // جعل العنصر الثاني هو العنصر الأول
                        FirstCell = FirstCell.NextCell;
                else if (Index == pCount - 1)
                {
                    // حذف آخر عنصر بجعل العنصر السابق له هو العنصر الأخير
                    LastCell = GetCell(Index - 1);
                    LastCell.NextCell = null;
                }
                else {//حذف العنصر الحالي، بجعل العنصر السابق يشير إلى العنصر التالي
                    var prev = GetCell(Index - 1); // العنصر السابق
                    prev.NextCell = prev.NextCell.NextCell;
                }
                pCount -= 1;
            }
            finally { }
        }

        public void RemoveLastItem()
        {
            try
            {
                if (pCount == 1)
                {
                    FirstCell = null;
                    LastCell = null;
                }
                else {
                    LastCell = GetCell(pCount - 2);
                    LastCell.NextCell = null;
                }
                pCount -= 1;
            }
            finally { }
        }

        public int RemoveRange(int St, int DelCount)
        {
            try
            {
                if (St + DelCount > pCount)
                    DelCount = pCount - St;
                int nxtId = St + DelCount;
                if (St == 0)
                    if (pCount == DelCount)
                    {
                        FirstCell = null;
                        LastCell = null;
                    }
                    else
                        FirstCell = GetCell(DelCount);

                else if (nxtId == pCount)
                {
                    LastCell = GetCell(St - 1);
                    LastCell.NextCell = null;
                }
                else
                    GetCell(St - 1).NextCell = GetCell(nxtId);
                pCount -= DelCount;
                return DelCount;
            }
            finally { }
        }

        private Cell pCurCell;
        public ListType CurItem
        {
            get
            {
                try
                {
                    if (pCurCell == null)
                        throw new Exception("لا توجد خانة حالية");
                    return pCurCell.Value;
                }
                finally { }
            }
        }

        public bool MoveFirst()
        {
            if (FirstCell == null) return false;
            pCurCell = FirstCell;
            return true;
        }

        public bool MoveNext()
        {
            if (pCurCell == null)
                throw new Exception("لا توجد خانة حالية");
            if (pCurCell.NextCell == null) return false;
            pCurCell = pCurCell.NextCell;
            return true;
        }

        public bool MoveLast()
        {
            if (LastCell == null) return false;
            pCurCell = LastCell;
            return true;
        }

        public bool MoveTo(int Index)
        {
            try
            {
                pCurCell = GetCell(Index);
                return true;
            }
            catch
            {
                return false;
            }
        }

        public ListType FirstItem
        {
            get
            {
                if (FirstCell == null)
                    throw new Exception("القائمة فارغة");
                return FirstCell.Value;
            }
            set
            {
                if (FirstCell == null)
                    throw new Exception("القائمة فارغة");
                FirstCell.Value = value;
            }
        }

        public ListType LastItem
        {
            get
            {
                if (LastCell == null)
                    throw new Exception("القائمة فارغة");
                return LastCell.Value;
            }
            set
            {
                if (LastCell == null)
                    throw new Exception("القائمة فارغة");
                LastCell.Value = value;
            }
        }
        }

        public ListType[] ToArray(int St = 0, int En = -1)
        {
            try
            {
                if (En == -1 || En >= pCount)
                    En = pCount - 1;
                ListType[] A = new ListType[En - St + 1];
                Cell C = GetCell(St);
                for (int i = 0; i <= En - St; i++)
                {
                    A[i] = C.Value;
                    C = C.NextCell;
                }
                return A;
            }
            finally { }
        }

        public void ShiftRight(int Times = 1)
        {
            if (FirstCell == null)
                return;

            if (Times >= pCount)
                Times -= pCount;

            if (Times <= 0)
                return;

            var NewLastCell = GetCell(pCount - 1 - Times);
            LastCell.NextCell = FirstCell;
            FirstCell = NewLastCell.NextCell;
            LastCell = NewLastCell;
            NewLastCell.NextCell = null;

        }

        public void ShiftLeft(int Times = 1)
        {
            if (FirstCell == null)
                return;

            if (Times >= pCount)
                Times -= pCount;

            if (Times <= 0)
                return;

            for (int I = 1; I <= Times; I++)
            {
                LastCell.NextCell = FirstCell;
                LastCell = FirstCell;
                FirstCell = FirstCell.NextCell;
                LastCell.NextCell = null;
            }
        }

        public static bool operator ==(LinkedList<ListType> List1, LinkedList<ListType> List2)
        {
            if (List1 == List2) return true;
            if (List1.Count != List2.Count) return false;
            Cell C1 = List1.FirstCell;
            Cell C2 = List2.FirstCell;
            while (C1 != null)
            {
                if (!C1.Value.Equals(C2.Value)) return false;
                C1 = C1.NextCell;
                C2 = C2.NextCell;
            }
            return true;
        }

        public static bool operator !=(LinkedList<ListType> List1, LinkedList<ListType> List2)
        {
            return !(List1 == List2);
        }

        public static bool operator >(LinkedList<ListType> List1, LinkedList<ListType> List2)
        {
            return (List1.Count > List2.Count);
        }

        public static bool operator <(LinkedList<ListType> List1, LinkedList<ListType> List2)
        {
            return (List1.Count < List2.Count);
        }

        public static bool operator >=(LinkedList<ListType> List1, LinkedList<ListType> List2)
        {
            return (List1.Count >= List2.Count);
        }

        public static bool operator <=(LinkedList<ListType> List1, LinkedList<ListType> List2)
        {
            return (List1.Count <= List2.Count);
        }

        public static bool operator !(LinkedList<ListType> L1)
        {
            return (L1.Count == 0);
        }

        public static LinkedList<double> operator +
        (LinkedList<ListType> List1, LinkedList<ListType> List2)
        {
            LinkedList<double> L = new LinkedList<double>();
            Cell C1, C2;
            if (List1.Count < List2.Count)
            {
                C1 = List1.FirstCell;
                C2 = List2.FirstCell;
            }
            else {
                C1 = List2.FirstCell;
                C2 = List1.FirstCell;
            }
            while (C1 != null)
            {
                L.Add(Convert.ToDouble(C1.Value) + Convert.ToDouble(C2.Value));
                C1 = C1.NextCell; C2 = C2.NextCell;
            }
            while (C2 != null)
            {     // نقل العناصر الزائدة من القائمة الأطول
                L.Add(Convert.ToDouble(C2.Value));
                C2 = C2.NextCell;
            }
            return L;
        }

        public static implicit operator LinkedList<double>(LinkedList<ListType> List1)
        {
            LinkedList<double> L = new LinkedList<double>();
            Cell C = List1.FirstCell;
            while (C != null)
            {
                L.Add(Convert.ToDouble(C.Value));
                C = C.NextCell;
            }
            return L;
        }

        public static explicit operator LinkedList<int>(LinkedList<ListType> List1)
        {
            LinkedList<int> L = new LinkedList<int>();
            Cell C = List1.FirstCell;
            while (C != null)
            {
                L.Add(Convert.ToInt32(C.Value));
                C = C.NextCell;
            }
            return L;
        }

        public LinkedList<T> ConvertTo<T> () where T : IComparable
        {
            if (this.Count == 0)
                return null;

            var L = new LinkedList<T>();
            var C = this.FirstCell;
            do
            {
                L.Add((T)Convert.ChangeType(C.Value, typeof(T)));
                C = C.NextCell;
            } while (C != null);

            return L;
        }

        public static LinkedList<ListType> operator >> (LinkedList<ListType> List1, int Times)
        {
            var L = new LinkedList<ListType>();
            L.CopyList(List1);
            L.ShiftRight(Times);
            return L;
        }

        public static LinkedList<ListType> operator <<(LinkedList<ListType> List1, int Times)
        {
            var L = new LinkedList<ListType>();
            L.CopyList(List1);
            L.ShiftLeft(Times);
            return L;
        }

        public static bool operator true(LinkedList<ListType> List1)
        {
            return (List1.pCount > 0);
        }

        public static bool operator false(LinkedList<ListType> List1)
        {
            return (List1.pCount == 0);
        }

        public static implicit operator bool(LinkedList<ListType> List1)
        {
            return (List1.pCount > 0);
        }

        private void QuickSort(ref ListType[] Arr, int SortDirection = 0, int St = 0, int En = -1)
        {
            if (En == -1) En = Arr.GetUpperBound(0);
            int i = 0, LAC = St;
            ListType Temp;
            for (i = St + 1; i <= En; i++)
                if ((SortDirection == 1) ^ (Arr[i].CompareTo(Arr[St]) < 1))
                {
                    LAC += 1;
                    if (LAC != i)
                    {
                        Temp = Arr[LAC];
                        Arr[LAC] = Arr[i];
                        Arr[i] = Temp;
                    }
                }

            if (LAC != St)
            {
                Temp = Arr[LAC];
                Arr[LAC] = Arr[St];
                Arr[St] = Temp;
            }
            if (LAC - St > 1)
                QuickSort(ref Arr, SortDirection, St, LAC - 1);
            if (En - LAC > 1)
                QuickSort(ref Arr, SortDirection, LAC + 1, En);
        }

        public enum SortState : int
        {
            ASC = 0,
            DESC = 1,
            None = 2
        }
        public void Sort(SortState State)
        {
            if (State != SortState.None)
            {
                ListType[] S = this.ToArray(0, -1);
                QuickSort(ref S, (int)State, 0, -1);
                this.Clear();
                AddArray(S);
            }
        }

        private int BinarySearch(ref ListType[] Arr, ListType Target, int SortDirection = 0, int St = 0, int En = -1)
        {
            if (En == -1) En = Arr.GetUpperBound(0);
            // شرط توقف: العثور على عنصر البحث في خانة البداية أو النهاية
            if (Arr[St].Equals(Target)) return St;
            if (Arr[En].Equals(Target)) return En;
            int Mdl = (St + En + 1) / 2;
            if (Arr[Mdl].Equals(Target))
                //شرط توقف: العثور على نصّ البحث في خانة المنتصف
                return Mdl;
            else if ((SortDirection == 1) ^ (Arr[Mdl].CompareTo(Target) == -1))
            {
                // الهدف أكبر من قيمة خانة المنتصف
                // شرط توقف: خانة المنتصف هي خانة النهاية 
                // سنعيد سالب (رقم الخانة التالية لخانة المنتصف + 1)
                if (En == Mdl) return -(Mdl + 2);
                // نتيجة البحث هي ناتج البحث في الجزء الأخير من المصفوفة
                return BinarySearch(ref Arr, Target, SortDirection, Mdl + 1, En);
            }
            else {
                // شرط توقف: خانة المنتصف هي خانة البداية 
                // سنعيد سالب (رقم خانة المنتصف + 1) 
                if (Mdl == St)
                    return -(Mdl + 1);
                // نتيجة البحث هي ناتج البحث في النصف الأوّل من المصفوفة
                return BinarySearch(ref Arr, Target, SortDirection, St, Mdl - 1);
            }
        }

        public int IndexOf(ListType Item, SortState State, int St, int En)
        {
            ListType[] S = this.ToArray( );
            return BinarySearch(ref S, Item, (int)State, St, En);
        }

        public static LinkedList<ListType> operator &
            (LinkedList<ListType> L1, LinkedList<ListType> L2)
        {
            var L = new LinkedList<ListType>();
            ListType[] A = L1.ToArray();
            L.QuickSort(ref A);
            if (L2.MoveFirst())
                do
                    if (L.BinarySearch(ref A, L2.CurItem) > -1)
                        L.Add(L2.CurItem);
                while (L2.MoveNext());
            return L;
        }

        public static LinkedList<ListType> operator |
            (LinkedList<ListType> L1, LinkedList<ListType> L2)
        {
            var L = new LinkedList<ListType>();
            ListType[] A = L1.ToArray();
            L.QuickSort(ref A);
            if (L2.MoveFirst())
                do
                    if (L.BinarySearch(ref A, L2.CurItem) < 0)
                        L.Add(L2.CurItem);
                while (L2.MoveNext());
            L.AddArray(A);
            return L;
        }

        public static LinkedList<ListType> operator ^
            (LinkedList<ListType> L1, LinkedList<ListType> L2)
        {
            var L = new LinkedList<ListType>();
            ListType[] A = L1.ToArray();
            byte[] B = new byte[A.Length];
            L.QuickSort(ref A);

            if (L2.MoveFirst())
                do
                {
                    int i = L.BinarySearch(ref A, L2.CurItem);
                    if (i < 0)
                        L.Add(L2.CurItem);
                    else
                        B[i] = 1;
                } while (L2.MoveNext());

            for (int i = 0; i <= A.GetUpperBound(0); i++)
                if (B[i] == 0)
                    L.Add(A[i]);

            return L;
        }

    }
}

@ghost
Copy link

ghost commented May 4, 2018

@danmosemsft @AndyAyersMS @jkotas @Alois-xx @davkean
I proposed to add a MutableString class that contains the missing StringBuilder functinality dotnet/corefx#29510
Hope you help implement it.

@jkotas
Copy link
Member

jkotas commented May 4, 2018

@MohammadHamdyGhanem The efficient string builder iterator was proposed and being worked on by @vancem in https://github.com/dotnet/corefx/issues/21248 / dotnet/coreclr#17530.

@ghost
Copy link

ghost commented May 5, 2018

@jkotas
A linked list should not have a chunk list. Consider this situation:
If you use the StringBuilder.Insert( ) Or Replace method they may need more space, and hence call the
private void MakeRoom that can create a new chunk before the current one:

///If dontMoveFollowingChars is true, then the room must be made by inserting a chunk BEFORE the current chunk (this is what it does most of the time anyway)

This will corrupt the chunk list, and if you try to sync it, it may affect the performance badly. So, as I proposed before, a double linked list maybe the solution, in addition to using MoveToXXCunk methode to traverse the chunks, and MoveToXXChar to traverse the chars. The stringBuilder should keep the value for currentChunkIndex, and currentCharIndex so these two sets of methods can use to work.
More details here: dotnet/corefx#29510

@ghost
Copy link

ghost commented May 5, 2018

@jkotas
I can't believe that the best SB iterator is a recursive function that calls a callback function!.. If SB has only 10 MB of text, this means 10 million pushes to the stack waiting for 10 millions callbacks, just to print or count something. Using enumerators is the only efficient way.

@jkotas
Copy link
Member

jkotas commented May 5, 2018

None of the proposals considered had the characteristics you are describing. The latest version of the proposal is in fact enumerator. Check the proposed implementation in dotnet/coreclr#17530.

@ghost
Copy link

ghost commented May 5, 2018

@jkotas
Sorry, I read the first proposal, but it chanded after a long discussion.

@stephentoub
Copy link
Member

The first proposal didn't have those characteristics, either.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 19, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

7 participants