
maybe 3 bugs #28

Closed
Dexus77 opened this issue Sep 3, 2018 · 7 comments

Comments

Dexus77 commented Sep 3, 2018

Is this a BUG report or FEATURE request?:
BUG report
Environment:

  • OS CentOS-7
  • Kernel 3.10.0-862.3.3.el7.x86_64
  • Nuster v2.0.2.18

Thank you very much for your work. Combining HAProxy with a cache and NoSQL is a very interesting solution.

BUG 1
A bug where a request to the backend freezes when the cache is involved.
    - In the config, create 2 backends: the first for cached static content, the second for HTML served without caching.
    - The page should have several static elements.
    - Start nuster in debug mode.
    - Open the browser with the inspector open. Refresh the page several times and observe how long the page takes to load.

Approximate config:

global
    nuster cache on uri /cache data-size 200m

frontend http_front
    bind *:80
    mode http

    acl is_static path_end -i .jpg .png .gif .css .js .ico .txt
    acl example.com hdr(host) -i example.com
    use_backend static if is_static
    use_backend bk1 if example.com

backend static
    nuster cache on
    nuster rule allstatic
    server nginx1 127.0.0.1:1122 check

backend bk1
    server srv1 127.0.0.1:3344

Nuster debug output at the moment of the freeze:

   00000040:example.com.srvcls[000a:adfd]
   00000040:example.com.clicls[adfd:adfd]
   00000040:example.com.closed[adfd:adfd]
   00000041:example.com.srvcls[000c:adfd]
   00000041:example.com.clicls[adfd:adfd]
   00000041:example.com.closed[adfd:adfd]

Effects:
    The HTML response from the backend to the client hangs for anywhere from 10 seconds to infinity.
    Judging by the nuster log, it tried to look up the cache for the backend on which caching was not enabled.

![wait_req](https://user-images.githubusercontent.com/16289977/44980870-b8bb8200-af79-11e8-8dab-e9f7dca44e87.png)

BUG 2

I wanted to move sessions from Redis to Nuster NoSQL, but I ran into 2 bugs.

    - Create any key in NoSQL (a sketch of this step follows this list).

    - Get it, for example using the wrk utility or any other fast asynchronous method:

    ./wrk -t2 -c200 -d30s http://127.0.0.1/nosql/key1

    - Note the number of errors.

    Effect: After a few requests, nuster breaks connections with the client.
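
For reference, the key can be created with a simple POST before running wrk. A minimal sketch, assuming the nuster NoSQL frontend is reachable at http://127.0.0.1/nosql as in the wrk command above (the key name and value are illustrative only):

# Minimal sketch: store a value under /nosql/key1 before load-testing the GET path.
# Assumes the NoSQL frontend listens on 127.0.0.1:80; key name and value are illustrative.
import asyncio
import aiohttp

async def create_key():
    async with aiohttp.ClientSession() as session:
        # the POST body becomes the value stored under the key
        async with session.post('http://127.0.0.1/nosql/key1', data='value1') as resp:
            print('set status:', resp.status)

asyncio.run(create_key())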

BUG 3

Bug with NoSQL + CPU utilization.
    Similar to the bug above, but you need to create and get the same key simultaneously (see the sketch below).

    Effect: On a single-CPU machine, the CPU becomes 100% utilized and only restarting nuster helps.
    On a multi-CPU machine, if you repeat the procedure several times, all cores except one get stuck at 100% utilization and a restart is also required.
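
A minimal sketch of what "create and get the same key simultaneously" could look like, using asyncio.gather so the POST and GET overlap. The endpoint and value follow the commands above and are illustrative only; the full repro scripts appear later in this thread.

# Minimal sketch: race a POST (create) and a GET (read) against the same NoSQL key.
import asyncio
import aiohttp

URL = 'http://127.0.0.1/nosql/key1'

async def set_key(session):
    async with session.post(URL, data='value1') as resp:
        return ('set', resp.status)

async def get_key(session):
    async with session.get(URL) as resp:
        return ('get', resp.status)

async def main():
    async with aiohttp.ClientSession() as session:
        # run both requests concurrently so they overlap on the same key
        results = await asyncio.gather(set_key(session), get_key(session),
                                       return_exceptions=True)
        print(results)

asyncio.run(main())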

jiangwenyuan (Owner) commented

Hi, thanks for reporting :)

I've done a little investigation and confirmed bug 1, but cannot confirm bugs 2/3.

Bug 2:

If you didn't set the key first, 404 is returned:

$ wrk -t2 -c200 -d30s http://127.0.0.1/nosql/key11
Running 30s test @ http://127.0.0.1/nosql/key11
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.05ms    1.16ms 203.06ms   92.31%
    Req/Sec    24.27k     1.20k   28.12k    79.83%
  1450289 requests in 30.03s, 62.24MB read
  Non-2xx or 3xx responses: 1450289
Requests/sec:  48286.80
Transfer/sec:      2.07MB

If you have already set the key:

$ curl -d vv http://127.0.0.1/nosql/key11
$ wrk -t2 -c200 -d30s http://127.0.0.1/nosql/key11
Running 30s test @ http://127.0.0.1/nosql/key11
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.02ms    2.41ms 205.12ms   84.18%
    Req/Sec    13.87k     2.88k   17.05k    80.67%
  828870 requests in 30.05s, 70.35MB read
Requests/sec:  27582.70
Transfer/sec:      2.34MB

Bug 3:

I use this wrk Lua script:

-- wrk script: randomly POSTs or GETs /nosql with a random numeric body
-- (run with e.g. `wrk -t2 -c200 -d30s -s script.lua http://127.0.0.1`)
local url = '/nosql'
local methods = {
    'GET',
    'POST',
    --'DELETE',
}
math.randomseed(os.time())
request = function()
    local r = math.random(1, 650000)
    wrk.body = r .. "\n"
    -- Lua tables are 1-indexed, so use r % 2 + 1 to alternate between GET and POST
    return wrk.format(
        methods[r % 2 + 1],
        url
    )
end

I cannot reproduce it.

Could you please help me reproduce it?


Dexus77 commented Sep 4, 2018

Bugs 2-3 are really tricky and only occur with a certain configuration. Today I sketched out some code to check the bug and could not trigger it; everything was OK.

If you tested with the nuster configuration from bug 1, where the cache is disabled on one backend and enabled on the other, bugs 2-3 do not show up in that configuration.

Try enabling caching on both backends, although even with this configuration I cannot yet reproduce the 100% effect I saw yesterday. Bug 2 shows up very rarely: nuster breaks the connection and returns an Internal Server Error. I have not yet managed to reproduce the CPU leak.

Here is some very simple sample code, just so you can see the setup.

#!/usr/local/python/3.7.0/bin/python3
# Repro sketch: every request to this aiohttp app POSTs a value to the nuster
# NoSQL key and then GETs it back, printing any non-200 status or exception.
import asyncio
import uvloop
import aiohttp
from aiohttp import web

url = 'http://127.0.0.1/nosql/key1'
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())


async def func(request):
    async with aiohttp.ClientSession() as session:
        try:
            # set the key
            async with session.post(url, data='value1') as response:
                if response.status != 200:
                    print('status: ', response.status)
        except Exception as ex:
            print('Set exception type: ', type(ex).__name__)

        try:
            # read the key back
            async with session.get(url) as response:
                if response.status != 200:
                    print('status: ', response.status)
        except Exception as ex:
            print('Get exception type: ', type(ex).__name__)
    return web.Response(text="ok")


async def init():
    app = web.Application()
    app.router.add_route('GET', '/', func)
    return app

web.run_app(init(), port=8111)

The bugs surfaced under this kind of code, which I then put under load with wrk.
I tested with 2 frameworks; with both there were broken connections and high CPU utilization.

If I manage to find a configuration in which the bug reproduces 100% of the time, I will write.


Dexus77 commented Sep 5, 2018

I was able to find out what causes the CPU leak.
You need to POST a dict as the value and then put it under load with wrk.

#!/usr/local/python/3.7.0/bin/python3
# Same repro as above, but the POSTed value is a dict, which aiohttp sends as a
# form-encoded body instead of a plain string.
import asyncio
import uvloop
import aiohttp
from aiohttp import web

url = 'http://127.0.0.1/nosql/key1'
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

async def func(request):
    async with aiohttp.ClientSession() as session:
        data = {'app': 11, 'value': 'sdf sf asf fhw adsf rhrtw'}
        try:
            # set the key with the form-encoded dict as the value
            async with session.post(url, data=data) as response:
                if response.status != 200:
                    print('status: ', response.status)
        except Exception as ex:
            print('Set exception type: ', type(ex).__name__)

        try:
            # read the key back
            async with session.get(url) as response:
                if response.status != 200:
                    print('status: ', response.status)
        except Exception as ex:
            print('Get exception type: ', type(ex).__name__)
    return web.Response(text="ok")

async def init():
    app = web.Application()
    app.router.add_route('GET', '/', func)
    return app

web.run_app(init(), port=8111)

jiangwenyuan (Owner) commented

Thanks, but I still cannot reproduce bugs 2/3. Your app only gives me the error below, which I think has nothing to do with nuster; I tried with nginx (only the GET part) and can get this error too.

Get exception type:  CancelledError
Unhandled exception
Traceback (most recent call last):
  File "/home/jwy/.pyenv/versions/3.6.6/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 410, in start
    await resp.prepare(request)
  File "/home/jwy/.pyenv/versions/3.6.6/lib/python3.6/site-packages/aiohttp/web_response.py", line 300, in prepare
    return await self._start(request)
  File "/home/jwy/.pyenv/versions/3.6.6/lib/python3.6/site-packages/aiohttp/web_response.py", line 608, in _start
    return await super()._start(request)
  File "/home/jwy/.pyenv/versions/3.6.6/lib/python3.6/site-packages/aiohttp/web_response.py", line 367, in _start
    await writer.write_headers(status_line, headers)
  File "/home/jwy/.pyenv/versions/3.6.6/lib/python3.6/site-packages/aiohttp/http_writer.py", line 110, in write_headers
    self._write(buf)
  File "/home/jwy/.pyenv/versions/3.6.6/lib/python3.6/site-packages/aiohttp/http_writer.py", line 67, in _write
    raise ConnectionResetError('Cannot write to closing transport')
ConnectionResetError: Cannot write to closing transport

BTW, bug 1 has been fixed in 2d94461.


Dexus77 commented Sep 5, 2018

Yes, this is what I was talking about. In bug 2, nuster terminates the connection and the aiohttp client cannot write to it. The CancelledError can also be a ServerDisconnectedError. To rule out aiohttp itself, I checked with another web framework and the situation was similar. Did you notice after these errors that nuster did not consume 100% of the CPU? (bug 3)


jiangwenyuan commented Sep 6, 2018

Hi, if you try this code:

import asyncio
import uvloop
import aiohttp
from aiohttp import web

url = 'http://127.0.0.1/nosql/key1'
# override: point at a static file served by nginx instead of nuster NoSQL
url = 'http://127.0.0.1/index.html'
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

async def func(request):
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(url) as response:
                if response.status != 200:
                    print('status: ', response.status)
        except Exception as ex:
            print('Get exception type: ', type(ex).__name__)
    return web.Response(text="ok")

async def init():
    app = web.Application()
    app.router.add_route('GET', '/', func)
    return app

web.run_app(init(), port=8111)

and run nginx to serve http://127.0.0.1/index.html, the error still appears.

> Did you notice after these errors that nuster did not consume 100% of the CPU? (bug 3)

Do you mean consume 100% CPU rather than did not consume?

No, nuster did not consume 100% of the CPU.

jiangwenyuan (Owner) commented

Hi, I've been trying to reproduce bugs 2/3, but still cannot. I'm closing this now; feel free to reopen it.

BTW, bug 1 has been fixed and merged into master.

thanks
