Add VisionKit bindings #592
The documentation claims that it is available on macOS 13, but ...
Currently PyObjC can only be used with interfaces that can be used in Objective-C code. It might be possible to expose Swift frameworks as well, but this likely requires significant engineering to design and implement.
@ronaldoussoren Right, sorry for not realizing that and wasting your time! On another note (this is probably not something to support officially, I guess), I noticed there's an underlying Objective-C implementation of the parts of VisionKit I need. It's undocumented, but WebKit uses it directly:

```python
import Cocoa
import objc

ns_image = Cocoa.NSImage.alloc().initWithContentsOfFile_("/Users/aurora/Downloads/tg_image_1633323779.jpeg")
objc.loadBundle('VisionKit', globals(), '/System/Library/Frameworks/VisionKit.framework')
req = VKImageAnalyzerRequest.alloc().initWithImage_requestType_(ns_image, 1)
req.setLocales_('ja-JA')
objc.registerMetaDataForSelector(
    b"VKImageAnalyzer",
    b"processRequest:updateHandler:completionHandler:",
    {
        "arguments": {
            4: {
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},
                        1: {"type": b"@"},
                        2: {"type": b"@"},
                        3: {"type": b"@"},
                    },
                }
            }
        }
    },
)
analyzer = VKImageAnalyzer.alloc().init()

def update(self, progress: float):
    pass

def process(self, analysis: VKImageAnalysis):
    pass

analyzer.processRequest_updateHandler_completionHandler_(req, update, process)
```

According to the WebKit source, `processRequest` is defined like this:

How should I define it in `registerMetaDataForSelector`?
No need to apologise, it wouldn't be the first time that I missed a new API.
You got it almost right, but the method has two arguments that are blocks. Both return `void`; the first has a single argument of type `double`, the second has two arguments, both Objective-C objects:

```python
import objc

objc.registerMetaDataForSelector(
    b"VKImageAnalyzer",
    b"processRequest:updateHandler:completionHandler:",
    {
        "arguments": {
            # Index 3 (self and _cmd are 0 and 1): updateHandler, void (^)(double)
            3: {
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},  # the block itself
                        1: {"type": b"d"},
                    },
                }
            },
            # Index 4: completionHandler, void (^)(id, id)
            4: {
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},  # the block itself
                        1: {"type": b"@"},
                        2: {"type": b"@"},
                    },
                }
            },
        }
    },
)
```

I haven't used the Vision framework myself yet, but it does seem to have some options for recognizing text; see https://developer.apple.com/documentation/vision/vnrecognizetextrequest?language=objc and https://developer.apple.com/documentation/vision/recognizing_text_in_images?language=objc (both have sample code in Swift, but hopefully that gives enough context to reproduce this in Python).
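The metadata above follows a fixed pattern: argument indices count the implicit `self` and `_cmd` as 0 and 1, so the keyword parts of `processRequest:updateHandler:completionHandler:` map to indices 2, 3 and 4, and inside each block description index 0 is the block literal itself (`^v`). As a minimal sketch, the hypothetical helpers below (`selector_arg_index`, `block_signature` and `build_metadata` are not part of PyObjC) generate that dictionary from a compact description. The `registerMetaDataForSelector` call is left as a comment so the sketch runs without macOS.

```python
def selector_arg_index(selector: bytes, keyword: str) -> int:
    """Metadata index of the argument that follows `keyword:` in `selector`.

    Indices 0 and 1 are the implicit self and _cmd arguments, so the
    first explicit argument has index 2.
    """
    keywords = selector.decode("ascii").split(":")[:-1]
    return keywords.index(keyword) + 2


def block_signature(retval: bytes, *arg_types: bytes) -> dict:
    """Metadata for one block argument; block argument 0 is the block itself."""
    arguments = {0: {"type": b"^v"}}
    for position, type_code in enumerate(arg_types, start=1):
        arguments[position] = {"type": type_code}
    return {"callable": {"retval": {"type": retval}, "arguments": arguments}}


def build_metadata(selector: bytes, blocks: dict) -> dict:
    """Turn {keyword: (retval, *arg_types)} into a metadata dictionary."""
    return {
        "arguments": {
            selector_arg_index(selector, keyword): block_signature(*signature)
            for keyword, signature in blocks.items()
        }
    }


SELECTOR = b"processRequest:updateHandler:completionHandler:"
METADATA = build_metadata(
    SELECTOR,
    {
        "updateHandler": (b"v", b"d"),            # void (^)(double)
        "completionHandler": (b"v", b"@", b"@"),  # void (^)(id, id)
    },
)

# On macOS with PyObjC installed, this would replace the hand-written dict:
# objc.registerMetaDataForSelector(b"VKImageAnalyzer", SELECTOR, METADATA)
```

Deriving the indices from the selector string avoids the off-by-two mistake of counting only the visible arguments.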
Thanks, that did the trick! For what it's worth, I do have a Vision framework option in my OCR program, but since it's for Japanese, where vertical text support is really helpful, I wanted to get the VisionKit route working too. It seems Apple updated VisionKit with vertical text support in Sonoma, but Vision still doesn't support it: in Ventura it tried to read vertical text horizontally, and in Sonoma it returns an empty array of results. This is the working VisionKit code:

```python
import Cocoa
import objc
from PyObjCTools.AppHelper import runConsoleEventLoop, stopEventLoop

ns_image = Cocoa.NSImage.alloc().initWithContentsOfFile_("/Users/aurora/Downloads/Untitled.jpg")
# Load VisionKit; this makes the VKC* classes used below available as globals
objc.loadBundle('VisionKit', globals(), '/System/Library/Frameworks/VisionKit.framework')

req = VKCImageAnalyzerRequest.alloc().initWithImage_requestType_(ns_image, 1)
req.setLocales_(['ja', 'en'])
analyzer = VKCImageAnalyzer.alloc().init()

# Arguments 3 and 4 are blocks: progressHandler is void (^)(double),
# completionHandler is void (^)(id, id)
objc.registerMetaDataForSelector(
    b"VKCImageAnalyzer",
    b"processRequest:progressHandler:completionHandler:",
    {
        "arguments": {
            3: {
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},
                        1: {"type": b"d"},
                    },
                }
            },
            4: {
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},
                        1: {"type": b"@"},
                        2: {"type": b"@"},
                    },
                }
            },
        }
    },
)

def update(progress: float):
    pass

def process(analysis: VKCImageAnalysis, error: NSError):
    lines = analysis.allLines()
    for line in lines:
        print(line.string())
    # Results are in: stop the event loop started below
    stopEventLoop()

analyzer.processRequest_progressHandler_completionHandler_(req, update, process)
# Keep the main run loop running until process() calls stopEventLoop()
runConsoleEventLoop()
```

The only drawback is that `objc.loadBundle()` takes a couple of seconds, but I assume there's not much to be done about that.
The WebKit SPI header for this: https://github.com/WebKit/WebKit/blob/main/Source/WebCore/PAL/pal/spi/cocoa/VisionKitCoreSPI.h

That appears to use a private framework, see https://github.com/WebKit/WebKit/blob/7cd082919192095d0b017c6e5f7a36a47135bb8c/Source/WebCore/PAL/pal/cocoa/VisionKitCoreSoftLink.mm#L36

Exposing this through PyObjC shouldn't be too hard, but I don't know yet if I'll do so, because I don't like exporting private APIs (mostly because those might break between OS releases). The Swift interface for the framework also doesn't look too complicated; with some luck it will be possible to expose that to Python. But as I said, this does require some engineering because PyObjC currently doesn't interface with Swift frameworks. I don't know when I'll get around to this.
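Two points from the thread, the slow `objc.loadBundle()` call and the risk that private classes change between OS releases, can both be addressed in a minimal sketch. It assumes (not measured here) that much of the delay comes from copying classes into `globals()`, and therefore loads the bundle with `scan_classes=False`, then resolves only the needed classes with `objc.lookUpClass`, reporting missing ones up front instead of failing later with a `NameError`. The helper names `resolve_classes` and `load_visionkit` are hypothetical; only `objc.loadBundle`, `objc.lookUpClass` and `objc.nosuchclass_error` come from PyObjC.

```python
VISIONKIT_PATH = "/System/Library/Frameworks/VisionKit.framework"
NEEDED_CLASSES = ("VKCImageAnalyzer", "VKCImageAnalyzerRequest")


def resolve_classes(names, lookup, errors=(LookupError,)):
    """Look up each class name; collect the missing ones instead of raising."""
    classes, missing = {}, []
    for name in names:
        try:
            classes[name] = lookup(name)
        except errors:
            missing.append(name)
    return classes, missing


def load_visionkit(names=NEEDED_CLASSES):
    """macOS + PyObjC only: load VisionKit without scanning its classes."""
    import objc  # imported lazily so this module can be imported anywhere

    # scan_classes=False: don't add the bundle's classes to the dict passed in
    objc.loadBundle(
        "VisionKit", {}, bundle_path=VISIONKIT_PATH, scan_classes=False
    )
    return resolve_classes(names, objc.lookUpClass, (objc.nosuchclass_error,))


# Usage on macOS:
# classes, missing = load_visionkit()
# if missing:
#     raise SystemExit(f"VisionKit classes not found on this OS: {missing}")
# VKCImageAnalyzer = classes["VKCImageAnalyzer"]
```

Passing `lookup` in as a parameter keeps `resolve_classes` independent of PyObjC, so the missing-class handling can be checked on any platform.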
Is your feature request related to a problem? Please describe.
The VisionKit APIs seem to be more actively maintained. For example, starting in Sonoma, text recognition supports vertical text for CJK languages (Japanese, Chinese, Korean), which Vision does not support yet.
Describe the solution you'd like
VisionKit bindings available for use on macOS 13.0+.
Describe alternatives you've considered
There's no real alternative right now other than using the less frequently updated Vision API or invoking external command-line tools.
Additional context
The docs state that VisionKit "is only available in Catalyst", but that doesn't seem to be the case (anymore?) from Ventura onwards. There are apps using the new APIs on macOS (e.g. https://github.com/Shakshi3104/LiTeX; according to a friend's reverse engineering, TextSniper also seems to use it). Apple's API docs claim it's available on macOS as well: https://developer.apple.com/documentation/visionkit/imageanalyzer