
This post continues from the previous one.

Let's build a really simple example that detects a thumbs-up or thumbs-down and shows the matching emoji.

 

[1] Create HandGestureProcessor

Create a HandGestureProcessor like the one below.

The logic: if the thumb's TIP point is above the center of the screen, it counts as a thumbs-up;

if it is below the center, it counts as a thumbs-down.

(A real implementation would need much more detailed logic, but let's start as simply as possible!)

 

class HandGestureProcessor {
    enum State {
        case thumbUp
        case thumbDown
    }

    func getHandState(thumbTip: CGPoint, center: CGPoint) -> State {
        // In UIKit coordinates y grows downward, so a smaller y
        // means the thumb tip is above the center of the screen.
        if thumbTip.y < center.y {
            return .thumbUp
        } else {
            return .thumbDown
        }
    }
}

 

Then give the view controller a processor instance to hold on to.
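In the full code at the bottom of this post, that is just a stored property on CameraViewController:

class CameraViewController: UIViewController {
    // Turns a thumb-tip position into a thumbs-up / thumbs-down state.
    private let handGestureProcessor = HandGestureProcessor()
    // ...
}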

 

 

[2] Create emojiView

 

We'll create an emojiView (a UILabel) and attach it to the view.
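This is the prepareEmojiView() helper from the full code below: a full-screen UILabel with a huge, centered font, kept in a weak emojiView property and called from viewDidLoad().

private weak var emojiView: UILabel?

private func prepareEmojiView() {
    let emojiView = UILabel()
    emojiView.frame = self.view.bounds
    emojiView.textAlignment = .center
    emojiView.font = UIFont.systemFont(ofSize: 300)
    view.addSubview(emojiView)
    self.emojiView = emojiView
}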

 

 

[3] In the captureOutput method, pass only a thumbTipPoint whose confidence is not too low to the processPoints function

 

The previous post got as far as extracting the thumbTipPoint.

 

Two things are new here: a guard that ignores the point when its confidence is not greater than 0.3, and a call that passes the thumbTipPoint on to the processPoints function.

 

extension CameraViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .up, options: [:])
        do {
            try handler.perform([handPoseRequest])
            // Continue only when a hand was detected in the frame.
            // Since we set the maximumHandCount property of the request to 1, there will be at most one observation.
            guard let observation = handPoseRequest.results?.first as? VNRecognizedPointsObservation else {
                return
            }
            // type: [VNRecognizedPointKey : VNRecognizedPoint]
            let thumbPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyThumb)
            // type: VNRecognizedPoint?
            guard let thumbTipPoint = thumbPoints[.handLandmarkKeyThumbTIP] else {
                emojiView?.text = "😰"
                return
            }
            // Ignore low confidence points.
            guard thumbTipPoint.confidence > 0.3 else {
                emojiView?.text = "😭"
                return
            }
            DispatchQueue.main.async {
                self.processPoints(thumbTipPoint: thumbTipPoint)
            }
        } catch {
            print("Error: \(error)")
        }
    }
}

 

[4] Create the processPoints function

 

Now let's write the processPoints function! ⭐️ This is the core of today's post ⭐️

 

1) Convert Vision coordinates -> AVFoundation coordinates -> UIKit coordinates

 

 

For example, the converted values come out something like this.
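A rough sketch of the three stages (these numbers are made up purely for illustration; the real values depend on the frame and on the preview layer's size):

// Hypothetical example values, for illustration only:
// 1. Vision coordinates:       (0.52, 0.71) -> normalized 0...1, origin at the bottom-left
// 2. AVFoundation coordinates: (0.52, 0.29) -> normalized 0...1, origin at the top-left (y flipped)
// 3. UIKit coordinates:        (x: 203, y: 245) -> points in the preview layer, comparable with view.center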

 

 

Since we need to compare against the center of the current screen, which is a UIKit coordinate, we ultimately have to obtain UIKit coordinates.

 

 

2) Ask the handGestureProcessor for the current hand state and show the corresponding emoji

 

private func processPoints(thumbTipPoint: VNRecognizedPoint) {
    // Convert points from Vision coordinates to AVFoundation coordinates.
    let thumbTipCGPoint = CGPoint(x: thumbTipPoint.location.x, y: 1 - thumbTipPoint.location.y)
    // Convert points from AVFoundation coordinates to UIKit coordinates.
    guard let thumbTipConvertedPoint = videoPreviewLayer?.layerPointConverted(fromCaptureDevicePoint: thumbTipCGPoint) else {
        emojiView?.text = "😱"
        return
    }
    let state = handGestureProcessor.getHandState(thumbTip: thumbTipConvertedPoint, center: self.view.center)
    switch state {
    case .thumbUp:
        emojiView?.text = "👍"
    case .thumbDown:
        emojiView?.text = "👎"
    }
}

 

 

 

Honestly, it doesn't work all that well... ㅠㅠ The GIF above was trimmed down to the part that works best because of the upload size limit!

 

 

 

Full code

 

import UIKit
import AVFoundation
import Vision

class CameraViewController: UIViewController {
    private var captureSession: AVCaptureSession?
    private var videoPreviewLayer: AVCaptureVideoPreviewLayer?
    private let handPoseRequest = VNDetectHumanHandPoseRequest()
    private let handGestureProcessor = HandGestureProcessor()
    private weak var emojiView: UILabel?

    override func viewDidLoad() {
        super.viewDidLoad()
        prepareCaptureSession()
        prepareCaptureUI()
        prepareEmojiView()
        // The default value for this property is 2.
        handPoseRequest.maximumHandCount = 1
    }

    private func prepareCaptureSession() {
        let captureSession = AVCaptureSession()
        // Select a front facing camera, make an input.
        guard let captureDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front) else { return }
        guard let input = try? AVCaptureDeviceInput(device: captureDevice) else { return }
        captureSession.addInput(input)
        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(self, queue: .main)
        captureSession.addOutput(output)
        self.captureSession = captureSession
        self.captureSession?.startRunning()
    }

    private func prepareCaptureUI() {
        guard let session = captureSession else { return }
        let videoPreviewLayer = AVCaptureVideoPreviewLayer(session: session)
        videoPreviewLayer.videoGravity = AVLayerVideoGravity.resizeAspectFill
        videoPreviewLayer.frame = view.layer.bounds
        view.layer.addSublayer(videoPreviewLayer)
        self.videoPreviewLayer = videoPreviewLayer
    }

    private func prepareEmojiView() {
        let emojiView = UILabel()
        emojiView.frame = self.view.bounds
        emojiView.textAlignment = .center
        emojiView.font = UIFont.systemFont(ofSize: 300)
        view.addSubview(emojiView)
        self.emojiView = emojiView
    }
}

extension CameraViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .up, options: [:])
        do {
            try handler.perform([handPoseRequest])
            // Continue only when a hand was detected in the frame.
            // Since we set the maximumHandCount property of the request to 1, there will be at most one observation.
            guard let observation = handPoseRequest.results?.first as? VNRecognizedPointsObservation else {
                return
            }
            // type: [VNRecognizedPointKey : VNRecognizedPoint]
            let thumbPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyThumb)
            // type: VNRecognizedPoint?
            guard let thumbTipPoint = thumbPoints[.handLandmarkKeyThumbTIP] else {
                emojiView?.text = "😰"
                return
            }
            // Ignore low confidence points.
            guard thumbTipPoint.confidence > 0.3 else {
                emojiView?.text = "😭"
                return
            }
            DispatchQueue.main.async {
                self.processPoints(thumbTipPoint: thumbTipPoint)
            }
        } catch {
            print("Error: \(error)")
        }
    }

    private func processPoints(thumbTipPoint: VNRecognizedPoint) {
        // Convert points from Vision coordinates to AVFoundation coordinates.
        let thumbTipCGPoint = CGPoint(x: thumbTipPoint.location.x, y: 1 - thumbTipPoint.location.y)
        // Convert points from AVFoundation coordinates to UIKit coordinates.
        guard let thumbTipConvertedPoint = videoPreviewLayer?.layerPointConverted(fromCaptureDevicePoint: thumbTipCGPoint) else {
            emojiView?.text = "😱"
            return
        }
        let state = handGestureProcessor.getHandState(thumbTip: thumbTipConvertedPoint, center: self.view.center)
        switch state {
        case .thumbUp:
            emojiView?.text = "👍"
        case .thumbDown:
            emojiView?.text = "👎"
        }
        print("1. Vision Coordinates: \(thumbTipPoint)")
        print("2. AVFoundation coordinates: \(thumbTipCGPoint)")
        print("3. UIKit coordinates: \(thumbTipConvertedPoint)")
        print(state)
    }
}

 

Reference 

developer.apple.com/documentation/vision/detecting_hand_poses_with_vision

 


 
