React Native Radio

RNR 269 - React Native VisionCamera v3 with Marc Rousavy

Episode Summary

Jamon and Mazen sit down with Marc Rousavy to talk about the latest version of his popular React Native library, React Native VisionCamera.

Episode Notes

Jamon and Mazen sit down with Marc Rousavy to talk about the latest version of his popular React Native library, React Native VisionCamera.

This episode is brought to you by Infinite Red! Infinite Red is a premier React Native design and development agency located in the USA. With five years of React Native experience and deep roots in the React Native community (hosts of Chain React and the React Native Newsletter), Infinite Red is the best choice for your next React Native app.

Helpful Links:

  1. V3 Main Issue
  2. V3 stats
  3. React Native VisionCamera

Connect With Us!

 

Episode Transcription

Jed Bartausky:

Welcome to another glorious episode of React Native Radio Podcast, brought to you by Apple Vision Pro, because my life will seem way more awesome when I look at it through a pair of $3,500 goggles. Episode 269. Nice. React Native Vision Camera V3 with Marc Rousavy.

Jamon Holmgren:

Mazen, have you ever been to Germany or Austria before?

Mazen Chami:

As long as the airport in Frankfurt counts, yes.

Jamon Holmgren:

You know what, I'll count that. You must be basically fluent in German by this point, right?

Mazen Chami:

Oh, absolutely. We're going to do this episode in German. Is that why?

Jamon Holmgren:

Yeah. Well, let's try. Yeah, let's try. No. Boy, I took two years of German in high school, but it's so bad and my pronunciation is so horrible that our guest, who I will introduce in a bit was cringing when he heard me say anything in German. So I'm not even going to say it. I'm not even going to try. But I did spend a week in Germany and it was fun. I was doing a road trip, so that was really cool. But our guest is not from Germany. He's from Austria. Marc, you speak German though, right? That's your native language?

Marc Rousavy:

Yeah. So it's kind of like the same thing, but it's not really, so it's kind of a southern accent in Austria.

Jamon Holmgren:

Okay. Yeah, I mean, it's Austrian German, so it's a little different flavor of it. Okay, that makes sense.

Marc Rousavy:

But the people in Munich, for example, they speak basically the same kind of accent.

Jamon Holmgren:

I don't know why, but I'm absolutely fascinated by accents and dialects in other languages. I don't really care about English, whatever. But other languages, I find endlessly fascinating, the different dialects, and sometimes they turn into full other languages, even though they're pretty similar. So yeah, that's kind of interesting.

Mazen Chami:

Any chance you're a football fan or soccer as we call it, in this great country?

Marc Rousavy:

I am kind of. We don't really have a strong national team in Austria, but I watched the World Cup and everything.

Mazen Chami:

Well, thank you and the Austrians for Marcel Sabitzer. He's a very good player.

Marc Rousavy:

Yeah, I agreed. Yeah.

Jamon Holmgren:

I knew Mazen was going to try to weave in sports in here.

Mazen Chami:

I have to.

Jamon Holmgren:

Hey, let the record show. It wasn't me this time.

Mazen Chami:

Yes.

Jamon Holmgren:

I did spend a week in Germany and my two years of high school German, where I got a C, and I got all As other than that in high school. But German was tough for me. It actually came in handy. I was able to have a conversation with an Uber driver who didn't speak English. That was kind of fun, even though it was just asking where he was from and how long he'd been in Berlin and stuff.

Marc Rousavy:

Do you know how to say, you know, "I speak a little bit of German" if someone asked you if you speak any German?

Jamon Holmgren:

I don't. I would say, what would it be? Okay, I'm going to embarrass myself in front of 5,000, 6,000 people, many of whom are Germans or speak German.

Marc Rousavy:

Oh no.

Jamon Holmgren:

But I would probably say something like nicht so viel

Marc Rousavy:

Oh, that's good, actually, that's good. Yeah.

Jamon Holmgren:

Really?

Marc Rousavy:

Yeah. Actually, that was kind of surprising. That was actually good.

Mazen Chami:

What does that translate to? Is that like so-so?

Marc Rousavy:

You said what? Nicht so viel?

Jamon Holmgren:

Nicht so viel.

Marc Rousavy:

Yeah, it means not so much. Like a little bit. If you want to directly translate just a little bit, then it would be ein kleines bisschen.

Jamon Holmgren:

A small amount?

Marc Rousavy:

Yeah,

Jamon Holmgren:

Okay.

Marc Rousavy:

Yeah. Kind of like that. Yeah, that's what most people say.

Jamon Holmgren:

I know klein means small.

Marc Rousavy:

Yeah, klein means small.

Jamon Holmgren:

Yeah. Okay. Yeah. See, I probably know more, and that was actually the thing I found out when I was traveling through Germany was that when I would read signs and stuff, I could actually read a lot of them and I could actually understand, and there were more words than I expected to know, but it was just more maybe the confidence and it was also just sort of gluing it all together. That was tough for me.

Marc Rousavy:

I mean, at least it's like the same letters, right? If you were to go to Russia or something like that, or I don't know, China? That's a bit more tricky.

Jamon Holmgren:

I also learned some Finnish because my mom, her first language was Finnish, but she never taught us. But she also knew a very old dialect of Finnish, over a hundred years old. My ancestors moved to America very long ago, and so their dialect survived.

Marc Rousavy:

So they were from Finland?

Jamon Holmgren:

Yes.

Marc Rousavy:

Yeah. Okay.

Jamon Holmgren:

Yeah, originally from Finland. My mom's a hundred percent Finnish. So, yeah, that was pretty cool to learn. A little bit of Finnish. And now if I were to go back to Finland, I've been to Helsinki once. I would be able to at least communicate a little bit. But the thing is most people speak English there, so I don't know. Just your accent's beautiful. It's like awesome. You're amazing at English.

Marc Rousavy:

Me?

Jamon Holmgren:

Yes. Absolutely. Yeah.

Marc Rousavy:

Okay. Wow, thank you.

Jamon Holmgren:

I mean, you don't sound maybe like an American, but I'm always blown away by how good people's English is.

Marc Rousavy:

I mean, the funny thing is, I speak German during the day, and then English if I talk to clients or some people I work with, and then later in the day, my English skills kind of get less and less good. So it's becoming harder for me to speak the later it gets during the day. It's kind of weird to explain, but I mean, obviously you get a little bit tired and everything, and the first thing you notice is that your English skills kind of decrease, I would say. But for me it's like 6:00 PM right now, which is fine. But yeah.

Jamon Holmgren:

Is it coding? Does coding do the same thing? You start getting a little bit less effective as time goes on, or do you do better coding at night?

Marc Rousavy:

Actually, no. I think coding is, I don't want to say I can do it all day. That sounds kind of like nerdy, but I think I could do it for a very, very long time. Whereas speaking English is something, that's the first thing that's starting to fail.

Jamon Holmgren:

Yeah. Yeah.

Mazen Chami:

Switching the context in your head is tough. Switching the language context in your head all the time is tough.

Marc Rousavy:

Oh yeah. One thing I wanted to ask you. You said, I mean, have you been to Finland often?

Jamon Holmgren:

Just once. I went to Helsinki and gave a talk there at React Finland.

Marc Rousavy:

Okay, okay, I see. Because I think, I'm not sure how big of a thing it is there, but I see a lot of people going there too, for snow drifting. There's this one Austrian YouTuber who went there. I'm not sure if it was Finland or some other Scandinavian country. I'm not too sure about that. But yeah, they all go there with their Mitsubishi Evos, Nissan GT-Rs and everything, and just do lots of snow drifting there. I always wanted to try that. I mean, we have a little bit of snow here in Austria, and drifting is always possible here, but it's not as cool as a frozen lake or something like that.

Jamon Holmgren:

Oh man, yeah. Yeah, that'd be amazing. Northern Finland or really anywhere if it's cold enough.

Marc Rousavy:

Yeah.

Jamon Holmgren:

No, Finland was great. It was cold, it was like April. So it was about this time of year, and it was very cool.

Marc Rousavy:

Yeah. I bet it was.

Jamon Holmgren:

It was really cool. I really liked it. I should do intros. So I'm Jamon Holmgren. I am your host and friendly CTO of Infinite Red. I live in the USA Pacific Northwest with my wife and four kids, and I am joined today by my unparalleled co-host, Mazen. Mazen Chami lives in Durham, North Carolina with his wife and baby boy. He is a former pro soccer player and coach, and is a Senior React Native Engineer here at Infinite Red.

With us today is Marc Rousavy. He is the CEO at Margelo, and Marc is an Austrian full-stack developer, not Australian, for those of you in America. Those are two different countries, although he does apparently have some Australian in his background, which is really weird, but he's an Austrian full-stack developer. He excels in mobile back, or sorry, mobile apps, back-ends and AI, which is kind of fun. Very similar to me. Works a lot with UI/UX. He's done C++, Node.js, and of course React Native. He created popular open source libraries, one of which we're going to be talking about today. And of course, he's also the CEO of the elite app development agency, Margelo. Marc and I have been connected on Twitter for quite some time. Really happy to have you on, Marc.

Marc Rousavy:

Yeah, thanks for having me. I'm really excited to be here.

Jamon Holmgren:

Awesome. Before we get started, I want to say that this episode is sponsored by Infinite Red. Infinite Red is a premier React Native design and development agency, located fully remote in the US and Canada. If you're looking for React Native expertise for your next React Native project, hit us up at infinite.red/react-native, and don't forget to mention that you heard about us through the React Native Radio podcast. All right, let's get into our topic for today. Marc, you made React Native VisionCamera, and we're going to be talking about version three. But before we get into that, I want to ask you, how did you get into coding? You're a fairly young guy, but you've been doing this for a little while now. How did you get into it?

Marc Rousavy:

I guess it's kind of tricky to answer. I mean, I'm 22 now. I'm turning 23 next month. So I'm fairly young, and I guess I kind of got into coding through video games. I was always a big, big fan of Call of Duty and those kinds of games. And obviously, as the small kid playing video games, I always wanted to be a game developer, which is a very unrealistic career. But, yeah. So I always wanted to do that, and I decided to go to, we have this whole different school system here in Austria where you go to, I guess it would be high school for you guys. Which school are you in when you're 17?

Mazen Chami:

High school.

Jamon Holmgren:

That would be high school.

Mazen Chami:

Yeah.

Marc Rousavy:

Yeah. So if you go to high school, you have this one exam at the end, right? I guess finals would be for you, which is almost university already. It's kind of like a bachelor's degree almost, but not really, right? So there's this one thing where, which is enough to get a very good job. And then additionally you can go to Uni, which I didn't do. I mean, I went to Uni for three days, but whatever. So yeah, I went into a school where you get this IT background and you know, you learn lots of IT and have this one IT exam as well. And there's different subjects for IT. We had data systems and software development and all those kinds of subjects.

So I went to that school, and it's five years in total. The first two, three years maybe, I wasn't interested in programming at all, and I wasn't really good at school at this point, so I had bad grades and everything. But then in summer, my dad was like, yo, you need to get a job and everything. And then I thought, why not get a developer job? And I found a job where I did C# for a big corporation here, and I kind of liked C#. I was like, oh, this language is actually cool. It's kind of like the cooler Java. And I started doing lots of C#. I started writing kind of annoying trolling software. Not really viruses, but annoying software, and that was the first thing that was actually lots of fun, and those kinds of things spread around school, right?

There was, at one point I had this one big annoying troll software where I could fully control other people's laptops, and even the teacher had it. It was very, very bad. So it was a full privacy violation and everything. So that was fun. That was kind of when I got into programming. And so after school, we have this thing in Austria where you have to do one year of military or civil service. I did civil service where I was a paramedic for one year, and I kind of got bored during a job. Sometimes you just sit there and wait until there's an alarm or something, and when there's nothing happening, you don't have anything to do. And I wanted to build an app. So that's how I got into app development. I started doing Swift first. I wanted to build a cocktail mixer app where you can film a glass and it tells you how much you have to put in to mix the perfect cocktail.

So a really tricky app as a first app, it was already camera related. So I guess this kind of moves into another topic then in a second. But yeah, I didn't manage to release that because it was just too tricky at that point. But I got into app development I then changed to React Native because I wanted to make it cross-platform as well. And then I started doing a used goods marketplace app, so kind of like Craigslist or whatever, but for Austria. And then I joined a startup and during that startup I started working on some of the libraries that we know today. So React Native MMKV, VisionCamera, and React Native Blurhash, some other libraries as well. And that's how I got into React Native and programming in general.

Jamon Holmgren:

How did you start Margelo? It's a very young age to be starting an agency. I started mine when I was 23, so I wasn't much older than you, but I was a little older than you. How did you start Margelo?

Marc Rousavy:

I did join the startup at that time. I worked for that startup and it wasn't a very early startup. It wasn't going to be successful, but at that time I thought, why not try it? And I didn't earn any money. So yeah, it was kind of a bit of a tricky time at that point. But I made all of those libraries. I put in countless hours. I think I had a point where I was at 100 or 80 hours a week or something like that, which is insane, right? That was during what is called summer break, I guess?

Jamon Holmgren:

Yeah.

Marc Rousavy:

So yeah, worked a lot. I did all of those libraries and then the startup failed. So I was like, oh yeah, what do I do now? And at that time, my libraries and my Twitter were becoming more popular, and the libraries gained lots of traction and lots of people started using React Native Blurhash, MMKV and VisionCamera. And actually, I think VisionCamera, it was around the time when I was about to, or when I published VisionCamera. So, React Native MMKV and Blurhash gained lots of traction and my Twitter gained lots of traction as well. And I kind of teased VisionCamera there a bit, and then I released VisionCamera and then everything started to explode. So there were lots of companies DMing me. I think Microsoft and all those kinds of big companies just reached out to me and they were like, yo, we saw your libraries and whatever, real personal DMs, not like the normal job offers you get on LinkedIn.

So I was like, oh wow, okay. So this is actually a very, I would say it's a volatile, is that how you pronounce it? Volatile space? Yeah. But it was so fast moving that I was like, okay, you can really pull something there. You can really invent something there. And I started putting more effort into those libraries. And at first it sounds stupid, if you try to explain this concept to other people. You put countless of hours into an open source project. You basically upload your code and everybody can use it for free. You get GitHub sponsors. I got lots of GitHub sponsors. So huge shout out to everyone who's sponsoring me on GitHub. I love you all. At the end of the day, it's not going to pay for rent and everything, but it is good amount of support. So it becomes really interesting though, when lots of companies or bigger companies start using your libraries, right?

So that's the whole idea of React Native, everything is open source, everything is a third party library with the Lean Core concept. So everybody pulls in another library, and then there's multiple people maintaining multiple aspects of the ecosystem. So for me, it was like MMKV, local storage kind of alternative for React Native and cameras. So I did VisionCamera, and basically at that point, VisionCamera was already, it had more features than all the other camera libraries, but we can talk about that in a second. So every app that had a camera, every React Native app that had a camera kind of thought about using VisionCamera. So there was the point when I got lots of job offers and I was like, okay, I really want to work for basically all of those companies or most of those companies, but I don't want to decide which one I want to work with all the time.

So I was like, okay, I'm going to do some freelance work. The first project that I actually worked for was ClipDrop, which is a French AI startup, which allowed you, and I think everybody has seen it at this point, this is the top post on Reddit on r/reactnative of all time, where they have their camera and they scan this flower pot and then hold onto it, then move over to the laptop and release the finger and it paste it with a cutout onto the screen, which is insane.

Jamon Holmgren:

I've seen that. It's amazing.

Marc Rousavy:

Yeah. Yeah. You've seen that. It feels like magic. And I worked for them. I built a quick editor for them, which is a Native Module and everything. So this was also the point where I was like, okay, I think my area of interest is Native Module development. I really like React Native. I'm really good at building apps and writing efficient JavaScript code and clean types and everything, but I'm mostly interested in this native kind of structure and then abstracting things. So I was like, okay, API design is what I want to do. Then I joined Expo for also a very short duration. I was like, I think two months at Expo, because Charlie DMed me. Charlie is a really amazing guy. I'm not sure if, have you talked to Charlie before?

Jamon Holmgren:

Yeah, yeah. Charlie and I are fairly close. He's awesome.

Marc Rousavy:

Oh, that's awesome. Okay. Yeah, no, Charlie is absolutely awesome. I met him in San Francisco actually in January, and then I worked at Expo, but I think after that I decided to just create an agency, or make this freelancing more like a group project almost. So it was all very organic. I was like, okay, I have too many clients right now, too much traction. I want to do everything. I know a few people through Twitter, so I was like, yo, let's do this together. Let's do three projects at a time. And then we're three people. And then we kept growing, growing, growing, and then this kind of structure formed around it. And now we have an office here in Vienna. We have employees, we have an assistant, and she's doing HR as well. We have accounting and everything. So it all happened kind of organically, but at the same time, it's also this kind of very specific market that we fit in.

Jamon Holmgren:

How big is the company? How many people?

Marc Rousavy:

So we're 11 in total. We have, most of our teammates are under freelance contracts because that's, I think, the simplest approach for working together. And then in Vienna, we are five people.

Jamon Holmgren:

I mean, that's really good. You're in an accelerated timeline. I didn't get my first employee until I was probably, oh, I don't know, I was probably 26, 27, something like that. And then I grew up to about 12 people by the time I was, I think 30. So it took me a little longer to get to where you are. That's amazing. You're doing great.

Marc Rousavy:

It's going very fast. Yeah, it's kind of cool, but also kind of scary.

Jamon Holmgren:

It's a little scary.

Marc Rousavy:

It can go up fast. It can also go down fast.

Jamon Holmgren:

Absolutely.

Marc Rousavy:

But yeah, just my role here in this company is making sure that everything's stable, everything works, and we're on the right track.

Jamon Holmgren:

Fantastic job.

Mazen Chami:

That's pretty cool. I mean, running a company.

Jamon Holmgren:

Thank you.

Mazen Chami:

And then doing all this open source stuff. But I kind of mentioned it in the chat a little bit earlier. Your name sounded very familiar, and then when you mentioned MMKV, I was like, that's where I've heard your name before. That's a pretty cool library. But anyways, we're shifting to VisionCamera here. What's the inspiration? What inspired you to actually build that library in the first place? Because it sounds like a very niche one. You don't have a lot of apps that necessarily use the camera, but when they do, they rely on it heavily, right? So what inspired you to do that one specific niche?

Marc Rousavy:

As I said before, everything was kind of organic. In the first startup that I joined, our main competitor was Snapchat. So it was already very tricky to succeed. But yeah, we wanted to have a camera that's high quality, starts really, really fast, can do all those kinds of modern things that a Snapchat and Instagram camera can do. And if you just think about a camera right now, it might sound very simple, but then if you use, for example, React Native Camera or React Native Camera Kit, I mean React Native Camera Kit, you can't even record videos. It's very simple and it's very cool for taking photos and everything. It's a cool library, and I took actually lots of inspiration from it, but then you can't really record videos. And then we took a look at React Native Camera, and in React Native Camera, you can record videos, but it's a bit tricky. Some things are kind of broken. It was unmaintained at the time already.

And then we found all those kinds of very, very specific things. For example, if you record a video on Snapchat and then double tap the screen, it continues the recording and flips the camera and then it records from the front camera or from the back camera, which sounds simple, but that's not an out of the box feature, and that's really, really hard to implement. So I don't want to get into too much technical stuff, but just this as an example, this was one thing that was really annoying for our CEO at the time. He was like, yo, this should work. And I thought it would work out of the box. But then I took a look at the native iOS code, and the way you implement video recording was by using this AVCapture video file delegate, and you just start the recording, it writes to a file, and then it stops.

And if you want to switch or flip the camera, you can't really do that because the video capture output is bound to the camera input. So you can't really continue the recording there. I mean, you could use some magic like stopping the recording and starting it again, but then you lose some frames, I guess some video frames and audio frames, and it gets really, really tricky. So what you had to do, or what we had to do, was implement an AVCaptureVideoDataOutput delegate, which receives buffers, like the raw video buffers and raw audio buffers, and then manually write those buffers to the file, so basically building a new media encoder. There's some helpers there in the iOS framework, but as you can probably imagine, that's lots of lines of code just for this one simple, or seemingly simple, feature. So all those kinds of features kind of summed up.

What else was it? I think recording while music or Spotify is playing, that's also something that's really tricky to do with React Native Camera. And then we just decided, okay, React Native Camera also doesn't have any filters or custom filters. I mean, I think there's like sepia color filters or something like that, but not 3D masks and everything. So I was like, okay, this isn't enough for our, I guess use case. So we need to build our own camera library. And that's what I did. So I created a new Native Module. I did have some experience at that time from React Native MMKV and Blurhash. But yeah, that's kind of how we built VisionCamera. At this time, it was still a private native module, but I built everything very generically and, me, loving API design, I was like, okay, I can make this very generic.

I can make this work for every use case. Because some people might want just photos, some people just want to record video. Some people want to record GIFs or video without audio, and some people might want it to start really, really fast. So that's when you don't want to use video stabilization, because that's a hardware feature and also takes time to set up. So all those kinds of very, very specific things. And I could really be a nerd and talk about all of those kinds of very, very specific things, but that's way too much for a single podcast right now. I think that video stabilization is a good example. So there's like three stages: off, on, and hardware. And the video stabilization being off obviously starts the quickest. That's how it works in Snapchat. So if you record a video and you have shaky hands, it's actually shaking a lot.

If you record the same video on Instagram, which is a similar camera to Snapchat, it has basic video stabilization, but it starts a little bit slower than Snapchat. Difference being that Snapchat's main or home screen is the camera. So that's why it needs to be as fast as possible. And on Instagram, it's one separate step away. And then the most tricky video stabilization is obviously hardware, which takes almost, I think two seconds or something then on my iPhone 11 to start up. And that's the cinematic mode on your stock camera app. So yeah, all those kinds of very specific features made me build a separate module for that and pack all of that under a nice and easy to use API.

Mazen Chami:

That's pretty cool. And I mean, you just mentioned some things that make VisionCamera stand out from the core ones that are available out there, which is amazing. And your documentation is also very well written. So I think that's very helpful for developers out there trying to do that.

Marc Rousavy:

Thank you.

Mazen Chami:

Yeah. So V2 is out right now, and I see you're currently working on V3. I want to say something real quick that you posted. I think it was you that posted this, right? Let me find, yes, you posted this. And I think this is amazing for people that aren't native developers or maintainers of libraries, the people who just pull it in: all you do is npm or yarn add VisionCamera, follow the docs on the first page, and you're done. Well, for those people, one thing you posted on here was V3, how much code is involved in V3?

And I think some people should hear these stats, because this is incredible for you to be doing this in your spare time at the same time. So if you wanted to build an app from purely native, you're talking about 700 lines of code across five files. In addition, about 300 lines of very low level C-style code, about 40 lines of Metal shader code. I don't know what that means, but that's still 40 more lines than I know, and 30 lines of code to set up a face detector. Meanwhile, on the other hand, if you're using VisionCamera, it's four lines of code, 35 lines of code for the face detector and 13 lines of code for frame processing. You add all those up, there's no competition there. So I think that's an amazing stat there. And this is all coming from, I'm assuming, V3. So for me, as someone who's developing an app and leveraging this library, what are the benefits for me to move from V2 to V3? What am I gaining? What new features are available for us?
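For readers following along, the handful of lines Mazen mentions maps roughly to the basic VisionCamera setup. A minimal sketch, assuming the v2-style hooks that were current around the time of this episode (permission handling omitted):

```tsx
import React from 'react'
import { StyleSheet } from 'react-native'
import { Camera, useCameraDevices } from 'react-native-vision-camera'

export function CameraScreen() {
  // v2-style hook: returns the available devices, keyed by position.
  const devices = useCameraDevices()
  const device = devices.back

  // Devices load asynchronously; render nothing until one is available.
  if (device == null) return null

  // The Camera component renders the preview and owns the native capture session.
  return <Camera style={StyleSheet.absoluteFill} device={device} isActive={true} />
}
```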

Marc Rousavy:

So in V2, VisionCamera aims to provide all of the basic camera features that you want to use in an app like Snapchat and Instagram, but without the filters aspect. And for V3, I guess the biggest feature is filters. And with filters, I mean there's different kinds of filters that you can do in a camera app. So first of all, there's very simple color filters, which are, I think, possible with React Native Camera as of today. I'm not sure about that, but I think it is something like a sepia color filter or something like that. Then there's some 2D filters, let's say you want to draw a rectangle around a face and update that in real time, like face tracking, or blurring a face, for example. Or maybe you want to build a real time license plate blurring app, so that every time you, you know, look at a car's license plate, it automatically blurs the plate.

And then there's 3D filters, which are really, really tricky. There's completely separate modules for that. So on iOS, you wouldn't use a normal AVCaptureSession or camera, whatever. You would use this ARKit session that you set up. And then it sets up all kinds of 3D context, and then you have your models in there and then bind them to faces and have these landmark points. VisionCamera V3 aims to support basic filters and the 2D filters. For 3D filters, I don't think that there's a way to build that into a camera library, at least not without compromising performance and, I guess, maintenance speed as well, because if I actually implement 3D filters in VisionCamera, then I would have two separate camera sessions, and then something like, I don't know, video stabilization won't work in the 3D filter session, but only works in the 2D filter session.

I think that's where you can draw the line, if you really need 3D filters, then I guess this just can't be done in a simple module that can do everything. So I guess this is where you have to draw the line. 3D filters or something you have to build yourself and make them specifically for your app. But VisionCamera V3 aims to support basic filters and 2D filters. And all of this is powered by Skia actually. So I'm very active in the ecosystem, obviously, and everybody wants to use the latest tech and everything. So this is the philosophy I have with VisionCamera as well. It is a lean core concept, so it doesn't even have a face detector in there. So you can build everything using a frame processor plugin. And I'm kind of mentioning lots of different things right here, but it all makes sense in a second. Sorry.

So with the lean core concept and React Native Skia integration, you can combine the two. And if you want to enable this feature, you can set up a Skia canvas, or a Skia preview view, I would say for drawing stuff onto the camera per se. This means, whereas right now you have this one camera component, which is kind of like the preview view in your app, and you see what the camera sees. If you want to draw stuff onto that camera view, you have to set up a separate canvas and everything. And that's what this integration allows you to do. So it allows you to set up a React Native Skia canvas, and then you get the frame and inside the frames coordinate, let's say the frame is a 4K frame, you can draw, for example, if you run a face detector module, you can draw a rectangle box around the user's face.

Previously, you would have to show a separate view on top of the camera view in the React Native view coordinate system, and move this box around with screen coordinates, which is hacky and not really embedded in the recording either. So yeah, this is one thing that it allows you to do. And again, as I said, you know, you can also do license plate blurring or VHS filters, for example. Snapchat also has some VHS filters and distortion and some text on there, or inverting colors, sepia filters. All those kinds of things that you can do with Skia, you can also do in VisionCamera then. And yeah, I think this also includes beauty filters, which would also probably be very, very simple to implement. And the good thing about this is that it builds a foundation, or it provides a foundation, that allows you to embed all of this in the preview and in the recording and in the final photo as well.
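The drawing API was still being finalized at the time of this conversation, so the sketch below is only a rough illustration of the idea Marc describes: running a detector inside a frame processor and drawing in the frame's own coordinate system with Skia. The detectFaces plugin and the frame.drawRect call are assumptions for illustration, not the shipped V3 API.

```tsx
import { useFrameProcessor } from 'react-native-vision-camera'
import { Skia } from '@shopify/react-native-skia'

// Hypothetical frame processor plugin; VisionCamera's lean core ships no detector itself.
declare function detectFaces(frame: unknown): { x: number; y: number; width: number; height: number }[]

const paint = Skia.Paint()
paint.setColor(Skia.Color('red'))

export function useFaceBoxProcessor() {
  return useFrameProcessor((frame) => {
    'worklet'
    // Coordinates are in frame space (e.g. 4K), so the box also ends up in
    // photos and recorded videos, not just in the on-screen preview.
    for (const face of detectFaces(frame)) {
      // drawRect is an assumed drawing method on the Skia-enabled frame.
      (frame as any).drawRect(Skia.XYWHRect(face.x, face.y, face.width, face.height), paint)
    }
  }, [])
}
```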

So yeah, let's say you want to blur something on a license plate. You can see that on the screen in real time at 60fps or maybe even 120fps. You can take a photo and it's still blurred in the photo, and you can also record a video. And the thing about photos is it's a different resolution, obviously, than the preview view. And you can also embed it in a video while it's being recorded; there's no post-processing to it. You record it directly frame by frame. And then with V3, I was like, okay, I'm doing a new version, but what about the people that don't use those kinds of 2D filters, because it's a very small user base, I would say. So I was like, okay, let's also build a bunch of other features. So synchronous frame processors, that's the second thing.

It's required for drawing onto a frame because you can't draw asynchronously, you have to draw synchronously. So frame processors will be synchronous, meaning everything you do blocks the next frame from coming in. This has lots of benefits, obviously, but also drawbacks. If you want to do some very long processing, then it can't run in sync; if you want to do some complex face detection or pose detection that cannot run at the same speed the camera runs, then you need to run it asynchronously. And for this, I also implemented a helper function called runAsync, and you also have a function called runAtTargetFps, which allows you to throttle, I guess, a lambda function as well.
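To make the synchronous/asynchronous split concrete, here is a rough sketch of the two helpers Marc mentions, combined in one frame processor. runAsync and runAtTargetFps are described as V3 additions, so the exact import path and signatures are assumptions; the detector plugins are hypothetical stand-ins:

```tsx
import { useFrameProcessor, runAsync, runAtTargetFps } from 'react-native-vision-camera'

declare function scanQRCodes(frame: unknown): string[] // hypothetical cheap plugin
declare function detectPoses(frame: unknown): unknown  // hypothetical heavy plugin

export function useDetectionProcessor() {
  return useFrameProcessor((frame) => {
    'worklet'
    // Cheap work, throttled: runs synchronously but only about twice per second.
    runAtTargetFps(2, () => {
      'worklet'
      console.log(scanQRCodes(frame))
    })

    // Heavy work: runs asynchronously so it never blocks the next camera frame.
    runAsync(frame, () => {
      'worklet'
      detectPoses(frame)
    })
  }, [])
}
```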

Then we have React Native 0.71 benefits with a much simpler build setup. We're up-to-date with the latest React Native version. I'm thinking about making it a Turbo Module, but then this would block lots of users that are not yet using Turbo Modules from using it. So this is maybe just a version 3.1 thing. And then we have also two more interesting things. No, no, three more very interesting things. So first is a new declarative API for device and format selection. I think this is something that we can talk about here as well, because the way you, I'm not sure, have you guys used VisionCamera before?

Mazen Chami:

Yes. Jamon hasn't, but I have.

Marc Rousavy:

Okay, so did you select a very specific device, like a camera device, or did you just use the default device?

Mazen Chami:

Default.

Marc Rousavy:

Okay, so I think this is what most people do. I think most people just use a default camera device, but if you have, what kind of phone do you guys have?

Mazen Chami:

I have an iPhone 12, I think it is.

Marc Rousavy:

12. Okay. So on an iPhone 12, and on my iPhone 11 Pro, and any, I guess, latest iPhone, you have multiple cameras, and you have multiple capture devices actually. So for example, I'm holding my iPhone 11 Pro here. On the front side, I actually have, I think, two cameras. I have this, how do you call it, FaceTime front camera. And I have the Face ID infrared device, I guess?

And on the back I have three more cameras. There's the wide angle, which is the normal or default camera that everybody uses. Then we have the ultra-wide angle, which is the fisheye effect, like the zoomed out or 0.5x camera. And then I have a telephoto camera, which is the more zoomed in camera. So if you zoom in to 2x or 3x, then it switches to this camera, which has a different field of view and better quality at a zoomed in state.

So there's three cameras. I think most apps, and all other React Native camera libraries, just use the default camera. So that's the wide angle on the front and the wide angle on the back. There's no way of zooming out to the ultra-wide angle or in to the telephoto camera. In VisionCamera there is, because the user is in full control over which device you select.

And it gets even more interesting because there's also virtual camera devices. For example, on my iPhone 11 Pro, on the backside there's this virtual or logical device that's called the multi-cam, and that's all three devices combined together. So I can just start in the normal wide angle camera, which is the camera everybody uses, and then smoothly zoom out, and it automatically switches under the hood, without any black screen or anything, to the ultra-wide angle camera, which is the default behavior that you have in your camera app. So if you open your camera app and then zoom out, it automatically switches the cameras. And this is not tricky to implement with VisionCamera. You can just find the multi-cam device and use that, pass it to VisionCamera, and it automatically sets everything up for you.
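A sketch of what that device selection looks like from the JavaScript side, assuming the v2-era device API (getAvailableCameraDevices, the position field, and the devices array of physical camera types); the names are worth double-checking against the current docs:

```tsx
import { Camera, type CameraDevice } from 'react-native-vision-camera'

// Prefer the logical multi-cam on the back, so zooming can switch between the
// ultra-wide, wide and telephoto sensors seamlessly; fall back to plain wide-angle.
async function pickBackDevice(): Promise<CameraDevice | undefined> {
  const devices = await Camera.getAvailableCameraDevices()
  const backDevices = devices.filter((d) => d.position === 'back')

  // A logical (virtual) device lists more than one physical device it combines.
  const multiCam = backDevices.find((d) => d.devices.length > 1)

  return multiCam ?? backDevices.find((d) => d.devices.includes('wide-angle-camera'))
}
```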

Mazen Chami:

That's pretty cool. I just learned something new today.

Jamon Holmgren:

Yeah, that's really cool. And that's the type of thing where when you think about this library, you probably wouldn't even think about that as being a whole thing that you have to think about. But it's like when you actually try to implement these features, then the complexity starts coming through.

Mazen Chami:

And based on what you mentioned, I assume once you capture a picture or a video, you're getting a higher resolution image at the end of the day. Is that correct?

Marc Rousavy:

Yeah, the preview that you see on your screen is small; it's a different type of input that's streaming in from the hardware sensor. And when you take a picture, that's a separate, I guess, call to the native hardware sensor, which is a higher resolution image, and that's delegated through the actual camera that you are given, or with the given zoom state right now. And then it resolves with a high resolution picture. There's also multiple different steps or configuration options there. On iOS, you can tell the camera session, what do you want to prioritize? Speed or quality? Or I guess there's also an in-between balanced mode. But yeah, there's lots of configuration there.
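That speed-versus-quality choice is exposed on the capture call itself. A small sketch, assuming VisionCamera's takePhoto options as documented around v2 (qualityPrioritization accepting 'speed', 'balanced' or 'quality'):

```tsx
import { useRef } from 'react'
import { Camera } from 'react-native-vision-camera'

export function usePhotoCapture() {
  const camera = useRef<Camera>(null)

  async function snap() {
    // 'speed' favors fast capture, 'quality' the best possible image,
    // 'balanced' sits in between, mirroring the iOS prioritization modes.
    const photo = await camera.current?.takePhoto({ qualityPrioritization: 'speed' })
    return photo?.path
  }

  // Attach the ref to the camera component: <Camera ref={camera} ... />
  return { camera, snap }
}
```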

You can tell iOS what to also deliver. I think the delegate has seven steps. So there's a callback for before flash, while flash and after flash, and then before taking the picture. Then there's the raw data available callback, then there's the depth data callback, and then there's the actual picture callback, and then there's, I think, the shutter effect; there's also a callback for exactly the point when the shutter effect is being called. So all those kinds of things, you get callbacks for, and then you can decide what you actually want to do. Also, one fun fact, and I'm dropping a lot of fun facts here: the iOS camera app, when you take a picture, it saves or it screenshots the preview view and puts it in the bottom left corner, because the high resolution picture isn't fully available yet. And this way it seems like the picture is taken quicker.

Jamon Holmgren:

That's really cool.

Marc Rousavy:

So that's actually one of the fun things that you can work with.

Jamon Holmgren:

Yeah, that's a neat user experience, affordance. That's really neat.

Marc Rousavy:

Yeah. Remind me to talk about the Snapchat trick later when we talk about Android, because that's, I think the coolest fun fact I know.

Jamon Holmgren:

It's all smoke and mirrors, isn't it?

Marc Rousavy:

So you know, you can select different devices depending on what you actually want to achieve. As we said before, there's multiple camera devices available on your phone. The 0.5x camera, the ultra-wide angle camera, is also not very suitable for, I guess, low light conditions. And I guess in most cases you really just want to use the 1x camera, the wide angle camera, by default, because adding multiple cameras to your camera session, so using the multi-cam device, also increases startup time. So if you really want to allow the user to zoom out or zoom in to the different camera devices, there is a compromise in performance, and that's what VisionCamera is about. I want to have the user decide what he actually wants to do. If the user wants to prioritize speed like on Snapchat, then he will probably only want to select a wide angle camera and he will disable video stabilization.

And the same thing also applies to formats. So on one phone you have N devices, and per device you have N formats. So let's say I take the default wide angle back camera on my iPhone 11 Pro, I think there's, what was it, 30, 40 formats available? And there's like a format with a resolution of 192 by 192, which is a very, very small thumbnail resolution. And then there's 5K formats and then ProRes, and some formats record in different photo resolutions, different video resolutions. Some have very, very high photo resolutions, but then low frame rates, because they can't stream in all of this resolution at 60fps, and some have lower resolution. So for example, there's like a full HD format, but you can record in 240fps. And then VisionCamera provides an API, or an easy to use API, to actually select a different device or a different format depending on what your app wants to do.

If you don't want to record any videos, then you don't need to look for formats that have high FPS or good video quality or something; your primary focus would be photo resolution. So it is up to the user to filter the formats, and you get all of that through one single API call to get all of the formats, filter through them, and then pass the matching format to the camera component. And by default, it uses the best matching format or whatever, which works in every case, probably. This is a very advanced feature, and I want to make it even simpler to use. Right now you have to imperatively sort the formats and find the best matching one. I want to make that even simpler, maybe by using a builder pattern. So I haven't quite thought that out yet, but something like format.where, I don't know, something like quality is highest and FPS is highest, or photo quality is highest, something like that.
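A sketch of the imperative format filtering Marc describes, assuming the v2-era CameraDeviceFormat fields (frameRateRanges and its maxFrameRate); the fluent format.where(...) builder he floats at the end is only an idea at this point and isn't shown:

```tsx
import type { CameraDevice, CameraDeviceFormat } from 'react-native-vision-camera'

// Pick the highest-fps format for slow motion; field names follow the v2 format
// type as I understand it and may differ slightly in newer versions.
function pickSlowMotionFormat(device: CameraDevice): CameraDeviceFormat | undefined {
  const maxFps = (f: CameraDeviceFormat) =>
    Math.max(...f.frameRateRanges.map((r) => r.maxFrameRate))

  return [...device.formats]
    .filter((f) => maxFps(f) >= 120)          // only high frame-rate formats
    .sort((a, b) => maxFps(b) - maxFps(a))[0] // highest fps first
}
```

The chosen format (and an fps value) would then be passed to the Camera component, via its format and fps props in the v2 API.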

And then we have two more interesting features. So that's the Android rewrite. On Android, I use CameraX. Okay, there's three different libraries on Android for cameras. On iOS, there's just this one AVFoundation; you use that, it gets improved, that's it. On Android, there's three libraries. There's Camera1, Camera2, and CameraX, whereas CameraX is kind of built on Camera2. So in reality there's two hardware abstraction layers, that's Camera1 and Camera2. There is a Camera3 hardware abstraction layer, but that's what Camera2 uses. So it gets really complicated, but in reality there's three libraries that you can use, or three official libraries that you can use. And CameraX is the easiest to use, Camera2 is the hardest to use, and Camera1 is deprecated. With Camera2, you have more flexibility, with the compromise of some devices not working.

Some devices, like some Huawei devices, require some workarounds and some custom patches, and it's really hard to use. But yeah, I use CameraX for now, and I want to rewrite it to Camera2, which is going to be a huge pain. But yeah, I think I need to do that, because right now you can't record slow motion on Android, and that's kind of annoying me. So I really want that to also work on Android, and for that I need Camera2. And then this is a very big under the hood change, which allows many more features and much more stability to also work on Android, but then with the compromise of some features that worked before not working again and requiring me to build a fix afterwards. So it's going to be a tricky transition, but yeah, it's going to be fun.

And then lastly, I want to replace the worklet runtime, or make it more stable, to integrate the whole frame processor thing. So right now I'm using Reanimated, and I built frame processors around Reanimated V2, which required a bunch of changes in the Reanimated repo. And so huge shout out to the Software Mansion guys for reviewing my PRs and getting stuff merged and helping me and answering my questions.

Jamon Holmgren:

Yeah, they're great.

Marc Rousavy:

Yeah, they're amazing. It was a huge help. But yeah, so VisionCamera depends on Reanimated V2 to provide worklets. A frame processor is a worklet. I'm not sure, I'm just quickly going to explain what it is. If you have your camera component and you want to add some kind of frame processing, so this is a very generic topic, let's say face detection, maybe even hand detection, pose detection, number plate scanning, or even QR code scanning, all those kinds of things, or even WebRTC, where you want to upload your frames to implement a video chat like FaceTime.

All those kinds of things require you to process the frames the camera sees. And before VisionCamera, with, let's say, React Native Camera or React Native Camera Kit, this was not possible. So what you had to do is either implement your own native camera module or patch React Native Camera with very custom code, and we all hate patches or forks. So with VisionCamera and worklets, all of this changed, because I implemented a feature where you can write a function called a frame processor. It's this one hook, it's called useFrameProcessor, that gets called on every frame the camera sees. So if your camera runs at 60fps or even 240fps or 30fps, whatever, this function gets called for every single frame, and you receive the frame as a parameter, which is magic.

If you think about it. It's mind-blowing. It's real magic. The frame. If you have a 4K camera, that's like a 10 megabyte buffer each frame that's getting sent.

Jamon Holmgren:

That is incredible.

Marc Rousavy:

Yeah, it's getting sent to JavaScript basically. So you have a new 10 megabyte buffer allocated on a GPU, 10 megabytes per frame on each frame, 60 times a second. That's insane.

The magic behind this is that the frame is a host object, and it's not copying the frame at all. So you just have this one thin wrapper around the actual 10 megabyte GPU buffer, which allows you to access stuff on the frame. Let's say we're all staying in JavaScript here, so that GPU buffer is coming from C++, and in JavaScript, you can do frame.width and kind of just log it to the console, and on every frame, 60 times a second or even 240 times a second, it outputs the frame width, which would be, let's say, 4,000 something. And then, in V3, fun fact, you can also convert the buffer to a JavaScript ArrayBuffer, which is obviously not really fast because it's copying from the GPU to the CPU into JavaScript. So that's obviously not recommended. But for debugging or for slower running operations, say, if it's fine to run it once a second or something like that, or maybe even 20 times a second, you can do that.
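For reference, the host-object access Marc describes looks roughly like this; frame.width and frame.height are cheap reads on the wrapper, while toArrayBuffer() is the V3 copy-to-JS escape hatch he mentions, so its exact name may differ in the released API:

```tsx
import { useFrameProcessor } from 'react-native-vision-camera'

export function useFrameLogger() {
  return useFrameProcessor((frame) => {
    'worklet'
    // Reading properties does not copy the pixel buffer; the frame is a
    // thin host-object wrapper around the native GPU buffer.
    console.log(`Frame: ${frame.width}x${frame.height}`)

    // Copying the pixels into JS memory is slow, so only do it occasionally:
    // const pixels = (frame as any).toArrayBuffer()
  }, [])
}
```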

Mazen Chami:

Yeah, the GIF you have in the pull request, which we'll link in the show notes, you can actually see it in action with the code. It's just pretty cool how it kind of overlays the red box on your face in the picture.

Marc Rousavy:

Yeah, that's one example of, or the first draft that I got working, which was a pretty exciting day for me, when I integrated Skia into VisionCamera. And it kind of really showcases how simple frame processing in general, the general abstract topic, gets with React Native. And this is not just about a library for React Native; in general, the whole AI and realtime processing space on mobile got way simpler with this, because it's way simpler to do this in React Native right now than in native apps. I'm not sure how Flutter works in this area. I should probably take a look at that. But this is as simple as it gets right now.

Jamon Holmgren:

So Marc, unfortunately we're running out of time here, but this, I've just been like soaking this in. This is unbelievable how much work you've done and also how deep this topic goes. You wouldn't think React Native Vision Camera. You think about that and it's like, okay, how much depth can you get out of one particular feature, one particular library? But as you can see here, we could probably go two or three podcast episodes and you still would have more to say. It's incredible. Obviously this is a passion too, because it kind of has to be in order to get this level of depth. It's pretty amazing.

Marc Rousavy:

Yeah, of course. Yeah, thank you. It's lots of stuff. It's interesting stuff, but at the same time it's just so many hours.

Mazen Chami:

Put into this, I'm sure.

Jamon Holmgren:

Oh yeah, absolutely. And I know that a lot of times people look at stuff like this and they'll be like, wow, Marc is really talented, which is true, you're a talented developer, but a lot of it is hard work. It's a lot of hard work too. It's like, people can do this. It's just a matter of: will you sit down and learn all the APIs and really think through what the public API should look like? What should the interface between that and React Native look like, and how will you utilize JSI, and how will you utilize the Bridge, and how will you handle the new architecture when it lands, and everything else. I really hate to interrupt you, but we are unfortunately out of time. Thanks so much, Marc, for coming on, and I know people are going to have more questions for sure. So where can people find you online so they can ask you more questions, on Twitter or wherever else?

Marc Rousavy:

So if you want to ask React Native questions, you can follow me on Twitter at M R O U S A V Y, which is my name everywhere. So GitHub, Twitter, everything. And if you have some very specific camera questions, like 3D models and all those kinds of filter stuff, you can also send me an email. Everything is linked on my Twitter as well, so yeah, you'll find a way to contact me.

Jamon Holmgren:

That's awesome. Yeah,

Marc Rousavy:

And I'll post updates there as well on VisionCamera V3, and some fun GIFs to watch of some projects I built with VisionCamera, some examples.

Jamon Holmgren:

Very cool. Well, I'd love to have you on at some point on my Twitch channel when I do restart that. I do plan to do that. Maybe you and I could hack on some camera stuff or even MMKV. You'd mentioned AI. I do want to make a little side note here. One of my first AI codemod-like experiments was making an AI that would convert a React Native project from AsyncStorage to React Native MMKV.
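For anyone curious what that migration amounts to, the API change is roughly this (a hand-written before/after; the codemod Jamon mentions automated rewrites like it):

```ts
import AsyncStorage from '@react-native-async-storage/async-storage'
import { MMKV } from 'react-native-mmkv'

const storage = new MMKV()

// Before: async, promise-based AsyncStorage calls.
async function saveNameAsyncStorage(name: string) {
  await AsyncStorage.setItem('user.name', name)
  return AsyncStorage.getItem('user.name')
}

// After: synchronous calls backed by MMKV.
function saveNameMMKV(name: string) {
  storage.set('user.name', name)
  return storage.getString('user.name')
}
```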

Marc Rousavy:

Oh, that's cool.

Jamon Holmgren:

It actually worked. It actually did work.

Marc Rousavy:

Wow.

Jamon Holmgren:

I was able to get it to work. I was kind of blown away. Yeah. So I could show you that and we could play around with that because I don't know if we've mentioned, but you created React Native MMKV, so that's relevant here. I do want to also say you can find React Native Radio at React Native R D I O.

You can find Mazen @mazenchami and me @jamonholmgren on Twitter. You can also find me on Bluesky, the new Bluesky, @jamon.dev. Thanks so much to our guest, Marc, for joining us today. And as always, thanks to our producer and editor, Todd Werth, our assistant editor and episode release coordinator, Jed Bartausky, our designer, Justin Huskey, and our guest coordinator, Derek Greenberg. Thanks to our sponsor, Infinite Red. Check us out at infinite.red/react-native. A special thanks to all of you listening today. Make sure to subscribe. And Mazen, do you have a Robin's mom joke queued up?

Mazen Chami:

I do.

Jamon Holmgren:

Let's hear this. Let's get it over with.

Mazen Chami:

You have to savor it, Jamon. What do you call a fish wearing a bow tie?

Jamon Holmgren:

I don't know.

Mazen Chami:

Sofishticated.

Jamon Holmgren:

Sofishticated.

Mazen Chami:

I need to do a better Sean Connery impression there. Yeah, but that one was brought to you by,

Jamon Holmgren:

What does Sean Connery call fish?

Mazen Chami:

Yeah, no, I'd be interested. That was brought to you by Frank Calise.

Jamon Holmgren:

Oh, thanks a lot, Frank. We really appreciate it. We'll see you all next time.