IP cameras contain a lot more features than just the ability to send video back to a recording system. Far from the basic camera that just streams video over a network, modern ones have lots of smarts in them, like the ability to track subjects of interest, local storage for recording right on the camera, and night vision. One feature that caught my attention during a tech rabbit-hole session was the ability of some cameras to both send and receive audio. I thought that this, combined with an old internet router, an Arduino, a speaker and microphone, and some old tablets (most of which I had lying around), could be turned into a door entry system.
The user interface for the system is a pair of old Nexus 7 Android tablets, one of them wall mounted and the other in a dock (does anyone make a 7-inch Android tablet with a dock these days? Let me know if you know of one). Old tablets like these can be had second hand for very little money these days, and are perfect for applications like this. If they are running a basic version of LineageOS with no Google Play services they have more than enough power, and they have everything you need, like WiFi, speakers and a microphone. If I were doing this from scratch and had to buy both tablets (I already had one before I started this project) I would use Amazon Fire tablets because they are far more readily available.
The core of the system is an old internet router that I had lying around. It has OpenWrt flashed onto it, which allows it to control the Arduino Micro connected to the router's USB port. The Arduino opens the door by sending 12V to the electronic lock via a relay. I'm not going to go into much detail about how this works, as the Arduino program is pretty simple, and adding extra code to OpenWrt is well covered on the internet. It's key to note that this router is not connected to the internet. The only things on the network are the two Android tablets that act as the user interface, the IP camera, and the web interface in the router itself. This is simply because having the camera accessible from the internet was something I didn't need for this project, and frankly I don't like the idea of my front door being connected to the internet. Finally, the speaker and mic (seen beneath the camera below) came as a single module. You can get one like it here. If this link is dead, just search for "IP Camera Two Way Audio Speaker".
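To give a rough idea of how small the router-side trigger can be, here is a minimal sketch in Python. It is not the code from this project. It assumes the Arduino appears as a serial device that is already configured, and that the Arduino program pulses the relay when it receives a single byte. The command byte `O` and the function name `send_unlock` are made up for the example.

```python
import io

# Hypothetical one-byte command that the Arduino program would listen for.
UNLOCK_COMMAND = b"O"


def send_unlock(port, command=UNLOCK_COMMAND):
    """Write the unlock command to an already-open binary stream and flush it.

    Returns the number of bytes written.
    """
    written = port.write(command)
    port.flush()
    return written


if __name__ == "__main__":
    # An in-memory stream stands in for the serial port so this runs without hardware.
    fake_port = io.BytesIO()
    send_unlock(fake_port)
    print(fake_port.getvalue())
```

On the router itself, the stream would be the serial device opened in binary mode, for example something like `open("/dev/ttyACM0", "wb", buffering=0)`, where the device path is again an assumption.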
Initially I bought a cheap, motorized, no-name camera from China which was said to conform to the ONVIF standard. Although it did conform for some functions, like sending and receiving video and controlling the position of the camera, the only way I could find to send and receive audio was via the proprietary app. I did decompile the code of this app and look at the traffic in Wireshark, but it appeared that the camera didn't send any data to the app directly. Rather, it would send it to some cloud system first, and then back to the phone. Even if I had been willing to use this app, it could not work in the scenario I wanted to use it in, because it would have no access to the cloud service.
The camera I eventually settled on was the one you see above, a HikVision DS-2CD2145FWD-IS. This camera has fewer features than the no-name one, but I went for it because I figured that if any company would conform to the ONVIF standard it would be HikVision. But things went in a slightly different direction.
Sending and receiving audio and video data
The first thing I tried was to implement the ONVIF audio backchannel specification in Python, but I didn't get any audio out of the camera this way. So I found some IP camera apps on Google's app store, downloaded one that could send and receive audio, set up the camera so I could monitor its network traffic, and looked at the packet capture taken while sending audio in Wireshark. From this I found that a different API was being used: ISAPI ("Integrated Security API", not the more well known "Internet Server Application Programming Interface"). I found a document for this API here, and it turned out to be a proprietary API by HikVision. However, it is a well documented API, and its TwoWayAudio functionality seemed to suit my application much better. This is because I don't need to send audio to the tablets constantly, only when the "talk" button in the app is pressed. In the ONVIF backchannel scenario I would probably have needed to send audio back along with the video stream and turn the volume on the tablet up and down when the button was pressed.
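As a rough sketch of what a push-to-talk session looks like at the HTTP level, the helper below lists the requests in order: open the audio channel, send audio, close the channel. The paths are written from my reading of the ISAPI document and should be checked against the version for your firmware. The channel number `1`, the host address and the function name are assumptions for the example.

```python
def talk_session_requests(host, channel=1):
    """Return the (method, url) pairs for one push-to-talk session, in order.

    The paths are assumed from the ISAPI TwoWayAudio documentation;
    verify them against the document for your camera.
    """
    base = f"http://{host}/ISAPI/System/TwoWayAudio/channels/{channel}"
    return [
        ("PUT", f"{base}/open"),       # start the two-way audio session
        ("PUT", f"{base}/audioData"),  # audio bytes are sent on this request
        ("PUT", f"{base}/close"),      # end the session when "talk" is released
    ]


if __name__ == "__main__":
    # 192.168.1.64 is only a placeholder address.
    for method, url in talk_session_requests("192.168.1.64"):
        print(method, url)
```

Because nothing is sent until the first request is made, audio only flows while the "talk" button is held, which is the behaviour described above.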
One problem I encountered with the audio was that it initially came out very distorted. At first I put this down to hardware issues, and they did account for some of it, but I got a real jump in quality when I changed the camera's settings from G.711ulaw encoding to 16-bit PCM. In order to use the former, I had to use a piece of code that I pulled from a SIP client because Android could not do it natively, and at 8 bits the resolution per sample wasn't big enough. Normally I wouldn't go for PCM because it's a bit of a bandwidth hog, but in this case I was only doing 8000 samples/second, and there isn't going to be anything else on the network anyway.
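To put numbers on the trade-off, the sketch below works out the raw bit rate of both formats at 8000 samples/second and shows the standard G.711 u-law expansion from an 8-bit code to a 16-bit sample. This is the textbook algorithm written in Python for illustration, not the code I pulled from the SIP client. The function names are my own.

```python
SAMPLE_RATE = 8000  # samples per second, as used in this project


def bitrate(bits_per_sample, sample_rate=SAMPLE_RATE):
    """Raw audio bit rate in bits per second for one mono channel."""
    return bits_per_sample * sample_rate


def ulaw_to_pcm16(byte):
    """Expand one G.711 u-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                # u-law bytes are stored bit-inverted
    sign = u & 0x80                 # top bit: 1 means negative
    exponent = (u >> 4) & 0x07      # 3-bit segment number
    mantissa = u & 0x0F             # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    print("u-law bits/s:", bitrate(8))
    print("PCM16 bits/s:", bitrate(16))
    print([ulaw_to_pcm16(b) for b in (0xFF, 0x80, 0x00)])
```

16-bit PCM doubles the raw rate from 64,000 to 128,000 bits per second, which is still tiny on a network that carries nothing else.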
Building the app
The one problem with the TwoWayAudio functionality in ISAPI is that it handles the data transfer in a very simple way, too simple for most standard HTTP libraries. Instead of using something like WebSockets, ISAPI simply leaves the TCP/IP connection of the REST call open, and the audio data is sent or received over that. This breaks the way most HTTP libraries abstract away the details of the raw TCP/IP connection. So I had two options: either write my own HTTP library, including the authentication code, or hack up a pre-existing library to expose access to the raw TCP/IP connection. I went with the latter option, but in hindsight it might have been easier to go with something closer to the first, opening the connection using the HttpURLConnection class and writing my own authentication code. The HTTP library I was using was OkHttp, and that had a lot of layers of abstraction which needed to be replicated. I couldn't just split the code into one part for the audio and one part for the regular requests because of the way the authentication worked.
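The idea of "send the request, then keep using the same connection" is easier to see with a raw socket. The sketch below is in Python for brevity and is not the Android code from the app: it writes an HTTP request head by hand and then keeps writing audio chunks to the same socket. The host, path and headers are placeholders, and authentication is left out. A real request would need an `Authorization` header passed in through `extra_headers`.

```python
import socket


def build_request_head(method, host, path, extra_headers=None):
    """Build the bytes of an HTTP/1.1 request line plus headers."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def stream_audio(sock, head, chunks):
    """Send the request head, then write audio chunks on the same socket.

    Returns the number of audio bytes sent (the head is not counted).
    """
    sock.sendall(head)
    sent = 0
    for chunk in chunks:
        sock.sendall(chunk)
        sent += len(chunk)
    return sent


if __name__ == "__main__":
    # A local socket pair stands in for the tablet-to-camera connection.
    tablet, camera = socket.socketpair()
    head = build_request_head(
        "PUT", "camera.example", "/placeholder/audio/path",
        {"Connection": "keep-alive"},
    )
    silence = b"\x00" * 320  # 160 samples of 16-bit PCM, 20 ms at 8000 Hz
    print(stream_audio(tablet, head, [silence, silence]), "audio bytes sent")
    tablet.close()
    camera.close()
```

This is exactly the part that a library like OkHttp hides: once it has sent the request, it does not expect the caller to keep writing to the underlying socket.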
Apart from that, writing the rest of the app was really straightforward. I had to play around with a few different RTSP libraries for displaying the video to find a lightweight one that introduced very little latency, and I settled on the rtsp-client-android library. Then there was the small issue of making the tablet run just this app and nothing else. I read a lot about Android's "kiosk mode", which sounded promising, but there doesn't seem to be much out there about how to get it to work without adding some kind of MDM (mobile device management) platform to your device. In the end I found it was easiest to just make the app a launcher app, and then to disable the lock screen in Android's settings. This way, whenever the tablet is woken up it goes straight to the intercom app.
The resulting code is all in the git repo provided. Now all I need to do is go and install this thing.