|Version 13 (modified by piers, 7 years ago)|
Information on Development and Operation of VIC
Vic's origins and operation are described in Steve McCanne and Van Jacobson's original paper, "vic: A Flexible Framework for Packet Video", from ACM Multimedia 1995.
NEW: Working with VIC
Mark Petrovic provided VIC with the very useful DirectShow capture interface, grabber-win32DS.cpp, last year. He recently posted his experiences and some very useful details on his blog: "UCL vic: Tcl, C++, and DirectShow".
Below are some notes on VIC's operation by P. O'Hanlon (there may be some inaccuracies):
Vic uses a combination of C++ and Tcl. It implements a number of Tcl procedures in C++, which are then called by the Tcl scripts; the scripts create the various objects, connect them together and provide the user interface. The whole program is single-threaded and runs around an event-handling loop, Tk_MainLoop() (vic/main.cpp:758), which services timers and file-descriptor events (e.g. packet arrivals) and calls the appropriate event handlers instantiated by vic.
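To illustrate the single-threaded model, here is a minimal sketch of what such a loop does: one thread dispatches "descriptor ready" events and due timers to registered handlers. The EventLoop class and its methods are hypothetical, not vic's or Tk's API. Time is simulated in milliseconds so the sketch is deterministic; a real loop (Tk_MainLoop() included) blocks in the OS until a descriptor is readable or a timer expires.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <vector>

// Hypothetical single-threaded event loop: timers plus file-descriptor handlers.
class EventLoop {
public:
    using Handler = std::function<void()>;

    // Schedule a one-shot timer that fires at simulated time at_ms.
    void schedule(uint64_t at_ms, Handler h) {
        timers_.push({at_ms, seq_++, std::move(h)});
    }
    // Register the handler to call when descriptor fd becomes readable.
    void watch(int fd, Handler h) {
        assert(fd >= 0);
        io_[fd] = std::move(h);
    }
    // Simulate the OS reporting fd as readable (e.g. a packet has arrived).
    void signal(int fd) { ready_.push_back(fd); }

    // One iteration: dispatch ready descriptors, then every timer now due.
    void run_once(uint64_t now_ms) {
        std::vector<int> ready;
        ready.swap(ready_);
        for (int fd : ready) {
            auto it = io_.find(fd);
            if (it != io_.end()) it->second();
        }
        while (!timers_.empty() && timers_.top().at <= now_ms) {
            Timer t = timers_.top();
            timers_.pop();
            t.fn();  // a handler may schedule further timers
        }
    }

private:
    struct Timer {
        uint64_t at;   // deadline in simulated milliseconds
        uint64_t seq;  // tie-breaker: equal deadlines fire in FIFO order
        Handler fn;
    };
    struct Later {  // makes the priority_queue a min-heap on (at, seq)
        bool operator()(const Timer& a, const Timer& b) const {
            return a.at != b.at ? a.at > b.at : a.seq > b.seq;
        }
    };
    std::priority_queue<Timer, std::vector<Timer>, Later> timers_;
    std::map<int, Handler> io_;
    std::vector<int> ready_;
    uint64_t seq_ = 0;
};
```

Because every handler runs to completion on the same thread, handlers never need locks. The flip side is that a slow handler (a long encode, for instance) delays everything else in the loop.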
Video grabbing/capture and sending
Basically, the encoding process starts when the user clicks the transmit button, which calls start() on the Grabber class. The grabber is subclassed into a number of classes, each specific to the video format required by the selected codec. Each codec takes its frame format (FT_...) as a constructor argument. When frame-format is called on the encoder object, the function Module::fttoa() maps this to the frame-format used by the grabber (the plumbing for transmission is done in ui-ctrlmenu.tcl). E.g. in the case of JPEG the format is "422" (for H.261/H.263 it is CIF/411), which on Windows selects vic/video/grabber-win32.cpp:851 void Vfw422Grabber::start(). These format-specific grabber classes convert the natively grabbed video to the input format the codec expects. Vfw422Grabber::start() then calls the general start method of the particular grabber (in the Win32 case, vic/video/grabber-win32.cpp:584 VfwGrabber::start()), which starts a timer. The Windows case is slightly more complex: the Windows capture code delivers frames through callbacks that can only be attached to window event handlers, so an invisible 'capturewindow' is created and the grabber callback (vic/video/grabber-win32.cpp:973 VfwGrabber::VideoHandler()) is attached to it. VideoHandler is then called whenever the capture device has a frame for vic. It passes the frame to VfwGrabber::grab(), which does the conversion and finally calls the consume function, the entry point to the encoder.
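The grab-convert-consume hand-off can be sketched as follows. The sketch assumes, purely for illustration, that the capture device delivers packed YUYV (Y0 U Y1 V) data and that the codec wants planar 4:2:2. The VideoFrame, Consumer and Yuyv422Grabber types are hypothetical simplifications, not vic's classes. The frame is passed synchronously to the target's consume(), as the text describes for VfwGrabber::grab().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical planar YUV frame (vic's VideoFrame classes differ).
struct VideoFrame {
    int width, height;
    std::vector<uint8_t> y, u, v;
};

// Anything that accepts frames; in vic this role is played by the encoders.
struct Consumer {
    virtual ~Consumer() = default;
    virtual int consume(const VideoFrame* vf) = 0;
};

// Hypothetical grabber: converts packed YUYV to planar 4:2:2 and hands the
// result straight to the encoder's consume().
class Yuyv422Grabber {
public:
    explicit Yuyv422Grabber(Consumer* target) : target_(target) {}

    // Called once per captured frame (cf. VideoHandler() -> grab() in vic).
    int grab(const uint8_t* yuyv, int w, int h) {
        assert(w % 2 == 0);  // YUYV packs two pixels per 4 bytes
        VideoFrame f{w, h, {}, {}, {}};
        f.y.reserve(size_t(w) * h);
        f.u.reserve(size_t(w / 2) * h);
        f.v.reserve(size_t(w / 2) * h);
        for (int i = 0; i < w * h / 2; ++i) {
            const uint8_t* p = yuyv + 4 * i;  // layout: Y0 U Y1 V
            f.y.push_back(p[0]);
            f.u.push_back(p[1]);
            f.y.push_back(p[2]);
            f.v.push_back(p[3]);
        }
        return target_->consume(&f);  // entry point to the encoder
    }

private:
    Consumer* target_;
};
```

Converting in the grabber keeps each encoder simple. Every encoder sees exactly one input layout, whatever the capture hardware produces.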
For more information on Windows video capture, see: http://msdn.microsoft.com/library/psdk/multimed/avicap_3bfs.htm
The encoder classes are subclassed from the TransmitterModule class (vic/module.h). Each encoder is a subclass of TransmitterModule (see for example vic/codec/encoder-jpeg.cpp) that implements the method int encode(const VideoFrame*), which encodes a frame and returns the size in bytes of the resulting encoded frame. The encode method may involve various further encoding stages (e.g. encode_blk() in JPEG). Every encoder also provides the method int consume(const VideoFrame*), which is the entry point to the encoder and usually calls encode(). The VideoFrame passed in is generally an instance of a subclass that contains the raw (converted) image data from the grabber, in whatever format the grabber produced it; in the JPEG case it is JpegFrame (see vic/module.h). The encoder is called by the grabber as described above. The grabber sets the video frame size in the VideoFrame object, and the encoder checks the size when its consume method is called.
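The consume()/encode() split can be sketched like this. The Encoder base is hypothetical and only loosely modelled on TransmitterModule. Its consume() checks the frame size, reconfigures on a change, and then calls the subclass's encode(), which returns the encoded size in bytes. The "codec" is a toy that emits one average byte per 8x8 luma block. Its per-block helper stands in for a stage such as encode_blk() and is not JPEG.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical raw frame as delivered by a grabber.
struct VideoFrame {
    int width, height;
    const uint8_t* data;  // luma plane, width * height bytes
};

// Hypothetical encoder base, loosely modelled on vic's TransmitterModule.
class Encoder {
public:
    virtual ~Encoder() = default;
    // Entry point called by the grabber: check the size, then encode.
    int consume(const VideoFrame* vf) {
        if (vf->width != width_ || vf->height != height_)
            size(vf->width, vf->height);  // reconfigure on a size change
        return encode(vf);
    }

protected:
    virtual void size(int w, int h) { width_ = w; height_ = h; }
    // Encode one frame; return the encoded size in bytes.
    virtual int encode(const VideoFrame* vf) = 0;
    int width_ = 0, height_ = 0;
};

// Toy codec: one output byte (the block mean) per complete 8x8 block.
class BlockAverageEncoder : public Encoder {
public:
    std::vector<uint8_t> out;

protected:
    int encode(const VideoFrame* vf) override {
        out.clear();
        for (int by = 0; by + 8 <= height_; by += 8)
            for (int bx = 0; bx + 8 <= width_; bx += 8)
                out.push_back(encode_blk(vf->data, bx, by));
        return int(out.size());
    }

private:
    // Per-block stage (illustrative only).
    uint8_t encode_blk(const uint8_t* p, int bx, int by) const {
        unsigned sum = 0;
        for (int y = 0; y < 8; ++y)
            for (int x = 0; x < 8; ++x)
                sum += p[(by + y) * width_ + bx + x];
        return uint8_t(sum / 64);
    }
};
```

Keeping the size check in the shared consume() means each codec only implements encode() and, if needed, its own size() handling.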
The encoder builds the resulting packet in a pktbuf (pktbuf* pb). The pktbuf constructor initialises the RTP header, and the encoded codec data is put in the rest of the packet. At the end of the encode method, flush() is called. This calls the send method SessionManager::send (SessionManager is a subclass of Transmitter), implemented as void Transmitter::send(pktbuf* pb) in rtp/transmitter.cpp:199. It starts another timer, which schedules the packet output according to the bitrate and frame rate chosen in the GUI. The timeout method, void SessionManager::timeout() (implemented in vic/rtp/transmitter.cpp), then sends packets from a buffer by calling SessionManager::output(pktbuf* pb) (also implemented in vic/rtp/transmitter.cpp). This calls the loopback method, which loops the packet back into the decoder section of vic for local display, and then calls SessionManager::transmit() (implemented in vic/rtp/session.cpp), which performs the actual network-level send and can do layered transmission if required.
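Two pieces of this path can be made concrete. The first is the 12-byte fixed RTP header defined in RFC 3550 (version 2, no padding, no extension, no CSRCs), which is the kind of header a pktbuf starts with. The second is a simple pacing rule that spaces packets so the output averages a chosen bitrate. Both functions are hypothetical stand-alone helpers, not vic code. In particular, the pacing formula is an assumption and not necessarily vic's exact algorithm.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Fixed RTP header from RFC 3550, fields in network byte order.
std::array<uint8_t, 12> make_rtp_header(bool marker, uint8_t pt, uint16_t seq,
                                        uint32_t ts, uint32_t ssrc) {
    assert(pt < 128);  // payload type is a 7-bit field
    std::array<uint8_t, 12> h{};
    h[0] = 2 << 6;  // V=2, P=0, X=0, CC=0
    h[1] = uint8_t((marker ? 0x80 : 0) | pt);  // M bit + payload type
    h[2] = uint8_t(seq >> 8);   h[3] = uint8_t(seq);
    h[4] = uint8_t(ts >> 24);   h[5] = uint8_t(ts >> 16);
    h[6] = uint8_t(ts >> 8);    h[7] = uint8_t(ts);
    h[8] = uint8_t(ssrc >> 24); h[9] = uint8_t(ssrc >> 16);
    h[10] = uint8_t(ssrc >> 8); h[11] = uint8_t(ssrc);
    return h;
}

// Pacing: delay before the next send so the stream averages `bps` bits/s.
double send_interval_ms(int packet_bytes, double bps) {
    return packet_bytes * 8.0 * 1000.0 / bps;
}
```

For example, a 1000-byte packet at 128 kb/s should be followed by a gap of 62.5 ms. Payload type 26 is the static RTP/AVP assignment for JPEG (RFC 3551).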
There is also framer-jpeg.cpp, a helper class for formats that use hardware-assisted coding. Additionally, there is a compositor.cpp class which in principle (I haven't used it) allows an externally supplied image to be composited onto the video as a simple watermark.
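Watermark compositing can be sketched as blending a small overlay image into the frame's luma plane at a fixed opacity, with the overlay clipped at the frame edges. The composite_luma function and its parameters are hypothetical. The sketch illustrates the general idea only and has not been checked against what vic's compositor.cpp actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Blend `logo` (lw x lh luma) into `frame` (fw x fh luma) at (x0, y0)
// with opacity alpha/256; overlay pixels outside the frame are skipped.
void composite_luma(std::vector<uint8_t>& frame, int fw, int fh,
                    const std::vector<uint8_t>& logo, int lw, int lh,
                    int x0, int y0, unsigned alpha) {
    assert(alpha <= 256);
    for (int y = 0; y < lh; ++y) {
        int fy = y0 + y;
        if (fy < 0 || fy >= fh) continue;      // clip vertically
        for (int x = 0; x < lw; ++x) {
            int fx = x0 + x;
            if (fx < 0 || fx >= fw) continue;  // clip horizontally
            uint8_t& dst = frame[size_t(fy) * fw + fx];
            unsigned src = logo[size_t(y) * lw + x];
            dst = uint8_t((src * alpha + dst * (256 - alpha)) / 256);
        }
    }
}
```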
Reception and Decoding of video packets
The decoder section of vic comes into operation when the SessionManager object is created and initialised in Tcl by net_open_ip (or its IPv6 counterpart) in vic/tcl/cf_network.tcl. This creates two network objects per layer (for JPEG there is only one layer): one for RTP packets, the other for RTCP packets. The RTP (data) packets are handled by DataHandler (vic/rtp/session.h), which links the receive file descriptor of the RTP network object into the Tcl main loop. This is what allows the DataHandler::dispatch() method to be called (it is subclassed, via the Transmitter class, from the IOHandler class in vic/iohandler.h). When a packet arrives, DataHandler::dispatch(int) (rtp/session.cpp line 118) is called, which in turn calls SessionManager::recv(DataHandler* dh) on the session manager. This calls SessionManager::demux, which demultiplexes the RTP packets based on their RTP SRCID: a separate source object is created for each stream and indexed by its SRCID through a hash table. The corresponding source object is looked up and the video is decoded according to its payload type.
If no source object exists (i.e. this is the first packet of a new stream), a new source object is created after two packets have arrived. It is then activated by calling Source::activate(int format) (rtp/source.cpp line 263), which calls the Tcl procedure activate (tcl/ui-main.tcl line 626: "proc activate src"). This creates a new decoder based on the RTP payload type (e.g. JPEG). After a short timeout, another Tcl procedure (tcl/ui-main.tcl line 646: really_activate) is called. It creates the video thumbnail window (in the build.src procedure) and then attaches the previously created decoder to the source object. The video from that source is then displayed in the thumbnail window, which can be clicked on to see full-motion video from that source.
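The demultiplexing step can be sketched as a hash table keyed by the 32-bit SRCID (the SSRC field at bytes 8 to 11 of the RTP header). Following the description, a source is only created and activated when the second packet from an unknown SRCID arrives, and the first packet is simply dropped. The Source and Demuxer types are hypothetical simplifications. The activate() helper merely marks the source active, standing in for Source::activate() and the Tcl activate procedure that picks a decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-stream state (vic's Source holds much more).
struct Source {
    uint32_t srcid = 0;
    int format = -1;  // RTP payload type, used to choose a decoder
    bool active = false;
};

class Demuxer {
public:
    // Returns the packet's source, or nullptr if the packet is not usable RTP
    // or is the first packet from an unknown SRCID.
    Source* demux(const uint8_t* pkt, size_t len) {
        if (len < 12 || (pkt[0] >> 6) != 2) return nullptr;  // RTP v2 header
        uint32_t srcid = uint32_t(pkt[8]) << 24 | uint32_t(pkt[9]) << 16 |
                         uint32_t(pkt[10]) << 8 | pkt[11];
        auto it = sources_.find(srcid);
        if (it != sources_.end()) return &it->second;  // known stream
        if (++pending_[srcid] < 2) return nullptr;      // wait for a 2nd packet
        pending_.erase(srcid);
        Source& s = sources_[srcid];  // node-based: pointer stays valid
        s.srcid = srcid;
        s.format = pkt[1] & 0x7f;     // payload type
        activate(s);
        return &s;
    }
    size_t size() const { return sources_.size(); }

private:
    void activate(Source& s) { s.active = true; }  // stand-in for vic's path
    std::unordered_map<uint32_t, Source> sources_;
    std::unordered_map<uint32_t, int> pending_;    // packets seen per new SRCID
};
```

Waiting for a second packet gives some protection against creating a source (and UI thumbnail) for a single stray packet.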
The decoder (e.g. vic/codec/decoder-jpeg.cpp) implements MotionJpegDecoder::recv(pktbuf* pb), which is passed the packets received via the demux operation above. The packets are read in along with their RTP header, and the JPEG data is passed to the decoding part of the codec. The codec decodes it and finally calls BlockRenderer::consume(const VideoFrame* vf), which actually displays the video. The renderer may be called less often, and the video output may be resized, depending on where it is being displayed: in the thumbnail (small, at a low frame rate) or in a video window (larger, decoded at the full rate).
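The rate and size reduction for thumbnails can be sketched as a renderer that only draws every `skip`-th frame and shrinks it by an integer `scale` using nearest-neighbour subsampling. A full-size window would use skip = 1 and scale = 1. ThumbnailRenderer and its parameters are hypothetical. vic's BlockRenderer classes work differently in detail, for example by rendering decoded blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical renderer: drops frames to lower the rate, subsamples to shrink.
class ThumbnailRenderer {
public:
    ThumbnailRenderer(int skip, int scale) : skip_(skip), scale_(scale) {
        assert(skip >= 1 && scale >= 1);
    }

    // Returns true if this frame was rendered into `image`.
    bool consume(const std::vector<uint8_t>& luma, int w, int h) {
        if (count_++ % skip_ != 0) return false;  // skipped to reduce rate
        out_w = w / scale_;
        out_h = h / scale_;
        image.assign(size_t(out_w) * out_h, 0);
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x)  // nearest-neighbour sample
                image[size_t(y) * out_w + x] =
                    luma[size_t(y * scale_) * w + x * scale_];
        return true;
    }

    std::vector<uint8_t> image;
    int out_w = 0, out_h = 0;

private:
    int skip_, scale_;
    long count_ = 0;
};
```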
Basically, each Tcl procedure is implemented in C++, and its command() method implements the command options (e.g. "grabber decimate 2" sets the decimate variable of the grabber, which is implemented in C++, to 2). At the moment only the "q" command is implemented, which allows the user to alter the Q factor of the video.
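The command() pattern can be sketched as a method that receives the Tcl arguments as strings and dispatches on the option name. This is a simplification: the Grabber and Encoder classes and the Result type are hypothetical, and vic's real objects receive their arguments through its own Tcl glue code. Unknown options return an error here, whereas a fuller version might defer to a base class.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

enum class Result { Ok, Error };

// Hypothetical grabber handling e.g. "grabber decimate 2".
class Grabber {
public:
    Result command(const std::vector<std::string>& argv) {
        if (argv.size() == 2 && argv[0] == "decimate") {
            int d = std::atoi(argv[1].c_str());
            if (d != 1 && d != 2 && d != 4) return Result::Error;  // 3 sizes only
            decimate_ = d;
            return Result::Ok;
        }
        return Result::Error;  // unknown option
    }
    int decimate() const { return decimate_; }

private:
    int decimate_ = 2;  // "normal" size by default
};

// Hypothetical encoder handling e.g. "encoder q 30".
class Encoder {
public:
    Result command(const std::vector<std::string>& argv) {
        if (argv.size() == 2 && argv[0] == "q") {
            q_ = std::atoi(argv[1].c_str());  // new Q factor
            return Result::Ok;
        }
        return Result::Error;
    }
    int q() const { return q_; }

private:
    int q_ = 0;
};
```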
The image size is selectable from the user interface. Depending on the type of video input (NTSC [640x480] or PAL [768x576]), three sizes are available (though not all codecs support all three):
- "normal" (default): -command "grabber decimate 2" - half the base resolution in each dimension.
- "small": -command "grabber decimate 4" - a quarter of the base resolution in each dimension.
- "large": -command "grabber decimate 1" - the base resolution.
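The size arithmetic behind the three settings can be sketched as follows. It assumes each dimension of the base resolution is divided by the decimate factor, with bases of 640x480 (NTSC) and 768x576 (the usual square-pixel PAL frame). The names are hypothetical, and individual grabbers may round or crop to the sizes a codec requires (e.g. CIF).

```cpp
#include <cassert>

struct Size { int w, h; };

constexpr Size NTSC_BASE{640, 480};
constexpr Size PAL_BASE{768, 576};

// Output size for "grabber decimate N": each dimension divided by N.
Size decimated(Size base, int decimate) {
    assert(decimate == 1 || decimate == 2 || decimate == 4);
    return {base.w / decimate, base.h / decimate};
}
```

With these assumptions, "normal" NTSC video would be 320x240 and "small" PAL video 192x144.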