Disclaimer: The thoughts and opinions expressed in this blog are my own, and not representative of Swinburne University. I also reserve the right to change my opinion without updating this blog, so consider it a "snapshot" of my thoughts at a given epoch.
I've often pondered the pedagogical value of Talking Head Cameras (THCs) in automated recording systems. I don't pretend to be an expert in the area - my skills lie more in the technology implementation domain. However, I get the feeling that studies and opinions on the subject often lack detail about how the THC is implemented. I think the way THCs are implemented has a large impact on the success of the recording, and warrants ongoing consideration. I believe the issue can be broken down into two entirely separate categories: Capture & Delivery.
Capture:
Automated recording systems which support THC will often be marketed in the best possible light. This is normal/expected business practice for any company wanting to sell a product. Often the THC footage is provided by a camera which is human-operated, producing clear medium close shots. This is sufficiently close for the viewer to discern facial expression, which can enhance the communication of the presenter. The presenter is free to walk around, since the operator can pan the camera to track the presenter.
Scalable capture solutions cannot provide such luxury, because of both the ongoing cost of camera operators and the need for enough operators to be available at the same time. A fixed camera is the alternative. This raises the question of what area the camera should be framed on. If the shot is too close it is easy for dynamic presenters to step out of the picture entirely, and such a shot does not cope well with presenters of varying height. If the shot is too wide then facial expression is lost, resulting in potentially useless footage.
An additional complexity for implementing THCs is the architectural/geometrical restrictions which we are faced with. Only our premium venues have fixed furniture (for "nostril cam") between the presenter and the audience. However, this does not cater for dynamic presenters who like to get out "from behind the desk" and engage with their audience.
Capture "solutions":
It seems clear to me that auto-tracking cameras have a promising future in this arena. While current and previous implementations have had varying degrees of success, the notion ought not to be discounted entirely because of past failures. Take for example the Axis Communications range of cameras. They have made public an SDK for 3rd party vendors to enhance the behaviour of the camera. If there is a sufficient market, they may get directly involved with a customer to create a custom solution. It is interesting to note that they deliberately over-spec the inbuilt CPU to allow additional processing power for such 3rd party enhancements. I've seen various applications where the camera is making decisions based on activity within a "hot spot" region of the image. E.g. People-counting in a popular thoroughfare; Auto-zoom when activity occurs in a specified image region, etc. Regardless of the success of potential tracking enhancements, at a base level the PTZ cameras can be controlled by existing external tracking solutions. E.g. Pressure mats recalling camera presets, using hysteresis to control slew.
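To make the pressure-mat idea a little more concrete, here is a minimal sketch of how preset recalls might be damped so the camera does not slew every time a presenter briefly steps across a mat. It uses a simple dwell timer as a stand-in for hysteresis, which is my own simplification. The class name, the 1.5 second hold time and the idea that each mat maps to one preset number are all assumptions for illustration; nothing here talks to a real camera or to any vendor API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresetSelector:
    """Pick a camera preset from pressure-mat readings, ignoring brief steps."""

    hold_seconds: float = 1.5           # assumed dwell time before the camera moves
    current: Optional[int] = None       # preset the camera is currently framed on
    _candidate: Optional[int] = None    # mat the presenter may be moving to
    _candidate_since: float = 0.0       # when the presenter first stood on it

    def update(self, mat: Optional[int], now: float) -> Optional[int]:
        """Feed one reading; return a preset to recall, or None for no move."""
        if mat is None or mat == self.current:
            # Off all mats, or back on the current one: cancel any pending move.
            self._candidate = None
            return None
        if mat != self._candidate:
            # First reading on a different mat: start the dwell timer.
            self._candidate = mat
            self._candidate_since = now
            return None
        if now - self._candidate_since >= self.hold_seconds:
            # Presenter has stayed put long enough: commit to the new preset.
            self.current = mat
            self._candidate = None
            return mat
        return None
```

In use, a control loop would call update() with the active mat number and a timestamp, and only send a preset recall to the camera when the return value is not None. A presenter who wanders onto a neighbouring mat and straight back again would cause no camera movement at all.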
A further benefit from using such cameras comes to mind. The EchoSystem capture hardware saves the frame-grabbed data as .h264 files. The current line of Axis cameras can produce an .mp4 file using the H.264 codec... See where I'm going with this? Imagine a scenario/solution where an Axis camera is paired to EchoSystem capture hardware. The heavy lifting of H.264 encoding is done by the camera, so the capture hardware would simply need to write the stream to disk. Current Axis cameras can even buffer the video on an onboard SD card until the receiver is able to digest the video. Both devices are time-synced rather accurately, which ought to assist with potential lip-sync issues.
If such a solution were to become available I believe it would be liberating for those of us whose task is hardware implementation.
Delivery:
Coming from a Lectopia background, our THC video has been composited side-by-side with screen capture footage. For recordings longer than 5 minutes this can become rather numbing. In download formats designed for portable players the THC image is scaled down to such a small size that its value becomes questionable.
The EchoSystem player provides more flexibility than this, with the viewer able to select between the multiple video streams. However, this doesn't translate well to download formats. Downloads currently make up over 50% of our hits. Interestingly, most of our hits are on-campus, where access to the content is fast and free (no ISP charges).
Delivery "solution":
Given that EchoSystem is able to provide thumbnails for a video (which I believe is also leveraged for OCR indexing), perhaps these same points in time could be used for an auto-edit mechanism. The point-in-time provides a reference for "interesting" content in the screen capture stream. After XX seconds of no changes, revert to the THC stream. "Smart-splicing" (not yet trade-marked!) the two streams together into one stream would provide an optimised experience for the viewer who chooses the download format.
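As a rough illustration of the smart-splicing idea, the sketch below turns a list of screen-change timestamps into an edit list that alternates between the two streams. It assumes the change times are already available as seconds from the start of the recording; how they would actually be obtained from EchoSystem is not something I know. The function name, the stream labels "screen" and "thc", and the 20 second idle value (standing in for the "XX seconds" above) are all placeholders.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start, end, stream label)


def smart_splice(change_times: List[float], duration: float,
                 idle_after: float = 20.0) -> List[Segment]:
    """Show the screen after each change; fall back to the THC once it goes idle."""
    segments: List[Segment] = []

    def add(start: float, end: float, stream: str) -> None:
        if end <= start:
            return  # nothing to show
        if segments and segments[-1][2] == stream and segments[-1][1] == start:
            # Extend the previous segment rather than cutting to the same stream.
            segments[-1] = (segments[-1][0], end, stream)
        else:
            segments.append((start, end, stream))

    cursor = 0.0  # everything before this point has been assigned a stream
    for t in sorted(change_times):
        if t >= duration:
            break
        if t > cursor:
            add(cursor, t, "thc")           # screen was idle, so show the presenter
        end = min(t + idle_after, duration)
        add(max(t, cursor), end, "screen")  # screen just changed, so show it
        cursor = max(cursor, end)
    add(cursor, duration, "thc")            # tail of the recording
    return segments
```

For example, smart_splice([0.0, 10.0, 100.0], 200.0) gives screen for 0 to 30 seconds, THC for 30 to 100, screen for 100 to 120 and THC for the remainder. The resulting list could then be handed to whatever tool does the actual cutting, or used as a storyboard for a human editor in a pilot.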
Conclusion:
If anyone is involved with pedagogical studies on the value of THCs, it would be useful to consider/compare the above variations. Static wide shots vs tracking close shots. Side-by-side vs spliced presentation. Both variations (and combinations of both) could be piloted on a small scale using human-operated cameras and human-based editing. Such findings could justify further development to automate these solutions and keep such features scalable.
1 Comments On This Entry
Having a "connector" on the server to allow ingest of IP video streams would be a really great feature.
paulb,
22 June 2011 - 06:52 PM