Introduction
Seeing AI is a powerful tool by Microsoft for the visually impaired. It has several features, such as describing what is in front of you, converting handwritten or printed text into readable digital text, recognizing different currencies, describing colors and lighting conditions, and much more. Most of these features have a recognition button on the right side of the screen, just below the center. Its label varies with the channel you select. For example, in the 'short text' channel Talkback will say 'recognizing English,' and in the 'product' channel it will say 'recognizing barcode'. Each page also has a quick help button for your reference. This article gives an overview of the app and then discusses each function.
Overview of the app
First comes the navigation drawer button, which is at the top left corner of the app. Then comes the 'quick help' button, which is at the top right corner of the app. Then there is a recognition or switch camera button, which changes based on the channel. It is located on the right side, below the center of the screen. Then there are nine different channels, all located in a horizontal line below the center of the screen. There is also a 'take picture' button, which appears only for specific channels. Its position is nearly in the center of the screen.
To switch between different buttons, change the Talkback mode to controls by swiping down and up (or vice versa).
Button 1: Navigation Drawer
Talkback will announce it as 'open navigation drawer button'. It has the following options:
- 'Help': Here, you can get a description of each channel with a link to video tutorials.
- 'Feedback': Here, you can send Microsoft suggestions for improvements or report missing features.
- 'Settings': Here, you can manage:
  - Currency settings: Select the currency that Seeing AI should recognize by default.
  - Reorder channels: There are different channels for different purposes; you can reorder them and disable them based on your preference. To reorder a channel, explore the screen to find it, double-tap and hold to select it, then drag it to the desired position. The list of channels is vertical, so move your finger up or down, and do not lift your finger until the channel is in place. To remove any channel, simply double-tap on it.
  - Configure shortcuts: Shortcuts are used to quickly open any channel; you can double-tap and hold the Seeing AI app on the app screen or home screen to view the shortcuts. Use this function to add, remove, or reorder shortcuts. Currently, you can add only four channels to shortcuts. When you open this, you will first see the 'current shortcuts.' To reorder them, use the drag and drop method described earlier. To remove a shortcut, simply double-tap on it. To add one, swipe left to right until you find 'Available shortcuts,' then double-tap on the shortcut you want in the list.
  - Manage Lighting: You can switch it on or off. If on, the app senses the surrounding light and turns on the flash automatically; if off, you need to turn the flash on or off manually.
  - Speech: Here, you can set the text-to-speech engine, select a voice, and adjust the pitch for announcements. These changes apply only to this app and do not affect your Talkback settings.
- 'About': Here, you will see the app version, links to all policies, and a disclaimer, which states:
© 2024 Microsoft. All rights reserved. Seeing AI is not always accurate. It should not be used in situations where you could be harmed or injured. It is not intended for use in the detection, diagnosis, monitoring, management, or treatment of any medical condition or disease. Users should seek any medical advice from a physician. The user is responsible for understanding the functionality of the Seeing AI app, for assessing the conditions in which Seeing AI will be used, and for assessing the risk that use in those circumstances could result in unreasonable harm to the user or others.
Button 2: Quick Help
You can use it to see the description for the currently selected channel; you will also get a video tutorial in the description explaining how to use the channel.
Button 3: Recognize or switch camera
This is a dynamic button whose name changes based on the selected channel. The table below shows the button name for each channel.
| Channel name | Button name |
|---|---|
| Short text | Recognizing English |
| Document | -- |
| Product | Recognizing bar codes |
| Scene (preview) | -- |
| Person | Switch to front camera |
| Currency | Recognizing British pounds |
| Color (preview) | -- |
| Handwriting (preview) | -- |
| Light | -- |
Note: These are the default names; you can change the option by double-tapping the button with Talkback. For instance, in the Short text channel this button changes the language that Seeing AI should recognize, and in the Person channel it reads "switch to front camera" or "switch to back camera". Each channel is discussed in detail later in this article.
Button 4: Take picture
This is also a dynamic button; it appears only in certain channels. It appears in the following channels:
- Document
- Scene (preview)
- Person
- Handwriting (preview)
Channel 1: Short Text
This option initiates immediate text reading within the camera's range and is recommended for small amounts of text. When activated, the app will promptly vocalize any text detected by the camera. If the image becomes clearer, it will re-read the text for enhanced clarity. Use this option for quick access to text such as room numbers, bus signage, shop names, short passages in books, or food packaging labels. It's ideal for instances where you need instant auditory feedback for brief text snippets without the need for extensive processing.
Channel 2: Document
Position the camera over a printed page to capture it. Once the text is recognized, you can utilize Talkback commands to navigate through it effectively.
Seeing AI provides guidance for camera placement until all edges of the document are visible and a photo is taken. Make necessary adjustments until you hear "Hold steady." A helpful technique involves placing the camera in the center of the page and gradually moving it away while making slight adjustments.
This channel performs optimally when there's a high contrast between the page and the background, such as a white document on a dark surface.
After the text is recognized, you can use the 'Add' button to scan additional pages or tap the 'Play' button to have the text read aloud with synchronized word highlighting. Upon tapping the 'Play' button, three additional buttons will appear: 'Skip Back', 'Play/Pause', and 'Skip Forward'. They will disappear when the 'Stop' button is clicked.
Besides having the text read from start to finish, you can ask Seeing AI questions about the document by tapping 'Ask Seeing AI' and typing/dictating your question. Note that answers are AI-generated, so errors are possible. To assist Microsoft in improving accuracy, please provide feedback.
There's also a 'More' option button that allows you to rescan a page, delete the current page, or delete all pages. Additionally, there's a 'Share' button with two options: one to share the image and the other to send the text. The text option is particularly useful because the app converts the scanned text into an HTML file, a format that is well suited to reading with a screen reader.
Note: The app currently cannot scan mathematical equations, so it's recommended for reading printed media with text or studying literature subjects.
For your reference, all buttons on this screen (excluding the 'Add Page' and 'More Options' buttons) are located at the bottom, while these two buttons are positioned at the top right.
Channel 3: Product
Seeing AI can recognize products based on various types of codes printed on the packaging, including barcodes. You can select the type of code to recognize by tapping the button on the main screen. By default, barcodes will be detected, as they are the most common type of product code and may be found on the back or bottom of a container. Some products, including those manufactured by Unilever, feature an accessibility-enhanced QR code with a special border around it, making it easier to detect. These codes are typically found on the front of the packaging.
To scan a product, hold the camera over the item, and Seeing AI will guide you with camera placement until the code is detected. Move the phone over the product until you hear beeps indicating that a code is nearby. Starting farther away and slowly moving the phone closer works best. The faster the beeps, the closer you are to the code. In the case of accessibility-enhanced QR codes, the distance will also be announced.
When a barcode or QR code is detected, Seeing AI will announce the product name. If additional information about the product is available, you can tap 'More Info' to access it.
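The beep guidance described above can be pictured as a simple mapping from distance to the pause between beeps. The sketch below is purely illustrative and is not Seeing AI's actual implementation; the function name, the 50 cm range, and the interval limits are all hypothetical values chosen for the example.

```python
def beep_interval_seconds(
    distance_cm: float,
    max_distance_cm: float = 50.0,   # hypothetical range at which guidance starts
    fastest: float = 0.1,            # hypothetical pause when the code is very close
    slowest: float = 1.0,            # hypothetical pause at the edge of the range
) -> float:
    """Return the pause between beeps: the closer the code, the shorter the pause."""
    # Clamp the distance to the 0..1 range so out-of-range readings stay valid.
    fraction = min(max(distance_cm / max_distance_cm, 0.0), 1.0)
    # Linear interpolation between the fastest and slowest beep rates.
    return fastest + fraction * (slowest - fastest)


if __name__ == "__main__":
    for distance in (40, 25, 10, 2):
        print(f"{distance} cm -> beep every {beep_interval_seconds(distance):.2f} s")
```

Under these assumptions, moving the phone from 40 cm to 10 cm shortens the pause between beeps, which matches the "faster beeps mean closer" behaviour described for this channel.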
Channel 4: Scene (preview)
This channel features the latest Artificial Intelligence for describing an overall scene. This work is still experimental, so please use caution. Take a photo and hear a description of the scene it captured.
To hear a more detailed description of the photo, tap 'More Info'. Image descriptions are AI-generated, so mistakes are possible. To help improve them, please send feedback to Microsoft.
Channel 5: Person
Scan your surroundings to determine the number of people nearby, their proximity, and their facial expressions.
To add a specific person, tap the 'Face Recognition' button on the main screen. Then take three photos of the individual, and a pop-up will prompt you to enter a name for that person. Finally, tap the 'Add' button to save the information.
To edit or delete the data of saved faces, tap the same 'Face Recognition' button on the main screen. Here, you'll find 'Edit' and 'Delete' buttons corresponding to each face. Additionally, you can tap the 'Add' button to include new faces.
When utilizing the Person channel, Face Recognition can identify individuals nearby. Instead of hearing generic descriptions like "One face near center, 4 feet away," you'll hear their name announced, for example, "Mohit near center, 4 feet away".
It's advisable to seek permission from individuals before training Seeing AI to recognize them. Once you've taught the app to recognize a specific person, their name will be announced when they appear in view. To get an estimate of a person's facial characteristics and expressions, tap the 'take picture' button on the main screen; Seeing AI describes them after the photo is taken. If you want to take a selfie, use the button on the main screen to change to the front-facing camera.
Channel 6: Currency (preview)
This option is used to recognize different currencies. Currency recognition is always improving, so please have someone you trust confirm a note's value. Hold the camera over a single note to hear the estimated value. Use the button on the main screen to select which currency should be recognized. Send feedback to Microsoft if you wish to recognize a currency that isn't listed; more will be added over time.
Please note: Seeing AI will not differentiate between real and counterfeit currency.
Channel 7: Color (preview)
Use this channel to hear the perceived color of objects.
Note that this may depend on several factors - colors appear darker when there is less light, or if the object is in shadow; a white surface may appear slightly yellowish when the lights are on.
Channel 8: Handwriting (preview)
This experimental channel allows you to recognize handwritten text. Note that this channel requires the text to be the right way up. Recognition accuracy will vary based on handwriting style, which can vary greatly from individual to individual. Please send feedback to Microsoft to improve it.
Channel 9: Light
Use this channel to detect the amount of light around you. The pitch of the tone is based on how much light your phone sees. The more light there is, the higher the pitch of the tone.
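One way to picture this behaviour is as a mapping from a brightness reading to a tone frequency. The sketch below is only an illustration under assumed values, not how Seeing AI actually computes its tone; the function name, the 0 to 1 brightness scale, and the 200 to 2000 Hz range are hypothetical.

```python
def light_to_pitch_hz(
    brightness: float,
    low_hz: float = 200.0,    # hypothetical pitch in complete darkness
    high_hz: float = 2000.0,  # hypothetical pitch in very bright light
) -> float:
    """Map a brightness level between 0 and 1 to a tone frequency in hertz."""
    # Clamp the reading so values outside 0..1 still give a valid pitch.
    level = min(max(brightness, 0.0), 1.0)
    # More light gives a proportionally higher pitch.
    return low_hz + level * (high_hz - low_hz)


if __name__ == "__main__":
    for level in (0.0, 0.25, 0.5, 1.0):
        print(f"brightness {level:.2f} -> {light_to_pitch_hz(level):.0f} Hz")
```

With these assumed limits, a dark room would give a low tone and pointing the phone at a window or lamp would raise it, which is how the channel can be used to find light sources.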
Using Seeing AI With Other Apps
Seeing AI can also recognize and describe photos from other apps like Mail, Photos, and Twitter. Simply share the photo and select "Recognize with Seeing AI" or "Seeing AI" from the list of actions. If you cannot find this option, tap the "more apps" button, and you should find the Seeing AI app in the list.
Once the processing is completed, you will see the description of the image, such as “screenshot of phone”. The text inside the image will be displayed below the image; explore the screen to find it. Here, you will find three buttons: ‘more info’, ‘share’, and ‘explore photo’.
- The 'More Info' button provides a description of the photo using AI. For example, if you share a screenshot taken from a phone, the description might be: “The image is a screenshot of a phone's display. The time displayed on the screen is 21:08 M. The phone's status bar indicates that it is connected to a VOD LTE+ network with an LTE2.Il signal and has an 11% battery charge remaining. The phone's home screen displays several app icons, including 'GOM', 'Jio', 'MyJio', 'GPay', 'Play Store', and 'Google'”.
- With the 'Explore Photo' button, you can examine the photo to understand which text was written in which part of the image. Explore the elements in the photo by moving one finger over the screen. You'll hear the names of objects spoken aloud as your finger passes over them.
Please note that object detection isn't always accurate. To help Microsoft improve it, please send your feedback. Seeing AI also recognizes text in the image. If you find there is too much text, you can switch to exploring only objects and people.
Tips & Tricks
- To navigate through this app, utilize different Talkback modes. For instance, if you wish to switch between different buttons, change the Talkback mode to controls by swiping down and up, or vice versa. Once you've selected the desired mode, swipe down to move to the next button and swipe up to go to the previous button. You can employ various modes like 'letters' to enhance your spelling, 'paragraph' to read paragraph by paragraph, and more modes to expedite your tasks.
- The larger the object, the further away you will need to be. The closer you are, the higher the quality.
- The camera is located on the back of the phone, on the top-right when the screen is towards you. For best results, keep the camera in the center of the object being captured.
- Seeing AI works best in a well-lit environment. It will automatically turn on the flash if needed.
Note that while the camera is running, it is using battery. Seeing AI will try to save battery when it detects inactivity, but it is best to lock your phone if you won't be using the app for an extended period.