There’s no arguing that AI still has quite a few unreliable moments, but one would hope that at least its evaluations would be accurate. However, last week Google allegedly instructed contract workers evaluating Gemini not to skip any prompts, regardless of their expertise, TechCrunch reports based on internal guidance it viewed. Google shared a preview of Gemini 2.0 earlier this month.
Google reportedly instructed GlobalLogic, an outsourcing firm whose contractors evaluate AI-generated output, not to have reviewers skip prompts outside of their expertise. Previously, contractors could choose to skip any prompt that fell far out of their expertise — such as asking a doctor about laws. The guidelines had stated, “If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task.”
Now, contractors have allegedly been instructed, “You should not skip prompts that require specialized domain knowledge” and that they should “rate the parts of the prompt you understand” while adding a note that it’s not an area they have knowledge in. Apparently, the only times contractors can skip now are if a big chunk of the information is missing or if the prompt contains harmful content that requires specific consent forms to evaluate.
One contractor aptly responded to the changes, stating, “I thought the point of skipping was to increase accuracy by giving it to someone better?”
Shortly after this article was first published, Google provided Engadget with the following statement: “Raters perform a wide range of tasks across many different Google products and platforms. They provide valuable feedback on more than just the content of the answers, but also on the style, format, and other factors. The ratings they provide do not directly impact our algorithms, but when taken in aggregate, are a helpful data point to help us measure how well our systems are working.”
A Google spokesperson also noted that the new language shouldn’t necessarily lead to changes to Gemini’s accuracy, because the company is asking raters to specifically rate the parts of the prompts that they understand. This could mean providing feedback on things like formatting issues even if the rater doesn’t have specific expertise in the subject. The company also pointed to this week’s release of the FACTS Grounding benchmark, which checks whether LLMs produce responses “that are not only factually accurate with respect to given inputs, but also sufficiently detailed to provide satisfactory answers to user queries.”
Update, December 19 2024, 11:23AM ET: This story has been updated with a statement from Google and more details about how its ratings system works.
