Archive for August 14, 2023

Monday, August 14, 2023

Making an IPv6 URLRequest

Casey Liss:

I’m trying to make a URL GET request to a service I’m discovering via Bonjour.

I have gotten a NWBrowser.Result, and I’ve gotten an NWEndpoint.

The endpoint is an IPv6 link local address.

How the hell do I make a URLRequest to this? I don’t seem to be able to construct a URL from what I’ve got, but I suspect I’m holding it wrong.

[…]

Wait, it seems the presence of “%en0” at the end may be the problem?

Greg Thompson:

In a browser you would enter an IPV6 address like this: https://[XXXXXIPV6ADDRESS]/index.html

Andreas Hartl:

my reading of RFC 6874 is that you must percent-escape the %: http://[<IPv6address>%25<zoneID>]

Jira Burnout Chart:

TIL about encoding a desired network interface as part of the host name into a URL
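
The percent-escaping rule from RFC 6874 is only a string transformation, so it can be sketched outside of any particular networking API. The Java snippet below is an illustration, not code from the thread: the `ZoneUrl` class, the `httpUrl` helper, and the sample address, port, and path are all hypothetical. It assumes the zone ID (such as `en0`) contains only characters that need no further escaping. Whether a given HTTP client then accepts the resulting URL is a separate question that the thread does not settle.

```java
final class ZoneUrl {
    // Hypothetical helper: wraps an IPv6 literal in brackets and rewrites the
    // zone delimiter "%" as "%25", following the RFC 6874 form quoted above.
    // Assumes the zone ID itself needs no additional escaping.
    static String httpUrl(String address, int port, String path) {
        int zone = address.indexOf('%');
        String host = zone < 0
                ? address
                : address.substring(0, zone) + "%25" + address.substring(zone + 1);
        return "http://[" + host + "]:" + port + path;
    }

    public static void main(String[] args) {
        // Sample link-local address with an interface zone (illustrative values).
        System.out.println(httpUrl("fe80::1%en0", 8080, "/index.html"));
        // prints "http://[fe80::1%25en0]:8080/index.html"
    }
}
```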

It’s interesting that he asked on Mastodon rather than on Stack Overflow.

GrammarlyGO Training on User Content With Questionable Opt Out

Rahul Roy-Chowdhury:

GrammarlyGO provides on-demand generative AI communication assistance directly in the apps where people write. Whether in an email thread or a long-form document, GrammarlyGO is right there with you and your teams during the writing process. GrammarlyGO understands context to quickly generate high-quality, task-appropriate writing and revisions.

Karolina Szczur (via Hacker News):

any product i’m using that announces AI features makes me instantly suspicious about privacy & security of my data. perfect example? grammarly.

[…]

i immediately contacted support asking:

  • how it was trained
  • can i opt out

it took me a while to get an honest answer but the ONLY way you can opt out is to pay for a business subscription for 500+ people.

Suha (Vocalize4754):

I’m Grammarly’s CISO.

[…]

When it comes to our genAI features, we use Microsoft Azure as our LLM provider and don’t allow Azure, or any third party, to use our customers’ data to train their models—this is contractually mandated. For text analyzed by Grammarly to provide revision suggestions (like adjusting tone or making text more concise), we may retain randomly sampled, anonymized, and de-identified data to improve the product. This data is disassociated from user accounts and ONLY used in aggregate.

We’ve devoted a ton of time and resources to developing methods that ensure the training data is anonymized and de-identified. And any Grammarly user (Free, Premium, Business) can view the data associated with their account by requesting a personal data report from us.

Re: opt-out: When we go through a security review with a business, if requested, that business can completely opt out of Grammarly training on their de-identified and anonymized data—opt-out is not limited to a 500+ license size.

This seems to directly contradict what Szczur was told by customer support.

I don’t see how viewing data associated with your account would be helpful if the worry is that the text isn’t properly cleaned before going into the anonymized soup. If they don’t store where it came from, you won’t be able to see which text you contributed.

Zoom ToS Allowed Training AI on User Content With No Opt Out

Alex Ivanovs (via Hacker News):

Zoom Video Communications, Inc. recently updated its Terms of Service to encompass what some critics are calling a significant invasion of user privacy.

[…]

What raises alarm is the explicit mention of the company’s right to use this data for machine learning and artificial intelligence, including training and tuning of algorithms and models. This effectively allows Zoom to train its AI on customer content without providing an opt-out option, a decision that is likely to spark significant debate about user privacy and consent.

Additionally, under section 10.4 of the updated terms, Zoom has secured a “perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license” to redistribute, publish, access, use, store, transmit, review, disclose, preserve, extract, modify, reproduce, share, use, display, copy, distribute, translate, transcribe, create derivative works, and process Customer Content.

Smita Hashim:

To reiterate: Zoom does not use any of your audio, video, chat, screen-sharing, attachments, or other communications like customer content (such as poll results, whiteboard, and reactions) to train Zoom’s or third-party artificial intelligence models.

Nick Heer:

But why is all of this contained in a monolithic terms-of-service document? Few people read these things in full and even fewer understand them. It may appear simpler, but features which require this kind of compromise should have specific and separate documentation for meaningful explicit consent.

Oliver Hunt:

If some company (like Zoom) posts an update to their terms of service that give them carte blanche access to your data for “AI” or any other reason, it doesn’t matter if their marketing department makes a post talking about how they won’t use that bit of their ToS.

The ToS change was made for a reason, and that reason is to abuse you and your data.

[…]

That their response to uproar about the ToS change is a blog post, and not to revert their ToS indicates that they intend to use that clause (if they hadn’t been doing so already without “explicit” consent)

Jay Peters:

Zoom has updated its terms of service and reworded a blog post explaining recent terms of service changes referencing its generative AI tools. The company now explicitly states that “communications-like” customer data isn’t being used to train artificial intelligence models for Zoom or third parties. What is covered by communications-like? Basically, the content of your videoconferencing on Zoom.

Jai Vijayan (via Hacker News):

Zoom’s decision — and the reason for it — is sure to add to the growing debate about the privacy and security implications of technology companies using customer data to train AI models.

In Zoom’s case, the company recently introduced two generative AI features — Zoom IQ Meeting Summary and Zoom IQ Team Chat Compose — that offer AI-powered chat composition and automated meeting summaries.

[…]

newly revised policy still gives Zoom all “rights, title, and interest” to a lot of service generated data including telemetry data, product usage data, and diagnostic data. But the company will not use customer content to train AI models.

Update (2023-08-16): Bruce Schneier:

Of course, these are Terms of Service. They can change at any time. Zoom can renege on its promise at any time. There are no rules, only the whims of the company as it tries to maximize its profits.

JVM Compares Strings Using the pcmpestri x86 Instruction

Jackson Davis (2016, tweet, Hacker News):

String.compareTo is one of a few methods that is important enough to also get a special hand-rolled assembly version.

[…]

Introduced in SSE4.2, pcmpestri is a member of the pcmpxstrx family of vectorized string comparison instructions. With a control byte to specify options for their complex functionality, they are complicated enough to get their own subsection in the x86 ISR. […] Now that’s really putting the C in CISC!

[…]

If this wasn’t complicated enough for you, have a quick gander at the indexOf implementations (there are 2, depending on the size of the matching string), which use control byte 0x0d, which does “equal ordered” (aka substring) matching.
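
To see why 0x0d means “equal ordered,” it may help to split the control byte into its bit fields. The following Java sketch decodes an immediate using the field layout as I understand it from Intel’s documentation of the pcmpxstrx instructions: bits 1:0 select the data format, bits 3:2 the aggregation operation, bits 5:4 the polarity, and bit 6 which index is returned. The `PcmpControlByte` class and `describe` method are hypothetical names for illustration only and are not part of the JVM.

```java
final class PcmpControlByte {
    private static final String[] FORMAT =
            {"unsigned bytes", "unsigned words", "signed bytes", "signed words"};
    private static final String[] AGGREGATION =
            {"equal any", "ranges", "equal each", "equal ordered"};
    private static final String[] POLARITY =
            {"positive", "negative", "masked positive", "masked negative"};

    // Hypothetical decoder: names each bit field of a pcmpxstrx control byte.
    static String describe(int imm8) {
        String format = FORMAT[imm8 & 0x3];                  // bits 1:0
        String aggregation = AGGREGATION[(imm8 >> 2) & 0x3]; // bits 3:2
        String polarity = POLARITY[(imm8 >> 4) & 0x3];       // bits 5:4
        String index = ((imm8 >> 6) & 0x1) == 0              // bit 6
                ? "least significant index"
                : "most significant index";
        return String.join(", ", format, aggregation, polarity, index);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0d));
        // prints "unsigned words, equal ordered, positive, least significant index"
    }
}
```

The “unsigned words” format lines up with Java’s 16-bit char values.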

It sounds like it only compares the UTF-16 code units, without any normalization, so that equivalent precomposed and decomposed strings are not considered equal.
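
A small Java sketch of that behavior, using the standard `java.text.Normalizer` class: the precomposed and decomposed spellings of “é” compare as different with `String.compareTo`, and compare as equal only after both are normalized to the same form. The `NormalizedCompare` class and `compareCanonical` helper are hypothetical names for illustration, and NFC is just one reasonable choice of normalization form.

```java
import java.text.Normalizer;

final class NormalizedCompare {
    // Hypothetical helper: normalizes both strings to NFC before comparing,
    // so canonically equivalent spellings compare as equal.
    static int compareCanonical(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .compareTo(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";  // é as a single code point
        String decomposed = "e\u0301";  // e followed by a combining acute accent

        // Plain compareTo looks at the raw char values, so these differ.
        System.out.println(precomposed.compareTo(decomposed) == 0);
        // prints "false"

        // After normalization both sides are the same sequence.
        System.out.println(compareCanonical(precomposed, decomposed) == 0);
        // prints "true"
    }
}
```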

pcwalton:

One thing I learned about pcmpxstrx is that it’s surprisingly slow: latency of 10-11 cycles and reciprocal throughput of 3-5 cycles on Haswell according to Agner’s tables, depending on the precise instruction variant. The instructions are also limited in the ALU ports they can use. Since AVX2 has made SIMD on x86 fairly flexible, it can sometimes not be worth using the string comparison instructions if simpler instructions suffice: even a slightly longer sequence of simpler SIMD instructions sometimes beats a single string compare.
